Normalized to: Maitra, R.
[1]
oai:arXiv.org:2003.05777 [pdf] - 2064571
Characterising hot stellar systems with confidence
Submitted: 2020-03-12, last modified: 2020-03-16
Hot stellar systems (HSS) are collections of stars bound together by
gravitational attraction. These systems hold clues to many mysteries of outer
space, so understanding their origin, evolution and physical properties is
important but remains a huge challenge. We used multivariate $t$-mixtures
model-based clustering to analyze 13456 hot stellar systems from Misgeld &
Hilker (2011) that included 12763 candidate globular clusters and found eight
homogeneous groups using the Bayesian Information Criterion (BIC). A
nonparametric bootstrap procedure was used to estimate the confidence of each
of our clustering assignments. The eight obtained groups can be characterized
in terms of the correlation, mass, effective radius and surface density. Using
conventional correlation-mass-effective radius-surface density notation, the
largest group, Group 1, can be described as having positive-low-low-moderate
characteristics. The other groups, numbered in decreasing order of size, are
similarly characterised, with Group 2 having positive-low-low-high
characteristics, Group 3 displaying positive-low-low-moderate characteristics,
Group 4 having positive-low-low-high characteristics, Group 5 displaying
positive-low-moderate-moderate characteristics and Group 6 showing
positive-moderate-low-high characteristics. The smallest group (Group 8) shows
negative-low-moderate-moderate characteristics. Group 7 has no candidate
clusters and so cannot be similarly labeled, but the mass-effective radius
correlation for these non-candidates indicates that they are larger than
typical globular clusters. Assertions drawn for each group are ambiguous for a
few HSS having low confidence in classification. Our analysis identifies
distinct kinds of HSS with varying confidence and provides novel insight into
their physical and evolutionary properties.
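
As an illustration of the model-selection step mentioned above, the following is a minimal sketch of choosing the number of groups by BIC. It assumes the "smaller is better" convention, BIC = -2 log L + p log n; some software reports the negative of this quantity and maximizes it. The function names, the `fits` dictionary and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'smaller is better' convention: -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(fits, n_obs):
    """Pick the number of groups K with the smallest BIC.

    `fits` maps each candidate K to a pair
    (maximized log-likelihood, number of free parameters),
    as produced by whatever mixture-fitting routine is in use.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For example, with made-up fits `{1: (-1000.0, 5), 2: (-900.0, 11), 3: (-898.0, 17)}` and `n_obs=500`, the small likelihood gain from K = 2 to K = 3 does not offset the penalty for six extra parameters, so K = 2 is selected.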
[2]
oai:arXiv.org:1904.09609 [pdf] - 1885634
TiK-means: $K$-means clustering for skewed groups
Submitted: 2019-04-21
The $K$-means algorithm is extended to allow for partitioning of skewed
groups. Our algorithm is called TiK-means and contributes a $K$-means type
algorithm that assigns observations to groups while estimating their
skewness-transformation parameters. The resulting groups and transformation
reveal general-structured clusters that can be explained by inverting the
estimated transformation. Further, a modification of the jump statistic chooses
the number of groups. Our algorithm is evaluated on simulated and real-life
datasets and then applied to a long-standing astronomical dispute regarding the
distinct kinds of gamma ray bursts.
[3]
oai:arXiv.org:1802.08363 [pdf] - 1747013
An efficient $k$-means-type algorithm for clustering datasets with
incomplete records
Submitted: 2018-02-22, last modified: 2018-09-08
The $k$-means algorithm is arguably the most popular nonparametric clustering
method but cannot generally be applied to datasets with incomplete records. The
usual practice then is to either impute missing values under an assumed
missing-completely-at-random mechanism or to ignore the incomplete records, and
apply the algorithm on the resulting dataset. We develop an efficient version
of the $k$-means algorithm that allows for clustering in the presence of
incomplete records. Our extension is called $k_m$-means and reduces to the
$k$-means algorithm when all records are complete. We also provide
initialization strategies for our algorithm and methods to estimate the number
of groups in the dataset. Illustrations and simulations demonstrate the
efficacy of our approach in a variety of settings and patterns of missing data.
Our methods are also applied to the analysis of activation images obtained from
a functional Magnetic Resonance Imaging experiment.
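
The general idea of clustering without imputing or discarding incomplete records can be sketched as follows. This is not the authors' $k_m$-means implementation; it is a simplified Lloyd-type iteration that assumes distances and centre updates use only the observed coordinates of each record, with `None` marking a missing value. All function names are hypothetical. When every record is complete, the iteration is the ordinary $k$-means (Lloyd) update.

```python
def partial_sq_dist(record, center):
    """Squared Euclidean distance over the coordinates observed in `record`."""
    return sum((x - c) ** 2 for x, c in zip(record, center) if x is not None)


def assign(records, centers):
    """Assign each record to the centre closest in partial distance."""
    return [
        min(range(len(centers)), key=lambda k: partial_sq_dist(r, centers[k]))
        for r in records
    ]


def update(records, labels, centers):
    """Recompute each centre coordinate from the observed values in its group."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [r for r, lab in zip(records, labels) if lab == k]
        center = []
        for j, old_j in enumerate(old):
            observed = [r[j] for r in members if r[j] is not None]
            # Keep the previous value if no member observes coordinate j.
            center.append(sum(observed) / len(observed) if observed else old_j)
        new_centers.append(center)
    return new_centers


def kmeans_observed(records, centers, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(records, centers)
    for _ in range(max_iter):
        centers = update(records, labels, centers)
        new_labels = assign(records, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

Calling `kmeans_observed(records, initial_centers)` with a list of equal-length records and a list of starting centres returns the final labels and centres. The initialization and the choice of the number of groups, both of which the abstract addresses, are left outside this sketch.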
[4]
oai:arXiv.org:1712.08123 [pdf] - 1757643
Multivariate $t$-Mixtures-Model-based Cluster Analysis of BATSE Catalog
Establishes Importance of All Observed Parameters, Confirms Five Distinct
Ellipsoidal Sub-populations of Gamma Ray Bursts
Submitted: 2017-12-21, last modified: 2018-09-08
Determining the kinds of gamma-ray bursts (GRBs) has been of interest to
astronomers for many years. We analyzed 1599 GRBs from the Burst and Transient
Source Experiment (BATSE) 4Br catalogue using $t$-mixtures-model-based
clustering on all nine observed parameters ($T_{50}$, $T_{90}$, $F_1$, $F_2$,
$F_3$, $F_4$, $P_{64}$, $P_{256}$, $P_{1024}$) and found evidence of five types
of GRBs. Our results further refine the findings of Chattopadhyay and Maitra
(2017) by providing groups that are more distinct. Using the Mukherjee et al.
(1998) classification scheme, also used by Chattopadhyay and Maitra (2017), of
duration, total fluence ($F_t = F_1 + F_2 + F_3 + F_4$) and spectrum (using the
hardness ratio $H_{321} = F_3/(F_1 + F_2)$), our five groups are classified as
long-intermediate-intermediate, short-faint-intermediate, short-faint-soft,
long-bright-hard, and long-intermediate-hard. We also classify 374 GRBs in the
BATSE catalogue that have incomplete information in some of the observed
variables (mainly the four time-integrated fluences $F_1$, $F_2$, $F_3$ and
$F_4$) to the five groups obtained, using the 1599 GRBs having complete
information in all the observed variables. Our classification scheme puts 138
GRBs in the first group, 52 GRBs in the second group, 33 GRBs in the third
group, 127 GRBs in the fourth group and 24 GRBs in the fifth group.
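
The two derived quantities used in the classification scheme above follow directly from the four time-integrated fluences. A minimal sketch, with hypothetical function names and no handling of missing or zero fluences:

```python
def total_fluence(f1, f2, f3, f4):
    """Total fluence F_t = F_1 + F_2 + F_3 + F_4."""
    return f1 + f2 + f3 + f4


def hardness_ratio_321(f1, f2, f3):
    """Hardness ratio H_321 = F_3 / (F_1 + F_2)."""
    return f3 / (f1 + f2)
```

A burst with any of the fluences missing cannot be given these values directly, which is why the abstract treats the 374 incomplete records separately.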
[5]
oai:arXiv.org:1703.07338 [pdf] - 1582053
Gaussian-Mixture-Model-based Cluster Analysis Finds Five Kinds of Gamma
Ray Bursts in the BATSE Catalog
Submitted: 2017-03-21, last modified: 2017-05-02
Clustering methods are an important tool to enumerate and describe the
different coherent kinds of Gamma Ray Bursts (GRBs). However, their performance
can be affected by a number of factors, such as the choice of clustering
algorithm and its inherent assumptions, the variables included in the
clustering, the nature of the initialization methods used, the iterative
algorithm, and the criterion used to judge the optimal number of groups
supported by the data. We analyzed GRBs from the BATSE 4Br catalog using
$k$-means and Gaussian Mixture Models-based clustering methods and found that,
after accounting for all the above factors, all six variables (different
subsets of which have been used in the literature) contain information on
clustering. These variables are the flux duration variables ($T_{50}$,
$T_{90}$), the peak flux ($P_{256}$) measured in 256-millisecond bins, the
total fluence ($F_t$) and the spectral hardness ratios ($H_{32}$ and
$H_{321}$). Further, our analysis found
evidence of five different kinds of GRBs and that these groups have different
kinds of dispersions in terms of shape, size and orientation. In terms of
duration, fluence and spectrum, the five types of GRBs were characterized as
intermediate/faint/intermediate, long/intermediate/soft,
intermediate/intermediate/intermediate, short/faint/hard and
long/bright/intermediate.