Normalized to: Lisboa, P.
[1]
oai:arXiv.org:1810.00887 [pdf] - 1767713
Reproducible $k$-means clustering in galaxy feature data from the GAMA
survey
Submitted: 2018-10-01
A fundamental bimodality of galaxies in the local Universe is apparent in
many of the features used to describe them. Multiple sub-populations exist
within this framework, each representing galaxies following distinct
evolutionary pathways. Accurately identifying and characterising these
sub-populations requires that a large number of galaxy features be analysed
simultaneously. Future galaxy surveys such as LSST and Euclid will yield data
volumes for which traditional approaches to galaxy classification will become
unfeasible. To address this, we apply a robust $k$-means unsupervised
clustering method to feature data derived from a sample of 7338 local-Universe
galaxies selected from the Galaxy And Mass Assembly (GAMA) survey. This allows
us to partition our sample into $k$ clusters without the need for training on
pre-labelled data, facilitating a full census of our high-dimensionality
feature space and guarding against stochastic effects. We find that the local
galaxy population natively splits into $2$, $3$, $5$ and a maximum of $6$
sub-populations, with each corresponding to a distinct ongoing evolutionary
mechanism. Notably, the impact of the local environment appears strongly linked
with the evolution of low-mass ($M_{*} < 10^{10}$ M$_{\odot}$) galaxies, with
more massive systems appearing to evolve more passively from the blue cloud
onto the red sequence. With a typical run time of $\sim3$ minutes per value of
$k$ for our galaxy sample, we show how $k$-means unsupervised clustering is an
ideal tool for future analysis of large extragalactic datasets, being scalable,
adaptable, and providing crucial insight into the fundamental properties of the
local galaxy population.
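
The abstract does not say how the clustering is made robust against stochastic effects. One common approach is to restart standard (Lloyd's-algorithm) $k$-means from several random initialisations and keep the solution with the lowest within-cluster sum of squares. The following is a minimal sketch of that idea only, not the authors' method. The function names (`kmeans_once`, `kmeans_repeated`, `make_blobs`), the parameter values, and the synthetic two-blob data are all hypothetical; nothing here is taken from the paper or from GAMA data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialisation."""
    centres = rng.sample(points, k)  # k distinct data points as starting centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, inertia


def kmeans_repeated(points, k, n_init=20, seed=0):
    """Repeat k-means n_init times and keep the lowest-inertia solution.

    Fixing the seed makes the whole procedure reproducible (an assumed
    choice for this sketch).
    """
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


def make_blobs(rng, blob_centres, n_per, sigma=0.3):
    """Hypothetical toy data: Gaussian blobs standing in for galaxy features."""
    return [
        tuple(rng.gauss(c, sigma) for c in centre)
        for centre in blob_centres
        for _ in range(n_per)
    ]


# Two well-separated toy "sub-populations" in a 2-D feature space.
points = make_blobs(random.Random(42), [(0.0, 0.0), (5.0, 5.0)], n_per=50)
labels, centres, inertia = kmeans_repeated(points, k=2)
```

In practice the features would normally be put on comparable scales before clustering, and the procedure would be repeated for each candidate value of $k$; both steps are omitted here for brevity.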