Normalized to: Davey, N.
[1]
oai:arXiv.org:1709.05834 - 1588494
An automatic taxonomy of galaxy morphology using unsupervised machine
learning
Submitted: 2017-09-18
We present an unsupervised machine learning technique that automatically
segments and labels galaxies in astronomical imaging surveys using only pixel
data. Unlike previous unsupervised machine learning approaches used in
astronomy, we apply no pre-selection or pre-filtering by target galaxy type to
identify similar galaxies. We demonstrate the technique on the HST
Frontier Fields. By training the algorithm using galaxies from one field (Abell
2744) and applying the result to another (MACS0416.1-2403), we show how the
algorithm can cleanly separate early and late type galaxies without any form of
pre-directed training for what an 'early' or 'late' type galaxy is. We then
apply the technique to the HST CANDELS fields, creating a catalogue of
approximately 60,000 classifications. We show how the automatic classification
groups galaxies of similar morphological (and photometric) type, and make the
classifications public via a catalogue, a visual catalogue and a galaxy
similarity search. We compare the CANDELS machine-based classifications to
human-based classifications from the Galaxy Zoo: CANDELS project. Although
there is not a direct mapping between Galaxy Zoo and our hierarchical
labelling, we demonstrate a good level of concordance between human and machine
classifications. Finally, we show how the technique can be used to identify
rarer objects and present new lensed galaxy candidates from the CANDELS
imaging.
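The abstract mentions a public galaxy similarity search but does not say how similarity is measured. The sketch below is a minimal illustration under stated assumptions: each galaxy is summarised by the fraction of its pixels assigned to each unsupervised label, and galaxies are ranked by cosine similarity of those histograms. The functions `label_histogram`, `cosine` and `most_similar`, the label values and the galaxy IDs are all hypothetical, not the catalogue's actual interface.

```python
import math
from collections import Counter


def label_histogram(pixel_labels, n_labels):
    """Fraction of a galaxy's pixels carrying each unsupervised label.

    `pixel_labels` is a hypothetical list of integer labels in 0..n_labels-1,
    one per pixel segmented as part of the galaxy (an assumption, not the
    paper's published feature definition).
    """
    counts = Counter(pixel_labels)
    total = len(pixel_labels)
    return [counts[i] / total for i in range(n_labels)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, catalogue, k=3):
    """IDs of the k catalogue galaxies whose label histograms best match `query`."""
    return sorted(catalogue, key=lambda gid: cosine(query, catalogue[gid]), reverse=True)[:k]


# Toy catalogue built from four hypothetical pixel labels.
catalogue = {
    "gal_a": label_histogram([0, 0, 0, 1], 4),
    "gal_b": label_histogram([2, 3, 3, 3], 4),
    "gal_c": label_histogram([0, 0, 1, 1], 4),
}
query = label_histogram([0, 0, 1, 0], 4)
print(most_similar(query, catalogue, k=2))  # → ['gal_a', 'gal_c']
```

Histograms make galaxies of different sizes comparable, since only the mix of labels matters, not the pixel count.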
[2]
oai:arXiv.org:1507.01589 - 1242307
Teaching a machine to see: unsupervised image segmentation and
categorisation using growing neural gas and hierarchical clustering
Submitted: 2015-07-06
We present a novel unsupervised learning approach to automatically segment
and label images in astronomical surveys. Automation of this procedure will be
essential as next-generation surveys enter the petabyte scale: data volumes
will exceed the capability of even large crowd-sourced analyses. We demonstrate
how a growing neural gas (GNG) can be used to encode the feature space of
imaging data. When the GNG is coupled with hierarchical clustering, imaging
data can be automatically segmented and labelled by organising its nodes. The
key distinction of unsupervised learning is that these labels need not be known
prior to training; rather, they are determined by the algorithm itself.
Importantly, after training, a network can be presented with images it has
never 'seen' before and provide consistent categorisation of features.
As a proof of concept, we demonstrate the method on data from the Hubble Space
Telescope Frontier Fields: images of clusters of galaxies containing a mixture
of galaxy types that would easily be recognised and classified by a human
inspector. By training the algorithm using one field (Abell 2744) and applying
the result to another (MACS0416.1-2403), we show how the algorithm can cleanly
separate image features that a human would associate with early and late type
galaxies. We suggest that the algorithm has potential as a tool in the
automatic analysis and data mining of next-generation imaging and spectral
surveys, and could also find application beyond astronomy.
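The GNG-plus-hierarchical-clustering pipeline described above can be sketched in a few dozen lines. This is a minimal illustration, not the authors' code: it follows the standard GNG update rules of Fritzke (1995) with illustrative parameter values, uses single-linkage agglomeration (the abstract does not specify the linkage), and replaces image feature vectors with toy 2-D points. The names `train_gng`, `agglomerate` and `label` are hypothetical; `label` mimics applying a trained network to data it has never seen, as in the Abell 2744 to MACS0416.1-2403 test.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_gng(data, max_nodes=10, n_steps=3000, eps_b=0.05, eps_n=0.006,
              max_age=50, lam=100, alpha=0.5, decay=0.995, seed=0):
    """Minimal growing neural gas (after Fritzke 1995); parameters are illustrative.

    Returns (w, edges): node id -> weight vector, frozenset({i, j}) -> edge age.
    """
    rng = random.Random(seed)
    w = {0: list(rng.choice(data)), 1: list(rng.choice(data))}
    err = {0: 0.0, 1: 0.0}
    edges, next_id = {}, 2
    for step in range(1, n_steps + 1):
        x = rng.choice(data)
        s1, s2 = sorted(w, key=lambda i: dist2(w[i], x))[:2]  # winner, runner-up
        err[s1] += dist2(w[s1], x)
        # Pull the winner towards x; age its edges and nudge its neighbours.
        w[s1] = [wi + eps_b * (xi - wi) for wi, xi in zip(w[s1], x)]
        for e in edges:
            if s1 in e:
                edges[e] += 1
                (n,) = e - {s1}
                w[n] = [wi + eps_n * (xi - wi) for wi, xi in zip(w[n], x)]
        # Connect (or refresh) s1-s2, drop stale edges and any node left without edges.
        edges[frozenset((s1, s2))] = 0
        edges = {e: a for e, a in edges.items() if a <= max_age}
        linked = set().union(*edges)
        for i in [i for i in w if i not in linked]:
            del w[i], err[i]
        # Periodically grow a node between the highest-error node and its worst neighbour.
        if step % lam == 0 and len(w) < max_nodes:
            q = max(w, key=err.get)
            f = max((next(iter(e - {q})) for e in edges if q in e), key=err.get)
            r, next_id = next_id, next_id + 1
            w[r] = [(a + b) / 2 for a, b in zip(w[q], w[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = edges[frozenset((r, f))] = 0
            err[q] *= alpha
            err[f] *= alpha
            err[r] = err[q]
        for i in err:
            err[i] *= decay
    return w, edges


def agglomerate(w, k):
    """Single-linkage agglomerative clustering of GNG nodes into k groups."""
    clusters = [[i] for i in w]
    while len(clusters) > k:
        _, a, b = min(
            (min(dist2(w[i], w[j]) for i in ca for j in cb), a, b)
            for a, ca in enumerate(clusters)
            for b, cb in enumerate(clusters) if a < b
        )
        clusters[a] += clusters.pop(b)
    return clusters


def label(x, w, clusters):
    """Label an unseen feature vector with the cluster of its nearest GNG node."""
    nearest = min(w, key=lambda i: dist2(w[i], x))
    return next(c for c, members in enumerate(clusters) if nearest in members)


# Toy stand-in for two image-feature types: two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(200)]
blob_b = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(200)]
w, edges = train_gng(blob_a + blob_b)
clusters = agglomerate(w, k=2)
print(label([0.0, 0.0], w, clusters), label([5.0, 5.0], w, clusters))
```

Because the GNG inserts nodes where quantisation error is largest, the node set adapts to the data distribution, and the clustering step only has to group a handful of nodes rather than every input vector.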
[3]
oai:arXiv.org:0910.4393 - 1018056
Photometric redshift estimation using Gaussian processes
Submitted: 2009-10-22, last modified: 2010-02-17
We present a comparison between Gaussian processes (GPs) and artificial
neural networks (ANNs) as methods for determining photometric redshifts for
galaxies, given training set data. In particular, we compare how their
performance suffers as the training set is degraded in ways that might be
caused by the observational limitations of spectroscopy. We find that
performance with large, complete training sets is very similar, although the
ANN achieves slightly smaller root mean square errors. If the size of the
training set is reduced by random sampling, the RMS errors of both methods
increase, but they do so to a lesser extent and in a much smoother manner for
the case of GP regression. When training objects are removed at redshifts
1.3 < z < 1.7, to simulate the effects of the "redshift desert" of optical
spectroscopy, the GP regression is successful at interpolating across the
redshift gap, while the ANN suffers from strong bias for test objects in this
redshift range. Overall, GP regression has attractive properties for
photometric redshift estimation, particularly for deep, high-redshift surveys
where it is difficult to obtain a large, complete training set. At present,
unlike the ANN code, public GP regression codes do not take account of
inhomogeneous measurement errors on the photometric data, and thus cannot
estimate reliable uncertainties on the predicted redshifts. However, a better
treatment of errors is in principle possible, and the promising results in this
paper suggest that such improved GP algorithms should be pursued. (abridged)
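To make the redshift-desert experiment concrete, the sketch below implements GP regression from scratch with a squared-exponential kernel and a constant prior mean (the training-set average), and interpolates across a gap where training objects with 1.3 < z < 1.7 have been removed. The one-dimensional "colour", the linear colour-redshift relation, the kernel and its hyperparameters are illustrative assumptions, not the paper's setup or data. Like the public codes the abstract describes, it uses a single homoscedastic noise term and ignores measurement errors on the photometric inputs.

```python
import math


def rbf(a, b, length=0.5, amp=1.0):
    """Squared-exponential covariance between two feature vectors."""
    return amp ** 2 * math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def cholesky(K):
    """Lower-triangular L with L L^T = K, for symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, z, X_star, noise=0.01, **kern):
    """Posterior mean and variance of the latent redshift at each test point."""
    n = len(X)
    K = [[rbf(X[i], X[j], **kern) + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    z0 = sum(z) / n  # constant prior mean
    alpha = solve_upper_t(L, solve_lower(L, [zi - z0 for zi in z]))  # (K + noise^2 I)^-1 (z - z0)
    out = []
    for xs in X_star:
        k_s = [rbf(xs, xi, **kern) for xi in X]
        mean = z0 + sum(k * a for k, a in zip(k_s, alpha))
        v = solve_lower(L, k_s)
        out.append((mean, rbf(xs, xs, **kern) - sum(vi * vi for vi in v)))
    return out


# Toy 1-D colour with a linear colour-redshift relation (purely illustrative).
colours = [[0.1 * i] for i in range(41)]
redshifts = [c[0] / 2 for c in colours]
# Remove training objects with 1.3 < z < 1.7 to mimic the redshift desert.
train = [(c, zz) for c, zz in zip(colours, redshifts) if not 1.3 < zz < 1.7]
X = [c for c, _ in train]
z = [zz for _, zz in train]
(mu_gap, var_gap), (mu_obs, var_obs) = gp_predict(X, z, [[3.0], [1.0]])
print(f"z(colour=3.0) = {mu_gap:.3f} +/- {math.sqrt(max(var_gap, 0.0)):.3f}")
```

The predicted variance grows inside the gap, which is the behaviour that makes GPs attractive when the training set is incomplete. Per-object target noise could be added to the diagonal of K, but handling errors on the photometric inputs, as the abstract notes, requires more than this simple change.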