Normalized to: Kuminski, E.
[1]
oai:arXiv.org:1810.11283 [pdf] - 1774150
A hybrid approach to machine learning annotation of large galaxy image
databases
Submitted: 2018-10-26
Modern astronomy relies on massive databases collected by robotic telescopes
and digital sky surveys, which acquire data at a much faster pace than manual
analysis can support. Among other data, these sky surveys collect information
about millions and sometimes billions of extra-galactic objects. Since the very
large number of objects makes manual observation impractical, automatic methods
that can analyze and annotate extra-galactic objects are required to fully
utilize the discovery power of these databases. Machine learning methods for
annotation of celestial objects can be separated broadly into methods that use
the photometric information collected by digital sky surveys, and methods that
analyze the image of the object. Here we describe a hybrid method that combines
photometry and image data to annotate galaxies by their morphology, and a
method that uses that information to identify objects that are visually similar
to a query object (query-by-example). The results are compared to using just
photometric information from SDSS, and to using just the morphological
descriptors extracted directly from the images. The comparison shows that for
automatic classification the image data add only marginally to the
information provided by the photometry data. For query-by-example, however, the
analysis of the image data provides additional information that substantially
improves the automatic detection. The source code and binaries of the method
can be downloaded through the Astrophysics Source Code Library.
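
The query-by-example idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each galaxy is represented by a hypothetical hybrid feature vector (photometric values followed by image descriptors), standardizes every feature, and returns the nearest neighbors of the query object by Euclidean distance. The function names and the toy feature values are invented for illustration.

```python
import math

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def query_by_example(features, query_index, k=2):
    """Return the indices of the k objects most similar to the query object."""
    scaled = zscore_columns(features)
    query = scaled[query_index]
    dists = [
        (math.dist(query, row), i)
        for i, row in enumerate(scaled)
        if i != query_index
    ]
    return [i for _, i in sorted(dists)[:k]]

# Hypothetical hybrid vectors: [photometric feature, image descriptor]
toy_features = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]
nearest = query_by_example(toy_features, query_index=0, k=1)
```

Standardizing first keeps a feature with a large numeric range from dominating the distance, which matters when photometric and image-derived values are mixed in one vector.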
[2]
oai:arXiv.org:1602.06854 [pdf] - 1392864
Computer-generated visual morphology catalog of ~3,000,000 SDSS galaxies
Submitted: 2016-02-22, last modified: 2016-03-27
We applied computer analysis to classify the broad morphological type of
~3,000,000 SDSS galaxies. The catalog provides for each galaxy the DR8 object
ID, right ascension, declination, and the certainty of the automatic
classification as spiral or elliptical. The certainty of the classification
allows controlling the accuracy of a subset of galaxies by sacrificing some of
the least certain classifications. The accuracy of the catalog was tested using
galaxies that were classified by the manually annotated Galaxy Zoo catalog. The
results show that the catalog contains ~900,000 spiral galaxies and ~600,000
elliptical galaxies with classification certainty that has a statistical
agreement rate of ~98% with the Galaxy Zoo debiased 'superclean' dataset. This also
demonstrates the ability of computers to turn large datasets of galaxy images
into structured catalogs of galaxy morphology. The catalog can be downloaded at
http://vfacstaff.ltu.edu/lshamir/data/morph_catalog , and can be accessed
through public tables on CAS: public.broadMorph.LargeGM,
public.broadMorph.LargeWnnGM, and public.broadMorph.SpectraGM. The image
analysis software that was used to create the catalog is also publicly
available.
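
The trade-off between subset size and accuracy described above can be sketched as follows. This is an illustration only, not the catalog's software: the field names (`objid`, `label`, `certainty`), the reference dictionary, and the toy rows are all assumptions and do not reflect the actual catalog schema.

```python
def select_by_certainty(catalog, min_certainty):
    """Keep only the rows whose classification certainty meets the threshold."""
    return [row for row in catalog if row["certainty"] >= min_certainty]

def agreement_rate(rows, reference):
    """Fraction of rows whose label matches a reference annotation.

    Rows without a reference annotation are ignored; returns None when
    no row can be compared.
    """
    matched = [r for r in rows if r["objid"] in reference]
    if not matched:
        return None
    agree = sum(1 for r in matched if r["label"] == reference[r["objid"]])
    return agree / len(matched)

# Toy rows with hypothetical field names and values.
toy_catalog = [
    {"objid": 1, "label": "spiral", "certainty": 0.97},
    {"objid": 2, "label": "elliptical", "certainty": 0.92},
    {"objid": 3, "label": "spiral", "certainty": 0.61},
    {"objid": 4, "label": "elliptical", "certainty": 0.55},
]
# Hypothetical manual annotations used as the reference.
toy_reference = {1: "spiral", 2: "elliptical", 3: "elliptical", 4: "elliptical"}

loose = select_by_certainty(toy_catalog, 0.5)
strict = select_by_certainty(toy_catalog, 0.9)
```

In this toy data, raising the threshold shrinks the subset from four galaxies to two while removing the one classification that disagrees with the reference, which is the kind of trade-off the abstract describes.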
[3]
oai:arXiv.org:1409.7935 [pdf] - 1222285
Combining human and machine learning for morphological analysis of
galaxy images
Submitted: 2014-09-28
The increasing importance of digital sky surveys collecting many millions of
galaxy images has reinforced the need for robust methods that can perform
morphological analysis of large galaxy image databases. Citizen science
initiatives such as Galaxy Zoo showed that large datasets of galaxy images can
be analyzed effectively by non-scientist volunteers, but since databases
generated by robotic telescopes grow much faster than the processing power of
any group of citizen scientists, it is clear that computer analysis is
required. Here we propose to use citizen science data for training machine
learning systems, and show experimental results demonstrating that machine
learning systems can be trained with citizen science data. Our findings show
that the performance of machine learning depends on the quality of the data,
which can be improved by using samples that have a high degree of agreement
between the citizen scientists. The source code of the method is publicly
available.
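
Selecting training samples by the degree of agreement among citizen scientists can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the vote lists, galaxy identifiers, and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter

def majority_agreement(votes):
    """Return the most common label and the fraction of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

def build_training_set(votes_by_galaxy, min_agreement):
    """Keep galaxies whose majority label reaches the agreement threshold."""
    training = {}
    for galaxy_id, votes in votes_by_galaxy.items():
        if not votes:
            continue  # no votes, nothing to learn from
        label, fraction = majority_agreement(votes)
        if fraction >= min_agreement:
            training[galaxy_id] = label
    return training

# Hypothetical volunteer votes per galaxy.
toy_votes = {
    "g1": ["spiral"] * 9 + ["elliptical"],
    "g2": ["spiral"] * 6 + ["elliptical"] * 4,
    "g3": ["elliptical"] * 10,
}
clean_training = build_training_set(toy_votes, min_agreement=0.8)
```

A higher threshold yields fewer but cleaner training labels, so the threshold is a tuning choice rather than a fixed value.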