Full-text search for arXiv

Kuminski, Evan

Normalized to: Kuminski, E.

3 article(s) in total. 3 co-authors, from 1 to 3 common article(s). Median position in authors list is 1.0.

[1] oai:arXiv.org:1810.11283 [pdf] - 1774150

A hybrid approach to machine learning annotation of large galaxy image databases

Comments: A&C, accepted

Submitted: 2018-10-26

Modern astronomy relies on massive databases collected by robotic telescopes and digital sky surveys, which acquire data at a much faster pace than manual analysis can support. Among other data, these sky surveys collect information about millions and sometimes billions of extra-galactic objects. Since the very large number of objects makes manual observation impractical, automatic methods that can analyze and annotate extra-galactic objects are required to fully utilize the discovery power of these databases. Machine learning methods for annotation of celestial objects can be separated broadly into methods that use the photometric information collected by digital sky surveys, and methods that analyze the image of the object. Here we describe a hybrid method that combines photometry and image data to annotate galaxies by their morphology, and a method that uses that information to identify objects that are visually similar to a query object (query-by-example). The results are compared to using just photometric information from SDSS, and to using just the morphological descriptors extracted directly from the images. The comparison shows that for automatic classification the image data add only marginally to the information provided by the photometry data. For query-by-example, however, the analysis of the image data provides additional information that improves the automatic detection substantially. The source code and binaries of the method can be downloaded through the Astrophysics Source Code Library.
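The query-by-example idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the features or the similarity measure, so the sketch assumes that each object has a short photometric vector and a short image-descriptor vector, that the two are simply concatenated, and that similarity is Euclidean distance after per-column standardization. The names `zscore_columns` and `query_by_example` and all numeric values are hypothetical.

```python
import math

def zscore_columns(vectors):
    """Standardize each column to zero mean and unit variance (assumed preprocessing)."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n) or 1.0
            for j in range(d)]
    return [[(v[j] - means[j]) / stds[j] for j in range(d)] for v in vectors]

def query_by_example(photometry, image_features, query_index, k):
    """Return the indices of the k objects closest to the query object."""
    # Hybrid representation: photometric values followed by image descriptors.
    combined = [p + f for p, f in zip(photometry, image_features)]
    scaled = zscore_columns(combined)
    query = scaled[query_index]
    distances = [(math.dist(query, v), i)
                 for i, v in enumerate(scaled) if i != query_index]
    distances.sort()
    return [i for _, i in distances[:k]]

# Made-up toy data: objects 0 and 1 resemble each other, as do objects 2 and 3.
photometry = [[0.1, 1.0], [0.2, 1.1], [2.0, 3.0], [2.1, 3.2]]
image_features = [[0.9], [0.8], [0.1], [0.2]]
nearest = query_by_example(photometry, image_features, query_index=0, k=1)
```

With this toy data, the object returned for query 0 is object 1. Dropping either feature group from `combined` gives a photometry-only or image-only baseline of the kind the abstract compares against.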

[2] oai:arXiv.org:1602.06854 [pdf] - 1392864

Computer-generated visual morphology catalog of ~3,000,000 SDSS galaxies

Kuminski, Evan; Shamir, Lior

Comments: ApJS, accepted. CAS public tables added

Submitted: 2016-02-22, last modified: 2016-03-27

We applied computer analysis to classify the broad morphological type of ~3,000,000 SDSS galaxies. The catalog provides for each galaxy the DR8 object ID, right ascension, declination, and the certainty of the automatic classification as spiral or elliptical. The certainty of the classification allows controlling the accuracy of a subset of galaxies by sacrificing some of the least certain classifications. The accuracy of the catalog was tested using galaxies that were classified by the manually annotated Galaxy Zoo catalog. The results show that the catalog contains ~900,000 spiral galaxies and ~600,000 elliptical galaxies with classification certainty that has a statistical agreement rate of ~98% with the Galaxy Zoo debiased 'superclean' dataset. That also demonstrates the ability of computers to turn large datasets of galaxy images into structured catalogs of galaxy morphology. The catalog can be downloaded at http://vfacstaff.ltu.edu/lshamir/data/morph_catalog, and can be accessed through public tables on CAS: public.broadMorph.LargeGM, public.broadMorph.LargeWnnGM, and public.broadMorph.SpectraGM. The image analysis software that was used to create the catalog is also publicly available.
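The trade-off between subset size and accuracy described in the abstract above can be sketched as follows. This is an illustration only, not the catalog's actual schema or the authors' evaluation code: the row layout, the function name `agreement_at_threshold`, and the example values are all hypothetical.

```python
# Hypothetical rows: (object_id, predicted_label, certainty, reference_label),
# where reference_label stands in for a manual annotation such as Galaxy Zoo.
rows = [
    (1, "spiral", 0.99, "spiral"),
    (2, "elliptical", 0.95, "elliptical"),
    (3, "spiral", 0.60, "elliptical"),
    (4, "elliptical", 0.55, "elliptical"),
]

def agreement_at_threshold(rows, threshold):
    """Keep rows with certainty >= threshold; return (count kept, agreement rate)."""
    kept = [r for r in rows if r[2] >= threshold]
    if not kept:
        return 0, None  # no galaxies left, so agreement is undefined
    agree = sum(1 for r in kept if r[1] == r[3])
    return len(kept), agree / len(kept)

all_rows = agreement_at_threshold(rows, 0.0)   # every classification kept
confident = agreement_at_threshold(rows, 0.9)  # least certain ones dropped
```

In this made-up data, raising the threshold from 0.0 to 0.9 halves the subset (4 galaxies to 2) while the agreement rate rises from 0.75 to 1.0, which is the kind of trade-off the certainty column is meant to support.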

[3] oai:arXiv.org:1409.7935 [pdf] - 1222285

Combining human and machine learning for morphological analysis of galaxy images

Kuminski, Evan; George, Joe; Wallin, John; Shamir, Lior

Comments: PASP, accepted

Submitted: 2014-09-28

The increasing importance of digital sky surveys collecting many millions of galaxy images has reinforced the need for robust methods that can perform morphological analysis of large galaxy image databases. Citizen science initiatives such as Galaxy Zoo showed that large datasets of galaxy images can be analyzed effectively by non-scientist volunteers, but since databases generated by robotic telescopes grow much faster than the processing power of any group of citizen scientists, it is clear that computer analysis is required. Here we propose to use citizen science data for training machine learning systems, and show experimental results demonstrating that machine learning systems can be trained with citizen science data. Our findings show that the performance of machine learning depends on the quality of the data, which can be improved by using samples that have a high degree of agreement between the citizen scientists. The source code of the method is publicly available.
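The idea of training only on samples with a high degree of agreement between citizen scientists can be sketched as a simple filter. This is an assumption-laden illustration, not the published method: the vote format, the function name `select_training_samples`, and the 0.8 cutoff are hypothetical.

```python
from collections import Counter

def select_training_samples(votes, min_agreement):
    """Keep images whose majority label reaches the required vote fraction.

    votes maps an image ID to the list of labels given by volunteers.
    Returns a dict of image ID -> majority label for the retained images.
    """
    selected = {}
    for image_id, labels in votes.items():
        if not labels:
            continue  # no votes, nothing to learn from
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            selected[image_id] = label
    return selected

# Made-up votes: images "a" and "c" are near-unanimous, "b" is contested.
votes = {
    "a": ["spiral"] * 9 + ["elliptical"],
    "b": ["spiral"] * 6 + ["elliptical"] * 4,
    "c": ["elliptical"] * 10,
}
training_set = select_training_samples(votes, min_agreement=0.8)
```

Here the contested image "b" (60% agreement) is excluded, leaving "a" and "c" as training labels. A higher cutoff yields cleaner labels at the cost of fewer training samples.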