Full-text search for arXiv

2 article(s) in total. 10 co-authors, from 1 to 2 common article(s). Median position in authors list is 6,5.

[1] oai:arXiv.org:1204.4180 [pdf] - 1118094

Construction of a Calibrated Probabilistic Classification Catalog: Application to 50k Variable Sources in the All-Sky Automated Survey

Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien

Comments: 56 pages, 15 figures, 8 tables, submitted. The Machine-learned ASAS Classification Catalog is available at http://www.bigmacc.info

Submitted: 2012-04-18, last modified: 2012-04-24

With growing data volumes from synoptic surveys, astronomers must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities, and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All Sky Automated Survey (ASAS), and unveil the Machine-learned ASAS Classification Catalog (MACC), which is a 28-class probabilistic classification catalog of 50,124 ASAS sources. We estimate that MACC achieves a sub-20% classification error rate, and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes. The MACC is publicly available at http://www.bigmacc.info.

[2] oai:arXiv.org:1101.1959 [pdf] - 955957

On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data

Richards, Joseph W.; Starr, Dan L.; Butler, Nathaniel R.; Bloom, Joshua S.; Brewer, John M.; Crellin-Quick, Arien; Higgins, Justin; Kennedy, Rachel; Rischard, Maxime

Comments: 23 pages, 9 figures

Submitted: 2011-01-10

With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly-observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("feature"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%.