Normalized to: Crellin-Quick, A.
[1]
oai:arXiv.org:1204.4180 [pdf] - 1118094
Construction of a Calibrated Probabilistic Classification Catalog:
Application to 50k Variable Sources in the All-Sky Automated Survey
Submitted: 2012-04-18, last modified: 2012-04-24
With growing data volumes from synoptic surveys, astronomers must become more
abstracted from the discovery and introspection processes. Given the scarcity
of follow-up resources, there is a particularly sharp onus on the frameworks
that replace these human roles to provide accurate and well-calibrated
probabilistic classification catalogs. Such catalogs inform the subsequent
follow-up, allowing consumers to optimize the selection of specific sources for
further study and permitting rigorous treatment of purities and efficiencies
for population studies. Here, we describe a process to produce a probabilistic
classification catalog of variability with machine learning from a multi-epoch
photometric survey. In addition to producing accurate classifications, we show
how to estimate calibrated class probabilities, and motivate the importance of
probability calibration. We also introduce a methodology for feature-based
anomaly detection, which allows discovery of objects in the survey that do not
fit within the predefined class taxonomy. Finally, we apply these methods to
sources observed by the All Sky Automated Survey (ASAS), and unveil the
Machine-learned ASAS Classification Catalog (MACC), which is a 28-class
probabilistic classification catalog of 50,124 ASAS sources. We estimate that
MACC achieves a sub-20% classification error rate, and demonstrate that the
class posterior probabilities are reasonably calibrated. MACC classifications
compare favorably to the classifications of several previous domain-specific
ASAS papers and to the ASAS Catalog of Variable Stars, which had classified
only 24% of those sources into one of 12 science classes. The MACC is publicly
available at http://www.bigmacc.info.
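
The abstract's notion of "calibrated" class probabilities can be illustrated with a small check: among sources assigned a probability near p, roughly a fraction p should truly belong to the class. The following is a minimal sketch of such a check, not the procedure used to build MACC; the function names, the equal-width binning, and the use of the Brier score are all assumptions made for illustration.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed class fraction, count)
    tuple per non-empty bin. For well-calibrated predictions the first two
    values are close in every bin. (Hypothetical helper, illustration only.)
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((mean_p, observed, len(members)))
    return table


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    # Toy data: predicted probability of membership in one class, and truth.
    probs = [0.1, 0.1, 0.9, 0.9]
    outcomes = [0, 0, 1, 1]
    for mean_p, observed, count in reliability_table(probs, outcomes, n_bins=2):
        print(f"predicted {mean_p:.2f}  observed {observed:.2f}  n={count}")
    print("Brier score:", round(brier_score(probs, outcomes), 4))
```

For a multi-class catalog, a check of this kind would be repeated per class, treating each class as a one-vs-rest problem.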
[2]
oai:arXiv.org:1101.1959 [pdf] - 955957
On Machine-Learned Classification of Variable Stars with Sparse and
Noisy Time-Series Data
Submitted: 2011-01-10
With the coming data deluge from synoptic surveys, there is a growing need
for frameworks that can quickly and automatically produce calibrated
classification probabilities for newly-observed variables based on a small
number of time-series measurements. In this paper, we introduce a methodology
for variable-star classification, drawing from modern machine-learning
techniques. We describe how to homogenize the information gleaned from light
curves by selection and computation of real-numbered metrics ("features"),
detail methods to robustly estimate periodic light-curve features, introduce
tree-ensemble methods for accurate variable star classification, and show how
to rigorously evaluate the classification results using cross validation. On a
25-class data set of 1542 well-studied variable stars, we achieve a 22.8%
overall classification error using the random forest classifier; this
represents a 24% improvement over the best previous classifier on these data.
This methodology is effective for identifying samples of specific science
classes: for pulsational variables used in Milky Way tomography we obtain a
discovery efficiency of 98.2% and for eclipsing systems we find an efficiency
of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is
superior to other machine-learned methods in terms of accuracy, speed, and
relative immunity to features with no useful class information; the RF
classifier can also be used to estimate the importance of each feature in
classification. Additionally, we present the first astronomical use of
hierarchical classification methods to incorporate a known class taxonomy in
the classifier, which further reduces the catastrophic error rate to 7.8%.
Excluding low-amplitude sources, our overall error rate improves to 14%, with a
catastrophic error rate of 3.5%.
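
The "efficiency at 95% purity" figures quoted above can be read as follows: rank sources by the classifier's probability for a science class, choose a probability cut whose selected sample is at least 95% pure, and report the fraction of all true class members recovered by that cut. The following is a minimal sketch under that reading; the function name and the toy data are hypothetical, and the paper's exact evaluation procedure may differ.

```python
def efficiency_at_purity(scores, is_member, min_purity=0.95):
    """Best efficiency (recall) over all score cuts meeting a purity floor.

    scores    : classifier probabilities for the class of interest
    is_member : 1/0 (or True/False) flags for true membership
    Returns 0.0 if no cut reaches the requested purity.
    (Hypothetical helper, illustration only.)
    """
    total_members = sum(is_member)
    if total_members == 0:
        raise ValueError("no true members of the class in the sample")

    ranked = sorted(zip(scores, is_member), key=lambda pair: pair[0], reverse=True)

    best = 0.0
    selected = 0
    true_pos = 0
    i = 0
    while i < len(ranked):
        # Lower the cut to the next distinct score; tied scores enter together.
        cut = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == cut:
            selected += 1
            true_pos += ranked[i][1]
            i += 1
        purity = true_pos / selected
        if purity >= min_purity:
            best = max(best, true_pos / total_members)
    return best


if __name__ == "__main__":
    # Toy data: five sources, three of which truly belong to the class.
    scores = [0.9, 0.8, 0.7, 0.6, 0.2]
    truth = [1, 1, 0, 1, 0]
    print(efficiency_at_purity(scores, truth, min_purity=0.95))
    print(efficiency_at_purity(scores, truth, min_purity=0.75))
```

In the toy data, the stricter purity floor excludes the third true member because admitting it would also admit a contaminant, which is the trade-off the purity and efficiency figures summarize.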