Normalized to: Negahban, S.
[1]
oai:arXiv.org:1209.3775 [pdf] - 1151446
Using Machine Learning for Discovery in Synoptic Survey Imaging
Submitted: 2012-09-17
Modern time-domain surveys continuously monitor large swaths of the sky to
look for astronomical variability. Astrophysical discovery in such data sets is
complicated by the fact that detections of real transient and variable sources
are highly outnumbered by bogus detections caused by imperfect subtractions,
atmospheric effects and detector artefacts. In this work we present a machine
learning (ML) framework for discovery of variability in time-domain imaging
surveys. Our ML methods provide probabilistic statements, in near real time,
about the degree to which each newly observed source is astrophysically
relevant source of variable brightness. We provide details about each of the
analysis steps involved, including compilation of the training and testing
sets, construction of descriptive image-based and contextual features, and
optimization of the feature subset and model tuning parameters. Using a
validation set of nearly 30,000 objects from the Palomar Transient Factory, we
demonstrate a missed detection rate of at most 7.7% at our chosen
false-positive rate of 1% for an optimized ML classifier of 23 features,
selected to avoid feature correlation and over-fitting from an initial library
of 42 attributes. Importantly, we show that our classification methodology is
insensitive to mis-labelled training data up to a contamination of nearly 10%,
making it easier to compile sufficient training sets for accurate performance
in future surveys. This ML framework, if so adopted, should enable the
maximization of scientific gain from future synoptic survey and enable fast
follow-up decisions on the vast amounts of streaming data produced by such
experiments.