Normalized to: Brink, H.
[1]
oai:arXiv.org:1209.3775 [pdf] - 1151446
Using Machine Learning for Discovery in Synoptic Survey Imaging
Submitted: 2012-09-17
Modern time-domain surveys continuously monitor large swaths of the sky to
look for astronomical variability. Astrophysical discovery in such data sets is
complicated by the fact that detections of real transient and variable sources
are highly outnumbered by bogus detections caused by imperfect subtractions,
atmospheric effects and detector artefacts. In this work we present a machine
learning (ML) framework for discovery of variability in time-domain imaging
surveys. Our ML methods provide probabilistic statements, in near real time,
about the degree to which each newly observed source is an astrophysically
relevant source of variable brightness. We provide details about each of the
analysis steps involved, including compilation of the training and testing
sets, construction of descriptive image-based and contextual features, and
optimization of the feature subset and model tuning parameters. Using a
validation set of nearly 30,000 objects from the Palomar Transient Factory, we
demonstrate a missed detection rate of at most 7.7% at our chosen
false-positive rate of 1% for an optimized ML classifier of 23 features,
selected to avoid feature correlation and over-fitting from an initial library
of 42 attributes. Importantly, we show that our classification methodology is
insensitive to mis-labelled training data up to a contamination of nearly 10%,
making it easier to compile sufficient training sets for accurate performance
in future surveys. This ML framework, if so adopted, should enable the
maximization of scientific gain from future synoptic surveys and enable fast
follow-up decisions on the vast amounts of streaming data produced by such
experiments.
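As a rough illustration of the operating point quoted above (the missed detection rate at a fixed false-positive rate), the following minimal Python sketch picks a score threshold from labelled validation scores. The function name, the toy scores, and the convention that a higher score means "more likely real" are assumptions made for illustration; they are not taken from the paper.

```python
def missed_detection_rate_at_fpr(scores, labels, target_fpr=0.01):
    """Pick a threshold whose false-positive rate does not exceed target_fpr.

    scores: classifier outputs, higher = more likely a real source (assumed).
    labels: 1 for real sources, 0 for bogus detections.
    Returns (threshold, achieved_fpr, missed_detection_rate).
    """
    bogus = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not bogus or not real:
        raise ValueError("need at least one real and one bogus example")

    # Number of bogus detections we can tolerate above the threshold.
    allowed = int(target_fpr * len(bogus))
    # A source is called "real" when score > threshold, so using the
    # (allowed + 1)-th highest bogus score keeps at most `allowed` bogus above it.
    threshold = bogus[allowed] if allowed < len(bogus) else float("-inf")

    fpr = sum(s > threshold for s in bogus) / len(bogus)
    mdr = sum(s <= threshold for s in real) / len(real)
    return threshold, fpr, mdr


# Toy validation set (made-up numbers): four bogus and four real detections.
scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.95, 0.4, 0.99]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(missed_detection_rate_at_fpr(scores, labels, target_fpr=0.25))
```

Tightening `target_fpr` raises the threshold and therefore the missed detection rate, which is the trade-off the quoted 1% / 7.7% operating point summarizes.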
[2]
oai:arXiv.org:1204.4180 [pdf] - 1118094
Construction of a Calibrated Probabilistic Classification Catalog:
Application to 50k Variable Sources in the All-Sky Automated Survey
Submitted: 2012-04-18, last modified: 2012-04-24
With growing data volumes from synoptic surveys, astronomers must become more
abstracted from the discovery and introspection processes. Given the scarcity
of follow-up resources, there is a particularly sharp onus on the frameworks
that replace these human roles to provide accurate and well-calibrated
probabilistic classification catalogs. Such catalogs inform the subsequent
follow-up, allowing consumers to optimize the selection of specific sources for
further study and permitting rigorous treatment of purities and efficiencies
for population studies. Here, we describe a process to produce a probabilistic
classification catalog of variability with machine learning from a multi-epoch
photometric survey. In addition to producing accurate classifications, we show
how to estimate calibrated class probabilities, and motivate the importance of
probability calibration. We also introduce a methodology for feature-based
anomaly detection, which allows discovery of objects in the survey that do not
fit within the predefined class taxonomy. Finally, we apply these methods to
sources observed by the All Sky Automated Survey (ASAS), and unveil the
Machine-learned ASAS Classification Catalog (MACC), which is a 28-class
probabilistic classification catalog of 50,124 ASAS sources. We estimate that
MACC achieves a sub-20% classification error rate, and demonstrate that the
class posterior probabilities are reasonably calibrated. MACC classifications
compare favorably to the classifications of several previous domain-specific
ASAS papers and to the ASAS Catalog of Variable Stars, which had classified
only 24% of those sources into one of 12 science classes. The MACC is publicly
available at http://www.bigmacc.info.
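One common way to check whether class probabilities are "reasonably calibrated" is a reliability table: bin the predicted probability of the assigned class and compare the mean prediction in each bin with the observed fraction of correct classifications. The sketch below is a generic diagnostic, not necessarily the procedure used for MACC; the function names and toy inputs are hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Bin predictions and compare mean confidence with observed accuracy.

    probs: predicted probability of the assigned class, each in [0, 1].
    outcomes: 1 if the assigned class was correct, 0 otherwise.
    Returns a list of (bin_index, count, mean_prob, accuracy) for non-empty bins.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of probs, hits
    for p, hit in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += hit
    return [(i, n, psum / n, hits / n)
            for i, (n, psum, hits) in enumerate(bins) if n > 0]


def expected_calibration_error(rows):
    """Count-weighted mean gap between confidence and accuracy."""
    total = sum(n for _, n, _, _ in rows)
    return sum(n * abs(conf - acc) for _, n, conf, acc in rows) / total


# Toy inputs (made up): two low-confidence misses, two high-confidence hits.
rows = reliability_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
ece = expected_calibration_error(rows)
print(rows, ece)
```

A well-calibrated catalog has mean probability close to accuracy in every bin, so the summary gap stays small.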
[3]
oai:arXiv.org:1106.2832 [pdf] - 1077288
Active Learning to Overcome Sample Selection Bias: Application to
Photometric Variable Star Classification
Submitted: 2011-06-14, last modified: 2011-06-17
Despite the great promise of machine-learning algorithms to classify and
predict astrophysical parameters for the vast numbers of astrophysical sources
and transients observed in large-scale surveys, the peculiarities of the
training data often manifest as strongly biased predictions on the data of
interest. Typically, training sets are derived from historical surveys of
brighter, more nearby objects than those from more extensive, deeper surveys
(testing data). This sample selection bias can cause catastrophic errors in
predictions on the testing data because a) standard assumptions for
machine-learned model selection procedures break down and b) dense regions of
testing space might be completely devoid of training data. We explore possible
remedies to sample selection bias, including importance weighting (IW),
co-training (CT), and active learning (AL). We argue that AL---where the data
whose inclusion in the training set would most improve predictions on the
testing set are queried for manual follow-up---is an effective approach and is
appropriate for many astronomical applications. For a variable star
classification problem on a well-studied set of stars from Hipparcos and OGLE,
AL is the optimal method in terms of error rate on the testing data, beating
the off-the-shelf classifier by 3.4% and the other proposed methods by at least
3.0%. To aid with manual labeling of variable stars, we developed a web
interface which allows for easy light curve visualization and querying of
external databases. Finally, we apply active learning to classify variable
stars in the ASAS survey, finding dramatic improvement in our agreement with
the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the
classifier's average confidence for the testing set, from 14.6% to 42.9%, after
a few AL iterations.
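The querying step of active learning can be sketched in a few lines. The example below uses margin-based uncertainty sampling, a simpler stand-in chosen for illustration: it queries the sources whose two most probable classes are hardest to separate. The paper's own selection criterion, which targets improvement on the testing set, is not specified in this abstract. The source identifiers and probabilities are made up.

```python
def select_queries(class_probs, n_queries=2):
    """Return the IDs of the sources with the smallest top-two class margin.

    class_probs: dict mapping a source ID to its list of class probabilities
                 (at least two classes per source).
    """
    def margin(p):
        top = sorted(p, reverse=True)
        return top[0] - top[1]  # small margin = ambiguous classification

    ranked = sorted(class_probs, key=lambda sid: margin(class_probs[sid]))
    return ranked[:n_queries]


# Hypothetical classifier output for three unlabelled sources.
predictions = {
    "a": [0.90, 0.05, 0.05],  # confident
    "b": [0.40, 0.35, 0.25],  # ambiguous
    "c": [0.50, 0.49, 0.01],  # most ambiguous between two classes
}
to_label = select_queries(predictions, n_queries=2)
print(to_label)
```

In an active-learning loop, the selected sources would be labelled manually, added to the training set, and the classifier retrained before the next round of queries.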
[4]
oai:arXiv.org:0802.4292 [pdf] - 10536
Strong Lensing in Abell 1703: Constraints on the Slope of the Inner Dark
Matter Distribution
Limousin, M.;
Richard, J.;
Kneib, J. -P.;
Brink, H.;
Pello, R.;
Jullo, E.;
Tu, H.;
Sommer-Larsen, J.;
Egami, E.;
Michalowski, M. J.;
Cabanac, R.;
Stark, D. P.
Submitted: 2008-02-28, last modified: 2008-07-17
In this article, we apply strong lensing techniques in Abell 1703, a massive
X-ray luminous galaxy cluster at z=0.28. Our analysis is based on imaging data
both from space and ground in 8 bands, complemented with a spectroscopic
survey. Abell 1703 looks rather circular from the general shape of its multiply
imaged systems and presents a dominant giant elliptical cD galaxy in its centre.
This cluster exhibits a remarkably bright 'central ring' formed by 4 bright
images at z_{spec}=0.888 located very close to the cD galaxy, providing
observational constraints that are potentially very interesting to probe the
central mass distribution. The stellar contribution from the cD galaxy (~1.25
\times 10^{12} M_{\odot} within 7") is accounted for in our parametric mass modelling,
and the underlying smooth dark matter component distribution is described using
a generalized NFW profile parametrized with a central logarithmic slope \alpha.
We find that within the range where observational constraints are present (from
~5" to ~50"), the slope of the dark matter distribution in Abell 1703 is equal
to 1.09^{+0.05}_{-0.11} (3\sigma confidence level). The concentration parameter
is equal to c_{200} ~ 3.5, and the scale radius is constrained to be larger
than the region where observational constraints are available. Within this
radius, the 2D mass is equal to M(50") = 2.4 \times 10^{14} M_{\odot}. We cannot draw any
conclusions on cosmological models at this point since we lack results from
realistic numerical simulations containing baryons to make a proper comparison.
We advocate the need for a sample of observed and simulated unimodal relaxed
galaxy clusters in order to make reliable comparisons, and potentially provide
a test of cosmological models.
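For reference, a commonly used form of the generalized NFW density profile with central logarithmic slope \alpha is given below, together with the usual definition of the concentration parameter. The symbols \rho_s (characteristic density), r_s (scale radius) and r_{200} are standard notation assumed here rather than quoted from the abstract; setting \alpha = 1 recovers the standard NFW profile.

```latex
\rho(r) = \frac{\rho_s}{(r/r_s)^{\alpha}\,(1 + r/r_s)^{3-\alpha}},
\qquad
c_{200} = \frac{r_{200}}{r_s}
```

In this form the density scales as r^{-\alpha} for r much smaller than r_s and as r^{-3} for r much larger than r_s, so the quoted slope of about 1.09 describes the inner behaviour of the dark matter component.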