Full-text search for arXiv

Brink, H.

Normalized to: Brink, H.

4 article(s) in total. 23 co-authors, from 1 to 3 common article(s). Median position in authors list is 3.5.

[1] oai:arXiv.org:1209.3775 [pdf] - 1151446

Using Machine Learning for Discovery in Synoptic Survey Imaging

Brink, Henrik; Richards, Joseph W.; Poznanski, Dovi; Bloom, Joshua S.; Rice, John; Negahban, Sahand; Wainwright, Martin

Comments: 16 pages, 14 figures

Submitted: 2012-09-17

Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are greatly outnumbered by bogus detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work we present a machine learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30,000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7% at our chosen false-positive rate of 1% for an optimized ML classifier of 23 features, selected to avoid feature correlation and over-fitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mis-labelled training data up to a contamination of nearly 10%, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic surveys and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.
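The figure of merit quoted in this abstract, a missed detection rate at a fixed false-positive rate, can be illustrated with a minimal sketch. This is not the authors' code: the function name `missed_detection_rate`, the score convention (higher means "more likely real"), and the toy data are all assumptions made for illustration.

```python
def missed_detection_rate(scores, labels, max_fpr=0.01):
    """Pick a score threshold whose false-positive rate is at most max_fpr.

    scores: classifier outputs, higher means "more likely real" (assumed).
    labels: 1 for real sources, 0 for bogus detections.
    Returns (threshold, missed_detection_rate, false_positive_rate).
    """
    bogus = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not bogus or not real:
        raise ValueError("need at least one real and one bogus example")

    # Number of bogus detections we may accept without exceeding max_fpr.
    allowed = int(max_fpr * len(bogus))
    # A source is accepted only if score > threshold. Using the
    # (allowed + 1)-th highest bogus score as the threshold lets at most
    # `allowed` bogus detections through.
    threshold = bogus[allowed] if allowed < len(bogus) else float("-inf")

    fpr = sum(s > threshold for s in bogus) / len(bogus)
    mdr = sum(s <= threshold for s in real) / len(real)
    return threshold, mdr, fpr


# Toy data (hypothetical): four bogus detections followed by four real ones.
scores = [0.90, 0.40, 0.20, 0.10, 0.95, 0.80, 0.30, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, mdr, fpr = missed_detection_rate(scores, labels, max_fpr=0.25)
print(threshold, mdr, fpr)
```

Fixing the false-positive rate first and then reporting the missed detection rate mirrors how the abstract states its result (7.7% missed at 1% false positives). The toy example uses a much looser 25% limit only because it has four bogus examples.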

[2] oai:arXiv.org:1204.4180 [pdf] - 1118094

Construction of a Calibrated Probabilistic Classification Catalog: Application to 50k Variable Sources in the All-Sky Automated Survey

Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien

Comments: 56 pages, 15 figures, 8 tables, submitted. The Machine-learned ASAS Classification Catalog is available at http://www.bigmacc.info

Submitted: 2012-04-18, last modified: 2012-04-24

With growing data volumes from synoptic surveys, astronomers must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities, and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All Sky Automated Survey (ASAS), and unveil the Machine-learned ASAS Classification Catalog (MACC), which is a 28-class probabilistic classification catalog of 50,124 ASAS sources. We estimate that MACC achieves a sub-20% classification error rate, and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes. The MACC is publicly available at http://www.bigmacc.info.
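The abstract stresses that class probabilities should be calibrated, but does not say how calibration was assessed. One common generic diagnostic is a reliability table: predictions are binned by stated probability and each bin's mean stated probability is compared with the fraction of predictions that turned out correct. The sketch below assumes that diagnostic; the function names and toy inputs are hypothetical, not taken from the MACC pipeline.

```python
def reliability_table(probs, correct, n_bins=5):
    """Bin predictions by stated probability and compare with outcomes.

    probs:   predicted probability of the assigned class, in [0, 1].
    correct: 1 if the assigned class was right, else 0.
    Returns rows of (bin_low, bin_high, count, mean_prob, frac_correct)
    for every non-empty bin.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of probs, hits
    for p, c in zip(probs, correct):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[i][0] += 1
        bins[i][1] += p
        bins[i][2] += c
    rows = []
    for i, (n, sum_p, hits) in enumerate(bins):
        if n:
            rows.append((i / n_bins, (i + 1) / n_bins, n, sum_p / n, hits / n))
    return rows


def expected_calibration_error(rows):
    """Count-weighted mean gap between stated and observed accuracy."""
    total = sum(r[2] for r in rows)
    return sum(r[2] / total * abs(r[3] - r[4]) for r in rows)


# Toy example (hypothetical numbers).
rows = reliability_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
for row in rows:
    print(row)
print(expected_calibration_error(rows))
```

A well-calibrated classifier has `mean_prob` close to `frac_correct` in every bin, so the summary error is near zero. That property is what lets a catalog user turn a probability cut into an expected purity.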

[3] oai:arXiv.org:1106.2832 [pdf] - 1077288

Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

Richards, Joseph W.; Starr, Dan L.; Brink, Henrik; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; James, J. Berian; Long, James P.; Rice, John

Comments: 43 pages, 11 figures, submitted to ApJ

Submitted: 2011-06-14, last modified: 2011-06-17

Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL, in which the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up, is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
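The query step of active learning can be sketched with a deliberately simple stand-in. The abstract describes selecting the objects whose labels would most improve predictions on the testing set. The code below instead uses plain uncertainty sampling (query the objects whose top class probability is lowest), which is a different and simpler criterion, named here so the two are not confused. The helper names are hypothetical, and "confidence" is assumed to mean the largest class probability per object.

```python
def average_confidence(class_probs):
    """Mean of the largest class probability over all objects."""
    return sum(max(p) for p in class_probs) / len(class_probs)


def select_queries(class_probs, n_queries):
    """Return indices of the n_queries least confident objects.

    class_probs: one list of class probabilities per unlabeled object.
    Ties are broken by original order because sorted() is stable.
    """
    confidence = [max(p) for p in class_probs]
    order = sorted(range(len(class_probs)), key=lambda i: confidence[i])
    return order[:n_queries]


# Toy two-class probabilities for four unlabeled objects (hypothetical).
probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.50, 0.50]]
print(select_queries(probs, 2))   # indices to send for manual labeling
print(average_confidence(probs))  # tracked across AL iterations
```

In an actual loop, the queried objects would be labeled by hand, added to the training set, and the classifier refit before the next round. The abstract's rise in average confidence from 14.6% to 42.9% is the kind of quantity the second helper tracks.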

[4] oai:arXiv.org:0802.4292 [pdf] - 10536

Strong Lensing in Abell 1703: Constraints on the Slope of the Inner Dark Matter Distribution

Limousin, M.; Richard, J.; Kneib, J. -P.; Brink, H.; Pello, R.; Jullo, E.; Tu, H.; Sommer-Larsen, J.; Egami, E.; Michalowski, M. J.; Cabanac, R.; Stark, D. P.

Comments: Accepted for publication, conclusion unchanged but strengthened by a series of tests suggested by a constructive referee report. Higher resolution available at http://www.dark-cosmology.dk/~marceau/1703.php.html

Submitted: 2008-02-28, last modified: 2008-07-17

In this article, we apply strong lensing techniques in Abell 1703, a massive X-ray luminous galaxy cluster at z=0.28. Our analysis is based on imaging data both from space and ground in 8 bands, complemented with a spectroscopic survey. Abell 1703 looks rather circular from the general shape of its multiply imaged systems and presents a dominant giant elliptical cD galaxy in its centre. This cluster exhibits a remarkably bright 'central ring' formed by 4 bright images at z_{spec}=0.888 located very close to the cD galaxy, providing observational constraints that are potentially very useful for probing the central mass distribution. The stellar contribution from the cD galaxy (~1.25 \times 10^{12} M_{\sun} within 7") is accounted for in our parametric mass modelling, and the underlying smooth dark matter component distribution is described using a generalized NFW profile parametrized with a central logarithmic slope \alpha. We find that within the range where observational constraints are present (from ~5" to ~50"), the slope of the dark matter distribution in Abell 1703 is equal to 1.09^{+0.05}_{-0.11} (3\sigma confidence level). The concentration parameter is equal to c_{200} ~ 3.5, and the scale radius is constrained to be larger than the region where observational constraints are available. Within this radius, the 2D mass is equal to M(50")=2.4 \times 10^{14} M_{\sun}. We cannot draw any conclusions on cosmological models at this point since we lack results from realistic numerical simulations containing baryons to make a proper comparison. We advocate the need for a sample of observed and simulated unimodal relaxed galaxy clusters in order to make reliable comparisons, and potentially provide a test of cosmological models.
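For readers unfamiliar with the generalized NFW profile mentioned in this abstract, its usual form is rho(r) = rho_s / [(r/r_s)^alpha (1 + r/r_s)^(3 - alpha)], which reduces to the standard NFW profile for alpha = 1. The sketch below evaluates that form and its logarithmic slope. The symbols `rho_s` and `r_s` and the numerical inputs are illustrative assumptions in arbitrary units, not the fitted values of the paper; only alpha = 1.09 is taken from the abstract.

```python
def gnfw_density(r, rho_s, r_s, alpha):
    """Generalized NFW density: rho_s / (x**alpha * (1 + x)**(3 - alpha))."""
    x = r / r_s
    return rho_s / (x ** alpha * (1.0 + x) ** (3.0 - alpha))


def gnfw_log_slope(r, r_s, alpha):
    """Logarithmic slope d ln(rho) / d ln(r) of the profile above.

    Tends to -alpha for r << r_s and to -3 for r >> r_s.
    """
    x = r / r_s
    return -alpha - (3.0 - alpha) * x / (1.0 + x)


# Illustrative evaluation in arbitrary units (r_s = 1, rho_s = 1).
for r in (1e-3, 1.0, 1e3):
    print(r, gnfw_density(r, 1.0, 1.0, 1.09), gnfw_log_slope(r, 1.0, 1.09))
```

The inner slope approaches -alpha only well inside the scale radius. Because the abstract reports a scale radius larger than the region with lensing constraints, the fitted alpha describes the profile over essentially the whole constrained range.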