Normalized to: Turmon, M.
[1]
oai:arXiv.org:1601.04385 [pdf] - 1342128
Real-Time Data Mining of Massive Data Streams from Synoptic Sky Surveys
Submitted: 2016-01-17
The nature of scientific and technological data collection is evolving
rapidly: data volumes and rates grow exponentially, with increasing complexity
and information content, and there has been a transition from static data sets
to data streams that must be analyzed in real time. Interesting or anomalous
phenomena must be quickly characterized and followed up with additional
measurements via optimal deployment of limited assets. Modern astronomy
presents a variety of such phenomena in the form of transient events in digital
synoptic sky surveys, including cosmic explosions (supernovae, gamma ray
bursts), relativistic phenomena (black hole formation, jets), potentially
hazardous asteroids, etc. We have been developing a set of machine learning
tools to detect, classify and plan a response to transient events for astronomy
applications, using the Catalina Real-time Transient Survey (CRTS) as a
scientific and methodological testbed. The ability to respond rapidly to the
potentially most interesting events is a key bottleneck that limits the
scientific returns from the current and anticipated synoptic sky surveys.
Similar challenges arise in other contexts, from environmental monitoring using
sensor networks to autonomous spacecraft systems. Given the exponential growth
of data rates, and the time-critical response, we need a fully automated and
robust approach. We describe the results obtained to date, and the possible
future developments.
[2]
oai:arXiv.org:1407.3502 [pdf] - 1515683
Automated Real-Time Classification and Decision Making in Massive Data
Streams from Synoptic Sky Surveys
Submitted: 2014-07-13
The nature of scientific and technological data collection is evolving
rapidly: data volumes and rates grow exponentially, with increasing complexity
and information content, and there has been a transition from static data sets
to data streams that must be analyzed in real time. Interesting or anomalous
phenomena must be quickly characterized and followed up with additional
measurements via optimal deployment of limited assets. Modern astronomy
presents a variety of such phenomena in the form of transient events in digital
synoptic sky surveys, including cosmic explosions (supernovae, gamma ray
bursts), relativistic phenomena (black hole formation, jets), potentially
hazardous asteroids, etc. We have been developing a set of machine learning
tools to detect, classify and plan a response to transient events for astronomy
applications, using the Catalina Real-time Transient Survey (CRTS) as a
scientific and methodological testbed. The ability to respond rapidly to the
potentially most interesting events is a key bottleneck that limits the
scientific returns from the current and anticipated synoptic sky surveys.
Similar challenges arise in other contexts, from environmental monitoring using
sensor networks to autonomous spacecraft systems. Given the exponential growth
of data rates, and the time-critical response, we need a fully automated and
robust approach. We describe the results obtained to date, and the possible
future developments.
[3]
oai:arXiv.org:1404.1879 [pdf] - 806920
The Helioseismic and Magnetic Imager (HMI) Vector Magnetic Field
Pipeline: SHARPs -- Space-weather HMI Active Region Patches
Submitted: 2014-04-07
A new data product from the Helioseismic and Magnetic Imager (HMI) onboard
the Solar Dynamics Observatory (SDO) called Space-weather HMI Active Region
Patches (SHARPs) is now available. SDO/HMI is the first space-based instrument
to map the full-disk photospheric vector magnetic field with high cadence and
continuity. The SHARP data series provide maps in patches that encompass
automatically tracked magnetic concentrations for their entire lifetime; map
quantities include the photospheric vector magnetic field and its uncertainty,
along with Doppler velocity, continuum intensity, and line-of-sight magnetic
field. Furthermore, keywords in the SHARP data series provide several
parameters that concisely characterize the magnetic-field distribution and its
deviation from a potential-field configuration. These indices may be useful for
active-region event forecasting and for identifying regions of interest. The
indices are calculated per patch and are available on a twelve-minute cadence.
Quick-look data are available within approximately three hours of observation;
definitive science products are produced approximately five weeks later. SHARP
data are available at http://jsoc.stanford.edu and maps are available in either
of two different coordinate systems. This article describes the SHARP data
products and presents examples of SHARP data and parameters.
[4]
oai:arXiv.org:1404.1881 [pdf] - 806922
The Helioseismic and Magnetic Imager (HMI) Vector Magnetic Field
Pipeline: Overview and Performance
Hoeksema, J. Todd;
Liu, Yang;
Hayashi, Keiji;
Sun, Xudong;
Schou, Jesper;
Couvidat, Sebastien;
Norton, Aimee;
Bobra, Monica;
Centeno, Rebecca;
Leka, K. D.;
Barnes, Graham;
Turmon, Michael J.
Submitted: 2014-04-07
The Helioseismic and Magnetic Imager (HMI) began near-continuous full-disk
solar measurements on 1 May 2010 from the Solar Dynamics Observatory (SDO). An
automated processing pipeline keeps pace with observations to produce
observable quantities, including the photospheric vector magnetic field, from
sequences of filtergrams. The primary 720s observables were released in mid
2010, including Stokes polarization parameters measured at six wavelengths as
well as intensity, Doppler velocity, and the line-of-sight magnetic field. More
advanced products, including the full vector magnetic field, are now available.
Automatically identified HMI Active Region Patches (HARPs) track the location
and shape of magnetic regions throughout their lifetime.
The vector field is computed using the Very Fast Inversion of the Stokes
Vector (VFISV) code optimized for the HMI pipeline; the remaining 180 degree
azimuth ambiguity is resolved with the Minimum Energy (ME0) code. The
Milne-Eddington inversion is performed on all full-disk HMI observations. The
disambiguation, until recently run only on HARP regions, is now implemented for
the full disk. Vector and scalar quantities in the patches are used to derive
active region indices potentially useful for forecasting; the data maps and
indices are collected in the SHARP data series, hmi.sharp_720s. Patches are
provided in both CCD and heliographic coordinates.
HMI provides continuous coverage of the vector field, but has modest spatial,
spectral, and temporal resolution. Coupled with limitations of the analysis and
interpretation techniques, effects of the orbital velocity, and instrument
performance, the resulting measurements have a certain dynamic range and
sensitivity and are subject to systematic errors and uncertainties that are
characterized in this report.
[5]
oai:arXiv.org:1310.1976 [pdf] - 1516225
Feature Selection Strategies for Classifying High Dimensional
Astronomical Data Sets
Donalek, Ciro;
A., Arun Kumar;
Djorgovski, S. G.;
Mahabal, Ashish A.;
Graham, Matthew J.;
Fuchs, Thomas J.;
Turmon, Michael J.;
Philip, N. Sajeeth;
Yang, Michael Ting-Chang;
Longo, Giuseppe
Submitted: 2013-10-07
The amount of data collected in many scientific fields is increasing, and all
of them share a common task: extracting knowledge from massive, multi-parametric
data sets as rapidly and efficiently as possible. This is especially true in
astronomy, where synoptic sky surveys are enabling new research frontiers in
time domain astronomy and posing several new object classification challenges
in multi-dimensional spaces; given the high number of parameters available for
each object, feature selection is quickly becoming a crucial task in analyzing
astronomical data sets. Using data sets extracted from the ongoing Catalina
Real-Time Transient Survey (CRTS) and the Kepler Mission we illustrate a
variety of feature selection strategies used to identify the subsets that give
the most information and the results achieved applying these techniques to
three major astronomical problems.
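
As a rough illustration of what a filter-style feature selection strategy can look like, the sketch below ranks features by a two-class Fisher score (squared gap between class means divided by the summed class variances) and keeps the top k. The abstract does not say which strategies the authors used, so the scoring rule, the function names (fisher_scores, top_k_features) and the toy data are all assumptions made for the example.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column for a two-class problem."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, c in zip(rows, labels) if c == classes[0]]
        b = [r[j] for r, c in zip(rows, labels) if c == classes[1]]
        gap = (mean(a) - mean(b)) ** 2          # separation of class means
        spread = pvariance(a) + pvariance(b)    # within-class scatter
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero scatter: perfectly separating if the means differ at all.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 3.0), (2.1, 4.0), (2.2, 2.0), (2.3, 3.5)]
labels = ["A", "A", "A", "B", "B", "B"]
selected = top_k_features(rows, labels, k=1)
print(selected)
```

A filter like this scores each feature independently, so it is cheap but blind to redundant or jointly informative features; wrapper or embedded methods trade more computation for that awareness.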
[6]
oai:arXiv.org:1209.1681 [pdf] - 1515667
Flashes in a Star Stream: Automated Classification of Astronomical
Transient Events
Submitted: 2012-09-07
An automated, rapid classification of transient events detected in the modern
synoptic sky surveys is essential for their scientific utility and effective
follow-up using scarce resources. This presents some unusual challenges: the
data are sparse, heterogeneous and incomplete; evolving in time; and most of
the relevant information comes not from the data stream itself, but from a
variety of archival data and contextual information (spatial, temporal, and
multi-wavelength). We are exploring a variety of novel techniques, mostly
Bayesian, to respond to these challenges, using the ongoing CRTS sky survey as
a testbed. The current surveys are already overwhelming our ability to
effectively follow all of the potentially interesting events, and these
challenges will grow by orders of magnitude over the next decade as the more
ambitious sky surveys get under way. While we focus on an application in a
specific domain (astrophysics), these challenges are more broadly relevant for
event or anomaly detection and knowledge discovery in massive data streams.
[7]
oai:arXiv.org:1111.3699 [pdf] - 1091687
Real Time Classification of Transient Events in Synoptic Sky Surveys
Submitted: 2011-11-15
An automated, rapid classification of transient events detected in the modern
synoptic sky surveys is essential for their scientific utility and effective
follow-up using scarce resources. This problem will grow by orders of magnitude
with the next generation of surveys. We are exploring a variety of novel
automated classification techniques, mostly Bayesian, to respond to these
challenges, using the ongoing CRTS sky survey as a testbed. We describe briefly
some of the methods used.
[8]
oai:arXiv.org:1111.0313 [pdf] - 433619
Discovery, classification, and scientific exploration of transient
events from the Catalina Real-time Transient Survey
Mahabal, A. A.;
Djorgovski, S. G.;
Drake, A. J.;
Donalek, C.;
Graham, M. J.;
Williams, R. D.;
Chen, Y.;
Moghaddam, B.;
Turmon, M.;
Beshore, E.;
Larson, S.
Submitted: 2011-11-01
Exploration of the time domain - variable and transient objects and phenomena
- is rapidly becoming a vibrant research frontier, touching on essentially
every field of astronomy and astrophysics, from the Solar system to cosmology.
Time domain astronomy is being enabled by the advent of the new generation of
synoptic sky surveys that repeatedly cover large areas of the sky and
generate massive data streams. Their scientific exploration poses many
challenges, driven mainly by the need for a real-time discovery,
classification, and follow-up of the interesting events. Here we describe the
Catalina Real-Time Transient Survey (CRTS), which discovers and publishes
transient events at optical wavelengths in real time, thus benefiting the
entire community. We describe some of the scientific results to date, and then
focus on the challenges of the automated classification and prioritization of
transient events. CRTS represents a scientific and a technological testbed and
precursor for the larger surveys in the future, including the Large Synoptic
Survey Telescope (LSST) and the Square Kilometer Array (SKA).
[9]
oai:arXiv.org:1110.4655 [pdf] - 428693
Towards an Automated Classification of Transient Events in Synoptic Sky
Surveys
Submitted: 2011-10-20
We describe the development of a system for an automated, iterative,
real-time classification of transient events discovered in synoptic sky
surveys. The system under development incorporates a number of Machine Learning
techniques, mostly using Bayesian approaches, due to the sparse nature,
heterogeneity, and variable incompleteness of the available data. The
classifications are improved iteratively as the new measurements are obtained.
One novel feature is the development of an automated follow-up recommendation
engine that suggests the measurements that would be the most advantageous in
terms of resolving classification ambiguities and/or characterization of the
astrophysically most interesting objects, given a set of available follow-up
assets and their cost functions. This illustrates the symbiotic relationship of
astronomy and applied computer science through the emerging discipline of
AstroInformatics.
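
The two ideas in this abstract, iterative reclassification and follow-up recommendation, can be sketched in a few lines. The example below is a minimal illustration, not the system described: it assumes discrete measurement outcomes, uses expected entropy reduction per unit cost as the selection criterion, and all class names, asset names, probabilities and costs are hypothetical.

```python
from math import log2


def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times P(observation | class)."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def entropy(p):
    """Shannon entropy in bits of a class probability table."""
    return -sum(v * log2(v) for v in p.values() if v > 0)


def expected_gain(prior, outcome_table):
    """Expected entropy reduction from one measurement.

    outcome_table[outcome][cls] = P(outcome | cls)
    """
    gain = entropy(prior)
    for lik in outcome_table.values():
        p_outcome = sum(prior[c] * lik[c] for c in prior)
        if p_outcome > 0:
            gain -= p_outcome * entropy(bayes_update(prior, lik))
    return gain


def recommend(prior, assets):
    """assets[name] = (cost, outcome_table); pick the best gain per unit cost."""
    return max(assets, key=lambda n: expected_gain(prior, assets[n][1]) / assets[n][0])


# Hypothetical two-class problem and two follow-up assets.
prior = {"SN": 0.5, "CV": 0.5}
assets = {
    "colour": (1.0, {"red": {"SN": 0.9, "CV": 0.1},
                     "blue": {"SN": 0.1, "CV": 0.9}}),
    "repeat_photometry": (1.0, {"brighter": {"SN": 0.5, "CV": 0.5},
                                "fainter": {"SN": 0.5, "CV": 0.5}}),
}
choice = recommend(prior, assets)                       # most informative asset
posterior = bayes_update(prior, assets[choice][1]["red"])  # after observing "red"
print(choice, posterior)
```

Feeding the posterior back in as the next prior gives the iterative behaviour: each new measurement sharpens the classification and changes which follow-up is worth requesting next.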
[10]
oai:arXiv.org:0810.4527 [pdf] - 17807
Towards Real-time Classification of Astronomical Transients
Mahabal, A.;
Djorgovski, S. G.;
Williams, R.;
Drake, A.;
Donalek, C.;
Graham, M.;
Moghaddam, B.;
Turmon, M.;
Jewell, J.;
Khosla, A.;
Hensley, B.
Submitted: 2008-10-24
Exploration of the time domain is now a vibrant area of research in astronomy,
driven by the advent of digital synoptic sky surveys. While panoramic surveys
can detect variable or transient events, typically some follow-up observations
are needed; for short-lived phenomena, a rapid response is essential. The ability
to automatically classify and prioritize transient events for follow-up studies
becomes critical as the data rates increase. We have been developing such
methods using the data streams from the Palomar-Quest survey, the Catalina Sky
Survey and others, using the VOEventNet framework. The goal is to automatically
classify transient events, using the new measurements, combined with archival
data (previous and multi-wavelength measurements), and contextual information
(e.g., Galactic or ecliptic latitude, presence of a possible host galaxy
nearby, etc.); and to iterate them dynamically as the follow-up data come in
(e.g., light curves or colors). We have been investigating Bayesian
methodologies for classification, as well as discriminated follow-up to
optimize the use of available resources, including a Naive Bayesian approach and
non-parametric Gaussian process regression. We will also be deploying
variants of the traditional machine learning techniques such as Neural Nets and
Support Vector Machines on datasets of reliably classified transients as they
build up.
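
To make the Naive Bayesian idea concrete, here is a minimal sketch of a classifier that combines whatever measurements and contextual features are available and ignores those not yet measured. It assumes independent Gaussian likelihoods per feature; the feature names, class names and numerical parameters are hypothetical and are not taken from the paper.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def naive_bayes_posterior(event, priors, models):
    """Class posterior for one event.

    event[feature]        = measured value, or None if not yet available
    models[cls][feature]  = (mean, std) of an assumed Gaussian likelihood
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in event.items():
            if value is None:
                continue  # a missing feature contributes no evidence
            mu, sigma = models[cls][feature]
            score *= gaussian_pdf(value, mu, sigma)
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical priors and per-class feature models.
priors = {"SN": 0.5, "CV": 0.5}
models = {
    "SN": {"amplitude_mag": (2.0, 1.0), "abs_gal_latitude_deg": (50.0, 20.0)},
    "CV": {"amplitude_mag": (4.0, 1.0), "abs_gal_latitude_deg": (10.0, 20.0)},
}

# Only the amplitude is known so far; latitude is treated as missing.
event = {"amplitude_mag": 4.0, "abs_gal_latitude_deg": None}
posterior = naive_bayes_posterior(event, priors, models)
print(posterior)
```

Because the features are treated as conditionally independent, dropping a missing one is equivalent to marginalizing over it, which is one reason this family of methods suits sparse and incomplete data.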
[11]
oai:arXiv.org:0802.3199 [pdf] - 1937497
Automated Probabilistic Classification of Transients and Variables
Submitted: 2008-02-21
There is an increasing number of large, digital, synoptic sky surveys, in
which repeated observations are obtained over large areas of the sky in
multiple epochs. Likewise, there is a growth in the number of (often automated
or robotic) follow-up facilities with varied capabilities in terms of
instruments, depth, cadence, wavelengths, etc., most of which are geared toward
some specific astrophysical phenomenon. As the number of detected transient
events grows, an automated, probabilistic classification of the detected
variables and transients becomes increasingly important, so that an optimal use
can be made of follow-up facilities, without unnecessary duplication of effort.
We describe a methodology now under development for a prototype event
classification system; it involves Bayesian and Machine Learning classifiers,
automated incorporation of feedback from follow-up observations, and
discriminated or directed follow-up requests. This type of methodology may be
essential for the massive synoptic sky surveys in the future.