Normalized to: Protopapas, P.
[1]
oai:arXiv.org:2002.00994 [pdf] - 2046524
Scalable End-to-end Recurrent Neural Network for Variable star
classification
Submitted: 2020-02-03
During the last decade, considerable effort has been made to perform
automatic classification of variable stars using machine learning techniques.
Traditionally, light curves are represented as a vector of descriptors or
features used as input for many algorithms. Some features are computationally
expensive and cannot be updated quickly, and hence cannot be applied to large
datasets such as the LSST. Previous work has been done to develop alternative
unsupervised feature extraction algorithms for light curves, but the cost of
doing so still remains high. In this work, we propose an end-to-end algorithm
that automatically learns the representation of light curves that allows an
accurate automatic classification. We study a series of deep learning
architectures based on Recurrent Neural Networks and test them in automated
classification scenarios. Our method uses minimal data preprocessing, can be
updated with a low computational cost for new observations and light curves,
and can scale up to massive datasets. We transform each light curve into an
input matrix representation whose elements are the differences in time and
magnitude, and the outputs are classification probabilities. We test our method
in three surveys: OGLE-III, Gaia and WISE. We obtain accuracies of about $95\%$
in the main classes and $75\%$ in the majority of subclasses. We compare our
results with the Random Forest classifier and obtain competitive accuracies
while being faster and scalable. The analysis shows that the computational
complexity of our approach grows linearly with the light curve size, while
the cost of the traditional approach grows as $N\log{(N)}$.
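To illustrate the input representation described in this abstract, the following minimal sketch converts a light curve into rows of consecutive differences in time and magnitude. The function name `light_curve_to_deltas` and the plain-list input are assumptions made for this example; the abstract does not specify the paper's actual preprocessing code or matrix layout.

```python
def light_curve_to_deltas(times, mags):
    """Return one (delta_time, delta_magnitude) row per consecutive pair.

    Hypothetical helper, not the authors' code. Observations are sorted by
    time first, so the input may be given in any order.
    """
    if len(times) != len(mags):
        raise ValueError("times and mags must have the same length")
    order = sorted(range(len(times)), key=lambda i: times[i])
    t = [times[i] for i in order]
    m = [mags[i] for i in order]
    # One pass over the sorted observations builds the difference rows.
    return [(t[i + 1] - t[i], m[i + 1] - m[i]) for i in range(len(t) - 1)]


rows = light_curve_to_deltas([0.0, 1.0, 3.0], [10.0, 10.5, 10.0])
print(rows)
```

When a new observation arrives after the existing ones, only one new row has to be appended, which is in the spirit of the low-cost updates the abstract describes.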
[2]
oai:arXiv.org:1912.02235 [pdf] - 2026504
Streaming Classification of Variable Stars
Submitted: 2019-12-04
In recent years, automatic classification of variable stars has received
substantial attention. Using machine learning techniques for this task has
proven to be quite useful. Typically, machine learning classifiers used for
this task require a fixed training set, and the training process is
performed offline. Upcoming surveys such as the Large Synoptic Survey Telescope
(LSST) will generate new observations daily, where an automatic classification
system able to create alerts online will be mandatory. A system with those
characteristics must be able to update itself incrementally. Unfortunately,
after training, most machine learning classifiers do not support the inclusion
of new observations in light curves; they need to be retrained from scratch.
Naively re-training from scratch is not an option in streaming settings, mainly
because of the expensive pre-processing routines required to obtain a vector
representation of light curves (features) each time we include new
observations. In this work, we propose a streaming probabilistic classification
model; it uses a set of newly designed features that work incrementally. With
this model, we can have a machine learning classifier that updates itself in
real time with new observations. To test our approach, we simulate a streaming
scenario with light curves from the CoRoT, OGLE and MACHO catalogs. Results show
that our model achieves high classification performance, staying an order of
magnitude faster than traditional classification approaches.
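As a generic illustration of a feature that "works incrementally", the sketch below keeps a running mean and variance of the magnitudes using Welford's online algorithm, so each new observation updates the feature in constant time without revisiting earlier data. This is a standard technique chosen for the example; it is an assumption, not necessarily one of the features designed in the paper, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Incrementally updated mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two observations are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for magnitude in [1.0, 2.0, 3.0, 4.0]:
    stats.update(magnitude)
print(stats.mean, stats.variance)
```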
[3]
oai:arXiv.org:1911.02444 [pdf] - 2026263
An Information Theory Approach on Deciding Spectroscopic Follow Ups
Submitted: 2019-11-06
Classification and characterization of variable phenomena and transient
phenomena are critical for astrophysics and cosmology. These objects are
commonly studied using photometric time series or spectroscopic data. Given
that many ongoing and future surveys are in the time domain, and given that adding
spectra provides further insights but requires more observational resources, it
would be valuable to know which objects we should prioritize for obtaining a spectrum
in addition to time series. We propose a methodology in a probabilistic setting
that determines a priori which objects are worth taking a spectrum of to obtain
better insights, where we define 'insight' as the type of the object
(classification). Objects for which we query a spectrum are reclassified
using their full spectrum information. We first train two classifiers, one that
uses photometric data and another that uses photometric and spectroscopic data
together. Then for each photometric object we estimate the probability of each
possible spectrum outcome. We combine these models in various probabilistic
frameworks (strategies) which are used to guide the selection of follow up
observations. The best strategy depends on the intended use, whether it is
getting more confidence or accuracy. For a given number of candidate objects
(127, equal to 5% of the dataset) for taking spectra, we improve class
prediction accuracy by 37%, as opposed to 20% for the best non-naive (non-random)
baseline strategy. Our approach provides a general framework for follow-up
strategies and can be extended beyond classification and to include other forms
of follow-ups beyond spectroscopy.
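One way to picture such a selection strategy is as an expected information gain: for each object, compare the entropy of the photometric class probabilities with the entropy expected after a spectrum is taken, averaged over the possible spectrum outcomes. The sketch below is only an illustration under that assumption; the abstract does not state which strategies the paper actually uses, and the function names and the discretized "spectrum outcomes" are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, outcome_probs, posteriors):
    """Expected drop in class entropy from observing a spectrum.

    prior         -- class probabilities from the photometric classifier
    outcome_probs -- estimated probability of each possible spectrum outcome
    posteriors    -- class probabilities given photometry plus each outcome
    """
    expected_posterior_entropy = sum(
        p_s * entropy(post) for p_s, post in zip(outcome_probs, posteriors)
    )
    return entropy(prior) - expected_posterior_entropy


# Toy object: photometry alone is ambiguous, either spectrum outcome settles it.
gain = expected_information_gain(
    prior=[0.5, 0.5],
    outcome_probs=[0.5, 0.5],
    posteriors=[[1.0, 0.0], [0.0, 1.0]],
)
print(gain)
```

Objects could then be ranked by this score and the top candidates sent for spectroscopic follow-up; a strategy aimed at accuracy rather than confidence would use a different score.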
[4]
oai:arXiv.org:1903.03254 [pdf] - 1846983
An Algorithm for the Visualization of Relevant Patterns in Astronomical
Light Curves
Submitted: 2019-03-07
In recent years, the classification of variable stars with Machine
Learning has become a mainstream area of research. Recently, visualization of
time series has been attracting more attention in data science as a tool to
help scientists visually recognize significant patterns in complex dynamics. Within
the Machine Learning literature, dictionary-based methods have been widely used
to encode relevant parts of image data. These methods intrinsically assign a
degree of importance to patches in pictures, according to their contribution in
the image reconstruction. Inspired by dictionary-based techniques, we present
an approach that naturally provides the visualization of salient parts in
astronomical light curves, making the analogy between image patches and
relevant pieces in time series. Our approach encodes the most meaningful
patterns such that we can approximately reconstruct light curves by just using
the encoded information. We test our method on light curves from the OGLE-III
and StarLight databases. Our results show that the proposed model delivers an
automatic and intuitive visualization of relevant light curve parts, such as
local peaks and drops in magnitude.
[5]
oai:arXiv.org:1807.03869 [pdf] - 1957097
Deep Learning for Image Sequence Classification of Astronomical Events
Submitted: 2018-07-10, last modified: 2018-11-07
We propose a new sequential classification model for astronomical objects
based on a recurrent convolutional neural network (RCNN) which uses sequences
of images as inputs. This approach avoids the computation of light curves or
difference images. This is the first time that sequences of images are used
directly for the classification of variable objects in astronomy. The second
contribution of this work is the image simulation process. We generate
synthetic image sequences that take into account the instrumental and observing
conditions, obtaining a realistic set of movies for each astronomical object.
The simulated dataset is used to train our RCNN classifier. This approach
allows us to generate datasets to train and test our RCNN model for different
astronomical surveys and telescopes. We aim to build a simulated dataset
whose distribution is close enough to that of the real dataset that fine tuning
can match the distributions of the real and simulated datasets. To test the
RCNN classifier trained with the synthetic dataset, we used real-world data
from the High cadence Transient Survey (HiTS) obtaining an average recall of
85%, improved to 94% after performing fine tuning with 10 real samples per
class. We compare the results of our model with those of a light curve random
forest classifier. The proposed RCNN with fine tuning has a similar performance
on the HiTS dataset compared to the light curve classifier, trained on an
augmented training set with 10 real samples per class. The RCNN approach
presents several advantages in an alert stream classification scenario, such as
a reduction of the data pre-processing, faster online evaluation and easier
performance improvement using a few real data samples. These results encourage
us to use this method for alert broker systems that will process alert streams
generated by new telescopes such as the Large Synoptic Survey Telescope.
[6]
oai:arXiv.org:1810.07857 [pdf] - 1953345
Multiband galaxy morphologies for CLASH: a convolutional neural network
transferred from CANDELS
Submitted: 2018-10-17
We present visual-like morphologies over 16 photometric bands, from
ultra-violet to near infrared, for 8,412 galaxies in the Cluster Lensing And
Supernova survey with Hubble (CLASH) obtained by a convolutional neural network
(CNN) model. Our model follows the CANDELS main morphological classification
scheme, obtaining the probability for each galaxy at each CLASH band of being
spheroid, disk, irregular, point source, or unclassifiable. Our catalog
contains morphologies for each galaxy with Hmag < 24.5 in every filter where
the galaxy is observed. We trained an initial CNN model using approximately
7,500 expert eyeball labels from The Cosmic Assembly Near-IR Deep Extragalactic
Legacy Survey (CANDELS). We created eyeball labels for 100 randomly selected
galaxies in each of the 16 CLASH filters (1,600 galaxy images in
total), where each image was classified by at least five of us. We use these
labels to fine-tune the network in order to accurately predict labels for the
CLASH data and to evaluate the performance of our model. We achieve a
root-mean-square error of 0.0991 on the test set. We show that our proposed
fine-tuning technique reduces the number of labeled images needed for training,
as compared to directly training over the CLASH data, and achieves a better
performance. This approach is very useful to minimize eyeball labeling efforts
when classifying unlabeled data from new surveys. This will become particularly
useful for massive datasets such as those coming from near-future surveys
like EUCLID or the LSST. Our catalog consists of predicted probabilities
for each galaxy by morphology in their different bands and is made publicly
available at http://www.inf.udec.cl/~guille/data/Deep-CLASH.csv.
[7]
oai:arXiv.org:1809.00763 [pdf] - 1767631
The High Cadence Transient Survey (HITS): Compilation and
characterization of light-curve catalogs
Martínez-Palomera, Jorge;
Förster, Francisco;
Protopapas, Pavlos;
Maureira, Juan Carlos;
Lira, Paulina;
Cabrera-Vives, Guillermo;
Huijse, Pablo;
Galbany, Lluis;
de Jaeger, Thomas;
González-Gaitán, Santiago;
Medina, Gustavo;
Pignata, Giuliano;
Martín, Jaime San;
Hamuy, Mario;
Muñoz, Ricardo R.
Submitted: 2018-09-03, last modified: 2018-09-07
The High Cadence Transient Survey (HiTS) aims to discover and study transient
objects with characteristic timescales between hours and days, such as
pulsating, eclipsing and exploding stars. This survey represents a unique
laboratory to explore large etendue observations from cadences of about 0.1
days and to test new computational tools for the analysis of large data. This
work follows a fully \textit{Data Science} approach: from the raw data to the
analysis and classification of variable sources. We compile a catalog of
${\sim}15$ million object detections and a catalog of ${\sim}2.5$ million
light-curves classified by variability. The typical depth of the survey is
$24.2$, $24.3$, $24.1$ and $23.8$ in $u$, $g$, $r$ and $i$ bands, respectively.
We classified all point-like non-moving sources by first extracting features
from their light-curves and then applying a Random Forest classifier. For the
classification, we used a training set constructed using a combination of
cross-matched catalogs, visual inspection, transfer/active learning and data
augmentation. The classification model consists of several Random Forest
classifiers organized in a hierarchical scheme. The classifier accuracy
estimated on a test set is approximately $97\%$. In the unlabeled data,
$3\,485$ sources were classified as variables, of which $1\,321$ were
classified as periodic. Among the periodic classes we discovered, with high
confidence, 1 $\delta$ Scuti star, 39 eclipsing binaries, 48 rotational variables
and 90 RR Lyrae stars, and for the non-periodic classes we discovered 1 cataclysmic
variable, 630 QSOs, and 1 supernova candidate. The first data release can be
accessed in the project archive of HiTS.
[8]
oai:arXiv.org:1803.10779 [pdf] - 1743672
Unraveling the Spectral Energy Distributions of Clustered YSOs
Submitted: 2018-03-28
Stars form in clustered environments, but how they form when the available
resources are shared is still not well understood. A related question is
whether the IMF is in fact universal across galactic environments, a galactic
initial mass function (IGIMF), or whether it is an average of local IMFs. One
of the long-standing problems in resolving this question and in the study of
young clusters is observational: the emission from multiple sources is
frequently seen as blended because at different wavelengths or with different
telescopes the beam sizes are different. The confusion hinders our ability to
fully characterize clustered star formation. Here we present a new method that
uses a genetic algorithm and Bayesian inference to fit the blended SEDs and
images of individual YSOs in confused clusters. We apply this method to the
infrared photometry of a sample comprising 70 Spitzer-selected, low-mass
($M_{\rm{cl}}<100~\rm{M}_{\odot}$) young clusters in the galactic plane, and
use the derived physical parameters to investigate the distributions of masses
and evolutionary stages of clustered YSOs, and the implications of those
distributions for studies of the IMF and the different models of star
formation. We find that for low-mass clusters composed of class I and class II
YSOs, there exists a non-trivial relationship between the total stellar mass of
the cluster ($M_{\rm{cl}}$) and the mass of its most massive member
($m_{\rm{max}}$). The properties of the derived correlation are most compatible
with the random sampling of a Kroupa IMF, with a fundamental high-mass limit of
$150~\rm{M}_{\odot}$. Our results are also compatible with SPH models that
predict a dynamical termination of the accretion in protostars, with massive
stars undergoing this stopping at later times in their evolution.
[9]
oai:arXiv.org:1801.09732 [pdf] - 1626565
Uncertain classification of Variable Stars: handling observational GAPS
and noise
Submitted: 2018-01-29
Automatic classification methods applied to sky surveys have revolutionized
the astronomical target selection process. Most surveys generate a vast amount
of time series, or "lightcurves", that represent the brightness
variability of stellar objects in time. Unfortunately, a lightcurve's
observations take several years to complete, producing truncated time
series to which automatic classifiers are generally not applied
until they are finished. This happens because state-of-the-art methods rely on
a variety of statistical descriptors or features that present an increasing
degree of dispersion when the number of observations decreases, which reduces
their precision. In this paper we propose a novel method that increases the
performance of automatic classifiers of variable stars by incorporating the
deviations that scarcity of observations produces. Our method uses Gaussian
Process Regression to form a probabilistic model of each lightcurve's
observations. Then, based on this model, bootstrapped samples of the time
series features are generated. Finally a bagging approach is used to improve
the overall performance of the classification. We perform tests on the MACHO
and OGLE catalogs; results show that our method effectively classifies some
variability classes using a small fraction of the original observations. For
example, we found that RR Lyrae stars can be classified with around 80\%
accuracy just by observing the first 5\% of each lightcurve's observations
in MACHO and OGLE catalogs. We believe these results prove that, when studying
lightcurves, it is important to consider the features' error and how the
measurement process impacts it.
[10]
oai:arXiv.org:1801.09737 [pdf] - 1626567
Automatic Survey-Invariant Variable Star Classification
Submitted: 2018-01-29
Machine learning techniques have been successfully used to classify variable
stars on widely-studied astronomical surveys. These datasets have been
available to astronomers long enough to allow them to perform deep
analysis of several variable sources and to generate useful catalogs of
identified variable stars. The products of these studies are labeled data that
enable supervised learning models to be trained successfully. However, when
these models are blindly applied to data from new sky surveys their performance
drops significantly. Furthermore, unlabeled data becomes available at a much
higher rate than its labeled counterpart, since labeling is a manual and
time-consuming effort. Domain adaptation techniques aim to learn from a domain
where labeled data is available, the \textit{source domain}, and through some
adaptation perform well on a different domain, the \textit{target domain}. We
propose a full probabilistic model that represents the joint distribution of
features from two surveys as well as a probabilistic transformation of the
features from one survey to the other. This allows us to transfer labeled
data to a study where it is not available and to effectively run a variable
star classification model in a new survey. Our model represents the features of
each domain as a Gaussian mixture and models the transformation as a
translation, rotation and scaling of each separate component. We perform tests
using three different variability catalogs: EROS, MACHO, and HiTS, presenting
differences among them, such as the number of observations per star, cadence,
observational time and optical bands observed, among others.
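To make the per-component transformation concrete, the sketch below applies a rotation, a scaling and a translation to the mean and covariance of a single two-dimensional Gaussian component. Restricting to two dimensions and using one isotropic scale factor are simplifying assumptions for illustration; the function name `transform_component` is hypothetical and the abstract does not give the paper's parameterization.

```python
import math


def transform_component(mean, cov, angle, scale, shift):
    """Map a 2-D Gaussian component through x -> scale * R(angle) x + shift.

    The mean transforms like a point; the covariance becomes
    scale**2 * R cov R^T. Illustrative only, not the authors' code.
    """
    c, s = math.cos(angle), math.sin(angle)
    rot = [[c, -s], [s, c]]
    new_mean = [
        scale * (rot[i][0] * mean[0] + rot[i][1] * mean[1]) + shift[i]
        for i in range(2)
    ]
    # First R @ cov, then (R @ cov) @ R^T, scaled by scale**2.
    rc = [[sum(rot[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    new_cov = [[scale ** 2 * sum(rc[i][k] * rot[j][k] for k in range(2))
                for j in range(2)] for i in range(2)]
    return new_mean, new_cov


new_mean, new_cov = transform_component(
    mean=[1.0, 0.0],
    cov=[[1.0, 0.0], [0.0, 1.0]],
    angle=math.pi / 2,
    scale=2.0,
    shift=[1.0, 1.0],
)
print(new_mean, new_cov)
```

Applying a separate set of parameters to every mixture component gives a transformation that is flexible but still easy to interpret.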
[11]
oai:arXiv.org:1709.05427 [pdf] - 1648624
A dwarf planet class object in the 21:5 resonance with Neptune
Holman, Matthew J.;
Payne, Matthew J.;
Fraser, Wesley;
Lacerda, Pedro;
Bannister, Michele T.;
Lackner, Michael;
Chen, Ying-Tung;
Lin, Hsing Wen;
Smith, Kenneth W.;
Kotanekova, Rositako;
Young, David;
Chambers, K.;
Chastel, S.;
Denneau, L.;
Fitzsimmons, A.;
Flewelling, H.;
Grav, Tommy;
Huber, M.;
Induni, Nick;
Kudritzki, Rolf-Peter;
Krolewski, Alex;
Jedicke, R.;
Kaiser, N.;
Lilly, E.;
Magnier, E.;
Mark, Zachary;
Meech, K. J.;
Micheli, M.;
Murray, Daniel;
Parker, Alex;
Protopapas, Pavlos;
Ragozzine, Darin;
Veres, Peter;
Wainscoat, R.;
Waters, C.;
Weryk, R.
Submitted: 2017-09-15
We report the discovery of a $H_r = 3.4\pm0.1$ dwarf planet candidate by the
Pan-STARRS Outer Solar System Survey. 2010 JO$_{179}$ is red with $(g-r)=0.88
\pm 0.21$, roughly round, and slowly rotating, with a period of $30.6$ hr.
Estimates of its albedo imply a diameter of 600--900~km. Observations sampling
the span between 2005--2016 provide an exceptionally well-determined orbit for
2010 JO$_{179}$, with a semi-major axis of $78.307\pm0.009$ au; distant orbits
known to this precision are rare. We find that 2010 JO$_{179}$ librates
securely within the 21:5 mean-motion resonance with Neptune on hundred-megayear
time scales, joining the small but growing set of known distant dwarf planets
on metastable resonant orbits. These imply a substantial trans-Neptunian
population that shifts between stability in high-order resonances, the detached
population, and the eroding population of the scattering disk.
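As a quick consistency check on the quoted orbit, Kepler's third law gives the nominal location of a mean-motion resonance: a body whose period is 21/5 of Neptune's has a semi-major axis of $a_N (21/5)^{2/3}$. The sketch below evaluates this, assuming an approximate value of 30.07 au for Neptune's semi-major axis; that value and the function name are not taken from the abstract.

```python
NEPTUNE_A_AU = 30.07  # assumed approximate semi-major axis of Neptune, in au


def resonant_semi_major_axis(p, q, a_planet=NEPTUNE_A_AU):
    """Nominal semi-major axis of the exterior p:q mean-motion resonance.

    The body completes q orbits while the planet completes p, so its period
    is p/q times the planet's, and Kepler's third law gives a ~ P**(2/3).
    """
    return a_planet * (p / q) ** (2.0 / 3.0)


a_res = resonant_semi_major_axis(21, 5)
print(round(a_res, 2))
```

The result is about 78.3 au, close to the measured $78.307\pm0.009$ au, as expected for an object librating in the 21:5 resonance.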
[12]
oai:arXiv.org:1709.03541 [pdf] - 1685055
Robust period estimation using mutual information for multi-band light
curves in the synoptic survey era
Submitted: 2017-09-11
The Large Synoptic Survey Telescope (LSST) will produce an unprecedented
number of light curves using six optical bands. Robust and efficient methods
that can aggregate data from multidimensional sparsely-sampled time series are
needed. In this paper we present a new method for light curve period estimation
based on the quadratic mutual information (QMI). The proposed method does not
assume a particular model for the light curve nor its underlying probability
density and it is robust to non-Gaussian noise and outliers. By combining the
QMI from several bands the true period can be estimated even when no
single-band QMI yields the period. Period recovery performance as a function of
average magnitude and sample size is measured using 30,000 synthetic multi-band
light curves of RR Lyrae and Cepheid variables generated by the LSST Operations
and Catalog simulators. The results show that aggregating information from
several bands is highly beneficial in LSST sparsely-sampled time series,
obtaining an absolute increase in period recovery rate up to 50%. We also show
that the QMI is more robust to noise and light curve length (sample size) than
the multiband generalizations of the Lomb-Scargle and Analysis of Variance
periodograms, recovering the true period in 10-30% more cases than its
competitors. A Python package containing efficient Cython implementations of
the QMI and other methods is provided.
[13]
oai:arXiv.org:1612.08747 [pdf] - 1539002
Detection of Time Lags Between Quasar Continuum Emission Bands based on
Pan-STARRS Light-curves
Jiang, Yan-Fei;
Green, Paul J.;
Greene, Jenny E.;
Morganson, Eric;
Shen, Yue;
Pancoast, Anna;
MacLeod, Chelsea L.;
Anderson, Scott F.;
Brandt, W. N.;
Grier, C. J.;
Rix, H. W.;
Ruan, John J.;
Protopapas, Pavlos;
Scott, Caroline;
Burgett, W. S.;
Hodapp, K. W.;
Huber, M. E.;
Kaiser, N.;
Kudritzki, R. P.;
Magnier, E. A.;
Metcalfe, N.;
Tonry, J. T.;
Wainscoat, R. J.;
Waters, C.
Submitted: 2016-12-27
We study the time lags between the continuum emission of quasars at different
wavelengths, based on more than four years of multi-band ($g$, $r$, $i$, $z$)
light-curves in the Pan-STARRS Medium Deep Fields. As photons from different
bands emerge from different radial ranges in the accretion disk, the lags
constrain the sizes of the accretion disks. We select 240 quasars with
redshifts $z \approx 1$ or $z \approx 0.3$ that are relatively emission line
free. The light curves are sampled from day to month timescales, which makes it
possible to detect lags on the scale of the light crossing time of the
accretion disks. With the code JAVELIN, we detect typical lags of several days
in the rest frame between the $g$ band and the $riz$ bands. The detected lags
are $\sim 2-3$ times larger than the light crossing time estimated from the
standard thin disk model, consistent with the recently measured lag in NGC5548
and micro-lensing measurements of quasars. The lags in our sample are found to
increase with increasing luminosity. Furthermore, the increase in lags going
from $g-r$ to $g-i$ and then to $g-z$ is slower than predicted in the thin disk
model, particularly for high luminosity quasars. The radial temperature profile
in the disk must be different from what is assumed. We also find evidence that
the lags decrease with increasing line ratios between ultraviolet FeII lines
and MgII, which may point to changes in the accretion disk structure at higher
metallicity.
[14]
oai:arXiv.org:1602.08977 [pdf] - 1388963
Clustering Based Feature Learning on Variable Stars
Submitted: 2016-02-29
The success of automatic classification of variable stars strongly depends on
the lightcurve representation. Usually, lightcurves are represented as a vector
of many statistical descriptors designed by astronomers called features. These
descriptors commonly demand significant computational power to calculate,
require substantial research effort to develop and do not guarantee good
performance on the final classification task. Today, lightcurve representation
is not entirely automatic; algorithms that extract lightcurve features are
designed by humans and must be manually tuned for every survey. The vast
amounts of data that will be generated in future surveys like LSST mean
astronomers must develop analysis pipelines that are both scalable and
automated. Recently, substantial efforts have been made in the machine learning
community to develop methods that prescind from expert-designed and manually
tuned features for features that are automatically learned from data. In this
work we present what is, to our knowledge, the first unsupervised feature
learning algorithm designed for variable stars. Our method first extracts a
large number of lightcurve subsequences from a given set of photometric data,
which are then clustered to find common local patterns in the time series.
Representatives of these patterns, called exemplars, are then used to transform
lightcurves of a labeled set into a new representation that can then be used to
train an automatic classifier. The proposed algorithm learns the features from
both labeled and unlabeled lightcurves, overcoming the bias generated when the
learning process is done only with labeled data. We test our method on MACHO
and OGLE datasets; the results show that the classification performance we
achieve is as good as, and in some cases better than, the performance achieved using
traditional features, while the computational cost is significantly lower.
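The pipeline described in this abstract (extract subsequences, find representative patterns, re-encode each lightcurve) can be sketched as follows. The exemplars here are supplied by hand rather than learned by clustering, and the histogram-of-nearest-exemplars encoding is an assumption chosen for illustration; the abstract does not specify the clustering algorithm or the final representation, and all function names are hypothetical.

```python
def subsequences(values, width, step=1):
    """Sliding-window subsequences of a lightcurve's magnitude values."""
    return [values[i:i + width] for i in range(0, len(values) - width + 1, step)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(values, exemplars, width, step=1):
    """Fraction of subsequences whose nearest exemplar is each exemplar.

    In a real pipeline the exemplars would come from clustering subsequences
    of both labeled and unlabeled lightcurves; here they are given directly.
    """
    counts = [0] * len(exemplars)
    for window in subsequences(values, width, step):
        nearest = min(range(len(exemplars)),
                      key=lambda k: squared_distance(window, exemplars[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(exemplars)


# Two toy exemplars: a rising pattern and a falling pattern.
features = encode([0, 1, 0, 1, 0], exemplars=[[0, 1], [1, 0]], width=2)
print(features)
```

The resulting fixed-length vector can be passed to any standard classifier, whatever the length of the original lightcurve.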
[15]
oai:arXiv.org:1601.03013 [pdf] - 1365579
Meta Classification for Variable Stars
Submitted: 2016-01-12
The need for the development of automatic tools to explore astronomical
databases has been recognized since the inception of CCDs and modern computers.
Astronomers have already developed solutions to tackle several science
problems, such as automatic classification of stellar objects, outlier
detection, and globular cluster identification, among others. New science
problems emerge and it is critical to be able to re-use the models learned
before, without rebuilding everything from the beginning when the science
problem changes. In this paper, we propose a new meta-model that automatically
integrates existing classification models of variable stars. The proposed
meta-model incorporates existing models that are trained in a different
context, answering different questions and using different representations of
data. Conventional mixture-of-experts algorithms in the machine learning literature
cannot be used since each expert (model) uses different inputs. We also
consider computational complexity of the model by using the most expensive
models only when it is necessary. We test our model with EROS-2 and MACHO
datasets, and we show that we solve most of the classification challenges only
by training a meta-model to learn how to integrate the previous experts.
[16]
oai:arXiv.org:1509.07823 [pdf] - 1283333
Computational Intelligence Challenges and Applications on Large-Scale
Astronomical Time Series Databases
Submitted: 2015-09-25
Time-domain astronomy (TDA) is facing a paradigm shift caused by the
exponential growth of the sample size, data complexity and data generation
rates of new astronomical sky surveys. For example, the Large Synoptic Survey
Telescope (LSST), which will begin operations in northern Chile in 2022, will
generate a nearly 150 Petabyte imaging dataset of the southern hemisphere sky.
The LSST will stream data at rates of 2 Terabytes per hour, effectively
capturing an unprecedented movie of the sky. The LSST is expected not only to
improve our understanding of time-varying astrophysical objects, but also to
reveal a plethora of yet unknown faint and fast-varying phenomena. To cope with
a change of paradigm to data-driven astronomy, the fields of astroinformatics
and astrostatistics have been created recently. The new data-oriented paradigms
for astronomy combine statistics, data mining, knowledge discovery, machine
learning and computational intelligence, in order to provide the automated and
robust methods needed for the rapid detection and classification of known
astrophysical objects as well as the unsupervised characterization of novel
phenomena. In this article we present an overview of machine learning and
computational intelligence applications to TDA. Future big data challenges and
new lines of research in TDA, focusing on the LSST, are identified and
discussed from the viewpoint of computational intelligence/machine learning.
Interdisciplinary collaboration will be required to cope with the challenges
posed by the deluge of astronomical data coming from the LSST.
[17]
oai:arXiv.org:1506.00010 [pdf] - 1269336
FATS: Feature Analysis for Time Series
Submitted: 2015-05-29, last modified: 2015-08-31
In this paper, we present the FATS (Feature Analysis for Time Series)
library. FATS is a Python library which facilitates and standardizes feature
extraction for time series data. In particular, we focus on one application:
feature extraction for astronomical light curve data, although the library is
generalizable for other uses. We detail the methods and features implemented
for light curve analysis, and present examples of its usage.
[18]
oai:arXiv.org:1404.4888 [pdf] - 1085209
Supervised detection of anomalous light-curves in massive astronomical
catalogs
Submitted: 2014-04-18, last modified: 2015-05-27
The development of synoptic sky surveys has led to a massive amount of data
for which resources needed for analysis are beyond human capabilities. To
process this information and to extract all possible knowledge, machine
learning techniques become necessary. Here we present a new method to
automatically discover unknown variable objects in large astronomical catalogs.
With the aim of taking full advantage of all the information we have about
known objects, our method is based on a supervised algorithm. In particular, we
train a random forest classifier using known variability classes of objects and
obtain votes for each of the objects in the training set. We then model this
voting distribution with a Bayesian network and obtain the joint voting
distribution among the training objects. Consequently, an unknown object is
considered as an outlier insofar as it has a low joint probability. Our method is
suitable for exploring massive datasets given that the training process is
performed offline. We tested our algorithm on 20 million light-curves from the
MACHO catalog and generated a list of anomalous candidates. We divided the
candidates into two main classes of outliers: artifacts and intrinsic outliers.
Artifacts were principally due to air mass variation, seasonal variation, bad
calibration or instrumental errors and were consequently removed from our
outlier list and added to the training set. After retraining, we selected about
4000 objects, which we passed to a post analysis stage by performing a
cross-match with all publicly available catalogs. Within these candidates we
identified certain known but rare objects such as eclipsing Cepheids, blue
variables, cataclysmic variables and X-ray sources. For some outliers there
was no additional information. Among them we identified three unknown
variability types and a few individual outliers that will be followed up for a
deeper analysis.
[19]
oai:arXiv.org:1412.1840 [pdf] - 1282187
A Novel, Fully Automated Pipeline for Period Estimation in the EROS 2
Data Set
Submitted: 2014-12-04
We present a new method to discriminate periodic from non-periodic
irregularly sampled lightcurves. We introduce a periodic kernel and maximize a
similarity measure derived from information theory to estimate the periods and
a discriminator factor. We tested the method on a dataset containing 100,000
synthetic periodic and non-periodic lightcurves with various periods,
amplitudes and shapes generated using a multivariate generative model. We
correctly identified periodic and non-periodic lightcurves with a completeness
of 90% and a precision of 95%, for lightcurves with a signal-to-noise ratio
(SNR) larger than 0.5. We characterize the efficiency and reliability of the
model using these synthetic lightcurves and applied the method on the EROS-2
dataset. A crucial consideration is the speed at which the method can be
executed. Using hierarchical search and some simplification on the parameter
search we were able to analyze 32.8 million lightcurves in 18 hours on a
cluster of GPGPUs. Using the sensitivity analysis on the synthetic dataset, we
infer that 0.42% in the LMC and 0.61% in the SMC of the sources show periodic
behavior. The training set, the catalogs and source code are all available at
http://timemachine.iic.harvard.edu.
[20]
oai:arXiv.org:1403.6131 [pdf] - 844816
The EPOCH Project: I. Periodic variable stars in the EROS-2 LMC database
Submitted: 2014-03-24, last modified: 2014-03-28
The EPOCH (EROS-2 periodic variable star classification using machine
learning) project aims to detect periodic variable stars in the EROS-2 light
curve database. In this paper, we present the first result of the
classification of periodic variable stars in the EROS-2 LMC database. To
classify these variables, we first built a training set by compiling known
variables in the Large Magellanic Cloud area from the OGLE and MACHO surveys.
We crossmatched these variables with the EROS-2 sources and extracted 22
variability features from 28 392 light curves of the corresponding EROS-2
sources. We then used the random forest method to classify the EROS-2 sources
in the training set. We designed the model to separate not only $\delta$ Scuti
stars, RR Lyraes, Cepheids, eclipsing binaries, and long-period variables, the
superclasses, but also their subclasses, such as RRab, RRc, RRd, and RRe for RR
Lyraes, and similarly for the other variable types. The model trained using
only the superclasses shows 99% recall and precision, while the model trained
on all subclasses shows 87% recall and precision. We applied the trained model
to the entire EROS-2 LMC database, which contains about 29 million sources, and
found 117 234 periodic variable candidates. Out of these 117 234 periodic
variables, 55 285 have not been discovered by either OGLE or MACHO variability
studies. This set comprises 1 906 $\delta$ Scuti stars, 6 607 RR Lyraes, 638
Cepheids, 178 Type II Cepheids, 34 562 eclipsing binaries, and 11 394
long-period variables. A catalog of these EROS-2 LMC periodic variable stars
will be available online at http://stardb.yonsei.ac.kr and at the CDS website
(http://vizier.u-strasbg.fr/viz-bin/VizieR).
[21]
oai:arXiv.org:1403.2181 [pdf] - 794631
The expansion rate of the intermediate Universe in light of Planck
Submitted: 2014-03-10
We use cosmology-independent measurements of the expansion history in the
redshift range 0.1 < z <1.2 and compare them with the Cosmic Microwave
Background-derived expansion history predictions. The motivation is to
investigate if the tension between the local (cosmology independent) Hubble
constant H0 value and the Planck-derived H0 is also present at other redshifts.
We conclude that there is no tension between Planck and cosmology-independent
measurements of the Hubble parameter H(z) at 0.1 < z < 1.2 for the
LCDM model (odds of tension are only 1:15, statistically not significant).
Considering extensions of the LCDM model does not improve these odds (actually
makes them worse), thus favouring the simpler model over its extensions. On the
other hand the H(z) data are also not in tension with the local H0 measurements
but the combination of all three data-sets shows a highly significant tension
(odds ~ 1:400). Thus the new data deepen the mystery of the mismatch between
Planck and local H0 measurements, and cannot univocally determine whether it is
an effect localised at a particular redshift. Having said this, we find that
assuming the NGC4258 maser distance as the correct anchor for H0, brings the
odds to comfortable values.
Further, using only the expansion history measurements we constrain, within
the LCDM model, H0 = 68.5 +- 3.5 and Omega_m = 0.32 +- 0.05 without relying on
any CMB prior. We also address the question of how smooth the expansion history
of the universe is given the cosmology independent data and conclude that there
is no evidence for deviations from smoothness in the expansion history, nor for
variations with time in the value of the equation of state of dark energy.
[22]
oai:arXiv.org:1402.6403 [pdf] - 1203579
Pan-STARRS 1 observations of the unusual active Centaur P/2011 S1 (Gibbs)
Lin, H. W.;
Chen, Y. T.;
Lacerda, P.;
Ip, W. H.;
Holman, M.;
Protopapas, P.;
Chen, W. P.;
Burgett, W. S.;
Chambers, K. C.;
Flewelling, H.;
Huber, M. E.;
Jedicke, R.;
Kaiser, N.;
Magnier, E. A.;
Metcalfe, N.;
Price, P. A.
Submitted: 2014-02-25
P/2011 S1 (Gibbs) is an outer solar system comet or active Centaur with a
similar orbit to that of the famous 29P/Schwassmann-Wachmann 1. P/2011 S1
(Gibbs) has been observed by the Pan-STARRS 1 (PS1) sky survey from 2010 to
2012. The resulting data allow us to perform multi-color studies of the nucleus
and coma of the comet. Analysis of PS1 images reveals that P/2011 S1 (Gibbs)
has a small nucleus of radius $< 4$ km, with colors $g_{P1}-r_{P1} = 0.5 \pm
0.02$, $r_{P1}-i_{P1} = 0.12 \pm 0.02$ and $i_{P1}-z_{P1} = 0.46 \pm 0.03$. The
comet remained active from 2010 to 2012, with a model-dependent mass-loss rate
of $\sim100$ kg s$^{-1}$. The mass-loss rate per unit surface area of P/2011 S1
(Gibbs) is as high as that of 29P/Schwassmann-Wachmann 1, making it one of the
most active Centaurs. The mass-loss rate also varies with time from $\sim 40$
kg s$^{-1}$ to 150 kg s$^{-1}$. Due to its rather circular orbit, we propose
that P/2011 S1 (Gibbs) has 29P/Schwassmann-Wachmann 1-like outbursts that
control the outgassing rate. The results indicate that it may have a similar
surface composition to that of 29P/Schwassmann-Wachmann 1.
Our numerical simulations show that the future orbital evolution of P/2011 S1
(Gibbs) is more similar to that of the main population of Centaurs than to that
of 29P/Schwassmann-Wachmann 1. The results also demonstrate that P/2011 S1
(Gibbs) is dynamically unstable and can only remain near its current orbit for
roughly a thousand years.
[23]
oai:arXiv.org:1310.7868 [pdf] - 739149
Automatic Classification of Variable Stars in Catalogs with missing data
Submitted: 2013-10-29
We present an automatic classification method for astronomical catalogs with
missing data. We use Bayesian networks, a probabilistic graphical model, that
allows us to perform inference to predict missing values given observed data
and dependency relationships between variables. To learn a Bayesian network
from incomplete data, we use an iterative algorithm that utilises sampling
methods and expectation maximization to estimate the distributions and
probabilistic dependencies of variables from data with missing values. To test
our model we use three catalogs with missing data (SAGE, 2MASS and UBVI) and
one complete catalog (MACHO). We examine how classification accuracy changes
when information from missing data catalogs is included, how our method
compares to traditional missing data approaches and at what computational cost.
Integrating these catalogs with missing data we find that classification of
variable objects improves by a few percent and by 15% for quasar detection while
keeping the computational cost the same.
[24]
oai:arXiv.org:1306.6766 [pdf] - 686817
Planck and the local Universe: quantifying the tension
Submitted: 2013-06-28
We use the latest Planck constraints, and in particular constraints on the
derived parameters (Hubble constant and age of the Universe) for the local
universe and compare them with local measurements of the same quantities. We
propose a way to quantify whether cosmological parameters constraints from two
different experiments are in tension or not. Our statistic, T, is an evidence
ratio and therefore can be interpreted with the widely used Jeffreys' scale. We
find that in the framework of the LCDM model, the Planck inferred two
dimensional, joint, posterior distribution for the Hubble constant and age of
the Universe is in "strong" tension with the local measurements; the odds being
~ 1:50. We explore several possibilities for explaining this tension and
examine the consequences both in terms of unknown errors and deviations from
the LCDM model. In some one-parameter LCDM model extensions, tension is reduced
whereas in other extensions, tension is instead increased. In particular, small
total neutrino masses are favored and a total neutrino mass above 0.15 eV makes
the tension "highly significant" (odds ~ 1:150). A consequence of accepting
this interpretation of the tension is that the degenerate neutrino hierarchy is
highly disfavoured by cosmological data and the direct hierarchy is slightly
favored over the inverse.
[25]
oai:arXiv.org:1203.0970 [pdf] - 968262
Infinite Shift-invariant Grouped Multi-task Learning for Gaussian
Processes
Submitted: 2012-03-05, last modified: 2013-05-20
Multi-task learning leverages shared information among data sets to improve
the learning performance of individual tasks. The paper applies this framework
for data where each task is a phase-shifted periodic time series. In
particular, we develop a novel Bayesian nonparametric model capturing a mixture
of Gaussian processes where each task is a sum of a group-specific function and
a component capturing individual variation, in addition to each task being
phase shifted. We develop an efficient \textsc{em} algorithm to learn the
parameters of the model. As a special case we obtain the Gaussian mixture model
and \textsc{em} algorithm for phase-shifted periodic time series. Furthermore,
we extend the proposed model by using a Dirichlet Process prior and thereby
leading to an infinite mixture model that is capable of doing automatic model
selection. A Variational Bayesian approach is developed for inference in this
model. Experiments in regression, classification and class discovery
demonstrate the performance of the proposed models using both synthetic data
and real-world time series data from astrophysics. Our methods are particularly
useful when the time series are sparsely and non-synchronously sampled.
[26]
oai:arXiv.org:1304.0401 [pdf] - 646062
An improved quasar detection method in EROS-2 and MACHO LMC datasets
Submitted: 2013-04-01
We present a new classification method for quasar identification in the
EROS-2 and MACHO datasets based on a boosted version of Random Forest
classifier. We use a set of variability features including parameters of a
continuous auto regressive model. We prove that continuous auto regressive
parameters are very important discriminators in the classification process. We
create two training sets (one for EROS-2 and one for MACHO datasets) using
known quasars found in the LMC. Our model's accuracy in both EROS-2 and MACHO
training sets is about 90% precision and 86% recall, improving on the accuracy
of state-of-the-art models in quasar detection. We apply the model to the
complete EROS-2 and MACHO LMC datasets, which include 28 million objects,
finding 1160 and 2551 candidates respectively. To further validate our list of
candidates, we crossmatched it with a previous list of 663 known strong
candidates, obtaining 74% matches for MACHO and 40% for EROS-2. The difference
in matching level arises mainly because EROS-2 is a slightly shallower survey,
which translates to significantly lower signal-to-noise ratio lightcurves.
[27]
oai:arXiv.org:1303.1031 [pdf] - 1165012
Statistical Properties of Galactic {\delta} Scuti Stars: Revisited
Submitted: 2013-03-05
We present statistical characteristics of 1,578 {\delta} Scuti stars
including nearby field stars and cluster member stars within the Milky Way. We
obtained 46% of these stars (718 stars) from the works done by Rodr\'{i}guez
and collected the remaining 54% (860 stars) from the literature. We
updated the entries with the latest information on sky coordinates, color,
rotational velocity, spectral type, period, amplitude and binarity. The
majority of our sample are well characterized in terms of typical period range
(0.02-0.25 days), pulsation amplitudes (<0.5 mag) and spectral types (A-F
type). Given this list of {\delta} Scuti stars, we examined relations between
their physical properties (i.e., periods, amplitudes, spectral types and
rotational velocities) for field stars and cluster members, and confirmed that
the correlations of properties are not significantly different from those
reported in Rodr\'{i}guez's works. All the {\delta} Scuti stars are
cross-matched with several X-ray and UV catalogs, resulting in 27 X-ray and 41
UV-only counterparts. These counterparts are interesting targets for further
study because of their rarity and uniqueness in showing {\delta} Scuti-type
variability and X-ray/UV emission at the same time. The compiled catalog can be
accessed through the web interface http://stardb.yonsei.ac.kr/DeltaScuti
[28]
oai:arXiv.org:1301.6182 [pdf] - 1159293
The TAOS Project: Results From Seven Years of Survey Data
Zhang, Z. -W.;
Lehner, M. J.;
Wang, J. -H.;
Wen, C. -Y.;
Wang, S. -Y.;
King, S. -K.;
Granados, Á. P.;
Alcock, C.;
Axelrod, T.;
Bianco, F. B.;
Byun, Y. -I.;
Chen, W. P.;
Coehlo, N. K.;
Cook, K. H.;
de Pater, I.;
Kim, D. -W.;
Lee, T.;
Lissauer, J. J.;
Marshall, S. L.;
Protopapas, P.;
Rice, J. A.;
Schwamb, M. E.
Submitted: 2013-01-25
The Taiwanese-American Occultation Survey (TAOS) aims to detect serendipitous
occultations of stars by small (about 1 km diameter) objects in the Kuiper Belt
and beyond. Such events are very rare (<0.001 events per star per year) and
short in duration (about 200 ms), so many stars must be monitored at a high
readout cadence. TAOS monitors typically around 500 stars simultaneously at a 5
Hz readout cadence with four telescopes located at Lulin Observatory in central
Taiwan. In this paper, we report the results of the search for small Kuiper
Belt Objects (KBOs) in seven years of data. No occultation events were found,
resulting in a 95% c.l. upper limit on the slope of the faint end of the KBO
size distribution of q = 3.34 to 3.82, depending on the surface density at the
break in the size distribution at a diameter of about 90 km.
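
As an illustrative aside (not taken from the paper): when a survey detects no
events, a 95% confidence upper limit typically starts from Poisson statistics,
where the probability of observing zero events given an expected count mu is
exp(-mu). The Python sketch below computes that limit. The function names and
the `exposure` argument are hypothetical, and the conversion of this limit into
a constraint on the slope q depends on survey efficiency details that are not
given in this abstract.

```python
import math

def poisson_upper_limit_zero_events(confidence=0.95):
    """Upper limit on the expected event count when zero events are observed.

    For Poisson statistics P(0 events | mu) = exp(-mu); the limit is the mu
    at which that probability drops to 1 - confidence.
    """
    return -math.log(1.0 - confidence)

def rate_upper_limit(exposure, confidence=0.95):
    """Upper limit on the event rate per unit of effective exposure.

    `exposure` is a hypothetical effective exposure (for example star-years
    weighted by detection efficiency); it is an assumption of this sketch.
    """
    return poisson_upper_limit_zero_events(confidence) / exposure

mu_max = poisson_upper_limit_zero_events(0.95)
print(round(mu_max, 3))  # close to 3 expected events
```

Any model that predicts more than roughly three detectable events for the
survey's effective exposure would therefore be disfavoured at the 95% level.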
[29]
oai:arXiv.org:1301.3027 [pdf] - 616798
Semi-parametric Robust Event Detection for Massive Time-Domain Databases
Submitted: 2013-01-14, last modified: 2013-01-19
The detection and analysis of events within massive collections of
time-series has become an extremely important task for time-domain astronomy.
In particular, many scientific investigations (e.g. the analysis of
microlensing and other transients) begin with the detection of isolated events
in irregularly-sampled series with both non-linear trends and non-Gaussian
noise. We outline a semi-parametric, robust, parallel method for identifying
variability and isolated events at multiple scales in the presence of the above
complications. This approach harnesses the power of Bayesian modeling while
maintaining much of the speed and scalability of more ad-hoc machine learning
approaches. We also contrast this work with event detection methods from other
fields, highlighting the unique challenges posed by astronomical surveys.
Finally, we present results from the application of this method to 87.2 million
EROS-2 sources, where we have obtained a greater than 100-fold reduction in
candidates for certain types of phenomena while creating high-quality features
for subsequent analyses.
[30]
oai:arXiv.org:1212.2398 [pdf] - 903316
An Information Theoretic Algorithm for Finding Periodicities in Stellar
Light Curves
Submitted: 2012-12-11
We propose a new information theoretic metric for finding periodicities in
stellar light curves. Light curves are astronomical time series of brightness
over time, and are characterized as being noisy and unevenly sampled. The
proposed metric combines correntropy (generalized correlation) with a periodic
kernel to measure similarity among samples separated by a given period. The new
metric provides a periodogram, called Correntropy Kernelized Periodogram (CKP),
whose peaks are associated with the fundamental frequencies present in the
data. The CKP does not require any resampling, slotting or folding scheme as it
is computed directly from the available samples. CKP is the main part of a
fully-automated pipeline for periodic light curve discrimination to be used in
astronomical survey databases. We show that the CKP method outperformed the
slotted correntropy, and conventional methods used in astronomy for periodicity
discrimination and period estimation tasks, using a set of light curves drawn
from the MACHO survey. The proposed metric achieved 97.2% of true positives
with 0% of false positives at the confidence level of 99% for the periodicity
discrimination task; and 88% of hits with 11.6% of multiples and 0.4% of misses
in the period estimation task.
[31]
oai:arXiv.org:1204.3055 [pdf] - 1886350
IVOA Recommendation: Spectrum Data Model 1.1
McDowell, Jonathan;
Tody, Doug;
Budavari, Tamas;
Dolensky, Markus;
Kamp, Inga;
McCusker, Kelly;
Protopapas, Pavlos;
Rots, Arnold;
Thompson, Randy;
Valdes, Frank;
Skoda, Petr;
Rino, Bruno;
Derriere, Sebastien;
Salgado, Jesus;
Laurino, Omar;
Layer, the IVOA Data Access;
Groups, Data Model Working
Submitted: 2012-04-13
We present a data model describing the structure of spectrophotometric
datasets with spectral and temporal coordinates and associated metadata. This
data model may be used to represent spectra, time series data, segments of SED
(Spectral Energy Distributions) and other spectral or temporal associations.
[32]
oai:arXiv.org:1111.1315 [pdf] - 550465
Nonparametric Bayesian Estimation of Periodic Functions
Submitted: 2011-11-05, last modified: 2012-03-06
Many real world problems exhibit patterns that have periodic behavior. For
example, in astrophysics, periodic variable stars play a pivotal role in
understanding our universe. An important step when analyzing data from such
processes is the problem of identifying the period: estimating the period of a
periodic function based on noisy observations made at irregularly spaced time
points. This problem is still a difficult challenge despite extensive study in
different disciplines. The paper makes several contributions toward solving
this problem. First, we present a nonparametric Bayesian model for period
finding, based on Gaussian Processes (GP), that does not make strong
assumptions on the shape of the periodic function. As our experiments
demonstrate, the new model leads to significantly better results in period
estimation when the target function is non-sinusoidal. Second, we develop a new
algorithm for parameter optimization for GP which is useful when the likelihood
function is very sensitive to the setting of the hyper-parameters with numerous
local minima, as in the case of period estimation. The algorithm combines
gradient optimization with grid search and incorporates several mechanisms to
overcome the high complexity of inference with GP. Third, we develop a novel
approach for using domain knowledge, in the form of a probabilistic generative
model, and incorporate it into the period estimation algorithm. Experimental
results on astrophysics data validate our approach showing significant
improvement over the state of the art in this domain.
[33]
oai:arXiv.org:1110.5632 [pdf] - 1085137
A Refined QSO Selection Method Using Diagnostics Tests: 663 QSO
Candidates in the LMC
Submitted: 2011-10-25, last modified: 2011-12-31
We present 663 QSO candidates in the Large Magellanic Cloud (LMC) selected
using multiple diagnostics. We started with a set of 2,566 QSO candidates from
our previous work selected using time variability of the MACHO LMC lightcurves.
We then obtained additional information for the candidates by crossmatching
them with the Spitzer SAGE, the MACHO UBVI, the 2MASS, the Chandra and the XMM
catalogs. Using this information, we specified six diagnostic features based on
mid-IR colors, photometric redshifts using SED template fitting, and X-ray
luminosities in order to further discriminate high confidence QSO candidates in
the absence of spectra information. We then trained a one-class SVM (Support
Vector Machine) model using the diagnostics features of the confirmed 58 MACHO
QSOs. We applied the trained model to the original candidates and finally
selected 663 high confidence QSO candidates. Furthermore, we crossmatched these
663 QSO candidates with the newly confirmed 144 QSOs and 275 non-QSOs in the
LMC fields. On the basis of the counterpart analysis, we found that the false
positive rate is less than 1%.
[34]
oai:arXiv.org:1112.2962 [pdf] - 903304
Period Estimation in Astronomical Time Series Using Slotted Correntropy
Submitted: 2011-12-13
In this letter, we propose a method for period estimation in light curves
from periodic variable stars using correntropy. Light curves are astronomical
time series of stellar brightness over time, and are characterized as being
noisy and unevenly sampled. We propose to use slotted time lags in order to
estimate correntropy directly from irregularly sampled time series. A new
information theoretic metric is proposed for discriminating among the peaks of
the correntropy spectral density. The slotted correntropy method outperformed
slotted correlation, string length, VarTools (Lomb-Scargle periodogram and
Analysis of Variance), and SigSpec applications on a set of light curves drawn
from the MACHO survey.
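
A minimal sketch of the slotting idea described above, assuming a Gaussian
kernel and a fixed slot width. The function name, its parameters and the
synthetic light curve are illustrative assumptions, not the authors'
implementation; the kernel normalization constant is omitted, and the
information theoretic metric used to rank spectral peaks is not reproduced.

```python
import math
import random

def slotted_correntropy(times, values, lag, slot_width, sigma):
    """Average Gaussian kernel of value differences over all sample pairs
    whose time separation lies within slot_width / 2 of the requested lag."""
    total, count = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            dt = abs(times[j] - times[i])
            if abs(dt - lag) <= slot_width / 2.0:
                diff = values[i] - values[j]
                total += math.exp(-diff * diff / (2.0 * sigma * sigma))
                count += 1
    # No pairs in the slot means the estimate is undefined for this lag.
    return total / count if count else float("nan")

# Synthetic, unevenly sampled, noisy sinusoid (seeded for reproducibility).
rng = random.Random(0)
period = 2.5
times = sorted(rng.uniform(0.0, 50.0) for _ in range(300))
values = [math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, 0.05)
          for t in times]

at_period = slotted_correntropy(times, values, period, 0.1, 0.5)
at_half = slotted_correntropy(times, values, period / 2.0, 0.1, 0.5)
print(at_period > at_half)  # samples one period apart are more similar
```

No resampling or interpolation is needed: each lag is estimated only from the
pairs of observations that happen to fall in its slot.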
[35]
oai:arXiv.org:1101.3316 [pdf] - 1051482
QSO Selection Algorithm Using Time Variability and Machine Learning:
Selection of 1,620 QSO Candidates from MACHO LMC Database
Submitted: 2011-01-17, last modified: 2011-04-19
We present a new QSO selection algorithm using a Support Vector Machine
(SVM), a supervised classification method, on a set of extracted time series
features including period, amplitude, color, and autocorrelation value. We
train a model that separates QSOs from variable stars, non-variable stars and
microlensing events using 58 known QSOs, 1,629 variable stars and 4,288
non-variables using the MAssive Compact Halo Object (MACHO) database as a
training set. To estimate the efficiency and the accuracy of the model, we
perform a cross-validation test using the training set. The test shows that the
model correctly identifies ~80% of known QSOs with a 25% false positive rate.
The majority of the false positives are Be stars.
We applied the trained model to the MACHO Large Magellanic Cloud (LMC)
dataset, which consists of 40 million lightcurves, and found 1,620 QSO
candidates. During the selection none of the 33,242 known MACHO variables were
misclassified as QSO candidates. In order to estimate the true false positive
rate, we crossmatched the candidates with astronomical catalogs including the
Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a
few X-ray catalogs. The results further suggest that the majority of the
candidates, more than 70%, are QSOs.
[36]
oai:arXiv.org:1008.2209 [pdf] - 1034253
Trans-Neptunian Objects with Hubble Space Telescope ACS/WFC
Submitted: 2010-08-12
We introduce a novel search technique that can identify trans-neptunian
objects in three to five exposures of a pointing within a single Hubble Space
Telescope orbit. The process is fast enough to allow the discovery of
candidates soon after the data are available. This allows sufficient time to
schedule follow up observations with HST within a month. We report the
discovery of 14 slow-moving objects found within 5$^\circ$ of the ecliptic in
archival data taken with the Wide Field Channel of the Advanced Camera for
Surveys. The luminosity function of these objects is consistent with previous
ground-based and space-based results. We show evidence that the size
distribution of both high and low inclination populations is similar for
objects smaller than 100 km, as expected from collisional evolution models,
while their size distributions differ for brighter objects. We suggest the two
populations formed in different parts of the protoplanetary disk and after
being dynamically mixed have collisionally evolved together. Among the objects
discovered there is an equal mass binary with an angular separation ~ 0."53.
[37]
oai:arXiv.org:1003.2526 [pdf] - 1025677
The TAOS Project Stellar Variability II. Detection of 15 Variable Stars
Mondal, S.;
Lin, C. C.;
Chen, W. P.;
Zhang, Z. -W.;
Alcock, C.;
Axelrod, T.;
Bianco, F. B.;
Byun, Y. -I.;
Coehlo, N. K.;
Cook, K. H.;
Dave, R.;
Kim, D. -W.;
King, S. -K.;
Lee, T.;
Lehner, M. J.;
Lin, H. -C.;
Marshal, S. L.;
Protopapas, P.;
Rice, J. A.;
Schwamb, M. E.;
Wang, J. -H.;
Wang, S. -Y.;
Wen, C. -Y.
Submitted: 2010-03-12
The Taiwanese-American Occultation Survey (TAOS) project has collected more
than a billion photometric measurements since 2005 January. These sky survey
data, covering timescales from a fraction of a second to a few hundred days, are
a useful source to study stellar variability. A total of 167 star fields,
mostly along the ecliptic plane, have been selected for photometric monitoring
with the TAOS telescopes. This paper presents our initial analysis of a search
for periodic variable stars from the time-series TAOS data on one particular
TAOS field, No. 151 (RA = 17$^{\rm h}30^{\rm m}6\fs$67, Dec = 27\degr17\arcmin
30\arcsec, J2000), which had been observed over 47 epochs in 2005. A total of
81 candidate variables are identified in the 3 square degree field, with
magnitudes in the range 8 < R < 16. On the basis of the periodicity and shape
of the lightcurves, 29 variables, 15 of which were previously unknown, are
classified as RR Lyrae, Cepheid, delta Scuti, SX Phoenicis, semi-regular and
eclipsing binaries.
[38]
oai:arXiv.org:1002.3626 [pdf] - 1025278
The TAOS Project: Statistical Analysis of Multi-Telescope Time Series
Data
Lehner, M. J.;
Coehlo, N. K.;
Zhang, Z. -W.;
Bianco, F. B.;
Wang, J. -H.;
Rice, J. A.;
Protopapas, P.;
Alcock, C.;
Axelrod, T.;
Byun, Y. -I.;
Chen, W. P.;
Cook, K. H.;
de Pater, I.;
Kim, D. -W.;
King, S. -K.;
Lee, T.;
Marshall, S. L.;
Schwamb, M. E.;
Wang, S. -Y.;
Wen, C. -Y.
Submitted: 2010-02-18
The Taiwanese-American Occultation Survey (TAOS) monitors fields of up to
~1000 stars at 5 Hz simultaneously with four small telescopes to detect
occultation events from small (~1 km) Kuiper Belt Objects (KBOs). The survey
presents a number of challenges, in particular the fact that the occultation
events we are searching for are extremely rare and are typically manifested as
slight flux drops for only one or two consecutive time series measurements. We
have developed a statistical analysis technique to search the multi-telescope
data set for simultaneous flux drops which provides a robust false positive
rejection and calculation of event significance. In this paper, we describe in
detail this statistical technique and its application to the TAOS data set.
[39]
oai:arXiv.org:1001.2006 [pdf] - 430445
The TAOS Project: Upper Bounds on the Population of Small KBOs and Tests
of Models of Formation and Evolution of the Outer Solar System
Bianco, F. B.;
Zhang, Z. -W.;
Lehner, M. J.;
Mondal, S.;
King, S. -K.;
Giammarco, J.;
Holman, M. J.;
Coehlo, N. K.;
Wang, J. -H.;
Alcock, C.;
Axelrod, T.;
Byun, Y. -I.;
Chen, W. P.;
Cook, K. H.;
Dave, R.;
de Pater, I.;
Kim, D. -W.;
Lee, T.;
Lin, H. -C.;
Lissauer, J. J.;
Marshall, S. L.;
Protopapas, P.;
Rice, J. A.;
Schwamb, M. E.;
Wang, S. -Y.;
Wen, C. -Y.
Submitted: 2010-01-12, last modified: 2010-01-15
We have analyzed the first 3.75 years of data from TAOS, the Taiwanese
American Occultation Survey. TAOS monitors bright stars to search for
occultations by Kuiper Belt Objects (KBOs). This dataset comprises 5e5
star-hours of multi-telescope photometric data taken at 4 or 5 Hz. No events
consistent with KBO occultations were found in this dataset. We compute the
number of events expected for the Kuiper Belt formation and evolution models of
Pan & Sari (2005), Kenyon & Bromley (2004), Benavidez & Campo Bagatin (2009),
and Fraser (2009). A comparison with the upper limits we derive from our data
constrains the parameter space of these models. This is the first detailed
comparison of models of the KBO size distribution with data from an occultation
survey. Our results suggest that the KBO population is comprised of objects
with low internal strength and that planetary migration played a role in the
shaping of the size distribution.
[40]
oai:arXiv.org:0912.1791 [pdf] - 31552
The TAOS Project Stellar Variability I. Detection of Low-Amplitude delta
Scuti Stars
Kim, D. -W.;
Protopapas, P.;
Alcock, C.;
Byun, Y. -I.;
Kyeong, J.;
Lee, B. -C.;
Wright, N. J.;
Axelrod, T.;
Bianco, F. B.;
Chen, W. -P.;
Coehlo, N. K.;
Cook, K. H.;
Dave, R.;
King, S. -K.;
Lee, T.;
Lehner, M. J.;
Lin, H. -C.;
Marshall, S. L.;
Porrata, R.;
Rice, J. A.;
Schwamb, M. E.;
Wang, J. -H.;
Wang, S. -Y.;
Wen, C. -Y.;
Zhang, Z. -W.
Submitted: 2009-12-09, last modified: 2009-12-10
We analyzed data accumulated during 2005 and 2006 by the Taiwan-American
Occultation Survey (TAOS) in order to detect short-period variable stars
(periods of <~ 1 hour) such as delta Scuti. TAOS is designed for the detection
of stellar occultation by small-size Kuiper Belt Objects (KBOs) and is
operating four 50cm telescopes at an effective cadence of 5Hz. The four
telescopes simultaneously monitor the same patch of the sky in order to reduce
false positives. To detect short-period variables, we used the Fast Fourier
Transform algorithm (FFT) inasmuch as the data points in TAOS light-curves are
evenly spaced. Using FFT, we found 41 short-period variables with amplitudes
smaller than a few hundredths of a magnitude and periods of about an hour,
which suggest that they are low-amplitude delta Scuti stars (LADS). The
light-curves of TAOS delta Scuti stars are accessible online at the Time Series
Center website (http://timemachine.iic.harvard.edu)
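
Because the data are evenly spaced, a period search reduces to locating the
strongest peak in the discrete Fourier spectrum. The Python sketch below
illustrates this on a toy series; it uses a direct discrete Fourier transform
rather than an FFT to stay dependency-free (an FFT routine such as
`numpy.fft.rfft` would be the practical choice), and the function name, toy
period and amplitude are assumptions made for illustration only.

```python
import cmath
import math

def dominant_period(samples, sample_rate_hz):
    """Return the period in seconds of the strongest non-zero frequency
    in an evenly sampled series, using a direct discrete Fourier transform."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant magnitude
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Frequency bin k corresponds to k * sample_rate / n hertz.
    return n / (best_k * sample_rate_hz)

# Toy light curve: 100 s at 5 Hz, a 0.01 mag sinusoid with a 20 s period.
rate = 5.0
samples = [12.0 + 0.01 * math.sin(2.0 * math.pi * (i / rate) / 20.0)
           for i in range(500)]
print(dominant_period(samples, rate))
```

The direct transform costs O(N^2) operations, which is why an FFT, at
O(N log N), is preferred for long series sampled at a 5 Hz cadence.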
[41]
oai:arXiv.org:0910.5598 [pdf] - 1018138
Searching for sub-kilometer TNOs using Pan-STARRS video mode
lightcurves: Preliminary study and evaluation using engineering data
Submitted: 2009-10-29, last modified: 2009-11-04
We present a pre-survey study of using Pan-STARRS high sampling rate video
mode guide star images to search for TNOs. With suitable selection of the guide
stars within the Pan-STARRS 7 deg^{2} field of view, the lightcurves of these
guide stars can also be used to search for occultations by TNOs. The best
target stars for this purpose are stars with high signal-to-noise ratio (SNR)
and small angular size. In order to do this, we compiled a catalog using the SNR
calculated from stars with m_V <13 mag in the Tycho2 catalog then cross matched
these stars with the 2MASS catalog and estimated their angular sizes from (V-K)
color. We also outlined a new detection method based on matched filter that is
optimized to search for diffraction patterns in the lightcurves due to
occultation by sub-kilometer TNOs. A detection threshold is set to compromise
between real detections and false positives. Depending on the theoretical size
distribution model used, we expect to find up to a hundred events during the
three-year life time of the Pan-STARRS-1 project. We have tested the detection
algorithm and the pipeline on a set of engineering data (taken at 10Hz instead
of 30Hz). No events were found within the engineering data, which is consistent
with the small size of the data set and the theoretical models. Meanwhile, with
a total of ~ 22 star-hours of video mode data (|\beta| < 10^{\circ}), we are able
to set an upper limit of N(>0.5 km) ~ 2.47x10^10 deg^-2 at 95% confidence
limit.
[42]
oai:arXiv.org:0910.5282 [pdf] - 1018120
Upper Limits on the Number of Small Bodies in Sedna-Like Orbits by the
TAOS Project
Wang, J. -H.;
Lehner, M. J.;
Zhang, Z. -W.;
Bianco, F. B.;
Alcock, C.;
Chen, W. -P.;
Axelrod, T.;
Byun, Y. -I.;
Coehlo, N. K.;
Cook, K. H.;
Dave, R.;
de Pater, I.;
Porrata, R.;
Kim, D. -W.;
King, S. -K.;
Lee, T.;
Lin, H. -C.;
Lissauer, J. J.;
Marshall, S. L.;
Protopapas, P.;
Rice, J. A.;
Schwamb, M. E.;
Wang, S. -Y.;
Wen, C. -Y.
Submitted: 2009-10-27
We present the results of a search for occultation events by objects at
distances between 100 and 1000 AU in lightcurves from the Taiwanese-American
Occultation Survey (TAOS). We searched for consecutive, shallow flux reductions
in the stellar lightcurves obtained by our survey between 7 February 2005 and
31 December 2006 with a total of $\sim4.5\times10^{9}$ three-telescope
simultaneous photometric measurements. No events were detected, allowing us to
set upper limits on the number density as a function of size and distance of
objects in Sedna-like orbits, using simple models.
[43]
oai:arXiv.org:0905.3428 [pdf] - 24487
Finding Anomalous Periodic Time Series: An Application to Catalogs of
Periodic Variable Stars
Submitted: 2009-05-20
Catalogs of periodic variable stars contain large numbers of periodic
light-curves (photometric time series data from the astrophysics domain).
Separating anomalous objects from well-known classes is an important step
towards the discovery of new classes of astronomical objects. Most anomaly
detection methods for time series data assume either a single continuous time
series or a set of time series whose periods are aligned. Light-curve data
precludes the use of these methods as the periods of any given pair of
light-curves may be out of sync. One may use an existing anomaly detection
method if, prior to similarity calculation, one performs the costly act of
aligning two light-curves, an operation that scales poorly to massive data
sets. This paper presents PCAD, an unsupervised anomaly detection method for
large sets of unsynchronized periodic time-series data, that outputs a ranked
list of both global and local anomalies. It calculates its anomaly score for
each light-curve in relation to a set of centroids produced by a modified
k-means clustering algorithm. Our method is able to scale to large data sets
through the use of sampling. We validate our method on both light-curve data
and other time series data sets. We demonstrate its effectiveness at finding
known anomalies, and discuss the effect of sample size and number of centroids
on our results. We compare our method to naive solutions and existing time
series anomaly detection methods for unphased data, and show that PCAD's
reported anomalies are comparable to or better than all other methods. Finally,
astrophysicists on our team have verified that PCAD finds true anomalies that
might be indicative of novel astrophysical phenomena.
[44]
oai:arXiv.org:0812.1010 [pdf] - 19177
De-Trending Time Series for Astronomical Variability Surveys
Submitted: 2008-12-04, last modified: 2009-04-13
We present a de-trending algorithm for the removal of trends in time series.
Trends in time series could be caused by various systematic and random noise
sources such as cloud passages, changes of airmass, telescope vibration or CCD
noise. Those trends undermine the intrinsic signals of stars and should be
removed. We determine the trends from subsets of stars that are highly
correlated among themselves. These subsets are selected based on a hierarchical
tree clustering algorithm. A bottom-up merging algorithm based on the departure
from normal distribution in the correlation is developed to identify subsets,
which we call clusters. After identification of clusters, we determine a trend
per cluster by a weighted sum of normalized light-curves. We then use quadratic
programming to de-trend all individual light-curves based on these determined
trends. Experimental results with synthetic light-curves containing artificial
trends and events are presented. Results from other de-trending methods are
also compared. The developed algorithm can be applied to time series for trend
removal in both narrow and wide field astronomy.
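The last two steps of this abstract (a per-cluster trend built from normalized light-curves, then removal of that trend from an individual light-curve) can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster is taken as given, the weights are hypothetical, the weighted sum is divided by the total weight, and an ordinary least-squares scale factor stands in for the quadratic programming step used in the paper. The function names are illustrative only.

```python
import math

def normalize(curve):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(curve) / len(curve)
    std = math.sqrt(sum((v - mean) ** 2 for v in curve) / len(curve))
    return [(v - mean) / std for v in curve]

def cluster_trend(curves, weights):
    """Weighted average of the normalized light-curves in one cluster."""
    total = sum(weights)
    normed = [normalize(c) for c in curves]
    length = len(curves[0])
    return [sum(w * c[i] for w, c in zip(weights, normed)) / total
            for i in range(length)]

def detrend(curve, trend):
    """Subtract the best-fitting multiple of the trend (least squares, not QP)."""
    mean = sum(curve) / len(curve)
    scale = (sum((v - mean) * t for v, t in zip(curve, trend))
             / sum(t * t for t in trend))
    return [v - scale * t for v, t in zip(curve, trend)]

# Synthetic cluster: two stars sharing the same linear drift.
ramp = [0.1 * i for i in range(20)]
cluster = [[3.0 + r for r in ramp], [7.0 + 2.0 * r for r in ramp]]
trend = cluster_trend(cluster, weights=[1.0, 1.0])

# A third star affected by the same drift; detrending should flatten it.
target = [10.0 + 0.5 * r for r in ramp]
cleaned = detrend(target, trend)
```

In this noise-free example the cleaned light-curve is constant at the mean of the original. With real data, an unconstrained fit like this one can also absorb part of a genuine signal.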
[45]
oai:arXiv.org:0904.0645 [pdf] - 315790
A Bayesian approach to the analysis of time symmetry in light curves:
Reconsidering Scorpius X-1 occultations
Submitted: 2009-04-04
We present a new approach to the analysis of time symmetry in light curves,
such as those in the X-ray at the center of the Scorpius X-1 occultation
debate. Our method uses a new parameterization for such events (the bilogistic
event profile) and provides a clear, physically relevant characterization of
each event's key features. We also demonstrate a Markov Chain Monte Carlo
algorithm to carry out this analysis, including a novel independence chain
configuration for the estimation of each event's location in the light curve.
These tools are applied to the Scorpius X-1 light curves presented in Chang et
al. (2007), providing additional evidence based on the time series that the
events detected thus far are most likely not occultations by TNOs.
[46]
oai:arXiv.org:0903.3036 [pdf] - 430442
A Search for Occultations of Bright Stars by Small Kuiper Belt Objects
using Megacam on the MMT
Submitted: 2009-03-18, last modified: 2009-03-20
We conducted a search for occultations of bright stars by Kuiper Belt Objects
(KBOs) to estimate the density of sub-km KBOs in the sky. We report here the
first results of this occultation survey of the outer solar system conducted in
June 2007 and June/July 2008 at the MMT Observatory using Megacam, the large
MMT optical imager. We used Megacam in a novel shutterless continuous-readout
mode to achieve high precision photometry at 200 Hz. We present an analysis of
220 star hours at signal-to-noise ratio of 25 or greater. The survey efficiency
is greater than 10% for occultations by KBOs of diameter d>=0.7 km, and we
report no detections in our dataset. We set a new 95% confidence level upper
limit for the surface density \Sigma_N(d) of KBOs larger than 1 km:
\Sigma_N(d>=1 km) <= 2.0e8 deg^-2, and for KBOs larger than 0.7 km \Sigma_N(d>=
0.7 km) <= 4.8e8 deg^-2.
[47]
oai:arXiv.org:0902.1160 [pdf] - 250696
Reverberation in the UV-Optical Continuum Brightness Fluctuations of
MACHO Quasar 13.5962.237
Submitted: 2009-02-06
We examine the nature of brightness fluctuations in the UV-Optical spectral
region of an ordinary quasar with 881 optical brightness measurements made
during the epoch 1993 - 1999. We find evidence for systematic trends having the
character of a pattern of reverberations following an initial disturbance. The
initial pulses have brightness increases of order 20% and pulse widths of 50
days, and the reverberations have typical amplitudes of 12% with longer mean
pulse widths of order 80 days and pulse separations of order 90 days. The
repeat pattern occurs over the same time scales whether the initial disturbance
is a brightening or fading. The lags of the pulse trains are comparable to the
lags seen previously in reverberation of the broad blue-shifted emission lines
following brightness disturbances in Seyfert galaxies, when allowance is made
for the mass of the central object. In addition to the burst pulse trains, we
find evidence for a semi-periodicity with a time scale of 2 years. These strong
patterns of brightness fluctuations suggest a method of discovering quasars
from photometric monitoring alone, with data of the quality expected from large
brightness monitoring programs like Pan-STARRS and LSST.
[48]
oai:arXiv.org:0901.3329 [pdf] - 20559
Event Discovery in Time Series
Submitted: 2009-01-21
The discovery of events in time series can have important implications, such
as identifying microlensing events in astronomical surveys, or changes in a
patient's electrocardiogram. Current methods for identifying events require a
sliding window of a fixed size, which is not ideal for all applications and
could overlook important events. In this work, we develop probability models
for calculating the significance of an arbitrary-sized sliding window and use
these probabilities to find areas of significance. Because a brute force search
of all sliding windows and all window sizes would be computationally
intractable, we introduce a method for quickly approximating the results. We
apply our method to over 100,000 astronomical time series from the MACHO
survey, in which 56 different sections of the sky are considered, each with one
or more known events. Our method was able to recover 100% of these events in
the top 1% of the results, essentially pruning 99% of the data. Interestingly,
our method was able to identify events that do not pass traditional event
discovery procedures.
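For orientation, the brute-force baseline that this abstract calls intractable at scale can be written in a few lines. The sketch below is not the paper's probability model or its fast approximation: it assumes independent Gaussian noise with a known standard deviation `sigma`, scores each window by the z-score of its sum, and uses hypothetical names throughout.

```python
import math

def best_window(series, sigma, max_size):
    """Exhaustively score every window start and size; return (z, start, size)."""
    # Prefix sums make each window sum an O(1) lookup.
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)
    best = None
    for size in range(1, max_size + 1):
        for start in range(len(series) - size + 1):
            total = prefix[start + size] - prefix[start]
            # Under the assumed noise model, a window sum has std sigma * sqrt(size).
            z = total / (sigma * math.sqrt(size))
            if best is None or z > best[0]:
                best = (z, start, size)
    return best

# Synthetic residual series: flat baseline with a 6-point event starting at index 20.
series = [0.0] * 50
for i in range(20, 26):
    series[i] = 1.0

z, start, size = best_window(series, sigma=1.0, max_size=15)
```

The double loop visits every start position for every window size, so the cost grows with the product of series length and maximum window size for each series. That cost is what motivates an approximation when searching over 100,000 series.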
[49]
oai:arXiv.org:0808.2051 [pdf] - 15406
First Results From The Taiwanese-American Occultation Survey (TAOS)
Zhang, Z. -W.;
Bianco, F. B.;
Lehner, M. J.;
Coehlo, N. K.;
Wang, J. -H.;
Mondal, S.;
Alcock, C.;
Axelrod, T.;
Byun, Y. -I.;
Chen, W. -P.;
Cook, K. H.;
Dave, R.;
de Pater, I.;
Porrata, R.;
Kim, D. -W.;
King, S. -K.;
Lee, T.;
Lin, H. -C.;
Lissauer, J. J.;
Marshall, S. L.;
Protopapas, P.;
Rice, J. A.;
Schwamb, M. E.;
Wang, S. -Y.;
Wen, C. -Y.
Submitted: 2008-08-14
Results from the first two years of data from the Taiwanese-American
Occultation Survey (TAOS) are presented. Stars have been monitored
photometrically at 4 Hz or 5 Hz to search for occultations by small (~3 km)
Kuiper Belt Objects (KBOs). No statistically significant events were found,
allowing us to present an upper bound to the size distribution of KBOs with
diameters 0.5 km < D < 28 km.
[50]
oai:arXiv.org:0711.1617 [pdf] - 253851
Eclipsing binary stars in the Large and Small Magellanic Clouds from the
MACHO project: The Sample
Submitted: 2007-11-10
We present a new sample of 4634 eclipsing binary stars in the Large
Magellanic Cloud (LMC), expanding on a previous sample of 611 objects and a new
sample of 1509 eclipsing binary stars in the Small Magellanic Cloud (SMC), that
were identified in the light curve database of the MACHO project. We perform a
cross correlation with the OGLE-II LMC sample, finding 1236 matches. A cross
correlation with the OGLE-II SMC sample finds 698 matches. We then compare the
LMC subsamples corresponding to the center and the periphery of the LMC and find
only minor differences between the two populations. These samples are
sufficiently large and complete that statistical studies of the binary star
populations are possible.
[51]
oai:arXiv.org:astro-ph/0509103 [pdf] - 75685
Combined reconstruction of weak and strong lensing data with WSLAP
Submitted: 2005-09-05
We describe a method to estimate the mass distribution of a gravitational
lens and the position of the sources from combined strong and weak lensing
data. The algorithm combines weak and strong lensing data in a unified way
producing a solution which is valid in both the weak and strong lensing
regimes. We study how the result depends on the relative weighting of the weak
and strong lensing data and on the choice of basis to represent the mass
distribution. We find that combining weak and strong lensing information has
two major advantages: it eliminates the need for priors and/or regularization
schemes for the intrinsic size of the background galaxies (this assumption was
needed in previous strong lensing algorithms) and it corrects for biases in the
recovered mass in the outer regions where the strong lensing data is less
sensitive. The code is implemented into a software package called WSLAP (Weak &
Strong Lensing Analysis Package) which is publicly available at
http://darwin.cfa.harvard.edu/SLAP/
[52]
oai:arXiv.org:astro-ph/0505495 [pdf] - 73265
Finding outlier light-curves in catalogs of periodic variable stars
Submitted: 2005-05-24
We present a methodology to discover outliers in catalogs of periodic
light-curves. We use cross-correlation as a measure of ``similarity'' between two
individual light-curves and then classify light-curves with lowest average
``similarity'' as outliers. We performed the analysis on catalogs of variable
stars of known type from the MACHO and OGLE projects and established that our
method correctly identifies light-curves that do not belong to those catalogs
as outliers. We show how our method can scale to large datasets that will be
available in the near future such as those anticipated from Pan-STARRS and
LSST.
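The outlier criterion described here (lowest average cross-correlation with the rest of the catalog) can be sketched in a few lines. The sketch makes assumptions the abstract does not state: the light-curves are already phase-folded and sampled on a common grid, similarity is taken as the maximum circular cross-correlation over all phase shifts, and the function names are hypothetical. The pairwise loop is quadratic in catalog size and is for illustration only, not the scalable variant the abstract refers to.

```python
import math

def normalize(curve):
    """Zero mean and unit norm, so that correlations lie in [-1, 1]."""
    mean = sum(curve) / len(curve)
    centered = [v - mean for v in curve]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def similarity(a, b):
    """Maximum circular cross-correlation over all relative phase shifts."""
    n = len(a)
    return max(sum(a[i] * b[(i + shift) % n] for i in range(n))
               for shift in range(n))

def rank_outliers(curves):
    """Return (index, average similarity) pairs, least similar first."""
    normed = [normalize(c) for c in curves]
    ranked = []
    for i, a in enumerate(normed):
        others = [similarity(a, b) for j, b in enumerate(normed) if j != i]
        ranked.append((i, sum(others) / len(others)))
    return sorted(ranked, key=lambda item: item[1])

n = 32

def sinusoid(shift):
    return [math.sin(2 * math.pi * (i + shift) / n) for i in range(n)]

# Three sinusoidal curves with different phases and one eclipse-like curve.
curves = [sinusoid(0), sinusoid(5), sinusoid(11), [1.0] * 28 + [0.0] * 4]
ranking = rank_outliers(curves)
```

Taking the maximum over shifts makes the score insensitive to the arbitrary phase zero-point of each folded curve, so the three sinusoids match each other almost perfectly and the eclipse-like curve ranks first as the outlier.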
[53]
oai:arXiv.org:astro-ph/0408418 [pdf] - 66909
Non-parametric inversion of strong lensing systems
Submitted: 2004-08-24, last modified: 2005-03-14
We revisit the issue of non-parametric gravitational lens reconstruction and
present a new method to obtain the cluster mass distribution using strong
lensing data without using any prior information on the underlying mass. The
method relies on the decomposition of the lens plane into individual cells. We
show how the problem in this approximation can be expressed as a system of
linear equations for which a solution can be found. Moreover, we propose to
include information about the null space, that is, to make use of the pixels where
we know there are no arcs above the sky noise. The only prior information is an
estimation of the physical size of the sources. No priors on the luminosity of
the cluster or shape of the halos are needed thus making the results very
robust. In order to test the accuracy and bias of the method we make use of
simulated strong lensing data. We find that the method accurately reproduces
both the lens mass and source positions, and we provide error estimates.
[54]
oai:arXiv.org:astro-ph/0502301 [pdf] - 71092
Fast identification of transits from light-curves
Submitted: 2005-02-15
We present an algorithm that allows fast and efficient detection of transits,
including planetary transits, from light-curves. The method is based on
building an ensemble of fiducial models and compressing the data using the
MOPED algorithm. We describe the method and demonstrate its efficiency by
finding planet-like transits in simulated Pan-STARRS light-curves. We show
that our method is independent of the size of the search space of transit
parameters. In large sets of light-curves, we achieve speed-up factors of order
$10^{8}$ over the full $\chi^2$ search. We discuss how the algorithm
can be used in forthcoming large surveys like Pan-STARRS and LSST and how it
may be optimized for future space missions like Kepler and COROT where most of
the processing must be done on board.
[55]
oai:arXiv.org:astro-ph/0412191 [pdf] - 69595
Non-parametric mass reconstruction of A1689 from strong lensing data
with SLAP
Submitted: 2004-12-08
We present the mass distribution in the central area of the cluster A1689 by
fitting over 100 multiply lensed images with the non-parametric Strong Lensing
Analysis Package (SLAP, Diego et al. 2004). The surface mass distribution is
obtained in a robust way finding a total mass of 0.25E15 M_sun/h within a 70''
circle radius from the central peak. Our reconstructed density profile fits
well an NFW profile with small perturbations due to substructure and is
compatible with the more model dependent analysis of Broadhurst et al. (2004a)
based on the same data. Our estimated mass does not rely on any prior
information about the distribution of dark matter in the cluster. The peak of
the mass distribution falls very close to the central cD and there is
substructure near the center suggesting that the cluster is not fully relaxed.
We also examine the effect on the recovered mass when we include the
uncertainties in the redshift of the sources and in the original shape of the
sources. Using simulations designed to mimic the data, we identify some biases
in our reconstructed mass distribution. We find that the recovered mass is
biased toward lower masses beyond 1 arcmin (150 kpc) from the central cD and
that in the very center we may be affected by degeneracy problems. On the other
hand, we confirm that the reconstructed mass between 25'' and 70'' is a robust,
unbiased estimate of the true mass distribution and is compatible with an NFW
profile.