Full-text search for arXiv

55 article(s) in total. 182 co-authors, from 1 to 17 common article(s). Median position in authors list is 3.0.

[1] oai:arXiv.org:2002.00994 [pdf] - 2046524

Scalable End-to-end Recurrent Neural Network for Variable star classification

Becker, Ignacio; Pichara, Karim; Catelan, Márcio; Protopapas, Pavlos; Aguirre, Carlos; Nikzat, Fatemeh

Comments: 15 pages, 17 figures. To be published in MNRAS

Submitted: 2020-02-03

During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine learning techniques. Traditionally, light curves are represented as a vector of descriptors or features used as input for many algorithms. Some features are computationally expensive and cannot be updated quickly, and hence cannot be applied to large datasets such as those from LSST. Previous work has developed alternative unsupervised feature extraction algorithms for light curves, but their cost remains high. In this work, we propose an end-to-end algorithm that automatically learns a representation of light curves that allows accurate automatic classification. We study a series of deep learning architectures based on Recurrent Neural Networks and test them in automated classification scenarios. Our method uses minimal data preprocessing, can be updated at low computational cost for new observations and light curves, and can scale up to massive datasets. We transform each light curve into an input matrix whose elements are the differences in time and magnitude; the outputs are classification probabilities. We test our method on three surveys: OGLE-III, Gaia and WISE. We obtain accuracies of about $95\%$ for the main classes and $75\%$ for the majority of subclasses. We compare our results with the Random Forest classifier and obtain competitive accuracies while being faster and scalable. The analysis shows that the computational complexity of our approach grows linearly with the light curve size, while the cost of the traditional approach grows as $N\log{(N)}$.
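The input representation described in this abstract, differences in time and magnitude between consecutive observations, can be sketched in a few lines of NumPy. This is an illustrative helper, not the authors' code: the name `delta_matrix` and the sliding-window layout (each row holds `window` consecutive time differences followed by the matching magnitude differences, with rows starting every `stride` observations) are assumptions made for this example.

```python
import numpy as np


def delta_matrix(times, mags, window=4, stride=2):
    """Hypothetical helper: build an RNN input matrix from (dt, dm) differences.

    Each row concatenates `window` consecutive time differences followed by the
    matching `window` magnitude differences; a new row starts every `stride` steps.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    dt = np.diff(times)  # time elapsed between consecutive observations
    dm = np.diff(mags)   # magnitude change between consecutive observations
    n = len(dt)
    if n < window:
        raise ValueError("light curve too short for the chosen window")
    rows = [
        np.concatenate([dt[start:start + window], dm[start:start + window]])
        for start in range(0, n - window + 1, stride)
    ]
    return np.vstack(rows)


if __name__ == "__main__":
    t = [0, 1, 3, 6, 10, 15]
    m = [10.0, 11.0, 10.0, 12.0, 11.0, 13.0]
    print(delta_matrix(t, m, window=2, stride=1).shape)  # (4, 4)
```

Only consecutive differences are needed, so the transform costs O(N) per light curve, and a new observation contributes one more (Δt, Δm) pair instead of forcing every feature to be recomputed.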

[2] oai:arXiv.org:1912.02235 [pdf] - 2026504

Streaming Classification of Variable Stars

Zorich, Lukas; Pichara, Karim; Protopapas, Pavlos

Comments:

Submitted: 2019-12-04

In recent years, automatic classification of variable stars has received substantial attention, and machine learning techniques have proven quite useful for this task. Typically, the machine learning classifiers used for this task require a fixed training set, and training is performed offline. Upcoming surveys such as the Large Synoptic Survey Telescope (LSST) will generate new observations daily, so an automatic classification system able to create alerts online will be mandatory. Such a system must be able to update itself incrementally. Unfortunately, after training, most machine learning classifiers do not support the inclusion of new observations in light curves; they must be retrained from scratch. Naively retraining from scratch is not an option in streaming settings, mainly because of the expensive pre-processing routines required to obtain a vector representation of light curves (features) each time new observations are included. In this work, we propose a streaming probabilistic classification model that uses a set of newly designed features that can be computed incrementally. With this model, we obtain a machine learning classifier that updates itself in real time with new observations. To test our approach, we simulate a streaming scenario with light curves from the CoRoT, OGLE and MACHO catalogs. Results show that our model achieves high classification performance while being an order of magnitude faster than traditional classification approaches.
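The property this abstract relies on is that features can be updated as each observation arrives. As a hedged illustration only (these are not the paper's newly designed features), Welford's algorithm maintains the mean and variance of a light curve's magnitudes in O(1) time per new point, so nothing is recomputed from scratch. The class name `RunningMoments` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Welford's algorithm: mean and variance updated one observation at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new magnitude into the statistics in constant time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; undefined (NaN) for fewer than two observations."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)
```

In a streaming classifier, each incoming observation would call `update`, and the refreshed statistics would be passed to a model that can also be updated incrementally.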

[3] oai:arXiv.org:1911.02444 [pdf] - 2026263

An Information Theory Approach on Deciding Spectroscopic Follow Ups

Astudillo, Javiera; Protopapas, Pavlos; Pichara, Karim; Huijse, Pablo

Comments:

Submitted: 2019-11-06

Classification and characterization of variable and transient phenomena are critical for astrophysics and cosmology. These objects are commonly studied using photometric time series or spectroscopic data. Given that many ongoing and future surveys operate in the time domain, and that adding spectra provides further insight but requires more observational resources, it would be valuable to know which objects should be prioritized for spectroscopy in addition to their time series. We propose a methodology in a probabilistic setting that determines a priori which objects are worth observing spectroscopically to obtain better insight, where we define 'insight' as the type of the object (classification). Objects whose spectra we query are reclassified using their full spectral information. We first train two classifiers, one that uses photometric data and another that uses photometric and spectroscopic data together. Then, for each photometric object, we estimate the probability of each possible spectrum outcome. We combine these models in various probabilistic frameworks (strategies) which are used to guide the selection of follow-up observations. The best strategy depends on the intended use, whether that is gaining more confidence or more accuracy. For a given number of candidate objects for spectroscopy (127, equal to 5% of the dataset), we improve class prediction accuracy by 37%, as opposed to 20% for the best non-naive (non-random) baseline strategy. Our approach provides a general framework for follow-up strategies and can be extended beyond classification and to other forms of follow-up beyond spectroscopy.
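One generic way to turn "estimate the probability of each possible spectrum outcome" into a ranking is expected information gain: the entropy of the photometry-only class posterior minus the expected entropy after the spectrum is observed. The sketch below assumes both classifiers' outputs are already available as arrays with the shapes given in the docstring; it is a standard information-theoretic selection rule and is not claimed to be any specific strategy from the paper. Function names are hypothetical.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def expected_information_gain(prior, outcome_probs, posteriors):
    """
    prior:         (n_objects, n_classes)  class probabilities from photometry only
    outcome_probs: (n_objects, n_outcomes) predicted probability of each spectrum outcome
    posteriors:    (n_objects, n_outcomes, n_classes) class probabilities from
                   photometry plus each hypothetical spectrum outcome
    """
    expected_posterior_entropy = np.sum(
        np.asarray(outcome_probs, float) * entropy(posteriors), axis=1
    )
    return entropy(prior) - expected_posterior_entropy


def select_for_followup(prior, outcome_probs, posteriors, budget):
    """Indices of the `budget` objects with the largest expected information gain."""
    gain = expected_information_gain(prior, outcome_probs, posteriors)
    return np.argsort(gain)[::-1][:budget]
```

In the abstract's setting, `budget` would be 127 (5% of the dataset); objects whose spectrum is expected to resolve an uncertain photometric classification rank first.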

[4] oai:arXiv.org:1903.03254 [pdf] - 1846983

An Algorithm for the Visualization of Relevant Patterns in Astronomical Light Curves

Pieringer, Christian; Pichara, Karim; Catelán, Márcio; Protopapas, Pavlos

Comments: Accepted 2019 January 8. Received 2019 January 8; in original form 2018 January 29. 7 pages, 6 figures

Submitted: 2019-03-07

In recent years, the classification of variable stars with Machine Learning has become a mainstream area of research. Recently, visualization of time series has been attracting more attention in data science as a tool to help scientists visually recognize significant patterns in complex dynamics. Within the Machine Learning literature, dictionary-based methods have been widely used to encode relevant parts of image data. These methods intrinsically assign a degree of importance to patches in pictures, according to their contribution to the image reconstruction. Inspired by dictionary-based techniques, we present an approach that naturally provides the visualization of salient parts in astronomical light curves, making the analogy between image patches and relevant pieces of time series. Our approach encodes the most meaningful patterns such that we can approximately reconstruct light curves by using just the encoded information. We test our method on light curves from the OGLE-III and StarLight databases. Our results show that the proposed model delivers an automatic and intuitive visualization of relevant light curve parts, such as local peaks and drops in magnitude.
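The patch analogy can be made concrete with a small sketch. Under stated assumptions, it is not the authors' model: the dictionary here is simply the top principal directions of all overlapping windows (computed via SVD), a linear stand-in for the sparse dictionary learning the abstract alludes to, and a patch's saliency is the norm of its code. Function names are hypothetical.

```python
import numpy as np


def patches(series, width):
    """All overlapping windows ("patches") of a 1-D light curve."""
    series = np.asarray(series, float)
    return np.array([series[i:i + width] for i in range(len(series) - width + 1)])


def patch_saliency(series, width, n_atoms):
    """Score each patch by the energy of its code in a learned linear dictionary.

    The dictionary is the top principal directions of the patches (via SVD),
    a simple stand-in for sparse dictionary learning.
    """
    P = patches(series, width)
    mean = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - mean, full_matrices=False)
    atoms = vt[:n_atoms]                      # (n_atoms, width) dictionary
    codes = (P - mean) @ atoms.T              # coefficients of each patch
    reconstruction = codes @ atoms + mean     # patches rebuilt from the codes alone
    saliency = np.linalg.norm(codes, axis=1)  # large norm: patch departs from the typical patch
    return saliency, reconstruction
```

Plotting `saliency` against each patch's start time highlights local peaks and dips, which is the kind of visualization the abstract describes.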

[5] oai:arXiv.org:1807.03869 [pdf] - 1957097

Deep Learning for Image Sequence Classification of Astronomical Events

Carrasco-Davis, Rodrigo; Cabrera-Vives, Guillermo; Förster, Francisco; Estévez, Pablo A.; Huijse, Pablo; Protopapas, Pavlos; Reyes, Ignacio; Martínez-Palomera, Jorge; Donoso, Cristóbal

Comments: 20 pages, 20 figures (corrected compilation errors). This is an Accepted Manuscript version of an article accepted for publication in Publications of the Astronomical Society of the Pacific. Neither the Astronomical Society of the Pacific nor IOP Publishing Ltd is responsible for any errors or omissions in this version of the manuscript or any version derived from it

Submitted: 2018-07-10, last modified: 2018-11-07

We propose a new sequential classification model for astronomical objects based on a recurrent convolutional neural network (RCNN) which uses sequences of images as inputs. This approach avoids the computation of light curves or difference images. This is the first time that sequences of images have been used directly for the classification of variable objects in astronomy. The second contribution of this work is the image simulation process. We generate synthetic image sequences that take into account the instrumental and observing conditions, obtaining a realistic set of movies for each astronomical object. The simulated dataset is used to train our RCNN classifier. This approach allows us to generate datasets to train and test our RCNN model for different astronomical surveys and telescopes. We aim to build a simulated dataset whose distribution is close enough to the real dataset that fine-tuning can match the distributions of the real and simulated datasets. To test the RCNN classifier trained with the synthetic dataset, we used real-world data from the High cadence Transient Survey (HiTS), obtaining an average recall of 85%, which improved to 94% after fine-tuning with 10 real samples per class. We compare the results of our model with those of a light curve random forest classifier. The proposed RCNN with fine-tuning performs similarly on the HiTS dataset to the light curve classifier trained on an augmented training set with 10 real samples per class. The RCNN approach presents several advantages in an alert stream classification scenario, such as reduced data pre-processing, faster online evaluation and easier performance improvement using a few real data samples. These results encourage us to use this method in alert broker systems that will process alert streams generated by new telescopes such as the Large Synoptic Survey Telescope.

[6] oai:arXiv.org:1810.07857 [pdf] - 1953345

Multiband galaxy morphologies for CLASH: a convolutional neural network transferred from CANDELS

Pérez-Carrasco, Manuel; Cabrera-Vives, Guillermo; Martinez-Marín, Monserrat; Cerulo, Pierluigi; Demarco, Ricardo; Protopapas, Pavlos; Godoy, Julio; Huertas-Company, Marc

Comments: 11 pages, 11 figures, submitted to Publications of the Astronomical Society of the Pacific

Submitted: 2018-10-17

We present visual-like morphologies over 16 photometric bands, from ultraviolet to near-infrared, for 8,412 galaxies in the Cluster Lensing And Supernova survey with Hubble (CLASH), obtained with a convolutional neural network (CNN) model. Our model follows the CANDELS main morphological classification scheme, obtaining the probability for each galaxy at each CLASH band of being spheroid, disk, irregular, point source, or unclassifiable. Our catalog contains morphologies for each galaxy with Hmag < 24.5 in every filter where the galaxy is observed. We trained an initial CNN model using approximately 7,500 expert eyeball labels from the Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS). We created eyeball labels for 100 randomly selected galaxies in each of the 16 CLASH filters (1,600 galaxy images in total), where each image was classified by at least five of us. We use these labels to fine-tune the network in order to accurately predict labels for the CLASH data and to evaluate the performance of our model. We achieve a root-mean-square error of 0.0991 on the test set. We show that our proposed fine-tuning technique reduces the number of labeled images needed for training, as compared to training directly on the CLASH data, and achieves better performance. This approach is very useful for minimizing eyeball labeling efforts when classifying unlabeled data from new surveys. This will become particularly useful for massive datasets such as those coming from near-future surveys such as EUCLID or the LSST. Our catalog consists of the predicted morphology probabilities for each galaxy in each of its bands and is made publicly available at http://www.inf.udec.cl/~guille/data/Deep-CLASH.csv.

[7] oai:arXiv.org:1809.00763 [pdf] - 1767631

The High Cadence Transient Survey (HITS): Compilation and characterization of light-curve catalogs

Martínez-Palomera, Jorge; Förster, Francisco; Protopapas, Pavlos; Maureira, Juan Carlos; Lira, Paulina; Cabrera-Vives, Guillermo; Huijse, Pablo; Galbany, Lluis; de Jaeger, Thomas; González-Gaitán, Santiago; Medina, Gustavo; Pignata, Giuliano; Martín, Jaime San; Hamuy, Mario; Muñoz, Ricardo R.

Comments: 22 pages including 10 figures and 9 tables. Accepted for publication in AJ. For associated files, see http://astro.cmm.uchile.cl/HiTS/

Submitted: 2018-09-03, last modified: 2018-09-07

The High Cadence Transient Survey (HiTS) aims to discover and study transient objects with characteristic timescales between hours and days, such as pulsating, eclipsing and exploding stars. This survey represents a unique laboratory to explore large-etendue observations at cadences of about 0.1 days and to test new computational tools for the analysis of large data. This work follows a fully \textit{Data Science} approach: from the raw data to the analysis and classification of variable sources. We compile a catalog of ${\sim}15$ million object detections and a catalog of ${\sim}2.5$ million light curves classified by variability. The typical depth of the survey is $24.2$, $24.3$, $24.1$ and $23.8$ in the $u$, $g$, $r$ and $i$ bands, respectively. We classified all point-like, non-moving sources by first extracting features from their light curves and then applying a Random Forest classifier. For the classification, we used a training set constructed using a combination of cross-matched catalogs, visual inspection, transfer/active learning and data augmentation. The classification model consists of several Random Forest classifiers organized in a hierarchical scheme. The classifier accuracy estimated on a test set is approximately $97\%$. In the unlabeled data, $3\,485$ sources were classified as variables, of which $1\,321$ were classified as periodic. Among the periodic classes, we discovered with high confidence 1 $\delta$ Scuti star, 39 eclipsing binaries, 48 rotational variables and 90 RR Lyrae stars; among the non-periodic classes, we discovered 1 cataclysmic variable, 630 QSOs, and 1 supernova candidate. The first data release can be accessed in the project archive of HiTS.

[8] oai:arXiv.org:1803.10779 [pdf] - 1743672

Unraveling the Spectral Energy Distributions of Clustered YSOs

Martínez-Galarza, Juan R.; Protopapas, Pavlos; Smith, Howard A.; Morales, Esteban F. E.

Comments: 31 pages, 11 figures. Submitted to ApJ. Comments are welcome

Submitted: 2018-03-28

Stars form in clustered environments, but how they form when the available resources are shared is still not well understood. A related question is whether the initial mass function (IMF) is in fact universal across galactic environments (an integrated galactic initial mass function, IGIMF), or whether it is an average of local IMFs. One of the long-standing problems in resolving this question and in the study of young clusters is observational: the emission from multiple sources is frequently blended, because beam sizes differ between wavelengths and between telescopes. This confusion hinders our ability to fully characterize clustered star formation. Here we present a new method that uses a genetic algorithm and Bayesian inference to fit the blended SEDs and images of individual YSOs in confused clusters. We apply this method to the infrared photometry of a sample of 70 Spitzer-selected, low-mass ($M_{\rm{cl}}<100~\rm{M}_{\odot}$) young clusters in the galactic plane, and use the derived physical parameters to investigate the distributions of masses and evolutionary stages of clustered YSOs, and the implications of those distributions for studies of the IMF and for the different models of star formation. We find that for low-mass clusters composed of class I and class II YSOs, there exists a non-trivial relationship between the total stellar mass of the cluster ($M_{\rm{cl}}$) and the mass of its most massive member ($m_{\rm{max}}$). The properties of the derived correlation are most compatible with random sampling of a Kroupa IMF with a fundamental high-mass limit of $150~\rm{M}_{\odot}$. Our results are also compatible with SPH models that predict a dynamical termination of accretion in protostars, with massive stars reaching this stopping point later in their evolution.

[9] oai:arXiv.org:1801.09732 [pdf] - 1626565

Uncertain classification of Variable Stars: handling observational GAPS and noise

Castro, Nicolas; Protopapas, Pavlos; Pichara, Karim

Comments:

Submitted: 2018-01-29

Automatic classification methods applied to sky surveys have revolutionized the astronomical target selection process. Most surveys generate a vast amount of time series, or "lightcurves", that represent the brightness variability of stellar objects in time. Unfortunately, lightcurve observations take several years to complete, producing truncated time series that generally remain unclassified by automatic classifiers until they are finished. This happens because state-of-the-art methods rely on a variety of statistical descriptors or features whose dispersion increases as the number of observations decreases, which reduces their precision. In this paper we propose a novel method that increases the performance of automatic classifiers of variable stars by incorporating the deviations that the scarcity of observations produces. Our method uses Gaussian Process Regression to form a probabilistic model of each lightcurve's observations. Then, based on this model, bootstrapped samples of the time series features are generated. Finally, a bagging approach is used to improve the overall performance of the classification. We perform tests on the MACHO and OGLE catalogs; the results show that our method effectively classifies some variability classes using a small fraction of the original observations. For example, we found that RR Lyrae stars can be classified with around 80\% accuracy by observing only the first 5\% of the whole lightcurves' observations in the MACHO and OGLE catalogs. We believe these results prove that, when studying lightcurves, it is important to consider the features' error and how the measurement process impacts it.
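The Gaussian-process step can be sketched with plain NumPy. Under stated assumptions (a squared-exponential kernel with hand-picked hyperparameters, which in practice would be fitted to each lightcurve; all function names hypothetical), the code draws plausible complete lightcurves from the GP posterior and evaluates a feature on each, yielding a distribution of feature values instead of a single point estimate.

```python
import numpy as np


def rbf_kernel(a, b, length, amp):
    """Squared-exponential covariance between two sets of times."""
    return amp ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(t_obs, y_obs, t_new, length=1.0, amp=1.0, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP fitted to centred data."""
    t_obs = np.asarray(t_obs, float)
    t_new = np.asarray(t_new, float)
    y_obs = np.asarray(y_obs, float)
    offset = y_obs.mean()  # the GP models deviations from the mean magnitude
    K = rbf_kernel(t_obs, t_obs, length, amp) + noise ** 2 * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs, length, amp)
    K_ss = rbf_kernel(t_new, t_new, length, amp)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - offset))
    mean = K_s @ alpha + offset
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov


def bootstrap_feature(t_obs, y_obs, t_new, feature, n_samples=100, seed=0, **gp_kwargs):
    """Draw GP posterior lightcurves and evaluate `feature` on each one."""
    mean, cov = gp_posterior(t_obs, y_obs, t_new, **gp_kwargs)
    jitter = 1e-6 * np.eye(len(mean))  # keeps the covariance numerically positive definite
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov + jitter, size=n_samples)
    return np.array([feature(d) for d in draws])
```

Each bootstrap replica of the feature vector could then train one member of a bagged ensemble, which is how the abstract combines the GP model with bagging.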

[10] oai:arXiv.org:1801.09737 [pdf] - 1626567

Automatic Survey-Invariant Variable Star Classification

Benavente, Patricio; Protopapas, Pavlos; Pichara, Karim

Comments:

Submitted: 2018-01-29

Machine learning techniques have been successfully used to classify variable stars in widely studied astronomical surveys. These datasets have been available to astronomers long enough to allow deep analysis of many variable sources and the generation of useful catalogs of identified variable stars. The products of these studies are labeled data that enable supervised learning models to be trained successfully. However, when these models are blindly applied to data from new sky surveys, their performance drops significantly. Furthermore, unlabeled data become available at a much higher rate than their labeled counterpart, since labeling is a manual and time-consuming effort. Domain adaptation techniques aim to learn from a domain where labeled data are available, the \textit{source domain}, and through some adaptation perform well on a different domain, the \textit{target domain}. We propose a fully probabilistic model that represents the joint distribution of features from two surveys, as well as a probabilistic transformation of the features from one survey to the other. This allows us to transfer labeled data to a study where they are not available and to effectively run a variable star classification model on a new survey. Our model represents the features of each domain as a Gaussian mixture and models the transformation as a translation, rotation and scaling of each separate component. We perform tests using three different variability catalogs, EROS, MACHO, and HiTS, which differ in aspects such as the number of observations per star, cadence, observational time and optical bands observed.
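The per-component transformation mentioned above (translation, rotation and scaling) has a simple closed form for a Gaussian: if $x \sim \mathcal{N}(\mu, \Sigma)$ and $x' = sRx + t$, then $x' \sim \mathcal{N}(sR\mu + t,\ s^2 R \Sigma R^\top)$. The 2-D sketch below applies that map; it assumes the scale, rotation and shift are already known, whereas the paper learns them probabilistically. A full treatment would also weight each component's map by a point's posterior responsibility. Names are hypothetical.

```python
import numpy as np


def rotation_2d(theta):
    """2x2 rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def transform_component(mean, cov, scale, rotation, shift):
    """Map a Gaussian component N(mean, cov) through x -> scale * R x + shift."""
    mean = np.asarray(mean, float)
    cov = np.asarray(cov, float)
    new_mean = scale * rotation @ mean + shift
    new_cov = scale ** 2 * rotation @ cov @ rotation.T
    return new_mean, new_cov


def transform_points(x, scale, rotation, shift):
    """Apply the same affine map to feature vectors stored as rows of x."""
    return scale * np.asarray(x, float) @ rotation.T + shift
```

For example, rotating a component by 90°, doubling its scale and shifting it by (1, 1) moves a mean at (1, 0) to (1, 3) and turns a covariance of diag(1, 4) into diag(16, 4).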

[11] oai:arXiv.org:1709.05427 [pdf] - 1648624

A dwarf planet class object in the 21:5 resonance with Neptune

Comments:

Submitted: 2017-09-15

We report the discovery of an $H_r = 3.4\pm0.1$ dwarf planet candidate by the Pan-STARRS Outer Solar System Survey. 2010 JO$_{179}$ is red with $(g-r)=0.88 \pm 0.21$, roughly round, and slowly rotating, with a period of $30.6$ hr. Estimates of its albedo imply a diameter of 600--900~km. Observations spanning 2005--2016 provide an exceptionally well-determined orbit for 2010 JO$_{179}$, with a semi-major axis of $78.307\pm0.009$ au; distant orbits known to this precision are rare. We find that 2010 JO$_{179}$ librates securely within the 21:5 mean-motion resonance with Neptune on hundred-megayear timescales, joining the small but growing set of known distant dwarf planets on metastable resonant orbits. These imply a substantial trans-Neptunian population that shifts between stability in high-order resonances, the detached population, and the eroding population of the scattering disk.
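The quoted semi-major axis can be sanity-checked against the nominal location of the 21:5 resonance. In an exterior p:q resonance the object completes q orbits while Neptune completes p, so $P/P_N = p/q$, and Kepler's third law ($a \propto P^{2/3}$) gives $a = a_N (p/q)^{2/3}$. The sketch assumes Neptune's semi-major axis is about 30.07 au and neglects the object's mass; the function name is hypothetical.

```python
def resonance_semimajor_axis(p, q, a_neptune=30.07):
    """Nominal semi-major axis (au) of the exterior p:q mean-motion resonance with Neptune.

    The object completes q orbits per p Neptune orbits, so P / P_Neptune = p / q,
    and Kepler's third law (a proportional to P ** (2/3)) gives the location.
    """
    return a_neptune * (p / q) ** (2.0 / 3.0)


if __name__ == "__main__":
    print(round(resonance_semimajor_axis(21, 5), 1))  # 78.3
    print(round(resonance_semimajor_axis(3, 2), 1))   # 39.4 (the Plutino resonance)
```

The nominal 21:5 location, about 78.28 au, is close to the fitted $78.307$ au; an object librating in resonance oscillates around this value rather than sitting exactly on it.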

[12] oai:arXiv.org:1709.03541 [pdf] - 1685055

Robust period estimation using mutual information for multi-band light curves in the synoptic survey era

Huijse, Pablo; Estevez, Pablo A.; Forster, Francisco; Daniel, Scott F.; Connolly, Andrew J.; Protopapas, Pavlos; Carrasco, Rodrigo; Principe, Jose C.

Comments: Accepted for publication ApJ Supplement Series: Special Issue on Solar/Stellar Astronomy Big Data

Submitted: 2017-09-11

The Large Synoptic Survey Telescope (LSST) will produce an unprecedented number of light curves in six optical bands. Robust and efficient methods that can aggregate data from multidimensional, sparsely sampled time series are needed. In this paper we present a new method for light curve period estimation based on quadratic mutual information (QMI). The proposed method does not assume a particular model for the light curve or its underlying probability density, and it is robust to non-Gaussian noise and outliers. By combining the QMI from several bands, the true period can be estimated even when no single-band QMI yields the period. Period recovery performance as a function of average magnitude and sample size is measured using 30,000 synthetic multi-band light curves of RR Lyrae and Cepheid variables generated by the LSST Operations and Catalog simulators. The results show that aggregating information from several bands is highly beneficial for LSST sparsely sampled time series, yielding an absolute increase in period recovery rate of up to 50%. We also show that the QMI is more robust to noise and light curve length (sample size) than the multiband generalizations of the Lomb-Scargle and Analysis of Variance periodograms, recovering the true period in 10-30% more cases than its competitors. A Python package containing efficient Cython implementations of the QMI and other methods is provided.
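A compact sketch of a QMI-style periodogram, assuming the Euclidean-distance form of quadratic mutual information, $\mathrm{QMI}_{ED} = V_J + V_M - 2V_C$, estimated with kernel Gram matrices: a periodic kernel over time (so that it compares phases for a trial period) and a Gaussian kernel over magnitudes, with normalizing constants dropped. Kernel widths, function names, and summing per-band scores are illustrative choices; this is not the authors' Python package, whose API is not reproduced here.

```python
import numpy as np


def periodic_gram(t, period, length=0.3):
    """Periodic kernel between all pairs of times: near 1 when phases agree."""
    dt = t[:, None] - t[None, :]
    return np.exp(-2.0 * np.sin(np.pi * dt / period) ** 2 / length ** 2)


def gaussian_gram(m, sigma):
    """Gaussian kernel between all pairs of magnitudes."""
    dm = m[:, None] - m[None, :]
    return np.exp(-dm ** 2 / (2.0 * sigma ** 2))


def qmi(t, m, period, length=0.3, sigma=0.3):
    """Kernel estimate of QMI_ED between phase and magnitude (constants dropped)."""
    t = np.asarray(t, float)
    m = np.asarray(m, float)
    kx = periodic_gram(t, period, length)
    ky = gaussian_gram(m, sigma)
    v_joint = np.mean(kx * ky)                           # V_J: joint information potential
    v_marg = np.mean(kx) * np.mean(ky)                   # V_M: product of marginals
    v_cross = np.mean(kx.mean(axis=1) * ky.mean(axis=1)) # V_C: cross term
    return v_joint + v_marg - 2.0 * v_cross


def multiband_qmi_period(bands, periods):
    """bands: list of (t, m) arrays, one pair per filter; scores are summed across bands."""
    scores = np.array([sum(qmi(t, m, p) for t, m in bands) for p in periods])
    return periods[np.argmax(scores)], scores
```

Summing the per-band scores is what lets a weak signal in one band reinforce a stronger one elsewhere; `periods` must be a NumPy array of trial periods.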

[13] oai:arXiv.org:1612.08747 [pdf] - 1539002

Detection of Time Lags Between Quasar Continuum Emission Bands based on Pan-STARRS Light-curves

Comments: 18 pages, 17 Figures, 3 Tables, submitted to ApJ

Submitted: 2016-12-27

We study the time lags between the continuum emission of quasars at different wavelengths, based on more than four years of multi-band ($g$, $r$, $i$, $z$) light-curves in the Pan-STARRS Medium Deep Fields. As photons from different bands emerge from different radial ranges in the accretion disk, the lags constrain the sizes of the accretion disks. We select 240 quasars with redshifts $z \approx 1$ or $z \approx 0.3$ that are relatively emission line free. The light curves are sampled from day to month timescales, which makes it possible to detect lags on the scale of the light crossing time of the accretion disks. With the code JAVELIN, we detect typical lags of several days in the rest frame between the $g$ band and the $riz$ bands. The detected lags are $\sim 2-3$ times larger than the light crossing time estimated from the standard thin disk model, consistent with the recently measured lag in NGC5548 and micro-lensing measurements of quasars. The lags in our sample are found to increase with increasing luminosity. Furthermore, the increase in lags going from $g-r$ to $g-i$ and then to $g-z$ is slower than predicted in the thin disk model, particularly for high luminosity quasars. The radial temperature profile in the disk must be different from what is assumed. We also find evidence that the lags decrease with increasing line ratios between ultraviolet FeII lines and MgII, which may point to changes in the accretion disk structure at higher metallicity.
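The lags above are measured with JAVELIN, which models the light curves as a damped random walk. As a much simpler stand-in, and explicitly a different technique, a lag between two evenly sampled bands can be estimated by scanning the Pearson cross-correlation over integer shifts. Real Pan-STARRS light curves are irregularly sampled, so this toy assumes both bands have already been interpolated onto a common grid of equal length; `ccf_lag` is a hypothetical name.

```python
import numpy as np


def ccf_lag(x, y, max_lag):
    """Integer lag (in samples) that maximizes the correlation of x(t) with y(t + lag).

    A positive lag means y lags behind x. Both series must share one evenly spaced grid.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

For example, if the redder band is the bluer band delayed by three samples, the scan returns a lag of 3 with a correlation of essentially 1.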

[14] oai:arXiv.org:1602.08977 [pdf] - 1388963

Clustering Based Feature Learning on Variable Stars

Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos

Comments:

Submitted: 2016-02-29

The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors, called features, designed by astronomers. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop, and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated by future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that replace expert-designed, manually tuned features with features learned automatically from data. In this work we present what is, to our knowledge, the first unsupervised feature learning algorithm designed for variable stars. Our method first extracts a large number of lightcurve subsequences from a given set of photometric data, which are then clustered to find common local patterns in the time series. Representatives of these patterns, called exemplars, are then used to transform lightcurves of a labeled set into a new representation that can be used to train an automatic classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias generated when the learning process is done only with labeled data. We test our method on the MACHO and OGLE datasets; the results show that the classification performance we achieve is as good as, and in some cases better than, the performance achieved using traditional features, while the computational cost is significantly lower.
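The three steps above (extract subsequences, cluster them, re-encode each lightcurve against the cluster representatives) can be sketched as follows. Under stated assumptions this is not the paper's algorithm: plain k-means centroids stand in for the exemplars, and each lightcurve is encoded as the fraction of its subsequences nearest each centroid, one plausible choice among several. Window width, step and all function names are hypothetical.

```python
import numpy as np


def subsequences(series, width, step):
    """Sliding-window subsequences, each z-normalized so clustering captures shape only."""
    series = np.asarray(series, float)
    windows = np.array([series[i:i + width]
                        for i in range(0, len(series) - width + 1, step)])
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    return (windows - mu) / np.where(sd > 0, sd, 1.0)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; the returned centroids play the role of exemplars."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's centroid where it is
                centers[j] = members.mean(axis=0)
    return centers


def encode(series, exemplars, width, step):
    """Fixed-length representation: fraction of subsequences nearest each exemplar."""
    subs = subsequences(series, width, step)
    dists = ((subs[:, None, :] - exemplars[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(exemplars))
    return counts / counts.sum()
```

Subsequences would be pooled from both labeled and unlabeled lightcurves before clustering, as the abstract describes; only the labeled curves are then encoded and passed to a classifier.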

[15] oai:arXiv.org:1601.03013 [pdf] - 1365579

Meta Classification for Variable Stars

Pichara, Karim; Protopapas, Pavlos; León, Daniel

Comments: Accepted for publication, The Astrophysical Journal

Submitted: 2016-01-12

The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. As new science problems emerge, it is critical to be able to reuse previously learned models without rebuilding everything from the beginning when the science problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that were trained in a different context, answering different questions and using different representations of data. Conventional mixture-of-experts algorithms from the machine learning literature cannot be used, since each expert (model) uses different inputs. We also take computational complexity into account by using the most expensive models only when necessary. We test our model with the EROS-2 and MACHO datasets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.

[16] oai:arXiv.org:1509.07823 [pdf] - 1283333

Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases

Huijse, Pablo; Estevez, Pablo A.; Protopapas, Pavlos; Principe, Jose C.; Zegers, Pablo

Comments:

Submitted: 2015-09-25

Time-domain astronomy (TDA) is facing a paradigm shift caused by the exponential growth of the sample size, data complexity and data generation rates of new astronomical sky surveys. For example, the Large Synoptic Survey Telescope (LSST), which will begin operations in northern Chile in 2022, will generate a nearly 150 Petabyte imaging dataset of the southern hemisphere sky. The LSST will stream data at rates of 2 Terabytes per hour, effectively capturing an unprecedented movie of the sky. The LSST is expected not only to improve our understanding of time-varying astrophysical objects, but also to reveal a plethora of yet unknown faint and fast-varying phenomena. To cope with a change of paradigm to data-driven astronomy, the fields of astroinformatics and astrostatistics have been created recently. The new data-oriented paradigms for astronomy combine statistics, data mining, knowledge discovery, machine learning and computational intelligence, in order to provide the automated and robust methods needed for the rapid detection and classification of known astrophysical objects as well as the unsupervised characterization of novel phenomena. In this article we present an overview of machine learning and computational intelligence applications to TDA. Future big data challenges and new lines of research in TDA, focusing on the LSST, are identified and discussed from the viewpoint of computational intelligence/machine learning. Interdisciplinary collaboration will be required to cope with the challenges posed by the deluge of astronomical data coming from the LSST.

[17] oai:arXiv.org:1506.00010 [pdf] - 1269336

FATS: Feature Analysis for Time Series

Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Zhu, Ming; Dave, Rahul; Castro, Nicolas; Pichara, Karim

Comments:

Submitted: 2015-05-29, last modified: 2015-08-31

In this paper, we present the FATS (Feature Analysis for Time Series) library. FATS is a Python library which facilitates and standardizes feature extraction for time series data. In particular, we focus on one application: feature extraction for astronomical light curve data, although the library is generalizable for other uses. We detail the methods and features implemented for light curve analysis, and present examples for its usage.

[18] oai:arXiv.org:1404.4888 [pdf] - 1085209

Supervised detection of anomalous light-curves in massive astronomical catalogs

Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

Comments: 16 pages, 18 figures, published in The Astrophysical Journal

Submitted: 2014-04-18, last modified: 2015-05-27

The development of synoptic sky surveys has led to a massive amount of data whose analysis requires resources beyond human capabilities. To process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new method to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. Our method is suitable for exploring massive datasets given that the training process is performed offline. We tested our algorithm on 20 million light-curves from the MACHO catalog and generated a list of anomalous candidates. We divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up for a deeper analysis.
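The scoring idea, "an object is an outlier insofar as its vote vector has low joint probability", can be sketched without a Bayesian network. As a deliberately simpler substitute, the code below fits a single regularized multivariate Gaussian to the random forest vote fractions of training objects and flags the lowest log-density objects. The Gaussian model, the regularization constant and the class and function names are assumptions for illustration, not the paper's method.

```python
import numpy as np


class VoteDensity:
    """Gaussian density over classifier vote vectors (a stand-in for a Bayesian network)."""

    def fit(self, votes, reg=1e-3):
        votes = np.asarray(votes, float)
        self.mean = votes.mean(axis=0)
        # Votes sum to one, so the raw covariance is near-singular; regularize it.
        self.cov = np.cov(votes, rowvar=False) + reg * np.eye(votes.shape[1])
        _, self.logdet = np.linalg.slogdet(self.cov)
        return self

    def log_density(self, votes):
        diff = np.atleast_2d(np.asarray(votes, float)) - self.mean
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(self.cov, diff.T).T)
        d = diff.shape[1]
        return -0.5 * (maha + self.logdet + d * np.log(2.0 * np.pi))


def flag_outliers(density, votes, n):
    """Indices of the n objects with the lowest log-density (most anomalous first)."""
    scores = density.log_density(votes)
    return np.argsort(scores)[:n]
```

An object whose votes are split evenly across classes, unlike any training object, receives a low score and is flagged, which mirrors the intuition behind the joint-probability criterion.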

[19] oai:arXiv.org:1412.1840 [pdf] - 1282187

A Novel, Fully Automated Pipeline for Period Estimation in the EROS 2 Data Set

Protopapas, Pavlos; Huijse, Pablo; Estevez, Pablo A.; Zegers, Pablo; Principe, Jose C.

Comments:

Submitted: 2014-12-04

We present a new method to discriminate periodic from non-periodic irregularly sampled lightcurves. We introduce a periodic kernel and maximize a similarity measure derived from information theory to estimate the periods and a discriminator factor. We tested the method on a dataset containing 100,000 synthetic periodic and non-periodic lightcurves with various periods, amplitudes and shapes generated using a multivariate generative model. We correctly identified periodic and non-periodic lightcurves with a completeness of 90% and a precision of 95%, for lightcurves with a signal-to-noise ratio (SNR) larger than 0.5. We characterize the efficiency and reliability of the model using these synthetic lightcurves and apply the method to the EROS-2 dataset. A crucial consideration is the speed at which the method can be executed. Using hierarchical search and some simplifications of the parameter search, we were able to analyze 32.8 million lightcurves in 18 hours on a cluster of GPGPUs. Using the sensitivity analysis on the synthetic dataset, we infer that 0.42% of the sources in the LMC and 0.61% in the SMC show periodic behavior. The training set, the catalogs and source code are all available at http://timemachine.iic.harvard.edu.

[20] oai:arXiv.org:1403.6131 [pdf] - 844816

The EPOCH Project: I. Periodic variable stars in the EROS-2 LMC database

Kim, Dae-Won; Protopapas, Pavlos; Bailer-Jones, Coryn A. L.; Byun, Yong-Ik; Chang, Seo-Won; Marquette, Jean-Baptiste; Shin, Min-Su

Comments: 18 pages, 20 figures, language editing suggested by the A&A editorial office has been applied

Submitted: 2014-03-24, last modified: 2014-03-28

The EPOCH (EROS-2 periodic variable star classification using machine learning) project aims to detect periodic variable stars in the EROS-2 light curve database. In this paper, we present the first result of the classification of periodic variable stars in the EROS-2 LMC database. To classify these variables, we first built a training set by compiling known variables in the Large Magellanic Cloud area from the OGLE and MACHO surveys. We crossmatched these variables with the EROS-2 sources and extracted 22 variability features from 28 392 light curves of the corresponding EROS-2 sources. We then used the random forest method to classify the EROS-2 sources in the training set. We designed the model to separate not only $\delta$ Scuti stars, RR Lyraes, Cepheids, eclipsing binaries, and long-period variables, the superclasses, but also their subclasses, such as RRab, RRc, RRd, and RRe for RR Lyraes, and similarly for the other variable types. The model trained using only the superclasses shows 99% recall and precision, while the model trained on all subclasses shows 87% recall and precision. We applied the trained model to the entire EROS-2 LMC database, which contains about 29 million sources, and found 117 234 periodic variable candidates. Out of these 117 234 periodic variables, 55 285 have not been discovered by either OGLE or MACHO variability studies. This set comprises 1 906 $\delta$ Scuti stars, 6 607 RR Lyraes, 638 Cepheids, 178 Type II Cepheids, 34 562 eclipsing binaries, and 11 394 long-period variables. A catalog of these EROS-2 LMC periodic variable stars will be available online at http://stardb.yonsei.ac.kr and at the CDS website (http://vizier.u-strasbg.fr/viz-bin/VizieR).

[21] oai:arXiv.org:1403.2181 [pdf] - 794631

The expansion rate of the intermediate Universe in light of Planck

Verde, Licia; Protopapas, Pavlos; Jimenez, Raul

Comments: Submitted to Physics of the Dark Universe

Submitted: 2014-03-10

We use cosmology-independent measurements of the expansion history in the redshift range 0.1 < z < 1.2 and compare them with the Cosmic Microwave Background-derived expansion history predictions. The motivation is to investigate whether the tension between the local (cosmology-independent) Hubble constant H0 value and the Planck-derived H0 is also present at other redshifts. We conclude that there is no tension between Planck and cosmology-independent measurements of the Hubble parameter H(z) at 0.1 < z < 1.2 for the LCDM model (odds of tension are only 1:15, statistically not significant). Considering extensions of the LCDM model does not improve these odds (it actually makes them worse), thus favouring the simpler model over its extensions. On the other hand, the H(z) data are also not in tension with the local H0 measurements, but the combination of all three data sets shows a highly significant tension (odds ~ 1:400). Thus the new data deepen the mystery of the mismatch between Planck and local H0 measurements, and cannot univocally determine whether it is an effect localised at a particular redshift. Having said this, we find that assuming the NGC4258 maser distance as the correct anchor for H0 brings the odds to comfortable values. Further, using only the expansion history measurements we constrain, within the LCDM model, H0 = 68.5 +- 3.5 and Omega_m = 0.32 +- 0.05 without relying on any CMB prior. We also address the question of how smooth the expansion history of the universe is given the cosmology-independent data and conclude that there is no evidence for deviations from smoothness in the expansion history, nor for variations with time in the value of the equation of state of dark energy.

[22] oai:arXiv.org:1402.6403 [pdf] - 1203579

Pan-STARRS 1 observations of the unusual active Centaur P/2011 S1(Gibbs)

Lin, H. W.; Chen, Y. T.; Lacerda, P.; Ip, W. H.; Holman, M.; Protopapas, P.; Chen, W. P.; Burgett, W. S.; Chambers, K. C.; Flewelling, H.; Huber, M. E.; Jedicke, R.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Price, P. A.

Comments: 30 pages, 6 figures, Accepted to AJ

Submitted: 2014-02-25

P/2011 S1 (Gibbs) is an outer solar system comet or active Centaur with an orbit similar to that of the famous 29P/Schwassmann-Wachmann 1. P/2011 S1 (Gibbs) was observed by the Pan-STARRS 1 (PS1) sky survey from 2010 to 2012. The resulting data allow us to perform multi-color studies of the nucleus and coma of the comet. Analysis of PS1 images reveals that P/2011 S1 (Gibbs) has a small nucleus with a radius $< 4$ km, with colors $g_{P1}-r_{P1} = 0.5 \pm 0.02$, $r_{P1}-i_{P1} = 0.12 \pm 0.02$ and $i_{P1}-z_{P1} = 0.46 \pm 0.03$. The comet remained active from 2010 to 2012, with a model-dependent mass-loss rate of $\sim100$ kg s$^{-1}$. The mass-loss rate per unit surface area of P/2011 S1 (Gibbs) is as high as that of 29P/Schwassmann-Wachmann 1, making it one of the most active Centaurs. The mass-loss rate also varies with time from $\sim 40$ kg s$^{-1}$ to 150 kg s$^{-1}$. Due to its rather circular orbit, we propose that P/2011 S1 (Gibbs) has 29P/Schwassmann-Wachmann 1-like outbursts that control the outgassing rate. The results indicate that it may have a surface composition similar to that of 29P/Schwassmann-Wachmann 1. Our numerical simulations show that the future orbital evolution of P/2011 S1 (Gibbs) is more similar to that of the main population of Centaurs than to that of 29P/Schwassmann-Wachmann 1. The results also demonstrate that P/2011 S1 (Gibbs) is dynamically unstable and can only remain near its current orbit for roughly a thousand years.

[23] oai:arXiv.org:1310.7868 [pdf] - 739149

Automatic Classification of Variable Stars in Catalogs with missing data

Pichara, Karim; Protopapas, Pavlos

Comments:

Submitted: 2013-10-29

We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks, a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilises sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, 2MASS and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent, and by 15% for quasar detection, while keeping the computational cost the same.

[24] oai:arXiv.org:1306.6766 [pdf] - 686817

Planck and the local Universe: quantifying the tension

Verde, Licia; Protopapas, Pavlos; Jimenez, Raul

Comments: Submitted to Physics of the Dark Universe

Submitted: 2013-06-28

We use the latest Planck constraints, and in particular constraints on the derived parameters (Hubble constant and age of the Universe) for the local universe, and compare them with local measurements of the same quantities. We propose a way to quantify whether cosmological parameter constraints from two different experiments are in tension or not. Our statistic, T, is an evidence ratio and therefore can be interpreted with the widely used Jeffreys scale. We find that in the framework of the LCDM model, the Planck-inferred two-dimensional joint posterior distribution for the Hubble constant and age of the Universe is in "strong" tension with the local measurements, the odds being ~ 1:50. We explore several possibilities for explaining this tension and examine the consequences both in terms of unknown errors and deviations from the LCDM model. In some one-parameter extensions of the LCDM model the tension is reduced, whereas in other extensions it is instead increased. In particular, small total neutrino masses are favored, and a total neutrino mass above 0.15 eV makes the tension "highly significant" (odds ~ 1:150). A consequence of accepting this interpretation of the tension is that the degenerate neutrino hierarchy is highly disfavoured by cosmological data and the direct hierarchy is slightly favored over the inverse.
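For one-dimensional Gaussian posteriors, an evidence-ratio tension statistic of this kind has a closed form. The sketch below is an illustration under stated assumptions, not the paper's exact definition (which is applied to a joint two-dimensional posterior): it takes T to be the overlap integral of the two distributions after shifting them to a common peak, divided by their actual overlap, so that odds of tension are 1:T. For Gaussians this gives $T = \exp\left[\Delta^2 / \left(2(\sigma_1^2+\sigma_2^2)\right)\right]$ with $\Delta = \mu_1 - \mu_2$.

```python
import math


def gaussian_overlap(mu1, sig1, mu2, sig2):
    """Integral over x of N(x; mu1, sig1) * N(x; mu2, sig2)."""
    var = sig1 ** 2 + sig2 ** 2
    return math.exp(-((mu1 - mu2) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def tension_ratio(mu1, sig1, mu2, sig2):
    """Evidence-ratio-style tension T between two 1D Gaussian posteriors.

    Numerator: overlap if both posteriors peaked at the same value.
    Denominator: their actual overlap. Odds of tension are then 1:T.
    """
    aligned = gaussian_overlap(mu2, sig1, mu2, sig2)
    actual = gaussian_overlap(mu1, sig1, mu2, sig2)
    return aligned / actual
```

For example, `tension_ratio(2.0, 1.0, 0.0, 1.0)` (hypothetical values: two unit-width posteriors whose peaks differ by 2) returns $e \approx 2.72$, while identical posteriors give T = 1.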

[25] oai:arXiv.org:1203.0970 [pdf] - 968262

Infinite Shift-invariant Grouped Multi-task Learning for Gaussian Processes

Wang, Yuyang; Khardon, Roni; Protopapas, Pavlos

Comments: This is an extended version of our ECML 2010 paper entitled "Shift-invariant Grouped Multi-task Learning for Gaussian Processes"; ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III

Submitted: 2012-03-05, last modified: 2013-05-20

Multi-task learning leverages shared information among data sets to improve the learning performance of individual tasks. The paper applies this framework to data where each task is a phase-shifted periodic time series. In particular, we develop a novel Bayesian nonparametric model capturing a mixture of Gaussian processes where each task is a sum of a group-specific function and a component capturing individual variation, in addition to each task being phase shifted. We develop an efficient EM algorithm to learn the parameters of the model. As a special case we obtain the Gaussian mixture model and EM algorithm for phase-shifted periodic time series. Furthermore, we extend the proposed model by using a Dirichlet Process prior, thereby obtaining an infinite mixture model that is capable of automatic model selection. A Variational Bayesian approach is developed for inference in this model. Experiments in regression, classification and class discovery demonstrate the performance of the proposed models using both synthetic data and real-world time series data from astrophysics. Our methods are particularly useful when the time series are sparsely and non-synchronously sampled.

[26] oai:arXiv.org:1304.0401 [pdf] - 646062

An improved quasar detection method in EROS-2 and MACHO LMC datasets

Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won; Marquette, Jean-Baptiste; Tisserand, Patrick

Comments:

Submitted: 2013-04-01

We present a new classification method for quasar identification in the EROS-2 and MACHO datasets based on a boosted version of the Random Forest classifier. We use a set of variability features including parameters of a continuous autoregressive model. We prove that continuous autoregressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO datasets) using known quasars found in the LMC. Our model's accuracy in both EROS-2 and MACHO training sets is about 90% precision and 86% recall, improving on the accuracy of state-of-the-art models in quasar detection. We apply the model to the complete EROS-2 and MACHO LMC datasets, comprising 28 million objects, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched it with 663 previously known strong candidates, obtaining 74% matches for MACHO and 40% for EROS-2. The difference in matching level arises mainly because EROS-2 is a slightly shallower survey, which translates to significantly lower signal-to-noise ratio lightcurves.

[27] oai:arXiv.org:1303.1031 [pdf] - 1165012

Statistical Properties of Galactic $\delta$ Scuti Stars: Revisited

Chang, Seo-Won; Protopapas, Pavlos; Kim, Dae-Won; Byun, Yong-Ik

Comments: 15 pages, 8 figures, 6 tables. Accepted for publication in AJ. The catalog is available at http://stardb.yonsei.ac.kr/DeltaScuti

Submitted: 2013-03-05

We present statistical characteristics of 1,578 $\delta$ Scuti stars including nearby field stars and cluster member stars within the Milky Way. We obtained 46% of these stars (718 stars) from the works of Rodríguez and collected the remaining 54% (860 stars) from other literature. We updated the entries with the latest information on sky coordinates, color, rotational velocity, spectral type, period, amplitude and binarity. The majority of our sample are well characterized in terms of typical period range (0.02-0.25 days), pulsation amplitudes (<0.5 mag) and spectral types (A-F type). Given this list of $\delta$ Scuti stars, we examined relations between their physical properties (i.e., periods, amplitudes, spectral types and rotational velocities) for field stars and cluster members, and confirmed that the correlations of properties are not significantly different from those reported in the works of Rodríguez. All the $\delta$ Scuti stars are cross-matched with several X-ray and UV catalogs, resulting in 27 X-ray and 41 UV-only counterparts. These counterparts are interesting targets for further study because of their rarity and uniqueness in showing $\delta$ Scuti-type variability and X-ray/UV emission at the same time. The compiled catalog can be accessed through the web interface http://stardb.yonsei.ac.kr/DeltaScuti

[28] oai:arXiv.org:1301.6182 [pdf] - 1159293

The TAOS Project: Results From Seven Years of Survey Data

Comments: 11 pages, 9 figures. Submitted to Astronomical Journal 2013 January 16

Submitted: 2013-01-25

The Taiwanese-American Occultation Survey (TAOS) aims to detect serendipitous occultations of stars by small (about 1 km diameter) objects in the Kuiper Belt and beyond. Such events are very rare (<0.001 events per star per year) and short in duration (about 200 ms), so many stars must be monitored at a high readout cadence. TAOS monitors typically around 500 stars simultaneously at a 5 Hz readout cadence with four telescopes located at Lulin Observatory in central Taiwan. In this paper, we report the results of the search for small Kuiper Belt Objects (KBOs) in seven years of data. No occultation events were found, resulting in a 95% c.l. upper limit on the slope of the faint end of the KBO size distribution of q = 3.34 to 3.82, depending on the surface density at the break in the size distribution at a diameter of about 90 km.

[29] oai:arXiv.org:1301.3027 [pdf] - 616798

Semi-parametric Robust Event Detection for Massive Time-Domain Databases

Blocker, Alexander W; Protopapas, Pavlos

Comments: 16 pages, 5 figures. A shorter version of this work appeared in Statistical Challenges in Modern Astronomy V, Springer-Verlag, 177-189. Implementations of the core algorithms of this paper in C and R are available as the rowavedt package via https://www.github.com/awblocker/rowavedt/

Submitted: 2013-01-14, last modified: 2013-01-19

The detection and analysis of events within massive collections of time-series has become an extremely important task for time-domain astronomy. In particular, many scientific investigations (e.g. the analysis of microlensing and other transients) begin with the detection of isolated events in irregularly-sampled series with both non-linear trends and non-Gaussian noise. We outline a semi-parametric, robust, parallel method for identifying variability and isolated events at multiple scales in the presence of the above complications. This approach harnesses the power of Bayesian modeling while maintaining much of the speed and scalability of more ad-hoc machine learning approaches. We also contrast this work with event detection methods from other fields, highlighting the unique challenges posed by astronomical surveys. Finally, we present results from the application of this method to 87.2 million EROS-2 sources, where we have obtained a greater than 100-fold reduction in candidates for certain types of phenomena while creating high-quality features for subsequent analyses.

[30] oai:arXiv.org:1212.2398 [pdf] - 903316

An Information Theoretic Algorithm for Finding Periodicities in Stellar Light Curves

Huijse, Pablo; Estevez, Pablo A.; Protopapas, Pavlos; Zegers, Pablo; Principe, Jose C.

Comments:

Submitted: 2012-12-11

We propose a new information theoretic metric for finding periodicities in stellar light curves. Light curves are astronomical time series of brightness over time, and are characterized as being noisy and unevenly sampled. The proposed metric combines correntropy (generalized correlation) with a periodic kernel to measure similarity among samples separated by a given period. The new metric provides a periodogram, called Correntropy Kernelized Periodogram (CKP), whose peaks are associated with the fundamental frequencies present in the data. The CKP does not require any resampling, slotting or folding scheme, as it is computed directly from the available samples. CKP is the main part of a fully automated pipeline for periodic light curve discrimination to be used in astronomical survey databases. Using a set of light curves drawn from the MACHO survey, we show that the CKP method outperforms slotted correntropy and the conventional methods used in astronomy for the periodicity discrimination and period estimation tasks. The proposed metric achieved a 97.2% true positive rate with a 0% false positive rate at a 99% confidence level in the periodicity discrimination task, and 88% hits, 11.6% multiples and 0.4% misses in the period estimation task.
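The CKP idea can be sketched directly from the description above: for each trial period, average a centred Gaussian kernel on magnitude differences over all sample pairs, weighting each pair by a periodic kernel on its time lag. This is a minimal, unoptimized sketch and not the authors' pipeline; the kernel widths `sigma_x` and `sigma_t` are hypothetical defaults, the specific periodic kernel $\exp(-2\sin^2(\pi\,\Delta t/P)/\sigma_t^2)$ is an assumption, and kernel normalization constants are dropped because they only rescale the periodogram.

```python
import math


def ckp(t, x, periods, sigma_x=0.3, sigma_t=0.5):
    """Correntropy kernelized periodogram (sketch) for unevenly sampled data.

    For each trial period P, averages the centred magnitude kernel
    G(x_i - x_j) - IP over all sample pairs, weighted by the periodic kernel
    exp(-2 sin^2(pi (t_i - t_j) / P) / sigma_t^2).
    """
    n = len(t)
    # Gaussian kernel on magnitude differences, computed once for all periods.
    g = [[math.exp(-((x[i] - x[j]) ** 2) / (2 * sigma_x ** 2)) for j in range(n)]
         for i in range(n)]
    # Information potential: mean of the magnitude kernel over all pairs.
    ip = sum(map(sum, g)) / n ** 2
    power = []
    for p in periods:
        s = 0.0
        for i in range(n):
            for j in range(n):
                w = math.exp(-2 * math.sin(math.pi * (t[i] - t[j]) / p) ** 2
                             / sigma_t ** 2)
                s += (g[i][j] - ip) * w
        power.append(s / n ** 2)
    return power
```

Pairs of samples separated by a whole number of true periods have similar magnitudes and receive high periodic weight, so the power peaks at the true period; at half the period, phase-opposed pairs contribute negative terms and pull the power down. Like other period finders, this sketch cannot distinguish the true period from its integer multiples on its own, so the trial grid (or a later step) has to handle them.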

[31] oai:arXiv.org:1204.3055 [pdf] - 1886350

IVOA Recommendation: Spectrum Data Model 1.1

McDowell, Jonathan; Tody, Doug; Budavari, Tamas; Dolensky, Markus; Kamp, Inga; McCusker, Kelly; Protopapas, Pavlos; Rots, Arnold; Thompson, Randy; Valdes, Frank; Skoda, Petr; Rino, Bruno; Derriere, Sebastien; Salgado, Jesus; Laurino, Omar; the IVOA Data Access Layer and Data Model Working Groups

Comments: http://www.ivoa.net

Submitted: 2012-04-13

We present a data model describing the structure of spectrophotometric datasets with spectral and temporal coordinates and associated metadata. This data model may be used to represent spectra, time series data, segments of SED (Spectral Energy Distributions) and other spectral or temporal associations.

[32] oai:arXiv.org:1111.1315 [pdf] - 550465

Nonparametric Bayesian Estimation of Periodic Functions

Wang, Yuyang; Khardon, Roni; Protopapas, Pavlos

Comments:

Submitted: 2011-11-05, last modified: 2012-03-06

Many real world problems exhibit patterns that have periodic behavior. For example, in astrophysics, periodic variable stars play a pivotal role in understanding our universe. An important step when analyzing data from such processes is the problem of identifying the period: estimating the period of a periodic function based on noisy observations made at irregularly spaced time points. This problem is still a difficult challenge despite extensive study in different disciplines. The paper makes several contributions toward solving this problem. First, we present a nonparametric Bayesian model for period finding, based on Gaussian Processes (GP), that does not make strong assumptions on the shape of the periodic function. As our experiments demonstrate, the new model leads to significantly better results in period estimation when the target function is non-sinusoidal. Second, we develop a new algorithm for parameter optimization for GP which is useful when the likelihood function is very sensitive to the setting of the hyper-parameters with numerous local minima, as in the case of period estimation. The algorithm combines gradient optimization with grid search and incorporates several mechanisms to overcome the high complexity of inference with GP. Third, we develop a novel approach for using domain knowledge, in the form of a probabilistic generative model, and incorporate it into the period estimation algorithm. Experimental results on astrophysics data validate our approach showing significant improvement over the state of the art in this domain.
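As a much-simplified sketch of GP-based period finding, one can score each trial period by the Gaussian process log marginal likelihood under a periodic kernel and keep the best grid point. This is not the paper's algorithm: it assumes fixed, hypothetical hyper-parameters (`amp`, `length`, `noise`), assumes an exp-sine-squared periodic kernel, uses grid search only (the paper combines gradient optimization with grid search), and omits the domain-knowledge prior and the mechanisms for reducing the cost of GP inference.

```python
import numpy as np


def log_marginal_likelihood(t, y, period, amp=1.0, length=1.0, noise=0.1):
    """GP log marginal likelihood with an exp-sine-squared (periodic) kernel."""
    lag = t[:, None] - t[None, :]
    k = amp ** 2 * np.exp(-2.0 * np.sin(np.pi * lag / period) ** 2 / length ** 2)
    k += noise ** 2 * np.eye(len(t))  # white observational noise
    chol = np.linalg.cholesky(k)
    # Solves chol @ alpha = y, so alpha @ alpha equals y^T K^{-1} y.
    alpha = np.linalg.solve(chol, y)
    return (-0.5 * alpha @ alpha
            - np.sum(np.log(np.diag(chol)))  # equals -0.5 * log det K
            - 0.5 * len(t) * np.log(2 * np.pi))


def grid_period_search(t, y, periods, **kernel_kwargs):
    """Return the trial period with the highest marginal likelihood."""
    scores = [log_marginal_likelihood(t, y, p, **kernel_kwargs) for p in periods]
    return periods[int(np.argmax(scores))]
```

Because the likelihood surface over the period has many narrow local optima (its width shrinks as the time baseline grows), a coarse grid alone can miss the peak; this is the sensitivity that motivates the paper's combined optimizer.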

[33] oai:arXiv.org:1110.5632 [pdf] - 1085137

A Refined QSO Selection Method Using Diagnostics Tests: 663 QSO Candidates in the LMC

Kim, Dae-Won; Protopapas, Pavlos; Trichas, Markos; Rowan-Robinson, Michael; Khardon, Roni; Alcock, Charles; Byun, Yong-Ik

Comments: 13 pages, 17 figures. accepted for publication in ApJ

Submitted: 2011-10-25, last modified: 2011-12-31

We present 663 QSO candidates in the Large Magellanic Cloud (LMC) selected using multiple diagnostics. We started with a set of 2,566 QSO candidates from our previous work selected using time variability of the MACHO LMC lightcurves. We then obtained additional information for the candidates by crossmatching them with the Spitzer SAGE, the MACHO UBVI, the 2MASS, the Chandra and the XMM catalogs. Using this information, we specified six diagnostic features based on mid-IR colors, photometric redshifts using SED template fitting, and X-ray luminosities in order to further discriminate high-confidence QSO candidates in the absence of spectral information. We then trained a one-class SVM (Support Vector Machine) model using the diagnostic features of the 58 confirmed MACHO QSOs. We applied the trained model to the original candidates and finally selected 663 high-confidence QSO candidates. Furthermore, we crossmatched these 663 QSO candidates with the newly confirmed 144 QSOs and 275 non-QSOs in the LMC fields. On the basis of the counterpart analysis, we found that the false positive rate is less than 1%.

[34] oai:arXiv.org:1112.2962 [pdf] - 903304

Period Estimation in Astronomical Time Series Using Slotted Correntropy

Huijse, Pablo; Estévez, Pablo A.; Zegers, Pablo; Príncipe, José; Protopapas, Pavlos

Comments:

Submitted: 2011-12-13

In this letter, we propose a method for period estimation in light curves from periodic variable stars using correntropy. Light curves are astronomical time series of stellar brightness over time, and are characterized as being noisy and unevenly sampled. We propose to use slotted time lags in order to estimate correntropy directly from irregularly sampled time series. A new information theoretic metric is proposed for discriminating among the peaks of the correntropy spectral density. The slotted correntropy method outperformed slotted correlation, string length, VarTools (Lomb-Scargle periodogram and Analysis of Variance), and SigSpec applications on a set of light curves drawn from the MACHO survey.
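The slotting idea can be sketched as follows: to estimate autocorrentropy at a given lag from an unevenly sampled series, average a Gaussian kernel on magnitude differences over only those sample pairs whose time separation falls inside a slot centred on that lag. This is a minimal sketch, not the authors' implementation; the kernel width `sigma` and the slot width are hypothetical parameters, and the information theoretic metric for discriminating among peaks is not shown.

```python
import math


def slotted_correntropy(t, x, lag, slot_width, sigma=0.3):
    """Estimate the autocorrentropy of an unevenly sampled series at one lag.

    Only pairs whose time separation falls inside the slot
    [lag - slot_width/2, lag + slot_width/2) are averaged, so no resampling
    or interpolation is needed. Returns None if the slot is empty.
    """
    total, count = 0.0, 0
    for i in range(len(t)):
        for j in range(len(t)):
            dt = t[j] - t[i]
            if lag - slot_width / 2 <= dt < lag + slot_width / 2:
                total += math.exp(-((x[i] - x[j]) ** 2) / (2 * sigma ** 2))
                count += 1
    return total / count if count else None
```

Evaluated over a range of lags, the estimate is high at lags equal to multiples of the period (where paired magnitudes agree) and low at half-period lags, which is what makes its spectral density useful for period estimation. Narrower slots reduce timing smear but leave fewer pairs per slot.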

[35] oai:arXiv.org:1101.3316 [pdf] - 1051482

QSO Selection Algorithm Using Time Variability and Machine Learning: Selection of 1,620 QSO Candidates from MACHO LMC Database

Kim, Dae-Won; Protopapas, Pavlos; Byun, Yong-Ik; Alcock, Charles; Khardon, Roni; Trichas, Markos

Comments: 17 pages, 11 figures; accepted for the publication in ApJ

Submitted: 2011-01-17, last modified: 2011-04-19

We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events using 58 known QSOs, 1,629 variable stars and 4,288 non-variables from the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.

[36] oai:arXiv.org:1008.2209 [pdf] - 1034253

Trans-Neptunian Objects with Hubble Space Telescope ACS/WFC

Fuentes, Cesar I.; Holman, Matthew J.; Trilling, David E.; Protopapas, Pavlos

Comments: 16 pages, 10 figures, accepted by ApJ

Submitted: 2010-08-12

We introduce a novel search technique that can identify trans-Neptunian objects in three to five exposures of a pointing within a single Hubble Space Telescope orbit. The process is fast enough to allow the discovery of candidates soon after the data are available. This allows sufficient time to schedule follow-up observations with HST within a month. We report the discovery of 14 slow-moving objects found within $5^\circ$ of the ecliptic in archival data taken with the Wide Field Channel of the Advanced Camera for Surveys. The luminosity function of these objects is consistent with previous ground-based and space-based results. We show evidence that the size distributions of the high- and low-inclination populations are similar for objects smaller than 100 km, as expected from collisional evolution models, while their size distributions differ for brighter objects. We suggest that the two populations formed in different parts of the protoplanetary disk and, after being dynamically mixed, have evolved collisionally together. Among the objects discovered there is an equal-mass binary with an angular separation of $\sim 0.53''$.

[37] oai:arXiv.org:1003.2526 [pdf] - 1025677

The TAOS Project Stellar Variability II. Detection of 15 Variable Stars

Comments: 20 pages, 6 figures, accepted in The Astronomical Journal

Submitted: 2010-03-12

The Taiwanese-American Occultation Survey (TAOS) project has collected more than a billion photometric measurements since 2005 January. These sky survey data, covering timescales from a fraction of a second to a few hundred days, are a useful source for studying stellar variability. A total of 167 star fields, mostly along the ecliptic plane, have been selected for photometric monitoring with the TAOS telescopes. This paper presents our initial analysis of a search for periodic variable stars from the time-series TAOS data on one particular TAOS field, No. 151 (RA = $17^{\rm h}30^{\rm m}6.67^{\rm s}$, Dec = $27^\circ17'30''$, J2000), which had been observed over 47 epochs in 2005. A total of 81 candidate variables are identified in the 3 square degree field, with magnitudes in the range 8 < R < 16. On the basis of the periodicity and shape of the lightcurves, 29 variables, 15 of which were previously unknown, are classified as RR Lyrae, Cepheid, delta Scuti, SX Phoenicis, semi-regular variables and eclipsing binaries.

[38] oai:arXiv.org:1002.3626 [pdf] - 1025278

The TAOS Project: Statistical Analysis of Multi-Telescope Time Series Data

Comments: 15 pages, 14 figures. Submitted to PASP

Submitted: 2010-02-18

The Taiwanese-American Occultation Survey (TAOS) monitors fields of up to ~1000 stars at 5 Hz simultaneously with four small telescopes to detect occultation events from small (~1 km) Kuiper Belt Objects (KBOs). The survey presents a number of challenges, in particular the fact that the occultation events we are searching for are extremely rare and are typically manifested as slight flux drops for only one or two consecutive time series measurements. We have developed a statistical analysis technique to search the multi-telescope data set for simultaneous flux drops that provides robust false positive rejection and a calculation of event significance. In this paper, we describe this statistical technique in detail, together with its application to the TAOS data set.
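The abstract does not spell out the statistic, so the following is only an illustration of one approach in this spirit, assumed here for the example: rank each measurement within its own telescope's lightcurve and multiply the ranks across telescopes at each epoch. Because noise in different telescopes is independent, a genuine simultaneous flux drop yields a far smaller rank product than chance coincidences do, without assuming Gaussian noise. The function names are hypothetical, and computing the null distribution of the product (needed for event significance) is not shown.

```python
def ranks(series):
    """Rank of each sample within its own series (1 = faintest; ties by order)."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    r = [0] * len(series)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_product(lightcurves):
    """Per-epoch product of ranks across simultaneous, time-aligned lightcurves.

    A flux drop seen by every telescope at the same epoch gives a small
    product; under the null hypothesis, the ranks in different telescopes
    are independent, so a very small product is unlikely by chance.
    """
    per_scope = [ranks(lc) for lc in lightcurves]
    products = []
    for epoch in range(len(lightcurves[0])):
        p = 1
        for r in per_scope:
            p *= r[epoch]
        products.append(p)
    return products
```

A flux drop recorded by only one telescope (for example, a cosmic ray or a passing cloud over one site) keeps large ranks in the others, so its product stays large; requiring agreement across telescopes is what provides the false positive rejection.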

[39] oai:arXiv.org:1001.2006 [pdf] - 430445

The TAOS Project: Upper Bounds on the Population of Small KBOs and Tests of Models of Formation and Evolution of the Outer Solar System

Comments: 18 pages, 16 figures, AJ submitted

Submitted: 2010-01-12, last modified: 2010-01-15

We have analyzed the first 3.75 years of data from TAOS, the Taiwanese-American Occultation Survey. TAOS monitors bright stars to search for occultations by Kuiper Belt Objects (KBOs). This dataset comprises $5\times10^5$ star-hours of multi-telescope photometric data taken at 4 or 5 Hz. No events consistent with KBO occultations were found in this dataset. We compute the number of events expected for the Kuiper Belt formation and evolution models of Pan & Sari (2005), Kenyon & Bromley (2004), Benavidez & Campo Bagatin (2009), and Fraser (2009). A comparison with the upper limits we derive from our data constrains the parameter space of these models. This is the first detailed comparison of models of the KBO size distribution with data from an occultation survey. Our results suggest that the KBO population is composed of objects with low internal strength and that planetary migration played a role in shaping the size distribution.

[40] oai:arXiv.org:0912.1791 [pdf] - 31552

The TAOS Project Stellar Variability I. Detection of Low-Amplitude delta Scuti Stars

Comments: Accepted for publication in AJ

Submitted: 2009-12-09, last modified: 2009-12-10

We analyzed data accumulated during 2005 and 2006 by the Taiwanese-American Occultation Survey (TAOS) in order to detect short-period variable stars (periods of $\lesssim 1$ hour) such as delta Scuti stars. TAOS is designed to detect stellar occultations by small Kuiper Belt Objects (KBOs) and operates four 50 cm telescopes at an effective cadence of 5 Hz. The four telescopes simultaneously monitor the same patch of sky in order to reduce false positives. To detect short-period variables, we used the Fast Fourier Transform (FFT) algorithm, since the data points in TAOS lightcurves are evenly spaced. Using the FFT, we found 41 short-period variables with amplitudes smaller than a few hundredths of a magnitude and periods of about an hour, which suggests that they are low-amplitude delta Scuti stars (LADS). The lightcurves of the TAOS delta Scuti stars are accessible online at the Time Series Center website (http://timemachine.iic.harvard.edu).
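Because the TAOS samples are evenly spaced, a plain FFT is enough to locate the dominant period, as the abstract notes. The NumPy sketch below is a minimal illustration, not the survey's pipeline: it assumes a gap-free series at a known cadence (5 Hz in the example), removes the mean so the zero-frequency bin does not dominate, and returns the period of the strongest remaining Fourier component. The paper's selection criteria and significance tests are not reproduced.

```python
import numpy as np


def dominant_period(flux, cadence_hz):
    """Return the period (s) of the strongest non-zero Fourier component."""
    flux = np.asarray(flux, dtype=float)
    # Subtract the mean so the zero-frequency (DC) bin carries no power.
    power = np.abs(np.fft.rfft(flux - flux.mean())) ** 2
    freqs = np.fft.rfftfreq(len(flux), d=1.0 / cadence_hz)
    k = 1 + int(np.argmax(power[1:]))  # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

For example, five hours of 5 Hz data containing a 0.01 mag sinusoid with a one-hour period plus 0.005 mag white noise returns a period of 3600 s. The frequency resolution is 1/(total duration), so longer continuous runs are needed to pin down periods more precisely.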

[41] oai:arXiv.org:0910.5598 [pdf] - 1018138

Searching for sub-kilometer TNOs using Pan-STARRS video mode lightcurves: Preliminary study and evaluation using engineering data

Wang, J. -H.; Protopapas, P.; Chen, W. -P.; Alcock, C. R.; Burgett, W. S.; Dombeck, T.; Morgan, J. S.; Price, P. A.; Tonry, J. L.

Comments: 27 pages, 17 figures, add co-author, citation

Submitted: 2009-10-29, last modified: 2009-11-04

We present a pre-survey study of the use of Pan-STARRS high-sampling-rate video mode guide star images to search for TNOs. With suitable selection of the guide stars within the Pan-STARRS 7 deg^{2} field of view, the lightcurves of these guide stars can also be used to search for occultations by TNOs. The best target stars for this purpose are stars with high signal-to-noise ratio (SNR) and small angular size. To select them, we compiled a catalog using the SNR calculated for stars with m_V < 13 mag in the Tycho2 catalog, then cross-matched these stars with the 2MASS catalog and estimated their angular sizes from their (V-K) color. We also outlined a new detection method based on a matched filter that is optimized to search for diffraction patterns in the lightcurves due to occultations by sub-kilometer TNOs. The detection threshold is set as a compromise between real detections and false positives. Depending on the theoretical size distribution model used, we expect to find up to a hundred events during the three-year lifetime of the Pan-STARRS-1 project. We have tested the detection algorithm and the pipeline on a set of engineering data (taken at 10 Hz instead of 30 Hz). No events were found in the engineering data, which is consistent with the small size of the data set and the theoretical models. With a total of ~22 star-hours of video mode data (|\beta| < 10^{\circ}), we are able to set an upper limit of N(>0.5 km) ~ 2.47x10^10 deg^-2 at the 95% confidence level.
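A matched filter correlates the light-curve with the expected event shape and flags positions where the correlation exceeds a threshold. The sketch below is a minimal, hypothetical version assuming white noise and a simple box-shaped dip as the template; the method described above instead targets the Fresnel diffraction patterns of sub-kilometer TNOs, which would replace this toy template. The names `matched_filter_scores`, `detect`, and the threshold value are illustrative assumptions.

```python
import math


def matched_filter_scores(flux, template):
    """Correlate a zero-mean, unit-norm template with the light-curve at every offset.

    Because the template has zero mean, constant flux levels do not contribute;
    a dip that matches the template's shape gives a large positive score.
    """
    m = len(template)
    t_mean = sum(template) / m
    t = [v - t_mean for v in template]
    norm = math.sqrt(sum(v * v for v in t))
    t = [v / norm for v in t]
    f_mean = sum(flux) / len(flux)
    f = [v - f_mean for v in flux]
    return [sum(t[j] * f[i + j] for j in range(m)) for i in range(len(f) - m + 1)]


def detect(flux, template, threshold):
    """Offsets (start of the template window) whose score exceeds the threshold."""
    scores = matched_filter_scores(flux, template)
    return [i for i, s in enumerate(scores) if s > threshold]
```

With a flat light-curve of 100 points and a three-point 20% dip starting at index 40, the template `[1.0, 1.0, 0.8, 0.8, 0.8, 1.0, 1.0]` peaks at offset 38, where the template's dip lines up with the data. Raising the threshold trades missed events for fewer false positives, which is the compromise the abstract refers to.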

[42] oai:arXiv.org:0910.5282 [pdf] - 1018120

Upper Limits on the Number of Small Bodies in Sedna-Like Orbits by the TAOS Project

Comments: 25 pages, 13 figures

Submitted: 2009-10-27

We present the results of a search for occultation events by objects at distances between 100 and 1000 AU in lightcurves from the Taiwanese-American Occultation Survey (TAOS). We searched for consecutive, shallow flux reductions in the stellar lightcurves obtained by our survey between 7 February 2005 and 31 December 2006 with a total of $\sim4.5\times10^{9}$ three-telescope simultaneous photometric measurements. No events were detected, allowing us to set upper limits on the number density as a function of size and distance of objects in Sedna-like orbits, using simple models.

[43] oai:arXiv.org:0905.3428 [pdf] - 24487

Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars

Rebbapragada, Umaa; Protopapas, Pavlos; Brodley, Carla E.; Alcock, Charles

Comments:

Submitted: 2009-05-20

Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods as the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data, that outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD's reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
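The key idea above is that periodic light-curves must be phase-aligned before they are compared or averaged. The sketch below is a simplified, hypothetical k-means in that spirit: each curve is circularly shifted to best match its centroid before averaging, and the anomaly score is one minus the best phase-aligned correlation with any centroid. It assumes equal-length, phase-folded, uniformly binned curves and uses a naive initialization; PCAD's sampling, initialization, and local-anomaly ranking are not reproduced.

```python
import math


def zscore(x):
    """Standardize a curve to zero mean and unit variance."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sd for v in x]


def best_shift(centroid, curve):
    """(correlation, shift) maximizing corr(centroid, curve rotated by shift); inputs z-scored."""
    n = len(curve)
    best = (-math.inf, 0)
    for s in range(n):
        c = sum(centroid[i] * curve[(i + s) % n] for i in range(n)) / n
        if c > best[0]:
            best = (c, s)
    return best


def phase_kmeans(curves, k, iters=5):
    """k-means in which each curve is phase-aligned to its centroid before averaging."""
    curves = [zscore(c) for c in curves]
    n = len(curves[0])
    centroids = [list(c) for c in curves[:k]]  # naive initialization: first k curves
    for _ in range(iters):
        sums = [[0.0] * n for _ in range(k)]
        counts = [0] * k
        for c in curves:
            fits = [best_shift(cen, c) for cen in centroids]
            j = max(range(k), key=lambda q: fits[q][0])  # closest centroid
            s = fits[j][1]
            for i in range(n):
                sums[j][i] += c[(i + s) % n]  # accumulate the phase-aligned curve
            counts[j] += 1
        centroids = [
            zscore([v / counts[j] for v in sums[j]]) if counts[j] else centroids[j]
            for j in range(k)
        ]
    return centroids


def anomaly_scores(curves, centroids):
    """1 - best phase-aligned correlation with any centroid (higher = more anomalous)."""
    return [1.0 - max(best_shift(cen, zscore(c))[0] for cen in centroids) for c in curves]
```

For a set of phase-shifted sinusoids plus one curve containing a single spike, the spike curve receives the highest score, because no circular shift makes it resemble the sinusoidal centroid.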

[44] oai:arXiv.org:0812.1010 [pdf] - 19177

De-Trending Time Series for Astronomical Variability Surveys

Kim, Dae-Won; Protopapas, Pavlos; Alcock, Charles; Byun, Yong-Ik; Bianco, Federica

Comments: Revised version according to the referee's second review

Submitted: 2008-12-04, last modified: 2009-04-13

We present a de-trending algorithm for the removal of trends in time series. Trends in time series can be caused by various systematic and random noise sources such as cloud passages, changes in airmass, telescope vibration or CCD noise. These trends undermine the intrinsic signals of stars and should be removed. We determine the trends from subsets of stars that are highly correlated among themselves. These subsets are selected based on a hierarchical tree clustering algorithm. A bottom-up merging algorithm based on the departure from a normal distribution in the correlation is developed to identify subsets, which we call clusters. After identification of clusters, we determine a trend per cluster by a weighted sum of normalized light-curves. We then use quadratic programming to de-trend all individual light-curves based on these determined trends. Experimental results with synthetic light-curves containing artificial trends and events are presented. Results from other de-trending methods are also compared. The developed algorithm can be applied to time series for trend removal in both narrow- and wide-field astronomy.
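The last two steps (a trend per cluster as a weighted sum of normalized light-curves, then removal of that trend from each light-curve) can be sketched as follows. This is a hypothetical simplification: the cluster is assumed to be already identified (the hierarchical clustering step is omitted), the weights are supplied by the caller, and a single-trend ordinary least-squares fit stands in for the quadratic programming step described above.

```python
import math


def normalize(x):
    """Zero-mean, unit-variance version of a light-curve."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sd for v in x]


def cluster_trend(curves, weights):
    """Weighted average of the normalized light-curves of one cluster."""
    norm = [normalize(c) for c in curves]
    w_sum = sum(weights)
    n = len(curves[0])
    return [sum(w * c[i] for w, c in zip(weights, norm)) / w_sum for i in range(n)]


def detrend(curve, trend):
    """Fit curve ~ a + b * trend by least squares and subtract b * (trend - mean).

    The curve's mean level is preserved; only the component proportional to
    the trend is removed.
    """
    n = len(curve)
    mc = sum(curve) / n
    mt = sum(trend) / n
    cov = sum((c - mc) * (t - mt) for c, t in zip(curve, trend))
    var = sum((t - mt) ** 2 for t in trend)
    b = cov / var
    return [c - b * (t - mt) for c, t in zip(curve, trend)]
```

If every star in the cluster and the target share the same systematic pattern at different offsets and scales, the de-trended target is flat. Real data with intrinsic variability and several overlapping trends is where a constrained fit such as quadratic programming becomes useful.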

[45] oai:arXiv.org:0904.0645 [pdf] - 315790

A Bayesian approach to the analysis of time symmetry in light curves: Reconsidering Scorpius X-1 occultations

Blocker, Alexander W.; Protopapas, Pavlos; Alcock, Charles R.

Comments: 24 pages, 18 figures. Preprint typeset using LaTeX style emulateapj v. 04/20/08

Submitted: 2009-04-04

We present a new approach to the analysis of time symmetry in light curves, such as those in the X-ray at the center of the Scorpius X-1 occultation debate. Our method uses a new parameterization for such events (the bilogistic event profile) and provides a clear, physically relevant characterization of each event's key features. We also demonstrate a Markov Chain Monte Carlo algorithm to carry out this analysis, including a novel independence chain configuration for the estimation of each event's location in the light curve. These tools are applied to the Scorpius X-1 light curves presented in Chang et al. (2007), providing additional evidence based on the time series that the events detected thus far are most likely not occultations by TNOs.

[46] oai:arXiv.org:0903.3036 [pdf] - 430442

A Search for Occultations of Bright Stars by Small Kuiper Belt Objects using Megacam on the MMT

Bianco, Federica B.; Protopapas, Pavlos; McLeod, Brian A.; Alcock, Charles R.; Holman, Matthew J.; Lehner, Matthew J.

Comments: 13 pages, 12 figures, submitted to AJ, modified Fig. 11 that did not display properly

Submitted: 2009-03-18, last modified: 2009-03-20

We conducted a search for occultations of bright stars by Kuiper Belt Objects (KBOs) to estimate the density of sub-km KBOs in the sky. We report here the first results of this occultation survey of the outer solar system conducted in June 2007 and June/July 2008 at the MMT Observatory using Megacam, the large MMT optical imager. We used Megacam in a novel shutterless continuous-readout mode to achieve high-precision photometry at 200 Hz. We present an analysis of 220 star-hours at a signal-to-noise ratio of 25 or greater. The survey efficiency is greater than 10% for occultations by KBOs of diameter d>=0.7 km, and we report no detections in our dataset. We set a new 95% confidence level upper limit for the surface density \Sigma_N(d) of KBOs larger than 1 km: \Sigma_N(d>=1 km) <= 2.0e8 deg^-2, and for KBOs larger than 0.7 km: \Sigma_N(d>=0.7 km) <= 4.8e8 deg^-2.

[47] oai:arXiv.org:0902.1160 [pdf] - 250696

Reverberation in the UV-Optical Continuum Brightness Fluctuations of MACHO Quasar 13.5962.237

Schild, Rudolph E.; Lovegrove, Justin; Protopapas, Pavlos

Comments: 25 pages, 8 figures, submitted to Astronomical Journal

Submitted: 2009-02-06

We examine the nature of brightness fluctuations in the UV-optical spectral region of an ordinary quasar with 881 optical brightness measurements made during the epoch 1993-1999. We find evidence for systematic trends having the character of a pattern of reverberations following an initial disturbance. The initial pulses have brightness increases of order 20% and pulse widths of 50 days, and the reverberations have typical amplitudes of 12% with longer mean pulse widths of order 80 days and pulse separations of order 90 days. The repeat pattern occurs over the same time scales whether the initial disturbance is a brightening or a fading. The lags of the pulse trains are comparable to the lags seen previously in reverberation of the broad blue-shifted emission lines following brightness disturbances in Seyfert galaxies, when allowance is made for the mass of the central object. In addition to the burst pulse trains, we find evidence for a semi-periodicity with a time scale of 2 years. These strong patterns of brightness fluctuations suggest a method of discovering quasars from photometric monitoring alone, with data of the quality expected from large brightness monitoring programs like Pan-STARRS and LSST.

[48] oai:arXiv.org:0901.3329 [pdf] - 20559

Event Discovery in Time Series

Preston, Dan; Protopapas, Pavlos; Brodley, Carla

Comments: 12 pages, 12 figures, accepted for publication at SIAM SDM09

Submitted: 2009-01-21

The discovery of events in time series can have important implications, such as identifying microlensing events in astronomical surveys, or changes in a patient's electrocardiogram. Current methods for identifying events require a sliding window of a fixed size, which is not ideal for all applications and could overlook important events. In this work, we develop probability models for calculating the significance of an arbitrary-sized sliding window and use these probabilities to find areas of significance. Because a brute-force search of all sliding windows and all window sizes would be computationally intractable, we introduce a method for quickly approximating the results. We apply our method to over 100,000 astronomical time series from the MACHO survey, in which 56 different sections of the sky are considered, each with one or more known events. Our method was able to recover 100% of these events in the top 1% of the results, essentially pruning 99% of the data. Interestingly, our method was able to identify events that do not pass traditional event discovery procedures.
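To make the "arbitrary-sized sliding window" idea concrete, the sketch below scores every window of every length with a simple Gaussian null model: after standardizing the series, the sum of L independent standard-normal points has variance L, so z = sum / sqrt(L) and a one-sided tail probability follows from `math.erfc`. This is the brute-force search the abstract calls intractable for large surveys (it examines O(N^2) windows, made cheap per window by prefix sums); it is not the authors' approximation, and the Gaussian-noise null model is an assumption.

```python
import math


def window_z_scores(x, min_len=1):
    """(z, start, length) for every window; z = sum of standardized values / sqrt(length)."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    z = [(v - mu) / sd for v in x]
    prefix = [0.0]
    for v in z:
        prefix.append(prefix[-1] + v)  # prefix[j] = sum of z[0:j]
    results = []
    for i in range(n):
        for j in range(i + min_len, n + 1):
            length = j - i
            results.append(((prefix[j] - prefix[i]) / math.sqrt(length), i, length))
    return results


def most_significant_window(x, min_len=1):
    """Start, length and one-sided Gaussian tail probability of the highest-z window."""
    z, start, length = max(window_z_scores(x, min_len))
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return start, length, p
```

For a 30-point series alternating between +0.1 and -0.1 with a five-point brightening to 2.0 at indices 12 to 16, the best window is exactly that brightening. The reported probability is for a single window and is not corrected for the many windows searched, so it overstates the significance.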

[49] oai:arXiv.org:0808.2051 [pdf] - 15406

First Results From The Taiwanese-American Occultation Survey (TAOS)

Comments: 5 pages, 5 figures, accepted in ApJ

Submitted: 2008-08-14

Results from the first two years of data from the Taiwanese-American Occultation Survey (TAOS) are presented. Stars have been monitored photometrically at 4 Hz or 5 Hz to search for occultations by small (~3 km) Kuiper Belt Objects (KBOs). No statistically significant events were found, allowing us to present an upper bound to the size distribution of KBOs with diameters 0.5 km < D < 28 km.

[50] oai:arXiv.org:0711.1617 [pdf] - 253851

Eclipsing binary stars in the Large and Small Magellanic Clouds from the MACHO project: The Sample

Faccioli, Lorenzo; Alcock, Charles; Cook, Kem; Prochter, Gabriel E.; Protopapas, Pavlos; Syphers, David

Comments: 67 pages, 40 figures

Submitted: 2007-11-10

We present a new sample of 4634 eclipsing binary stars in the Large Magellanic Cloud (LMC), expanding on a previous sample of 611 objects, and a new sample of 1509 eclipsing binary stars in the Small Magellanic Cloud (SMC), both identified in the light curve database of the MACHO project. We perform a cross-correlation with the OGLE-II LMC sample, finding 1236 matches. A cross-correlation with the OGLE-II SMC sample finds 698 matches. We then compare the LMC subsamples corresponding to the center and the periphery of the LMC and find only minor differences between the two populations. These samples are sufficiently large and complete that statistical studies of the binary star populations are possible.

[51] oai:arXiv.org:astro-ph/0509103 [pdf] - 75685

Combined reconstruction of weak and strong lensing data with WSLAP

Diego, J. M.; Tegmark, M.; Protopapas, P.; Sandvik, H. B.

Comments: 10 pages. 9 figures. MNRAS submitted

Submitted: 2005-09-05

We describe a method to estimate the mass distribution of a gravitational lens and the position of the sources from combined strong and weak lensing data. The algorithm combines weak and strong lensing data in a unified way, producing a solution that is valid in both the weak and strong lensing regimes. We study how the result depends on the relative weighting of the weak and strong lensing data and on the choice of basis used to represent the mass distribution. We find that combining weak and strong lensing information has two major advantages: it eliminates the need for priors and/or regularization schemes for the intrinsic size of the background galaxies (an assumption needed in previous strong lensing algorithms), and it corrects for biases in the recovered mass in the outer regions where the strong lensing data are less sensitive. The code is implemented in a software package called WSLAP (Weak & Strong Lensing Analysis Package), which is publicly available at http://darwin.cfa.harvard.edu/SLAP/

[52] oai:arXiv.org:astro-ph/0505495 [pdf] - 73265

Finding outlier light-curves in catalogs of periodic variable stars

Protopapas, P.; Giammarco, J. M.; Faccioli, L.; Struble, M. F.; Dave, R.; Alcock, C.

Comments: 16 pages, 24 figures

Submitted: 2005-05-24

We present a methodology to discover outliers in catalogs of periodic light-curves. We use cross-correlation as a measure of "similarity" between two individual light-curves and then classify the light-curves with the lowest average "similarity" as outliers. We performed the analysis on catalogs of variable stars of known type from the MACHO and OGLE projects and established that our method correctly identifies light-curves that do not belong to those catalogs as outliers. We show how our method can scale to large datasets that will be available in the near future, such as those anticipated from Pan-STARRS and LSST.
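The ranking described above (similarity from cross-correlation, outliers as the curves with the lowest average similarity) can be sketched in a few lines. Assumptions not stated in the abstract: the light-curves are already phase-folded and resampled onto the same number of bins, similarity is the maximum Pearson correlation over all circular phase shifts, and the quadratic all-pairs loop shown here ignores the scaling strategies the authors discuss.

```python
import math


def _standardize(x):
    """Zero-mean, unit-variance copy of a phase-folded light-curve."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sd for v in x]


def similarity(a, b):
    """Maximum correlation between a and b over all circular phase shifts of b."""
    za, zb = _standardize(a), _standardize(b)
    n = len(za)
    return max(sum(za[i] * zb[(i + s) % n] for i in range(n)) / n for s in range(n))


def rank_outliers(curves):
    """Indices sorted from lowest to highest average similarity, plus the averages."""
    m = len(curves)
    avg = []
    for i in range(m):
        sims = [similarity(curves[i], curves[j]) for j in range(m) if j != i]
        avg.append(sum(sims) / len(sims))
    return sorted(range(m), key=lambda i: avg[i]), avg
```

Given four sinusoids with different phases and one curve containing a single spike, the spike curve is ranked first, since its best correlation with any sinusoid is far below the near-perfect matches among the sinusoids.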

[53] oai:arXiv.org:astro-ph/0408418 [pdf] - 66909

Non-parametric inversion of strong lensing systems

Diego, J. M.; Protopapas, P.; Sandvik, H. B.; Tegmark, M.

Comments: This is the accepted version in MNRAS. This includes improvements suggested by the referee and one new plot. Additional material can be found at http://darwin.cfa.harvard.edu/SLAP/index.aspx

Submitted: 2004-08-24, last modified: 2005-03-14

We revisit the issue of non-parametric gravitational lens reconstruction and present a new method to obtain the cluster mass distribution using strong lensing data without using any prior information on the underlying mass. The method relies on the decomposition of the lens plane into individual cells. We show how the problem in this approximation can be expressed as a system of linear equations for which a solution can be found. Moreover, we propose to include information about the null space, that is, to make use of the pixels where we know there are no arcs above the sky noise. The only prior information is an estimate of the physical size of the sources. No priors on the luminosity of the cluster or the shape of the halos are needed, thus making the results very robust. In order to test the accuracy and bias of the method, we make use of simulated strong lensing data. We find that the method accurately reproduces both the lens mass and the source positions, and provides error estimates.

[54] oai:arXiv.org:astro-ph/0502301 [pdf] - 71092

Fast identification of transits from light-curves

Protopapas, Pavlos; Jimenez, Raul; Alcock, Charles

Comments: 9 pages, 9 figures

Submitted: 2005-02-15

We present an algorithm that allows fast and efficient detection of transits, including planetary transits, from light-curves. The method is based on building an ensemble of fiducial models and compressing the data using the MOPED algorithm. We describe the method and demonstrate its efficiency by finding planet-like transits in simulated Pan-STARRS light-curves. We show that our method is independent of the size of the search space of transit parameters. In large sets of light-curves, we achieve speed-up factors of order $10^{8}$ over the full $\chi^2$ search. We discuss how the algorithm can be used in forthcoming large surveys like Pan-STARRS and LSST and how it may be optimized for future space missions like Kepler and COROT, where most of the processing must be done on board.
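MOPED (Heavens, Jimenez & Lahav 2000) compresses N data points into one number per model parameter, y_m = b_m · x, with weighting vectors b_m chosen so that, under the noise model, the compressed numbers are uncorrelated with unit variance. The sketch below builds those vectors for the special case of uncorrelated (diagonal-covariance) noise using the Gram-Schmidt-style MOPED construction. The inputs are hypothetical: `derivs` holds derivatives of some transit model with respect to its parameters at a fiducial point, and `sigma` holds per-point noise levels. The fiducial-model ensemble described in the abstract is not reproduced.

```python
import math


def moped_vectors(derivs, sigma):
    """MOPED weighting vectors for diagonal noise covariance C = diag(sigma**2).

    derivs: list of M derivative vectors (each of length N) of the model mean
            with respect to each parameter, evaluated at a fiducial model.
    Returns M vectors b_m satisfying b_m^T C b_n = delta_mn.
    """
    inv_var = [1.0 / s ** 2 for s in sigma]
    bs = []
    for d in derivs:
        v = [iv * di for iv, di in zip(inv_var, d)]  # C^{-1} mu_{,m}
        norm2 = sum(di * vi for di, vi in zip(d, v))  # mu_{,m}^T C^{-1} mu_{,m}
        for b in bs:
            proj = sum(di * bi for di, bi in zip(d, b))  # mu_{,m}^T b_q
            v = [vi - proj * bi for vi, bi in zip(v, b)]
            norm2 -= proj ** 2
        bs.append([vi / math.sqrt(norm2) for vi in v])
    return bs


def compress(data, bs):
    """Compressed data: one weighted sum y_m = b_m . x per weighting vector."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, data)) for b in bs]
```

Once both the observed light-curve and each fiducial model are compressed with the same vectors, they can be compared with a chi-square over M numbers instead of N, which is where the speed-up comes from.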

[55] oai:arXiv.org:astro-ph/0412191 [pdf] - 69595

Non-parametric mass reconstruction of A1689 from strong lensing data with SLAP

Diego, J. M.; Sandvik, H. B.; Protopapas, P.; Tegmark, M.; Benitez, N.; Broadhurst, T.

Comments: 11 pages, 12 figures. MNRAS submitted. A full-resolution version of the paper can be found at http://darwin.physics.upenn.edu/SLAP/

Submitted: 2004-12-08

We present the mass distribution in the central area of the cluster A1689 by fitting over 100 multiply lensed images with the non-parametric Strong Lensing Analysis Package (SLAP, Diego et al. 2004). The surface mass distribution is obtained in a robust way, finding a total mass of 0.25E15 M_sun/h within a 70'' radius of the central peak. Our reconstructed density profile is well fitted by an NFW profile with small perturbations due to substructure and is compatible with the more model-dependent analysis of Broadhurst et al. (2004a) based on the same data. Our estimated mass does not rely on any prior information about the distribution of dark matter in the cluster. The peak of the mass distribution falls very close to the central cD, and there is substructure near the center, suggesting that the cluster is not fully relaxed. We also examine the effect on the recovered mass when we include the uncertainties in the redshift of the sources and in the original shape of the sources. Using simulations designed to mimic the data, we identify some biases in our reconstructed mass distribution. We find that the recovered mass is biased toward lower masses beyond 1 arcmin (150 kpc) from the central cD and that in the very center we may be affected by degeneracy problems. On the other hand, we confirm that the reconstructed mass between 25'' and 70'' is a robust, unbiased estimate of the true mass distribution and is compatible with an NFW profile.