Full-text search for arXiv

21 article(s) in total. 132 co-authors, from 1 to 12 common article(s). Median position in authors list is 3,0.

[1] oai:arXiv.org:2005.05404 [pdf] - 2124844

The VVV Infrared Variability Catalog (VIVA-I)

Comments: 27 pages, 14 figures

Submitted: 2020-05-11

Thanks to the VISTA Variables in the Via Lactea (VVV) ESO Public Survey it is now possible to explore a large number of objects in those regions. This paper addresses the variability analysis of all VVV point sources having more than 10 observations in VVVDR4 using a novel approach. In total, the near-IR light curves of 288,378,769 sources were analysed using methods developed in the New Insight Into Time Series Analysis project. As a result, we present a complete sample having 44,998,752 variable star candidates (VVV-CVSC), which include accurate individual coordinates, near-IR magnitudes (ZYJHKs), extinctions A(Ks), variability indices, periods, amplitudes, among other parameters to assess the science. Unfortunately, a side effect of having a highly complete sample is also having a high level of contamination by non-variables (the contamination ratio of non-variables to variables is slightly over 10:1). To deal with this, we also provide some flags and parameters that can be used by the community to decrease the number of variable candidates without heavily decreasing the completeness of the sample. In particular, we cross-identified 339,601 of our sources with Simbad and AAVSO databases, which provide us with information for these objects at other wavelengths. This sub-sample constitutes a unique resource to study the corresponding near-IR variability of known sources as well as to assess the IR variability related with X-ray and Gamma-Ray sources. On the other hand, the other 99.5% of sources in our sample constitute a number of potentially new objects with variability information for the heavily crowded and reddened regions of the Galactic Plane and Bulge. The present results also provide an important queryable resource to perform variability analysis and to characterize ongoing and future surveys like TESS and LSST.

[2] oai:arXiv.org:2004.06226 [pdf] - 2077490

Classifying CMB time-ordered data through deep neural networks

Rojas, Felipe; Maurin, Loïc; Dünner, Rolando; Pichara, Karim

Comments: 9 pages, 6 figures

Submitted: 2020-04-13

The Cosmic Microwave Background (CMB) has been measured over a wide range of multipoles. Experiments with arc-minute resolution like the Atacama Cosmology Telescope (ACT) have contributed to the measurement of primary and secondary anisotropies, leading to remarkable scientific discoveries. Such findings require careful data selection in order to remove poorly-behaved detectors and unwanted contaminants. The current data classification methodology used by ACT relies on several statistical parameters that are assessed and fine-tuned by an expert. This method is highly time-consuming and band or season-specific, which makes it less scalable and efficient for future CMB experiments. In this work, we propose a supervised machine learning model to classify detectors of CMB experiments. The model corresponds to a deep convolutional neural network. We tested our method on real ACT data, using the 2008 season, 148 GHz, as training set with labels provided by the ACT data selection software. The model learns to classify time-streams starting directly from the raw data. For the season and frequency considered during the training, we find that our classifier reaches a precision of 99.8%. For 220 and 280 GHz data, season 2008, we obtained 99.4% and 97.5% of precision, respectively. Finally, we performed a cross-season test over 148 GHz data from 2009 and 2010 for which our model reaches a precision of 99.8% and 99.5%, respectively. Our model is about 10x faster than the current pipeline, making it potentially suitable for real-time implementations.

[3] oai:arXiv.org:2002.00994 [pdf] - 2046524

Scalable End-to-end Recurrent Neural Network for Variable star classification

Becker, Ignacio; Pichara, Karim; Catelan, Márcio; Protopapas, Pavlos; Aguirre, Carlos; Nikzat, Fatemeh

Comments: 15 pages, 17 figures. To be published in MNRAS

Submitted: 2020-02-03

During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine learning techniques. Traditionally, light curves are represented as a vector of descriptors or features used as input for many algorithms. Some features are computationally expensive and cannot be updated quickly, and hence cannot be applied to large datasets such as that of the LSST. Previous work has been done to develop alternative unsupervised feature extraction algorithms for light curves, but the cost of doing so still remains high. In this work, we propose an end-to-end algorithm that automatically learns the representation of light curves that allows an accurate automatic classification. We study a series of deep learning architectures based on Recurrent Neural Networks and test them in automated classification scenarios. Our method uses minimal data preprocessing, can be updated with a low computational cost for new observations and light curves, and can scale up to massive datasets. We transform each light curve into an input matrix representation whose elements are the differences in time and magnitude, and the outputs are classification probabilities. We test our method in three surveys: OGLE-III, Gaia and WISE. We obtain accuracies of about 95% in the main classes and 75% in the majority of subclasses. We compare our results with the Random Forest classifier and obtain competitive accuracies while being faster and scalable. The analysis shows that the computational complexity of our approach grows linearly with the light curve size, while the cost of the traditional approach grows as $N\log{(N)}$.
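
As a rough illustration of the input representation mentioned in this abstract, the sketch below turns one light curve into a matrix whose rows are consecutive differences in time and magnitude. The function name `to_delta_matrix` and the sample values are hypothetical and are not taken from the paper; the sketch assumes the observations are already ordered in time, and the authors' actual preprocessing may differ.

```python
from typing import List, Sequence, Tuple


def to_delta_matrix(
    times: Sequence[float], mags: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return rows (dt, dmag) of consecutive differences for one light curve.

    Assumes `times` is already sorted in increasing order (an assumption of
    this sketch, not a statement about the paper's pipeline).
    """
    if len(times) != len(mags):
        raise ValueError("times and mags must have the same length")
    # A single pass over N observations yields N - 1 rows.
    return [
        (times[i + 1] - times[i], mags[i + 1] - mags[i])
        for i in range(len(times) - 1)
    ]


# Hypothetical, unevenly sampled light curve (days, magnitudes).
times = [0.0, 1.5, 2.0, 5.0]
mags = [15.0, 15.5, 15.25, 15.0]
matrix = to_delta_matrix(times, mags)
```

Because the transformation is a single pass over the observations, its cost grows linearly with the number of points, which is in the spirit of the scaling argument made in the abstract.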

[4] oai:arXiv.org:1912.02235 [pdf] - 2026504

Streaming Classification of Variable Stars

Zorich, Lukas; Pichara, Karim; Protopapas, Pavlos

Comments:

Submitted: 2019-12-04

In recent years, automatic classification of variable stars has received substantial attention. Using machine learning techniques for this task has proven to be quite useful. Typically, machine learning classifiers used for this task require a fixed training set, and the training process is performed offline. Upcoming surveys such as the Large Synoptic Survey Telescope (LSST) will generate new observations daily, where an automatic classification system able to create alerts online will be mandatory. A system with those characteristics must be able to update itself incrementally. Unfortunately, after training, most machine learning classifiers do not support the inclusion of new observations in light curves; they need to be re-trained from scratch. Naively re-training from scratch is not an option in streaming settings, mainly because of the expensive pre-processing routines required to obtain a vector representation of light curves (features) each time we include new observations. In this work, we propose a streaming probabilistic classification model; it uses a set of newly designed features that work incrementally. With this model, we can have a machine learning classifier that updates itself in real time with new observations. To test our approach, we simulate a streaming scenario with light curves from CoRoT, OGLE and MACHO catalogs. Results show that our model achieves high classification performance, staying an order of magnitude faster than traditional classification approaches.

[5] oai:arXiv.org:1911.02444 [pdf] - 2026263

An Information Theory Approach on Deciding Spectroscopic Follow Ups

Astudillo, Javiera; Protopapas, Pavlos; Pichara, Karim; Huijse, Pablo

Comments:

Submitted: 2019-11-06

Classification and characterization of variable phenomena and transient phenomena are critical for astrophysics and cosmology. These objects are commonly studied using photometric time series or spectroscopic data. Given that many ongoing and future surveys are in the time domain, and given that adding spectra provides further insights but requires more observational resources, it would be valuable to know which objects we should prioritize for obtaining a spectrum in addition to the time series. We propose a methodology in a probabilistic setting that determines a priori which objects are worth taking a spectrum of to obtain better insights, where we define 'insight' as the type of the object (classification). Objects whose spectra we query are reclassified using their full spectrum information. We first train two classifiers, one that uses photometric data and another that uses photometric and spectroscopic data together. Then for each photometric object we estimate the probability of each possible spectrum outcome. We combine these models in various probabilistic frameworks (strategies) which are used to guide the selection of follow-up observations. The best strategy depends on the intended use, whether it is getting more confidence or accuracy. For a given number of candidate objects (127, equal to 5% of the dataset) for taking spectra, we improve class prediction accuracy by 37%, as opposed to 20% for the best non-naive (non-random) baseline strategy. Our approach provides a general framework for follow-up strategies and can be extended beyond classification and to include other forms of follow-ups beyond spectroscopy.

[6] oai:arXiv.org:1903.03254 [pdf] - 1846983

An Algorithm for the Visualization of Relevant Patterns in Astronomical Light Curves

Pieringer, Christian; Pichara, Karim; Catelán, Márcio; Protopapas, Pavlos

Comments: Accepted 2019 January 8. Received 2019 January 8; in original form 2018 January 29. 7 pages, 6 figures

Submitted: 2019-03-07

In recent years, the classification of variable stars with Machine Learning has become a mainstream area of research. Recently, visualization of time series has been attracting more attention in data science as a tool to help scientists visually recognize significant patterns in complex dynamics. Within the Machine Learning literature, dictionary-based methods have been widely used to encode relevant parts of image data. These methods intrinsically assign a degree of importance to patches in pictures, according to their contribution to the image reconstruction. Inspired by dictionary-based techniques, we present an approach that naturally provides the visualization of salient parts in astronomical light curves, making the analogy between image patches and relevant pieces in time series. Our approach encodes the most meaningful patterns such that we can approximately reconstruct light curves by just using the encoded information. We test our method on light curves from the OGLE-III and StarLight databases. Our results show that the proposed model delivers an automatic and intuitive visualization of relevant light curve parts, such as local peaks and drops in magnitude.

[7] oai:arXiv.org:1810.09440 [pdf] - 1775748

Deep multi-survey classification of variable stars

Aguirre, Carlos; Pichara, Karim; Becker, Ignacio

Comments: Accepted for publication in Monthly Notices of the Royal Astronomical Society

Submitted: 2018-10-21

During the last decade, a considerable amount of effort has been made to classify variable stars using different machine learning techniques. Typically, light curves are represented as vectors of statistical descriptors or features that are used to train various algorithms. Computing these features demands substantial computational power and can take from hours to days, making it impossible to create scalable and efficient ways of automatically classifying variable stars. Also, light curves from different surveys cannot be integrated and analyzed together when using features, because of observational differences. For example, with variations in cadence and filters, feature distributions become biased and require expensive data-calibration models. The vast amount of data that will be generated soon makes it necessary to develop scalable machine learning architectures without expensive integration techniques. Convolutional Neural Networks have shown impressive results in raw image classification and representation within the machine learning literature. In this work, we present a novel Deep Learning model for light curve classification, mainly based on convolutional units. Our architecture receives as input the differences in time and magnitude of light curves. It captures the essential classification patterns regardless of cadence and filter. In addition, we introduce a novel data augmentation schema for unevenly sampled time series. We test our method using three different surveys: OGLE-III, CoRoT, and VVV, which differ in filters, cadence, and area of the sky. We show that besides the benefit of scalability, our model obtains state-of-the-art levels of accuracy in light curve classification benchmarks.

[8] oai:arXiv.org:1802.02575 [pdf] - 1631950

New variable Stars from the Photographic Archive: Semi-automated Discoveries, Attempts of Automatic Classification, and the New Field 104 Her

Antipin, S. V.; Becker, I.; Belinski, A. A.; Kolesnikova, D. M.; Pichara, K.; Samus, N. N.; Sokolovsky, K. V.; Zharova, A. V.; Zubareva, A. M.

Comments: 14 pages, 3 figures, 1 table; accepted to RAA

Submitted: 2018-02-07

Using 172 plates taken with the 40-cm astrograph of the Sternberg Astronomical Institute (Lomonosov Moscow University) in 1976-1994 and digitized with a resolution of 2400 dpi, we discovered and studied 275 new variable stars. We present the list of our new variables with all necessary information concerning their brightness variations. As in our earlier studies, the new discoveries show a rather large number of high-amplitude Delta Scuti variables, suggesting that many stars of this type remain undetected across the whole sky. We also performed automated classification of the newly discovered variable stars based on the Random Forest algorithm. The results of the automated classification were compared to traditional classification and showed that automated classification was possible even with noisy photographic data. However, further improvement of automated techniques is needed, which is especially important bearing in mind the very large numbers of new discoveries expected from all-sky surveys.

[9] oai:arXiv.org:1801.09737 [pdf] - 1626567

Automatic Survey-Invariant Variable Star Classification

Benavente, Patricio; Protopapas, Pavlos; Pichara, Karim

Comments:

Submitted: 2018-01-29

Machine learning techniques have been successfully used to classify variable stars on widely-studied astronomical surveys. These datasets have been available to astronomers long enough to allow them to perform deep analysis over several variable sources and to generate useful catalogs with identified variable stars. The products of these studies are labeled data that enable supervised learning models to be trained successfully. However, when these models are blindly applied to data from new sky surveys their performance drops significantly. Furthermore, unlabeled data becomes available at a much higher rate than its labeled counterpart, since labeling is a manual and time-consuming effort. Domain adaptation techniques aim to learn from a domain where labeled data is available, the source domain, and through some adaptation perform well on a different domain, the target domain. We propose a full probabilistic model that represents the joint distribution of features from two surveys as well as a probabilistic transformation of the features from one survey to the other. This allows us to transfer labeled data to a study where it is not available and to effectively run a variable star classification model in a new survey. Our model represents the features of each domain as a Gaussian mixture and models the transformation as a translation, rotation and scaling of each separate component. We perform tests using three different variability catalogs: EROS, MACHO, and HiTS, presenting differences among them, such as the number of observations per star, cadence, observational time and optical bands observed, among others.

[10] oai:arXiv.org:1801.09723 [pdf] - 1626563

Unsupervised Classification of Variable Stars

Valenzuela, Lucas; Pichara, Karim

Comments: Accepted in November 2017

Submitted: 2018-01-29

During the last ten years, a considerable amount of effort has been made to develop algorithms for automatic classification of variable stars. That has been primarily achieved by applying machine learning methods to photometric datasets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human effort. Whenever data come from new surveys, the only available training instances are the ones that have a cross-match with previously labelled objects, consequently generating insufficient training sets compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem by proposing an untraditional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific for light curves, based on a novel data structure that allows scaling the search over the entire dataset of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.

[11] oai:arXiv.org:1801.09732 [pdf] - 1626565

Uncertain classification of Variable Stars: handling observational GAPS and noise

Castro, Nicolas; Protopapas, Pavlos; Pichara, Karim

Comments:

Submitted: 2018-01-29

Automatic classification methods applied to sky surveys have revolutionized the astronomical target selection process. Most surveys generate a vast amount of time series, or "lightcurves", that represent the brightness variability of stellar objects in time. Unfortunately, lightcurves' observations take several years to be completed, producing truncated time series that generally remain without the application of automatic classifiers until they are finished. This happens because state-of-the-art methods rely on a variety of statistical descriptors or features that present an increasing degree of dispersion when the number of observations decreases, which reduces their precision. In this paper we propose a novel method that increases the performance of automatic classifiers of variable stars by incorporating the deviations that scarcity of observations produces. Our method uses Gaussian Process Regression to form a probabilistic model of each lightcurve's observations. Then, based on this model, bootstrapped samples of the time series features are generated. Finally, a bagging approach is used to improve the overall performance of the classification. We perform tests on the MACHO and OGLE catalogs; results show that our method effectively classifies some variability classes using a small fraction of the original observations. For example, we found that RR Lyrae stars can be classified with around 80% accuracy just by observing the first 5% of the whole lightcurves' observations in MACHO and OGLE catalogs. We believe these results prove that, when studying lightcurves, it is important to consider the features' error and how the measurement process impacts it.

[12] oai:arXiv.org:1602.08977 [pdf] - 1388963

Clustering Based Feature Learning on Variable Stars

Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos

Comments:

Submitted: 2016-02-29

The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that replace expert-designed and manually tuned features with features that are automatically learned from data. In this work we present what is, to our knowledge, the first unsupervised feature learning algorithm designed for variable stars. Our method first extracts a large number of lightcurve subsequences from a given set of photometric data, which are then clustered to find common local patterns in the time series. Representatives of these patterns, called exemplars, are then used to transform lightcurves of a labeled set into a new representation that can then be used to train an automatic classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias generated when the learning process is done only with labeled data. We test our method on MACHO and OGLE datasets; the results show that the classification performance we achieve is as good as, and in some cases better than, the performance achieved using traditional features, while the computational cost is significantly lower.
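
The first step described in this abstract, extracting many subsequences from a lightcurve, can be sketched with a simple sliding window. This is only an illustration: the function name `extract_subsequences`, the index-based `window` and `step` parameters, and the sample magnitudes are hypothetical, and the paper may define subsequences differently (for example, by time span rather than by number of points). The clustering and exemplar steps are not shown.

```python
from typing import List, Sequence


def extract_subsequences(
    mags: Sequence[float], window: int, step: int
) -> List[List[float]]:
    """Slide a fixed-length window over a lightcurve and collect the pieces.

    `window` is the number of points per subsequence and `step` is the
    shift between consecutive windows; both are assumptions of this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last valid start index is len(mags) - window.
    return [
        list(mags[start:start + window])
        for start in range(0, len(mags) - window + 1, step)
    ]


# Hypothetical magnitudes for a short lightcurve.
lightcurve = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pieces = extract_subsequences(lightcurve, window=3, step=2)
```

In a full pipeline, the collected pieces from many lightcurves would be passed to a clustering algorithm, and the cluster representatives would play the role of the exemplars mentioned in the abstract.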

[13] oai:arXiv.org:1601.03013 [pdf] - 1365579

Meta Classification for Variable Stars

Pichara, Karim; Protopapas, Pavlos; León, Daniel

Comments: Accepted for publication, The Astrophysical Journal

Submitted: 2016-01-12

The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. New science problems emerge and it is critical to be able to re-use the models learned before, without rebuilding everything from the beginning when the science problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that are trained in a different context, answering different questions and using different representations of data. Conventional mixture-of-experts algorithms in the machine learning literature cannot be used since each expert (model) uses different inputs. We also consider the computational complexity of the model by using the most expensive models only when necessary. We test our model with EROS-2 and MACHO datasets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.

[14] oai:arXiv.org:1506.00010 [pdf] - 1269336

FATS: Feature Analysis for Time Series

Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Zhu, Ming; Dave, Rahul; Castro, Nicolas; Pichara, Karim

Comments:

Submitted: 2015-05-29, last modified: 2015-08-31

In this paper, we present the FATS (Feature Analysis for Time Series) library. FATS is a Python library which facilitates and standardizes feature extraction for time series data. In particular, we focus on one application: feature extraction for astronomical light curve data, although the library is generalizable for other uses. We detail the methods and features implemented for light curve analysis, and present examples for its usage.

[15] oai:arXiv.org:1405.5298 [pdf] - 1311894

Photometric Classification of quasars from RCS-2 using Random Forest

Carrasco, D.; Barrientos, L. F.; Pichara, K.; Anguita, T.; Murphy, D. N. A.; Gilbank, D. G.; Gladders, M. D.; Yee, H. K. C.; Hsieh, B. C.; López, S.

Comments: Accepted for publication in A&A. 20 pages, 16 figures, 6 tables. Tables 1, 2 and 3 with the quasar candidates are only available in electronic format http://www.aanda.org/

Submitted: 2014-05-21, last modified: 2015-08-24

Aims. Construction of a new quasar candidate catalog from the Red-Sequence Cluster Survey 2 (RCS-2), identified solely from photometric information using an automated algorithm suitable for large surveys. The algorithm performance is tested using a well-defined SDSS spectroscopic sample of quasars and stars. Methods. The Random Forest algorithm constructs the catalog from RCS-2 point sources using SDSS spectroscopically-confirmed stars and quasars. The algorithm identifies putative quasars from broadband magnitudes (g, r, i, z) and colours. Exploiting NUV GALEX measurements for a subset of the objects, we refine the classifier by adding new information. An additional subset of the data with WISE W1 and W2 bands is also studied. Results. Upon analyzing 542,897 RCS-2 point sources, the algorithm identified 21,501 quasar candidates, with a training-set-derived precision (the fraction of true positives within the group assigned quasar status) of 89.5% and recall (the fraction of true positives relative to all sources that actually are quasars) of 88.4%. These performance metrics improve for the GALEX subset; 6,530 quasar candidates are identified from 16,898 sources, with a precision and recall respectively of 97.0% and 97.5%. Algorithm performance is further improved when WISE data are included, with precision and recall increasing to 99.3% and 99.1% respectively for 21,834 quasar candidates from 242,902 sources. We compile our final catalog (38,257) by merging these samples and removing duplicates. An observational follow up of 17 bright (r < 19) candidates with long-slit spectroscopy at DuPont telescope (LCO) yields 14 confirmed quasars. Conclusions. The results signal encouraging progress in the classification of point sources with Random Forest algorithms to search for quasars within current and future large-area photometric surveys.
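
The precision and recall definitions quoted in this abstract can be written out as a short calculation. The sketch below is generic: the function name `precision_recall` and the counts in the example are hypothetical and are not the paper's confusion-matrix values.

```python
from typing import Tuple


def precision_recall(
    true_pos: int, false_pos: int, false_neg: int
) -> Tuple[float, float]:
    """Compute precision and recall from confusion-matrix counts.

    precision = TP / (TP + FP): fraction of objects assigned quasar status
                that really are quasars.
    recall    = TP / (TP + FN): fraction of all real quasars that were
                recovered by the classifier.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    # Guard against empty denominators by returning 0.0 in that case.
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
```

With these made-up counts, 90 of the 100 objects flagged as quasars are real (precision 0.9), and 90 of the 120 real quasars are recovered (recall 0.75).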

[16] oai:arXiv.org:1404.4888 [pdf] - 1085209

Supervised detection of anomalous light-curves in massive astronomical catalogs

Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

Comments: 16 pages, 18 figures, published in The Astrophysical Journal

Submitted: 2014-04-18, last modified: 2015-05-27

The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. To process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new method to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. Our method is suitable for exploring massive datasets given that the training process is performed offline. We tested our algorithm on 20 million light-curves from the MACHO catalog and generated a list of anomalous candidates. We divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up for a deeper analysis.

[17] oai:arXiv.org:1405.4517 [pdf] - 862800

The VVV Templates Project. Towards an Automated Classification of VVV Light-Curves. I. Building a database of stellar variability in the near-infrared

Comments: 12 pages, 16 figures, 3 tables, accepted for publication in A&A. Most of the data are now accessible through http://www.vvvtemplates.org/

Submitted: 2014-05-18, last modified: 2014-06-03

Context. The Vista Variables in the Vía Láctea (VVV) ESO Public Survey is a variability survey of the Milky Way bulge and an adjacent section of the disk carried out from 2010 on ESO Visible and Infrared Survey Telescope for Astronomy (VISTA). VVV will eventually deliver a deep near-IR atlas with photometry and positions in five passbands (ZYJHKs) and a catalogue of 1-10 million variable point sources - mostly unknown - which require classifications. Aims. The main goal of the VVV Templates Project, that we introduce in this work, is to develop and test the machine-learning algorithms for the automated classification of the VVV light-curves. As VVV is the first massive, multi-epoch survey of stellar variability in the near-infrared, the template light-curves that are required for training the classification algorithms are not available. In the first paper of the series we describe the construction of this comprehensive database of infrared stellar variability. Methods. First we performed a systematic search in the literature and public data archives, second, we coordinated a worldwide observational campaign, and third we exploited the VVV variability database itself on (optically) well-known stars to gather high-quality infrared light-curves of several hundreds of variable stars. Results. We have now collected a significant (and still increasing) number of infrared template light-curves. This database will be used as a training-set for the machine-learning algorithms that will automatically classify the light-curves produced by VVV. The results of such an automated classification will be covered in forthcoming papers of the series.

[18] oai:arXiv.org:1310.1996 [pdf] - 741931

Stellar Variability in the VVV survey

Comments: Presented at the "40 Years of Variable Stars: A Celebration of Contributions by Horace A. Smith" conference (arXiv:1310.0149). 49 pages, 28 figures

Submitted: 2013-10-07, last modified: 2013-11-04

The Vista Variables in the Vía Láctea (VVV) ESO Public Survey is an ongoing time-series, near-infrared (IR) survey of the Galactic bulge and an adjacent portion of the inner disk, covering 562 square degrees of the sky, using ESO's VISTA telescope. The survey has provided superb multi-color photometry in 5 broadband filters ($Z$, $Y$, $J$, $H$, and $K_s$), leading to the best map of the inner Milky Way ever obtained, particularly in the near-IR. The main variability part of the survey, which is focused on $K_s$-band observations, is currently underway, with bulge fields having been observed between 31 and 70 times, and disk fields between 17 and 36 times. When the survey is complete, bulge (disk) fields will have been observed up to a total of 100 (60) times, providing unprecedented depth and time coverage. Here we provide a first overview of stellar variability in the VVV data, including examples of the light curves that have been collected thus far, scientific applications, and our efforts towards the automated classification of VVV light curves.

[19] oai:arXiv.org:1310.7868 [pdf] - 739149

Automatic Classification of Variable Stars in Catalogs with missing data

Pichara, Karim; Protopapas, Pavlos

Submitted: 2013-10-29

We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks, a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilises sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model we use three catalogs with missing data (SAGE, 2MASS and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent, and by 15% for quasar detection, while keeping the computational cost the same.

[20] oai:arXiv.org:1304.0401 [pdf] - 646062

An improved quasar detection method in EROS-2 and MACHO LMC datasets

Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won; Marquette, Jean-Baptiste; Tisserand, Patrick

Submitted: 2013-04-01

We present a new classification method for quasar identification in the EROS-2 and MACHO datasets based on a boosted version of the Random Forest classifier. We use a set of variability features including parameters of a continuous autoregressive model. We show that the continuous autoregressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO) using known quasars found in the LMC. On both the EROS-2 and MACHO training sets our model reaches about 90% precision and 86% recall, improving on the accuracy of state-of-the-art models in quasar detection. We apply the model to the complete EROS-2 and MACHO LMC datasets, which include 28 million objects, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched it with 663 previously known strong candidates, obtaining matches for 74% in MACHO and 40% in EROS-2. The difference in matching level arises mainly because EROS-2 is a slightly shallower survey, which translates to light curves with significantly lower signal-to-noise ratio.
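
As a reading aid for the precision and recall figures quoted in this abstract, the short sketch below shows how the two metrics and their harmonic mean (F1) follow from confusion counts. The function name and the counts are hypothetical, chosen only so that the ratios reproduce the quoted values of about 90% and 86%; they are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts.

    tp: objects correctly labelled as quasars
    fp: non-quasars wrongly labelled as quasars
    fn: quasars the classifier missed
    """
    precision = tp / (tp + fp)   # fraction of predicted quasars that are real
    recall = tp / (tp + fn)      # fraction of real quasars that were recovered
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (an assumption for illustration, not data from the paper).
precision, recall, f1 = precision_recall_f1(tp=774, fp=86, fn=126)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```

With these illustrative counts the F1 score lands between the two quoted values, as expected for a harmonic mean.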

[21] oai:arXiv.org:1105.1119 [pdf] - 369640

The Vista Variables in the Via Lactea (VVV) ESO Public Survey: Current Status and First Results

Comments: 25 pages, 18 figures. To appear in the Carnegie Observatories Astrophysics Series, Volume 5

Submitted: 2011-05-05, last modified: 2011-06-07

Vista Variables in the Via Lactea (VVV) is an ESO Public Survey that is performing a variability survey of the Galactic bulge and part of the inner disk using ESO's Visible and Infrared Survey Telescope for Astronomy (VISTA). The survey covers 520 deg^2 of sky area in the ZYJHK_S filters, for a total observing time of 1929 hours, including ~ 10^9 point sources and an estimated ~ 10^6 variable stars. Here we describe the current status of the VVV Survey, in addition to a variety of new results based on VVV data, including light curves for variable stars, newly discovered globular clusters, open clusters, and associations. A set of reddening-free indices based on the ZYJHK_S system is also introduced. Finally, we provide an overview of the VVV Templates Project, whose main goal is to derive well-defined light curve templates in the near-IR, for the automated classification of VVV light curves.