Full-text search for arXiv

Babu, G. J.

Normalized to: Babu, G.

13 article(s) in total. 77 co-authors, from 1 to 9 common article(s). Median position in authors list is 3.0.

[1] oai:arXiv.org:1712.00356 [pdf] - 2103936

Some Optimizations on Detecting Gravitational Wave Using Convolutional Neural Network

Li, Xiangru; Yu, Woliang; Fan, Xilong; Babu, G. Jogesh

Comments: 13 pages, 8 figures

Submitted: 2017-12-01, last modified: 2020-05-29

This work investigates the problem of detecting gravitational wave (GW) events based on simulated damped sinusoid signals contaminated with white Gaussian noise. It is treated as a classification problem with one class for the interesting events. The proposed scheme consists of the following two successive steps: decomposing the data using a wavelet packet, representing the GW signal and noise using the derived decomposition coefficients; and determining the existence of any GW event using a convolutional neural network (CNN) with a logistic regression output layer. The distinguishing feature of this work is its comprehensive investigation of CNN structure, detection window width, data resolution, wavelet packet decomposition and detection window overlap scheme. Extensive simulation experiments show excellent performance for reliable detection of signals with a range of GW model parameters and signal-to-noise ratios. While we use a simple waveform model in this study, we expect the method to be particularly valuable when the potential GW shapes are too complex to be characterized with a template bank.
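The data setup described in this abstract (a damped sinusoid in white Gaussian noise, cut into overlapping detection windows) can be sketched as follows. This is a minimal illustration only: the function names, parameter values, and the 50% overlap are assumptions made for the example, not the authors' settings, and the wavelet packet and CNN stages are not shown.

```python
import math
import random


def damped_sinusoid(n, amplitude, freq, tau, dt=1.0):
    """Return n samples of amplitude * exp(-t / tau) * sin(2 * pi * freq * t)."""
    return [
        amplitude * math.exp(-i * dt / tau) * math.sin(2 * math.pi * freq * i * dt)
        for i in range(n)
    ]


def add_white_noise(signal, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def overlapping_windows(series, width, step):
    """Cut series into windows of length width; step < width gives overlap."""
    return [
        series[start:start + width]
        for start in range(0, len(series) - width + 1, step)
    ]


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
clean = damped_sinusoid(n=256, amplitude=1.0, freq=0.05, tau=40.0)
noisy = add_white_noise(clean, sigma=0.5, rng=rng)
windows = overlapping_windows(noisy, width=64, step=32)  # 50% overlap
```

Each window would then be passed to the decomposition and classification stages; the window width and the overlap are two of the design choices the abstract says were investigated.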

[2] oai:arXiv.org:2005.13025 [pdf] - 2102989

21st Century Statistical and Computational Challenges in Astrophysics

Feigelson, Eric D.; de Souza, Rafael S.; Ishida, Emille E. O.; Babu, Gutti Jogesh

Comments: Accepted for publication in volume 8 of Annual Reviews of Statistics and Its Application. 26 pages, 7 figures

Submitted: 2020-05-26

Modern astronomy has been rapidly increasing our ability to see deeper into the universe, acquiring enormous samples of cosmic populations. Gaining astrophysical insights from these datasets requires a wide range of sophisticated statistical and machine learning methods. Long-standing problems in cosmology include characterization of galaxy clustering and estimation of galaxy distances from photometric colors. Bayesian inference, central to linking astronomical data to nonlinear astrophysical models, addresses problems in solar physics, properties of star clusters, and exoplanet systems. Likelihood-free methods are growing in importance. Detection of faint signals in complicated noise is needed to find periodic behaviors in stars and detect explosive gravitational wave events. Open issues concern treatment of heteroscedastic measurement errors and understanding probability distributions characterizing astrophysical systems. The field of astrostatistics needs increased collaboration with statisticians in the design and analysis stages of research projects, and to jointly develop new statistical methodologies. Together, they will draw more astrophysical insights into astronomical populations and the cosmos itself.

[3] oai:arXiv.org:1911.02479 [pdf] - 1994455

Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era

Comments: arXiv admin note: substantial text overlap with arXiv:1905.05116

Submitted: 2019-11-04

The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt --- e.g., via the onset of artificial intelligence --- which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists' contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops (https://petabytestoscience.github.io/).

[4] oai:arXiv.org:1905.09852 [pdf] - 1920949

AutoRegressive Planet Search: Application to the Kepler Mission

Caceres, Gabriel A.; Feigelson, Eric D.; Babu, G. Jogesh; Bahamonde, Natalia; Christen, Alejandra; Bertin, Karine; Meza, Cristian; Curé, Michel

Comments: 66 pages with 26 figures and 5 tables to appear in the Astronomical Journal. A version with high-resolution graphics, machine readable tables and FigureSet for the 97KACTs is available at https://drive.google.com/drive/folders/107xZIAj3C0HHqsW66Xc-8FA6O30EncVt?usp=sharing

Submitted: 2019-05-23

The 4-year light curves of 156,717 stars observed with NASA's Kepler mission are analyzed using the AutoRegressive Planet Search (ARPS) methodology described by Caceres et al. (2019). The three stages of processing are: maximum likelihood ARIMA modeling of the light curves to reduce stellar brightness variations; constructing the Transit Comb Filter periodogram to identify transit-like periodic dips in the ARIMA residuals; Random Forest classification trained on Kepler Team confirmed planets using several dozen features from the analysis. Orbital periods between 0.2 and 100 days are examined. The result is a recovery of 76% of confirmed planets, 97% when period and transit depth constraints are added. The classifier is then applied to the full Kepler dataset; 1,004 previously noticed and 97 new stars have light curve criteria consistent with the confirmed planets, after subjective vetting removes clear False Alarms and False Positive cases. The 97 Kepler ARPS Candidate Transits mostly have periods $P<10$ days; many are UltraShort Period hot planets with radii $<1$% of the host star. Extensive tabular and graphical output from the ARPS time series analysis is provided to assist in other research relating to the Kepler sample.

[5] oai:arXiv.org:1901.05116 [pdf] - 1920819

AutoRegressive Planet Search: Methodology

Caceres, Gabriel A.; Feigelson, Eric D.; Babu, G. Jogesh; Bahamonde, Natalia; Christen, Alejandra; Bertin, Karine; Meza, Cristian; Curé, Michel

Comments: 40 pages, 12 figures, to appear in the Astronomical Journal

Submitted: 2019-01-15, last modified: 2019-05-14

The detection of periodic signals from transiting exoplanets is often impeded by extraneous aperiodic photometric variability, either intrinsic to the star or arising from the measurement process. Frequently, these variations are autocorrelated wherein later flux values are correlated with previous ones. In this work, we present the methodology of the Autoregressive Planet Search (ARPS) project which uses Autoregressive Integrated Moving Average (ARIMA) and related statistical models that treat a wide variety of stochastic processes, as well as nonstationarity, to improve detection of new planetary transits. Provided that a time series is evenly spaced or can be placed on an evenly spaced grid with missing values, these low-dimensional parametric models can prove very effective. We introduce a planet-search algorithm to detect periodic transits in the residuals after the application of ARIMA models. Our matched-filter algorithm, the Transit Comb Filter (TCF), is closely related to the traditional Box-fitting Least Squares and provides an analogous periodogram. Finally, if a previously identified or simulated sample of planets is available, selected scalar features from different stages of the analysis -- the original light curves, ARIMA fits, TCF periodograms, and folded light curves -- can be collectively used with a multivariate classifier to identify promising candidates while efficiently rejecting false alarms. We use Random Forests for this task, in conjunction with Receiver Operating Characteristic (ROC) curves, to define discovery criteria for new, high fidelity planetary candidates. The ARPS methodology can be applied to both evenly spaced satellite light curves and densely cadenced ground-based photometric surveys.
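The idea of removing autocorrelated variability before searching the residuals can be illustrated with the simplest member of the ARIMA family, an AR(1) model fitted by least squares. This is a sketch under stated assumptions: the helper names and the synthetic series are hypothetical, ARPS itself uses maximum likelihood ARIMA fits, and the Transit Comb Filter is not reproduced here.

```python
import random


def fit_ar1(x):
    """Least-squares estimate of phi in (x[t] - mean) = phi * (x[t-1] - mean) + e[t]."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    num = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    den = sum(centered[t - 1] ** 2 for t in range(1, len(centered)))
    return mean, num / den


def ar1_residuals(x, mean, phi):
    """One-step-ahead residuals of the fitted AR(1) model."""
    return [(x[t] - mean) - phi * (x[t - 1] - mean) for t in range(1, len(x))]


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Synthetic autocorrelated series standing in for a light curve (assumption).
rng = random.Random(1)
phi_true = 0.8
series = [0.0]
for _ in range(1999):
    series.append(phi_true * series[-1] + rng.gauss(0.0, 1.0))

mean, phi_hat = fit_ar1(series)
residuals = ar1_residuals(series, mean, phi_hat)
```

The residuals have a smaller variance than the input series because the autocorrelated component has been modeled away, which is the property that makes a subsequent periodic-dip search on the residuals more sensitive.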

[6] oai:arXiv.org:1901.08003 [pdf] - 1820264

Autoregressive Times Series Methods for Time Domain Astronomy

Feigelson, Eric D.; Babu, G. Jogesh; Caceres, Gabriel A.

Comments: 17 pages, 4 figures, published in 'Frontiers of Physics', vol 6, id. 80 (2018)

Submitted: 2019-01-23

Celestial objects exhibit a wide range of variability in brightness at different wavebands. Surprisingly, the most common method for characterizing time series in statistics -- parametric autoregressive modeling -- is rarely used to interpret astronomical light curves. We review standard ARMA, ARIMA and ARFIMA (autoregressive moving average fractionally integrated) models that treat short-memory autocorrelation, long-memory $1/f^\alpha$ `red noise', and nonstationary trends. Though designed for evenly spaced time series, moderately irregular cadences can be treated as evenly spaced time series with missing data. Fitting algorithms are efficient and software implementations are widely available. We apply ARIMA models to light curves of four variable stars, discussing their effectiveness for different temporal characteristics. A variety of extensions to ARIMA are outlined, with emphasis on recently developed continuous-time models like CARMA and CARFIMA designed for irregularly spaced time series. Strengths and weaknesses of ARIMA-type modeling for astronomical data analysis and astrophysical insights are reviewed.

[7] oai:arXiv.org:1302.0387 [pdf] - 1159458

VOStat: A Statistical Web Service for Astronomers

Chakraborty, Arnab; Feigelson, Eric D.; Babu, G. Jogesh

Submitted: 2013-02-02

VOStat is a Web service providing interactive statistical analysis of astronomical tabular datasets. It is integrated into the suite of analysis and visualization tools associated with the international Virtual Observatory (VO) through the SAMP communication system. A user supplies VOStat with a dataset extracted from the VO, or otherwise acquired, and chooses among $\sim 60$ statistical functions. These include data transformations, plots and summaries, density estimation, one- and two-sample hypothesis tests, global and local regressions, multivariate analysis and clustering, spatial analysis, directional statistics, survival analysis (for censored data like upper limits), and time series analysis. The statistical operations are performed using the public domain R statistical software environment, including a small fraction of its $>4000$ CRAN add-on packages. The purpose of VOStat is to facilitate a wider range of statistical analyses than are commonly used in astronomy, and to promote use of more advanced methodology in R and CRAN.

[8] oai:arXiv.org:1211.5602 [pdf] - 1158005

The Astrophysical Multimessenger Observatory Network (AMON)

Comments: 32 pages, 4 figures

Submitted: 2012-11-23

We summarize the science opportunity, design elements, current and projected partner observatories, and anticipated science returns of the Astrophysical Multimessenger Observatory Network (AMON). AMON will link multiple current and future high-energy, multimessenger, and follow-up observatories together into a single network, enabling near real-time coincidence searches for multimessenger astrophysical transients and their electromagnetic counterparts. Candidate and high-confidence multimessenger transient events will be identified, characterized, and distributed as AMON alerts within the network and to interested external observers, leading to follow-up observations across the electromagnetic spectrum. In this way, AMON aims to evoke the discovery of multimessenger transients from within observatory subthreshold data streams and facilitate the exploitation of these transients for purposes of astronomy and fundamental physics. As a central hub of global multimessenger science, AMON will also enable cross-collaboration analyses of archival datasets in search of rare or exotic astrophysical phenomena.

[9] oai:arXiv.org:1205.2064 [pdf] - 510411

Statistical Methods for Astronomy

Feigelson, Eric D.; Babu, G. Jogesh

Comments: 48 pages, 2 figures. Adapted from `Statistical Methods for Astronomy' to appear in `Astronomical Techniques, Software, and Data', volume 2 (Howard Bond, editor) of `Planets, Stars, and Stellar Systems' (Terry Ostwalt, editor) to be published by Springer Science+Business Media

Submitted: 2012-05-09

This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain time series analysis for light curves; and spatial statistics to interpret the spatial distributions of points in low dimensions. Two types of resources are presented: about 40 recommended texts and monographs in various fields of statistics, and the public domain R software system for statistical analysis. Together with its $\sim 3500$ (and growing) add-on CRAN packages, R implements a vast range of statistical procedures in a coherent high-level language with advanced graphics.
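The bootstrap mentioned in this abstract can be illustrated with a short sketch that estimates the standard error of a sample median, a statistic whose sampling distribution is awkward to derive analytically. The function name, the synthetic sample, and the number of resamples are assumptions made for the example and are not taken from the review.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples, rng):
    """Estimate the standard error of statistic by resampling data with replacement."""
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # same size as the original sample
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Synthetic measurements (assumption): 200 draws from a Gaussian.
rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
se_median = bootstrap_se(sample, statistics.median, n_resamples=500, rng=rng)
```

The same function works for any scalar statistic passed in as a callable, which is the practical appeal of the method when no closed-form error estimate is available.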

[10] oai:arXiv.org:0908.4056 [pdf] - 27697

A statistical model for the relation between exoplanets and their host stars

Martinez-Gomez, E.; Babu, G. J.

Comments: 6 pages, Proceedings of the 24th International Workshop on Statistical Modelling, Cornell University, Ithaca NY, July 20-24 2009

Submitted: 2009-08-27

A general model is proposed to explain the relation between the extrasolar planets (or exoplanets) detected until June 2008 and the main characteristics of their host stars through statistical techniques. The main goal is to establish a mathematical relation among the set of variables which better describe the physical characteristics of the host star and the planet itself. The host star is characterized by its distance, age, effective temperature, mass, metallicity, radius and magnitude. The exoplanet is described through its physical parameters (radius and mass) and its orbital parameters (distance, period, eccentricity, inclination and major semiaxis). As a first approach we consider that only the mass of the exoplanet is being determined by the physical properties of its host star. The proposed model is then validated through statistical analysis. Finally we discuss the categorical behavior of the dependent variable through binary models.

[11] oai:arXiv.org:astro-ph/0612707 [pdf] - 88045

Object detection in multi-epoch data

Babu, G. Jogesh; Mahabal, Ashish; Djorgovski, S. G.; Williams, R.

Comments: 6 pages, 2 figures, to appear in ADA IV proceedings

Submitted: 2006-12-22

In astronomy, multiple images are frequently obtained at the same position of the sky for follow-up co-addition, as it helps one go deeper and look for fainter objects. With large scale panchromatic synoptic surveys becoming more common, image co-addition has become even more necessary as new observations start to get compared with co-added fiducial sky in real time. The standard co-addition techniques have included straight averages, variance weighted averages, medians, etc. A more sophisticated nonlinear response chi-square method is also used when it is known that the data are background noise limited and the point spread function is homogenized in all channels. A more robust object detection technique capable of detecting faint sources, even those not seen at all epochs which will normally be smoothed out in traditional methods, is described. The analysis at each pixel level is based on a formula similar to Mahalanobis distance. The method does not depend on the point spread function.
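To illustrate why a Mahalanobis-type statistic can retain a source seen in only one epoch, the sketch below compares a straight average with a squared Mahalanobis distance from the background level for a single pixel. It assumes independent epochs (a diagonal covariance matrix) and uses hypothetical flux values; the paper's actual formula is only described as similar to Mahalanobis distance and may differ.

```python
def mahalanobis_sq_diag(values, means, sigmas):
    """Squared Mahalanobis distance for independent components (diagonal covariance)."""
    return sum(((v - m) / s) ** 2 for v, m, s in zip(values, means, sigmas))


# Hypothetical background-subtracted fluxes of one pixel over four epochs;
# a transient appears only in the third epoch.
epochs = [0.1, -0.2, 5.0, 0.0]
background = [0.0, 0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0, 1.0]

d2 = mahalanobis_sq_diag(epochs, background, sigma)
mean_flux = sum(epochs) / len(epochs)
```

In this example the straight average dilutes the single bright epoch by the number of epochs, while the distance statistic is dominated by it.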

[12] oai:arXiv.org:astro-ph/0401404 [pdf] - 880699

Statistical Challenges in Modern Astronomy

Feigelson, E. D.; Babu, G. J.

Comments: Talk from PhyStat2003, Stanford, Ca, USA, September 2003, 7 pages. PSN MOAT001

Submitted: 2004-01-20

Despite centuries of close association, statistics and astronomy are surprisingly distant today. Most observational astronomical research relies on an inadequate toolbox of methodological tools. Yet the needs are substantial: astronomy encounters sophisticated problems involving sampling theory, survival analysis, multivariate classification and analysis, time series analysis, wavelet analysis, spatial point processes, nonlinear regression, bootstrap resampling and model selection. We review the recent resurgence of astrostatistical research, and outline new challenges raised by the emerging Virtual Observatory. Our essay ends with a list of research challenges and infrastructure for astrostatistics in the coming decade.

[13] oai:arXiv.org:astro-ph/9802085 [pdf] - 100256

Three types of gamma-ray bursts

Mukherjee, Soma; Feigelson, Eric D.; Babu, Gutti Jogesh; Murtagh, Fionn; Fraley, Chris; Raftery, Adrian

Comments: 24 pages, 5 figures, 7 tables. Submitted to the Astrophysical Journal in February 1998

Submitted: 1998-02-07

A multivariate analysis of gamma-ray burst (GRB) bulk properties is presented to discriminate between distinct classes of GRBs. Several variables representing burst duration, fluence and spectral hardness are considered. Two multivariate clustering procedures are used on a sample of 797 bursts from the Third BATSE Catalog: a nonparametric average linkage hierarchical agglomerative clustering procedure validated with Wilks' $\Lambda^*$ and other MANOVA tests; and a parametric maximum likelihood model-based clustering procedure assuming multinormal populations calculated with the EM Algorithm and validated with the Bayesian Information Criterion. The two methods yield very similar results. The BATSE GRB population consists of three classes with the following Duration/Fluence/Spectrum bulk properties: Class I with long/bright/intermediate bursts, Class II with short/faint/hard bursts, and Class III with intermediate/intermediate/soft bursts. One outlier with poor data is also present. Classes I and II correspond to those reported by Kouveliotou et al. (1993), but Class III is clearly defined here for the first time.
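For reference, the Bayesian Information Criterion used to validate the model-based clustering can be written in one common form, shown below. The symbols are assumptions introduced here: $\hat{L}_k$ is the maximized likelihood of the mixture model with $k$ classes, $p_k$ its number of free parameters, and $n$ the number of bursts. Sign conventions vary between authors, and the abstract does not state which one the paper uses.

```latex
\mathrm{BIC}_k = 2 \ln \hat{L}_k - p_k \ln n
```

Under this convention the preferred number of classes is the $k$ with the largest $\mathrm{BIC}_k$; under the opposite sign convention, $-2 \ln \hat{L}_k + p_k \ln n$, the smallest value is preferred.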