van Dyk, David A.

Normalized to: Van Dyk, D.

36 article(s) in total. 132 co-authors, from 1 to 17 common article(s). Median position in authors list is 3.0.

[1]  oai:arXiv.org:1706.03811  [pdf] - 2074374
STACCATO: A Novel Solution to Supernova Photometric Classification with Biased Training Sets
Comments: Matches version accepted by MNRAS, v3 only changes metadata to point to zenodo repository for the code. The STACCATO code is available from https://doi.org/10.5281/zenodo.3701464
Submitted: 2017-06-12, last modified: 2020-04-02
We present a new solution to the problem of classifying Type Ia supernovae from their light curves alone given a spectroscopically confirmed but biased training set, circumventing the need to obtain an observationally expensive unbiased training set. We use Gaussian processes (GPs) to model the supernovae's (SN) light curves, and demonstrate that the choice of covariance function has only a small influence on the GPs' ability to accurately classify SNe. We extend and improve the approach of Richards et al. (2012) -- a diffusion map combined with a random forest classifier -- to deal specifically with the case of biased training sets. We propose a novel method, called STACCATO ('SynThetically Augmented Light Curve ClassificATiOn'), that synthetically augments a biased training set by generating additional training data from the fitted GPs. Key to the success of the method is the partitioning of the observations into subgroups based on their propensity score of being included in the training set. Using simulated light curve data, we show that STACCATO increases performance, as measured by the area under the Receiver Operating Characteristic curve (AUC), from 0.93 to 0.96, close to the AUC of 0.977 obtained using the 'gold standard' of an unbiased training set and significantly improving on the previous best result of 0.88. STACCATO also increases the true positive rate for SNIa classification by up to a factor of 50 for high-redshift/low brightness SNe.
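As a reminder of the metric quoted in this abstract, the AUC equals the probability that a randomly chosen positive (here, a SNIa) receives a higher classifier score than a randomly chosen negative. The following is a minimal sketch of that definition only; the function name `auc` and the scores are hypothetical, and this is not the STACCATO code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Illustrative scores only (not taken from the paper).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the sample size; it is used here because it matches the probabilistic definition directly.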
[2]  oai:arXiv.org:1803.03858  [pdf] - 1904617
Testing One Hypothesis Multiple Times: The Multidimensional Case
Comments:
Submitted: 2018-03-10, last modified: 2019-06-23
The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models, are among the most challenging problems in statistical practice. These challenges can be tackled using a test of hypothesis where a nuisance parameter is present only under the alternative, and a computationally efficient solution can be obtained by the "Testing One Hypothesis Multiple times" (TOHM) method. In the one-dimensional setting, a fine discretization of the space of the non-identifiable parameter is specified, and a global p-value is obtained by approximating the distribution of the supremum of the resulting stochastic process. In this paper, we propose a computationally efficient inferential tool to perform TOHM in the multidimensional setting. Here, the approximations of interest typically involve the expected Euler Characteristics (EC) of the excursion set of the underlying random field. We introduce a simple algorithm to compute the EC in multiple dimensions and for arbitrarily large significance levels. This leads to a highly generalizable computational tool to perform inference under non-standard regularity conditions.
[3]  oai:arXiv.org:1903.06796  [pdf] - 1850859
Astro2020 Science White Paper: The Next Decade of Astroinformatics and Astrostatistics
Comments: Submitted to the Astro2020 Decadal Survey call for science white papers
Submitted: 2019-03-15
Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy, has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.
[4]  oai:arXiv.org:1802.01233  [pdf] - 1833971
Multidimensional Data Driven Classification of Emission-line Galaxies
Comments:
Submitted: 2018-02-04, last modified: 2019-02-07
We propose a new soft clustering scheme for classifying galaxies in different activity classes using simultaneously 4 emission-line ratios: log([NII]/Ha), log([SII]/Ha), log([OI]/Ha) and log([OIII]/Hb). We fit 20 multivariate Gaussian distributions to the 4-dimensional distribution of these lines obtained from the Sloan Digital Sky Survey (SDSS) in order to capture local structures and subsequently group the multivariate Gaussian distributions to represent the complex multi-dimensional structure of the joint distribution of galaxy spectra in the 4-dimensional line ratio space. The main advantages of this method are the use of all four optical-line ratios simultaneously and the adoption of a clustering scheme. This maximises the available information, avoids contradicting classifications, and treats each class as a distribution resulting in soft classification boundaries and providing the probability for an object to belong to each class. We also introduce linear multi-dimensional decision surfaces using support vector machines based on the classification of our soft clustering scheme. This linear multi-dimensional hard clustering technique shows high classification accuracy with respect to our soft-clustering scheme.
[5]  oai:arXiv.org:1809.06173  [pdf] - 1775689
Incorporating Uncertainties in Atomic Data Into the Analysis of Solar and Stellar Observations: A Case Study in Fe XIII
Comments: in press at ApJ
Submitted: 2018-09-17
Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. Ratios of emission lines, for example, can be used to infer the electron density of the emitting plasma. Similarly, the relative intensities of emission lines formed over a wide range of temperatures yield information on the temperature structure. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters. At present, however, atomic physics databases do not include uncertainties on the atomic parameters and there is no established methodology for using them even if they did. In this paper we develop simple models for the uncertainties in the collision strengths and decay rates for Fe XIII and apply them to the interpretation of density sensitive lines observed with the EUV Imaging Spectrometer (EIS) on Hinode. We incorporate these uncertainties in a Bayesian framework. We consider both a pragmatic Bayesian method where the atomic physics information is unaffected by the observed data, and a fully Bayesian method where the data can be used to probe the physics. The former generally increases the uncertainty in the inferred density by about a factor of 5 compared with models that incorporate only statistical uncertainties. The latter reduces the uncertainties on the inferred densities, but identifies areas of possible systematic problems with either the atomic physics or the observed intensities.
[6]  oai:arXiv.org:1612.04417  [pdf] - 1807649
Projected distances to host galaxy reduce SNIa dispersion
Comments: Major revision including new fits to the host galaxy images and checks of robustness of results. Statistical results are strengthened. Matched version accepted by MNRAS
Submitted: 2016-12-13, last modified: 2018-09-07
We use multi-band imagery data from the Sloan Digital Sky Survey (SDSS) to measure projected distances of 302 supernova type Ia (SNIa) from the centre of their host galaxies, normalized to the galaxy's brightness scale length, with a Bayesian approach. We test the hypothesis that SNIas further away from the centre of their host galaxy are less subject to dust contamination (as the dust column density in their environment is smaller) and/or come from a more homogeneous environment. Using the Mann-Whitney U test, we find a statistically significant difference in the observed colour correction distribution between SNIas that are near and those that are far from the centre of their host. The local p-value is 3 x 10^{-3}, which is significant at the 5 per cent level after look-elsewhere effect correction. We estimate the residual scatter of the two subgroups to be 0.073 +/- 0.018 for the far SNIas, compared to 0.114 +/- 0.009 for the near SNIas -- an improvement of 30 per cent, albeit with a low statistical significance of 2sigma. This confirms the importance of host galaxy properties in correctly interpreting SNIa observations for cosmological inference.
[7]  oai:arXiv.org:1806.06733  [pdf] - 1717192
Bayesian Hierarchical Modelling of Initial-Final Mass Relations Across Star Clusters
Comments: 29 pages, 12 figures
Submitted: 2018-06-18, last modified: 2018-07-17
The initial-final mass relation (IFMR) of white dwarfs (WDs) plays an important role in stellar evolution. To derive precise estimates of IFMRs and explore how they may vary among star clusters, we propose a Bayesian hierarchical model that pools photometric data from multiple star clusters. After performing a simulation study to show the benefits of the Bayesian hierarchical model, we apply this model to five star clusters: the Hyades, M67, NGC 188, NGC 2168, and NGC 2477, leading to reasonable and consistent estimates of IFMRs for these clusters. We illustrate how a cluster-specific analysis of NGC 188 using its own photometric data can produce an unreasonable IFMR since its WDs have a narrow range of zero-age main sequence (ZAMS) masses. However, the Bayesian hierarchical model corrects the cluster-specific analysis by borrowing strength from other clusters, thus generating more reliable estimates of IFMR parameters. The data analysis presents the benefits of Bayesian hierarchical modelling over conventional cluster-specific methods, which motivates us to elaborate the powerful statistical techniques in this article.
[8]  oai:arXiv.org:1703.09164  [pdf] - 1700864
A Hierarchical Model for the Ages of Galactic Halo White Dwarfs
Comments: 22 pages
Submitted: 2017-03-27, last modified: 2018-06-18
In astrophysics, we often aim to estimate one or more parameters for each member object in a population and study the distribution of the fitted parameters across the population. In this paper, we develop novel methods that allow us to take advantage of existing software designed for such case-by-case analyses to simultaneously fit parameters of both the individual objects and the parameters that quantify their distribution across the population. Our methods are based on Bayesian hierarchical modelling, which is known to produce parameter estimators for the individual objects that are on average closer to their true values than estimators based on case-by-case analyses. We verify this in the context of estimating ages of Galactic halo white dwarfs (WDs) via a series of simulation studies. Finally, we deploy our new techniques on optical and near-infrared photometry of ten candidate halo WDs to obtain estimates of their ages along with an estimate of the mean age of Galactic halo WDs of [11.25, 12.96] Gyr. Although this sample is small, our technique lays the groundwork for large-scale studies using data from the Gaia mission.
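The shrinkage property mentioned in this abstract can be illustrated with the textbook normal-normal model. This is a generic sketch, not the paper's method: the population mean and variance are assumed known here (in a full hierarchical fit they would be estimated), and the name `shrink` and all numbers are hypothetical.

```python
def shrink(estimates, variances, pop_mean, pop_var):
    """Posterior means when x_i ~ N(theta_i, variances[i]) and
    theta_i ~ N(pop_mean, pop_var), with pop_mean and pop_var known."""
    shrunk = []
    for x, v in zip(estimates, variances):
        weight = pop_var / (pop_var + v)  # weight on the object's own estimate
        shrunk.append(weight * x + (1.0 - weight) * pop_mean)
    return shrunk


# Hypothetical case-by-case age estimates (Gyr) and their measurement variances.
ages = [9.0, 12.0, 15.0]
errors_sq = [4.0, 1.0, 4.0]
print(shrink(ages, errors_sq, pop_mean=12.0, pop_var=1.0))
```

Noisier estimates get a smaller weight and are pulled more strongly toward the population mean, which is the sense in which the individual estimators "borrow strength" from the population.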
[9]  oai:arXiv.org:1702.08856  [pdf] - 1540614
The ACS Survey of Galactic Globular Clusters XIV: Bayesian Single-Population Analysis of 69 Globular Clusters
Comments: Accepted: MNRAS 20 pages, 13 figures, 5 tables
Submitted: 2017-02-28
We use Hubble Space Telescope (HST) imaging from the ACS Treasury Survey to determine fits for single population isochrones of 69 Galactic globular clusters. Using robust Bayesian analysis techniques, we simultaneously determine ages, distances, absorptions, and helium values for each cluster under the scenario of a "single" stellar population on model grids with solar ratio heavy element abundances. The set of cluster parameters is determined in a consistent and reproducible manner for all clusters using the Bayesian analysis suite BASE-9. Our results are used to re-visit the age-metallicity relation. We find correlations with helium and several other parameters such as metallicity, binary fraction, and proxies for cluster mass. The helium abundances of the clusters are also considered in the context of CNO abundances and the multiple population scenario.
[10]  oai:arXiv.org:1602.01462  [pdf] - 1579811
Bayesian Estimates of Astronomical Time Delays between Gravitationally Lensed Stochastic Light Curves
Comments: Accepted for publication in the Annals of Applied Statistics
Submitted: 2016-02-02, last modified: 2017-01-30
The gravitational field of a galaxy can act as a lens and deflect the light emitted by a more distant object such as a quasar. Strong gravitational lensing causes multiple images of the same quasar to appear in the sky. Since the light in each gravitationally lensed image traverses a different path length from the quasar to the Earth, fluctuations in the source brightness are observed in the several images at different times. The time delay between these fluctuations can be used to constrain cosmological parameters and can be inferred from the time series of brightness data or light curves of each image. To estimate the time delay, we construct a model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. We account for microlensing, an additional source of independent long-term extrinsic variability, via a polynomial regression. Our Bayesian strategy adopts a Metropolis-Hastings within Gibbs sampler. We improve the sampler by using an ancillarity-sufficiency interweaving strategy and adaptive Markov chain Monte Carlo. We introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The Bayesian and profile likelihood approaches complement each other, producing almost identical results; the Bayesian method is more principled but the profile likelihood is simpler to implement. We demonstrate our estimation strategy using simulated data of doubly- and quadruply-lensed quasars, and observed data from quasars Q0957+561 and J1029+2623.
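The latent process named in this abstract, an Ornstein-Uhlenbeck (O-U) process observed at irregular times, can be simulated exactly because its transition density is Gaussian. The sketch below assumes the parameterization dX = -(X - mu)/tau dt + sigma dB; the function name `simulate_ou` and the parameter values are hypothetical, and the paper's microlensing term and time-delay shift are not included.

```python
import math
import random


def simulate_ou(times, mu, sigma, tau, rng):
    """Draw an O-U path at the given strictly increasing (possibly irregular) times.

    Assumed model: dX = -(X - mu)/tau dt + sigma dB.
    """
    stat_var = sigma ** 2 * tau / 2.0          # stationary variance
    x = rng.gauss(mu, math.sqrt(stat_var))     # start from the stationary law
    path = [x]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        decay = math.exp(-dt / tau)
        mean = mu + (x - mu) * decay           # conditional mean after a gap dt
        var = stat_var * (1.0 - decay ** 2)    # conditional variance after a gap dt
        x = rng.gauss(mean, math.sqrt(var))
        path.append(x)
    return path


# Irregular observation epochs (days) and illustrative parameters.
epochs = [0.0, 0.7, 1.1, 4.0, 4.2, 9.5]
print(simulate_ou(epochs, mu=18.0, sigma=0.1, tau=100.0, rng=random.Random(42)))
```

Because each step uses the exact conditional distribution, no time discretization error is introduced regardless of how uneven the gaps are.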
[11]  oai:arXiv.org:1602.03765  [pdf] - 1530475
On methods for correcting for the look-elsewhere effect in searches for new physics
Comments:
Submitted: 2016-02-11, last modified: 2016-12-15
The search for new significant peaks over an energy spectrum often involves a statistical multiple hypothesis testing problem. Separate tests of hypothesis are conducted at different locations producing an ensemble of local p-values, the smallest of which is reported as evidence for the new resonance. Unfortunately, controlling the false detection rate (type I error rate) of such procedures may lead to excessively stringent acceptance criteria. In the recent physics literature, two promising statistical tools have been proposed to overcome these limitations. In 2005, a method to "find needles in haystacks" was introduced by Pilla et al. [1], and a second method was later proposed by Gross and Vitells [2] in the context of the "look elsewhere effect" and trial factors. We show that, for relatively small sample sizes, the former leads to an artificial inflation of statistical power that stems from an increase in the false detection rate, whereas the two methods exhibit similar performance for large sample sizes. We apply the methods to realistic simulations of the Fermi Large Area Telescope data, in particular the search for dark matter annihilation lines. Further, we discuss the counter-intuitive scenario where the look-elsewhere corrections are more conservative than much more computationally efficient corrections for multiple hypothesis testing. Finally, we provide general guidelines for navigating the tradeoffs between statistical and computational efficiency when selecting a statistical procedure for signal detection.
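For orientation, the simplest generic corrections that turn the smallest local p-value into a global one are the Bonferroni bound and the Sidak formula. The sketch below shows only these two textbook corrections; it is not the method of Pilla et al. or of Gross and Vitells. The Sidak formula assumes independent tests, which neighbouring search locations will generally violate, and the function names and p-values are hypothetical.

```python
def bonferroni_global_p(local_pvalues):
    """Upper bound on the global p-value; valid under any dependence."""
    n_tests = len(local_pvalues)
    return min(1.0, n_tests * min(local_pvalues))


def sidak_global_p(local_pvalues):
    """Global p-value assuming the tests are independent (an assumption)."""
    n_tests = len(local_pvalues)
    return 1.0 - (1.0 - min(local_pvalues)) ** n_tests


# Hypothetical local p-values from a scan over four search locations.
local = [0.2, 0.003, 0.5, 0.04]
print(bonferroni_global_p(local), sidak_global_p(local))
```

The Sidak value never exceeds the Bonferroni bound, and the two nearly coincide when the smallest local p-value is small relative to one over the number of tests.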
[12]  oai:arXiv.org:1611.00835  [pdf] - 1510413
A Bayesian Analysis of the Ages of Four Open Clusters
Comments: 13 pages, 8 figures, published in ApJ
Submitted: 2016-11-02
In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional "by-eye" isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02 +/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.
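The abstract does not say which Markov chain Monte Carlo variant is used, so the following is only a generic random-walk Metropolis sampler for a one-dimensional target, included to make the idea concrete. The function name `metropolis`, the toy Gaussian log-density, and the tuning values are all hypothetical and unrelated to the isochrone-fitting code.

```python
import math
import random


def metropolis(log_density, start, step, n_samples, rng):
    """Random-walk Metropolis: returns a list of n_samples draws."""
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def toy_log_density(a):
    """Unnormalized log-density of N(3, 1), standing in for a posterior."""
    return -0.5 * (a - 3.0) ** 2


draws = metropolis(toy_log_density, start=0.0, step=1.0,
                   n_samples=20000, rng=random.Random(1))
kept = draws[1000:]                  # discard burn-in
print(sum(kept) / len(kept))         # should be close to 3
```

A posterior mean and interval would be summarized from `kept` in the same way that the quoted ages and uncertainties summarize the chains for each cluster.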
[13]  oai:arXiv.org:1609.03425  [pdf] - 1531556
Detecting Relativistic X-ray Jets in High-Redshift Quasars
Comments: 42 pages, 14 figures, submitted to ApJ
Submitted: 2016-09-12
We analyze Chandra X-ray images of a sample of 11 quasars that are known to contain kiloparsec scale radio jets. The sample consists of five high-redshift (z >= 3.6) flat-spectrum radio quasars, and six intermediate redshift (2.1 < z < 2.9) quasars. The dataset includes four sources with integrated steep radio spectra and seven with flat radio spectra. A total of 25 radio jet features are present in this sample. We apply a Bayesian multi-scale image reconstruction method to detect and measure the X-ray emission from the jets. We compute deviations from a baseline model that does not include the jet, and compare observed X-ray images with those computed with simulated images where no jet features exist. This allows us to compute p-value upper bounds on the significance that an X-ray jet is detected in a pre-determined region of interest. We detected 12 of the features unambiguously, and an additional 6 marginally. We also find residual emission in the cores of 3 quasars and in the background of 1 quasar that suggest the existence of unresolved X-ray jets. The dependence of the X-ray to radio luminosity ratio on redshift is a potential diagnostic of the emission mechanism, since the inverse Compton scattering of cosmic microwave background photons (IC/CMB) is thought to be redshift dependent, whereas in synchrotron models no clear redshift dependence is expected. We find that the high-redshift jets have X-ray to radio flux ratios that are marginally inconsistent with those from lower redshifts, suggesting that either the X-ray emission is due to the IC/CMB rather than the synchrotron process, or that high redshift jets are qualitatively different.
[14]  oai:arXiv.org:1609.01527  [pdf] - 1483531
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters
Comments: 17 pages, 11 figures, 4 tables. Accepted, MNRAS
Submitted: 2016-09-06
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on inconsistencies between the theoretical models and the observed data.
[15]  oai:arXiv.org:1605.08064  [pdf] - 1483355
Standardizing Type Ia supernovae using Near Infrared rebrightening time
Comments: 6 pages, 2 Figures
Submitted: 2016-05-25
Accurate standardisation of Type Ia supernovae (SNIa) is instrumental to the usage of SNIa as distance indicators. We analyse a homogeneous sample of 22 low-z SNIa, observed by the Carnegie Supernova Project (CSP) in the optical and near infra-red (NIR). We study the time of the second peak in the NIR band due to re-brightening, t2, as an alternative standardisation parameter of SNIa peak brightness. We use BAHAMAS, a Bayesian hierarchical model for SNIa cosmology, to determine the residual scatter in the Hubble diagram. We find that in the absence of a colour correction, t2 is a better standardisation parameter compared to stretch: t2 has a 1 sigma posterior interval for the Hubble residual scatter of [0.250, 0.257], compared to [0.280, 0.287] when stretch (x1) alone is used. We demonstrate that when employed together with a colour correction, t2 and stretch lead to similar residual scatter. Using colour, stretch and t2 jointly as standardisation parameters does not result in any further reduction in scatter, suggesting that t2 carries redundant information with respect to stretch and colour. With a much larger SNIa NIR sample at higher redshift in the future, t2 could be a useful quantity to perform robustness checks of the standardisation procedure.
[16]  oai:arXiv.org:1605.02810  [pdf] - 1403836
The Power of Principled Bayesian Methods in the Study of Stellar Evolution
Comments: 21 pages, 12 figures in The Ages of Stars, Edited by Y. Lebreton, D. Valls-Gabaud and C. Charbonnel. EAS Publications Series, Volume 65, 2014, pp.267-287
Submitted: 2016-05-09
It takes years of effort employing the best telescopes and instruments to obtain high-quality stellar photometry, astrometry, and spectroscopy. Stellar evolution models contain the experience of lifetimes of theoretical calculations and testing. Yet most astronomers fit these valuable models to these precious datasets by eye. We show that a principled Bayesian approach to fitting models to stellar data yields substantially more information over a range of stellar astrophysics. We highlight advances in determining the ages of star clusters, mass ratios of binary stars, limitations in the accuracy of stellar models, post-main-sequence mass loss, and the ages of individual white dwarfs. We also outline a number of unsolved problems that would benefit from principled Bayesian analyses.
[17]  oai:arXiv.org:1604.06073  [pdf] - 1443898
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters I: Statistical and Computational Methods
Comments: 18 pages, 9 figures, 2 tables. To be published in The Astrophysical Journal
Submitted: 2016-04-20, last modified: 2016-04-21
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (van Dyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties (age, metallicity, helium abundance, distance, absorption, and initial mass) are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).
[18]  oai:arXiv.org:1604.06074  [pdf] - 1443899
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters II: NGC 5024, NGC 5272, and NGC 6352
Comments: ApJ, 21 pages, 14 figures, 7 tables
Submitted: 2016-04-20
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from $\sim$0.05 to 0.11 for these three clusters. Model grids with solar $\alpha$-element abundances ([$\alpha$/Fe] =0.0) and enhanced $\alpha$-elements ([$\alpha$/Fe]=0.4) are adopted.
[19]  oai:arXiv.org:1510.05954  [pdf] - 1483226
BAHAMAS: new SNIa analysis reveals inconsistencies with standard cosmology
Comments: Additional figures showing constraints from sub-samples from JLA. Matches ApJ accepted version
Submitted: 2015-10-20, last modified: 2016-04-18
We present results obtained by applying our BAyesian HierArchical Modeling for the Analysis of Supernova cosmology (BAHAMAS) software package to the 740 spectroscopically confirmed supernovae type Ia (SNIa) from the "Joint Light-curve Analysis" (JLA) dataset. We simultaneously determine cosmological parameters and standardization parameters, including host galaxy mass corrections, residual scatter and object-by-object intrinsic magnitudes. Combining JLA and Planck Cosmic Microwave Background data, we find significant discrepancies in cosmological parameter constraints with respect to the standard analysis: we find Omega_M = 0.399+/-0.027, 2.8\sigma\ higher than previously reported and w = -0.910+/-0.045, 1.6\sigma\ higher than the standard analysis. We determine the residual scatter to be sigma_res = 0.104+/-0.005. We confirm (at the 95% probability level) the existence of two sub-populations segregated by host galaxy mass, separated at log_{10}(M/M_solar) = 10, differing in mean intrinsic magnitude by 0.055+/-0.022 mag, lower than previously reported. Cosmological parameter constraints are however unaffected by inclusion of host galaxy mass corrections. We find ~4\sigma\ evidence for a sharp drop in the value of the color correction parameter, beta(z), at a redshift z_trans = 0.662+/-0.055. We rule out some possible explanations for this behaviour, which remains unexplained.
[20]  oai:arXiv.org:1509.01010  [pdf] - 1361320
A method for comparing non-nested models with application to astrophysical searches for new physics
Comments: We welcome examples of non-nested models testing problems
Submitted: 2015-09-03, last modified: 2016-02-19
Searches for unknown physics and decisions between competing astrophysical models to explain data both rely on statistical hypothesis testing. The usual approach in searches for new physical phenomena is based on the statistical Likelihood Ratio Test (LRT) and its asymptotic properties. In the common situation when neither of the two models under comparison is a special case of the other, i.e., when the hypotheses are non-nested, this test is not applicable. In astrophysics, this problem occurs when two models that reside in different parameter spaces are to be compared. An important example is the recently reported excess emission in astrophysical $\gamma$-rays and the question whether its origin is known astrophysics or dark matter. We develop and study a new, simple, generally applicable, frequentist method and validate its statistical properties using a suite of simulation studies. We exemplify it on realistic simulated data of the Fermi-LAT $\gamma$-ray satellite, where non-nested hypotheses testing appears in the search for particle dark matter.
[21]  oai:arXiv.org:1512.04273  [pdf] - 1326857
Preprocessing Solar Images while Preserving their Latent Structure
Comments:
Submitted: 2015-12-14
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of $\sim$10 segments rather than on each of $\sim$10$^7$ pixels, reducing computing time by a factor of $\sim$10$^6$. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.
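The two dissimilarity families named in this abstract are closely linked: for non-negative vectors normalized to sum to one, the squared Hellinger distance equals the cosine dissimilarity of the square-root-transformed vectors. The sketch below shows only these standard definitions; the abstract does not specify the paper's parametric class or its transformations, and the function names and the per-pixel passband values are hypothetical.

```python
import math


def cosine_dissimilarity(u, v):
    """1 - cos(angle between u and v); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def hellinger_distance(u, v):
    """Hellinger distance between non-negative vectors after normalizing each to sum 1."""
    total_u, total_v = sum(u), sum(v)
    p = [a / total_u for a in u]
    q = [b / total_v for b in v]
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - affinity))              # clamp rounding error


# Hypothetical intensities of two pixels in three passbands.
pixel_a = [4.0, 1.0, 1.0]
pixel_b = [1.0, 1.0, 4.0]
root_a = [math.sqrt(x) for x in pixel_a]
root_b = [math.sqrt(x) for x in pixel_b]
print(hellinger_distance(pixel_a, pixel_b) ** 2, cosine_dissimilarity(root_a, root_b))
```

Both measures ignore the overall brightness of a pixel and compare only the relative distribution across passbands, which is why they suit segmentation by spectral shape.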
[22]  oai:arXiv.org:1508.07083  [pdf] - 1325978
Detecting Abrupt Changes in the Spectra of High-Energy Astrophysical Sources
Comments: 30 pages, 6 figures
Submitted: 2015-08-27, last modified: 2015-12-10
Variable-intensity astronomical sources are the result of complex and often extreme physical processes. Abrupt changes in source intensity are typically accompanied by equally sudden spectral shifts, i.e., sudden changes in the wavelength distribution of the emission. This article develops a method for modeling photon counts collected from observation of such sources. We embed change points into a marked Poisson process, where photon wavelengths are regarded as marks and both the Poisson intensity parameter and the distribution of the marks are allowed to change. To the best of our knowledge, this is the first effort to embed change points into a marked Poisson process. Between the change points, the spectrum is modeled non-parametrically using a mixture of a smooth radial basis expansion and a number of local deviations from the smooth term representing spectral emission lines. Because the model is overparameterized, we employ an $\ell_1$ penalty. The tuning parameter in the penalty and the number of change points are determined via the minimum description length principle. Our method is validated via a series of simulation studies and its practical utility is illustrated in the analysis of the ultra-fast rotating yellow giant star known as FK Com.
[23]  oai:arXiv.org:1510.04662  [pdf] - 1360789
Detecting Unspecified Structure in Low-Count Images
Comments:
Submitted: 2015-10-15
Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tail probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.
[24]  oai:arXiv.org:1411.7447  [pdf] - 1579567
Disentangling Overlapping Astronomical Sources using Spatial and Spectral Information
Comments: 34 pages, 15 figures, references added
Submitted: 2014-11-26, last modified: 2015-05-21
We present a powerful new algorithm that combines both spatial information (event locations and the point spread function) and spectral information (photon energies) to separate photons from overlapping sources. We use Bayesian statistical methods to simultaneously infer the number of overlapping sources, to probabilistically separate the photons among the sources, and to fit the parameters describing the individual sources. Using the Bayesian joint posterior distribution, we are able to coherently quantify the uncertainties associated with all these parameters. The advantages of combining spatial and spectral information are demonstrated through a simulation study. The utility of the approach is then illustrated by analysis of observations of FK Aqr and FL Aqr with the XMM-Newton Observatory and the central region of the Orion Nebula Cluster with the Chandra X-ray Observatory.
[25]  oai:arXiv.org:1409.1254  [pdf] - 930941
Strong Lens Time Delay Challenge: II. Results of TDC1
Comments: referee's comments incorporated; to appear in ApJ
Submitted: 2014-09-03, last modified: 2014-12-11
We present the results of the first strong lens time delay challenge. The motivation, experimental design, and entry level challenge are described in a companion paper. This paper presents the main challenge, TDC1, which consisted of analyzing thousands of simulated light curves blindly. The observational properties of the light curves cover the range in quality obtained for current targeted efforts (e.g., COSMOGRAIL) and expected from future synoptic surveys (e.g., LSST), and include simulated systematic errors. \nteamsA\ teams participated in TDC1, submitting results from \nmethods\ different method variants. After describing each method, we compute and analyze basic statistics measuring accuracy (or bias) $A$, goodness of fit $\chi^2$, precision $P$, and success rate $f$. For some methods we identify outliers as an important issue. Other methods show that outliers can be controlled via visual inspection or conservative quality control. Several methods are competitive, i.e., give $|A|<0.03$, $P<0.03$, and $\chi^2<1.5$, with some of the methods already reaching sub-percent accuracy. The fraction of light curves yielding a time delay measurement is typically in the range $f = 20$--$40\%$. It depends strongly on the quality of the data: COSMOGRAIL-quality cadence and light curve lengths yield significantly higher $f$ than does sparser sampling. Taking the results of TDC1 at face value, we estimate that LSST should provide around 400 robust time-delay measurements, each with $P<0.03$ and $|A|<0.01$, comparable to current lens modeling uncertainties. In terms of observing strategies, we find that $A$ and $f$ depend mostly on season length, while $P$ depends mostly on cadence and campaign duration.
[26]  oai:arXiv.org:1411.3786  [pdf] - 898736
Bayesian Analysis for Stellar Evolution with Nine Parameters (BASE-9): User's Manual
Comments: 22 pages, 7 figures
Submitted: 2014-11-13
BASE-9 is a Bayesian software suite that recovers star cluster and stellar parameters from photometry. BASE-9 is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses Markov chain Monte Carlo and brute-force numerical integration techniques to estimate the posterior probability distributions for the age, metallicity, helium abundance, distance modulus, and line-of-sight absorption for a cluster, and the mass, binary mass ratio, and cluster membership probability for every stellar object. BASE-9 is provided as open source code on a version-controlled web server. The executables are also available as Amazon Elastic Compute Cloud images. This manual provides potential users with an overview of BASE-9, including instructions for installation and use.
[27]  oai:arXiv.org:1307.7145  [pdf] - 1360728
A Bayesian Approach to Deriving Ages of Individual Field White Dwarfs
Comments: 9 pages, 10 figures, accepted for publication in ApJ
Submitted: 2013-07-26
We apply a self-consistent and robust Bayesian statistical approach to determining the ages, distances, and ZAMS masses of 28 field DA white dwarfs with ages of approximately 4 to 8 Gyrs. Our technique requires only quality optical and near-IR photometry to derive ages with < 15% uncertainties, generally with little sensitivity to our choice of modern initial-final mass relation. We find that age, distance, and ZAMS mass are correlated in a manner that is too complex to be captured by traditional error propagation techniques. We further find that the posterior distributions of age are often asymmetric, indicating that the standard approach to deriving WD ages can yield misleading results.
[28]  oai:arXiv.org:1208.1706  [pdf] - 1150592
A Bayesian Analysis of the Correlations Among Sunspot Cycles
Comments: Accepted for publication in Solar Physics
Submitted: 2012-08-08
Sunspot numbers form a comprehensive, long-duration proxy of solar activity and have been used numerous times to empirically investigate the properties of the solar cycle. A number of correlations have been discovered over the 24 cycles for which observational records are available. Here we carry out a sophisticated statistical analysis of the sunspot record that reaffirms these correlations, and sets up an empirical predictive framework for future cycles. An advantage of our approach is that it allows for rigorous assessment of both the statistical significance of various cycle features and the uncertainty associated with predictions. We summarize the data into three sequential relations that estimate the amplitude, duration, and time of rise to maximum for any cycle, given the values from the previous cycle. We find that there is no indication of a persistence in predictive power beyond one cycle, and conclude that the dynamo does not retain memory beyond one cycle. Based on sunspot records up to October 2011, we obtain, for Cycle 24, an estimated maximum smoothed monthly sunspot number of 97 +- 15, to occur in January--February 2014 +- 6 months.
[29]  oai:arXiv.org:1102.4610  [pdf] - 1360718
Accounting for Calibration Uncertainties in X-ray Analysis: Effective Areas in Spectral Fitting
Comments: 61 pages double spaced, 8 figures, accepted for publication in ApJ
Submitted: 2011-02-22
While considerable advances have been made to account for statistical uncertainties in astronomical analyses, systematic instrumental uncertainties have been generally ignored. This can be crucial to a proper interpretation of analysis results because instrumental calibration uncertainty is a form of systematic uncertainty. Ignoring it can underestimate error bars and introduce bias into the fitted values of model parameters. Accounting for such uncertainties currently requires extensive case-specific simulations if using existing analysis packages. Here we present general statistical methods that incorporate calibration uncertainties into spectral analysis of high-energy data. We first present a method based on multiple imputation that can be applied with any fitting method, but is necessarily approximate. We then describe a more exact Bayesian approach that works in conjunction with a Markov chain Monte Carlo based fitting. We explore methods for improving computational efficiency, and in particular detail a method of summarizing calibration uncertainties with a principal component analysis of samples of plausible calibration files. This method is implemented using recently codified Chandra effective area uncertainties for low-resolution spectral analysis and is verified using both simulated and actual Chandra data. Our procedure for incorporating effective area uncertainty is easily generalized to other types of calibration uncertainties.
[30]  oai:arXiv.org:1102.3459  [pdf] - 1360717
The White Dwarf Age of NGC 2477
Comments: 24 pages, 8 figures, accepted ApJ
Submitted: 2011-02-16
We present deep photometric observations of the open cluster NGC 2477 using HST/WFPC2. By identifying seven cluster white dwarf candidates, we present an analysis of the white dwarf age of this cluster, using both the traditional method of fitting isochrones to the white dwarf cooling sequence, and by employing a new Bayesian statistical technique that has been developed by our group. This new method performs an objective, simultaneous model fit of the cluster and stellar parameters (namely age, metallicity, distance, reddening, as well as individual stellar masses, mass ratios, and cluster membership) to the photometry. Based on this analysis, we measure a white dwarf age of 1.035 +/- 0.054 +/- 0.087 Gyr (uncertainties represent the goodness of model fits and discrepancy among models, respectively), in good agreement with the cluster's main sequence turnoff age. This work is part of our ongoing work to calibrate main sequence turnoff and white dwarf ages using open clusters, and to improve the precision of cluster ages to the ~5% level.
[31]  oai:arXiv.org:1006.4334  [pdf] - 1360716
On Computing Upper Limits to Source Intensities
Comments: 30 pages, 12 figures, accepted in ApJ
Submitted: 2010-06-22
A common problem in astrophysics is determining how bright a source could be and still not be detected. Despite the simplicity with which the problem can be stated, the solution involves complex statistical issues that require careful analysis. In contrast to the confidence bound, this concept has never been formally analyzed, leading to a great variety of often ad hoc solutions. Here we formulate and describe the problem in a self-consistent manner. Detection significance is usually defined by the acceptable proportion of false positives (the Type I error), and we invoke the complementary concept of false negatives (the Type II error), based on the statistical power of a test, to compute an upper limit to the detectable source intensity. To determine the minimum intensity that a source must have for it to be detected, we first define a detection threshold, and then compute the probabilities of detecting sources of various intensities at the given threshold. The intensity that corresponds to the specified Type II error probability defines that minimum intensity, and is identified as the upper limit. Thus, an upper limit is a characteristic of the detection procedure rather than the strength of any particular source and should not be confused with confidence intervals or other estimates of source intensity. This is particularly important given the large number of catalogs that are being generated from increasingly sensitive surveys. We discuss the differences between these upper limits and confidence bounds. Both measures are useful quantities that should be reported in order to extract the most science from catalogs, though they answer different statistical questions: an upper bound describes an inference range on the source intensity, while an upper limit calibrates the detection process. We provide a recipe for computing upper limits that applies to all detection algorithms.
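The two-step recipe described in this abstract (fix a detection threshold from the false-positive rate, then find the intensity detected with the required power) can be sketched for the simplest case. This is a minimal illustration under assumptions not stated in the abstract: detection is a plain count threshold, counts are Poisson with a known background, and the function names and the grid search with a fixed `step` are hypothetical choices, not the paper's implementation.

```python
import math

def poisson_sf(k, mu):
    """P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)          # P(N = 0)
    cdf = term
    for n in range(1, k):         # accumulate P(N <= k - 1)
        term *= mu / n
        cdf += term
    return max(0.0, 1.0 - cdf)

def detection_threshold(background, alpha):
    """Smallest count k with P(N >= k | background only) <= alpha."""
    k = 0
    while poisson_sf(k, background) > alpha:
        k += 1
    return k

def upper_limit(background, alpha, beta, step=0.01):
    """Smallest source intensity detected with probability >= 1 - beta."""
    k = detection_threshold(background, alpha)
    s = 0.0
    while poisson_sf(k, background + s) < 1.0 - beta:
        s += step                 # coarse grid search; assumed resolution
    return k, s

# Assumed example values: expected background of 1 count,
# Type I error 0.05, Type II error 0.5.
threshold, limit = upper_limit(background=1.0, alpha=0.05, beta=0.5)
```

Note that no observed source counts enter the calculation: the result depends only on the background, the threshold, and the two error rates, which is the sense in which the upper limit characterizes the detection procedure rather than any particular source.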
[32]  oai:arXiv.org:0808.3164  [pdf] - 15611
Searching for Narrow Emission Lines in X-ray Spectra: Computation and Methods
Comments: 43 pages, 11 figures; accepted for publication in ApJ
Submitted: 2008-08-23
The detection and quantification of narrow emission lines in X-ray spectra is a challenging statistical task. The Poisson nature of the photon counts leads to local random fluctuations in the observed spectrum that often result in excess emission in a narrow band of energy resembling a weak narrow line. From a formal statistical perspective, this leads to a (sometimes highly) multimodal likelihood. Many standard statistical procedures are based on (asymptotic) Gaussian approximations to the likelihood and simply cannot be used in such settings. Bayesian methods offer a more direct paradigm for accounting for such complicated likelihood functions but even here multimodal likelihoods pose significant computational challenges. The new Markov chain Monte Carlo (MCMC) methods developed in 2008 by van Dyk and Park, however, are able to fully explore the complex posterior distribution of the location of a narrow line, and thus provide valid statistical inference. Even with these computational tools, standard statistical quantities such as means and standard deviations cannot adequately summarize inference and standard testing procedures cannot be used to test for emission lines. In this paper, we use new efficient MCMC algorithms to fit the location of narrow emission lines, we develop new statistical strategies for summarizing highly multimodal distributions and quantifying valid statistical inference, and we extend the method of posterior predictive p-values proposed by Protassov et al. (2002) to test for the presence of narrow emission lines in X-ray spectra. We illustrate and validate our methods using simulation studies and apply them to the Chandra observations of the high redshift quasar PG1634+706.
[33]  oai:arXiv.org:astro-ph/0606247  [pdf] - 1396112
Bayesian Estimation of Hardness Ratios: Modeling and Computations
Comments: 43 pages, 10 figures, 3 tables; submitted to ApJ
Submitted: 2006-06-10
A commonly used measure to summarize the nature of a photon spectrum is the so-called Hardness Ratio, which compares the number of counts observed in different passbands. The hardness ratio is especially useful to distinguish between and categorize weak sources as a proxy for detailed spectral fitting. However, in this regime classical methods of error propagation fail, and the estimates of spectral hardness become unreliable. Here we develop a rigorous statistical treatment of hardness ratios that properly deals with detected photons as independent Poisson random variables and correctly deals with the non-Gaussian nature of the error propagation. The method is Bayesian in nature, and thus can be generalized to carry out a multitude of source-population--based analyses. We verify our method with simulation studies, and compare it with the classical method. We apply this method to real world examples, such as the identification of candidate quiescent Low-mass X-ray binaries in globular clusters, and tracking the time evolution of a flare on a low-mass star.
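A Bayesian treatment of the hardness ratio in the low-count regime can be sketched as follows. This is a simplified illustration, not the paper's model: it assumes no background, equal exposure and effective area in both bands, and a flat prior on each Poisson rate, so that each posterior is a Gamma distribution with shape `counts + 1` and unit scale. The definition HR = (H - S)/(H + S) and the function name are assumptions made for this example.

```python
import random

def hardness_ratio_posterior(soft_counts, hard_counts, n_draws=20000, seed=1):
    """Monte Carlo posterior summary of HR = (H - S) / (H + S).

    Returns the 16%, 50%, and 84% posterior quantiles.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Flat prior + Poisson likelihood -> Gamma(counts + 1, scale=1) posterior.
        lam_soft = rng.gammavariate(soft_counts + 1, 1.0)
        lam_hard = rng.gammavariate(hard_counts + 1, 1.0)
        draws.append((lam_hard - lam_soft) / (lam_hard + lam_soft))
    draws.sort()

    def quantile(p):
        return draws[int(p * (n_draws - 1))]

    return quantile(0.16), quantile(0.50), quantile(0.84)

# Assumed example: a weak source with zero soft counts and five hard counts.
low, median, high = hardness_ratio_posterior(soft_counts=0, hard_counts=5)
```

With zero soft counts, the classical ratio is pinned at exactly 1 and Gaussian error propagation is not meaningful, whereas the posterior quantiles stay inside the interval from -1 to 1 and are visibly asymmetric.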
[34]  oai:arXiv.org:astro-ph/0203165  [pdf] - 48186
Bayesian Spectral Analysis of Metal Abundance Deficient Stars
Comments: 16 pages, 4 Postscript figures, first 2 pages are separately and use svcon2e.sty which is attached, PennState Conference July 2001
Submitted: 2002-03-11
Metallicity can be measured by analyzing the spectra in the X-ray region and comparing the flux in spectral lines to the flux in the underlying Bremsstrahlung continuum. In this paper we propose new Bayesian methods which directly model the Poisson nature of the data and thus are expected to exhibit improved sampling properties. Our model also accounts for the Poisson nature of background contamination of the observations, image blurring due to instrument response, and the absorption of photons in space. The resulting highly structured hierarchical model is fit using the Gibbs sampler, data augmentation, and Metropolis-Hastings. We demonstrate our methods with the X-ray spectral analysis of several "Metal Abundance Deficient" stars. The model is designed to summarize the relative frequency of the energy of photons (X-ray or gamma-ray) arriving at a detector. Independent Poisson distributions are more appropriate to model the counts than the commonly used normal approximation. We model the high energy tail of the ASCA spectrum of each of the stars as a combination of a Bremsstrahlung continuum and ten narrow emission lines, included at positions of known strong lines. Statistical analysis is based on two source observations and one background observation. We use sequential Bayesian analysis for the two source observations; the posterior distribution from the first analysis is used to construct a prior for the second. Sensitivity of the final results to the choice of prior is investigated by altering the prior.
[35]  oai:arXiv.org:astro-ph/0201547  [pdf] - 1361036
Statistics: Handle with Care, Detecting Multiple Model Components with the Likelihood Ratio Test
Comments: Twenty four pages, seven figures. The Astrophysical Journal, May 2002, to appear
Submitted: 2002-01-31
The likelihood ratio test (LRT) and the related $F$ test do not (even asymptotically) adhere to their nominal $\chi^2$ and $F$ distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and non-detections into doubt. Although there are many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the $F$ test for detecting a line in a spectral model or a source above background despite the lack of certain required regularity conditions. In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal $\chi^2$ distribution for the LRT or the $F$ distribution for the $F$ test should not be used. In this paper, we characterize an important class of problems where the LRT and the $F$ test fail and illustrate this non-standard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability-values. We present this method in some detail, as it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of May 8, 1997 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation.
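The boundary problem described in this abstract can be seen in a toy simulation. The setup is an assumption chosen for simplicity and is not taken from the paper: unit-variance normal data with the null hypothesis mean = 0 tested against mean >= 0, standing in for a line or source strength that cannot be negative. Because the null value sits on the boundary, the LRT statistic is exactly zero about half the time, so it does not follow the nominal $\chi^2$ distribution with one degree of freedom.

```python
import random

CHI2_1_CRIT_05 = 3.841  # 95th percentile of chi-square with 1 degree of freedom

def lrt_statistic(sample):
    """LRT statistic for H0: mu = 0 vs H1: mu >= 0 with unit-variance normal data."""
    n = len(sample)
    xbar = sum(sample) / n
    mu_hat = max(0.0, xbar)       # MLE constrained to the boundary mu >= 0
    # 2 * (loglik(mu_hat) - loglik(0)) simplifies to n * mu_hat ** 2
    return n * mu_hat ** 2

def simulate_null(n_sims=20000, n_obs=10, seed=0):
    """Simulate the LRT under H0; return (fraction exactly zero, nominal rejection rate)."""
    rng = random.Random(seed)
    stats = [lrt_statistic([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
             for _ in range(n_sims)]
    frac_zero = sum(s == 0.0 for s in stats) / n_sims
    frac_reject = sum(s > CHI2_1_CRIT_05 for s in stats) / n_sims
    return frac_zero, frac_reject

frac_zero, frac_reject = simulate_null()
```

In this toy case the true false positive rate at the nominal 5% cutoff is about half the advertised value. In more realistic spectral models the discrepancy need not have such a simple form, which is why calibrating the statistic by simulation is attractive.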
[36]  oai:arXiv.org:astro-ph/0008170  [pdf] - 1361035
Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulation
Comments: 48 pages, 10 figures, to appear in the Astrophysical Journal
Submitted: 2000-08-10
Over the past 10 years Bayesian methods have rapidly grown more popular as several computationally intensive statistical algorithms have become feasible with increased computer power. In this paper, we begin with a general description of the Bayesian paradigm for statistical inference and the various state-of-the-art model fitting techniques that we employ (e.g., Gibbs sampler and Metropolis-Hastings). These algorithms are very flexible and can be used to fit models that account for the highly hierarchical structure inherent in the collection of high-quality spectra and thus can keep pace with the accelerating progress of new space telescope designs. The methods we develop, which will soon be available in the CIAO software package, explicitly model photon arrivals as a Poisson process and, thus, have no difficulty with high resolution low count X-ray and gamma-ray data. We expect these methods to be useful not only for the recently launched Chandra X-ray observatory and XMM but also new generation telescopes such as Constellation X, GLAST, etc. In the context of two examples (Quasar S5 0014+813 and Hybrid-Chromosphere Supergiant Star alpha TrA) we illustrate a new highly structured model and how Bayesian posterior sampling can be used to compute estimates, error bars, and credible intervals for the various model parameters.