Normalized to: Van Dyk, D.
[1]
oai:arXiv.org:1706.03811 [pdf] - 2074374
STACCATO: A Novel Solution to Supernova Photometric Classification with
Biased Training Sets
Submitted: 2017-06-12, last modified: 2020-04-02
We present a new solution to the problem of classifying Type Ia supernovae
from their light curves alone given a spectroscopically confirmed but biased
training set, circumventing the need to obtain an observationally expensive
unbiased training set. We use Gaussian processes (GPs) to model the
supernova (SN) light curves, and demonstrate that the choice of covariance
function has only a small influence on the GPs' ability to accurately classify
SNe. We extend and improve the approach of Richards et al. (2012) -- a diffusion
map combined with a random forest classifier -- to deal specifically with the
case of biased training sets. We propose a novel method, called STACCATO
('SynThetically Augmented Light Curve ClassificATiOn'), that synthetically
augments a biased training set by generating additional training data from the
fitted GPs. Key to the success of the method is the partitioning of the
observations into subgroups based on their propensity score of being included
in the training set. Using simulated light curve data, we show that STACCATO
increases performance, as measured by the area under the Receiver Operating
Characteristic curve (AUC), from 0.93 to 0.96, close to the AUC of 0.977
obtained using the 'gold standard' of an unbiased training set and
significantly improving on the previous best result of 0.88. STACCATO also
increases the true positive rate for SNIa classification by up to a factor of
50 for high-redshift/low brightness SNe.
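The AUC figures quoted above can be read as the probability that a randomly chosen SNIa receives a higher classifier score than a randomly chosen non-Ia. The following is a minimal sketch of that pairwise definition; the function name and the toy scores are illustrative assumptions, not data or code from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores (higher means more Ia-like).
    labels: 1 for the positive class (e.g. SNIa), 0 otherwise.
    Ties between a positive and a negative count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
example = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])  # 0.75
```

The double loop is quadratic in the sample size; a rank-based formula gives the same number faster, but the pairwise form makes the definition explicit.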
[2]
oai:arXiv.org:1803.03858 [pdf] - 1904617
Testing One Hypothesis Multiple Times: The Multidimensional Case
Submitted: 2018-03-10, last modified: 2019-06-23
The identification of new rare signals in data, the detection of a sudden
change in a trend, and the selection of competing models, are among the most
challenging problems in statistical practice. These challenges can be tackled
using a test of hypothesis where a nuisance parameter is present only under the
alternative, and a computationally efficient solution can be obtained by the
"Testing One Hypothesis Multiple Times" (TOHM) method. In the one-dimensional
setting, a fine discretization of the space of the non-identifiable parameter
is specified, and a global p-value is obtained by approximating the
distribution of the supremum of the resulting stochastic process. In this
paper, we propose a computationally efficient inferential tool to perform TOHM
in the multidimensional setting. Here, the approximations of interest typically
involve the expected Euler Characteristics (EC) of the excursion set of the
underlying random field. We introduce a simple algorithm to compute the EC in
multiple dimensions and for arbitrarily large significance levels. This leads to
a highly generalizable computational tool to perform inference under
non-standard regularity conditions.
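To make the Euler characteristic of an excursion set concrete, here is a minimal sketch for a field sampled on a two-dimensional grid. It treats every cell at or above the threshold as a closed unit square and counts vertices minus edges plus faces. This is a generic cubical-complex computation offered as an illustration; it is not the algorithm introduced in the paper, and the function name and toy field are assumptions.

```python
def euler_characteristic(field, threshold):
    """EC of the excursion set {field >= threshold} on a 2-D grid.

    Each selected cell is a closed unit square, so cells that touch only
    at a corner are considered connected. EC = V - E + F.
    """
    vertices, edges, faces = set(), set(), 0
    for i, row in enumerate(field):
        for j, value in enumerate(row):
            if value >= threshold:
                faces += 1
                # Four corners of the cell.
                for di in (0, 1):
                    for dj in (0, 1):
                        vertices.add((i + di, j + dj))
                # Four sides of the cell, each identified by its endpoints.
                edges.add(((i, j), (i, j + 1)))
                edges.add(((i + 1, j), (i + 1, j + 1)))
                edges.add(((i, j), (i + 1, j)))
                edges.add(((i, j + 1), (i + 1, j + 1)))
    return len(vertices) - len(edges) + faces


# A ring of high values around a low centre: one component, one hole.
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
ec_ring = euler_characteristic(ring, 0.5)  # 0
```

In two dimensions the EC equals the number of connected components minus the number of holes, which is why the ring gives 0 and a solid block gives 1.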
[3]
oai:arXiv.org:1903.06796 [pdf] - 1850859
Astro2020 Science White Paper: The Next Decade of Astroinformatics and
Astrostatistics
Siemiginowska, A.;
Eadie, G.;
Czekala, I.;
Feigelson, E.;
Ford, E. B.;
Kashyap, V.;
Kuhn, M.;
Loredo, T.;
Ntampaka, M.;
Stevens, A.;
Avelino, A.;
Borne, K.;
Budavari, T.;
Burkhart, B.;
Cisewski-Kehe, J.;
Civano, F.;
Chilingarian, I.;
van Dyk, D. A.;
Fabbiano, G.;
Finkbeiner, D. P.;
Foreman-Mackey, D.;
Freeman, P.;
Fruscione, A.;
Goodman, A. A.;
Graham, M.;
Guenther, H. M.;
Hakkila, J.;
Hernquist, L.;
Huppenkothen, D.;
James, D. J.;
Law, C.;
Lazio, J.;
Lee, T.;
López-Morales, M.;
Mahabal, A. A.;
Mandel, K.;
Meng, X. L.;
Moustakas, J.;
Muna, D.;
Peek, J. E. G.;
Richards, G.;
Portillo, S. K. N.;
Scargle, J.;
de Souza, R. S.;
Speagle, J. S.;
Stassun, K. G.;
Stenning, D. C.;
Taylor, S. R.;
Tremblay, G. R.;
Trimble, V.;
Yanamandra-Fisher, P. A.;
Young, C. A.
Submitted: 2019-03-15
Over the past century, major advances in astronomy and astrophysics have been
largely driven by improvements in instrumentation and data collection. With the
amassing of high quality data from new telescopes, and especially with the
advent of deep and large astronomical surveys, it is becoming clear that future
advances will also rely heavily on how those data are analyzed and interpreted.
New methodologies derived from advances in statistics, computer science, and
machine learning are beginning to be employed in sophisticated investigations
that are not only bringing forth new discoveries, but are placing them on a
solid footing. Progress in wide-field sky surveys, interferometric imaging,
precision cosmology, exoplanet detection and characterization, and many
subfields of stellar, Galactic and extragalactic astronomy, has resulted in
complex data analysis challenges that must be solved to perform scientific
inference. Research in astrostatistics and astroinformatics will be necessary
to develop the state-of-the-art methodology needed in astronomy. Overcoming
these challenges requires dedicated, interdisciplinary research. We recommend:
(1) increasing funding for interdisciplinary projects in astrostatistics and
astroinformatics; (2) dedicating space and time at conferences for
interdisciplinary research and promotion; (3) developing sustainable funding
for long-term astrostatistics appointments; and (4) funding infrastructure
development for data archives and archive support, state-of-the-art algorithms,
and efficient computing.
[4]
oai:arXiv.org:1802.01233 [pdf] - 1833971
Multidimensional Data Driven Classification of Emission-line Galaxies
Submitted: 2018-02-04, last modified: 2019-02-07
We propose a new soft clustering scheme for classifying galaxies in different
activity classes using 4 emission-line ratios simultaneously: log([NII]/Ha),
log([SII]/Ha), log([OI]/Ha) and log([OIII]/Hb). We fit 20 multivariate Gaussian
distributions to the 4-dimensional distribution of these lines obtained from
the Sloan Digital Sky Survey (SDSS) in order to capture local structures and
subsequently group the multivariate Gaussian distributions to represent the
complex multi-dimensional structure of the joint distribution of galaxy spectra
in the 4 dimensional line ratio space. The main advantages of this method are
the use of all four optical-line ratios simultaneously and the adoption of a
clustering scheme. This maximises the available information, avoids
contradicting classifications, and treats each class as a distribution
resulting in soft classification boundaries and providing the probability for
an object to belong to each class. We also introduce linear multi-dimensional
decision surfaces using support vector machines based on the classification of
our soft clustering scheme. This linear multi-dimensional hard clustering
technique shows high classification accuracy with respect to our
soft-clustering scheme.
[5]
oai:arXiv.org:1809.06173 [pdf] - 1775689
Incorporating Uncertainties in Atomic Data Into the Analysis of Solar
and Stellar Observations: A Case Study in Fe XIII
Submitted: 2018-09-17
Information about the physical properties of astrophysical objects cannot be
measured directly but is inferred by interpreting spectroscopic observations in
the context of atomic physics calculations. Ratios of emission lines, for
example, can be used to infer the electron density of the emitting plasma.
Similarly, the relative intensities of emission lines formed over a wide range
of temperatures yield information on the temperature structure. A critical
component of this analysis is understanding how uncertainties in the underlying
atomic physics propagate to the uncertainties in the inferred plasma
parameters. At present, however, atomic physics databases do not include
uncertainties on the atomic parameters and there is no established methodology
for using them even if they did. In this paper we develop simple models for the
uncertainties in the collision strengths and decay rates for Fe XIII and apply
them to the interpretation of density sensitive lines observed with the EUV
Imaging Spectrometer (EIS) on Hinode. We incorporate these uncertainties in a
Bayesian framework. We consider both a pragmatic Bayesian method where the
atomic physics information is unaffected by the observed data, and a fully
Bayesian method where the data can be used to probe the physics. The former
generally increases the uncertainty in the inferred density by about a factor
of 5 compared with models that incorporate only statistical uncertainties. The
latter reduces the uncertainties on the inferred densities, but identifies
areas of possible systematic problems with either the atomic physics or the
observed intensities.
[6]
oai:arXiv.org:1612.04417 [pdf] - 1807649
Projected distances to host galaxy reduce SNIa dispersion
Hill, R.;
Shariff, H.;
Trotta, R.;
Ali-Khan, S.;
Jiao, X.;
Liu, Y.;
Moon, S. K.;
Parker, W.;
Paulus, M.;
van Dyk, D. A.;
Lucy, L. B.
Submitted: 2016-12-13, last modified: 2018-09-07
We use multi-band imagery data from the Sloan Digital Sky Survey (SDSS) to
measure projected distances of 302 Type Ia supernovae (SNIa) from the centre of
their host galaxies, normalized to the galaxy's brightness scale length, with a
Bayesian approach. We test the hypothesis that SNIas further away from the
centre of their host galaxy are less subject to dust contamination (as the dust
column density in their environment is smaller) and/or come from a more
homogeneous environment. Using the Mann-Whitney U test, we find a statistically
significant difference in the observed colour correction distribution between
SNIas that are near and those that are far from the centre of their host. The
local p-value is 3 x 10^{-3}, which is significant at the 5 per cent level
after look-elsewhere effect correction. We estimate the residual scatter of the
two subgroups to be 0.073 +/- 0.018 for the far SNIas, compared to 0.114 +/-
0.009 for the near SNIas -- an improvement of 30 per cent, albeit with a low
statistical significance of 2 sigma. This confirms the importance of host galaxy
properties in correctly interpreting SNIa observations for cosmological
inference.
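The Mann-Whitney U test used above compares two samples through the ranks of their values rather than the values themselves. The sketch below computes the U statistic and a two-sided p-value from the large-sample normal approximation. It omits the tie and continuity corrections, so it is only a rough guide for small samples; the function name and the toy numbers are illustrative assumptions, not the paper's data.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided p-value).

    U counts the pairs (x_i, y_j) with x_i > y_j, ties counting one half.
    The p-value uses the normal approximation without tie or continuity
    correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Hypothetical colour corrections for "near" and "far" subsamples.
near = [0.12, 0.08, 0.15, 0.11, 0.09]
far = [0.03, 0.05, 0.02, 0.07, 0.04]
u_stat, p = mann_whitney_u(near, far)
```

A local p-value obtained this way still needs a look-elsewhere correction if the near/far split was chosen after scanning several candidate thresholds.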
[7]
oai:arXiv.org:1806.06733 [pdf] - 1717192
Bayesian Hierarchical Modelling of Initial-Final Mass Relations Across
Star Clusters
Submitted: 2018-06-18, last modified: 2018-07-17
The initial-final mass relation (IFMR) of white dwarfs (WDs) plays an
important role in stellar evolution. To derive precise estimates of IFMRs and
explore how they may vary among star clusters, we propose a Bayesian
hierarchical model that pools photometric data from multiple star clusters.
After performing a simulation study to show the benefits of the Bayesian
hierarchical model, we apply this model to five star clusters: the Hyades,
M67, NGC 188, NGC 2168, and NGC 2477, leading to reasonable and consistent
estimates of IFMRs for these clusters. We illustrate how a cluster-specific
analysis of NGC 188 using its own photometric data can produce an unreasonable
IFMR since its WDs have a narrow range of zero-age main sequence (ZAMS) masses.
However, the Bayesian hierarchical model corrects the cluster-specific analysis
by borrowing strength from other clusters, thus generating more reliable
estimates of IFMR parameters. The data analysis presents the benefits of
Bayesian hierarchical modelling over conventional cluster-specific methods,
which motivates us to elaborate the powerful statistical techniques in this
article.
[8]
oai:arXiv.org:1703.09164 [pdf] - 1700864
A Hierarchical Model for the Ages of Galactic Halo White Dwarfs
Submitted: 2017-03-27, last modified: 2018-06-18
In astrophysics, we often aim to estimate one or more parameters for each
member object in a population and study the distribution of the fitted
parameters across the population. In this paper, we develop novel methods that
allow us to take advantage of existing software designed for such case-by-case
analyses to simultaneously fit parameters of both the individual objects and
the parameters that quantify their distribution across the population. Our
methods are based on Bayesian hierarchical modelling which is known to produce
parameter estimators for the individual objects that are on average closer to
their true values than estimators based on case-by-case analyses. We verify
this in the context of estimating ages of Galactic halo white dwarfs (WDs) via
a series of simulation studies. Finally, we deploy our new techniques on
optical and near-infrared photometry of ten candidate halo WDs to obtain
estimates of their ages along with an estimate of the mean age of Galactic halo
WDs of [11.25, 12.96] Gyr. Although this sample is small, our technique lays
the groundwork for large-scale studies using data from the Gaia mission.
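The claim that hierarchical estimates sit closer to the truth on average comes from shrinkage: each object's estimate is pulled toward the population mean, more strongly when its own measurement is noisy. The sketch below shows this for the simplest normal-normal case. It assumes the population mean and spread are known, whereas the paper fits them jointly with the individual ages; the function name and the numbers are hypothetical.

```python
def shrink(estimate, std_err, pop_mean, pop_sd):
    """Posterior mean and sd for one object under a normal-normal model.

    estimate, std_err: case-by-case estimate and its standard error.
    pop_mean, pop_sd: population mean and spread (assumed known here).
    """
    w_data = 1.0 / std_err ** 2   # precision of the individual estimate
    w_pop = 1.0 / pop_sd ** 2     # precision of the population distribution
    post_var = 1.0 / (w_data + w_pop)
    post_mean = post_var * (w_data * estimate + w_pop * pop_mean)
    return post_mean, post_var ** 0.5


# A hypothetical age of 14 +/- 2 Gyr with a population at 12 +/- 1 Gyr
# is pulled most of the way toward the population mean.
age, age_sd = shrink(14.0, 2.0, 12.0, 1.0)
```

The posterior standard deviation is smaller than both inputs, which is the "borrowing of strength" that a case-by-case fit cannot provide.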
[9]
oai:arXiv.org:1702.08856 [pdf] - 1540614
The ACS Survey of Galactic Globular Clusters XIV: Bayesian
Single-Population Analysis of 69 Globular Clusters
Wagner-Kaiser, R.;
Sarajedini, A.;
von Hippel, T.;
Stenning, D. C.;
van Dyk, D. A.;
Jeffery, E.;
Robinson, E.;
Stein, N.;
Anderson, J.;
Jefferys, W. H.
Submitted: 2017-02-28
We use Hubble Space Telescope (HST) imaging from the ACS Treasury Survey to
determine fits for single population isochrones of 69 Galactic globular
clusters. Using robust Bayesian analysis techniques, we simultaneously
determine ages, distances, absorptions, and helium values for each cluster
under the scenario of a "single" stellar population on model grids with solar
ratio heavy element abundances. The set of cluster parameters is determined in
a consistent and reproducible manner for all clusters using the Bayesian
analysis suite BASE-9. Our results are used to re-visit the age-metallicity
relation. We find correlations with helium and several other parameters such as
metallicity, binary fraction, and proxies for cluster mass. The helium
abundances of the clusters are also considered in the context of CNO abundances
and the multiple population scenario.
[10]
oai:arXiv.org:1602.01462 [pdf] - 1579811
Bayesian Estimates of Astronomical Time Delays between Gravitationally
Lensed Stochastic Light Curves
Submitted: 2016-02-02, last modified: 2017-01-30
The gravitational field of a galaxy can act as a lens and deflect the light
emitted by a more distant object such as a quasar. Strong gravitational lensing
causes multiple images of the same quasar to appear in the sky. Since the light
in each gravitationally lensed image traverses a different path length from the
quasar to the Earth, fluctuations in the source brightness are observed in the
several images at different times. The time delay between these fluctuations
can be used to constrain cosmological parameters and can be inferred from the
time series of brightness data or light curves of each image. To estimate the
time delay, we construct a model based on a state-space representation for
irregularly observed time series generated by a latent continuous-time
Ornstein-Uhlenbeck process. We account for microlensing, an additional source
of independent long-term extrinsic variability, via a polynomial regression.
Our Bayesian strategy adopts a Metropolis-Hastings within Gibbs sampler. We
improve the sampler by using an ancillarity-sufficiency interweaving strategy
and adaptive Markov chain Monte Carlo. We introduce a profile likelihood of the
time delay as an approximation of its marginal posterior distribution. The
Bayesian and profile likelihood approaches complement each other, producing
almost identical results; the Bayesian method is more principled but the
profile likelihood is simpler to implement. We demonstrate our estimation
strategy using simulated data of doubly- and quadruply-lensed quasars, and
observed data from quasars Q0957+561 and J1029+2623.
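Because an Ornstein-Uhlenbeck process has a closed-form Gaussian transition over any time gap, it can be evaluated exactly at irregular observation times, which is what makes a state-space treatment possible. The sketch below assumes the parameterisation dX = -(1/tau)(X - mu) dt + sigma dB; the function names, the parameter values, and the observation times are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def ou_transition(x_prev, dt, mu, sigma, tau):
    """Mean and variance of X(t + dt) given X(t) = x_prev for an OU process."""
    decay = math.exp(-dt / tau)
    mean = mu + (x_prev - mu) * decay
    var = 0.5 * tau * sigma ** 2 * (1.0 - decay ** 2)
    return mean, var


def simulate_ou(times, mu, sigma, tau, seed=0):
    """Draw one latent light curve at the given (irregular) times."""
    rng = random.Random(seed)
    # Start from the stationary distribution N(mu, tau * sigma^2 / 2).
    x = [rng.gauss(mu, math.sqrt(0.5 * tau * sigma ** 2))]
    for t_prev, t in zip(times, times[1:]):
        mean, var = ou_transition(x[-1], t - t_prev, mu, sigma, tau)
        x.append(rng.gauss(mean, math.sqrt(var)))
    return x


# Irregularly spaced epochs in days (hypothetical).
epochs = [0.0, 1.0, 2.5, 7.0, 7.2, 15.0]
curve = simulate_ou(epochs, mu=18.0, sigma=0.05, tau=100.0)
```

A lensed image would then be modelled as the same latent curve shifted in time by the delay and in magnitude by an offset, with the microlensing polynomial added on top.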
[11]
oai:arXiv.org:1602.03765 [pdf] - 1530475
On methods for correcting for the look-elsewhere effect in searches for
new physics
Submitted: 2016-02-11, last modified: 2016-12-15
The search for new significant peaks over an energy spectrum often involves a
statistical multiple hypothesis testing problem. Separate tests of hypothesis
are conducted at different locations producing an ensemble of local p-values,
the smallest of which is reported as evidence for the new resonance.
Unfortunately, controlling the false detection rate (type I error rate) of such
procedures may lead to excessively stringent acceptance criteria. In the recent
physics literature, two promising statistical tools have been proposed to
overcome these limitations. In 2005, a method to "find needles in haystacks"
was introduced by Pilla et al. [1], and a second method was later proposed by
Gross and Vitells [2] in the context of the "look elsewhere effect" and trial
factors. We show that, for relatively small sample sizes, the former leads to
an artificial inflation of statistical power that stems from an increase in the
false detection rate, whereas the two methods exhibit similar performance for
large sample sizes. We apply the methods to realistic simulations of the Fermi
Large Area Telescope data, in particular the search for dark matter
annihilation lines. Further, we discuss the counter-intuitive scenario where the
look-elsewhere corrections are more conservative than much more computationally
efficient corrections for multiple hypothesis testing. Finally, we provide
general guidelines for navigating the tradeoffs between statistical and
computational efficiency when selecting a statistical procedure for signal
detection.
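For reference, the simplest of the "computationally efficient corrections for multiple hypothesis testing" turn the smallest local p-value into a global one with a single arithmetic step. The sketch below shows the Bonferroni bound and the Sidak formula. Neither is the Pilla et al. nor the Gross and Vitells procedure; the Sidak form additionally assumes independent tests, and the function names and p-values are illustrative assumptions.

```python
def bonferroni_global_p(local_p_values):
    """Upper bound on the global p-value: m times the smallest local one."""
    m = len(local_p_values)
    return min(1.0, m * min(local_p_values))


def sidak_global_p(local_p_values):
    """Global p-value for the smallest of m independent local p-values."""
    m = len(local_p_values)
    return 1.0 - (1.0 - min(local_p_values)) ** m


# Hypothetical local p-values from scanning three energy bins.
local = [0.01, 0.2, 0.5]
p_bonf = bonferroni_global_p(local)
p_sidak = sidak_global_p(local)
```

The Sidak value never exceeds the Bonferroni bound, and both ignore correlation between neighbouring search locations, which is the structure that look-elsewhere methods are designed to exploit.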
[12]
oai:arXiv.org:1611.00835 [pdf] - 1510413
A Bayesian Analysis of the Ages of Four Open Clusters
Submitted: 2016-11-02
In this paper we apply a Bayesian technique to determine the best fit of
stellar evolution models to find the main sequence turn off age and other
cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477,
NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo
technique to fit these various parameters, objectively finding the best-fit
isochrone for each cluster. The result is a high-precision isochrone fit. We
compare these results with those of traditional "by-eye" isochrone fitting
methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660,
and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02
+/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of
this paper continue our effort to determine cluster ages to higher precision
than that offered by these traditional methods of isochrone fitting.
[13]
oai:arXiv.org:1609.03425 [pdf] - 1531556
Detecting Relativistic X-ray Jets in High-Redshift Quasars
McKeough, Kathryn;
Siemiginowska, Aneta;
Cheung, C. C.;
Stawarz, Lukasz;
Kashyap, Vinay L.;
Stein, Nathan;
Stampoulis, Vasileios;
van Dyk, David A.;
Wardle, J. F. C.;
Lee, N. P.;
Harris, D. E.;
Schwartz, D. A.;
Donato, Davide;
Maraschi, Laura;
Tavecchio, Fabrizio
Submitted: 2016-09-12
We analyze Chandra X-ray images of a sample of 11 quasars that are known to
contain kiloparsec scale radio jets. The sample consists of five high-redshift
(z >= 3.6) flat-spectrum radio quasars, and six intermediate redshift (2.1 < z
< 2.9) quasars. The dataset includes four sources with integrated steep radio
spectra and seven with flat radio spectra. A total of 25 radio jet features are
present in this sample. We apply a Bayesian multi-scale image reconstruction
method to detect and measure the X-ray emission from the jets. We compute
deviations from a baseline model that does not include the jet, and compare
observed X-ray images with those computed with simulated images where no jet
features exist. This allows us to compute p-value upper bounds on the
significance that an X-ray jet is detected in a pre-determined region of
interest. We detected 12 of the features unambiguously, and an additional 6
marginally. We also find residual emission in the cores of 3 quasars and in the
background of 1 quasar that suggest the existence of unresolved X-ray jets. The
dependence of the X-ray to radio luminosity ratio on redshift is a potential
diagnostic of the emission mechanism, since the inverse Compton scattering of
cosmic microwave background photons (IC/CMB) is thought to be redshift
dependent, whereas in synchrotron models no clear redshift dependence is
expected. We find that the high-redshift jets have X-ray to radio flux ratios
that are marginally inconsistent with those from lower redshifts, suggesting
that either the X-ray emission is due to the IC/CMB rather than the
synchrotron process, or that high redshift jets are qualitatively different.
[14]
oai:arXiv.org:1609.01527 [pdf] - 1483531
Bayesian Analysis of Two Stellar Populations in Galactic Globular
Clusters III: Analysis of 30 Clusters
Submitted: 2016-09-06
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival
ACS Treasury observations of 30 Galactic Globular Clusters to characterize two
distinct stellar populations. A sophisticated Bayesian technique is employed to
simultaneously sample the joint posterior distribution of age, distance, and
extinction for each cluster, as well as unique helium values for two
populations within each cluster and the relative proportion of those
populations. We find the helium differences between the two populations in the
clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in
CNO are not presently available, we view these spreads as upper limits and
present them with statistical rather than observational uncertainties. Evidence
supports previous studies suggesting an increase in helium content concurrent
with increasing mass of the cluster, and we also find that the proportion of the
first population of stars increases with mass as well. Our results are examined
in the context of proposed globular cluster formation scenarios. Additionally,
we leverage our Bayesian technique to shed light on inconsistencies between the
theoretical models and the observed data.
[15]
oai:arXiv.org:1605.08064 [pdf] - 1483355
Standardizing Type Ia supernovae using Near Infrared rebrightening time
Submitted: 2016-05-25
Accurate standardisation of Type Ia supernovae (SNIa) is instrumental to the
usage of SNIa as distance indicators. We analyse a homogeneous sample of 22
low-z SNIa, observed by the Carnegie Supernova Project (CSP) in the optical and
near infra-red (NIR). We study the time of the second peak in the NIR band due
to re-brightening, t2, as an alternative standardisation parameter of SNIa peak
brightness. We use BAHAMAS, a Bayesian hierarchical model for SNIa cosmology,
to determine the residual scatter in the Hubble diagram. We find that in the
absence of a colour correction, t2 is a better standardisation parameter
compared to stretch: t2 has a 1 sigma posterior interval for the Hubble
residual scatter of [0.250, 0.257], compared to [0.280, 0.287] when stretch
(x1) alone is used. We demonstrate that when employed together with a colour
correction, t2 and stretch lead to similar residual scatter. Using colour,
stretch and t2 jointly as standardisation parameters does not result in any
further reduction in scatter, suggesting that t2 carries redundant information
with respect to stretch and colour. With a much larger SNIa NIR sample at
higher redshift in the future, t2 could be a useful quantity to perform
robustness checks of the standardisation procedure.
[16]
oai:arXiv.org:1605.02810 [pdf] - 1403836
The Power of Principled Bayesian Methods in the Study of Stellar
Evolution
Submitted: 2016-05-09
It takes years of effort employing the best telescopes and instruments to
obtain high-quality stellar photometry, astrometry, and spectroscopy. Stellar
evolution models contain the experience of lifetimes of theoretical
calculations and testing. Yet most astronomers fit these valuable models to
these precious datasets by eye. We show that a principled Bayesian approach to
fitting models to stellar data yields substantially more information over a
range of stellar astrophysics. We highlight advances in determining the ages of
star clusters, mass ratios of binary stars, limitations in the accuracy of
stellar models, post-main-sequence mass loss, and the ages of individual white
dwarfs. We also outline a number of unsolved problems that would benefit from
principled Bayesian analyses.
[17]
oai:arXiv.org:1604.06073 [pdf] - 1443898
Bayesian Analysis of Two Stellar Populations in Galactic Globular
Clusters I: Statistical and Computational Methods
Submitted: 2016-04-20, last modified: 2016-04-21
We develop a Bayesian model for globular clusters composed of multiple
stellar populations, extending earlier statistical models for open clusters
composed of simple (single) stellar populations (van Dyk et al. 2009, Stein et
al. 2013). Specifically, we model globular clusters with two populations that
differ in helium abundance. Our model assumes a hierarchical structuring of the
parameters in which physical properties---age, metallicity, helium abundance,
distance, absorption, and initial mass---are common to (i) the cluster as a
whole or to (ii) individual populations within a cluster, or are unique to
(iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm
is devised for model fitting that greatly improves convergence relative to its
precursor non-adaptive MCMC algorithm. Our model and computational tools are
incorporated into an open-source software suite known as BASE-9. We use
numerical studies to demonstrate that our method can recover parameters of
two-population clusters, and also show model misspecification can potentially
be identified. As a proof of concept, we analyze the two stellar populations of
globular cluster NGC 5272 using our model and methods. (BASE-9 is available
from GitHub: https://github.com/argiopetech/base/releases).
[18]
oai:arXiv.org:1604.06074 [pdf] - 1443899
Bayesian Analysis of Two Stellar Populations in Galactic Globular
Clusters II: NGC 5024, NGC 5272, and NGC 6352
Submitted: 2016-04-20
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival
ACS Treasury observations of Galactic Globular Clusters to find and
characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC
6352. For these three clusters, both single and double-population analyses are
used to determine a best fit isochrone(s). We employ a sophisticated Bayesian
analysis technique to simultaneously fit the cluster parameters (age, distance,
absorption, and metallicity) that characterize each cluster. For the
two-population analysis, unique population level helium values are also fit to
each distinct population of the cluster and the relative proportions of the
populations are determined. We find differences in helium ranging from
$\sim$0.05 to 0.11 for these three clusters. Model grids with solar
$\alpha$-element abundances ([$\alpha$/Fe] =0.0) and enhanced $\alpha$-elements
([$\alpha$/Fe]=0.4) are adopted.
[19]
oai:arXiv.org:1510.05954 [pdf] - 1483226
BAHAMAS: new SNIa analysis reveals inconsistencies with standard
cosmology
Submitted: 2015-10-20, last modified: 2016-04-18
We present results obtained by applying our BAyesian HierArchical Modeling
for the Analysis of Supernova cosmology (BAHAMAS) software package to the 740
spectroscopically confirmed supernovae type Ia (SNIa) from the "Joint
Light-curve Analysis" (JLA) dataset. We simultaneously determine cosmological
parameters and standardization parameters, including host galaxy mass
corrections, residual scatter and object-by-object intrinsic magnitudes.
Combining JLA and Planck Cosmic Microwave Background data, we find significant
discrepancies in cosmological parameter constraints with respect to the
standard analysis: we find Omega_M = 0.399+/-0.027, 2.8 sigma higher than
previously reported and w = -0.910+/-0.045, 1.6 sigma higher than the standard
analysis. We determine the residual scatter to be sigma_res = 0.104+/-0.005.
We confirm (at the 95% probability level) the existence of two
sub-populations segregated by host galaxy mass, separated at
log_{10}(M/M_solar) = 10, differing in mean intrinsic magnitude by
0.055+/-0.022 mag, lower than previously reported. Cosmological parameter
constraints are however unaffected by inclusion of host galaxy mass
corrections. We find ~4 sigma evidence for a sharp drop in the value of the
color correction parameter, beta(z), at a redshift z_trans = 0.662+/-0.055. We
rule out some possible explanations for this behaviour, which remains
unexplained.
[20]
oai:arXiv.org:1509.01010 [pdf] - 1361320
A method for comparing non-nested models with application to
astrophysical searches for new physics
Submitted: 2015-09-03, last modified: 2016-02-19
Searches for unknown physics and decisions between competing astrophysical
models to explain data both rely on statistical hypothesis testing. The usual
approach in searches for new physical phenomena is based on the statistical
Likelihood Ratio Test (LRT) and its asymptotic properties. In the common
situation, when neither of the two models under comparison is a special case of
the other, i.e., when the hypotheses are non-nested, this test is not
applicable. In astrophysics, this problem occurs when two models that reside in
different parameter spaces are to be compared. An important example is the
recently reported excess emission in astrophysical $\gamma$-rays and the
question whether its origin is known astrophysics or dark matter. We develop
and study a new, simple, generally applicable, frequentist method and validate
its statistical properties using a suite of simulation studies. We exemplify
it on realistic simulated data of the Fermi-LAT $\gamma$-ray satellite, where
non-nested hypotheses testing appears in the search for particle dark matter.
[21]
oai:arXiv.org:1512.04273 [pdf] - 1326857
Preprocessing Solar Images while Preserving their Latent Structure
Submitted: 2015-12-14
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics
Observatory, a NASA satellite, collect massive streams of high resolution
images of the Sun through multiple wavelength filters. Reconstructing
pixel-by-pixel thermal properties based on these images can be framed as an
ill-posed inverse problem with Poisson noise, but this reconstruction is
computationally expensive and there is disagreement among researchers about
what regularization or prior assumptions are most appropriate. This article
presents an image segmentation framework for preprocessing such images in order
to reduce the data volume while preserving as much thermal information as
possible for later downstream analyses. The resulting segmented images reflect
thermal properties but do not depend on solving the ill-posed inverse problem.
This allows users to avoid the Poisson inverse problem altogether or to tackle
it on each of $\sim$10 segments rather than on each of $\sim$10$^7$ pixels,
reducing computing time by a factor of $\sim$10$^6$. We employ a parametric
class of dissimilarities that can be expressed as cosine dissimilarity
functions or Hellinger distances between nonlinearly transformed vectors of
multi-passband observations in each pixel. We develop a decision theoretic
framework for choosing the dissimilarity that minimizes the expected loss that
arises when estimating identifiable thermal properties based on segmented
images rather than on a pixel-by-pixel basis. We also examine the efficacy of
different dissimilarities for recovering clusters in the underlying thermal
properties. The expected losses are computed under scientifically motivated
prior distributions. Two simulation studies guide our choices of dissimilarity
function. We illustrate our method by segmenting images of a coronal hole
observed on 26 February 2015.
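The two named dissimilarities are closely related: the squared Hellinger distance between two normalised vectors equals the cosine dissimilarity between their element-wise square roots. The sketch below illustrates that identity for a pair of hypothetical multi-passband pixel vectors. The abstract does not spell out the paper's full parametric class, so only these two special cases are shown, and the function names and pixel values are assumptions.

```python
import math


def cosine_dissimilarity(u, v):
    """One minus the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def hellinger_distance(u, v):
    """Hellinger distance between non-negative vectors after normalising
    each to sum to one."""
    su, sv = sum(u), sum(v)
    p = [a / su for a in u]
    q = [b / sv for b in v]
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


# Hypothetical intensities of two pixels in three passbands.
pixel_a = [1.0, 2.0, 3.0]
pixel_b = [3.0, 2.0, 1.0]
h_squared = hellinger_distance(pixel_a, pixel_b) ** 2
cos_sqrt = cosine_dissimilarity([math.sqrt(a) for a in pixel_a],
                                [math.sqrt(b) for b in pixel_b])
```

Both measures are unchanged when a pixel's overall brightness is rescaled, so a segmentation based on them groups pixels by the shape of their passband profile rather than by total intensity.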
[22]
oai:arXiv.org:1508.07083 [pdf] - 1325978
Detecting Abrupt Changes in the Spectra of High-Energy Astrophysical
Sources
Submitted: 2015-08-27, last modified: 2015-12-10
Variable-intensity astronomical sources are the result of complex and often
extreme physical processes. Abrupt changes in source intensity are typically
accompanied by equally sudden spectral shifts, i.e., sudden changes in the
wavelength distribution of the emission. This article develops a method for
modeling photon counts collected from observation of such sources. We embed
change points into a marked Poisson process, where photon wavelengths are
regarded as marks and both the Poisson intensity parameter and the distribution
of the marks are allowed to change. To the best of our knowledge this is the
first effort to embed change points into a marked Poisson process. Between the
change points, the spectrum is modeled non-parametrically using a mixture of a
smooth radial basis expansion and a number of local deviations from the smooth
term representing spectral emission lines. Because the model is
overparameterized, we employ an $\ell_1$ penalty. The tuning parameter in the
penalty and the number of change points are determined via the minimum
description length principle. Our method is validated via a series of
simulation studies and its practical utility is illustrated in the analysis of
the ultra-fast rotating yellow giant star known as FK Com.
[23]
oai:arXiv.org:1510.04662 [pdf] - 1360789
Detecting Unspecified Structure in Low-Count Images
Submitted: 2015-10-15
Unexpected structure in images of astronomical sources often presents itself
upon visual inspection of the image, but such apparent structure may either
correspond to true features in the source or be due to noise in the data. This
paper presents a method for testing whether inferred structure in an image with
Poisson noise represents a significant departure from a baseline (null) model
of the image. To infer image structure, we conduct a Bayesian analysis of a
full model that uses a multiscale component to allow flexible departures from
the posited null model. As a test statistic, we use a tail probability of the
posterior distribution under the full model. This choice of test statistic
allows us to estimate a computationally efficient upper bound on a p-value that
enables us to draw strong conclusions even when there are limited computational
resources that can be devoted to simulations under the null model. We
demonstrate the statistical performance of our method on simulated images.
Applying our method to an X-ray image of the quasar 0730+257, we find
significant evidence against the null model of a single point source and
uniform background, lending support to the claim of an X-ray jet.
[24]
oai:arXiv.org:1411.7447 [pdf] - 1579567
Disentangling Overlapping Astronomical Sources using Spatial and
Spectral Information
Submitted: 2014-11-26, last modified: 2015-05-21
We present a powerful new algorithm that combines both spatial information
(event locations and the point spread function) and spectral information
(photon energies) to separate photons from overlapping sources. We use Bayesian
statistical methods to simultaneously infer the number of overlapping sources,
to probabilistically separate the photons among the sources, and to fit the
parameters describing the individual sources. Using the Bayesian joint
posterior distribution, we are able to coherently quantify the uncertainties
associated with all these parameters. The advantages of combining spatial and
spectral information are demonstrated through a simulation study. The utility
of the approach is then illustrated by analysis of observations of FK Aqr and
FL Aqr with the XMM-Newton Observatory and the central region of the Orion
Nebula Cluster with the Chandra X-ray Observatory.
[25]
oai:arXiv.org:1409.1254 [pdf] - 930941
Strong Lens Time Delay Challenge: II. Results of TDC1
Liao, Kai;
Treu, Tommaso;
Marshall, Phil;
Fassnacht, Christopher D.;
Rumbaugh, Nick;
Dobler, Gregory;
Aghamousa, Amir;
Bonvin, Vivien;
Courbin, Frederic;
Hojjati, Alireza;
Jackson, Neal;
Kashyap, Vinay;
Kumar, S. Rathna;
Linder, Eric;
Mandel, Kaisey;
Meng, Xiao-Li;
Meylan, Georges;
Moustakas, Leonidas A.;
Prabhu, Tushar P.;
Romero-Wolf, Andrew;
Shafieloo, Arman;
Siemiginowska, Aneta;
Stalin, Chelliah S.;
Tak, Hyungsuk;
Tewes, Malte;
van Dyk, David
Submitted: 2014-09-03, last modified: 2014-12-11
We present the results of the first strong lens time delay challenge. The
motivation, experimental design, and entry level challenge are described in a
companion paper. This paper presents the main challenge, TDC1, which consisted
of analyzing thousands of simulated light curves blindly. The observational
properties of the light curves cover the range in quality obtained for current
targeted efforts (e.g.,~COSMOGRAIL) and expected from future synoptic surveys
(e.g.,~LSST), and include simulated systematic errors. \nteamsA\ teams
participated in TDC1, submitting results from \nmethods\ different method
variants. After describing each method, we compute and analyze basic
statistics measuring accuracy (or bias) $A$, goodness of fit $\chi^2$,
precision $P$, and success rate $f$. For some methods we identify outliers as
an important issue. Other methods show that outliers can be controlled via
visual inspection or conservative quality control. Several methods are
competitive, i.e., give $|A|<0.03$, $P<0.03$, and $\chi^2<1.5$, with some of
the methods already reaching sub-percent accuracy. The fraction of light curves
yielding a time delay measurement is typically in the range $f = $20--40\%. It
depends strongly on the quality of the data: COSMOGRAIL-quality cadence and
light curve lengths yield significantly higher $f$ than does sparser sampling.
Taking the results of TDC1 at face value, we estimate that LSST should provide
around 400 robust time-delay measurements, each with $P<0.03$ and $|A|<0.01$,
comparable to current lens modeling uncertainties. In terms of observing
strategies, we find that $A$ and $f$ depend mostly on season length, while $P$
depends mostly on cadence and campaign duration.
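The four summary statistics named in this abstract can be sketched as follows. The abstract does not define them, so the formulas below are assumptions based on the usual form of such metrics: $f$ is the fraction of light curves with a submitted delay, and $A$, $P$, and $\chi^2$ are averages over the submitted entries of the fractional bias, the fractional claimed uncertainty, and the squared standardized residual. The function name and the toy numbers are hypothetical.

```python
def tdc_metrics(true_delays, estimates, uncertainties, n_total):
    """Summary statistics for a set of submitted time-delay estimates.

    Assumed definitions (not given in the abstract):
      f    = n_submitted / n_total
      A    = mean of (estimate - truth) / truth
      P    = mean of uncertainty / |truth|
      chi2 = mean of ((estimate - truth) / uncertainty) ** 2
    """
    n_sub = len(estimates)
    rows = list(zip(true_delays, estimates, uncertainties))
    f = n_sub / n_total
    A = sum((est - true) / true for true, est, _ in rows) / n_sub
    P = sum(sig / abs(true) for true, _, sig in rows) / n_sub
    chi2 = sum(((est - true) / sig) ** 2 for true, est, sig in rows) / n_sub
    return {"f": f, "A": A, "P": P, "chi2": chi2}

# Toy submission: 3 delays (in days) measured out of 10 light curves.
metrics = tdc_metrics(
    true_delays=[10.0, -20.0, 50.0],
    estimates=[10.2, -20.4, 49.0],
    uncertainties=[0.3, 0.5, 1.0],
    n_total=10,
)
print(metrics)
```

Under these assumed definitions, the competitiveness criteria quoted in the abstract can be checked directly against the returned values of `A`, `P`, and `chi2`.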
[26]
oai:arXiv.org:1411.3786 [pdf] - 898736
Bayesian Analysis for Stellar Evolution with Nine Parameters (BASE-9):
User's Manual
Submitted: 2014-11-13
BASE-9 is a Bayesian software suite that recovers star cluster and stellar
parameters from photometry. BASE-9 is useful for analyzing single-age,
single-metallicity star clusters, binaries, or single stars, and for simulating
such systems. BASE-9 uses Markov chain Monte Carlo and brute-force numerical
integration techniques to estimate the posterior probability distributions for
the age, metallicity, helium abundance, distance modulus, and line-of-sight
absorption for a cluster, and the mass, binary mass ratio, and cluster
membership probability for every stellar object. BASE-9 is provided as open
source code on a version-controlled web server. The executables are also
available as Amazon Elastic Compute Cloud images. This manual provides
potential users with an overview of BASE-9, including instructions for
installation and use.
[27]
oai:arXiv.org:1307.7145 [pdf] - 1360728
A Bayesian Approach to Deriving Ages of Individual Field White Dwarfs
Submitted: 2013-07-26
We apply a self-consistent and robust Bayesian statistical approach to
determining the ages, distances, and ZAMS masses of 28 field DA white dwarfs
with ages of approximately 4 to 8 Gyrs. Our technique requires only quality
optical and near-IR photometry to derive ages with < 15% uncertainties,
generally with little sensitivity to our choice of modern initial-final mass
relation. We find that age, distance, and ZAMS mass are correlated in a manner
that is too complex to be captured by traditional error propagation techniques.
We further find that the posterior distributions of age are often asymmetric,
indicating that the standard approach to deriving WD ages can yield misleading
results.
[28]
oai:arXiv.org:1208.1706 [pdf] - 1150592
A Bayesian Analysis of the Correlations Among Sunspot Cycles
Submitted: 2012-08-08
Sunspot numbers form a comprehensive, long-duration proxy of solar activity
and have been used numerous times to empirically investigate the properties of
the solar cycle. A number of correlations have been discovered over the 24
cycles for which observational records are available. Here we carry out a
sophisticated statistical analysis of the sunspot record that reaffirms these
correlations, and sets up an empirical predictive framework for future cycles.
An advantage of our approach is that it allows for rigorous assessment of both
the statistical significance of various cycle features and the uncertainty
associated with predictions. We summarize the data into three sequential
relations that estimate the amplitude, duration, and time of rise to maximum
for any cycle, given the values from the previous cycle. We find that there is
no indication of a persistence in predictive power beyond one cycle, and
conclude that the dynamo does not retain memory beyond one cycle. Based on
sunspot records up to October 2011, we obtain, for Cycle 24, an estimated
maximum smoothed monthly sunspot number of 97 +- 15, to occur in
January--February 2014 +- 6 months.
[29]
oai:arXiv.org:1102.4610 [pdf] - 1360718
Accounting for Calibration Uncertainties in X-ray Analysis: Effective
Areas in Spectral Fitting
Lee, Hyunsook;
Kashyap, Vinay L.;
van Dyk, David A.;
Connors, Alanna;
Drake, Jeremy J.;
Izem, Rima;
Meng, Xiao-Li;
Min, Shandong;
Park, Taeyoung;
Ratzlaff, Pete;
Siemiginowska, Aneta;
Zezas, Andreas
Submitted: 2011-02-22
While considerable advances have been made in accounting for statistical
uncertainties in astronomical analyses, systematic instrumental uncertainties
have been generally ignored. This can be crucial to a proper interpretation of
analysis results because instrumental calibration uncertainty is a form of
systematic uncertainty. Ignoring it can underestimate error bars and introduce
bias into the fitted values of model parameters. Accounting for such
uncertainties currently requires extensive case-specific simulations if using
existing analysis packages. Here we present general statistical methods that
incorporate calibration uncertainties into spectral analysis of high-energy
data. We first present a method based on multiple imputation that can be
applied with any fitting method, but is necessarily approximate. We then
describe a more exact Bayesian approach that works in conjunction with a Markov
chain Monte Carlo based fitting. We explore methods for improving computational
efficiency, and in particular detail a method of summarizing calibration
uncertainties with a principal component analysis of samples of plausible
calibration files. This method is implemented using recently codified Chandra
effective area uncertainties for low-resolution spectral analysis and is
verified using both simulated and actual Chandra data. Our procedure for
incorporating effective area uncertainty is easily generalized to other types
of calibration uncertainties.
[30]
oai:arXiv.org:1102.3459 [pdf] - 1360717
The White Dwarf Age of NGC 2477
Submitted: 2011-02-16
We present deep photometric observations of the open cluster NGC 2477 using
HST/WFPC2. By identifying seven cluster white dwarf candidates, we present an
analysis of the white dwarf age of this cluster, using both the traditional
method of fitting isochrones to the white dwarf cooling sequence, and by
employing a new Bayesian statistical technique that has been developed by our
group. This new method performs an objective, simultaneous model fit of the
cluster and stellar parameters (namely age, metallicity, distance, reddening,
as well as individual stellar masses, mass ratios, and cluster membership) to
the photometry. Based on this analysis, we measure a white dwarf age of 1.035
+/- 0.054 +/- 0.087 Gyr (uncertainties represent the goodness of model fits and
discrepancy among models, respectively), in good agreement with the cluster's
main sequence turnoff age. This work is part of our ongoing work to calibrate
main sequence turnoff and white dwarf ages using open clusters, and to improve
the precision of cluster ages to the ~5% level.
[31]
oai:arXiv.org:1006.4334 [pdf] - 1360716
On Computing Upper Limits to Source Intensities
Submitted: 2010-06-22
A common problem in astrophysics is determining how bright a source could be
and still not be detected. Despite the simplicity with which the problem can be
stated, the solution involves complex statistical issues that require careful
analysis. In contrast to the confidence bound, this concept has never been
formally analyzed, leading to a great variety of often ad hoc solutions. Here
we formulate and describe the problem in a self-consistent manner. Detection
significance is usually defined by the acceptable proportion of false positives
(the Type I error), and we invoke the complementary concept of false negatives
(the Type II error), based on the statistical power of a test, to compute an
upper limit to the detectable source intensity. To determine the minimum
intensity that a source must have for it to be detected, we first define a
detection threshold, and then compute the probabilities of detecting sources of
various intensities at the given threshold. The intensity that corresponds to
the specified Type II error probability defines that minimum intensity, and is
identified as the upper limit. Thus, an upper limit is a characteristic of the
detection procedure rather than the strength of any particular source and
should not be confused with confidence intervals or other estimates of source
intensity. This is particularly important given the large number of catalogs
that are being generated from increasingly sensitive surveys. We discuss the
differences between these upper limits and confidence bounds. Both measures are
useful quantities that should be reported in order to extract the most science
from catalogs, though they answer different statistical questions: an upper
bound describes an inference range on the source intensity, while an upper
limit calibrates the detection process. We provide a recipe for computing upper
limits that applies to all detection algorithms.
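The recipe described in this abstract can be sketched for the simplest case of a single counting measurement. The sketch assumes Poisson counts, a known background rate, and detection by thresholding the total count. The paper's recipe is stated to cover all detection algorithms, and this is only one instance. All function names and the example values of the background, the Type I error probability `alpha`, and the Type II error probability `beta` are hypothetical.

```python
import math

def poisson_sf(k, mu):
    """Return P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)          # P(N = 0)
    cdf = term
    for n in range(1, k):
        term *= mu / n            # P(N = n) from P(N = n - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def detection_threshold(background, alpha):
    """Smallest count n* with P(N >= n* | background only) <= alpha."""
    k = 0
    while poisson_sf(k, background) > alpha:
        k += 1
    return k

def upper_limit(background, alpha, beta, tol=1e-9):
    """Smallest source intensity detected with probability >= 1 - beta."""
    n_star = detection_threshold(background, alpha)

    def power(source):
        return poisson_sf(n_star, background + source)

    lo, hi = 0.0, 1.0
    while power(hi) < 1.0 - beta:  # bracket the solution
        hi *= 2.0
    while hi - lo > tol:           # bisection; power increases with source
        mid = 0.5 * (lo + hi)
        if power(mid) < 1.0 - beta:
            lo = mid
        else:
            hi = mid
    return n_star, hi

n_star, ul_50 = upper_limit(background=3.0, alpha=0.0013, beta=0.5)
_, ul_90 = upper_limit(background=3.0, alpha=0.0013, beta=0.1)
print(n_star, ul_50, ul_90)
```

The result depends only on the background, the threshold, and the chosen error probabilities, not on the counts observed from any particular source. This is the sense in which the abstract calls the upper limit a characteristic of the detection procedure.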
[32]
oai:arXiv.org:0808.3164 [pdf] - 15611
Searching for Narrow Emission Lines in X-ray Spectra: Computation and
Methods
Submitted: 2008-08-23
The detection and quantification of narrow emission lines in X-ray spectra is
a challenging statistical task. The Poisson nature of the photon counts leads
to local random fluctuations in the observed spectrum that often result in
excess emission in a narrow band of energy resembling a weak narrow line. From
a formal statistical perspective, this leads to a (sometimes highly) multimodal
likelihood. Many standard statistical procedures are based on (asymptotic)
Gaussian approximations to the likelihood and simply cannot be used in such
settings. Bayesian methods offer a more direct paradigm for accounting for such
complicated likelihood functions but even here multimodal likelihoods pose
significant computational challenges. The new Markov chain Monte Carlo (MCMC)
methods developed in 2008 by van Dyk and Park, however, are able to fully
explore the complex posterior distribution of the location of a narrow line,
and thus provide valid statistical inference. Even with these computational
tools, standard statistical quantities such as means and standard deviations
cannot adequately summarize inference and standard testing procedures cannot be
used to test for emission lines. In this paper, we use new efficient MCMC
algorithms to fit the location of narrow emission lines, we develop new
statistical strategies for summarizing highly multimodal distributions and
quantifying valid statistical inference, and we extend the method of posterior
predictive p-values proposed by Protassov et al. (2002) to test for the
presence of narrow emission lines in X-ray spectra. We illustrate and validate
our methods using simulation studies and apply them to the Chandra observations
of the high redshift quasar PG1634+706.
[33]
oai:arXiv.org:astro-ph/0606247 [pdf] - 1396112
Bayesian Estimation of Hardness Ratios: Modeling and Computations
Submitted: 2006-06-10
A commonly used measure to summarize the nature of a photon spectrum is the
so-called Hardness Ratio, which compares the number of counts observed in
different passbands. The hardness ratio is especially useful to distinguish
between and categorize weak sources as a proxy for detailed spectral fitting.
However, in this regime classical methods of error propagation fail, and the
estimates of spectral hardness become unreliable. Here we develop a rigorous
statistical treatment of hardness ratios that properly deals with detected
photons as independent Poisson random variables and correctly deals with the
non-Gaussian nature of the error propagation. The method is Bayesian in nature,
and thus can be generalized to carry out a multitude of
source-population--based analyses. We verify our method with simulation
studies, and compare it with the classical method. We apply this method to real
world examples, such as the identification of candidate quiescent Low-mass
X-ray binaries in globular clusters, and tracking the time evolution of a flare
on a low-mass star.
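A minimal Monte Carlo sketch of the idea follows, under simplifying assumptions that are not taken from the paper: no background, equal exposure in both bands, and independent flat priors on the two Poisson intensities, so that each posterior is a gamma distribution with shape equal to the counts plus one. The hardness ratio is taken here as (H - S)/(H + S), one common convention. The function name and the counts are hypothetical.

```python
import random

def hardness_ratio_posterior(soft_counts, hard_counts, n_draws=20000, seed=0):
    """Posterior draws of HR = (lam_H - lam_S) / (lam_H + lam_S).

    Assumes counts ~ Poisson(lam) with a flat prior on each lam, so that
    lam | counts ~ Gamma(shape=counts + 1, scale=1).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        lam_s = rng.gammavariate(soft_counts + 1, 1.0)
        lam_h = rng.gammavariate(hard_counts + 1, 1.0)
        draws.append((lam_h - lam_s) / (lam_h + lam_s))
    draws.sort()
    mean = sum(draws) / n_draws
    lower = draws[int(0.16 * n_draws)]   # equal-tail 68% interval
    upper = draws[int(0.84 * n_draws)]
    return mean, lower, upper, draws

# A weak source: 3 soft counts and 9 hard counts.
hr_mean, hr_lower, hr_upper, hr_draws = hardness_ratio_posterior(3, 9)
classical = (9 - 3) / (9 + 3)
print(classical, hr_mean, (hr_lower, hr_upper))
```

With so few counts the posterior is visibly asymmetric and bounded by -1 and 1, which a symmetric Gaussian error bar from classical error propagation cannot represent.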
[34]
oai:arXiv.org:astro-ph/0203165 [pdf] - 48186
Bayesian Spectral Analysis of Metal Abundance Deficient Stars
Submitted: 2002-03-11
Metallicity can be measured by analyzing the spectra in the X-ray region and
comparing the flux in spectral lines to the flux in the underlying
Bremsstrahlung continuum. In this paper we propose new Bayesian methods which
directly model the Poisson nature of the data and thus are expected to exhibit
improved sampling properties. Our model also accounts for the Poisson nature of
background contamination of the observations, image blurring due to instrument
response, and the absorption of photons in space. The resulting highly
structured hierarchical model is fit using the Gibbs sampler, data augmentation
and Metropolis-Hastings. We demonstrate our methods with the X-ray spectral
analysis of several "Metal Abundance Deficient" stars. The model is designed to
summarize the relative frequency of the energy of photons (X-ray or gamma-ray)
arriving at a detector. Independent Poisson distributions are more appropriate
to model the counts than the commonly used normal approximation. We model the
high energy tail of the ASCA spectrum of each of the stars as a combination of
a Bremsstrahlung continuum and ten narrow emission lines, included at positions
of known strong lines. Statistical analysis is based on two source observations
and one background observation. We use sequential Bayesian analysis for the two
source observations; the posterior distribution from the first analysis is used
to construct a prior for the second. Sensitivity of the final results to the
choice of prior is investigated by altering the prior.
[35]
oai:arXiv.org:astro-ph/0201547 [pdf] - 1361036
Statistics: Handle with Care, Detecting Multiple Model Components with
the Likelihood Ratio Test
Submitted: 2002-01-31
The likelihood ratio test (LRT) and the related $F$ test do not (even
asymptotically) adhere to their nominal $\chi^2$ and $F$ distributions in many
statistical tests common in astrophysics, thereby casting many marginal line or
source detections and non-detections into doubt. Although there are many
legitimate uses of these statistics, in some important cases it can be
impossible to compute the correct false positive rate. For example, it has
become common practice to use the LRT or the $F$ test for detecting a line in a
spectral model or a source above background despite the lack of certain
required regularity conditions. In these and other settings that involve
testing a hypothesis that is on the boundary of the parameter space, {\it
contrary to common practice, the nominal $\chi^2$ distribution for the LRT or
the $F$ distribution for the $F$ test should not be used}. In this paper, we
characterize an important class of problems where the LRT and the $F$ test fail
and illustrate this non-standard behavior. We briefly sketch several possible
acceptable alternatives, focusing on Bayesian posterior predictive
probability-values. We present this method in some detail, as it is a simple,
robust, and intuitive approach. This alternative method is illustrated using
the gamma-ray burst of May 8, 1997 (GRB 970508) to investigate the presence of
an Fe K emission line during the initial phase of the observation.
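The boundary problem described in this abstract can be seen in a toy simulation. This is neither the paper's spectral model nor its posterior predictive method. The sketch assumes unit-variance Gaussian data whose mean is constrained to be non-negative, as a line intensity would be, and tests a null value of zero, which lies on the boundary of the parameter space. All names and settings are hypothetical.

```python
import random

def simulate_boundary_lrt(n_obs=25, n_sims=20000, seed=1):
    """LRT statistics for H0: mu = 0 vs. H1: mu >= 0, simulated under H0.

    Data are N(mu, 1). The constrained MLE is max(0, sample mean), so the
    statistic 2 * (loglik(mu_hat) - loglik(0)) equals n_obs * mu_hat ** 2.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        mu_hat = max(0.0, xbar)
        stats.append(n_obs * mu_hat ** 2)
    return stats

CHI2_1_CRIT_5PCT = 3.841  # 95th percentile of chi-square with 1 dof

lrt_stats = simulate_boundary_lrt()
frac_zero = sum(s == 0.0 for s in lrt_stats) / len(lrt_stats)
false_positive_rate = sum(s > CHI2_1_CRIT_5PCT for s in lrt_stats) / len(lrt_stats)
print(frac_zero, false_positive_rate)
```

About half of the simulated statistics are exactly zero, so the null distribution is a mixture of a point mass at zero and a $\chi^2_1$ distribution rather than the nominal $\chi^2_1$. In this toy case the nominal 5% threshold yields a false positive rate near 2.5%. In more realistic spectral models the discrepancy need not take such a simple form.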
[36]
oai:arXiv.org:astro-ph/0008170 [pdf] - 1361035
Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior
Simulation
Submitted: 2000-08-10
Over the past 10 years Bayesian methods have rapidly grown more popular as
several computationally intensive statistical algorithms have become feasible
with increased computer power. In this paper, we begin with a general
description of the Bayesian paradigm for statistical inference and the various
state-of-the-art model fitting techniques that we employ (e.g., Gibbs sampler
and Metropolis-Hastings). These algorithms are very flexible and can be used
to fit models that account for the highly hierarchical structure inherent in
the collection of high-quality spectra and thus can keep pace with the
accelerating progress of new space telescope designs. The methods we develop,
which will soon be available in the CIAO software package, explicitly model
photon arrivals as a Poisson process and, thus, have no difficulty with high
resolution low count X-ray and gamma-ray data. We expect these methods to be
useful not only for the recently launched Chandra X-ray observatory and XMM but
also new generation telescopes such as Constellation X, GLAST, etc. In the
context of two examples (Quasar S5 0014+813 and Hybrid-Chromosphere Supergiant
Star alpha TrA) we illustrate a new highly structured model and how Bayesian
posterior sampling can be used to compute estimates, error bars, and credible
intervals for the various model parameters.