Normalized to: Schafer, C.
[1]
oai:arXiv.org:1905.05116 [pdf] - 1999579
Petabytes to Science
Bauer, Amanda E.;
Bellm, Eric C.;
Bolton, Adam S.;
Chaudhuri, Surajit;
Connolly, A. J.;
Cruz, Kelle L.;
Desai, Vandana;
Drlica-Wagner, Alex;
Economou, Frossie;
Gaffney, Niall;
Kavelaars, J.;
Kinney, J.;
Li, Ting S.;
Lundgren, B.;
Margutti, R.;
Narayan, G.;
Nord, B.;
Norman, Dara J.;
O'Mullane, W.;
Padhi, S.;
Peek, J. E. G.;
Schafer, C.;
Schwamb, Megan E.;
Smith, Arfon M.;
Tollerud, Erik J.;
Weijmans, Anne-Marie;
Szalay, Alexander S.
Submitted: 2019-05-13, last modified: 2019-11-17
A Kavli Foundation sponsored workshop on the theme Petabytes to
Science was held from the 12th to the 14th of February 2019 in Las Vegas. The aim
of this workshop was to discuss important trends and technologies which may
support astronomy. We also tackled how to better shape the workforce for the
new trends and how we should approach education and public outreach. This
document was coauthored during the workshop and edited in the weeks after. It
comprises the discussions and highlights many recommendations which came out of
the workshop.
We shall distill parts of this document and formulate potential white papers
for the decadal survey.
[2]
oai:arXiv.org:1911.02479 [pdf] - 1994455
Algorithms and Statistical Models for Scientific Discovery in the
Petabyte Era
Nord, Brian;
Connolly, Andrew J.;
Kinney, Jamie;
Kubica, Jeremy;
Narayan, Gautham;
Peek, Joshua E. G.;
Schafer, Chad;
Tollerud, Erik J.;
Avestruz, Camille;
Babu, G. Jogesh;
Birrer, Simon;
Burke, Douglas;
Caldeira, João;
Caldwell, Douglas A.;
Carlberg, Joleen K.;
Chen, Yen-Chi;
Dong, Chuanfei;
Feigelson, Eric D.;
Golkhou, V. Zach;
Kashyap, Vinay;
Li, T. S.;
Loredo, Thomas;
Lucie-Smith, Luisa;
Mandel, Kaisey S.;
Martínez-Galarza, J. R.;
Miller, Adam A.;
Natarajan, Priyamvada;
Ntampaka, Michelle;
Ptak, Andy;
Rapetti, David;
Shamir, Lior;
Siemiginowska, Aneta;
Sipőcz, Brigitta M.;
Smith, Arfon M.;
Tran, Nhan;
Vilalta, Ricardo;
Walkowicz, Lucianne M.;
ZuHone, John
Submitted: 2019-11-04
The field of astronomy has arrived at a turning point in terms of size and
complexity of both datasets and scientific collaboration. Commensurately,
algorithms and statistical models have begun to adapt (e.g., via the onset
of artificial intelligence), which itself presents new challenges and
opportunities for growth. This white paper aims to offer guidance and ideas for
how we can evolve our technical and collaborative frameworks to promote
efficient algorithmic development and take advantage of opportunities for
scientific discovery in the petabyte era. We discuss challenges for discovery
in large and complex data sets; challenges and requirements for the next stage
of development of statistical methodologies and algorithmic tool sets; how we
might change our paradigms of collaboration and education; and the ethical
implications of scientists' contributions to widely applicable algorithms and
computational modeling. We start with six distinct recommendations that are
supported by the commentary following them. This white paper is related to a
larger corpus of effort that has taken place within and around the Petabytes to
Science Workshops (https://petabytestoscience.github.io/).
[3]
oai:arXiv.org:1910.08376 [pdf] - 1982431
The Growing Importance of a Tech Savvy Astronomy and Astrophysics
Workforce
Norman, Dara;
Cruz, Kelle;
Desai, Vandana;
Lundgren, Britt;
Bellm, Eric;
Economou, Frossie;
Smith, Arfon;
Bauer, Amanda;
Nord, Brian;
Schafer, Chad;
Narayan, Gautham;
Li, Ting;
Tollerud, Erik;
Sipocz, Brigitta;
Stevance, Heloise;
Pickering, Timothy;
Sinha, Manodeep;
Harrington, Joseph;
Kartaltepe, Jeyhan;
Vohl, Dany;
Price-Whelan, Adrian;
Cherinka, Brian;
Chan, Chi-kwan;
Weiner, Benjamin;
Modjaz, Maryam;
Bianco, Federica;
Kerzendorf, Wolfgang;
Laginja, Iva;
Dong, Chuanfei
Submitted: 2019-10-17
Fundamental coding and software development skills are increasingly necessary
for success in nearly every aspect of astronomical and astrophysical research
as large surveys and high-resolution simulations become the norm. However,
professional training in these skills is inaccessible or impractical for many
members of our community. Students and professionals alike have been expected
to acquire these skills on their own, apart from formal classroom curriculum or
on-the-job training. Despite the recognized importance of these skills, there
is little opportunity to develop them, even for interested researchers. To
ensure a workforce capable of taking advantage of the computational resources
and the large volumes of data coming in the next decade, we must identify and
support ways to make software development training widely accessible to
community members, regardless of affiliation or career level. To develop and
sustain a technology-capable astronomical and astrophysical workforce, we
recommend that agencies make funding and other resources available in order to
encourage, support and, in some cases, require progress on necessary training,
infrastructure and policies. In this white paper, we focus on recommendations
for how funding agencies can lead in the promotion of activities to support the
astronomy and astrophysical workforce in the 2020s.
[4]
oai:arXiv.org:1909.11714 [pdf] - 1969065
Realizing the potential of astrostatistics and astroinformatics
Eadie, Gwendolyn;
Loredo, Thomas J.;
Mahabal, Ashish A.;
Siemiginowska, Aneta;
Feigelson, Eric;
Ford, Eric B.;
Djorgovski, S. G.;
Graham, Matthew;
Ivezic, Zeljko;
Borne, Kirk;
Cisewski-Kehe, Jessi;
Peek, J. E. G.;
Schafer, Chad;
Yanamandra-Fisher, Padma A.;
Young, C. Alex
Submitted: 2019-09-25
This Astro2020 State of the Profession Consideration White Paper highlights
the growth of astrostatistics and astroinformatics in astronomy, identifies key
issues hampering the maturation of these new subfields, and makes
recommendations for structural improvements at different levels that, if acted
upon, will make significant positive impacts across astronomy.
[5]
oai:arXiv.org:1903.09656 [pdf] - 1956476
How to Optimally Constrain Galaxy Assembly Bias: Supplement Projected
Correlation Functions with Count-in-cells Statistics
Submitted: 2019-03-22, last modified: 2019-09-06
Most models for the connection between galaxies and their haloes ignore the
possibility that galaxy properties may be correlated with halo properties other
than mass, a phenomenon known as galaxy assembly bias. Yet, it is known that
such correlations can lead to systematic errors in the interpretation of survey
data. At present, the degree to which galaxy assembly bias may be present in
the real Universe, and the best strategies for constraining it remain
uncertain. We study the ability of several observables to constrain galaxy
assembly bias from redshift survey data using the decorated halo occupation
distribution (dHOD), an empirical model of the galaxy--halo connection that
incorporates assembly bias. We cover an expansive set of observables, including
the projected two-point correlation function $w_{\mathrm{p}}(r_{\mathrm{p}})$,
the galaxy--galaxy lensing signal $\Delta \Sigma(r_{\mathrm{p}})$, the void
probability function $\mathrm{VPF}(r)$, the distributions of
counts-in-cylinders $P(N_{\mathrm{CIC}})$, and counts-in-annuli
$P(N_{\mathrm{CIA}})$, and the distribution of the ratio of counts in cylinders
of different sizes $P(N_2/N_5)$. We find that despite the frequent use of the
combination $w_{\mathrm{p}}(r_{\mathrm{p}})+\Delta \Sigma(r_{\mathrm{p}})$ in
interpreting galaxy data, the count statistics, $P(N_{\mathrm{CIC}})$ and
$P(N_{\mathrm{CIA}})$, are generally more efficient in constraining galaxy
assembly bias when combined with $w_{\mathrm{p}}(r_{\mathrm{p}})$. Constraints
based upon $w_{\mathrm{p}}(r_{\mathrm{p}})$ and $\Delta \Sigma(r_{\mathrm{p}})$
share common degeneracy directions in the parameter space, while combinations
of $w_{\mathrm{p}}(r_{\mathrm{p}})$ with the count statistics are more
complementary. Therefore, we strongly suggest that count statistics should be
used to complement the canonical observables in future studies of the
galaxy--halo connection.
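As a rough illustration of the counts-in-cylinders statistic, the sketch below assumes the common definition in which, for each galaxy, one counts the companions that fall inside a cylinder centred on it and aligned with the line of sight, and then tabulates the empirical distribution $P(N_{\mathrm{CIC}})$. The function names, the brute-force pair loop, the use of the z axis as the line of sight, and the toy coordinates are all assumptions made for illustration; this is not the authors' pipeline, which would also have to handle survey geometry and redshift-space effects.

```python
import math
from collections import Counter


def counts_in_cylinders(positions, radius, half_length):
    """For each point, count the other points inside a cylinder centred on it.

    The cylinder axis is taken along z (a stand-in for the line of sight).
    """
    counts = []
    for i, (xi, yi, zi) in enumerate(positions):
        n = 0
        for j, (xj, yj, zj) in enumerate(positions):
            if i == j:
                continue  # a galaxy is not its own companion
            in_projection = math.hypot(xi - xj, yi - yj) <= radius
            in_depth = abs(zi - zj) <= half_length
            if in_projection and in_depth:
                n += 1
        counts.append(n)
    return counts


def count_distribution(counts):
    """Empirical P(N): fraction of cylinders holding exactly N companions."""
    total = len(counts)
    return {n: c / total for n, c in sorted(Counter(counts).items())}


# Toy catalogue (hypothetical coordinates, arbitrary units).
galaxies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 5.0), (10.0, 10.0, 10.0)]
n_cic = counts_in_cylinders(galaxies, radius=2.0, half_length=1.0)
p_n = count_distribution(n_cic)
print(n_cic, p_n)
```

The pair loop scales as the square of the catalogue size, so a real analysis would presumably replace it with a tree-based or gridded neighbour search.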
[6]
oai:arXiv.org:1907.09027 [pdf] - 1920207
Better support for collaborations preparing for large-scale projects:
the case study of the LSST Science Collaborations Astro2020 APC White Paper
Bianco, Federica B.;
Banerji, Manda;
Bochanski, John;
Brandt, William N.;
Burchat, Patricia;
Gizis, John;
Ivezić, Zeljko;
Keaton, Charles;
Kaviraj, Sugata;
Loredo, Tom;
Mandelbaum, Rachel;
Marshall, Phil;
McGehee, Peregrine;
Schafer, Chad;
Schwamb, Megan E.;
Sokoloski, Jennifer L.;
Strauss, Michael A.;
Street, Rachel;
Trilling, David;
Verma, Aprajita
Submitted: 2019-07-21
Through the lens of the LSST Science Collaborations' experience, this paper
advocates for new and improved ways to fund large, complex collaborations at
the interface of data science and astrophysics as they work in preparation for
and on peta-scale, complex surveys, of which LSST is a prime example. We
advocate for the establishment of programs to support both research and
infrastructure development that enables innovative collaborative research on
such scales.
[7]
oai:arXiv.org:1907.07195 [pdf] - 1918436
FOBOS: A Next-Generation Spectroscopic Facility at the W. M. Keck
Observatory
Bundy, K.;
Westfall, K.;
MacDonald, N.;
Kupke, R.;
Savage, M.;
Poppett, C.;
Alabi, A.;
Becker, G.;
Burchett, J.;
Capak, P.;
Coil, A.;
Cooper, M.;
Cowley, D.;
Deich, W.;
Dillon, D.;
Edelstein, J.;
Guhathakurta, P.;
Hennawi, J.;
Kassis, M.;
Lee, K. -G.;
Masters, D.;
Miller, T.;
Newman, J.;
O'Meara, J.;
Prochaska, J. X.;
Rau, M.;
Rhodes, J.;
Rich, R. M.;
Rockosi, C.;
Romanowsky, A.;
Schafer, C.;
Schlegel, D.;
Shapley, A.;
Siana, B.;
Ting, Y. -S.;
Weisz, D.;
White, M.;
Williams, B.;
Wilson, G.;
Wilson, M.;
Yan, R.
Submitted: 2019-07-16
High-multiplex and deep spectroscopic follow-up of upcoming panoramic
deep-imaging surveys like LSST, Euclid, and WFIRST is a widely recognized and
increasingly urgent necessity. No current or planned facility at a U.S.
observatory meets the sensitivity, multiplex, and rapid-response time needed to
exploit these future datasets. FOBOS, the Fiber-Optic Broadband Optical
Spectrograph, is a near-term fiber-based facility that addresses these
spectroscopic needs by optimizing depth over area and exploiting the aperture
advantage of the existing 10m Keck II Telescope. The result is an instrument
with a uniquely blue-sensitive wavelength range (0.31-1.0 μm) at R~3500,
high multiplex (1800 fibers), and a factor of 1.7 greater survey speed and
order-of-magnitude greater sampling density than Subaru's Prime Focus
Spectrograph (PFS). In the era of panoramic deep imaging, FOBOS will excel at
building the deep, spectroscopic reference data sets needed to interpret vast
imaging data. At the same time, its flexible focal plane, including a mode with
25 deployable integral-field units (IFUs) across a 20 arcmin diameter field,
enables an expansive range of scientific investigations. Its key programmatic
areas include (1) nested stellar-parameter training sets that enable studies of
the Milky Way and M31 halo sub-structure, as well as local group dwarf
galaxies, (2) a comprehensive picture of galaxy formation thanks to detailed
mapping of the baryonic environment at z~2 and statistical linking of evolving
populations to the present day, and (3) dramatic enhancements in cosmological
constraints via precise photometric redshifts and determined redshift
distributions. In combination with Keck I instrumentation, FOBOS also provides
instant access to medium-resolution spectroscopy for transient sources with
full coverage from the UV to the K-band.
[8]
oai:arXiv.org:1907.06981 [pdf] - 1917113
Astro2020 APC White Paper: Elevating the Role of Software as a Product
of the Research Enterprise
Smith, Arfon M.;
Norman, Dara;
Cruz, Kelle;
Desai, Vandana;
Bellm, Eric;
Lundgren, Britt;
Economou, Frossie;
Nord, Brian D.;
Schafer, Chad;
Narayan, Gautham;
Harrington, Joseph;
Tollerud, Erik;
Sipőcz, Brigitta;
Pickering, Timothy;
Peeples, Molly S.;
Berriman, Bruce;
Teuben, Peter;
Rodriguez, David;
Gradvohl, Andre;
Shamir, Lior;
Allen, Alice;
Brownstein, Joel R.;
Ginsburg, Adam;
Sinha, Manodeep;
Hummels, Cameron;
Smith, Britton;
Stevance, Heloise;
Price-Whelan, Adrian;
Cherinka, Brian;
Chan, Chi-kwan;
Kartaltepe, Jeyhan;
Turk, Matthew;
Weiner, Benjamin;
Modjaz, Maryam;
Nemiroff, Robert J.;
Kerzendorf, Wolfgang;
Laginja, Iva;
Dong, Chuanfei;
Merín, Bruno;
Sobeck, Jennifer;
Buzasi, Derek;
Faherty, Jacqueline K.;
Momcheva, Ivelina;
Connolly, Andrew;
Golkhou, V. Zach
Submitted: 2019-07-14
Software is a critical part of modern research, and yet there are
insufficient mechanisms in the scholarly ecosystem to acknowledge, cite, and
measure the impact of research software. The majority of academic fields rely
on a one-dimensional credit model whereby academic articles (and their
associated citations) are the dominant factor in the success of a researcher's
career. In the petabyte era of astronomical science, citing software and
measuring its impact enables academia to retain and reward researchers who
make significant software contributions. These highly skilled researchers must
be retained to maximize the scientific return from petabyte-scale datasets.
Evolving beyond the one-dimensional credit model requires overcoming several
key challenges, including the current scholarly ecosystem and scientific
culture issues. This white paper will present these challenges and suggest
practical solutions for elevating the role of software as a product of the
research enterprise.
[9]
oai:arXiv.org:1904.11306 [pdf] - 1873325
A Preferential Attachment Model for the Stellar Initial Mass Function
Submitted: 2019-04-25
Accurate specification of a likelihood function is becoming increasingly
difficult in many inference problems in astronomy. As sample sizes resulting
from astronomical surveys continue to grow, deficiencies in the likelihood
function lead to larger biases in key parameter estimates. These deficiencies
result from the oversimplification of the physical processes that generated the
data, and from the failure to account for observational limitations.
Unfortunately, realistic models often do not yield an analytical form for the
likelihood. The estimation of a stellar initial mass function (IMF) is an
important example. The stellar IMF is the mass distribution of stars initially
formed in a given cluster of stars, a population which is not directly
observable due to stellar evolution and other disruptions and observational
limitations of the cluster. There are several difficulties with specifying a
likelihood in this setting since the physical processes and observational
challenges result in measurable masses that cannot legitimately be considered
independent draws from an IMF. This work improves inference of the IMF by using
an approximate Bayesian computation approach that both accounts for
observational and astrophysical effects and incorporates a physically-motivated
model for star cluster formation. The methodology is illustrated via a
simulation study, demonstrating that the proposed approach can recover the true
posterior in realistic situations, and applied to observations from
astrophysical simulation data.
[10]
oai:arXiv.org:1805.00019 [pdf] - 1712945
Moonfalls: Collisions between the Earth and its past moons
Submitted: 2018-04-30
During the last stages of terrestrial planet formation, planets grow
mainly through giant impacts with large planetary embryos. The Earth's Moon has
been suggested to have formed through one of these impacts. However, since the proto-Earth
experienced many giant impacts, several moons are naturally expected to
form through a sequence of multiple (including smaller-scale) impacts. Each
impact potentially forms a sub-Lunar mass moonlet that interacts
gravitationally with the proto-Earth and possibly with previously-formed
moonlets. Such interactions result in either moonlet-moonlet mergers, moonlet
ejections, or infall of moonlets onto the Earth. The latter possibility, leading
to low-velocity moonlet-Earth collisions, is explored here for the first time.
We use smoothed-particle hydrodynamics (SPH) simulations and consider a range of moonlet masses,
collision impact angles, and initial proto-Earth rotation rates. We find that
grazing/tidal collisions are the most frequent and produce comparable fractions
of accreted material and debris. The debris typically clumps into smaller moonlets
that can later interact with other moonlets. Other collision
geometries are rarer. Head-on collisions do not produce much debris and are
effectively perfect mergers. Intermediate impact angles result in debris
mass fractions in the range of 2-25%, where most of the material is unbound.
Retrograde collisions produce more debris than prograde collisions, whose
fractions depend on the proto-Earth initial rotation rate. Moonfalls can
slightly change the rotation-rate of the proto-Earth. Accreted moonfall
material is highly localized, potentially explaining the isotopic
heterogeneities in highly siderophile elements in terrestrial rocks, and
possibly forming primordial super-continent topographic features. Our results
can be used for simple scaling laws and applied to n-body studies of the
formation of the Earth and Moon.
[11]
oai:arXiv.org:1610.01661 [pdf] - 1492861
Maximizing Science in the Era of LSST: A Community-Based Study of Needed
US Capabilities
Najita, Joan;
Willman, Beth;
Finkbeiner, Douglas P.;
Foley, Ryan J.;
Hawley, Suzanne;
Newman, Jeffrey A.;
Rudnick, Gregory;
Simon, Joshua D.;
Trilling, David;
Street, Rachel;
Bolton, Adam;
Angus, Ruth;
Bell, Eric F.;
Buzasi, Derek;
Ciardi, David;
Davenport, James R. A.;
Dawson, Will;
Dickinson, Mark;
Drlica-Wagner, Alex;
Elias, Jay;
Erb, Dawn;
Feaga, Lori;
Fong, Wen-fai;
Gawiser, Eric;
Giampapa, Mark;
Guhathakurta, Puragra;
Hoffman, Jennifer L.;
Hsieh, Henry;
Jennings, Elise;
Johnston, Kathryn V.;
Kashyap, Vinay;
Li, Ting S.;
Linder, Eric;
Mandelbaum, Rachel;
Marshall, Phil;
Matheson, Thomas;
Meibom, Soren;
Miller, Bryan W.;
O'Meara, John;
Reddy, Vishnu;
Ridgway, Steve;
Rockosi, Constance M.;
Sand, David J.;
Schafer, Chad;
Schmidt, Sam;
Sesar, Branimir;
Sheppard, Scott S.;
Thomas, Cristina A.;
Tollerud, Erik J.;
Trump, Jon;
von der Linden, Anja
Submitted: 2016-10-05
The Large Synoptic Survey Telescope (LSST) will be a discovery machine for
the astronomy and physics communities, revealing astrophysical phenomena from
the Solar System to the outer reaches of the observable Universe. While many
discoveries will be made using LSST data alone, taking full scientific
advantage of LSST will require ground-based optical-infrared (OIR) supporting
capabilities, e.g., observing time on telescopes, instrumentation, computing
resources, and other infrastructure. This community-based study identifies,
from a science-driven perspective, capabilities that are needed to maximize
LSST science. Expanding on the initial steps taken in the 2015 OIR System
Report, the study takes a detailed, quantitative look at the capabilities
needed to accomplish six representative LSST-enabled science programs that
connect closely with scientific priorities from the 2010 decadal surveys. The
study prioritizes the resources needed to accomplish the science programs and
highlights ways that existing, planned, and future resources could be
positioned to accomplish the science goals.
[12]
oai:arXiv.org:1509.05619 [pdf] - 1279380
Prediction of galaxy ellipticities and reduction of shape noise in
cosmic shear measurements
Submitted: 2015-09-18
The intrinsic scatter in the ellipticities of galaxies about the mean shape,
known as "shape noise," is the most important source of noise in weak lensing
shear measurements. Several approaches to reducing shape noise have recently
been put forward, using information beyond photometry, such as radio
polarization and optical spectroscopy. Here we investigate how well the
intrinsic ellipticities of galaxies can be predicted using other, exclusively
photometric parameters. These parameters (such as galaxy colours) are already
available in the data and do not necessitate additional, often expensive
observations. We apply two regression techniques, generalized additive models
(GAM) and projection pursuit regression (PPR) to the publicly released data
catalog of galaxy properties from CFHTLenS. In our simple analysis we find that
the individual galaxy ellipticities can indeed be predicted from other
photometric parameters to better precision than the scatter about the mean
ellipticity. This means that without additional observations beyond photometry
the ellipticity contribution to the shear can be measured to higher precision,
comparable to using a larger sample of galaxies. Our best-fit model, achieved
using PPR, yields a gain equivalent to having 114.3% more galaxies. Using only
parameters unaffected by lensing (e.g., surface brightness, colour), the gain is
only ~12%.
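One way to read the quoted gains is to assume the usual scaling in which the shape-noise contribution to the shear uncertainty falls as sigma_e / sqrt(N). Under that assumption, reducing the ellipticity scatter from sigma_e to a residual sigma_res is equivalent to multiplying the number of galaxies by (sigma_e / sigma_res)^2. The sketch below inverts that relation for the two gains quoted above. The function names are hypothetical, and the abstract does not state that the gain was computed this way.

```python
import math


def equivalent_galaxy_gain(scatter_ratio):
    """Fractional gain in effective galaxy number for sigma_res / sigma_e = scatter_ratio.

    Assumes the shear uncertainty scales as sigma / sqrt(N).
    """
    return 1.0 / scatter_ratio**2 - 1.0


def scatter_ratio_for_gain(gain):
    """Inverse relation: residual-to-intrinsic scatter ratio implied by a fractional gain."""
    return 1.0 / math.sqrt(1.0 + gain)


ratio_all = scatter_ratio_for_gain(1.143)      # quoted gain of 114.3%
ratio_unlensed = scatter_ratio_for_gain(0.12)  # quoted gain of ~12%
print(f"{ratio_all:.3f} {ratio_unlensed:.3f}")
```

Under this reading, the best-fit model would correspond to a residual scatter of roughly 0.68 of the original, and the lensing-independent parameters alone to roughly 0.94.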
[13]
oai:arXiv.org:1206.2563 [pdf] - 1124064
Likelihood-Free Cosmological Inference with Type Ia Supernovae:
Approximate Bayesian Computation for a Complete Treatment of Uncertainty
Submitted: 2012-06-12, last modified: 2013-01-29
Cosmological inference becomes increasingly difficult when complex
data-generating processes cannot be modeled by simple probability
distributions. With the ever-increasing size of data sets in cosmology, there
is an increasing burden placed on adequate modeling; systematic errors in the
model will dominate where previously these were swamped by statistical errors.
For example, Gaussian distributions are an insufficient representation for
errors in quantities like photometric redshifts. Likewise, it can be difficult
to quantify analytically the distribution of errors that are introduced in
complex fitting codes. Without a simple form for these distributions, it
becomes difficult to accurately construct a likelihood function for the data as
a function of parameters of interest. Approximate Bayesian computation (ABC)
provides a means of probing the posterior distribution when direct calculation
of a sufficiently accurate likelihood is intractable. ABC allows one to bypass
direct calculation of the likelihood but instead relies upon the ability to
simulate the forward process that generated the data. These simulations can
naturally incorporate priors placed on nuisance parameters, and hence these can
be marginalized in a natural way. We present and discuss ABC methods in the
context of supernova cosmology using data from the SDSS-II Supernova Survey.
Assuming a flat cosmology and a constant dark energy equation of state, we
demonstrate that ABC can recover an accurate posterior distribution. Finally, we
show that ABC can still produce an accurate posterior distribution when we
contaminate the sample with Type IIP supernovae.
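The core idea of ABC described above, replacing a likelihood evaluation with forward simulation, can be sketched with the simplest variant, rejection ABC. The toy problem below (inferring the mean of a Gaussian with known unit variance, a uniform prior, the sample mean as summary statistic, and a fixed tolerance `eps`) is an assumption chosen for brevity and has nothing to do with the supernova analysis itself. All function and variable names are hypothetical.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_draw, summary, eps, n_draws, rng):
    """Rejection ABC: keep prior draws whose simulated summary is within eps of the data's."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                 # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))   # forward-simulate and summarise
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)              # accepted draws approximate the posterior
    return accepted


rng = random.Random(0)
true_mu = 1.5
n_obs = 30
observed = [rng.gauss(true_mu, 1.0) for _ in range(n_obs)]

posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(n_obs)],
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    summary=statistics.fmean,
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

Nuisance parameters fit naturally into this scheme: drawing them from their priors inside `simulate` marginalizes over them without any extra machinery. Plain rejection becomes wasteful when the prior is broad relative to the posterior, which is why sequential variants are often preferred in practice.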
[14]
oai:arXiv.org:1105.6344 [pdf] - 489781
Prototype selection for parameter estimation in complex models
Submitted: 2011-05-31, last modified: 2012-03-20
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
[15]
oai:arXiv.org:1103.6034 [pdf] - 1053061
Semi-supervised Learning for Photometric Supernova Classification
Submitted: 2011-03-30, last modified: 2011-09-27
We present a semi-supervised method for photometric supernova typing. Our
approach is to first use the nonlinear dimension reduction technique diffusion
map to detect structure in a database of supernova light curves and
subsequently employ random forest classification on a spectroscopically
confirmed training set to learn a model that can predict the type of each newly
observed supernova. We demonstrate that this is an effective method for
supernova typing. As supernova numbers increase, our semi-supervised method
efficiently utilizes this information to improve classification, a property not
enjoyed by template-based methods. Applied to supernova data simulated by
Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods
achieve (cross-validated) 95% Type Ia purity and 87% Type Ia efficiency on the
spectroscopic sample, but only 50% Type Ia purity and 50% efficiency on the
photometric sample due to their spectroscopic follow-up strategy. To improve
the performance on the photometric sample, we search for better spectroscopic
follow-up procedures by studying the sensitivity of our machine-learned
supernova classification to the specific strategy used to obtain training sets.
With a fixed amount of spectroscopic follow-up time, we find that deeper
magnitude-limited spectroscopic surveys are better for producing training sets.
For supernova Ia (II-P) typing, we obtain a 44% (1%) increase in purity to 72%
(87%) and a 30% (162%) increase in efficiency to 65% (84%) of the sample using a
25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic
sample used in the original simulations. When redshift information is
available, we incorporate it into our analysis using a novel method of altering
the diffusion map representation of the supernovae. Incorporating host
redshifts leads to a 5% improvement in Type Ia purity and 13% improvement in
Type Ia efficiency.
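The percentages above mix absolute values and relative increases, so a short arithmetic check may help. The sketch assumes the conventional definitions (purity as the fraction of objects classified as Type Ia that truly are Type Ia, efficiency as the fraction of true Type Ia that are recovered) and treats the quoted increases as relative to the 50% baseline on the photometric sample. The function names are hypothetical.

```python
def purity(true_pos, false_pos):
    """Fraction of objects labelled as the target type that really are that type."""
    return true_pos / (true_pos + false_pos)


def efficiency(true_pos, false_neg):
    """Fraction of true members of the target type that the classifier recovers."""
    return true_pos / (true_pos + false_neg)


def relative_increase(old, new):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Type Ia on the photometric sample: 50% baseline for both quantities (from the abstract).
ia_purity_gain = relative_increase(0.50, 0.72)
ia_efficiency_gain = relative_increase(0.50, 0.65)
print(round(ia_purity_gain, 2), round(ia_efficiency_gain, 2))
```

Both results agree with the quoted 44% and 30% increases for Type Ia. The II-P baselines are not given in the abstract, so they are not checked here.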
[16]
oai:arXiv.org:0906.0995 [pdf] - 1002464
Photometric Redshift Estimation Using Spectral Connectivity Analysis
Submitted: 2009-06-04
The development of fast and accurate methods of photometric redshift
estimation is a vital step towards being able to fully utilize the data of
next-generation surveys within precision cosmology. In this paper we apply a
specific approach to spectral connectivity analysis (SCA; Lee & Wasserman 2009)
called diffusion map. SCA is a class of non-linear techniques for transforming
observed data (e.g., photometric colours for each galaxy, where the data lie on
a complex subset of p-dimensional space) to a simpler, more natural coordinate
system wherein we apply regression to make redshift predictions. As SCA relies
upon eigen-decomposition, our training set size is limited to ~10,000
galaxies; we use the Nystrom extension to quickly estimate diffusion
coordinates for objects not in the training set. We apply our method to 350,738
SDSS main sample galaxies, 29,816 SDSS luminous red galaxies, and 5,223
galaxies from DEEP2 with CFHTLS ugriz photometry. For all three datasets, we
achieve prediction accuracies on par with previous analyses, and find that use
of the Nystrom extension leads to a negligible loss of prediction accuracy
relative to that achieved with the training sets. As in some previous analyses
(e.g., Collister & Lahav 2004, Ball et al. 2008), we observe that our
predictions are generally too high (low) in the low (high) redshift regimes. We
demonstrate that this is a manifestation of attenuation bias, wherein
measurement error (i.e., uncertainty in diffusion coordinates due to
uncertainty in the measured fluxes/magnitudes) reduces the slope of the
best-fit regression line. Mitigation of this bias is necessary if we are to use
photometric redshift estimates produced by computationally efficient empirical
methods in precision cosmology.
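Attenuation bias is easy to reproduce in a toy simulation. In the sketch below, a response depends linearly on a predictor, and the regression is fitted once on the true predictor and once on a noisy measurement of it. With independent Gaussian measurement error, the fitted slope is expected to shrink by the factor var(x) / (var(x) + var(error)). The numbers (true slope 2, unit variances) and the function name `ols_slope` are assumptions for illustration and are unrelated to the redshift data.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


rng = random.Random(1)
n = 5000
true_slope = 2.0

x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 0.1) for x in x_true]
x_noisy = [x + rng.gauss(0.0, 1.0) for x in x_true]  # measurement error, variance 1

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_noisy, y)

# Expected attenuation factor: var(x) / (var(x) + var(error)) = 1 / (1 + 1) = 0.5
print(slope_clean, slope_noisy)
```

A flattened regression line overpredicts at the low end of the range and underpredicts at the high end, which matches the too-high (too-low) predictions at low (high) redshift noted in the abstract.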
[17]
oai:arXiv.org:0905.4683 [pdf] - 1002383
Accurate parameter estimation for star formation history in galaxies
using SDSS spectra
Submitted: 2009-05-28
To further our knowledge of the complex physical process of galaxy formation,
it is essential that we characterize the formation and evolution of large
databases of galaxies. The spectral synthesis STARLIGHT code of Cid Fernandes
et al. (2004) was designed for this purpose. Results of STARLIGHT are highly
dependent on the choice of input basis of simple stellar population (SSP)
spectra. The run time of the code, which uses random walks through the parameter
space, scales as the square of the number of basis spectra, making it
computationally necessary to choose a small number of SSPs that are coarsely
sampled in age and metallicity. In this paper, we develop methods based on
diffusion map (Lafon & Lee, 2006) that, for the first time, choose appropriate
bases of prototype SSP spectra from a large set of SSP spectra designed to
approximate the continuous grid of age and metallicity of SSPs of which
galaxies are truly composed. We show that our techniques achieve better
accuracy of physical parameter estimation for simulated galaxies. Specifically,
we show that our methods significantly decrease the age-metallicity degeneracy
that is common in galaxy population synthesis methods. We analyze a sample of
3046 galaxies in SDSS DR6 and compare the parameter estimates obtained from
different basis choices.
[18]
oai:arXiv.org:0807.2900 [pdf] - 314999
Exploiting Low-Dimensional Structure in Astronomical Spectra
Submitted: 2008-07-18
Dimension-reduction techniques can greatly improve statistical inference in
astronomy. A standard approach is to use Principal Components Analysis (PCA).
In this work we apply a recently developed technique, diffusion maps, to
astronomical spectra for data parameterization and dimensionality reduction,
and develop a robust, eigenmode-based framework for regression. We show how our
framework provides a computationally efficient means by which to predict
redshifts of galaxies, and thus could inform more expensive redshift estimators
such as template cross-correlation. It also provides a natural means by which
to identify outliers (e.g., misclassified spectra, spectra with anomalous
features). We analyze 3835 SDSS spectra and show how our framework yields a
more than 95% reduction in dimensionality. Finally, we show that the prediction
error of the diffusion map-based regression approach is markedly smaller than
that of a similar approach based on PCA, clearly demonstrating the superiority
of diffusion maps over PCA for this regression task.
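For readers unfamiliar with diffusion maps, the first step can be sketched briefly: pairwise distances are turned into Gaussian kernel weights, and each row is normalized so that the weights become the transition probabilities of a random walk over the data. The diffusion coordinates are then obtained from the leading eigenvectors of this matrix, a step left to a linear-algebra library and not shown here. The kernel form exp(-d^2 / eps), the Euclidean distance, the value of `eps`, and the function name are assumptions for illustration and do not reproduce the paper's implementation.

```python
import math


def diffusion_transition_matrix(points, eps):
    """Row-stochastic Markov matrix built from a Gaussian kernel on pairwise distances."""
    weights = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps) for q in points]
        for p in points
    ]
    # Normalise each row so that it sums to one (one-step transition probabilities).
    return [[w / sum(row) for w in row] for row in weights]


# Toy data: two nearby points and one distant point (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
P = diffusion_transition_matrix(points, eps=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

The random walk strongly prefers steps between the two nearby points, which is how the method encodes local structure before any eigen-decomposition takes place.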
[19]
oai:arXiv.org:astro-ph/0702401 [pdf] - 316849
A Statistical Method for Estimating Luminosity Functions using Truncated
Data
Submitted: 2007-02-14
The observational limitations of astronomical surveys lead to significant
statistical inference challenges. One such challenge is the estimation of
luminosity functions given redshift $z$ and absolute magnitude $M$ measurements
from an irregularly truncated sample of objects. This is a bivariate density
estimation problem; we develop here a statistically rigorous method which (1)
does not assume a strict parametric form for the bivariate density; (2) does
not assume independence between redshift and absolute magnitude (and hence
allows evolution of the luminosity function with redshift); (3) does not
require dividing the data into arbitrary bins; and (4) naturally incorporates a
varying selection function. We accomplish this by decomposing the bivariate
density into nonparametric and parametric portions. There is a simple way of
estimating the integrated mean squared error of the estimator; smoothing
parameters are selected to minimize this quantity. Results are presented from
the analysis of a sample of quasars.