Normalized to: Izbicki, R.
[1]
oai:arXiv.org:2001.03621 [pdf] - 2029805
Evaluation of probabilistic photometric redshift estimation approaches
for LSST
Schmidt, S. J.;
Malz, A. I.;
Soo, J. Y. H.;
Almosallam, I. A.;
Brescia, M.;
Cavuoti, S.;
Cohen-Tanugi, J.;
Connolly, A. J.;
DeRose, J.;
Freeman, P. E.;
Graham, M. L.;
Iyer, K. G.;
Jarvis, M. J.;
Kalmbach, J. B.;
Kovacs, E.;
Lee, A. B.;
Longo, G.;
Morrison, C. B.;
Newman, J. A.;
Nourbakhsh, E.;
Nuss, E.;
Pospisil, T.;
Tranin, H.;
Wechsler, R. H.;
Zhou, R.;
Izbicki, R.;
Collaboration, The LSST Dark Energy Science
Submitted: 2020-01-10
Many scientific investigations of photometric galaxy surveys require redshift
estimates, whose uncertainty properties are best encapsulated by photometric
redshift (photo-z) posterior probability density functions (PDFs). Many
photo-z PDF estimation methodologies exist, but they produce discrepant results
with no consensus on a preferred approach. We present the results of a
comprehensive experiment comparing twelve photo-z algorithms applied to mock
data produced for the Large Synoptic Survey Telescope (LSST) Dark Energy
Science Collaboration (DESC). By supplying perfect prior information, in the
form of the complete template library and a representative training set as
inputs to each code, we demonstrate the impact of the assumptions underlying
each technique on the output photo-z PDFs. In the absence of a notion of true,
unbiased photo-z PDFs, we evaluate and interpret multiple metrics of the
ensemble properties of the derived photo-z PDFs as well as traditional
reductions to photo-z point estimates. We report systematic biases and overall
over/under-breadth of the photo-z PDFs of many popular codes, which may
indicate avenues for improvement in the algorithms or implementations.
Furthermore, we draw attention to the limitations of established metrics for
assessing photo-z PDF accuracy; though we identify the conditional density
estimate (CDE) loss as a promising metric of photo-z PDF performance in the
case where true redshifts are available but true photo-z PDFs are not, we
emphasize the need for science-specific performance metrics.
[2]
oai:arXiv.org:1908.11523 [pdf] - 2031949
Conditional Density Estimation Tools in Python and R with Applications
to Photometric Redshifts and Likelihood-Free Cosmological Inference
Submitted: 2019-08-29, last modified: 2019-12-20
It is well known in astronomy that propagating non-Gaussian prediction
uncertainty in photometric redshift estimates is key to reducing bias in
downstream cosmological analyses. Similarly, likelihood-free inference
approaches, which are beginning to emerge as a tool for cosmological analysis,
require a characterization of the full uncertainty landscape of the parameters
of interest given observed data. However, most machine learning (ML) or
training-based methods with open-source software target point prediction or
classification, and hence fall short in quantifying uncertainty in complex
regression and parameter inference settings. As an alternative to methods that
focus on predicting the response (or parameters) $\mathbf{y}$ from features
$\mathbf{x}$, we provide nonparametric conditional density estimation (CDE)
tools for approximating and validating the entire probability density function
(PDF) $\mathrm{p}(\mathbf{y}|\mathbf{x})$ of $\mathbf{y}$ given (i.e.,
conditional on) $\mathbf{x}$. As there is no one-size-fits-all CDE method, the
goal of this work is to provide a comprehensive range of statistical tools and
open-source software for nonparametric CDE and method assessment which can
accommodate different types of settings and be easily fit to the problem at
hand. Specifically, we introduce four CDE software packages in
$\texttt{Python}$ and $\texttt{R}$ based on ML prediction methods adapted and
optimized for CDE: $\texttt{NNKCDE}$, $\texttt{RFCDE}$, $\texttt{FlexCode}$,
and $\texttt{DeepCDE}$. Furthermore, we present the $\texttt{cdetools}$
package, which includes functions for computing a CDE loss function for tuning
and assessing the quality of individual PDFs, along with diagnostic functions.
We provide sample code in $\texttt{Python}$ and $\texttt{R}$ as well as
examples of applications to photometric redshift estimation and likelihood-free
cosmological inference via CDE.
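A minimal sketch of how a CDE loss of this kind can be estimated when each
density is tabulated on a grid. It assumes the usual empirical form: the mean
integrated squared density minus twice the mean density at the true response,
which equals the loss up to an additive constant that does not depend on the
estimate. The function name `cde_loss_sketch`, the helper `gaussian_cde`, and
the toy data are illustrative assumptions and are not the `cdetools` API.

```python
import numpy as np


def cde_loss_sketch(cde, z_grid, z_true):
    """Empirical CDE loss (up to an additive constant); lower is better.

    cde    : (n, m) array, row i holds the estimated density f(z | x_i) on z_grid
    z_grid : (m,) increasing grid of response values
    z_true : (n,) observed true responses
    """
    cde = np.asarray(cde, dtype=float)
    z_grid = np.asarray(z_grid, dtype=float)
    z_true = np.asarray(z_true, dtype=float)

    # Term 1: mean over objects of the integral of f^2 (trapezoid rule).
    sq = cde ** 2
    dz = np.diff(z_grid)
    term1 = np.mean(np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * dz, axis=1))

    # Term 2: mean density at the true response (nearest grid point).
    idx = np.abs(z_grid[None, :] - z_true[:, None]).argmin(axis=1)
    term2 = np.mean(cde[np.arange(len(z_true)), idx])

    return term1 - 2.0 * term2


# Toy comparison (assumed data): densities centred on the true value
# versus densities offset from it by 0.3.
rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 2.0, 401)
z_true = rng.uniform(0.5, 1.5, size=200)


def gaussian_cde(centers, sigma):
    """Gaussian densities on z_grid, one row per centre."""
    d = (z_grid[None, :] - centers[:, None]) / sigma
    return np.exp(-0.5 * d ** 2) / (sigma * np.sqrt(2.0 * np.pi))


loss_good = cde_loss_sketch(gaussian_cde(z_true, 0.1), z_grid, z_true)
loss_bad = cde_loss_sketch(gaussian_cde(z_true + 0.3, 0.1), z_grid, z_true)
print(loss_good < loss_bad)
```

Because the omitted constant is the same for every estimator, only differences
in the loss between estimators evaluated on the same data are meaningful.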
[3]
oai:arXiv.org:1703.09242 [pdf] - 1582160
A Unified Framework for Constructing, Tuning and Assessing Photometric
Redshift Density Estimates in a Selection Bias Setting
Submitted: 2017-03-27
Photometric redshift estimation is an indispensable tool of precision
cosmology. One problem that plagues the use of this tool in the era of
large-scale sky surveys is that the bright galaxies that are selected for
spectroscopic observation do not have properties that match those of (far more
numerous) dimmer galaxies; thus, ill-designed empirical methods that produce
accurate and precise redshift estimates for the former generally will not
produce good estimates for the latter. In this paper, we provide a principled
framework for generating conditional density estimates (i.e. photometric
redshift PDFs) that takes into account selection bias and the covariate shift
that this bias induces. We base our approach on the assumption that the
probability that astronomers label a galaxy (i.e. determine its spectroscopic
redshift) depends only on its measured (photometric and perhaps other)
properties x and not on its true redshift. With this assumption, we can
explicitly write down risk functions that allow us to both tune and compare
methods for estimating importance weights (i.e. the ratio of densities of
unlabeled and labeled galaxies for different values of x) and conditional
densities. We also provide a method for combining multiple conditional density
estimates for the same galaxy into a single estimate with better properties. We
apply our risk functions to an analysis of approximately one million galaxies,
mostly observed by SDSS, and demonstrate through multiple diagnostic tests that
our method achieves good conditional density estimates for the unlabeled
galaxies.
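A minimal sketch of the importance weights described above, i.e. the ratio of
the unlabeled to the labeled density at a given x. The abstract does not say
which estimator is used; the nearest-neighbour ratio below is one simple
choice, shown only as an illustration. The function name
`knn_importance_weights`, the parameter `k`, and the single toy "magnitude"
feature are assumptions.

```python
import numpy as np


def knn_importance_weights(x_labeled, x_unlabeled, k=10):
    """Nearest-neighbour estimate of beta(x) = f_U(x) / f_L(x) at each labeled x.

    For each labeled point, take the ball reaching its k-th nearest labeled
    neighbour and compare the number of unlabeled points inside it with k,
    rescaled by the two sample sizes.
    """
    xl = np.asarray(x_labeled, dtype=float).reshape(len(x_labeled), -1)
    xu = np.asarray(x_unlabeled, dtype=float).reshape(len(x_unlabeled), -1)
    n_l, n_u = len(xl), len(xu)

    # Brute-force pairwise Euclidean distances (fine for small samples).
    d_ll = np.linalg.norm(xl[:, None, :] - xl[None, :, :], axis=2)
    d_lu = np.linalg.norm(xl[:, None, :] - xu[None, :, :], axis=2)

    # Column 0 after sorting is the point itself (distance 0), so column k
    # is the distance to the k-th nearest *other* labeled point.
    radius = np.sort(d_ll, axis=1)[:, k]
    counts_u = np.sum(d_lu <= radius[:, None], axis=1)

    return (n_l / n_u) * counts_u / k


# Toy selection bias (assumed data): labeled galaxies are brighter,
# i.e. have smaller magnitudes, than the unlabeled ones.
rng = np.random.default_rng(1)
mag_labeled = rng.normal(19.0, 1.0, size=500)
mag_unlabeled = rng.normal(21.0, 1.0, size=500)

weights = knn_importance_weights(mag_labeled, mag_unlabeled, k=10)
print(weights[mag_labeled > 20].mean(), weights[mag_labeled < 18].mean())
```

In this toy setup the faint labeled galaxies receive larger weights, because
they resemble the unlabeled population more closely than the bright ones do.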
[4]
oai:arXiv.org:1604.01339 [pdf] - 1386744
Photo-z Estimation: An Example of Nonparametric Conditional Density
Estimation under Selection Bias
Submitted: 2016-04-05
Redshift is a key quantity for inferring cosmological model parameters. In
photometric redshift estimation, cosmologists use the coarse data collected
from the vast majority of galaxies to predict the redshift of individual
galaxies. To properly quantify the uncertainty in the predictions, however, one
needs to go beyond standard regression and instead estimate the full
conditional density f(z|x) of a galaxy's redshift z given its photometric
covariates x. The problem is further complicated by selection bias: usually
only the rarest and brightest galaxies have known redshifts, and these galaxies
have characteristics and measured covariates that do not necessarily match
those of more numerous and dimmer galaxies of unknown redshift. Unfortunately,
there is not much research on how to best estimate complex multivariate
densities in such settings. Here we describe a general framework for properly
constructing and assessing nonparametric conditional density estimators under
selection bias, and for combining two or more estimators for optimal
performance. We propose new improved photo-z estimators and illustrate our
methods on data from the Sloan Digital Sky Survey and an application to
galaxy-galaxy lensing. Although our main application is photo-z estimation, our
methods are relevant to any high-dimensional regression setting with
complicated asymmetric and multimodal distributions in the response variable.
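A minimal sketch of combining two conditional density estimators, assuming the
combination is a convex mixture alpha * f_a + (1 - alpha) * f_b and that alpha
is chosen to minimize an empirical squared-error CDE loss on held-out data.
Under that assumption the loss is quadratic in alpha, so the minimizer has a
closed form that is then clipped to [0, 1]. The function names and the toy
data are hypothetical and are not taken from the paper.

```python
import numpy as np


def _integrate_rows(values, grid):
    """Trapezoid-rule integral of each row of `values` over `grid`."""
    dz = np.diff(grid)
    return np.sum(0.5 * (values[:, 1:] + values[:, :-1]) * dz, axis=1)


def best_mixing_weight(cde_a, cde_b, z_grid, z_true):
    """Weight alpha in [0, 1] minimizing the empirical CDE loss of the mixture."""
    cde_a = np.asarray(cde_a, dtype=float)
    cde_b = np.asarray(cde_b, dtype=float)
    z_grid = np.asarray(z_grid, dtype=float)
    z_true = np.asarray(z_true, dtype=float)

    # Densities evaluated at the true responses (nearest grid point).
    idx = np.abs(z_grid[None, :] - z_true[:, None]).argmin(axis=1)
    rows = np.arange(len(z_true))
    b_a = np.mean(cde_a[rows, idx])
    b_b = np.mean(cde_b[rows, idx])

    # Mean integrated products of the two estimators.
    a_aa = np.mean(_integrate_rows(cde_a * cde_a, z_grid))
    a_bb = np.mean(_integrate_rows(cde_b * cde_b, z_grid))
    a_ab = np.mean(_integrate_rows(cde_a * cde_b, z_grid))

    # denom is the mean integrated squared difference, hence non-negative.
    denom = a_aa - 2.0 * a_ab + a_bb
    if denom <= 1e-12:
        return 0.5  # estimators are (numerically) identical; any weight works
    alpha = (a_bb - a_ab + b_a - b_b) / denom
    return float(np.clip(alpha, 0.0, 1.0))


# Toy data (assumed): true scatter 0.1; estimator A is too narrow,
# estimator B is too broad.
rng = np.random.default_rng(2)
z_grid = np.linspace(0.0, 3.0, 601)
centers = rng.uniform(1.0, 2.0, size=500)
z_true = centers + rng.normal(0.0, 0.1, size=500)


def gaussian_cde(sigma):
    d = (z_grid[None, :] - centers[:, None]) / sigma
    return np.exp(-0.5 * d ** 2) / (sigma * np.sqrt(2.0 * np.pi))


alpha = best_mixing_weight(gaussian_cde(0.05), gaussian_cde(0.3), z_grid, z_true)
print(alpha)
```

With more than two estimators the same idea becomes a small quadratic program
over non-negative weights that sum to one.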
[5]
oai:arXiv.org:1306.1238 [pdf] - 1171844
New Image Statistics for Detecting Disturbed Galaxy Morphologies at High
Redshift
Submitted: 2013-06-05
Testing theories of hierarchical structure formation requires estimating the
distribution of galaxy morphologies and its change with redshift. One aspect of
this investigation involves identifying galaxies with disturbed morphologies
(e.g., merging galaxies). This is often done by summarizing galaxy images
using, e.g., the CAS and Gini-M20 statistics of Conselice (2003) and Lotz et
al. (2004), respectively, and associating particular statistic values with
disturbance. We introduce three statistics that enhance detection of disturbed
morphologies at high-redshift (z ~ 2): the multi-mode (M), intensity (I), and
deviation (D) statistics. We show their effectiveness by training a
machine-learning classifier, random forest, using 1,639 galaxies observed in
the H band by the Hubble Space Telescope WFC3, galaxies that had been
previously classified by eye by the CANDELS collaboration (Grogin et al. 2011,
Koekemoer et al. 2011). We find that the MID statistics (and the A statistic of
Conselice 2003) are the most useful for identifying disturbed morphologies.
We also explore whether human annotators are useful for identifying disturbed
morphologies. We demonstrate that they show limited ability to detect
disturbance at high redshift, and that increasing their number beyond
approximately 10 does not provably yield better classification performance. We
propose a simulation-based model-fitting algorithm that mitigates these issues
by bypassing annotation.