Normalized to: Tcheng, D.
[1]
oai:arXiv.org:0804.3413
Robust Machine Learning Applied to Astronomical Datasets III:
Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and
GALEX
Submitted: 2008-04-21
We apply machine learning in the form of a nearest neighbor instance-based
algorithm (NN) to generate full photometric redshift probability density
functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky
Survey (SDSS DR5). We use a conceptually simple but novel application of NN to
generate the PDFs: we perturb the object colors by their measurement error and
use the resulting instances of nearest neighbor distributions to generate
numerous individual redshifts. When the redshifts are compared to existing SDSS
spectroscopic data, we find that the mean value of each PDF has a dispersion
between the photometric and spectroscopic redshift consistent with other
machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample
galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies
to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The
PDFs allow the selection of subsets with improved statistics. For quasars, the
improvement is dramatic: for those with a single peak in their probability
distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010,
and the photometric redshift is within 0.3 of the spectroscopic redshift for
99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can
virtually eliminate 'catastrophic' photometric redshift estimates. In addition
to the SDSS sample, we incorporate ultraviolet photometry from the Third Data
Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3)
to create PDFs for objects seen in both surveys. For quasars, the increased
coverage of the observed frame UV of the SED results in significant improvement
over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that
this improvement is genuine. [Abridged]
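
The perturbation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy training set, the Gaussian form of the perturbation, the Euclidean distance in color space, and all function names are assumptions made for illustration only.

```python
import random

def nearest_redshift(colors, training):
    """Redshift of the training object closest to `colors` (Euclidean distance)."""
    def dist2(entry):
        train_colors, _ = entry
        return sum((c - t) ** 2 for c, t in zip(colors, train_colors))
    return min(training, key=dist2)[1]

def sample_redshifts(colors, errors, training, n_draws=100, seed=0):
    """Perturb the colors by their errors n_draws times; one NN redshift per draw.

    Assumption: errors are treated as independent Gaussian standard deviations.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        perturbed = [rng.gauss(c, e) for c, e in zip(colors, errors)]
        samples.append(nearest_redshift(perturbed, training))
    return samples

def histogram_pdf(samples, z_min, z_max, n_bins):
    """Normalized histogram of the samples, so that sum(pdf) * bin_width == 1."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in samples:
        index = min(int((z - z_min) / width), n_bins - 1)
        counts[index] += 1
    return [c / (len(samples) * width) for c in counts]

# Hypothetical training set: (colors, spectroscopic redshift) pairs.
TRAINING = [((0.2, 0.1), 0.15), ((1.0, 0.9), 0.55), ((2.1, 1.8), 1.25)]

samples = sample_redshifts((0.9, 1.0), (0.3, 0.3), TRAINING, n_draws=200)
pdf = histogram_pdf(samples, 0.0, 1.5, 15)
mean_z = sum(samples) / len(samples)  # the PDF mean serves as the point estimate
```

In this sketch an object whose perturbed colors always land near the same training object produces a single-peaked PDF, while an object near a boundary in color space spreads its probability over several redshift bins.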
[2]
oai:arXiv.org:astro-ph/0612471
Robust Machine Learning Applied to Astronomical Datasets II: Quantifying
Photometric Redshifts for Quasars Using Instance-Based Learning
Submitted: 2006-12-17, last modified: 2007-03-22
We apply instance-based machine learning in the form of a k-nearest neighbor
algorithm to the task of estimating photometric redshifts for 55,746 objects
spectroscopically classified as quasars in the Fifth Data Release of the Sloan
Digital Sky Survey. We compare the results obtained to those from an empirical
color-redshift relation (CZR). In contrast to previously published results
using CZRs, we find that the instance-based photometric redshifts are assigned
with no regions of catastrophic failure. Remaining outliers are simply
scattered about the ideal relation, in a similar manner to the pattern seen in
the optical for normal galaxies at redshifts z < ~1. The instance-based
algorithm is trained on a representative sample of the data and
pseudo-blind-tested on the remaining unseen data. The variance between the
photometric and spectroscopic redshifts is sigma^2 = 0.123 +/- 0.002 (compared
to sigma^2 = 0.265 +/- 0.006 for the CZR), and 54.9 +/- 0.7%, 73.3 +/- 0.6%,
and 80.7 +/- 0.3% of the objects are within delta z < 0.1, 0.2, and 0.3
respectively. We also match our sample to the Second Data Release of the Galaxy
Evolution Explorer legacy data and the resulting 7,642 objects show a further
improvement, giving a variance of sigma^2 = 0.054 +/- 0.005, and 70.8 +/- 1.2%,
85.8 +/- 1.0%, and 90.8 +/- 0.7% of objects within delta z < 0.1, 0.2, and 0.3.
We show that the improvement is indeed due to the extra information provided by
GALEX by training on the same dataset using purely SDSS photometry, which yields
a variance of sigma^2 = 0.090 +/- 0.007. Each set of results represents a
realistic standard for application to further datasets for which the spectra
are representative.
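
As a rough sketch of the quantities quoted in this abstract, the following shows a k-nearest neighbor redshift estimate and the two summary statistics (variance and fraction of objects within a given delta z). It rests on assumptions not stated in the abstract: the estimate is taken as the unweighted mean of the k neighbors, the variance is taken as the mean squared difference between photometric and spectroscopic redshifts, and the function names and toy numbers are hypothetical.

```python
def knn_redshift(colors, training, k=3):
    """Mean redshift of the k training objects nearest in color space.

    Assumption: unweighted mean and Euclidean distance.
    """
    def dist2(entry):
        train_colors, _ = entry
        return sum((c - t) ** 2 for c, t in zip(colors, train_colors))
    nearest = sorted(training, key=dist2)[:k]
    return sum(z for _, z in nearest) / len(nearest)

def redshift_metrics(z_phot, z_spec, thresholds=(0.1, 0.2, 0.3)):
    """Return (variance, {threshold: fraction with |delta z| < threshold}).

    Assumption: variance is the mean squared photometric-spectroscopic difference.
    """
    deltas = [p - s for p, s in zip(z_phot, z_spec)]
    variance = sum(d * d for d in deltas) / len(deltas)
    fractions = {
        t: sum(1 for d in deltas if abs(d) < t) / len(deltas)
        for t in thresholds
    }
    return variance, fractions

# Toy numbers for illustration only; these are not values from the paper.
z_phot = [0.5, 1.0, 1.5, 2.5]
z_spec = [0.55, 1.25, 1.5, 2.0]
variance, fractions = redshift_metrics(z_phot, z_spec)
```

For the toy values above, two of the four objects fall within delta z < 0.1 and three within delta z < 0.3, which mirrors how the percentages in the abstract are read.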
[3]
oai:arXiv.org:astro-ph/0606541
Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy
Classification of the SDSS DR3 Using Decision Trees
Submitted: 2006-06-21
We provide classifications for all 143 million non-repeat photometric objects
in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision
trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate
that these star/galaxy classifications are expected to be reliable for
approximately 22 million objects with r < ~20. The general machine learning
environment Data-to-Knowledge and supercomputing resources enabled extensive
investigation of the decision tree parameter space. This work presents the
first public release of objects classified in this way for an entire SDSS data
release. The objects are classified as either galaxy, star or nsng (neither
star nor galaxy), with an associated probability for each class. To demonstrate
how to effectively make use of these classifications, we perform several
important tests. First, we detail selection criteria within the probability
space defined by the three classes to extract samples of stars and galaxies to
a given completeness and efficiency. Second, we investigate the efficacy of the
classifications and the effect of extrapolating from the spectroscopic regime
by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF
QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic
training data, we effectively begin to extrapolate past our star-galaxy
training set at r ~ 18. By comparing the number counts of our training sample
with the classified sources, however, we find that our efficiencies appear to
remain robust to r ~ 20. As a result, we expect our classifications to be
accurate for 900,000 galaxies and 6.7 million stars, and remain robust via
extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
[Abridged]
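
The selection in probability space mentioned in this abstract can be sketched as a threshold cut on one class probability, scored against spectroscopic labels. The definitions used here are assumptions for illustration: completeness is the fraction of true members of the target class that are selected, and efficiency is the fraction of selected objects that truly belong to the target class. The data layout and function name are hypothetical.

```python
def completeness_efficiency(objects, target, threshold):
    """Score a cut P(target) >= threshold against known classes.

    objects: list of (true_class, {class_name: probability}) pairs.
    Returns (completeness, efficiency) under the assumed definitions.
    """
    selected = [true for true, probs in objects if probs[target] >= threshold]
    n_true = sum(1 for true, _ in objects if true == target)
    n_correct = sum(1 for true in selected if true == target)
    completeness = n_correct / n_true if n_true else 0.0
    efficiency = n_correct / len(selected) if selected else 0.0
    return completeness, efficiency

# Toy objects with made-up probabilities for the three classes.
OBJECTS = [
    ("galaxy", {"galaxy": 0.90, "star": 0.05, "nsng": 0.05}),
    ("galaxy", {"galaxy": 0.60, "star": 0.30, "nsng": 0.10}),
    ("star",   {"galaxy": 0.70, "star": 0.20, "nsng": 0.10}),
    ("star",   {"galaxy": 0.10, "star": 0.85, "nsng": 0.05}),
    ("nsng",   {"galaxy": 0.20, "star": 0.20, "nsng": 0.60}),
]

loose = completeness_efficiency(OBJECTS, "galaxy", 0.5)
strict = completeness_efficiency(OBJECTS, "galaxy", 0.8)
```

Raising the threshold trades completeness for efficiency: in this toy sample the looser cut recovers every galaxy but admits a star, while the stricter cut yields a pure but incomplete galaxy sample.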