Normalized to: Roweis, S.
[1]
oai:arXiv.org:0905.2979 [pdf] - 391972
Extreme deconvolution: Inferring complete distribution functions from
noisy, heterogeneous and incomplete observations
Submitted: 2009-05-19, last modified: 2011-07-29
We generalize the well-known mixture-of-Gaussians approach to density
estimation, and the accompanying Expectation-Maximization technique for finding
the maximum-likelihood parameters of the mixture, to the case where each data
point carries an individual $d$-dimensional uncertainty covariance and has
unique missing-data properties. This algorithm reconstructs the
error-deconvolved or "underlying" distribution function common to all samples,
even when the individual data points are samples from different distributions,
obtained by convolving the underlying distribution with the heteroskedastic
uncertainty distribution of the data point and projecting out the missing data
directions. We show how this basic algorithm can be extended with conjugate
priors on all of the model parameters and a "split-and-merge" procedure
designed to avoid local maxima of the likelihood. We demonstrate the full
method by applying it to the problem of inferring the three-dimensional
velocity distribution of stars near the Sun from noisy two-dimensional,
transverse velocity measurements from the Hipparcos satellite.
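The EM iteration described above can be illustrated by the minimal one-dimensional sketch below, which uses only the Python standard library. It assumes every point is fully observed, so the projection onto observed directions is the identity and each point's uncertainty is a scalar variance `s2[i]`. The function and variable names are hypothetical. For each point and component, the E-step forms the convolved variance `T = V[j] + s2[i]`, the posterior mean `b` and variance `B` of the noise-free value, and the responsibility `q`. The M-step then averages these quantities to update the weights, means and deconvolved variances.

```python
import math
import random


def gauss_logpdf(x, mean, var):
    """Log of the 1-D normal density N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def xd_1d(w, s2, alpha, m, V, n_iter=100):
    """Hypothetical 1-D extreme-deconvolution EM sketch (no missing data).

    w     : observed values
    s2    : per-point noise variances (heteroskedastic)
    alpha, m, V : initial weights, means and underlying (noise-free) variances
    """
    N, K = len(w), len(alpha)
    alpha, m, V = list(alpha), list(m), list(V)
    for _ in range(n_iter):
        q = [[0.0] * K for _ in range(N)]  # responsibilities
        b = [[0.0] * K for _ in range(N)]  # posterior means of the noise-free value
        B = [[0.0] * K for _ in range(N)]  # posterior variances of the noise-free value
        for i in range(N):
            logp = []
            for j in range(K):
                T = V[j] + s2[i]  # component j convolved with point i's noise
                logp.append(math.log(alpha[j]) + gauss_logpdf(w[i], m[j], T))
                b[i][j] = m[j] + V[j] / T * (w[i] - m[j])
                B[i][j] = V[j] - V[j] ** 2 / T
            top = max(logp)  # log-sum-exp normalisation for numerical stability
            norm = sum(math.exp(lp - top) for lp in logp)
            for j in range(K):
                q[i][j] = math.exp(logp[j] - top) / norm
        for j in range(K):  # M-step
            qj = sum(q[i][j] for i in range(N))
            alpha[j] = qj / N
            m[j] = sum(q[i][j] * b[i][j] for i in range(N)) / qj
            V[j] = sum(q[i][j] * ((m[j] - b[i][j]) ** 2 + B[i][j])
                       for i in range(N)) / qj
    return alpha, m, V


# Synthetic test: intrinsic N(2, 1) values observed with noise sd in [0.5, 2].
rng = random.Random(42)
s2 = [rng.uniform(0.5, 2.0) ** 2 for _ in range(2000)]
w = [rng.gauss(2.0, 1.0) + rng.gauss(0.0, math.sqrt(v)) for v in s2]
alpha, m, V = xd_1d(w, s2, alpha=[1.0], m=[0.0], V=[5.0])
mean_w = sum(w) / len(w)
naive_var = sum((x - mean_w) ** 2 for x in w) / len(w)
```

With a single component, the deconvolved `V[0]` should land near the intrinsic variance of 1. The naive sample variance of `w` is inflated by the noise.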
[2]
oai:arXiv.org:0910.2233 [pdf] - 902030
Astrometry.net: Blind astrometric calibration of arbitrary astronomical
images
Submitted: 2009-10-12
We have built a reliable and robust system that takes as input an
astronomical image, and returns as output the pointing, scale, and orientation
of that image (the astrometric calibration or WCS information). The system
requires no first guess, and works with the information in the image pixels
alone; that is, the problem is a generalization of the "lost in space" problem
in which nothing, not even the image scale, is known. After robust source
detection is performed in the input image, asterisms (sets of four or five
stars) are geometrically hashed and compared to pre-indexed hashes to generate
hypotheses about the astrometric calibration. A hypothesis is only accepted as
true if it passes a Bayesian decision theory test against a background
hypothesis. With indices built from the USNO-B Catalog and designed for
uniformity of coverage and redundancy, the success rate is 99.9% for
contemporary near-ultraviolet and visual imaging survey data, with no false
positives. The failure rate is consistent with the incompleteness of the USNO-B
Catalog; augmentation with indices built from the 2MASS Catalog brings the
completeness to 100% with no false positives. We are using this system to
generate consistent and standards-compliant meta-data for digital and digitized
imaging from plate repositories, automated observatories, individual scientific
investigators, and hobbyists. This is the first step in a program of making it
possible to trust calibration meta-data for astronomical data of arbitrary
provenance.
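Geometric hashing of an asterism requires a code that is unchanged by translation, rotation and scaling of the image. The sketch below shows one such code for a four-star quad. The two most widely separated stars, A and B, define a local frame, and the positions of the other two stars in that frame form the hash. The frame convention used here (A at (0, 0), B at (1, 1)), the symmetry-breaking rules and the function name are assumptions of this sketch, not a description of the production system.

```python
import cmath
import itertools


def quad_hash(stars):
    """Hypothetical similarity-invariant 4-number code for four (x, y) stars."""
    pts = [complex(x, y) for x, y in stars]
    # The most widely separated pair defines the local frame (stars A and B).
    a, b = max(itertools.combinations(range(4), 2),
               key=lambda p: abs(pts[p[0]] - pts[p[1]]))
    A, B = pts[a], pts[b]
    # Similarity transform sending A -> (0, 0) and B -> (1, 1).
    frame = [(z - A) / (B - A) * (1 + 1j)
             for k, z in enumerate(pts) if k not in (a, b)]
    # Swapping the labels A and B maps z -> (1+1j) - z; pick one labelling.
    if frame[0].real + frame[1].real > 1.0:
        frame = [(1 + 1j) - z for z in frame]
    C, D = sorted(frame, key=lambda z: z.real)  # fix the C/D order
    return (C.real, C.imag, D.real, D.imag)


stars = [(0.0, 0.0), (3.0, 1.0), (1.0, 2.0), (2.0, 0.5)]
rot = 2.5 * cmath.exp(0.7j)  # rotate by 0.7 rad and scale by 2.5
moved = [rot * complex(x, y) + (10 - 4j) for x, y in stars]
moved = [(z.real, z.imag) for z in reversed(moved)]  # shift and reorder too
h1, h2 = quad_hash(stars), quad_hash(moved)
```

Because `h1` and `h2` agree, a quad extracted from an image can be matched against pre-indexed catalog quads by a nearest-neighbour lookup in this four-dimensional code space.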
[3]
oai:arXiv.org:0905.2980 [pdf] - 24396
The velocity distribution of nearby stars from Hipparcos data I. The
significance of the moving groups
Submitted: 2009-05-18, last modified: 2009-07-20
We present a three-dimensional reconstruction of the velocity distribution of
nearby stars ($\lesssim 100$ pc) using a maximum likelihood density estimation
technique applied to the two-dimensional tangential velocities of stars. The
underlying distribution is modeled as a mixture of Gaussian components. The
algorithm reconstructs the error-deconvolved distribution function, even when
the individual stars have unique error and missing-data properties. We apply
this technique to the tangential velocity measurements from a kinematically
unbiased sample of 11,865 main sequence stars observed by the Hipparcos
satellite. We explore various methods for validating the complexity of the
resulting velocity distribution function, including criteria based on Bayesian
model selection and on how accurately our reconstruction predicts the radial
velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using
this very conservative external validation test based on the GCS, we find that
there is little evidence for structure in the distribution function beyond the
moving groups established prior to the Hipparcos mission. This is in sharp
contrast with internal tests performed here and in previous analyses, which
point consistently to maximal structure in the velocity distribution. We
quantify the information content of the radial velocity measurements and find
that the mean amount of new information gained from a radial velocity
measurement of a single star is significant. This argues for complementary
radial velocity surveys to upcoming astrometric surveys.
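One natural way for a Gaussian-mixture reconstruction to predict a star's radial velocity is to condition each three-dimensional component on the measured tangential velocity. The sketch below makes several assumptions, and its names are hypothetical:

- Each component's mean and covariance have already been rotated into that star's own (radial, tangential, tangential) frame.
- `S` is a 2×2 tangential noise covariance.

Under these assumptions, each component yields a conditional Gaussian for the radial velocity, reweighted by how well it explains the tangential measurement.

```python
import math


def predict_radial(v_t, S, components):
    """Hypothetical predictive mixture for v_r given tangential v_t = (v1, v2).

    components: list of (alpha, mu, Sigma); mu is a 3-vector and Sigma a 3x3
    nested list, both ordered (radial, tangential-1, tangential-2).
    Returns a list of (weight, mean, variance) for the radial velocity.
    """
    out = []
    for alpha, mu, Sig in components:
        # Tangential block plus measurement noise, and its explicit 2x2 inverse.
        a, b = Sig[1][1] + S[0][0], Sig[1][2] + S[0][1]
        c, d = Sig[2][1] + S[1][0], Sig[2][2] + S[1][1]
        det = a * d - b * c
        inv = [[d / det, -b / det], [-c / det, a / det]]
        r = [v_t[0] - mu[1], v_t[1] - mu[2]]  # tangential residual
        # Gain row vector k = Sigma_rt @ inv.
        k = [Sig[0][1] * inv[0][0] + Sig[0][2] * inv[1][0],
             Sig[0][1] * inv[0][1] + Sig[0][2] * inv[1][1]]
        mean = mu[0] + k[0] * r[0] + k[1] * r[1]
        var = Sig[0][0] - (k[0] * Sig[1][0] + k[1] * Sig[2][0])
        maha = (r[0] * (inv[0][0] * r[0] + inv[0][1] * r[1])
                + r[1] * (inv[1][0] * r[0] + inv[1][1] * r[1]))
        logw = math.log(alpha) - 0.5 * (maha + math.log((2 * math.pi) ** 2 * det))
        out.append((logw, mean, var))
    top = max(lw for lw, _, _ in out)
    norm = sum(math.exp(lw - top) for lw, _, _ in out)
    return [(math.exp(lw - top) / norm, mn, vr) for lw, mn, vr in out]


comps = [(1.0, [0.0, 0.0, 0.0],
          [[4.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 1.0]])]
pred = predict_radial((1.0, 0.0), [[0.0, 0.0], [0.0, 0.0]], comps)
```

The result is itself a one-dimensional Gaussian mixture. Evaluating it at a measured radial velocity, for example through its log-likelihood, gives a per-star score for an external validation test.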
[4]
oai:arXiv.org:0805.0759 [pdf] - 12358
Blind Date: Using proper motions to determine the ages of historical
images
Submitted: 2008-05-06
Astrometric calibration is based on patterns of cataloged stars and therefore
effectively assumes a particular epoch, which can be substantially incorrect
for historical images. With the known proper motions of stars we can "run back
the clock" to an approximation of the night sky in any given year, and in
principle the year that best fits stellar patterns in any given image is an
estimate of the year in which that image was taken. In this paper we use 47
scanned photographic images of M44 spanning the years 1910-1975 to demonstrate this
technique. We use only the pixel information in the images; we use no prior
information or meta-data about image pointing, scale, orientation, or date.
Blind Date returns date meta-data for the input images. It also improves the
astrometric calibration of the image because the final astrometric calibration
is performed at the appropriate epoch. The accuracy and reliability of Blind
Date are functions of image size, pointing, angular resolution, and depth;
performance is related to the sum of proper-motion signal-to-noise ratios for
catalog stars measured in the input image. All of the science-quality images
and 85 percent of the low-quality images in our sample of photographic plate
images of M44 have their dates reliably determined to within a decade, many to
within months.
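Under a deliberately simplified model, "running back the clock" becomes a one-parameter fit. The sketch assumes:

- The image has already been mapped into the catalog's tangent-plane coordinates.
- Catalog stars move linearly with their proper motions.
- Positional errors are isotropic.

With these assumptions, the epoch that minimizes the weighted squared residuals has a closed form. The estimator, its input layout and its error formula belong to this toy sketch, not to the procedure used in the paper.

```python
def estimate_epoch(catalog, observed, t0, sigma):
    """Hypothetical least-squares image epoch from linear proper motions.

    catalog : list of (x0, y0, mux, muy); positions at reference epoch t0 and
              proper motions per year, in tangent-plane units
    observed: matching (x, y) positions measured in the image
    sigma   : per-star isotropic positional uncertainty (same units as x, y)
    Returns (t_hat, t_err).
    """
    num = den = 0.0
    for (x0, y0, mux, muy), (x, y), s in zip(catalog, observed, sigma):
        w = 1.0 / s ** 2
        # Setting d(chi^2)/d(dt) = 0 for chi^2 = sum w |delta - mu*dt|^2.
        num += w * (mux * (x - x0) + muy * (y - y0))
        den += w * (mux ** 2 + muy ** 2)
    return t0 + num / den, den ** -0.5


catalog = [(0.0, 0.0, 0.010, 0.000), (1.0, 1.0, 0.000, -0.020),
           (2.0, 0.5, 0.005, 0.005)]
t_true = 1950.0
observed = [(x0 + mux * (t_true - 2000.0), y0 + muy * (t_true - 2000.0))
            for x0, y0, mux, muy in catalog]
t_hat, t_err = estimate_epoch(catalog, observed, t0=2000.0, sigma=[0.001] * 3)
```

In this toy model, the inverse variance of the date is the sum over stars of (|mu_i| / sigma_i)^2. This is the squared proper-motion signal-to-noise per year, which echoes the statement that performance tracks the proper-motion signal-to-noise of the catalog stars in the image.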
[5]
oai:arXiv.org:0709.2358 [pdf] - 4974
Cleaning the USNO-B Catalog through automatic detection of optical
artifacts
Submitted: 2007-09-14, last modified: 2008-01-20
The USNO-B Catalog contains spurious entries that are caused by diffraction
spikes and circular reflection halos around bright stars in the original
imaging data. These spurious entries appear in the Catalog as if they were real
stars; they are confusing for some scientific tasks. The spurious entries can
be identified by simple computer vision techniques because they produce
repeatable patterns on the sky. Some techniques employed here are variants of
the Hough transform, one of which is sensitive to (two-dimensional)
overdensities of faint stars in thin right-angle cross patterns centered on
bright ($<13$ mag) stars, and one of which is sensitive to thin annular
overdensities centered on very bright ($<7$ mag) stars. After enforcing
conservative statistical requirements on spurious-entry identifications, we
find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of
them (2.3%) are identified as spurious by diffraction-spike criteria
and 196,133 (0.02%) are identified as spurious by reflection-halo
criteria. The spurious entries are often detected in more than 2 bands and are
not overwhelmingly outliers in any photometric properties; they therefore
cannot be rejected easily on other grounds, i.e., without the use of computer
vision techniques. We demonstrate our method, and return to the community in
electronic form a table of spurious entries in the Catalog.
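The reflection-halo search can be illustrated with a Hough-style accumulator in which the circle centre is fixed at the bright star. Every faint source votes for its distance from that star. A radius bin holding far more votes than a uniform background predicts marks a candidate halo. This is a simplified sketch: the function names, the known background `density` and the Gaussian `n_sigma` cut are illustrative assumptions, not the paper's statistical test.

```python
import math
import random


def ring_votes(center, faint, r_max, n_bins):
    """Hypothetical radius accumulator: each faint source votes for its distance."""
    cx, cy = center
    dr = r_max / n_bins
    votes = [0] * n_bins
    for x, y in faint:
        r = math.hypot(x - cx, y - cy)
        if r < r_max:
            votes[min(int(r / dr), n_bins - 1)] += 1
    return votes


def halo_bins(votes, r_max, density, n_sigma=5.0):
    """Indices of radius bins whose counts exceed a uniform background."""
    dr = r_max / len(votes)
    flagged = []
    for k, n in enumerate(votes):
        # Expected count in the annulus k*dr <= r < (k+1)*dr.
        expected = density * math.pi * ((k + 1) ** 2 - k ** 2) * dr ** 2
        if n > expected + n_sigma * math.sqrt(expected):
            flagged.append(k)
    return flagged


rng = random.Random(1)
background = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(2000)]
ring = [(5.2 * math.cos(2 * math.pi * k / 120), 5.2 * math.sin(2 * math.pi * k / 120))
        for k in range(120)]
votes = ring_votes((0.0, 0.0), background + ring, r_max=10.0, n_bins=20)
flagged = halo_bins(votes, r_max=10.0, density=2000 / 400.0)
```

The injected ring at radius 5.2 falls in bin 10, which the test flags. A diffraction-spike detector would instead accumulate votes along thin lines through the bright star.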
[6]
oai:arXiv.org:astro-ph/0703454 [pdf] - 90263
An Improved Photometric Calibration of the Sloan Digital Sky Survey
Imaging Data
Padmanabhan, N.;
Schlegel, D. J.;
Finkbeiner, D. P.;
Barentine, J. C.;
Blanton, M. R.;
Brewington, H. J.;
Gunn, J. E.;
Harvanek, M.;
Hogg, D. W.;
Ivezic, Z.;
Johnston, D.;
Kent, S. M.;
Kleinman, S. J.;
Knapp, G. R.;
Krzesinski, J.;
Long, D.;
Neilsen, E. H.;
Nitta, A.;
Loomis, C.;
Lupton, R. H.;
Roweis, S.;
Snedden, S. A.;
Strauss, M. A.;
Tucker, D. L.
Submitted: 2007-03-19, last modified: 2007-10-19
We present an algorithm to photometrically calibrate wide-field optical
imaging surveys; it simultaneously solves for the calibration parameters and
relative stellar fluxes using overlapping observations. The algorithm decouples
the problem of "relative" calibrations from that of "absolute" calibrations;
the absolute calibration is reduced to determining a few numbers for the entire
survey. We pay special attention to the spatial structure of the calibration
errors, allowing one to isolate particular error modes in downstream analyses.
Applying this to the Sloan Digital Sky Survey imaging data, we achieve ~1%
relative calibration errors across 8500 sq.deg. in griz; the errors are ~2% for
the u band. These errors are dominated by unmodelled atmospheric variations at
Apache Point Observatory. These calibrations, dubbed "ubercalibration", are now
public with SDSS Data Release 6, and will be a part of subsequent SDSS data
releases.
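The idea of solving for stellar fluxes and calibration parameters together can be sketched with a toy model. Each measured magnitude is a star's true magnitude plus a per-exposure zero point, and the two sets of unknowns are solved by alternating least-squares updates. This model and the function name are hypothetical simplifications, far simpler than a full survey calibration model. The sketch nonetheless shows why overlapping observations fix the zero points only up to one common offset, which is the kind of "few numbers" left to the absolute calibration.

```python
def ubercal(obs, n_stars, n_exp, n_iter=500):
    """Hypothetical toy relative calibration: mag = m[star] + z[exposure].

    obs: list of (star, exposure, measured_mag). Returns (m, z) with z[0] = 0.
    """
    m, z = [0.0] * n_stars, [0.0] * n_exp
    by_star = [[] for _ in range(n_stars)]
    by_exp = [[] for _ in range(n_exp)]
    for s, e, mag in obs:
        by_star[s].append((e, mag))
        by_exp[e].append((s, mag))
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other set fixed.
        for s in range(n_stars):
            m[s] = sum(mag - z[e] for e, mag in by_star[s]) / len(by_star[s])
        for e in range(n_exp):
            z[e] = sum(mag - m[s] for s, mag in by_exp[e]) / len(by_exp[e])
    # Overlaps only constrain differences: pin the free offset with z[0] = 0.
    shift = z[0]
    return [ms + shift for ms in m], [ze - shift for ze in z]


true_z = [0.0, 0.12, -0.07, 0.03]
true_m = [15.0, 16.2, 14.8, 17.1, 15.5]
visits = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 0), (3, 3), (4, 0), (4, 2)]
obs = [(s, e, true_m[s] + true_z[e]) for s, e in visits]
m, z = ubercal(obs, n_stars=5, n_exp=4)
```

The exposures must be linked through shared stars. If they split into disconnected groups, each group keeps its own free offset.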
[7]
oai:arXiv.org:astro-ph/0606170 [pdf] - 82610
K-corrections and filter transformations in the ultraviolet, optical,
and near infrared
Submitted: 2006-06-07
Template fits to observed galaxy fluxes allow calculation of K-corrections
and conversions among observations of galaxies at various wavelengths. We
present a method for creating model-based template sets given a set of
heterogeneous photometric and spectroscopic galaxy data. Our technique,
non-negative matrix factorization, is akin to principal component analysis
(PCA), except that it is constrained to produce nonnegative templates, it can
use a basis set of models (rather than the delta function basis of PCA), and it
naturally handles uncertainties, missing data, and heterogeneous data
(including broad-band fluxes at various redshifts). The particular
implementation we present here is suitable for ultraviolet, optical, and
near-infrared observations in the redshift range 0 < z < 1.5. Since we base our
templates on stellar population synthesis models, the results are interpretable
in terms of approximate stellar masses and star-formation histories. We present
templates fit with this method to data from GALEX, Sloan Digital Sky Survey
spectroscopy and photometry, the Two-Micron All Sky Survey, the Deep
Extragalactic Evolutionary Probe and the Great Observatories Origins Deep
Survey. In addition, we present software for using such data to estimate
K-corrections and stellar masses.
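Non-negative matrix factorization that handles uncertainties and missing data can be illustrated with weighted multiplicative updates. Here a weight matrix `M` carries inverse variances, with zeros for missing entries. This plain-Python sketch factors the data matrix directly: it does not impose the basis of stellar-population models described above, and its names and defaults are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def weighted_nmf(V, M, k, n_iter=500, seed=0, eps=1e-12):
    """Hypothetical weighted NMF sketch: V ~ W H with W, H >= 0.

    V: n x m nonnegative data; M: n x m weights (inverse variances, 0 = missing).
    Multiplicative updates reduce sum_ij M_ij * (V_ij - (W H)_ij) ** 2.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    MV = [[M[i][j] * V[i][j] for j in range(m)] for i in range(n)]

    def weighted_model(W, H):
        WH = matmul(W, H)
        return [[M[i][j] * WH[i][j] for j in range(m)] for i in range(n)]

    history = []
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, MV), matmul(Wt, weighted_model(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(MV, Ht), matmul(weighted_model(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        WH = matmul(W, H)
        history.append(sum(M[i][j] * (V[i][j] - WH[i][j]) ** 2
                           for i in range(n) for j in range(m)))
    return W, H, history


W_true = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.0, 1.0]]
H_true = [[1.0, 2.0, 0.0, 1.0, 3.0], [0.0, 1.0, 1.0, 2.0, 0.5]]
V = matmul(W_true, H_true)
M = [[1.0] * 5 for _ in range(4)]
M[2][4] = 0.0  # treat one entry as missing
W, H, history = weighted_nmf(V, M, k=2)
```

Because the updates are multiplicative and start from positive values, `W` and `H` never become negative. That is the constraint that distinguishes this factorization from PCA.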
[8]
oai:arXiv.org:astro-ph/0505057 [pdf] - 72827
Modeling complete distributions with incomplete observations: The
velocity ellipsoid from Hipparcos data
Submitted: 2005-05-03
[abridged] A "missing data" algorithm is developed to model (i.e., reconstruct)
the three-dimensional velocity distribution function of a sample of stars using
data (velocity measurements) every one of which has one dimension unmeasured
(the radial direction). It also accounts for covariant measurement
uncertainties on the tangential velocity components. The algorithm is applied
to tangential velocities measured in a kinematically unbiased sample of 11,865
stars taken from the Hipparcos catalog. The local stellar velocity distribution
function of each of a set of 20 color-selected subsamples is modeled as a
mixture of two three-dimensional Gaussian ellipsoids of arbitrary relative
responsibility. In the fitting, one Gaussian (the "halo") is fixed at the known
mean velocity and velocity variance tensor of the Galaxy halo, and the other
(the "disk") is allowed to take arbitrary mean and arbitrary variance tensor.
The mean and variance tensor (commonly the "velocity ellipsoid") of the disk
velocity distribution are both found to be strong functions of stellar color,
with long-lived populations showing larger velocity dispersion, slower mean
rotation velocity, and smaller vertex deviation than short-lived populations.
The local standard of rest (LSR) is inferred in the usual way and the Sun's
motion relative to the LSR is found to be
$(U,V,W)_{\odot} = (10.1, 4.0, 6.7) \pm (0.5, 0.8, 0.2)$ km/s. Artificial data sets are
made and analyzed, with the same error properties as the Hipparcos data, to
demonstrate that the analysis is unbiased. The results are shown to be
insensitive to the assumption that the velocity distributions are Gaussian.
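Each star's tangential velocity is a projection of its three-dimensional velocity. A 3-D Gaussian with mean m and covariance V therefore predicts a 2-D Gaussian for the measured tangential components, with mean R m and covariance R V R^T + S. The sketch below builds R from a star's Galactic (l, b), assuming Cartesian axes with U toward the Galactic centre, V in the direction of rotation and W toward the north Galactic pole. Sign conventions for U differ between authors, and the function names and noise matrix `S` are hypothetical. A component held fixed during fitting, such as the halo, would simply be skipped in the parameter update.

```python
import math


def tangential_basis(l, b):
    """Unit vectors toward increasing l and b at Galactic (l, b) in radians."""
    e_l = (-math.sin(l), math.cos(l), 0.0)
    e_b = (-math.sin(b) * math.cos(l), -math.sin(b) * math.sin(l), math.cos(b))
    return [e_l, e_b]


def project_gaussian(R, mean, cov, S):
    """Hypothetical helper: returns (R mean, R cov R^T + S) for a 2x3 R."""
    mu = [sum(R[a][k] * mean[k] for k in range(3)) for a in range(2)]
    RC = [[sum(R[a][k] * cov[k][c] for k in range(3)) for c in range(3)]
          for a in range(2)]
    T = [[sum(RC[a][k] * R[c][k] for k in range(3)) + S[a][c] for c in range(2)]
         for a in range(2)]
    return mu, T


R = tangential_basis(0.0, 0.0)  # a star toward the Galactic centre
cov = [[400.0, 0.0, 0.0], [0.0, 225.0, 0.0], [0.0, 0.0, 100.0]]
mu, T = project_gaussian(R, [10.0, -20.0, 7.0], cov, [[1.0, 0.0], [0.0, 1.0]])
```

For a star toward the Galactic centre, the radial direction is U, so only the V and W components survive the projection.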