Roweis, Sam

Normalized to: Roweis, S.

8 article(s) in total. 29 co-authors, from 1 to 7 common article(s). Median position in authors list is 3.5.

[1] oai:arXiv.org:0905.2979 [pdf] - 391972

Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations

Bovy, Jo; Hogg, David W.; Roweis, Sam T.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS439 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Submitted: 2009-05-19, last modified: 2011-07-29

We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation--Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual $d$-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-merge" procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.

[2] oai:arXiv.org:0910.2233 [pdf] - 902030

Astrometry.net: Blind astrometric calibration of arbitrary astronomical images

Lang, Dustin; Hogg, David W.; Mierle, Keir; Blanton, Michael; Roweis, Sam

Comments: submitted to AJ

Submitted: 2009-10-12

We have built a reliable and robust system that takes as input an astronomical image, and returns as output the pointing, scale, and orientation of that image (the astrometric calibration or WCS information). The system requires no first guess, and works with the information in the image pixels alone; that is, the problem is a generalization of the "lost in space" problem in which nothing--not even the image scale--is known. After robust source detection is performed in the input image, asterisms (sets of four or five stars) are geometrically hashed and compared to pre-indexed hashes to generate hypotheses about the astrometric calibration. A hypothesis is only accepted as true if it passes a Bayesian decision theory test against a background hypothesis. With indices built from the USNO-B Catalog and designed for uniformity of coverage and redundancy, the success rate is 99.9% for contemporary near-ultraviolet and visual imaging survey data, with no false positives. The failure rate is consistent with the incompleteness of the USNO-B Catalog; augmentation with indices built from the 2MASS Catalog brings the completeness to 100% with no false positives. We are using this system to generate consistent and standards-compliant meta-data for digital and digitized imaging from plate repositories, automated observatories, individual scientific investigators, and hobbyists. This is the first step in a program of making it possible to trust calibration meta-data for astronomical data of arbitrary provenance.
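
The geometric hashing step can be illustrated with a code for a four-star asterism that does not change under translation, rotation, or scaling of the image. The construction below is an assumption about one reasonable similarity-invariant code, not a statement of the system's actual implementation: the two most widely separated stars define a frame in which they sit at (0,0) and (1,1), and the positions of the other two stars in that frame form the code. The function name `quad_code` is hypothetical.

```python
import itertools
import numpy as np

def quad_code(points):
    """Return a 4-number similarity-invariant code for four 2-D points."""
    # Treat each (x, y) as a complex number so a similarity transform is z -> a*z + t
    z = np.asarray(points, dtype=float) @ np.array([1, 1j])
    # The most widely separated pair defines the local frame
    i, j = max(itertools.combinations(range(4), 2),
               key=lambda ij: abs(z[ij[0]] - z[ij[1]]))
    rest = [k for k in range(4) if k not in (i, j)]
    # Map star i to 0 and star j to 1+1j; express the other two in that frame
    c = (z[rest] - z[i]) / (z[j] - z[i]) * (1 + 1j)
    # Break the i/j labelling ambiguity: swapping them sends c to (1+1j) - c
    if c.real.sum() > 1:
        c = (1 + 1j) - c
    # Break the ordering ambiguity of the two remaining stars
    c = c[np.argsort(c.real)]
    return np.array([c[0].real, c[0].imag, c[1].real, c[1].imag])
```

Codes computed this way from an image can be compared with pre-indexed codes by nearest-neighbour lookup; each near match proposes a candidate calibration that would then need to be verified.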

[3] oai:arXiv.org:0905.2980 [pdf] - 24396

The velocity distribution of nearby stars from Hipparcos data I. The significance of the moving groups

Bovy, Jo; Hogg, David W.; Roweis, Sam T.

Comments:

Submitted: 2009-05-18, last modified: 2009-07-20

We present a three-dimensional reconstruction of the velocity distribution of nearby stars ($\lesssim 100$ pc) using a maximum likelihood density estimation technique applied to the two-dimensional tangential velocities of stars. The underlying distribution is modeled as a mixture of Gaussian components. The algorithm reconstructs the error-deconvolved distribution function, even when the individual stars have unique error and missing-data properties. We apply this technique to the tangential velocity measurements from a kinematically unbiased sample of 11,865 main sequence stars observed by the Hipparcos satellite. We explore various methods for validating the complexity of the resulting velocity distribution function, including criteria based on Bayesian model selection and how accurately our reconstruction predicts the radial velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using this very conservative external validation test based on the GCS, we find that there is little evidence for structure in the distribution function beyond the moving groups established prior to the Hipparcos mission. This is in sharp contrast with internal tests performed here and in previous analyses, which point consistently to maximal structure in the velocity distribution. We quantify the information content of the radial velocity measurements and find that the mean amount of new information gained from a radial velocity measurement of a single star is significant. This argues for complementary radial velocity surveys to upcoming astrometric surveys.

[4] oai:arXiv.org:0805.0759 [pdf] - 12358

Blind Date: Using proper motions to determine the ages of historical images

Barron, Jonathan T.; Hogg, David W.; Lang, Dustin; Roweis, Sam

Comments: submitted to AJ

Submitted: 2008-05-06

Astrometric calibration is based on patterns of cataloged stars and therefore effectively assumes a particular epoch, which can be substantially incorrect for historical images. With the known proper motions of stars we can "run back the clock" to an approximation of the night sky in any given year, and in principle the year that best fits stellar patterns in any given image is an estimate of the year in which that image was taken. In this paper we use 47 scanned photographic images of M44 spanning years 1910-1975 to demonstrate this technique. We use only the pixel information in the images; we use no prior information or meta-data about image pointing, scale, orientation, or date. Blind Date returns date meta-data for the input images. It also improves the astrometric calibration of the image because the final astrometric calibration is performed at the appropriate epoch. The accuracy and reliability of Blind Date are functions of image size, pointing, angular resolution, and depth; performance is related to the sum of proper-motion signal-to-noise ratios for catalog stars measured in the input image. All of the science-quality images and 85 percent of the low-quality images in our sample of photographic plate images of M44 have their dates reliably determined to within a decade, many to within months.
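
The idea of "running back the clock" can be sketched as a one-parameter least-squares fit. This is a simplified illustration under stated assumptions, not the Blind Date pipeline: it assumes the image positions are already matched to catalog stars and expressed in the same tangent-plane coordinates, whereas the abstract describes a fit that starts from pixels alone. The function name `estimate_epoch` and its arguments are hypothetical.

```python
import numpy as np

def estimate_epoch(obs_xy, cat_xy, pm_xy, cat_epoch=2000.0):
    """Least-squares year at which the catalog best matches the observed positions.

    obs_xy: (N, 2) observed positions; cat_xy: (N, 2) catalog positions at
    cat_epoch; pm_xy: (N, 2) proper motions in position units per year.
    The model is obs = cat + pm * (t - cat_epoch), which is linear in t.
    """
    offset = obs_xy - cat_xy
    # Closed-form minimizer of sum |offset - pm * dt|^2 over dt
    dt = np.sum(pm_xy * offset) / np.sum(pm_xy * pm_xy)
    return cat_epoch + dt
```

The variance of the estimate scales inversely with the summed squared proper motions, which is consistent with the abstract's remark that performance depends on the total proper-motion signal-to-noise of the catalog stars in the image.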

[5] oai:arXiv.org:0709.2358 [pdf] - 4974

Cleaning the USNO-B Catalog through automatic detection of optical artifacts

Barron, Jonathan T.; Stumm, Christopher; Hogg, David W.; Lang, Dustin; Roweis, Sam

Comments: published in AJ

Submitted: 2007-09-14, last modified: 2008-01-20

The USNO-B Catalog contains spurious entries that are caused by diffraction spikes and circular reflection halos around bright stars in the original imaging data. These spurious entries appear in the Catalog as if they were real stars; they are confusing for some scientific tasks. The spurious entries can be identified by simple computer vision techniques because they produce repeatable patterns on the sky. Some techniques employed here are variants of the Hough transform, one of which is sensitive to (two-dimensional) overdensities of faint stars in thin right-angle cross patterns centered on bright ($<13$ mag) stars, and one of which is sensitive to thin annular overdensities centered on very bright ($<7$ mag) stars. After enforcing conservative statistical requirements on spurious-entry identifications, we find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of them (2.3 percent) are identified as spurious by diffraction-spike criteria and 196,133 (0.02 percent) are identified as spurious by reflection-halo criteria. The spurious entries are often detected in more than 2 bands and are not overwhelmingly outliers in any photometric properties; they therefore cannot be rejected easily on other grounds, i.e., without the use of computer vision techniques. We demonstrate our method, and return to the community in electronic form a table of spurious entries in the Catalog.

[6] oai:arXiv.org:astro-ph/0703454 [pdf] - 90263

An Improved Photometric Calibration of the Sloan Digital Sky Survey Imaging Data

Comments: 16 pages, 17 figures, matches version accepted in ApJ. These calibrations are available at http://www.sdss.org/dr6

Submitted: 2007-03-19, last modified: 2007-10-19

We present an algorithm to photometrically calibrate wide-field optical imaging surveys that simultaneously solves for the calibration parameters and relative stellar fluxes using overlapping observations. The algorithm decouples the problem of "relative" calibrations from that of "absolute" calibrations; the absolute calibration is reduced to determining a few numbers for the entire survey. We pay special attention to the spatial structure of the calibration errors, allowing one to isolate particular error modes in downstream analyses. Applying this to the Sloan Digital Sky Survey imaging data, we achieve ~1% relative calibration errors across 8500 sq.deg. in griz; the errors are ~2% for the u band. These errors are dominated by unmodelled atmospheric variations at Apache Point Observatory. These calibrations, dubbed "ubercalibration", are now public with SDSS Data Release 6, and will be a part of subsequent SDSS data releases.
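
The joint solve for calibration parameters and stellar fluxes from overlapping observations can be illustrated with a toy linear least-squares problem. This sketch assumes a deliberately simple model in which each observed magnitude is a star's true magnitude plus one additive zero point per exposure; the actual calibration model in the paper is richer. The function name `relative_zero_points` and the sign convention are assumptions made for illustration.

```python
import numpy as np

def relative_zero_points(star_idx, exp_idx, mag_obs, n_stars, n_exp):
    """Solve mag_obs = m[star] + z[exposure] for star magnitudes m and zero points z.

    The data only constrain z up to a common constant (the "absolute"
    calibration), so an extra row pins the zero points to sum to zero.
    """
    star_idx = np.asarray(star_idx)
    exp_idx = np.asarray(exp_idx)
    n_obs = len(mag_obs)
    A = np.zeros((n_obs + 1, n_stars + n_exp))
    rows = np.arange(n_obs)
    A[rows, star_idx] = 1.0             # each observation sees one star
    A[rows, n_stars + exp_idx] = 1.0    # ... through one exposure
    A[n_obs, n_stars:] = 1.0            # constraint row: sum of zero points = 0
    y = np.append(np.asarray(mag_obs, dtype=float), 0.0)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    return sol[:n_stars], sol[n_stars:]
```

The relative solution is only well determined when the exposures are linked by stars observed in common; the remaining overall offset is what a separate absolute calibration would supply.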

[7] oai:arXiv.org:astro-ph/0606170 [pdf] - 82610

K-corrections and filter transformations in the ultraviolet, optical, and near infrared

Blanton, Michael R.; Roweis, Sam

Comments: 43 pages, 20 figures, submitted to AJ, software and full-resolution figures available at http://cosmo.nyu.edu/blanton/kcorrect

Submitted: 2006-06-07

Template fits to observed galaxy fluxes allow calculation of K-corrections and conversions among observations of galaxies at various wavelengths. We present a method for creating model-based template sets given a set of heterogeneous photometric and spectroscopic galaxy data. Our technique, non-negative matrix factorization, is akin to principal component analysis (PCA), except that it is constrained to produce nonnegative templates, it can use a basis set of models (rather than the delta function basis of PCA), and it naturally handles uncertainties, missing data, and heterogeneous data (including broad-band fluxes at various redshifts). The particular implementation we present here is suitable for ultraviolet, optical, and near-infrared observations in the redshift range 0 < z < 1.5. Since we base our templates on stellar population synthesis models, the results are interpretable in terms of approximate stellar masses and star-formation histories. We present templates fit with this method to data from GALEX, Sloan Digital Sky Survey spectroscopy and photometry, the Two-Micron All Sky Survey, the Deep Extragalactic Evolutionary Probe and the Great Observatories Origins Deep Survey. In addition, we present software for using such data to estimate K-corrections and stellar masses.
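
How a non-negative factorization can handle uncertainties and missing data may be easier to see in code. The sketch below uses weighted multiplicative updates, a common generic way to minimize a chi-squared objective under non-negativity; it is an assumption that this resembles the paper's approach, and it is not the kcorrect implementation. It also omits the model basis set and redshift handling described above. The function name `weighted_nmf` is hypothetical; weights play the role of inverse variances, with zero weight marking a missing measurement.

```python
import numpy as np

def weighted_nmf(X, W, n_templates, n_iter=200, seed=0, eps=1e-12):
    """Factor X ~ A @ B with A, B >= 0, minimizing sum(W * (X - A @ B)**2).

    X: (n_objects, n_bands) non-negative data; W: same shape, non-negative
    weights (0 = missing). Returns coefficients A, templates B, and the
    objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = rng.uniform(0.1, 1.0, (n, n_templates))
    B = rng.uniform(0.1, 1.0, (n_templates, p))
    WX = W * X
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative by construction
        A *= (WX @ B.T) / ((W * (A @ B)) @ B.T + eps)
        B *= (A.T @ WX) / (A.T @ (W * (A @ B)) + eps)
        history.append(np.sum(W * (X - A @ B) ** 2))
    return A, B, np.array(history)
```

Because missing entries carry zero weight, they drop out of both the numerator and denominator of each update and never need to be imputed.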

[8] oai:arXiv.org:astro-ph/0505057 [pdf] - 72827

Modeling complete distributions with incomplete observations: The velocity ellipsoid from Hipparcos data

Hogg, David W.; Blanton, Michael R.; Roweis, Sam T.; Johnston, Kathryn V.

Comments: ApJ accepted for publication

Submitted: 2005-05-03

[abridged] A "missing data" algorithm is developed to model (i.e., reconstruct) the three-dimensional velocity distribution function of a sample of stars using data (velocity measurements) every one of which has one dimension unmeasured (the radial direction). It also accounts for covariant measurement uncertainties on the tangential velocity components. The algorithm is applied to tangential velocities measured in a kinematically unbiased sample of 11,865 stars taken from the Hipparcos catalog. The local stellar velocity distribution function of each of a set of 20 color-selected subsamples is modeled as a mixture of two three-dimensional Gaussian ellipsoids of arbitrary relative responsibility. In the fitting, one Gaussian (the "halo") is fixed at the known mean velocity and velocity variance tensor of the Galaxy halo, and the other (the "disk") is allowed to take arbitrary mean and arbitrary variance tensor. The mean and variance tensor (commonly the "velocity ellipsoid") of the disk velocity distribution are both found to be strong functions of stellar color, with long-lived populations showing larger velocity dispersion, slower mean rotation velocity, and smaller vertex deviation than short-lived populations. The local standard of rest (LSR) is inferred in the usual way and the Sun's motion relative to the LSR is found to be $(U,V,W)_{\odot}=(10.1,4.0,6.7)\pm(0.5,0.8,0.2)$ km/s. Artificial data sets are made and analyzed, with the same error properties as the Hipparcos data, to demonstrate that the analysis is unbiased. The results are shown to be insensitive to the assumption that the velocity distributions are Gaussian.
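
The statement that every measurement has its radial dimension unmeasured can be written as a projection of the three-dimensional velocity onto the plane of the sky. The sketch below is a generic illustration, assuming a Cartesian frame aligned with the angular coordinates of the star; the function name `tangential_projector` and the choice of angles are assumptions, not details taken from the paper.

```python
import numpy as np

def tangential_projector(lon_deg, lat_deg):
    """Return (P, r_hat): P removes the line-of-sight component of a 3-D vector.

    lon_deg, lat_deg: angular position of the star in degrees, in whatever
    spherical coordinate system the Cartesian velocity frame is aligned with.
    """
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    # Unit vector from the observer towards the star
    r_hat = np.array([np.cos(lat) * np.cos(lon),
                      np.cos(lat) * np.sin(lon),
                      np.sin(lat)])
    # P = I - r r^T keeps only the part of a vector perpendicular to r_hat
    P = np.eye(3) - np.outer(r_hat, r_hat)
    return P, r_hat
```

Since P has rank 2, a single star constrains only two of the three velocity components; the full distribution becomes recoverable because stars in different directions on the sky discard different components.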