Freeman, Peter E.

Normalized to: Freeman, P.

46 article(s) in total. 240 co-authors, from 1 to 12 common article(s). Median position in authors list is 3.0.

[1]  oai:arXiv.org:2001.03621  [pdf] - 2029805
Evaluation of probabilistic photometric redshift estimation approaches for LSST
Comments: submitted to MNRAS
Submitted: 2020-01-10
Many scientific investigations of photometric galaxy surveys require redshift estimates, whose uncertainty properties are best encapsulated by photometric redshift (photo-z) posterior probability density functions (PDFs). A plethora of photo-z PDF estimation methodologies abound, producing discrepant results with no consensus on a preferred approach. We present the results of a comprehensive experiment comparing twelve photo-z algorithms applied to mock data produced for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC). By supplying perfect prior information, in the form of the complete template library and a representative training set as inputs to each code, we demonstrate the impact of the assumptions underlying each technique on the output photo-z PDFs. In the absence of a notion of true, unbiased photo-z PDFs, we evaluate and interpret multiple metrics of the ensemble properties of the derived photo-z PDFs as well as traditional reductions to photo-z point estimates. We report systematic biases and overall over/under-breadth of the photo-z PDFs of many popular codes, which may indicate avenues for improvement in the algorithms or implementations. Furthermore, we raise attention to the limitations of established metrics for assessing photo-z PDF accuracy; though we identify the conditional density estimate (CDE) loss as a promising metric of photo-z PDF performance in the case where true redshifts are available but true photo-z PDFs are not, we emphasize the need for science-specific performance metrics.
[2]  oai:arXiv.org:1908.11523  [pdf] - 2031949
Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference
Comments: 27 pages, 7 figures, 4 tables
Submitted: 2019-08-29, last modified: 2019-12-20
It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) or training-based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings. As an alternative to methods that focus on predicting the response (or parameters) $\mathbf{y}$ from features $\mathbf{x}$, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density function (PDF) $\mathrm{p}(\mathbf{y}|\mathbf{x})$ of $\mathbf{y}$ given (i.e., conditional on) $\mathbf{x}$. As there is no one-size-fits-all CDE method, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings and be easily fit to the problem at hand. Specifically, we introduce four CDE software packages in $\texttt{Python}$ and $\texttt{R}$ based on ML prediction methods adapted and optimized for CDE: $\texttt{NNKCDE}$, $\texttt{RFCDE}$, $\texttt{FlexCode}$, and $\texttt{DeepCDE}$. Furthermore, we present the $\texttt{cdetools}$ package, which includes functions for computing a CDE loss function for tuning and assessing the quality of individual PDFs, along with diagnostic functions. We provide sample code in $\texttt{Python}$ and $\texttt{R}$ as well as examples of applications to photometric redshift estimation and likelihood-free cosmological inference via CDE.
[3]  oai:arXiv.org:1809.02136  [pdf] - 1871405
Automated Distant Galaxy Merger Classifications from Space Telescope Images using the Illustris Simulation
Comments: 20 pages, 16 figures, MNRAS accepted version
Submitted: 2018-09-06, last modified: 2019-04-12
We present image-based evolution of galaxy mergers from the Illustris cosmological simulation at 12 time-steps over 0.5 < z < 5. To do so, we created approximately one million synthetic deep Hubble Space Telescope and James Webb Space Telescope images and measured common morphological indicators. Using the merger tree, we assess methods to observationally select mergers with stellar mass ratios as low as 10:1 completing within +/- 250 Myr of the mock observation. We confirm that common one- or two-dimensional statistics select mergers so defined with low purity and completeness, leading to high statistical errors. As an alternative, we train redshift-dependent random forests (RFs) based on 5-10 inputs. Cross-validation shows the RFs yield superior, yet still imperfect, measurements of the late-stage merger fraction, and they select more mergers in bulge-dominated galaxies. When applied to CANDELS morphology catalogs, the RFs estimate a merger rate increasing to at least z = 3, albeit two times higher than expected by theory. This suggests possible mismatches in the feedback-determined morphologies, but affirms the basic understanding of galaxy merger evolution. The RFs achieve completeness of roughly 70% at 0.5 < z < 3, and purity increasing from 10% at z = 0.5 to 60% at z = 3. At earlier times, the training sets are insufficient, motivating larger simulations and smaller time sampling. By blending large surveys and large simulations, such machine learning techniques offer a promising opportunity to teach us the strengths and weaknesses of inferences about galaxy evolution.
[4]  oai:arXiv.org:1903.06796  [pdf] - 1850859
Astro2020 Science White Paper: The Next Decade of Astroinformatics and Astrostatistics
Comments: Submitted to the Astro2020 Decadal Survey call for science white papers
Submitted: 2019-03-15
Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy, has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.
[5]  oai:arXiv.org:1704.06273  [pdf] - 1621253
Intrinsic Alignment in redMaPPer clusters -- II. Radial alignment of satellites toward cluster centers
Comments: 25 pages, 16 figures, 7 tables, accepted to MNRAS. Main statistical analysis tool changed, with the results remaining similar
Submitted: 2017-04-20, last modified: 2018-01-20
We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at $0.1<z<0.35$ to determine whether there is any preferential tendency for satellites to point radially toward cluster centers. We analyze the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster center, rounder in shape, higher bulge fraction, and distributed preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences, and discuss the connection to theories of SA.
[6]  oai:arXiv.org:1711.00660  [pdf] - 1641321
Stellar Multiplicity Meets Stellar Evolution And Metallicity: The APOGEE View
Comments: 13 pages, 13 figures, replaced with version accepted by ApJ
Submitted: 2017-11-02, last modified: 2018-01-15
We use the multi-epoch radial velocities acquired by the APOGEE survey to perform a large scale statistical study of stellar multiplicity for field stars in the Milky Way, spanning the evolutionary phases between the main sequence and the red clump. We show that the distribution of maximum radial velocity shifts ($\Delta RV_{\rm max}$) for APOGEE targets is a strong function of $\log g$, with main sequence stars showing $\Delta RV_{\rm max}$ as high as $\sim$300 km s$^{-1}$, and steadily dropping down to $\sim$30 km s$^{-1}$ for $\log g \sim 0$, as stars climb up the Red Giant Branch (RGB). Red clump stars show a distribution of $\Delta RV_{\rm max}$ values comparable to that of stars at the tip of the RGB, implying they have similar multiplicity characteristics. The observed attrition of high $\Delta RV_{\rm max}$ systems in the RGB is consistent with a lognormal period distribution in the main sequence and a multiplicity fraction of 0.35, which is truncated at an increasing period as stars become physically larger and undergo mass transfer after Roche Lobe Overflow during H shell burning. The $\Delta RV_{\rm max}$ distributions also show that the multiplicity characteristics of field stars are metallicity dependent, with metal-poor ([Fe/H]$\lesssim-0.5$) stars having a multiplicity fraction a factor 2-3 higher than metal-rich ([Fe/H]$\gtrsim0.0$) stars. This has profound implications for the formation rates of interacting binaries observed by astronomical transient surveys and gravitational wave detectors, as well as the habitability of circumbinary planets.
[7]  oai:arXiv.org:1712.00432  [pdf] - 1626415
Mapping Jet-ISM Interactions in X-ray Binaries with ALMA: A GRS 1915$+$105 Case Study
Comments: 23 pages, 16 Figures, Accepted to MNRAS
Submitted: 2017-12-01
We present Atacama Large Millimetre/Sub-Millimetre Array (ALMA) observations of IRAS 19132+1035, a candidate jet-ISM interaction zone near the black hole X-ray binary (BHXB) GRS 1915+105. With these ALMA observations (combining data from the 12 m array and the Atacama Compact Array), we map the molecular line emission across the IRAS 19132+1035 region. We detect emission from the $^{12}$CO [$J=2-1$], $^{13}$CO [$\nu=0$, $J=2-1$], C$^{18}$O [$J=2-1$], ${\rm H}_{2}{\rm CO}$ [$J=3_{0,3}-2_{0,2}$], ${\rm H}_{2}{\rm CO}$ [$J=3_{2,2}-2_{2,1}$], ${\rm H}_{2}{\rm CO}$ [$J=3_{2,1}-2_{2,0}$], SiO [$\nu=0$, $J=5-4$], CH$_3$OH [$J=4_{2,2}-3_{1,2}$], and CS [$\nu=0$, $J=5-4$] transitions. Given the morphological, spectral, and kinematic properties of this molecular emission, we present several lines of evidence that support the presence of a jet-ISM interaction at this site, including a jet-blown cavity in the molecular gas. This compelling new evidence identifies this site as a jet-ISM interaction zone, making GRS 1915$+$105 the third Galactic BHXB with at least one conclusive jet-ISM interaction zone. However, we find that this interaction occurs on much smaller scales than was postulated by previous work, where the BHXB jet does not appear to be dominantly powering the entire IRAS 19132+1035 region. Using estimates of the ISM conditions in the region, we utilize the detected cavity as a calorimeter to estimate the time-averaged power carried in the GRS 1915+105 jets of $(8.4^{+7.7}_{-8.1})\times10^{32}\,{\rm erg\,s}^{-1}$. Overall, our analysis demonstrates that molecular lines are excellent diagnostic tools to identify and probe jet-ISM interaction zones near Galactic BHXBs.
[8]  oai:arXiv.org:1707.04592  [pdf] - 1585956
Local Two-Sample Testing: A New Tool for Analysing High-Dimensional Astronomical Data
Comments: 11 pages, 9 figures; accepted to MNRAS
Submitted: 2017-07-14
Modern surveys have provided the astronomical community with a flood of high-dimensional data, but analyses of these data often occur after their projection to lower-dimensional spaces. In this work, we introduce a local two-sample hypothesis test framework that an analyst may directly apply to data in their native space. In this framework, the analyst defines two classes based on a response variable of interest (e.g. higher-mass galaxies versus lower-mass galaxies) and determines at arbitrary points in predictor space whether the local proportions of objects that belong to the two classes differ significantly from the global proportion. Our framework has a potential myriad of uses throughout astronomy; here, we demonstrate its efficacy by applying it to a sample of 2487 i-band-selected galaxies observed by the HST ACS in four of the CANDELS program fields. For each galaxy, we have seven morphological summary statistics along with an estimated stellar mass and star-formation rate. We perform two studies: one in which we determine regions of the seven-dimensional space of morphological statistics where high-mass galaxies are significantly more numerous than low-mass galaxies, and vice-versa, and another study where we use SFR in place of mass. We find that we are able to identify such regions, and show how high-mass/low-SFR regions are associated with concentrated and undisturbed galaxies while galaxies in low-mass/high-SFR regions appear more extended and/or disturbed than their high-mass/low-SFR counterparts.
[9]  oai:arXiv.org:1703.09242  [pdf] - 1582160
A Unified Framework for Constructing, Tuning and Assessing Photometric Redshift Density Estimates in a Selection Bias Setting
Comments: 11 pages; accepted by MNRAS
Submitted: 2017-03-27
Photometric redshift estimation is an indispensable tool of precision cosmology. One problem that plagues the use of this tool in the era of large-scale sky surveys is that the bright galaxies that are selected for spectroscopic observation do not have properties that match those of (far more numerous) dimmer galaxies; thus, ill-designed empirical methods that produce accurate and precise redshift estimates for the former generally will not produce good estimates for the latter. In this paper, we provide a principled framework for generating conditional density estimates (i.e. photometric redshift PDFs) that takes into account selection bias and the covariate shift that this bias induces. We base our approach on the assumption that the probability that astronomers label a galaxy (i.e. determine its spectroscopic redshift) depends only on its measured (photometric and perhaps other) properties x and not on its true redshift. With this assumption, we can explicitly write down risk functions that allow us to both tune and compare methods for estimating importance weights (i.e. the ratio of densities of unlabeled and labeled galaxies for different values of x) and conditional densities. We also provide a method for combining multiple conditional density estimates for the same galaxy into a single estimate with better properties. We apply our risk functions to an analysis of approximately one million galaxies, mostly observed by SDSS, and demonstrate through multiple diagnostic tests that our method achieves good conditional density estimates for the unlabeled galaxies.
[10]  oai:arXiv.org:1702.07728  [pdf] - 1564077
The Varying Mass Distribution of Molecular Clouds Across M83
Comments: 13 pages, accepted to MNRAS
Submitted: 2017-02-24
The work of Adamo et al. (2015) showed that the mass distributions of young massive stellar clusters were truncated above a maximum-mass scale in the nearby galaxy M83 and that this truncation mass varies with galactocentric radius. Here, we present a cloud-based analysis of ALMA CO($1\to 0$) observations of M83 to search for such a truncation mass in the molecular cloud population. We identify a population of 873 molecular clouds in M83 that is largely similar to those found in the Milky Way and Local Group galaxies, though clouds in the centre of the galaxy show high surface densities and enhanced turbulence, as is common for clouds in high-density nuclear environments. Like the young massive clusters, we find a maximum-mass scale for the molecular clouds that decreases radially in the galaxy. We find the most massive young massive cluster tracks the most massive molecular cloud with the cluster mass being $10^{-2}$ times that of the most massive molecular cloud. Outside the nuclear region of M83 ($R_{g}>0.5$ kpc), there is no evidence for changing internal conditions in the population of molecular clouds, with the average internal pressures, densities, and free-fall times remaining constant for the cloud population over the galaxy. This result is consistent with the bound cluster formation efficiency depending only on the large-scale properties of the ISM, rather than the internal conditions of individual clouds.
[11]  oai:arXiv.org:1509.06376  [pdf] - 1530313
Detecting Effects of Filaments on Galaxy Properties in the Sloan Digital Sky Survey III
Comments: To appear in MNRAS
Submitted: 2015-09-21, last modified: 2017-01-12
We study the effects of filaments on galaxy properties in the Sloan Digital Sky Survey (SDSS) Data Release 12 using filaments from the 'Cosmic Web Reconstruction' catalogue (Chen et al. 2016), a publicly available filament catalogue for SDSS. Since filaments are tracers of medium-to-high density regions, we expect that galaxy properties associated with the environment are dependent on the distance to the nearest filament. Our analysis demonstrates that a red galaxy or a high-mass galaxy tends to reside closer to filaments than a blue or low-mass galaxy. After adjusting the effect from stellar mass, on average, early-forming galaxies or large galaxies have a shorter distance to filaments than late-forming galaxies or small galaxies. For the Main galaxy sample (MGS), all signals are very significant ($>6\sigma$). For the LOWZ and CMASS samples, the stellar mass and size are significant ($>2 \sigma$). The filament effects we observe persist until $z = 0.7$ (the edge of the CMASS sample). Comparing our results to those using the galaxy distances from redMaPPer galaxy clusters as a reference, we find a similar result between filaments and clusters. Moreover, we find that the effect of clusters on the stellar mass of nearby galaxies depends on the galaxy's filamentary environment. Our findings illustrate the strong correlation of galaxy properties with proximity to density ridges, strongly supporting the claim that density ridges are good tracers of filaments.
[12]  oai:arXiv.org:1605.01065  [pdf] - 1457265
Intrinsic alignments in redMaPPer clusters -- I. Central galaxy alignments and angular segregation of satellites
Comments: matches version accepted to MNRAS; minor changes in presentation compared to v1, no changes to results
Submitted: 2016-05-03, last modified: 2016-08-04
The shapes of cluster central galaxies are not randomly oriented, but rather exhibit coherent alignments with the shapes of their parent clusters as well as with the surrounding large-scale structures. In this work, we aim to identify the galaxy and cluster quantities that most strongly predict the central galaxy alignment phenomenon among a large parameter space with a sample of 8237 clusters and 94817 members within 0.1<z<0.35, based on the redMaPPer cluster catalog constructed from the Sloan Digital Sky Survey. We first quantify the alignment between the projected central galaxy shapes and the distribution of member satellites, to understand what central galaxy and cluster properties most strongly correlate with these alignments. Next, we investigate the angular segregation of satellites with respect to their central galaxy major axis directions, to identify the satellite properties that most strongly predict their angular segregation. We find that central galaxies are more aligned with their member galaxy distributions in clusters that are more elongated and have higher richness, and for central galaxies with larger physical size, higher luminosity and centering probability, and redder color. Satellites with redder color, higher luminosity, located closer to the central galaxy, and with smaller ellipticity show a stronger angular segregation toward their central galaxy major axes. Finally, we provide physical explanations for some of the identified correlations, and discuss the connection to theories of central galaxy alignments, the impact of primordial alignments with tidal fields, and the importance of anisotropic accretion.
[13]  oai:arXiv.org:1604.01339  [pdf] - 1386744
Photo-z Estimation: An Example of Nonparametric Conditional Density Estimation under Selection Bias
Comments:
Submitted: 2016-04-05
Redshift is a key quantity for inferring cosmological model parameters. In photometric redshift estimation, cosmologists use the coarse data collected from the vast majority of galaxies to predict the redshift of individual galaxies. To properly quantify the uncertainty in the predictions, however, one needs to go beyond standard regression and instead estimate the full conditional density f(z|x) of a galaxy's redshift z given its photometric covariates x. The problem is further complicated by selection bias: usually only the rarest and brightest galaxies have known redshifts, and these galaxies have characteristics and measured covariates that do not necessarily match those of more numerous and dimmer galaxies of unknown redshift. Unfortunately, there is not much research on how to best estimate complex multivariate densities in such settings. Here we describe a general framework for properly constructing and assessing nonparametric conditional density estimators under selection bias, and for combining two or more estimators for optimal performance. We propose new improved photo-z estimators and illustrate our methods on data from the Sloan Digital Sky Survey and an application to galaxy-galaxy lensing. Although our main application is photo-z estimation, our methods are relevant to any high-dimensional regression setting with complicated asymmetric and multimodal distributions in the response variable.
[14]  oai:arXiv.org:1504.01751  [pdf] - 1358764
Beyond Spheroids and Discs: Classifications of CANDELS Galaxy Structure at 1.4 < z < 2 via Principal Component Analysis
Comments: 31 pages, 24 figures, accepted for publication in MNRAS
Submitted: 2015-04-07, last modified: 2016-02-08
Important but rare and subtle processes driving galaxy morphology and star-formation may be missed by traditional spiral, elliptical, irregular or Sérsic bulge/disk classifications. To overcome this limitation, we use a principal component analysis of non-parametric morphological indicators (concentration, asymmetry, Gini coefficient, $M_{20}$, multi-mode, intensity and deviation) measured at rest-frame $B$-band (corresponding to HST/WFC3 F125W at 1.4 $< z <$ 2) to trace the natural distribution of massive ($>10^{10} M_{\odot}$) galaxy morphologies. Principal component analysis (PCA) quantifies the correlations between these morphological indicators and determines the relative importance of each. The first three principal components (PCs) capture $\sim$75 per cent of the variance inherent to our sample. We interpret the first principal component (PC) as bulge strength, the second PC as dominated by concentration and the third PC as dominated by asymmetry. Both PC1 and PC2 correlate with the visual appearance of a central bulge and predict galaxy quiescence. PC1 is a better predictor of quenching than stellar mass, as good as other structural indicators (Sérsic-n or compactness). We divide the PCA results into groups using an agglomerative hierarchical clustering method. Unlike Sérsic, this classification scheme separates compact galaxies from larger, smooth proto-elliptical systems, and star-forming disk-dominated clumpy galaxies from star-forming bulge-dominated asymmetric galaxies. Distinguishing between these galaxy structural types in a quantitative manner is an important step towards understanding the connections between morphology, galaxy assembly and star-formation.
[15]  oai:arXiv.org:1509.06443  [pdf] - 1447640
Cosmic Web Reconstruction through Density Ridges: Catalogue
Comments: 14 pages, 12 figures, 4 tables
Submitted: 2015-09-21
We construct a catalogue for filaments using a novel approach called SCMS (subspace constrained mean shift; Ozertem & Erdogmus 2011; Chen et al. 2015). SCMS is a gradient-based method that detects filaments through density ridges (smooth curves tracing high-density regions). A great advantage of SCMS is its uncertainty measure, which allows an evaluation of the errors for the detected filaments. To detect filaments, we use data from the Sloan Digital Sky Survey, which consist of three galaxy samples: the NYU main galaxy sample (MGS), the LOWZ sample and the CMASS sample. Each of the three datasets covers different redshift regions so that the combined sample allows detection of filaments up to z = 0.7. Our filament catalogue consists of a sequence of two-dimensional filament maps at different redshifts that provide several useful statistics on the evolution of the cosmic web. To construct the maps, we select spectroscopically confirmed galaxies within 0.050 < z < 0.700 and partition them into 130 bins. For each bin, we ignore the redshift, treating the galaxy observations as 2-D data and detect filaments using SCMS. The filament catalogue consists of 130 individual 2-D filament maps, and each map comprises points on the detected filaments that describe the filamentary structures at a particular redshift. We also apply our filament catalogue to investigate galaxy luminosity and its relation with distance to filament. Using a volume-limited sample, we find strong evidence (6.1$\sigma$ - 12.3$\sigma$) that galaxies close to filaments are generally brighter than those at significant distance from filaments.
[16]  oai:arXiv.org:1509.05619  [pdf] - 1279380
Prediction of galaxy ellipticities and reduction of shape noise in cosmic shear measurements
Comments: 6 pages, 3 figures. Submitted to MNRAS on Aug. 19
Submitted: 2015-09-18
The intrinsic scatter in the ellipticities of galaxies about the mean shape, known as "shape noise," is the most important source of noise in weak lensing shear measurements. Several approaches to reducing shape noise have recently been put forward, using information beyond photometry, such as radio polarization and optical spectroscopy. Here we investigate how well the intrinsic ellipticities of galaxies can be predicted using other, exclusively photometric parameters. These parameters (such as galaxy colours) are already available in the data and do not necessitate additional, often expensive observations. We apply two regression techniques, generalized additive models (GAM) and projection pursuit regression (PPR) to the publicly released data catalog of galaxy properties from CFHTLenS. In our simple analysis we find that the individual galaxy ellipticities can indeed be predicted from other photometric parameters to better precision than the scatter about the mean ellipticity. This means that without additional observations beyond photometry the ellipticity contribution to the shear can be measured to higher precision, comparable to using a larger sample of galaxies. Our best-fit model, achieved using PPR, yields a gain equivalent to having 114.3% more galaxies. Using only parameters unaffected by lensing (e.g. surface brightness, colour), the gain is only ~12%.
[17]  oai:arXiv.org:1501.05303  [pdf] - 1288321
Cosmic Web Reconstruction through Density Ridges: Method and Algorithm
Comments: To appear in MNRAS. 18 pages, 19 figures, 1 table
Submitted: 2015-01-21, last modified: 2015-08-27
The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the Subspace Constrained Mean Shift (SCMS) algorithm (Ozertem and Erdogmus (2011); Genovese et al. (2012)) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS method to datasets sampled from the P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalog, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
[18]  oai:arXiv.org:1508.04149  [pdf] - 1300265
Investigating Galaxy-Filament Alignments in Hydrodynamic Simulations using Density Ridges
Comments: 11 pages, 10 figures
Submitted: 2015-08-17
In this paper, we study the filamentary structures and the galaxy alignment along filaments at redshift $z=0.06$ in the MassiveBlack-II simulation, a state-of-the-art, high-resolution hydrodynamical cosmological simulation which includes stellar and AGN feedback in a volume of (100 Mpc$/h$)$^3$. The filaments are constructed using the subspace constrained mean shift algorithm (SCMS; Ozertem & Erdogmus (2011) and Chen et al. (2015a)). First, we show that reconstructed filaments using galaxies and reconstructed filaments using dark matter particles are similar to each other; over $50\%$ of the points on the galaxy filaments have a corresponding point on the dark matter filaments within distance $0.13$ Mpc$/h$ (and vice versa) and this distance is even smaller at high-density regions. Second, we observe the alignment of the major principal axis of a galaxy with respect to the orientation of its nearest filament and detect a $2.5$ Mpc$/h$ critical radius for the filament's influence on the alignment when the subhalo mass of this galaxy is between $10^9M_\odot/h$ and $10^{12}M_\odot/h$. Moreover, we find the alignment signal to increase significantly with the subhalo mass. Third, when a galaxy is close to filaments (less than $0.25$ Mpc$/h$), the galaxy alignment toward the nearest galaxy group depends on the galaxy subhalo mass. Finally, we find that galaxies close to filaments or groups tend to be rounder than those away from filaments or groups.
[19]  oai:arXiv.org:1409.1583  [pdf] - 1240868
Diverse Structural Evolution at z > 1 in Cosmologically Simulated Galaxies
Comments: 23 pages, 16 figures, MNRAS accepted version
Submitted: 2014-09-04, last modified: 2015-07-02
From mock Hubble Space Telescope images, we quantify non-parametric statistics of galaxy morphology, thereby predicting the emergence of relationships among stellar mass, star formation, and observed rest-frame optical structure at 1 < z < 3. We measure automated diagnostics of galaxy morphology in cosmological simulations of the formation of 22 central galaxies with 9.3 < log10 M_*/M_sun < 10.7. These high-spatial-resolution zoom-in calculations enable accurate modeling of the rest-frame UV and optical morphology. Even with small numbers of galaxies, we find that structural evolution is neither universal nor monotonic: galaxy interactions can trigger either bulge or disc formation, and optically bulge-dominated galaxies at this mass may not remain so forever. Simulated galaxies with M_* > 10^10 M_sun contain relatively more disc-dominated light profiles than those with lower mass, reflecting significant disc brightening in some haloes at 1 < z < 2. By this epoch, simulated galaxies with specific star formation rates below 10^-9.7 yr^-1 are more likely than normal star-formers to have a broader mix of structural types, especially at M_* > 10^10 M_sun. We analyze a cosmological major merger at z ~ 1.5 and find that the newly proposed MID morphology diagnostics trace later merger stages while G-M20 trace earlier ones. MID is sensitive also to clumpy star-forming discs. The observability time of typical MID-enhanced events in our simulation sample is less than 100 Myr. A larger sample of cosmological assembly histories may be required to calibrate such diagnostics in the face of their sensitivity to viewing angle, segmentation algorithm, and various phenomena such as clumpy star formation and minor mergers.
[20]  oai:arXiv.org:1406.7536  [pdf] - 844312
Estimating the distribution of Galaxy Morphologies on a continuous space
Comments: 4 pages, 3 figures, Statistical Challenges in 21st Century Cosmology, Proceedings IAU Symposium No. 306, 2014
Submitted: 2014-06-29
The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without causing a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high-dimensional space of shapes into a manageable low-dimensional continuous vector space. Statistical inference can be done in the reduced space via probability distribution estimation and manifold estimation.
[21]  oai:arXiv.org:1404.3168  [pdf] - 809422
Functional Regression for Quasar Spectra
Comments:
Submitted: 2014-04-11
The Lyman-alpha forest is a portion of the observed light spectrum of distant galactic nuclei which allows us to probe remote regions of the Universe that are otherwise inaccessible. The observed Lyman-alpha forest of a quasar light spectrum can be modeled as a noisy realization of a smooth curve that is affected by a `damping effect' which occurs whenever the light emitted by the quasar travels through regions of the Universe with higher matter concentration. To decode the information conveyed by the Lyman-alpha forest about the matter distribution, we must be able to separate the smooth `continuum' from the noise and the contribution of the damping effect in the quasar light spectra. To predict the continuum in the Lyman-alpha forest, we use a nonparametric functional regression model in which both the response and the predictor variable (the smooth part of the damping-free portion of the spectrum) are function-valued random variables. We demonstrate that the proposed method accurately predicts the unobservable continuum in the Lyman-alpha forest both on simulated spectra and real spectra. Also, we introduce distribution-free prediction bands for the nonparametric functional regression model that have finite sample guarantees. These prediction bands, together with bootstrap-based confidence bands for the projection of the mean continuum on a fixed number of principal components, allow us to assess the degree of uncertainty in the model predictions.
[22]  oai:arXiv.org:1401.1867  [pdf] - 1202636
Nonparametric 3D map of the IGM using the Lyman-alpha forest
Comments:
Submitted: 2014-01-08
Visualizing the high-redshift Universe is difficult due to the dearth of available data; however, the Lyman-alpha forest provides a means to map the intergalactic medium at redshifts not accessible to large galaxy surveys. Large-scale structure surveys, such as the Baryon Oscillation Spectroscopic Survey (BOSS), have collected quasar (QSO) spectra that enable the reconstruction of HI density fluctuations. The data fall on a collection of lines defined by the lines-of-sight (LOS) of the QSO, and a major issue with producing a 3D reconstruction is determining how to model the regions between the LOS. We present a method that produces a 3D map of this relatively uncharted portion of the Universe by employing local polynomial smoothing, a nonparametric methodology. The performance of the method is analyzed on simulated data that mimic the varying number of LOS expected in real data, and the method is then applied to a sample region selected from BOSS. The reconstruction is evaluated by considering various features of the predicted 3D maps, including visual comparison of slices, PDFs, counts of local minima and maxima, and standardized correlation functions. This 3D reconstruction allows for an initial investigation of the topology of this portion of the Universe using persistent homology.
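Local polynomial smoothing can be sketched in one dimension with a first-degree (local linear) fit. This is a simplified illustration rather than the 3D method of the paper: the function name `local_linear`, the Gaussian kernel, and the bandwidth `h` are assumptions. At each target location the estimate is the intercept of a kernel-weighted least-squares line centred on that location.

```python
import math


def local_linear(x0, xs, ys, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel."""
    # Kernel weights: observations near x0 count more.
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    # Weighted moments of the centred predictor (x - x0) and the response.
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    # Solve the 2x2 weighted normal equations; the intercept is the fit at x0.
    denom = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / denom


# Sanity check on noise-free linear data: the fit reproduces the line,
# including at the boundary x0 = 0 where kernel averaging alone would be biased.
xs = [i / 10.0 for i in range(51)]
ys = [3.0 + 2.0 * x for x in xs]
fit_at_edge = local_linear(0.0, xs, ys, h=0.5)
fit_in_middle = local_linear(2.5, xs, ys, h=0.5)
```

The local linear form is chosen here because it corrects the boundary bias of a plain kernel average, which matters when the data are sparse near the edges of the sampled region.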
[23]  oai:arXiv.org:1306.1238  [pdf] - 1171844
New Image Statistics for Detecting Disturbed Galaxy Morphologies at High Redshift
Comments: 15 pages, 14 figures, accepted for publication in MNRAS
Submitted: 2013-06-05
Testing theories of hierarchical structure formation requires estimating the distribution of galaxy morphologies and its change with redshift. One aspect of this investigation involves identifying galaxies with disturbed morphologies (e.g., merging galaxies). This is often done by summarizing galaxy images using, e.g., the CAS and Gini-M20 statistics of Conselice (2003) and Lotz et al. (2004), respectively, and associating particular statistic values with disturbance. We introduce three statistics that enhance detection of disturbed morphologies at high-redshift (z ~ 2): the multi-mode (M), intensity (I), and deviation (D) statistics. We show their effectiveness by training a machine-learning classifier, random forest, using 1,639 galaxies observed in the H band by the Hubble Space Telescope WFC3, galaxies that had been previously classified by eye by the CANDELS collaboration (Grogin et al. 2011, Koekemoer et al. 2011). We find that the MID statistics (and the A statistic of Conselice 2003) are the most useful for identifying disturbed morphologies. We also explore whether human annotators are useful for identifying disturbed morphologies. We demonstrate that they show limited ability to detect disturbance at high redshift, and that increasing their number beyond approximately 10 does not provably yield better classification performance. We propose a simulation-based model-fitting algorithm that mitigates these issues by bypassing annotation.
[24]  oai:arXiv.org:1105.6344  [pdf] - 489781
Prototype selection for parameter estimation in complex models
Comments: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Submitted: 2011-05-31, last modified: 2012-03-20
Parameter estimation in astrophysics often requires the use of complex physical models. In this paper we study the problem of estimating the parameters that describe star formation history (SFH) in galaxies. Here, high-dimensional spectral data from galaxies are appropriately modeled as linear combinations of physical components, called simple stellar populations (SSPs), plus some nonlinear distortions. Theoretical data for each SSP is produced for a fixed parameter vector via computer modeling. Though the parameters that define each SSP are continuous, optimizing the signal model over a large set of SSPs on a fine parameter grid is computationally infeasible and inefficient. The goal of this study is to estimate the set of parameters that describes the SFH of each galaxy. These target parameters, such as the average ages and chemical compositions of the galaxy's stellar populations, are derived from the SSP parameters and the component weights in the signal model. Here, we introduce a principled approach of choosing a small basis of SSP prototypes for SFH parameter estimation. The basic idea is to quantize the vector space and effective support of the model components. In addition to greater computational efficiency, we achieve better estimates of the SFH target parameters. In simulations, our proposed quantization method obtains a substantial improvement in estimating the target parameters over the common method of employing a parameter grid. Sparse coding techniques are not appropriate for this problem without proper constraints, while constrained sparse coding methods perform poorly for parameter estimation because their objective is signal reconstruction, not estimation of the target parameters.
[25]  oai:arXiv.org:1111.0911  [pdf] - 434117
Exploiting Non-Linear Structure in Astronomical Data for Improved Statistical Inference
Comments: Invited talk at SCMA V, Penn State University, June 2011, PA. To appear in the Proceedings of "Statistical Challenges in Modern Astronomy V"
Submitted: 2011-11-03
Many estimation problems in astrophysics are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are non-linear data transformation methods that efficiently reveal the underlying geometry of observable data. Here we focus on one particular technique: diffusion maps or more generally spectral connectivity analysis (SCA). We give examples of applications in astronomy; e.g., photometric redshift estimation, prototype selection for estimation of star formation history, and supernova light curve classification. We outline some computational and statistical challenges that remain, and we discuss some promising future directions for astronomy and data mining.
[26]  oai:arXiv.org:1103.6034  [pdf] - 1053061
Semi-supervised Learning for Photometric Supernova Classification
Comments: 16 pages, 11 figures, accepted for publication in MNRAS
Submitted: 2011-03-30, last modified: 2011-09-27
We present a semi-supervised method for photometric supernova typing. Our approach is to first use the nonlinear dimension reduction technique diffusion map to detect structure in a database of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template-based methods. Applied to supernova data simulated by Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 95% Type Ia purity and 87% Type Ia efficiency on the spectroscopic sample, but only 50% Type Ia purity and 50% efficiency on the photometric sample due to their spectroscopic follow-up strategy. To improve the performance on the photometric sample, we search for better spectroscopic follow-up procedures by studying the sensitivity of our machine-learned supernova classification to the specific strategy used to obtain training sets. With a fixed amount of spectroscopic follow-up time, we find that deeper magnitude-limited spectroscopic surveys are better for producing training sets. For supernova Ia (II-P) typing, we obtain a 44% (1%) increase in purity to 72% (87%) and 30% (162%) increase in efficiency to 65% (84%) of the sample using a 25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic sample used in the original simulations. When redshift information is available, we incorporate it into our analysis using a novel method of altering the diffusion map representation of the supernovae. Incorporating host redshifts leads to a 5% improvement in Type Ia purity and 13% improvement in Type Ia efficiency.
[27]  oai:arXiv.org:1010.0677  [pdf] - 1041050
The XMM Cluster Survey: X-ray analysis methodology
Comments: MNRAS accepted, 45 pages, 38 figures. Our companion paper describing our optical analysis methodology and presenting a first set of confirmed clusters has now been submitted to MNRAS
Submitted: 2010-10-04, last modified: 2011-06-15
The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we describe the data processing methodology applied to the 5,776 XMM observations used to construct the current XCS source catalogue. A total of 3,675 > 4-sigma cluster candidates with > 50 background-subtracted X-ray counts are extracted from a total non-overlapping area suitable for cluster searching of 410 deg^2. Of these, 993 candidates are detected with > 300 background-subtracted X-ray photon counts, and we demonstrate that robust temperature measurements can be obtained down to this count limit. We describe in detail the automated pipelines used to perform the spectral and surface brightness fitting for these candidates, as well as to estimate redshifts from the X-ray data alone. A total of 587 (122) X-ray temperatures to a typical accuracy of < 40 (< 10) per cent have been measured to date. We also present the methodology adopted for determining the selection function of the survey, and show that the extended source detection algorithm is robust to a range of cluster morphologies by inserting mock clusters derived from hydrodynamical simulations into real XMM images. These tests show that the simple isothermal beta-profile is sufficient to capture the essential details of the cluster population detected in the archival XMM observations. The redshift follow-up of the XCS cluster sample is presented in a companion paper, together with a first data release of 503 optically-confirmed clusters.
[28]  oai:arXiv.org:1103.1603  [pdf] - 1052567
An Unbiased Method of Modeling the Local Peculiar Velocity Field with Type Ia Supernovae
Comments: 53 pages, 19 figures, accepted for publication in The Astrophysical Journal
Submitted: 2011-03-08
We apply statistically rigorous methods of nonparametric risk estimation to the problem of inferring the local peculiar velocity field from nearby supernovae (SNIa). We use two nonparametric methods - Weighted Least Squares (WLS) and Coefficient Unbiased (CU) - both of which employ spherical harmonics to model the field and use the estimated risk to determine at which multipole to truncate the series. We show that if the data are not drawn from a uniform distribution or if there is power beyond the maximum multipole in the regression, a bias is introduced on the coefficients using WLS. CU estimates the coefficients without this bias by including the sampling density, making the coefficients more accurate but not necessarily modeling the velocity field more accurately. After applying nonparametric risk estimation to SNIa data, we find that there are not enough data at this time to measure power beyond the dipole. The WLS Local Group bulk flow is moving at 538 +- 86 km/s towards (l,b) = (258 +- 10 deg, 36 +- 11 deg) and the CU bulk flow is moving at 446 +- 101 km/s towards (l,b) = (273 +- 11 deg, 46 +- 8 deg). We find that the magnitude and direction of these measurements are in agreement with each other and with previous results in the literature.
[29]  oai:arXiv.org:1006.4334  [pdf] - 1360716
On Computing Upper Limits to Source Intensities
Comments: 30 pages, 12 figures, accepted in ApJ
Submitted: 2010-06-22
A common problem in astrophysics is determining how bright a source could be and still not be detected. Despite the simplicity with which the problem can be stated, the solution involves complex statistical issues that require careful analysis. In contrast to the confidence bound, this concept has never been formally analyzed, leading to a great variety of often ad hoc solutions. Here we formulate and describe the problem in a self-consistent manner. Detection significance is usually defined by the acceptable proportion of false positives (the Type I error), and we invoke the complementary concept of false negatives (the Type II error), based on the statistical power of a test, to compute an upper limit to the detectable source intensity. To determine the minimum intensity that a source must have for it to be detected, we first define a detection threshold, and then compute the probabilities of detecting sources of various intensities at the given threshold. The intensity that corresponds to the specified Type II error probability defines that minimum intensity, and is identified as the upper limit. Thus, an upper limit is a characteristic of the detection procedure rather than the strength of any particular source and should not be confused with confidence intervals or other estimates of source intensity. This is particularly important given the large number of catalogs that are being generated from increasingly sensitive surveys. We discuss the differences between these upper limits and confidence bounds. Both measures are useful quantities that should be reported in order to extract the most science from catalogs, though they answer different statistical questions: an upper bound describes an inference range on the source intensity, while an upper limit calibrates the detection process. We provide a recipe for computing upper limits that applies to all detection algorithms.
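The two-step recipe (fix a detection threshold from the Type I error, then find the intensity that reaches the required power) can be sketched for the simplest case. The assumptions here are not taken from the paper: a Poisson counting experiment with a known background `b`, and the illustrative names `poisson_sf`, `detection_threshold`, and `upper_limit`.

```python
import math


def poisson_sf(k, mu):
    """P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(N = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(N = 1) ... P(N = k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def detection_threshold(b, alpha):
    """Smallest count n* with false-positive probability P(N >= n* | b) <= alpha."""
    k = 0
    while poisson_sf(k, b) > alpha:
        k += 1
    return k


def upper_limit(b, alpha, beta):
    """Smallest source intensity s detected with probability >= 1 - beta."""
    n_star = detection_threshold(b, alpha)
    power = lambda s: poisson_sf(n_star, b + s)
    # Bracket the solution, then bisect; power increases monotonically with s.
    lo, hi = 0.0, 1.0
    while power(hi) < 1.0 - beta:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if power(mid) < 1.0 - beta:
            lo = mid
        else:
            hi = mid
    return hi


n_star = detection_threshold(b=3.0, alpha=0.05)
s_ul = upper_limit(b=3.0, alpha=0.05, beta=0.5)
```

Note that `upper_limit` never looks at an observed count: it depends only on the background, the threshold, and the chosen error probabilities, which reflects the point that an upper limit characterizes the detection procedure rather than any particular source.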
[30]  oai:arXiv.org:0906.0995  [pdf] - 1002464
Photometric Redshift Estimation Using Spectral Connectivity Analysis
Comments: Resubmitted to MNRAS (11 pages, 8 figures)
Submitted: 2009-06-04
The development of fast and accurate methods of photometric redshift estimation is a vital step towards being able to fully utilize the data of next-generation surveys within precision cosmology. In this paper we apply a specific approach to spectral connectivity analysis (SCA; Lee & Wasserman 2009) called diffusion map. SCA is a class of non-linear techniques for transforming observed data (e.g., photometric colours for each galaxy, where the data lie on a complex subset of p-dimensional space) to a simpler, more natural coordinate system wherein we apply regression to make redshift predictions. As SCA relies upon eigen-decomposition, our training set size is limited to ~ 10,000 galaxies; we use the Nystrom extension to quickly estimate diffusion coordinates for objects not in the training set. We apply our method to 350,738 SDSS main sample galaxies, 29,816 SDSS luminous red galaxies, and 5,223 galaxies from DEEP2 with CFHTLS ugriz photometry. For all three datasets, we achieve prediction accuracies on par with previous analyses, and find that use of the Nystrom extension leads to a negligible loss of prediction accuracy relative to that achieved with the training sets. As in some previous analyses (e.g., Collister & Lahav 2004, Ball et al. 2008), we observe that our predictions are generally too high (low) in the low (high) redshift regimes. We demonstrate that this is a manifestation of attenuation bias, wherein measurement error (i.e., uncertainty in diffusion coordinates due to uncertainty in the measured fluxes/magnitudes) reduces the slope of the best-fit regression line. Mitigation of this bias is necessary if we are to use photometric redshift estimates produced by computationally efficient empirical methods in precision cosmology.
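Attenuation bias is easy to reproduce in a small simulation. The sketch below is a generic one-predictor illustration, not the diffusion-map regression of the paper; the true slope of 2, the noise levels, and the helper `ols_slope` are assumptions. When the predictor is observed with error, the least-squares slope shrinks by roughly var(x) / (var(x) + var(error)), so predictions are pulled toward the mean: too high at the low end and too low at the high end.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * x + rng.gauss(0.0, 0.1) for x in true_x]
# Predictor measured with unit-variance error, equal to the variance of true_x,
# so the expected attenuation factor is 1 / (1 + 1) = 0.5.
noisy_x = [x + rng.gauss(0.0, 1.0) for x in true_x]

slope_clean = ols_slope(true_x, y)   # close to the true slope of 2
slope_noisy = ols_slope(noisy_x, y)  # attenuated toward about 1
```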
[31]  oai:arXiv.org:0905.4683  [pdf] - 1002383
Accurate parameter estimation for star formation history in galaxies using SDSS spectra
Comments: Resubmitted to MNRAS; 16 pages, 15 figures
Submitted: 2009-05-28
To further our knowledge of the complex physical process of galaxy formation, it is essential that we characterize the formation and evolution of large databases of galaxies. The spectral synthesis STARLIGHT code of Cid Fernandes et al. (2004) was designed for this purpose. Results of STARLIGHT are highly dependent on the choice of input basis of simple stellar population (SSP) spectra. Speed of the code, which uses random walks through the parameter space, scales as the square of the number of basis spectra, making it computationally necessary to choose a small number of SSPs that are coarsely sampled in age and metallicity. In this paper, we develop methods based on diffusion map (Lafon & Lee, 2006) that, for the first time, choose appropriate bases of prototype SSP spectra from a large set of SSP spectra designed to approximate the continuous grid of age and metallicity of SSPs of which galaxies are truly composed. We show that our techniques achieve better accuracy of physical parameter estimation for simulated galaxies. Specifically, we show that our methods significantly decrease the age-metallicity degeneracy that is common in galaxy population synthesis methods. We analyze a sample of 3046 galaxies in SDSS DR6 and compare the parameter estimates obtained from different basis choices.
[32]  oai:arXiv.org:0805.4136  [pdf] - 12977
Inference for the dark energy equation of state using Type IA supernova data
Comments: Published at http://dx.doi.org/10.1214/08-AOAS229 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Submitted: 2008-05-27, last modified: 2009-05-18
The surprising discovery of an accelerating universe led cosmologists to posit the existence of "dark energy"--a mysterious energy field that permeates the universe. Understanding dark energy has become the central problem of modern cosmology. After describing the scientific background in depth, we formulate the task as a nonlinear inverse problem that expresses the comoving distance function in terms of the dark-energy equation of state. We present two classes of methods for making sharp statistical inferences about the equation of state from observations of Type Ia Supernovae (SNe). First, we derive a technique for testing hypotheses about the equation of state that requires no assumptions about its form and can distinguish among competing theories. Second, we present a framework for computing parametric and nonparametric estimators of the equation of state, with an associated assessment of uncertainty. Using our approach, we evaluate the strength of statistical evidence for various competing models of dark energy. Consistent with current studies, we find that with the available Type Ia SNe data, it is not possible to distinguish statistically among popular dark-energy models, and that, in particular, there is no support in the data for rejecting a cosmological constant. With much more supernova data likely to be available in coming years (e.g., from the DOE/NASA Joint Dark Energy Mission), we address the more interesting question of whether future data sets will have sufficient resolution to distinguish among competing theories.
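For reference, the nonlinear relation between the comoving distance and the equation of state can be written out under two assumptions that the abstract does not state: spatial flatness and negligible radiation. With $H_0$ the Hubble constant, $\Omega_m$ the present-day matter density, and $w(z)$ the dark-energy equation of state, the comoving distance $r(z)$ is

$$
r(z) = \frac{c}{H_0} \int_0^z \left[ \Omega_m (1+s)^3 + (1-\Omega_m) \exp\!\left( 3 \int_0^s \frac{1+w(u)}{1+u}\, du \right) \right]^{-1/2} ds .
$$

Setting $w \equiv -1$ makes the exponential equal to 1 and recovers the cosmological-constant case. Because $w$ enters only through two nested integrals, recovering it from distance measurements amounts to differentiating noisy data twice, which is why the problem is posed as an inverse problem.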
[33]  oai:arXiv.org:0802.4462  [pdf] - 1934218
The XMM Cluster Survey: Forecasting cosmological and cluster scaling-relation parameter constraints
Comments: 28 pages, 17 figures. Revised version, as accepted for publication in MNRAS. High-resolution figures available at http://xcs-home.org (under "Publications")
Submitted: 2008-02-29, last modified: 2009-04-26
We forecast the constraints on the values of sigma_8, Omega_m, and cluster scaling relation parameters which we expect to obtain from the XMM Cluster Survey (XCS). We assume a flat Lambda-CDM Universe and perform a Monte Carlo Markov Chain analysis of the evolution of the number density of galaxy clusters that takes into account a detailed simulated selection function. Our current observed number of clusters agrees well with predictions. We determine the expected degradation of the constraints as a result of self-calibrating the luminosity-temperature relation (with scatter), including temperature measurement errors, and relying on photometric methods for the estimation of galaxy cluster redshifts. We examine the effects of systematic errors in scaling relation and measurement error assumptions. Using only (T,z) self-calibration, we expect to measure Omega_m to +-0.03 (and Omega_Lambda to the same accuracy assuming flatness), and sigma_8 to +-0.05, also constraining the normalization and slope of the luminosity-temperature relation to +-6 and +-13 per cent (at 1sigma) respectively in the process. Self-calibration fails to jointly constrain the scatter and redshift evolution of the luminosity-temperature relation significantly. Additional archival and/or follow-up data will improve on this. We do not expect measurement errors or imperfect knowledge of their distribution to degrade constraints significantly. Scaling-relation systematics can easily lead to cosmological constraints 2sigma or more away from the fiducial model. Our treatment is the first exact treatment to this level of detail, and introduces a new `smoothed ML' estimate of expected constraints.
[34]  oai:arXiv.org:0809.2800  [pdf] - 16406
Revealing components of the galaxy population through nonparametric techniques
Comments: 12 pages, 10 figures, accepted for publication in MNRAS
Submitted: 2008-09-16
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
[35]  oai:arXiv.org:0807.2900  [pdf] - 314999
Exploiting Low-Dimensional Structure in Astronomical Spectra
Comments: 24 pages, 8 figures
Submitted: 2008-07-18
Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work we apply a recently-developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework provides a computationally efficient means by which to predict redshifts of galaxies, and thus could inform more expensive redshift estimators such as template cross-correlation. It also provides a natural means by which to identify outliers (e.g., misclassified spectra, spectra with anomalous features). We analyze 3835 SDSS spectra and show how our framework yields a more than 95% reduction in dimensionality. Finally, we show that the prediction error of the diffusion map-based regression approach is markedly smaller than that of a similar approach based on PCA, clearly demonstrating the superiority of diffusion maps over PCA for this regression task.
[36]  oai:arXiv.org:astro-ph/0510844  [pdf] - 77340
Massive Science with VO and Grids
Comments: Invited talk at ADASSXV conference published as ASP Conference Series, Vol. XXX, 2005 C. Gabriel, C. Arviset, D. Ponz and E. Solano, eds. 9 pages
Submitted: 2005-10-31
There is a growing need for massive computational resources for the analysis of new astronomical datasets. To tackle this problem, we present here our first steps towards marrying two new and emerging technologies: the Virtual Observatory (e.g., AstroGrid) and the computational grid (e.g., TeraGrid, COSMOS, etc.). We discuss the construction of VOTechBroker, which is a modular software tool designed to abstract the tasks of submission and management of a large number of computational jobs to a distributed computer system. The broker will also interact with the AstroGrid workflow and MySpace environments. We discuss our planned usages of the VOTechBroker in computing a huge number of n-point correlation functions from the SDSS data and massive model-fitting of millions of CMBfast models to WMAP data. We also discuss other applications including the determination of the XMM Cluster Survey selection function and the construction of new WMAP maps.
[37]  oai:arXiv.org:astro-ph/0510406  [pdf] - 260658
Examining the Effect of the Map-Making Algorithm on Observed Power Asymmetry in WMAP Data
Comments: 45 pages, 16 figures (21 figure files), high-resolution versions of Figures 1-3 at http://www.stat.cmu.edu/~pfreeman, accepted for publication in ApJ
Submitted: 2005-10-13
We analyze first-year data of WMAP to determine the significance of asymmetry in summed power between arbitrarily defined opposite hemispheres, using maps that we create ourselves with software developed independently of the WMAP team. We find that over the multipole range l=[2,64], the significance of asymmetry is ~ 10^-4, a value insensitive to both frequency and power spectrum. We determine the smallest multipole ranges exhibiting significant asymmetry, and find twelve, including l=[2,3] and [6,7], for which the significance -> 0. In these ranges there is an improbable association between the direction of maximum significance and the ecliptic plane (p ~ 0.01). Also, contours of least significance follow great circles inclined relative to the ecliptic at the largest scales. The great circle for l=[2,3] passes over previously reported preferred axes and is insensitive to frequency, while the great circle for l=[6,7] is aligned with the ecliptic poles. We examine how changing map-making parameters affects asymmetry, and find that at large scales, it is rendered insignificant if the magnitude of the WMAP dipole vector is increased by approximately 1-3 sigma (or 2-6 km/s). While confirmation of this result would require data recalibration, such a systematic change would be consistent with observations of frequency-independent asymmetry. We conclude that the use of an incorrect dipole vector, in combination with a systematic or foreground process associated with the ecliptic, may help to explain the observed asymmetry.
[38]  oai:arXiv.org:astro-ph/0501056  [pdf] - 70168
Chandra Observations of MBM12 and Models of the Local Bubble
Comments: 25 pages, 5 figures, Accepted by the Astrophysical Journal
Submitted: 2005-01-04
Chandra observations toward the nearby molecular cloud MBM12 show unexpectedly strong and nearly equal foreground O VIII and O VII emission. As the observed portion of MBM12 is optically thick at these energies, the emission lines must be formed nearby, coming either from the Local Bubble (LB) or charge exchange with ions from the Sun. Equilibrium models for the LB predict stronger O VII than O VIII, so these results suggest that the LB is far from equilibrium or a substantial portion of O VIII is from another source, such as charge exchange within the Solar system. Despite the likely contamination, we can combine our results with other EUV and X-ray observations to reject LB models which posit a cool recombining plasma as the source of LB X-rays.
[39]  oai:arXiv.org:astro-ph/0308493  [pdf] - 554543
Chandra Multi-wavelength Project (ChaMP). II. First Results of X-ray Source Properties
Comments: 26 pages, PDF, including 9 figures. Accepted in Aug 2003 for publication in ApJ. See also an accompanying X-ray paper by Kim et al. (2003) and a follow-up optical paper by Green et al. (2003) in the ChaMP web site http://hea-www.harvard.edu/CHAMP/
Submitted: 2003-08-27
We present the first results of ChaMP X-ray source properties obtained from the initial sample of 62 observations. The data have been uniformly reduced and analyzed with techniques specifically developed for the ChaMP and then validated by visual examination. Utilizing only near on-axis, bright X-ray sources (to avoid problems caused by incompleteness and the Eddington bias), we derive the Log(N)-Log(S) relation in soft (0.5-2 keV) and hard (2-8 keV) energy bands. The ChaMP data are consistent with previous results of ROSAT, ASCA and Chandra deep surveys. In particular, our data nicely fill in the flux gap in the hard band between the Chandra Deep Field data and the previous ASCA data. We check whether there is any systematic difference in the source density between cluster and non-cluster fields and also search for field-to-field variations, both of which have been previously reported. We find no significant field-to-field cosmic variation in either test within the statistics (~1 sigma) across the flux levels included in our sample. In the X-ray color-color plot, most sources fall in the location characterized by photon index = 1.5-2 and NH = a few x 10^20 cm^-2, suggesting that they are typical broad-line AGNs. There also exist a considerable number of sources with peculiar X-ray colors (e.g., highly absorbed, very hard, very soft). We confirm a trend that on average the X-ray color hardens as the count rate decreases. Since the hardening is confined to the softest energy band (0.3-0.9 keV), we conclude it is most likely due to absorption. We cross-correlate the X-ray sources with other catalogs and describe their properties in terms of optical color, X-ray-to-optical luminosity ratio and X-ray colors.
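The Log(N)-Log(S) relation named in this abstract is a cumulative count of sources brighter than a given flux S, per unit sky area. A minimal sketch in Python follows, assuming a hypothetical list of fluxes and a single survey area that is uniform at all fluxes. A real survey analysis would need flux-dependent sky coverage, which this sketch ignores; the function name and inputs are illustrative and not taken from the ChaMP software.

```python
import math

def log_n_log_s(fluxes, area_deg2):
    """Return (log10 S, log10 N(>=S)) pairs for a cumulative number-count curve.

    fluxes    -- source fluxes in erg s^-1 cm^-2 (hypothetical input)
    area_deg2 -- survey area in deg^2, assumed uniform at all fluxes
    """
    ordered = sorted(fluxes, reverse=True)  # brightest source first
    points = []
    for rank, s in enumerate(ordered, start=1):
        # rank counts the sources at least as bright as this one (ties aside)
        n_per_deg2 = rank / area_deg2
        points.append((math.log10(s), math.log10(n_per_deg2)))
    return points

# Illustrative fluxes only; these are not ChaMP measurements.
example = log_n_log_s([1e-14, 5e-15, 2e-15, 2e-14], area_deg2=2.0)
```

Plotting the second element of each pair against the first gives the familiar declining curve whose slope is compared between surveys.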
[40]  oai:arXiv.org:astro-ph/0308492  [pdf] - 554542
Chandra Multi-wavelength Project (ChaMP). I. First X-ray Source Catalog
Comments: 85 pages, PDF, including 9 tables and 15 figures. Accepted in Aug 2003 for publication in ApJ Supplement. For a full paper, visit the ChaMP web site http://hea-www.harvard.edu/CHAMP/. See also an accompanying X-ray paper by Kim et al. (2003) and a follow-up optical paper by Green et al. (2003)
Submitted: 2003-08-27
The Chandra Multi-wavelength Project (ChaMP) is a wide-area (~14 deg^2) survey of serendipitous Chandra X-ray sources, aiming to establish fair statistical samples covering a wide range of characteristics (such as absorbed AGNs, high-z clusters of galaxies) at flux levels (fX ~ 10^-15 - 10^-14 erg s^-1 cm^-2) intermediate between the Chandra Deep surveys and previous missions. We present the first ChaMP catalog, which consists of 991 near on-axis, bright X-ray sources obtained from the initial sample of 62 observations. The data have been uniformly reduced and analyzed with techniques specifically developed for the ChaMP and then validated by visual examination. To assess source reliability and positional uncertainty, we perform a series of simulations and also use Chandra data to complement the simulation study. The false source detection rate is found to be as good as or better than expected for a given limiting threshold. On the other hand, the chance of missing a real source is rather complex, depending on the source counts, off-axis distance (or PSF), and background rate. The positional error (95% confidence level) is usually < 1" for a bright source, regardless of its off-axis distance, while it can be as large as 4" for a weak source (~20 counts) at a large off-axis distance (Doff-axis > 8'). We have also developed new methods to find spatially extended or temporally variable sources, and those sources are listed in the catalog.
[41]  oai:arXiv.org:astro-ph/0204159  [pdf] - 48704
Is RX J185635-375 a Quark Star?
Comments: 16 pages, 3 figures, accepted for publication in the Astrophysical Journal
Submitted: 2002-04-09
Deep Chandra LETG+HRC-S observations of the isolated neutron star candidate RX J1856.5-3754 have been analysed to search for metallic and resonance cyclotron spectral features and for pulsation behaviour. As found from earlier observations, the X-ray spectrum is well-represented by a ~ 60 eV (7e5 K) blackbody. No unequivocal evidence of spectral line or edge features has been found, arguing against metal-dominated models. The data contain no evidence for pulsation and we place a 99% confidence upper limit of 2.7% on the unaccelerated pulse fraction over a wide frequency range from 1e-4 to 100 Hz. We argue that the derived interstellar medium neutral hydrogen column density of 8e19 <= N_H <= 1.1e20 per sq. cm favours the larger distance from two recent HST parallax analyses, placing RX J1856.5-3754 at ~ 140 pc instead of ~ 60 pc, and in the outskirts of the R CrA dark molecular cloud. That such a comparatively rare region of high ISM density is precisely where an isolated neutron star re-heated by accretion of interstellar matter would be expected is either entirely coincidental, or current theoretical arguments excluding this scenario for RX J1856.5-3754 are premature. Taken at face value, the combined observational evidence -- a lack of spectral and temporal features and an implied radius at infinity of 3.8-8.2 km that is too small for current neutron star models -- points to a more compact object, such as allowed for quark matter equations of state.
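The correspondence quoted above between a ~60 eV blackbody and 7e5 K follows from T = kT / k_B. A small Python check, using the standard value of the Boltzmann constant in eV/K (the function name is illustrative):

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

def ev_to_kelvin(kt_ev):
    """Convert a thermal energy kT in eV to a temperature in kelvin."""
    return kt_ev / K_B_EV_PER_K

temperature = ev_to_kelvin(60.0)  # roughly 7e5 K
```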
[42]  oai:arXiv.org:astro-ph/0108429  [pdf] - 44405
A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data
Comments: Accepted for publication in Ap. J. Supp. (v. 138 Jan. 2002). 61 pages, 23 figures, expands to 3.8 Mb. Abstract abridged for astro-ph submission
Submitted: 2001-08-27
Wavelets are scaleable, oscillatory functions that deviate from zero only within a limited spatial regime and have average value zero. In addition to their use as source characterizers, wavelet functions are rapidly gaining currency within the source detection field. Wavelet-based source detection involves the correlation of scaled wavelet functions with binned, two-dimensional image data. If the chosen wavelet function exhibits the property of vanishing moments, significantly non-zero correlation coefficients will be observed only where there are high-order variations in the data; e.g., they will be observed in the vicinity of sources. In this paper, we describe the mission-independent, wavelet-based source detection algorithm WAVDETECT, part of the CIAO software package. Aspects of our algorithm include: (1) the computation of local, exposure-corrected normalized (i.e. flat-fielded) background maps; (2) the correction for exposure variations within the field-of-view; (3) its applicability within the low-counts regime, as it does not require a minimum number of background counts per pixel for the accurate computation of source detection thresholds; (4) the generation of a source list in a manner that does not depend upon a detailed knowledge of the point spread function (PSF) shape; and (5) error analysis. These features make our algorithm considerably more general than previous methods developed for the analysis of X-ray image data, especially in the low count regime. We demonstrate the algorithm's robustness by applying it to various images.
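The correlation step described above can be sketched in plain Python. This is a toy illustration only, not the WAVDETECT implementation: it assumes a Mexican-hat wavelet as the zero-mean function, pads the image with zeros at the edges, and omits background estimation, exposure correction and detection thresholds entirely. All function names are hypothetical.

```python
import math

def mexican_hat_kernel(sigma, half_width):
    """Sampled 2-D Mexican-hat wavelet, shifted so that it sums to zero."""
    size = 2 * half_width + 1
    kernel = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            r2 = ((i - half_width) ** 2 + (j - half_width) ** 2) / sigma ** 2
            kernel[i][j] = (2.0 - r2) * math.exp(-r2 / 2.0)
    # Truncation on a finite grid leaves a small non-zero sum; remove it.
    mean = sum(map(sum, kernel)) / size ** 2
    return [[v - mean for v in row] for row in kernel]

def correlate(image, kernel):
    """Correlate image with kernel; pixels outside the image count as zero."""
    ny, nx = len(image), len(image[0])
    h = len(kernel) // 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = 0.0
            for dy in range(-h, h + 1):
                for dx in range(-h, h + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < ny and 0 <= xx < nx:
                        total += image[yy][xx] * kernel[dy + h][dx + h]
            out[y][x] = total
    return out

# Toy image: flat background of 1 count per pixel plus one bright pixel.
n = 21
image = [[1.0] * n for _ in range(n)]
image[10][10] += 20.0
coeffs = correlate(image, mexican_hat_kernel(sigma=1.5, half_width=4))
peak = max((coeffs[y][x], y, x) for y in range(n) for x in range(n))
```

Because the kernel sums to zero, the flat background contributes nothing wherever the kernel fits inside the image, so the largest coefficient sits on the bright pixel. Near the image border the zero padding produces spurious non-zero coefficients, one reason a practical detector has to handle field edges and exposure variations explicitly.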
[43]  oai:arXiv.org:astro-ph/0108426  [pdf] - 44402
Sherpa: a Mission-Independent Data Analysis Application
Comments: To appear in Proc. SPIE Conf. 4477. 12 pages, 4 figures
Submitted: 2001-08-27
The ever-increasing quality and complexity of astronomical data underscores the need for new and powerful data analysis applications. This need has led to the development of Sherpa, a modeling and fitting program in the CIAO software package that enables the analysis of multi-dimensional, multi-wavelength data. In this paper, we present an overview of Sherpa's features, which include: support for a wide variety of input and output data formats, including the new Model Descriptor List (MDL) format; a model language which permits the construction of arbitrarily complex model expressions, including ones representing instrument characteristics; a wide variety of fit statistics and methods of optimization, model comparison, and parameter estimation; multi-dimensional visualization, provided by ChIPS; and new interactive analysis capabilities provided by embedding the S-Lang interpreted scripting language. We conclude by showing example Sherpa analysis sessions.
[44]  oai:arXiv.org:astro-ph/9906395  [pdf] - 107116
Resonant Cyclotron Radiation Transfer Model Fits to Spectra from Gamma-Ray Burst GRB870303
Comments: LaTeX2e (aastex.cls included); 45 pages text, 17 figures (on 21 pages); accepted by ApJ (to be published 1 Nov 1999, v. 525)
Submitted: 1999-06-24
We demonstrate that models of resonant cyclotron radiation transfer in a strong field (i.e. cyclotron scattering) can account for spectral lines seen at two epochs, denoted S1 and S2, in the Ginga data for GRB870303. Using a generalized version of the Monte Carlo code of Wang et al. (1988,1989b), we model line formation by injecting continuum photons into a static plane-parallel slab of electrons threaded by a strong neutron star magnetic field (~ 10^12 G) which may be oriented at an arbitrary angle relative to the slab normal. We examine two source geometries, which we denote "1-0" and "1-1," with the numbers representing the relative electron column densities above and below the continuum photon source plane. We compare azimuthally symmetric models, i.e. models in which the magnetic field is parallel to the slab normal, with models having more general magnetic field orientations. If the bursting source has a simple dipole field, these two model classes represent line formation at the magnetic pole, or elsewhere on the stellar surface. We find that the data of S1 and S2, considered individually, are consistent with both geometries, and with all magnetic field orientations, with the exception that the S1 data clearly favor line formation away from a polar cap in the 1-1 geometry, with the best-fit model placing the line-forming region at the magnetic equator. Within both geometries, fits to the combined (S1+S2) data marginally favor models which feature equatorial line formation, and in which the observer's orientation with respect to the slab changes between the two epochs. We interpret this change as being due to neutron star rotation, and we place limits on the rotation period.
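For orientation, the field strength quoted above can be related to a line energy through the non-relativistic cyclotron energy E = 2 mu_B B, about 11.6 keV for B = 10^12 G. The Python sketch below assumes that simple relation and ignores gravitational redshift and relativistic corrections; it is a back-of-envelope aid, not part of the radiation transfer model, and the function names are illustrative.

```python
MU_B_EV_PER_GAUSS = 5.7883818060e-9  # Bohr magneton in eV per gauss

def cyclotron_energy_kev(b_gauss):
    """Non-relativistic electron cyclotron energy in keV for a field in gauss."""
    return 2.0 * MU_B_EV_PER_GAUSS * b_gauss / 1.0e3

def field_for_line_gauss(energy_kev):
    """Field in gauss whose fundamental cyclotron line lies at energy_kev."""
    return energy_kev * 1.0e3 / (2.0 * MU_B_EV_PER_GAUSS)

fundamental = cyclotron_energy_kev(1.0e12)  # about 11.6 keV
```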
[45]  oai:arXiv.org:astro-ph/9906394  [pdf] - 107115
Statistical Analysis of Spectral Line Candidates in Gamma-Ray Burst GRB870303
Comments: LaTeX2e (aastex.cls included); 41 pages text, 10 figures (on 11 pages); accepted by ApJ (to be published 1 Nov 1999, v. 525)
Submitted: 1999-06-24
The Ginga data for the gamma-ray burst GRB870303 exhibit low-energy dips in two temporally distinct spectra, denoted S1 and S2. S1, spanning 4 s, exhibits a single line candidate at ~ 20 keV, while S2, spanning 9 s, exhibits apparently harmonically spaced line candidates at ~ 20 and 40 keV. We evaluate the statistical evidence for these lines, using phenomenological continuum and line models which in their details are independent of the distance scale to gamma-ray bursts. We employ the methodologies based on both frequentist and Bayesian statistical inference that we develop in Freeman et al. (1999b). These methodologies utilize the information present in the data to select the simplest model that adequately describes the data from among a wide range of continuum and continuum-plus-line(s) models. This ensures that the chosen model does not include free parameters that the data deem unnecessary and that would act to reduce the frequentist significance and Bayesian odds of the continuum-plus-line(s) model. We calculate the significance of the continuum-plus-line(s) models using the Chi-Square Maximum Likelihood Ratio test. We describe a parametrization of the exponentiated Gaussian absorption line shape that makes the probability surface in parameter space better-behaved, allowing us to estimate analytically the Bayesian odds. The significances of the continuum-plus-line models requested by the S1 and S2 data are 3.6 x 10^-5 and 1.7 x 10^-4 respectively, with the odds favoring them being 114:1 and 7:1. We also apply our methodology to the combined (S1+S2) data. The significance of the continuum-plus-lines model requested by the combined data is 4.2 x 10^-8, with the odds favoring it being 40,300:1.
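A likelihood ratio test of the kind named above compares the drop in the fit statistic between nested models with a chi-square distribution whose degrees of freedom equal the number of added parameters. The Python sketch below assumes that standard large-sample form and uses made-up fit statistics; it does not reproduce the paper's fits or numbers, and the function names are hypothetical.

```python
import math

def chi2_sf(x, dof):
    """Survival function P(chi2_dof > x) for integer dof >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < dof / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lrt_significance(chi2_simple, chi2_complex, extra_params):
    """Chance probability of the fit improvement if the simpler model is true."""
    delta = chi2_simple - chi2_complex
    return chi2_sf(delta, extra_params)

# Hypothetical fit statistics, not values from the paper.
significance = lrt_significance(chi2_simple=58.3, chi2_complex=35.1, extra_params=3)
```

A small significance means the extra line parameters improve the fit by more than chance would typically allow. The chi-square approximation rests on regularity conditions that should be checked for any particular line model.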
[46]  oai:arXiv.org:astro-ph/9601167  [pdf] - 94022
BATSE SD Observations of Hercules X-1
Comments: 5 pages, LaTeX (style files aipbook.sty, aps.sty, aps10.sty, prabib.sty, psfig.sty, and revtex.sty included with PAPER.tex), 2 embedded PostScript figures (mongo1.ps, mongo2.ps)
Submitted: 1996-01-29
The cyclotron line in the spectrum of the accretion-powered pulsar Her X-1 offers an opportunity to assess the ability of the BATSE Spectroscopy Detectors (SDs) to detect lines like those seen in some GRBs. Preliminary analysis of an initial SD pulsar mode observation of Her X-1 indicated a cyclotron line at an energy of approximately 44 keV, rather than at the expected energy of approximately 36 keV. Our analysis of four SD pulsar mode observations of Her X-1 made during high-states of its 35 day cycle confirms this result. We consider a number of phenomenological models for the continuum spectrum and the cyclotron line. This ensures that we use the simplest models that adequately describe the data, and that our results are robust. We find modest evidence (significance Q ~ 10^-4-10^-2) for a line at approximately 44 keV in the data of the first observation. Joint fits to the four observations provide stronger evidence (Q ~ 10^-7-10^-4) for the line. Such a shift in the cyclotron line energy of an accretion-powered pulsar is unprecedented.