Full-text search for arXiv

Poczos, Barnabas

Normalized to: Poczos, B.

8 articles in total; 41 co-authors, each sharing 1 to 5 articles. Median position in the author list: 5.5.

[1] oai:arXiv.org:1902.10159 - 1840221

The Role of Machine Learning in the Next Decade of Cosmology

Comments: Submitted to the Astro2020 call for science white papers

Submitted: 2019-02-26

In recent years, machine learning (ML) methods have remarkably improved how cosmologists can interpret data. The next decade will bring new opportunities for data-driven cosmological discovery, but will also present new challenges for adopting ML methodologies and understanding the results. ML could transform our field, but this transformation will require the astronomy community to both foster and promote interdisciplinary research endeavors.

[2] oai:arXiv.org:1902.05950 - 2025399

A Robust and Efficient Deep Learning Method for Dynamical Mass Measurements of Galaxy Clusters

Ho, Matthew; Rau, Markus Michael; Ntampaka, Michelle; Farahi, Arya; Trac, Hy; Poczos, Barnabas

Comments: 21 pages, 10 figures, 3 tables, submitted to ApJ

Submitted: 2019-02-15

We demonstrate the ability of Convolutional Neural Networks (CNNs) to mitigate systematics in the virial scaling relation and produce dynamical mass estimates of galaxy clusters with remarkably low bias and scatter. We present two models, CNN$_\text{1D}$ and CNN$_\text{2D}$, which leverage this deep learning tool to infer cluster masses from distributions of member galaxy dynamics. Our first model, CNN$_\text{1D}$, infers cluster mass directly from the distribution of member galaxy line-of-sight velocities. Our second model, CNN$_\text{2D}$, extends the input space of CNN$_\text{1D}$ to learn on the joint distribution of galaxy line-of-sight velocities and projected radial distances. We train each model as a regression over cluster mass using a labeled catalog of realistic mock cluster observations generated from the MultiDark simulation and UniverseMachine catalog. We then evaluate the performance of each model on an independent set of mock observations selected from the same simulated catalog. The CNN models produce cluster mass predictions with log-normal residuals of scatter as low as $0.127$ dex, a factor of three improvement over the classical M-$\sigma$ power law estimator. Furthermore, the CNN model reduces prediction scatter relative to similar machine learning approaches by up to $20\%$ while executing in drastically shorter training and evaluation times (by a factor of 30) and producing considerably more robust mass predictions (improving prediction stability under variations in galaxy sampling rate by $53\%$).
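The classical M-$\sigma$ baseline that the CNNs are compared against is a power law, $M = A\,\sigma_v^\alpha$, which becomes a straight line in $\log_{10}$ space. The following is a minimal sketch, not the paper's code. It assumes an ordinary least-squares fit in log space and takes "scatter" to mean the standard deviation of the $\log_{10}$ mass residuals, so a scatter of 0.127 dex would correspond to a residual standard deviation of 0.127. The function names are hypothetical.

```python
import math
import statistics


def fit_power_law(sigma_v, mass):
    """Fit log10(M) = log10(A) + alpha * log10(sigma_v) by ordinary least squares."""
    x = [math.log10(s) for s in sigma_v]
    y = [math.log10(m) for m in mass]
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    log_a = y_mean - alpha * x_mean
    return log_a, alpha


def predict_mass(log_a, alpha, sigma_v):
    """Evaluate the fitted power law M = A * sigma_v**alpha."""
    return [10 ** (log_a + alpha * math.log10(s)) for s in sigma_v]


def scatter_dex(m_pred, m_true):
    """Population standard deviation of log10(M_pred / M_true), in dex (assumed definition)."""
    residuals = [math.log10(p) - math.log10(t) for p, t in zip(m_pred, m_true)]
    return statistics.pstdev(residuals)
```

On toy data that follow an exact power law, the fit recovers the slope and the residual scatter is essentially zero. Real mock catalogs would show the broad residuals that the CNNs are designed to reduce.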

[3] oai:arXiv.org:1711.02033 - 1590820

Estimating Cosmological Parameters from the Dark Matter Distribution

Ravanbakhsh, Siamak; Oliva, Junier; Fromenteau, Sebastien; Price, Layne C.; Ho, Shirley; Schneider, Jeff; Poczos, Barnabas

Comments: ICML 2016

Submitted: 2017-11-06

A grand challenge of 21st-century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach to estimating the cosmological parameters is to use the large-scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "single" function of scale, such as the galaxy correlation function or power spectrum. We show that it is possible to estimate these cosmological parameters directly from the distribution of matter. This paper presents the application of deep 3D convolutional networks to volumetric representations of dark-matter simulations as well as the results obtained using a recently proposed distribution regression framework, showing that machine learning techniques are comparable to, and can sometimes outperform, maximum-likelihood point estimates using "cosmological models". This opens the way to estimating the parameters of our Universe with higher accuracy.
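To make the idea of a "single" function of scale concrete, the two-point correlation function compresses a point distribution into one value per separation bin. The following brute-force sketch is an illustration under stated assumptions and not taken from the paper. It uses the simple natural estimator $\xi(r) = \frac{DD(r)}{RR(r)}\,\frac{N_R(N_R-1)}{N_D(N_D-1)} - 1$, treats points in a non-periodic box, and requires at least two points per catalog. The function names are hypothetical. Production analyses usually prefer the Landy-Szalay estimator and tree-based pair counting.

```python
import math


def pair_counts(points, bin_edges):
    """Count unique pairs whose separation falls in each [edge_k, edge_{k+1}) bin."""
    counts = [0] * (len(bin_edges) - 1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            for k in range(len(counts)):
                if bin_edges[k] <= r < bin_edges[k + 1]:
                    counts[k] += 1
                    break
    return counts


def xi_natural(data, randoms, bin_edges):
    """Natural estimator xi(r) = (DD/RR) * [N_R(N_R-1)] / [N_D(N_D-1)] - 1."""
    dd = pair_counts(data, bin_edges)
    rr = pair_counts(randoms, bin_edges)
    n_d, n_r = len(data), len(randoms)
    norm = (n_r * (n_r - 1)) / (n_d * (n_d - 1))
    # Bins with no random pairs are left undefined (NaN).
    return [d / r * norm - 1.0 if r > 0 else float("nan") for d, r in zip(dd, rr)]
```

If the data catalog is identical to the random catalog, every bin gives $\xi = 0$, which is a quick sanity check on the normalization.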

[4] oai:arXiv.org:1703.02642 - 1598017

CMU DeepLens: Deep Learning For Automatic Image-based Galaxy-Galaxy Strong Lens Finding

Lanusse, Francois; Ma, Quanbin; Li, Nan; Collett, Thomas E.; Li, Chun-Liang; Ravanbakhsh, Siamak; Mandelbaum, Rachel; Poczos, Barnabas

Comments: 12 pages, 9 figures, submitted to MNRAS

Submitted: 2017-03-07

Galaxy-scale strong gravitational lensing is not only a valuable probe of the dark matter distribution of massive galaxies, but can also provide valuable cosmological constraints, either by studying the population of strong lenses or by measuring time delays in lensed quasars. Due to the rarity of galaxy-scale strongly lensed systems, fast and reliable automated lens finding methods will be essential in the era of large surveys such as LSST, Euclid, and WFIRST. To tackle this challenge, we introduce CMU DeepLens, a new fully automated galaxy-galaxy lens finding method based on Deep Learning. This supervised machine learning approach does not require any tuning after the training step, which only requires realistic image simulations of strongly lensed systems. We train and validate our model on a set of 20,000 LSST-like mock observations including a range of lensed systems of various sizes and signal-to-noise ratios (S/N). We find on our simulated data set that for a rejection rate of non-lenses of 99%, a completeness of 90% can be achieved for lenses with Einstein radii larger than 1.4 arcsec and S/N larger than 20 on individual $g$-band LSST exposures. Finally, we emphasize the importance of realistically complex simulations for training such machine learning methods by demonstrating that the performance of models of significantly different complexities cannot be distinguished on simpler simulations. We make our code publicly available at https://github.com/McWilliamsCenter/CMUDeepLens .
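The figure of merit quoted above, completeness at a fixed 99% rejection rate of non-lenses, is the true-positive rate at the score threshold that rejects that fraction of the non-lenses. The following is a minimal sketch and is not taken from the CMU DeepLens code. It assumes that the classifier outputs scores where higher means "more lens-like" and that a candidate is accepted when its score is strictly above the threshold. The function name is hypothetical.

```python
import math


def completeness_at_rejection(lens_scores, nonlens_scores, rejection_rate=0.99):
    """Fraction of true lenses kept when `rejection_rate` of non-lenses are rejected.

    The threshold t is the ceil(rate * N)-th smallest non-lens score; candidates
    with score > t are accepted, so at least that fraction of non-lenses
    (score <= t) is rejected even when scores are tied.
    """
    ranked = sorted(nonlens_scores)
    n_reject = math.ceil(rejection_rate * len(ranked))
    t = ranked[n_reject - 1] if n_reject > 0 else -math.inf
    kept = sum(1 for s in lens_scores if s > t)
    return kept / len(lens_scores)
```

Sweeping `rejection_rate` traces out the completeness-versus-purity trade-off. Fixing it at 0.99 reproduces the kind of single operating point reported in the abstract.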

[5] oai:arXiv.org:1609.05796 - 1525222

Enabling Dark Energy Science with Deep Generative Models of Galaxy Images

Ravanbakhsh, Siamak; Lanusse, Francois; Mandelbaum, Rachel; Schneider, Jeff; Poczos, Barnabas

Submitted: 2016-09-19, last modified: 2016-11-30

Understanding the nature of dark energy, the mysterious force driving the accelerated expansion of the Universe, is a major challenge of modern cosmology. The next generation of cosmological surveys, specifically designed to address this issue, relies on accurate measurements of the apparent shapes of distant galaxies. However, shape measurement methods suffer from various unavoidable biases and therefore will rely on a precise calibration to meet the accuracy requirements of the science analysis. This calibration process remains an open challenge as it requires large sets of high-quality galaxy images. To this end, we study the application of deep conditional generative models in generating realistic galaxy images. In particular, we consider variations on the conditional variational autoencoder and introduce a new adversarial objective for training conditional generative networks. Our results suggest a reliable alternative to the acquisition of expensive high-quality observations for generating the calibration data needed by the next generation of cosmological surveys.

[6] oai:arXiv.org:1509.05409 - 1510196

Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

Ntampaka, M.; Trac, H.; Sutherland, D. J.; Fromenteau, S.; Poczos, B.; Schneider, J.

Comments: 18 pages, 12 figures, accepted for publication at ApJ

Submitted: 2015-09-17, last modified: 2016-10-25

We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning (ML) algorithm can improve mass predictions by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark's publicly available $N$-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Even in the unrealistic case of perfect membership knowledge, this approach produces a wide fractional mass error distribution, with a width of $\Delta\epsilon\approx0.87$. Interlopers introduce additional scatter, significantly widening the error distribution further ($\Delta\epsilon\approx2.13$). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement ($\Delta\epsilon\approx0.67$) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
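The core idea behind distribution regression is that each training example is a whole sample set (for example, the LOS velocities of one cluster's galaxies) paired with a single target (its mass). SDM itself builds kernels from nonparametric divergence estimates between sample sets (see the py-sdm package linked in entry [7]). The sketch below does not implement SDM. It substitutes a simpler technique: it compares sample sets by the squared maximum mean discrepancy (MMD) between Gaussian-kernel mean embeddings and then predicts with a Nadaraya-Watson kernel smoother. The function names and the `bandwidth` and `set_scale` parameters are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian RBF kernel between two scalar observations (e.g. LOS velocities)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(a, b, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between sample sets a and b."""
    k_aa = sum(gaussian_kernel(x, y, bandwidth) for x in a for y in a) / (len(a) * len(a))
    k_bb = sum(gaussian_kernel(x, y, bandwidth) for x in b for y in b) / (len(b) * len(b))
    k_ab = sum(gaussian_kernel(x, y, bandwidth) for x in a for y in b) / (len(a) * len(b))
    return k_aa + k_bb - 2.0 * k_ab


def predict_from_distribution(train_sets, train_targets, query_set,
                              bandwidth=1.0, set_scale=0.1):
    """Kernel-weighted average of training targets, where each training set is
    weighted by exp(-MMD^2 / (2 * set_scale^2)) relative to the query set."""
    weights = [math.exp(-mmd_squared(s, query_set, bandwidth) / (2.0 * set_scale ** 2))
               for s in train_sets]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; increase set_scale")
    return sum(w * t for w, t in zip(weights, train_targets)) / total
```

A query set that resembles one training set inherits almost exactly that set's target, and sets drawn from very different distributions contribute negligible weight.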

[7] oai:arXiv.org:1410.0686 - 1222366

A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters

Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal J.; Battaglia, Nicholas; Poczos, Barnabas; Schneider, Jeff

Comments: Published in The Astrophysical Journal, 13 pages, 8 figures. Support Distribution Machines is publicly available at https://github.com/dougalsutherland/py-sdm

Submitted: 2014-10-02, last modified: 2015-04-27

We present a modern machine learning approach for cluster dynamical mass measurements that is a factor of two improvement over using a conventional scaling relation. Different methods are tested against a mock cluster catalog constructed using halos with mass $\geq 10^{14}\,M_\odot/h$ from Multidark's publicly available $N$-body MDPL halo catalog. In the conventional method, we use a standard $M(\sigma_v)$ power-law scaling relation to infer cluster mass, $M$, from line-of-sight (LOS) galaxy velocity dispersion, $\sigma_v$. The resulting fractional mass error distribution is broad, with width $= 0.87$ (68% scatter), and has extended high-error tails. The standard scaling relation can be simply enhanced by including higher-order moments of the LOS velocity distribution. Applying the kurtosis as a correction term to $\log(\sigma_v)$ reduces the width of the error distribution to 0.74 (16% improvement). Machine learning can be used to take full advantage of all the information in the velocity distribution. We employ the Support Distribution Machines (SDMs) algorithm that learns from distributions of data to predict single values. SDMs trained and tested on the distribution of LOS velocities yield width $= 0.46$ (47% improvement). Furthermore, the problematic tails of the mass error distribution are effectively eliminated. Decreasing cluster mass errors will improve measurements of the growth of structure and lead to tighter constraints on cosmological parameters.
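Two quantities in this abstract are easy to compute directly: the kurtosis of the LOS velocity distribution, which feeds the improved scaling relation, and the width of the fractional mass error distribution $\epsilon = (M_\text{pred} - M_\text{true})/M_\text{true}$. The following is a minimal sketch under stated assumptions. It uses the population excess kurtosis, and it takes "width (68% scatter)" to mean the 84th minus the 16th percentile of $\epsilon$. The paper's exact definitions and the functional form of the kurtosis correction are not given here, so only the ingredients are shown. The function names are hypothetical.

```python
import statistics


def excess_kurtosis(velocities):
    """Population excess kurtosis of LOS velocities (0 for a Gaussian)."""
    mu = statistics.fmean(velocities)
    m2 = statistics.fmean([(v - mu) ** 2 for v in velocities])
    m4 = statistics.fmean([(v - mu) ** 4 for v in velocities])
    return m4 / m2 ** 2 - 3.0


def fractional_error_width(m_pred, m_true):
    """Width of the central 68% of eps = (M_pred - M_true) / M_true,
    taken here as the 84th minus the 16th percentile (assumed definition)."""
    eps = [(p - t) / t for p, t in zip(m_pred, m_true)]
    pct = statistics.quantiles(eps, n=100, method="inclusive")  # 99 cut points
    return pct[83] - pct[15]
```

A two-valued (bimodal) velocity sample gives a strongly negative excess kurtosis, which illustrates why the fourth moment carries information that $\sigma_v$ alone misses.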

[8] oai:arXiv.org:1303.1055 - 1165014

A First Look at creating mock catalogs with machine learning techniques

Xu, Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Poczos, Barnabas; Ntampaka, Michelle

Comments: 11 pages, 6 figures

Submitted: 2013-03-05

We investigate machine learning (ML) techniques for predicting the number of galaxies ($N_\text{gal}$) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and $N_\text{gal}$. In addition, like HOD methods, our ML approaches depend only on parent halo properties, an advantage over subhalo-based approaches because subhalos are difficult to identify correctly. We test two algorithms: support vector machines (SVM) and k-nearest-neighbour (kNN) regression. We take galaxies and halos from the Millennium simulation and predict $N_\text{gal}$ by training our algorithms on the following six halo properties: number of particles, $M_{200}$, $\sigma_v$, $v_\text{max}$, half-mass radius and spin. For Millennium, our predicted $N_\text{gal}$ values have a mean squared error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and $N_\text{gal}$. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g. blue, red, high $M_\star$, low $M_\star$). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, machine learning offers an interesting alternative for creating mock catalogs.
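kNN regression maps a vector of halo properties to $N_\text{gal}$ by averaging the targets of the most similar training halos. The following is a minimal pure-Python sketch under stated assumptions. It uses Euclidean distance on z-scored features and an unweighted mean over the k nearest neighbours, and it scores predictions by MSE. The paper's choice of k, distance metric and feature scaling is not stated here, and the function names are hypothetical. Standardizing matters because raw features such as $M_{200}$ and spin differ by many orders of magnitude.

```python
import math
import statistics


def standardize(rows):
    """Z-score each feature column; returns scaled rows plus (means, stds) for reuse."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column gets std 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target (e.g. N_gal) as the mean over the k nearest training halos."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(pred, true):
    """Mean squared error between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)
```

In use, the query halo should be scaled with the training `means` and `stds` before calling `knn_predict`, so that training and query features share the same units.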