Normalized to: Ntampaka, M.
[1]
oai:arXiv.org:2007.05144 [pdf] - 2131963
A Machine Learning Approach to the Census of Galaxy Clusters
Su, Y.;
Zhang, Y.;
Liang, G.;
ZuHone, J. A.;
Barnes, D. J.;
Jacobs, N. B.;
Ntampaka, M.;
Forman, W. R.;
Nulsen, P. E. J.;
Kraft, R. P.;
Jones, C.
Submitted: 2020-07-09
The origin of the diverse population of galaxy clusters remains an
unexplained aspect of large-scale structure formation and cluster evolution. We
present a novel method of using X-ray images to identify cool core (CC), weak
cool core (WCC), and non cool core (NCC) clusters of galaxies, which are defined
by their central cooling times. We employ a convolutional neural network,
ResNet-18, which is commonly used for image analysis, to classify clusters. We
produce mock Chandra X-ray observations for a sample of 318 massive clusters
drawn from the IllustrisTNG simulations. The network is trained and tested with
low resolution mock Chandra images covering a central 1 Mpc square for the
clusters in our sample. Without any spectral information, the deep learning
algorithm is able to identify CC, WCC, and NCC clusters, achieving balanced
accuracies (BAcc) of 92%, 81%, and 83%, respectively. The performance is
superior to classification by conventional methods using central gas densities,
with an average BAcc = 81%, or surface brightness concentrations, giving BAcc =
73%. We use Class Activation Mapping to localize discriminative regions for the
classification decision. From this analysis, we observe that the network has
utilized regions from cluster centers out to r~300 kpc and r~500 kpc to
identify CC and NCC clusters, respectively. It may have recognized features in
the intracluster medium that are associated with AGN feedback and disruptive
major mergers.
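The per-class BAcc figures above can be illustrated with a small one-vs-rest calculation. The sketch below assumes that BAcc for a class is the mean of its sensitivity (true-positive rate) and specificity (true-negative rate) when that class is treated as positive and the other two as negative. The abstract does not spell out the exact convention, and the labels and predictions here are made up for illustration.

```python
from typing import Sequence


def balanced_accuracy_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str],
                                  positive: str) -> float:
    """BAcc for one class against all others: (sensitivity + specificity) / 2.

    Assumed one-vs-rest definition; not necessarily the paper's exact convention.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return 0.5 * (sensitivity + specificity)


if __name__ == "__main__":
    # Hypothetical labels for six clusters.
    truth = ["CC", "CC", "WCC", "NCC", "NCC", "WCC"]
    preds = ["CC", "WCC", "WCC", "NCC", "CC", "WCC"]
    for label in ("CC", "WCC", "NCC"):
        # Prints CC 0.625, WCC 0.875, NCC 0.75
        print(label, round(balanced_accuracy_one_vs_rest(truth, preds, label), 3))
```

Balanced accuracy is used instead of plain accuracy because the three classes need not be equally populated; averaging the two rates keeps a large negative class from dominating the score.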
[2]
oai:arXiv.org:1911.02479 [pdf] - 1994455
Algorithms and Statistical Models for Scientific Discovery in the
Petabyte Era
Nord, Brian;
Connolly, Andrew J.;
Kinney, Jamie;
Kubica, Jeremy;
Narayan, Gautham;
Peek, Joshua E. G.;
Schafer, Chad;
Tollerud, Erik J.;
Avestruz, Camille;
Babu, G. Jogesh;
Birrer, Simon;
Burke, Douglas;
Caldeira, João;
Caldwell, Douglas A.;
Carlberg, Joleen K.;
Chen, Yen-Chi;
Dong, Chuanfei;
Feigelson, Eric D.;
Golkhou, V. Zach;
Kashyap, Vinay;
Li, T. S.;
Loredo, Thomas;
Lucie-Smith, Luisa;
Mandel, Kaisey S.;
Martínez-Galarza, J. R.;
Miller, Adam A.;
Natarajan, Priyamvada;
Ntampaka, Michelle;
Ptak, Andy;
Rapetti, David;
Shamir, Lior;
Siemiginowska, Aneta;
Sipőcz, Brigitta M.;
Smith, Arfon M.;
Tran, Nhan;
Vilalta, Ricardo;
Walkowicz, Lucianne M.;
ZuHone, John
Submitted: 2019-11-04
The field of astronomy has arrived at a turning point in terms of size and
complexity of both datasets and scientific collaboration. Commensurately,
algorithms and statistical models have begun to adapt (e.g., via the onset
of artificial intelligence), which itself presents new challenges and
opportunities for growth. This white paper aims to offer guidance and ideas for
how we can evolve our technical and collaborative frameworks to promote
efficient algorithmic development and take advantage of opportunities for
scientific discovery in the petabyte era. We discuss challenges for discovery
in large and complex data sets; challenges and requirements for the next stage
of development of statistical methodologies and algorithmic tool sets; how we
might change our paradigms of collaboration and education; and the ethical
implications of scientists' contributions to widely applicable algorithms and
computational modeling. We start with six distinct recommendations that are
supported by the commentary following them. This white paper is related to a
larger corpus of effort that has taken place within and around the Petabytes to
Science Workshops (https://petabytestoscience.github.io/).
[3]
oai:arXiv.org:1908.02765 [pdf] - 1978011
Using X-Ray Morphological Parameters to Strengthen Galaxy Cluster Mass
Estimates via Machine Learning
Submitted: 2019-08-07, last modified: 2019-09-30
We present a machine learning approach for estimating galaxy cluster masses,
trained using both Chandra and eROSITA mock X-ray observations of 2,041
clusters from the Magneticum simulations. We train a random forest regressor,
an ensemble learning method based on decision tree regression, to predict
cluster masses using an input feature set. The feature set uses core-excised
X-ray luminosity and a variety of morphological parameters, including surface
brightness concentration, smoothness, asymmetry, power ratios, and ellipticity.
The regressor is cross-validated and calibrated on a training sample of 1,615
clusters (80% of sample), and then results are reported as applied to a test
sample of 426 clusters (20% of sample). This procedure is performed for two
different mock observation series in an effort to bracket the potential
enhancement in mass predictions that can be made possible by including
dynamical state information. The first series is computed from idealized
Chandra-like mock cluster observations, with high spatial resolution, long
exposure time (1 Ms), and the absence of background. The second series is
computed from realistic-condition eROSITA mocks with lower spatial resolution,
short exposures (2 ks), instrument effects, and background photons modeled. We
report a 20% reduction in the mass estimation scatter when either series is
used in our random forest model compared to a standard regression model that
only employs core-excised luminosity. The morphological parameters that hold
the highest feature importance are smoothness, asymmetry, and surface
brightness concentration. Hence, these parameters, which encode the dynamical
state of the cluster, can be used to make more accurate predictions of cluster
masses in upcoming surveys, offering a crucial step forward for cosmological
analyses.
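The 20% figure is a comparison against a regression that uses only core-excised luminosity. As a minimal sketch, assuming that baseline is a power law fit by ordinary least squares in log space (the abstract does not give its form), the baseline scatter and the fractional reduction achieved by a richer model can be computed as follows. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def fit_log_power_law(lum: Sequence[float], mass: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10 M = a + b * log10 L; returns (a, b)."""
    xs = [math.log10(v) for v in lum]
    ys = [math.log10(v) for v in mass]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residual_scatter_dex(lum: Sequence[float], mass: Sequence[float],
                         a: float, b: float) -> float:
    """Standard deviation of log10 mass residuals about the fit, in dex."""
    res = [math.log10(m) - (a + b * math.log10(l)) for l, m in zip(lum, mass)]
    mu = sum(res) / len(res)
    return math.sqrt(sum((r - mu) ** 2 for r in res) / len(res))


def fractional_scatter_reduction(sigma_model: float, sigma_baseline: float) -> float:
    """0.2 means the model's scatter is 20% smaller than the baseline's."""
    return 1.0 - sigma_model / sigma_baseline
```

The random forest itself would normally come from a library such as scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute provides the kind of feature ranking quoted above. It is not reimplemented here.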
[4]
oai:arXiv.org:1909.10527 [pdf] - 2046267
A Hybrid Deep Learning Approach to Cosmological Constraints From Galaxy
Redshift Surveys
Submitted: 2019-09-23
We present a deep machine learning (ML)-based technique for accurately
determining $\sigma_8$ and $\Omega_m$ from mock 3D galaxy surveys. The mock
surveys are built from the AbacusCosmos suite of $N$-body simulations, which
comprises 40 cosmological volume simulations spanning a range of cosmological
models, and we account for uncertainties in galaxy formation scenarios through
the use of generalized halo occupation distributions (HODs). We explore a trio
of ML models: a 3D convolutional neural network (CNN), a power-spectrum-based
fully connected network, and a hybrid approach that merges the two to combine
physically motivated summary statistics with flexible CNNs. We describe best
practices for training a deep model on a suite of matched-phase simulations and
we test our model on a completely independent sample that uses previously
unseen initial conditions, cosmological parameters, and HOD parameters. Despite
the fact that the mock observations are quite small
($\sim0.07h^{-3}\,\mathrm{Gpc}^3$) and the training data span a large parameter
space (6 cosmological and 6 HOD parameters), the CNN and hybrid CNN can
constrain $\sigma_8$ and $\Omega_m$ to $\sim3\%$ and $\sim4\%$, respectively.
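The hybrid model pairs a CNN with a fully connected network fed by a physically motivated summary statistic, the power spectrum. The sketch below is a deliberately simplified 1D analogue (the paper works with 3D galaxy fields). It converts a density field to an overdensity and returns |delta_k|^2 / N from a direct discrete Fourier transform. The normalization convention and the function name are assumptions for illustration.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum_1d(density: Sequence[float]) -> List[float]:
    """P(k) = |delta_k|^2 / N for k = 0..N//2, where delta = rho / mean(rho) - 1.

    Uses a direct O(N^2) discrete Fourier transform to stay dependency-free.
    """
    n = len(density)
    mean = sum(density) / n
    delta = [rho / mean - 1.0 for rho in density]  # overdensity field
    spectrum = []
    for k in range(n // 2 + 1):
        mode = sum(d * cmath.exp(-2j * math.pi * k * x / n) for x, d in enumerate(delta))
        spectrum.append(abs(mode) ** 2 / n)
    return spectrum


if __name__ == "__main__":
    # Toy field with a single cosine mode at k = 3 on a 32-cell grid.
    n = 32
    field = [1.0 + 0.5 * math.cos(2 * math.pi * 3 * x / n) for x in range(n)]
    print(power_spectrum_1d(field))
```

In a hybrid design, a vector like this would enter the fully connected branch while the raw field enters the convolutional branch, and the two branches would be merged before the output layer.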
[5]
oai:arXiv.org:1810.08211 [pdf] - 1929690
Machine Learning Applied to the Reionization History of the Universe in
the 21 cm Signal
Submitted: 2018-10-18, last modified: 2019-08-05
The Epoch of Reionization (EoR) features a rich interplay between the first
luminous sources and the low-density gas of the intergalactic medium (IGM),
where photons from these sources ionize the IGM. There are currently few
observational constraints on key observables related to the EoR, such as the
midpoint and duration of reionization. Although upcoming observations of the 21
cm power spectrum with next-generation radio interferometers such as the
Hydrogen Epoch of Reionization Array (HERA) and the Square Kilometre Array
(SKA) are expected to provide information about the midpoint of reionization
readily, extracting the duration from the power spectrum alone is a more
difficult proposition. As an alternative method for extracting information
about reionization, we present an application of convolutional neural networks
(CNNs) to images of reionization. These images are two-dimensional in the plane
of the sky and are extracted at a series of redshift values to generate "image
cubes" that are qualitatively similar to those that HERA and the SKA will
generate in the near future. Additionally, we include the impact that the
bright foreground signal from the Milky Way imparts on such image cubes
from interferometers, but do not include the noise induced from observations.
We show that we are able to recover the duration of reionization $\Delta z$ to
within 5% using CNNs, assuming that the midpoint of reionization is already
relatively well constrained. These results have exciting implications for estimating
$\tau$, the optical depth to the cosmic microwave background, which can help
constrain other cosmological parameters.
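To make "midpoint" and "duration" concrete, the sketch below uses a tanh reionization history in y = (1+z)^{3/2}, similar to the parameterization used by the CAMB Boltzmann code. Defining Delta z as the redshift interval between 25% and 75% ionization is an assumption made here for illustration. The abstract does not state which definition the paper uses, and the CNN itself is not reproduced.

```python
import math


def ionized_fraction(z: float, z_mid: float, delta_y: float) -> float:
    """Tanh history in y = (1+z)^1.5: x -> 0 at high z, x -> 1 at low z, x = 0.5 at z_mid."""
    y = (1.0 + z) ** 1.5
    y_mid = (1.0 + z_mid) ** 1.5
    return 0.5 * (1.0 + math.tanh((y_mid - y) / delta_y))


def redshift_at_fraction(x: float, z_mid: float, delta_y: float) -> float:
    """Invert ionized_fraction for 0 < x < 1."""
    y_mid = (1.0 + z_mid) ** 1.5
    y = y_mid - delta_y * math.atanh(2.0 * x - 1.0)
    return y ** (2.0 / 3.0) - 1.0


def duration(z_mid: float, delta_y: float, lo: float = 0.25, hi: float = 0.75) -> float:
    """Duration Delta z, assumed here to be z(x = lo) - z(x = hi)."""
    return redshift_at_fraction(lo, z_mid, delta_y) - redshift_at_fraction(hi, z_mid, delta_y)
```

In this toy model the midpoint and the width parameter are independent, which mirrors the setup in the abstract: with the midpoint held fixed, the remaining task is to infer the duration.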
[6]
oai:arXiv.org:1907.01676 [pdf] - 1915228
Astro2020 APC White Paper: The Early Career Perspective on the Coming
Decade, Astrophysics Career Paths, and the Decadal Survey Process
Moravec, Emily;
Czekala, Ian;
Follette, Kate;
Ahmed, Zeeshan;
Alpaslan, Mehmet;
Amon, Alexandra;
Armentrout, Will;
Arney, Giada;
Barron, Darcy;
Bellm, Eric;
Bender, Amy;
Bridge, Joanna;
Colon, Knicole;
Datta, Rahul;
DeRoo, Casey;
Feng, Wanda;
Florian, Michael;
Gabriel, Travis;
Hall, Kirsten;
Hamden, Erika;
Hathi, Nimish;
Hawkins, Keith;
Hoadley, Keri;
Jensen-Clem, Rebecca;
Kao, Melodie;
Kara, Erin;
Karkare, Kirit;
Kiessling, Alina;
Kimball, Amy;
Kirkpatrick, Allison;
La Plante, Paul;
Leisenring, Jarron;
Li, Miao;
Lomax, Jamie;
Lund, Michael B.;
McCleary, Jacqueline;
Mills, Elisabeth;
Montiel, Edward;
Nelson, Nicholas;
Nevin, Rebecca;
Norris, Ryan;
Ntampaka, Michelle;
O'Donnell, Christine;
Peretz, Eliad;
Malagon, Andres Plazas;
Prescod-Weinstein, Chanda;
Pullen, Anthony;
Rice, Jared;
Roettenbacher, Rachael;
Sanderson, Robyn;
Simon, Joseph;
Smith, Krista Lynne;
Stevenson, Kevin;
Veach, Todd;
Wetzel, Andrew;
Youngblood, Allison
Submitted: 2019-07-02, last modified: 2019-07-12
In response to the need for the Astro2020 Decadal Survey to explicitly engage
early career astronomers, the National Academies of Sciences, Engineering, and
Medicine hosted the Early Career Astronomer and Astrophysicist Focus Session
(ECFS) on October 8-9, 2018 under the auspices of the Committee on Astronomy and
Astrophysics. The meeting was attended by fifty-six pre-tenure faculty,
research scientists, postdoctoral scholars, and senior graduate students, as
well as eight former decadal survey committee members, who acted as
facilitators. The event was designed to educate early career astronomers about
the decadal survey process, to solicit their feedback on the role that early
career astronomers should play in Astro2020, and to provide a forum for the
discussion of a wide range of topics regarding the astrophysics career path.
This white paper presents highlights and themes that emerged during two days
of discussion. In Section 1, we discuss concerns that emerged regarding the
coming decade and the astrophysics career path, as well as specific
recommendations from participants regarding how to address them. We have
organized these concerns and suggestions into five broad themes. These include
(sequentially): (1) adequately training astronomers in the statistical and
computational techniques necessary in an era of "big data", (2) responses to
the growth of collaborations and telescopes, (3) concerns about the adequacy of
graduate and postdoctoral training, (4) the need for improvements in equity and
inclusion in astronomy, and (5) smoothing and facilitating transitions between
early career stages. Section 2 is focused on ideas regarding the decadal survey
itself, including: incorporating early career voices, ensuring diverse input
from a variety of stakeholders, and successfully and broadly disseminating the
results of the survey.
[7]
oai:arXiv.org:1810.07703 [pdf] - 1903130
A Deep Learning Approach to Galaxy Cluster X-ray Masses
Ntampaka, M.;
ZuHone, J.;
Eisenstein, D.;
Nagai, D.;
Vikhlinin, A.;
Hernquist, L.;
Marinacci, F.;
Nelson, D.;
Pakmor, R.;
Pillepich, A.;
Torrey, P.;
Vogelsberger, M.
Submitted: 2018-10-17, last modified: 2019-06-18
We present a machine-learning approach for estimating galaxy cluster masses
from Chandra mock images. We utilize a Convolutional Neural Network (CNN), a
deep machine learning tool commonly used in image recognition tasks. The CNN is
trained and tested on our sample of 7,896 Chandra X-ray mock observations,
which are based on 329 massive clusters from the IllustrisTNG simulation. Our
CNN learns from a low resolution spatial distribution of photon counts and does
not use spectral information. Despite our simplifying assumption to neglect
spectral information, the resulting mass values estimated by the CNN exhibit
small bias in comparison to the true masses of the simulated clusters (-0.02
dex) and reproduce the cluster masses with low intrinsic scatter, 8% in our
best fold and 12% averaging over all. In contrast, a more standard core-excised
luminosity method achieves 15-18% scatter. We interpret the results with an
approach inspired by Google DeepDream and find that the CNN ignores the central
regions of clusters, which are known to have high scatter with mass.
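The CNN's input is a low-resolution map of photon counts. As a minimal sketch of what one convolutional layer does to such a map, the code below applies a "valid" 2D cross-correlation (the operation deep-learning libraries call convolution) followed by a ReLU. The kernel values and the toy image are hypothetical. A real network learns many kernels and stacks them with pooling and dense layers, and this is not the paper's architecture.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """'Valid' 2D cross-correlation: slide the kernel without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map: Image) -> Image:
    """Element-wise max(0, x), the usual nonlinearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # Hypothetical 3x3 photon-count map with a bright center.
    counts = [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
    box = [[1.0, 1.0], [1.0, 1.0]]  # 2x2 summing kernel
    print(relu(conv2d_valid(counts, box)))
```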
[8]
oai:arXiv.org:1906.07729 [pdf] - 1938484
Cluster Cosmology with the Velocity Distribution Function of the HeCS-SZ
Sample
Submitted: 2019-06-18
We apply the Velocity Distribution Function (VDF) to a sample of
Sunyaev-Zel'dovich (SZ)-selected clusters, and we report preliminary
cosmological constraints in the $\sigma_8$-$\Omega_m$ cosmological parameter
space. The VDF is a forward-modeled test statistic that can be used to
constrain cosmological models directly from galaxy cluster dynamical
observations. The method was introduced in Ntampaka et al. (2017) and employs
line-of-sight velocity measurements to directly constrain cosmological
parameters; it is less sensitive to measurement error than a standard halo mass
function approach. The method is applied to the Hectospec Survey of
Sunyaev-Zeldovich-Selected Clusters (HeCS-SZ) sample, which is a spectroscopic
follow-up of a Planck-selected sample of 83 galaxy clusters. Credible regions
are calculated by comparing the VDF of the observed cluster sample to that of
mock observations, yielding $\mathcal{S}_8 \equiv \sigma_8
\left(\Omega_m/0.3\right)^{0.25} = 0.751\pm0.037$. These constraints are in
tension with the Planck Cosmic Microwave Background (CMB) TT fiducial value,
which lies outside of our 95% credible region, but are in agreement with some
recent analyses of large-scale structure that observe fewer massive clusters
than are predicted by the Planck fiducial cosmological parameters.
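The constraint is quoted in terms of $\mathcal{S}_8$. The helper below evaluates the definition given above, keeping the exponent as a parameter because other analyses commonly use 0.5 rather than 0.25. It also inverts the definition to trace the $\sigma_8$-$\Omega_m$ curve along which $\mathcal{S}_8$ stays fixed. The function names are illustrative.

```python
def s8(sigma8: float, omega_m: float, alpha: float = 0.25) -> float:
    """S_8 = sigma_8 * (Omega_m / 0.3)**alpha, with alpha = 0.25 as defined above."""
    return sigma8 * (omega_m / 0.3) ** alpha


def sigma8_along_degeneracy(s8_value: float, omega_m: float, alpha: float = 0.25) -> float:
    """sigma_8 that keeps S_8 fixed at the given Omega_m."""
    return s8_value / (omega_m / 0.3) ** alpha


if __name__ == "__main__":
    # Trace the degeneracy through the central value quoted in the abstract.
    for om in (0.25, 0.30, 0.35):
        print(om, round(sigma8_along_degeneracy(0.751, om), 3))
```

Lower $\Omega_m$ requires higher $\sigma_8$ to keep $\mathcal{S}_8$ fixed. This is why a single combined parameter captures the direction the cluster data constrain well.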
[9]
oai:arXiv.org:1903.06634 [pdf] - 1850834
Increasing the Discovery Space in Astrophysics - A Collation of Six
Submitted White Papers
Fabbiano, G.;
Elvis, M.;
Accomazzi, A.;
Berriman, G. B.;
Brickhouse, N.;
Bose, S.;
Carrera, D.;
Chilingarian, I.;
Civano, F.;
Czerny, B.;
D'Abrusco, R.;
Diemer, B.;
Drake, J.;
Meibody, R. Emami;
Farah, J. R.;
Fazio, G. G.;
Feigelson, E.;
Fornasini, F.;
Gallagher, Jay;
Grindlay, J.;
Hernquist, L.;
James, D. J.;
Karovska, M.;
Kashyap, V.;
Kim, D. -W.;
Lacy, G. M.;
Lazio, J.;
Lusso, E.;
Maksym, W. P.;
Galarza, R. Martinez;
Mazzarella, J.;
Ntampaka, M.;
Risaliti, G.;
Sanders, D.;
Scoville, N.;
Shapiro, I.;
Siemiginowska, A.;
Smith, A.;
Smith, S.;
Szentgyorgyi, A.;
Tacchella, S.;
Thakar, A.;
Tolls, V.;
Vrtilek, S.;
Wilkes, B.;
Wilner, D.;
Willner, S. P.;
Wolk, S. J.;
Zhao, J. -H.
Submitted: 2019-03-15, last modified: 2019-03-18
We write in response to the call from the 2020 Decadal Survey to submit white
papers illustrating the most pressing scientific questions in astrophysics for
the coming decade. We propose exploration as the central question for the
Decadal Committee's discussions. The history of astronomy shows that
paradigm-changing discoveries were not driven by well-formulated scientific
questions based on the knowledge of the time. They were instead the result of the
increase in discovery space fostered by new telescopes and instruments. An
additional tool for increasing the discovery space is provided by the analysis
and mining of the increasingly large amount of archival data available to
astronomers. Revolutionary observing facilities, and the state-of-the-art
astronomy archives needed to support these facilities, will open up the
universe to new discovery. Here we focus on exploration for compact objects and
multi-messenger science. This white paper includes science examples of the
power of the discovery approach, encompassing all the areas of astrophysics
covered by the 2020 Decadal Survey.
[10]
oai:arXiv.org:1903.06796 [pdf] - 1850859
Astro2020 Science White Paper: The Next Decade of Astroinformatics and
Astrostatistics
Siemiginowska, A.;
Eadie, G.;
Czekala, I.;
Feigelson, E.;
Ford, E. B.;
Kashyap, V.;
Kuhn, M.;
Loredo, T.;
Ntampaka, M.;
Stevens, A.;
Avelino, A.;
Borne, K.;
Budavari, T.;
Burkhart, B.;
Cisewski-Kehe, J.;
Civano, F.;
Chilingarian, I.;
van Dyk, D. A.;
Fabbiano, G.;
Finkbeiner, D. P.;
Foreman-Mackey, D.;
Freeman, P.;
Fruscione, A.;
Goodman, A. A.;
Graham, M.;
Guenther, H. M.;
Hakkila, J.;
Hernquist, L.;
Huppenkothen, D.;
James, D. J.;
Law, C.;
Lazio, J.;
Lee, T.;
López-Morales, M.;
Mahabal, A. A.;
Mandel, K.;
Meng, X. L.;
Moustakas, J.;
Muna, D.;
Peek, J. E. G.;
Richards, G.;
Portillo, S. K. N.;
Scargle, J.;
de Souza, R. S.;
Speagle, J. S.;
Stassun, K. G.;
Stenning, D. C.;
Taylor, S. R.;
Tremblay, G. R.;
Trimble, V.;
Yanamandra-Fisher, P. A.;
Young, C. A.
Submitted: 2019-03-15
Over the past century, major advances in astronomy and astrophysics have been
largely driven by improvements in instrumentation and data collection. With the
amassing of high quality data from new telescopes, and especially with the
advent of deep and large astronomical surveys, it is becoming clear that future
advances will also rely heavily on how those data are analyzed and interpreted.
New methodologies derived from advances in statistics, computer science, and
machine learning are beginning to be employed in sophisticated investigations
that are not only bringing forth new discoveries, but are placing them on a
solid footing. Progress in wide-field sky surveys, interferometric imaging,
precision cosmology, exoplanet detection and characterization, and many
subfields of stellar, Galactic and extragalactic astronomy, has resulted in
complex data analysis challenges that must be solved to perform scientific
inference. Research in astrostatistics and astroinformatics will be necessary
to develop the state-of-the-art methodology needed in astronomy. Overcoming
these challenges requires dedicated, interdisciplinary research. We recommend:
(1) increasing funding for interdisciplinary projects in astrostatistics and
astroinformatics; (2) dedicating space and time at conferences for
interdisciplinary research and promotion; (3) developing sustainable funding
for long-term astrostatistics appointments; and (4) funding infrastructure
development for data archives and archive support, state-of-the-art algorithms,
and efficient computing.
[11]
oai:arXiv.org:1902.10159 [pdf] - 1840221
The Role of Machine Learning in the Next Decade of Cosmology
Ntampaka, Michelle;
Avestruz, Camille;
Boada, Steven;
Caldeira, Joao;
Cisewski-Kehe, Jessi;
Di Stefano, Rosanne;
Dvorkin, Cora;
Evrard, August E.;
Farahi, Arya;
Finkbeiner, Doug;
Genel, Shy;
Goodman, Alyssa;
Goulding, Andy;
Ho, Shirley;
Kosowsky, Arthur;
La Plante, Paul;
Lanusse, Francois;
Lochner, Michelle;
Mandelbaum, Rachel;
Nagai, Daisuke;
Newman, Jeffrey A.;
Nord, Brian;
Peek, J. E. G.;
Peel, Austin;
Poczos, Barnabas;
Rau, Markus Michael;
Siemiginowska, Aneta;
Sutherland, Dougal J.;
Trac, Hy;
Wandelt, Benjamin
Submitted: 2019-02-26
In recent years, machine learning (ML) methods have remarkably improved how
cosmologists can interpret data. The next decade will bring new opportunities
for data-driven cosmological discovery, but will also present new challenges
for adopting ML methodologies and understanding the results. ML could transform
our field, but this transformation will require the astronomy community to both
foster and promote interdisciplinary research endeavors.
[12]
oai:arXiv.org:1902.05950 [pdf] - 2025399
A Robust and Efficient Deep Learning Method for Dynamical Mass
Measurements of Galaxy Clusters
Submitted: 2019-02-15
We demonstrate the ability of Convolutional Neural Networks (CNNs) to
mitigate systematics in the virial scaling relation and produce dynamical mass
estimates of galaxy clusters with remarkably low bias and scatter. We present
two models, CNN$_\text{1D}$ and CNN$_\text{2D}$, which leverage this deep
learning tool to infer cluster masses from distributions of member galaxy
dynamics. Our first model, CNN$_\text{1D}$, infers cluster mass directly from
the distribution of member galaxy line-of-sight velocities. Our second model,
CNN$_\text{2D}$, extends the input space of CNN$_\text{1D}$ to learn on the
joint distribution of galaxy line-of-sight velocities and projected radial
distances. We train each model as a regression over cluster mass using a
labeled catalog of realistic mock cluster observations generated from the
MultiDark simulation and UniverseMachine catalog. We then evaluate the
performance of each model on an independent set of mock observations selected
from the same simulated catalog. The CNN models produce cluster mass
predictions with log-normal residuals of scatter as low as $0.127$ dex, a
factor of three improvement over the classical M-$\sigma$ power law estimator.
Furthermore, the CNN model reduces prediction scatter relative to similar
machine learning approaches by up to $20\%$ while executing in drastically
shorter training and evaluation times (by a factor of 30) and producing
considerably more robust mass predictions (improving prediction stability under
variations in galaxy sampling rate by $53\%$).
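CNN$_\text{2D}$ learns on the joint distribution of line-of-sight velocity and projected radius. One simple way to turn a cluster's member galaxies into such an input is a normalized 2D histogram, sketched below. The bin counts, ranges, and function names are hypothetical, and the abstract does not describe the paper's exact input construction. Summing over the radius axis gives a 1D velocity distribution of the kind CNN$_\text{1D}$ takes as input.

```python
from typing import List, Sequence


def joint_histogram(v_los: Sequence[float], r_proj: Sequence[float],
                    v_max: float, r_max: float, n_v: int, n_r: int) -> List[List[float]]:
    """Normalized 2D histogram over v in [-v_max, v_max) and R in [0, r_max).

    Rows are velocity bins and columns are radius bins; galaxies outside the
    ranges are dropped and the remaining counts are normalized to sum to 1.
    """
    grid = [[0.0] * n_r for _ in range(n_v)]
    kept = 0
    for v, r in zip(v_los, r_proj):
        if -v_max <= v < v_max and 0.0 <= r < r_max:
            # min() guards against float rounding pushing an edge value out of range.
            i = min(int((v + v_max) / (2.0 * v_max) * n_v), n_v - 1)
            j = min(int(r / r_max * n_r), n_r - 1)
            grid[i][j] += 1.0
            kept += 1
    if kept:
        grid = [[c / kept for c in row] for row in grid]
    return grid


def velocity_only(grid: List[List[float]]) -> List[float]:
    """Marginalize over radius to get a 1D velocity histogram."""
    return [sum(row) for row in grid]
```

Normalizing each cluster's histogram means that the shape of the distribution carries the mass information, not the raw number of observed members. This is one plausible way to reduce sensitivity to the galaxy sampling rate.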
[13]
oai:arXiv.org:1509.05409 [pdf] - 1510196
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using
Machine Learning
Submitted: 2015-09-17, last modified: 2016-10-25
We study dynamical mass measurements of galaxy clusters contaminated by
interlopers and show that a modern machine learning (ML) algorithm can predict
masses with better than a factor-of-two improvement over a standard scaling relation
approach. We create two mock catalogs from Multidark's publicly available
$N$-body MDPL1 simulation, one with perfect galaxy cluster membership
information and the other where a simple cylindrical cut around the cluster
center allows interlopers to contaminate the clusters. In the standard
approach, we use a power-law scaling relation to infer cluster mass from galaxy
line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge,
this unrealistic case produces a wide fractional mass error distribution, with
a width of $\Delta\epsilon\approx0.87$. Interlopers introduce additional
scatter, significantly widening the error distribution further
($\Delta\epsilon\approx2.13$). We employ the support distribution machine (SDM)
class of algorithms to learn from distributions of data to predict single
values. Applied to distributions of galaxy observables such as LOS velocity and
projected distance from the cluster center, SDM yields better than a
factor-of-two improvement ($\Delta\epsilon\approx0.67$) for the contaminated
case. Remarkably, SDM applied to contaminated clusters is better able to
recover masses than even the scaling relation approach applied to
uncontaminated clusters. We show that the SDM method more accurately reproduces
the cluster mass function, making it a valuable tool for employing cluster
observations to evaluate cosmological models.
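SDM predicts a single value (here, mass) from a whole sample of galaxy observables. SDM proper builds kernels from nonparametric estimates of divergences between distributions. The sketch below swaps in a simpler, plainly different technique to convey the idea: it measures the distance between two velocity samples with a Gaussian-kernel maximum mean discrepancy (MMD), then predicts the target as a Nadaraya-Watson weighted average over training clusters. The bandwidth, scale, toy velocities, and log-mass targets are all hypothetical.

```python
import math
import random
from typing import Sequence


def mean_kernel(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Average Gaussian kernel over all pairs (x, y)."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def predict(test_sample, train_samples, train_targets, bandwidth, scale):
    """Weight each training distribution by exp(-MMD^2 / scale) and average its target."""
    weights = [math.exp(-mmd_squared(test_sample, s, bandwidth) / scale)
               for s in train_samples]
    return sum(w * t for w, t in zip(weights, train_targets)) / sum(weights)


if __name__ == "__main__":
    rng = random.Random(0)
    narrow = [rng.gauss(0.0, 300.0) for _ in range(50)]   # low-dispersion toy cluster
    wide = [rng.gauss(0.0, 1000.0) for _ in range(50)]    # high-dispersion toy cluster
    test = [rng.gauss(0.0, 300.0) for _ in range(50)]
    print(predict(test, [narrow, wide], [14.0, 15.0], bandwidth=300.0, scale=0.05))
```

Because the whole sample is compared, not just its second moment, interlopers that distort the tails can in principle be down-weighted. That is the motivation the abstract gives for learning from full distributions.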
[14]
oai:arXiv.org:1602.01837 [pdf] - 1530462
The Velocity Distribution Function of Galaxy Clusters as a Cosmological
Probe
Submitted: 2016-02-04, last modified: 2016-10-20
We present a new approach for quantifying the abundance of galaxy clusters
and constraining cosmological parameters using dynamical measurements. In the
standard method, galaxy line-of-sight (LOS) velocities, $v$, or velocity
dispersions are used to infer cluster masses, $M$, in order to quantify the
halo mass function (HMF), $dn(M)/d\log(M)$, which is strongly affected by mass
measurement errors. In our new method, the probability distributions of
velocities for the clusters in the sample are summed to create a new statistic
called the velocity distribution function (VDF), $dn(v)/dv$. The VDF can be
measured more directly and precisely than the HMF and it can also be robustly
predicted with cosmological simulations which capture the dynamics of subhalos
or galaxies. We apply these two methods to mock cluster catalogs and forecast
the bias and constraints on the matter density parameter $\Omega_m$ and the
amplitude of matter fluctuations $\sigma_8$ in flat $\Lambda$CDM cosmologies.
For an example observation of 200 massive clusters, the VDF with (without)
velocity errors constrains the parameter combination $\sigma_8\Omega_m^{0.29}$
to $0.587 \pm 0.011$ ($0.583 \pm 0.011$) and shows only minor bias.
However, the HMF with dynamical mass errors is biased to low $\Omega_m$ and
high $\sigma_8$ and the fiducial model lies well outside of the forecast
constraints, prior to accounting for Eddington bias. When the VDF is combined
with constraints from the cosmic microwave background (CMB), the degeneracy
between cosmological parameters can be significantly reduced. Upcoming
spectroscopic surveys that probe larger volumes and fainter magnitudes will
provide a larger number of clusters for applying the VDF as a cosmological
probe.
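Operationally, the VDF is a sum of per-cluster velocity distributions divided by the survey volume. The sketch below bins each cluster's galaxy speeds |v| into a histogram normalized to unit area, sums the histograms, and divides by volume and bin width to give dn/dv. Folding to |v|, the binning, and the use of raw histograms rather than smoothed PDFs are assumptions for illustration.

```python
from typing import List, Sequence


def velocity_distribution_function(clusters: Sequence[Sequence[float]], v_max: float,
                                   n_bins: int, volume: float) -> List[float]:
    """dn/dv per bin from per-cluster LOS velocities (hypothetical binning).

    Each cluster contributes unit probability spread over the bins its galaxies
    fall in, so sum(dn/dv) * bin_width = (clusters with members in range) / volume.
    """
    width = v_max / n_bins
    vdf = [0.0] * n_bins
    for velocities in clusters:
        speeds = [abs(v) for v in velocities if abs(v) < v_max]
        if not speeds:
            continue
        for s in speeds:
            k = min(int(s / width), n_bins - 1)
            vdf[k] += 1.0 / len(speeds)
    return [c / (volume * width) for c in vdf]
```

Unlike a mass function, nothing here requires converting velocities to masses. Measurement errors enter only through the velocities themselves, which is the property the abstract highlights.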
[15]
oai:arXiv.org:1410.0686 [pdf] - 1222366
A Machine Learning Approach for Dynamical Mass Measurements of Galaxy
Clusters
Submitted: 2014-10-02, last modified: 2015-04-27
We present a modern machine learning approach for cluster dynamical mass
measurements that is a factor of two improvement over using a conventional
scaling relation. Different methods are tested against a mock cluster catalog
constructed using halos with mass >= 10^14 Msolar/h from Multidark's
publicly-available N-body MDPL halo catalog. In the conventional method, we use
a standard M(sigma_v) power law scaling relation to infer cluster mass, M, from
line-of-sight (LOS) galaxy velocity dispersion, sigma_v. The resulting
fractional mass error distribution is broad, with width=0.87 (68% scatter), and
has extended high-error tails. The standard scaling relation can be simply
enhanced by including higher-order moments of the LOS velocity distribution.
Applying the kurtosis as a correction term to log(sigma_v) reduces the width of
the error distribution to 0.74 (16% improvement). Machine learning can be used
to take full advantage of all the information in the velocity distribution. We
employ the Support Distribution Machines (SDMs) algorithm that learns from
distributions of data to predict single values. SDMs trained and tested on the
distribution of LOS velocities yield width=0.46 (47% improvement). Furthermore,
the problematic tails of the mass error distribution are effectively
eliminated. Decreasing cluster mass errors will improve measurements of the
growth of structure and lead to tighter constraints on cosmological parameters.
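The conventional estimator uses sigma_v, and the kurtosis correction uses a higher-order moment of the same LOS velocities. The helpers below compute both, along with a fractional mass error assumed here to be epsilon = (M_pred - M_true) / M_true. Reading the quoted "width (68% scatter)" as the 16th-to-84th percentile span of the error distribution is also an assumption. The abstract does not give the exact form of the kurtosis correction, so none is attempted here.

```python
import math
from typing import Sequence, Tuple


def dispersion_and_kurtosis(v_los: Sequence[float]) -> Tuple[float, float]:
    """Population velocity dispersion sigma_v and excess kurtosis of LOS velocities."""
    n = len(v_los)
    mean = sum(v_los) / n
    m2 = sum((v - mean) ** 2 for v in v_los) / n
    m4 = sum((v - mean) ** 4 for v in v_los) / n
    return math.sqrt(m2), m4 / m2 ** 2 - 3.0


def fractional_mass_error(m_pred: float, m_true: float) -> float:
    """epsilon = (M_pred - M_true) / M_true (assumed definition)."""
    return (m_pred - m_true) / m_true


def central_68_width(errors: Sequence[float]) -> float:
    """Span from the 16th to the 84th percentile, with linear interpolation."""
    s = sorted(errors)

    def percentile(p: float) -> float:
        idx = p * (len(s) - 1)
        lo = int(math.floor(idx))
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    return percentile(0.84) - percentile(0.16)
```

A percentile-based width is insensitive to the extended high-error tails that the abstract mentions, so it summarizes the core of the error distribution. Tail behavior has to be examined separately.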
[16]
oai:arXiv.org:1303.1055 [pdf] - 1165014
A First Look at creating mock catalogs with machine learning techniques
Submitted: 2013-03-05
We investigate machine learning (ML) techniques for predicting the number of
galaxies (N_gal) that occupy a halo, given the halo's properties. These types
of mappings are crucial for constructing the mock galaxy catalogs necessary for
analyses of large-scale structure. The ML techniques proposed here distinguish
themselves from traditional halo occupation distribution (HOD) modeling as they
do not assume a prescribed relationship between halo properties and N_gal. In
addition, our ML approaches are only dependent on parent halo properties (like
HOD methods), which are advantageous over subhalo-based approaches as
identifying subhalos correctly is difficult. We test 2 algorithms: support
vector machines (SVM) and k-nearest-neighbour (kNN) regression. We take
galaxies and halos from the Millennium simulation and predict N_gal by training
our algorithms on the following 6 halo properties: number of particles, M_200,
\sigma_v, v_max, half-mass radius and spin. For Millennium, our predicted N_gal
values have a mean-squared-error (MSE) of ~0.16 for both SVM and kNN. Our
predictions match the overall distribution of halos reasonably well and the
galaxy correlation function at large scales to ~5-10%. In addition, we
demonstrate a feature selection algorithm to isolate the halo parameters that
are most predictive, a useful technique for understanding the mapping between
halo properties and N_gal. Lastly, we investigate these ML-based approaches in
making mock catalogs for different galaxy subpopulations (e.g. blue, red, high
M_star, low M_star). Given its non-parametric nature as well as its powerful
predictive and feature selection capabilities, machine learning offers an
interesting alternative for creating mock catalogs.
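kNN regression predicts N_gal for a halo by averaging the values of its nearest neighbours in halo-property space. A minimal stdlib sketch follows. Features are standardized first because properties such as M_200 and spin differ by orders of magnitude. The toy feature vectors, the Euclidean distance, and the value of k are assumptions, not the paper's configuration; a library equivalent is scikit-learn's `KNeighborsRegressor`.

```python
import math
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Rescale each feature column to zero mean and unit variance (constant columns keep std 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int) -> float:
    """Mean target of the k training halos closest to `query` (Euclidean distance)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Average squared difference between predicted and true N_gal."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
```

A query halo must be rescaled with the same `means` and `stds` returned by `standardize` before it is passed to `knn_predict`; otherwise the distances are measured in inconsistent units.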