Normalized to: Suberlak, K.
[1]
oai:arXiv.org:1905.09034 [pdf] - 1912866
AXS: A framework for fast astronomical data processing based on Apache
Spark
Submitted: 2019-05-22, last modified: 2019-05-24
We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source
astronomical data analysis framework built on Apache Spark, a widely used
industry-standard engine for big data processing. Building on capabilities
present in Spark, AXS aims to enable querying and analyzing almost arbitrarily
large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame
APIs, and SQL statements. We achieve this by i) adding support to Spark for
efficient on-line positional cross-matching and ii) supplying a Python library
supporting commonly-used operations for astronomical data analysis. To support
scalable cross-matching, we developed a variant of the ZONES algorithm (Gray et
al. 2004) capable of operating in a distributed, shared-nothing architecture. We
couple this to a data partitioning scheme that enables fast catalog
cross-matching and handles the data skew often present in deep all-sky data
sets. The cross-match and other often-used functionalities are exposed to the
end users through an easy-to-use Python API. We demonstrate AXS' technical and
scientific performance on SDSS, ZTF, Gaia DR2, and AllWise catalogs. Using AXS
we were able to perform an on-the-fly cross-match of Gaia DR2 (1.8 billion rows)
and AllWise (900 million rows) data sets in ~30 seconds. We discuss how
cloud-ready distributed systems like AXS provide a natural way to enable
comprehensive end-user analyses of large datasets such as LSST.
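
The zone-based matching idea mentioned in the abstract can be illustrated with a small single-machine sketch. This is not AXS code and does not use Spark: the function names (`zone_of`, `ra_half_width_deg`, `angular_sep_deg`, `crossmatch`) and the default zone height are assumptions made for illustration only. The sketch buckets one catalog into declination strips ("zones"), then for each source in the other catalog inspects only the zones that overlap the search radius, applies an RA window as a cheap prefilter, and confirms candidates with an exact angular-separation test.

```python
import math
from collections import defaultdict


def zone_of(dec_deg, zone_height_deg):
    """Index of the declination strip (zone) that contains dec_deg."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def ra_half_width_deg(dec_deg, radius_deg):
    """Largest RA offset reachable within radius_deg of a point at dec_deg."""
    s = math.sin(math.radians(radius_deg))
    c = math.cos(math.radians(dec_deg))
    if c <= s:
        # The search circle touches or contains a pole: every RA is possible.
        return 180.0
    return math.degrees(math.asin(s / c))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sd = math.sin((dec2 - dec1) / 2.0)
    sr = math.sin((ra2 - ra1) / 2.0)
    a = sd * sd + math.cos(dec1) * math.cos(dec2) * sr * sr
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def crossmatch(cat_a, cat_b, radius_deg, zone_height_deg=None):
    """Return (i, j) index pairs with separation <= radius_deg.

    cat_a and cat_b are sequences of (ra_deg, dec_deg) tuples.
    """
    if zone_height_deg is None:
        # Assumption for this sketch: one zone is as tall as the match radius.
        zone_height_deg = radius_deg

    # Bucket catalog B by zone so only nearby strips are ever scanned.
    zones = defaultdict(list)
    for j, (ra, dec) in enumerate(cat_b):
        zones[zone_of(dec, zone_height_deg)].append((ra, dec, j))

    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        z_lo = zone_of(max(dec - radius_deg, -90.0), zone_height_deg)
        z_hi = zone_of(min(dec + radius_deg, 90.0), zone_height_deg)
        ra_window = ra_half_width_deg(dec, radius_deg) + 1e-9
        for z in range(z_lo, z_hi + 1):
            for ra_b, dec_b, j in zones.get(z, ()):
                # RA prefilter that handles the 0/360 degree wrap-around.
                dra = abs(ra - ra_b) % 360.0
                dra = min(dra, 360.0 - dra)
                if dra > ra_window:
                    continue
                if angular_sep_deg(ra, dec, ra_b, dec_b) <= radius_deg:
                    matches.append((i, j))
    return matches
```

In a distributed setting the zone index would typically serve as a partitioning or join key, so that each worker only compares sources from the same or neighbouring zones; how AXS does this in detail is described in the paper, not here.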
[2]
oai:arXiv.org:1712.01848 [pdf] - 1599795
Solving the puzzle of discrepant quasar variability on monthly
time-scales implied by SDSS and CRTS data sets
Submitted: 2017-12-05
We present an improved photometric error analysis for the 7,100 CRTS
(Catalina Real-Time Transient Survey) optical light curves for quasars from the
SDSS (Sloan Digital Sky Survey) Stripe 82 catalogue. The SDSS imaging survey
has provided a time-resolved photometric data set which greatly improved our
understanding of the quasar optical continuum variability: data for monthly and
longer time-scales are consistent with a damped random walk (DRW). Recently,
newer data obtained by CRTS provided puzzling evidence for enhanced
variability, compared to SDSS results, on monthly time-scales. Quantitatively,
SDSS results predict about 0.06 mag root-mean-square (rms) variability for
monthly time-scales, while CRTS data show about a factor of 2 larger rms, for
spectroscopically confirmed SDSS quasars. Our analysis resolves this
discrepancy, attributing it to slightly underestimated photometric
uncertainties from the CRTS image processing pipelines. As a result, the
correction for observational noise is too small and the implied quasar
variability is too large. The CRTS photometric error correction factors,
derived from detailed analysis of non-variable SDSS standard stars that were
re-observed by CRTS, are about 20-30%, and reconcile the quasar
variability behaviour implied by the CRTS data with earlier SDSS results. An
additional analysis based on independent light curve data for the same objects
obtained by the Palomar Transient Factory provides further support for this
conclusion. In summary, the quasar variability constraints on weekly and
monthly time-scales from SDSS, CRTS and PTF surveys are mutually compatible, as
well as consistent with the DRW model.
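
The reasoning in the abstract (underestimated photometric errors lead to an undersized noise correction and hence inflated variability) can be illustrated with a short sketch. It assumes the common quadrature correction, in which the intrinsic rms is the observed rms with the photometric error subtracted in quadrature; the paper's actual estimator may differ. The function name `intrinsic_rms`, the `err_scale` parameter, and all numbers below are hypothetical and are not taken from the paper.

```python
import math


def intrinsic_rms(observed_rms, reported_err, err_scale=1.0):
    """Noise-corrected rms variability in magnitudes.

    observed_rms : rms scatter of the measured magnitudes
    reported_err : typical photometric error reported by the pipeline
    err_scale    : multiplicative correction applied to the reported error
                   (1.0 means the reported errors are trusted as-is)
    """
    corrected_err = err_scale * reported_err
    variance = observed_rms ** 2 - corrected_err ** 2
    # Clip at zero: the scatter is then fully explained by noise.
    return math.sqrt(max(variance, 0.0))


# Made-up illustrative values, not measurements from any survey.
observed = 0.20   # mag
reported = 0.15   # mag

uncorrected = intrinsic_rms(observed, reported)                # errors as reported
corrected = intrinsic_rms(observed, reported, err_scale=1.25)  # errors enlarged by 25%

print(f"implied variability, reported errors: {uncorrected:.3f} mag")
print(f"implied variability, scaled errors:   {corrected:.3f} mag")
```

Because the error enters squared, a modest upward correction to the errors can reduce the implied variability substantially whenever the reported error is comparable to the observed scatter.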