Normalized to: Stolorz, P.
[1]
oai:arXiv.org:astro-ph/0208246 [pdf] - 51062
Challenges for Cluster Analysis in a Virtual Observatory
Submitted: 2002-08-12
There has been an unprecedented and continuing growth in the volume, quality,
and complexity of astronomical data sets over the past few years, mainly
through large digital sky surveys. The Virtual Observatory (VO) concept represents
a scientific and technological framework needed to cope with this data flood.
We review some of the applied statistics and computing challenges posed by the
analysis of the large and complex data sets expected in VO-based research. The
challenges are driven by the size and complexity of the data sets
(billions of data vectors in parameter spaces of tens or hundreds of
dimensions), by the heterogeneity of the data and measurement errors, by
selection effects and censored data, and by the intrinsic clustering properties
(functional form, topology) of the data distribution in the parameter space of
observed attributes. Examples of scientific questions one may wish to address
include: objective determination of the numbers of object classes present in
the data, and the membership probabilities for each source; searches for
unusual, rare, or even new types of objects and phenomena; discovery of
physically interesting multivariate correlations which may be present in some
of the clusters; etc.
[2]
oai:arXiv.org:astro-ph/0108346 [pdf] - 44322
Exploration of Parameter Spaces in a Virtual Observatory
Submitted: 2001-08-21
Like every other field of intellectual endeavor, astronomy is being
revolutionised by the advances in information technology. There is an ongoing
exponential growth in the volume, quality, and complexity of astronomical data
sets, mainly through large digital sky surveys and archives. The Virtual
Observatory (VO) concept represents a scientific and technological framework
needed to cope with this data flood. Systematic exploration of the observable
parameter spaces, covered by large digital sky surveys spanning a range of
wavelengths, will be one of the primary modes of research with a VO. This is
where truly new discoveries will be made and new insights gained about
already known astronomical objects and phenomena. We review some of the
methodological challenges posed by the analysis of large and complex data sets
expected in VO-based research. The challenges are driven by the size
and complexity of the data sets (billions of data vectors in parameter
spaces of tens or hundreds of dimensions), by the heterogeneity of the data and
measurement errors, including differences in basic survey parameters for the
federated data sets (e.g., positional accuracy and resolution,
wavelength coverage, and time baseline), by various selection effects, and
by the intrinsic clustering properties (functional form, topology) of the data
distributions in the parameter spaces of observed attributes. Answering these
challenges will require substantial collaborative efforts and partnerships
between astronomers, computer scientists, and statisticians.
[3]
oai:arXiv.org:astro-ph/0012489 [pdf] - 40078
Exploration of Large Digital Sky Surveys
Djorgovski, S. G.;
Brunner, R. J.;
Mahabal, A. A.;
Odewahn, S. C.;
de Carvalho, R. R.;
Gal, R. R.;
Stolorz, P.;
Granat, R.;
Curkendall, D.;
Jacob, J.;
Castro, S.
Submitted: 2000-12-22
We review some of the scientific opportunities and technical challenges posed
by the exploration of the large digital sky surveys, in the context of a
Virtual Observatory (VO). The VO paradigm will profoundly change the way
observational astronomy is done. Clustering analysis techniques can be used to
discover samples of rare, unusual, or even previously unknown types of
astronomical objects and phenomena. Exploration of the previously poorly probed
portions of the observable parameter space is especially promising. We
illustrate some of the possible types of studies with examples drawn from
DPOSS; much more complex and interesting applications are forthcoming.
Development of the new tools needed for an efficient exploration of these vast
data sets requires a synergy between astronomy and information sciences, with
great potential returns for both fields.
[4]
oai:arXiv.org:astro-ph/9708218 [pdf] - 1235082
Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results
Submitted: 1997-08-23
The analysis and efficient scientific exploration of the Digital Palomar
Observatory Sky Survey (DPOSS) represent a major technical challenge. The
input data set consists of 3 Terabytes of pixel information, and contains a few
billion sources. We describe some of the specific scientific problems posed by
the data, including searches for distant quasars and clusters of galaxies, and
the data-mining techniques we are exploring in addressing them.
Machine-assisted discovery methods may become essential for the analysis of
such multi-Terabyte data sets. New and future approaches involve unsupervised
classification and clustering analysis in the Giga-object data space, including
various Bayesian techniques. In addition to the searches for known types of
objects in this database, these techniques may also offer the possibility of
discovering previously unknown, rare types of astronomical objects.