Normalized to: Di, S.
[1]
oai:arXiv.org:1707.09320 [pdf] - 1586520
Z-checker: A Framework for Assessing Lossy Compression of Scientific
Data
Submitted: 2017-06-12, last modified: 2017-11-10
Because of the vast volume of data being produced by today's scientific
simulations and experiments, lossy data compressors allowing user-controlled
loss of accuracy during compression are a relevant solution for
significantly reducing the data size. However, lossy compressor developers and
users are missing a tool to explore the features of scientific datasets and
understand the data alteration after compression in a systematic and reliable
way. To address this gap, we have designed and implemented a generic framework
called Z-checker. On the one hand, Z-checker combines a battery of data
analysis components for data compression. On the other hand, Z-checker is
implemented as an open-source community tool to which users and developers can
contribute and add new analysis components based on their additional analysis
demands. In this paper, we present a survey of existing lossy compressors. Then
we describe the design framework of Z-checker, in which we integrated
evaluation metrics proposed in prior work as well as other analysis tools.
Specifically, for lossy compressor developers, Z-checker can be used to
characterize critical properties of any dataset to improve compression
strategies. For lossy compression users, Z-checker can assess the compression
quality, provide various global distortion analyses comparing the original data
with the decompressed data, and perform statistical analysis of the compression error.
Z-checker can perform the analysis with either coarse granularity or fine
granularity, such that the users and developers can select the best-fit,
adaptive compressors for different parts of the dataset. Z-checker features a
visualization interface displaying all analysis results in addition to some
basic views of the datasets such as time series. To the best of our knowledge,
Z-checker is the first tool designed to assess lossy compression
comprehensively for scientific datasets.
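
As an illustration of the kind of global distortion analysis described above, the following is a minimal sketch that compares original and decompressed values. It does not use Z-checker's API: the function name `distortion_metrics` and the choice of metrics (maximum absolute error, RMSE, and a value-range-based PSNR) are assumptions made for this example, not details taken from the abstract.

```python
import math


def distortion_metrics(original, decompressed):
    """Compare original and decompressed values point by point.

    Hypothetical helper for illustration only; not part of Z-checker.
    """
    if len(original) != len(decompressed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")

    # Pointwise compression error.
    errors = [d - o for o, d in zip(original, decompressed)]

    max_abs_error = max(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)

    # PSNR here is defined relative to the value range of the original data.
    value_range = max(original) - min(original)
    if value_range == 0:
        psnr = float("nan")   # undefined for constant data
    elif mse == 0:
        psnr = float("inf")   # no distortion at all
    else:
        psnr = 20 * math.log10(value_range / rmse)

    return {"max_abs_error": max_abs_error, "rmse": rmse, "psnr": psnr}


if __name__ == "__main__":
    original = [0.0, 1.0, 2.0, 3.0, 4.0]
    decompressed = [0.0, 1.0, 2.0, 3.0, 4.5]
    print(distortion_metrics(original, decompressed))
```

Applying the same function to subsets of a dataset, rather than to the whole array, would correspond to a finer-granularity analysis.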
[2]
oai:arXiv.org:1707.08205 [pdf] - 1586389
Exploration of Pattern-Matching Techniques for Lossy Compression on
Cosmology Simulation Data Sets
Submitted: 2017-06-17, last modified: 2017-08-06
Because of the vast volume of data being produced by today's scientific
simulations, lossy compression allowing user-controlled information loss can
significantly reduce the data size and the I/O burden. However, for large-scale
cosmology simulation, such as the Hardware/Hybrid Accelerated Cosmology Code
(HACC), where memory overhead constraints restrict compression to only one
snapshot at a time, the lossy compression ratio is extremely limited because of
the fairly low spatial coherence and high irregularity of the data. In this
work, we propose a pattern-matching (similarity searching) technique to
optimize the prediction accuracy and compression ratio of the SZ lossy compressor
on the HACC data sets. We evaluate our proposed method with different
configurations and compare it with state-of-the-art lossy compressors.
Experiments show that our proposed optimization approach can improve the
prediction accuracy and reduce the compressed size of quantization codes
compared with SZ. We present several lessons useful for future research
involving pattern-matching techniques for lossy compression.
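
To make the link between prediction accuracy and quantization codes concrete, the following is a simplified sketch of prediction-based, error-bounded quantization on a 1D sequence. It is not the SZ implementation and does not include the pattern-matching technique proposed in the paper: the previous-value predictor, the function names `compress_1d` and `decompress_1d`, and the bin width of twice the error bound are assumptions chosen for illustration.

```python
def compress_1d(values, error_bound):
    """Quantize prediction residuals so each value is kept within error_bound.

    Illustrative only: the predictor is simply the previously reconstructed
    value (an assumption, not the predictor used by any specific compressor).
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    codes = []
    prev = 0.0  # last reconstructed value, shared with the decompressor
    for v in values:
        pred = prev
        # Integer code: residual measured in bins of width 2 * error_bound.
        code = round((v - pred) / (2 * error_bound))
        codes.append(code)
        # Track the reconstructed value so both sides predict identically.
        prev = pred + code * 2 * error_bound
    return codes


def decompress_1d(codes, error_bound):
    """Rebuild values from quantization codes using the same predictor."""
    out = []
    prev = 0.0
    for code in codes:
        prev = prev + code * 2 * error_bound
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [1.0, 1.1, 1.25, 1.2, 5.0]
    eb = 0.1
    codes = compress_1d(data, eb)
    restored = decompress_1d(codes, eb)
    print(codes)
    print(max(abs(a - b) for a, b in zip(data, restored)))
```

In a scheme of this kind, a more accurate predictor yields codes clustered closer to zero, which a subsequent entropy coder can generally store in fewer bits. This is the general reason why better prediction accuracy can reduce the compressed size of the quantization codes.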