Normalized to: Shigarov, A.
[1]
oai:arXiv.org:1908.01554 [pdf] - 1928286
Data Aggregation In The Astroparticle Physics Distributed Data Storage
Submitted: 2019-08-05
The German-Russian Astroparticle Data Life Cycle Initiative is an international
project whose aim is to develop a distributed data storage system that
aggregates data from the storage systems of different astroparticle
experiments. A prototype of such a system, called the Astroparticle
Physics Distributed Data Storage (APPDS), is under development. In this
paper, the Data Aggregation Service, one of the core services of APPDS, is
presented. The Data Aggregation Service connects all distributed services of
APPDS together to find the necessary data and deliver them to users on demand.
[2]
oai:arXiv.org:1907.13303 [pdf] - 1926539
German-Russian Astroparticle Data Life Cycle Initiative
Haungs, Andreas;
Bychkov, Igor;
Dubenskaya, Julia;
Fedorov, Oleg;
Heiss, Andreas;
Kang, Donghwa;
Kazarina, Yulia;
Korosteleva, Elena;
Kostunin, Dmitriy;
Kryukov, Alexander;
Mikhailov, Andrey;
Nguyen, Minh-Duc;
Polgart, Frank;
Polyakov, Stanislav;
Postnikov, Evgeny;
Shigarov, Alexey;
Shipilov, Dmitry;
Streit, Achim;
Tokareva, Victoria;
Wochele, Doris;
Wochele, Jürgen;
Zhurov, Dmitry
Submitted: 2019-07-31
A data life cycle (DLC) is a high-level data processing pipeline that
involves data acquisition, event reconstruction, data analysis, publication,
archiving, and sharing. For astroparticle physics a DLC is particularly
important due to the geographical and content diversity of the research field.
A dedicated, experiment-spanning analysis and data centre would ensure that
multi-messenger analyses can be carried out using state-of-the-art methods. The
German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI) is a joint
project of the KASCADE-Grande and TAIGA collaborations, aimed at developing a
concept and creating a DLC prototype that takes into account the data
processing features specific to the research field. An open science system
based on the KASCADE Cosmic Ray Data Centre (KCDC), which is a web-based
platform to provide the astroparticle physics data for the general public, must
also include effective methods for distributed data storage algorithms and
techniques to allow the community to perform simulations and analyses with
sophisticated machine learning methods. The aim is to achieve more efficient
analyses of the data collected in different, globally dispersed observatories,
as well as modern education of Big Data scientists in the synergy between
basic research and the information society. The contribution covers the status
and future plans of the initiative.
[3]
oai:arXiv.org:1907.06863 [pdf] - 1917095
Distributed data storage for modern astroparticle physics experiments
Submitted: 2019-07-16
The German-Russian Astroparticle Data Life Cycle Initiative is an
international project launched in 2018. The Initiative aims to develop
technologies that provide a unified approach to data management, as well as to
demonstrate their applicability using two large astrophysical experiments,
KASCADE and TAIGA, as examples. One of the key points of the project is the
development of a distributed storage, which, on the one hand, will allow data
from several experiments to be combined into a single repository with a unified
interface, and on the other hand, will provide data to all participants of
experimental groups for multi-messenger analysis. Our approach to storage
design is based on the single write-multiple read (SWMR) model for accessing
raw or centrally processed data for further analysis. The main feature of the
distributed storage is the ability to extract data either as a collection of
files or as aggregated events from different sources. In the latter case, the
storage provides users with a special service that aggregates data from
different storages into a single sample. Thanks to this feature,
multi-messenger methods used for more sophisticated data exploration can be
applied. Users can access the storage through both a web interface and an
application programming interface (API). In this paper we describe the architecture of
a distributed data storage for astroparticle physics and discuss the current
status of our work.
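
As a rough illustration of the aggregation idea described in this abstract, the following Python sketch merges time-ordered event streams from two storages into a single sample and then selects a time window. The `Event` fields, the payload values, and the `aggregate` and `select_window` functions are assumptions made for this example; they are not taken from the paper or from any project API.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the fields are illustrative only."""
    timestamp: float  # event time in seconds
    source: str       # name of the storage the event came from
    payload: str      # placeholder for the measured data


def aggregate(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge streams that are each sorted by time into one sorted sample."""
    return heapq.merge(*streams, key=lambda e: e.timestamp)


def select_window(events: Iterable[Event], start: float, stop: float) -> List[Event]:
    """Keep only events with start <= timestamp < stop."""
    return [e for e in events if start <= e.timestamp < stop]


# Two made-up, time-sorted streams standing in for separate storages.
kascade = [Event(1.0, "KASCADE", "a"), Event(4.0, "KASCADE", "b")]
taiga = [Event(2.5, "TAIGA", "c"), Event(3.0, "TAIGA", "d")]

sample = select_window(aggregate(kascade, taiga), 2.0, 4.5)
print([(e.timestamp, e.source) for e in sample])
```

The merge is lazy, so in this sketch the individual streams never need to be loaded into memory at once; only the selected window is materialized.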
[4]
oai:arXiv.org:1907.06183 [pdf] - 1916080
Metadata Extraction from Raw Astroparticle Data of TAIGA Experiment
Submitted: 2019-07-14
Today, the operating TAIGA (Tunka Advanced Instrument for cosmic rays and
Gamma Astronomy) experiment continuously produces and accumulates a large
volume of raw astroparticle data. To be available to the scientific community,
these data should be well described and formally characterized. The use of
metadata makes it possible to search for and to aggregate digital objects (e.g.
events and runs) by time and equipment through a unified interface to access
them. An important part of the metadata is hidden and scattered in
folder and file names and package headers. Such metadata should be extracted from
binary files, transformed to a unified form of digital objects, and loaded into
the catalog. To address this challenge, we developed a concept of a metadata
extractor that can be extended by facility-specific extraction modules. It is
designed to automatically collect descriptive metadata from raw data files of
all TAIGA formats.
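
One way to picture an extractor that is extended by facility-specific modules is a small registry that dispatches on the facility name. The Python sketch below is only a guess at such a design: the registry, the `demo-facility` name, and the `<date>/run<NNN>/<station>.dat` path layout are all hypothetical and do not describe the real TAIGA directory structure. It covers metadata hidden in folder and file names only, not package headers.

```python
import re
from typing import Callable, Dict, Optional

# Registry of facility-specific extraction modules (hypothetical design).
EXTRACTORS: Dict[str, Callable[[str], Optional[dict]]] = {}


def extractor(facility: str):
    """Register a function as the extraction module for one facility."""
    def register(func):
        EXTRACTORS[facility] = func
        return func
    return register


@extractor("demo-facility")
def extract_demo(path: str) -> Optional[dict]:
    # Assumed layout: <YYYY-MM-DD>/run<NNN>/<station>.dat (not a real TAIGA layout).
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2})/run(\d+)/(\w+)\.dat", path)
    if match is None:
        return None
    date, run, station = match.groups()
    return {"date": date, "run": int(run), "station": station}


def extract_metadata(facility: str, path: str) -> Optional[dict]:
    """Dispatch to the module registered for the facility, if any."""
    module = EXTRACTORS.get(facility)
    return module(path) if module else None


print(extract_metadata("demo-facility", "2019-07-14/run042/st07.dat"))
```

Supporting another facility in this sketch means adding one more decorated function; the dispatching code stays unchanged.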
[5]
oai:arXiv.org:1906.10594 [pdf] - 1905978
Towards the Baikal Open Laboratory in Astroparticle Physics
Bezyazeekov, Pavel;
Bychkov, Igor;
Budnev, Nikolay;
Chernykh, Daria;
Kazarina, Yulia;
Kostunin, Dmitriy;
Kryukov, Alexander;
Monkhoev, Roman;
Shigarov, Alexey;
Shipilov, Dmitriy
Submitted: 2019-06-25
The open science framework defined in the German-Russian Astroparticle Data
Life Cycle Initiative (GRADLCI) has triggered educational and outreach
activities at the Irkutsk State University (ISU), which actively
participates in the two major astroparticle facilities in the region: the TAIGA
observatory and the Baikal-GVD neutrino telescope. We describe the ideas that grew out
of this unique environment and propose a new open science laboratory based on
education and outreach as well as on the development and testing of new methods
and techniques for multi-messenger astronomy.
[6]
oai:arXiv.org:1812.01906 [pdf] - 1794146
A distributed data warehouse system for astroparticle physics
Nguyen, Minh-Duc;
Kryukov, Alexander;
Dubenskaya, Julia;
Korosteleva, Elena;
Polyakov, Stanislav;
Postnikov, Evgeny;
Bychkov, Igor;
Mikhailov, Andrey;
Shigarov, Alexey;
Fedorov, Oleg;
Kazarina, Yulia;
Shipilov, Dmitry;
Zhurov, Dmitry
Submitted: 2018-12-05
A distributed data warehouse system is one of the pressing issues in the field
of astroparticle physics. Well-known experiments, such as TAIGA and KASCADE-Grande,
produce tens of terabytes of data measured by their instruments. It is critical
to have a smart data warehouse system on-site to store the collected data
effectively for further distribution. It is also vital to provide scientists with a
user-friendly interface to access the collected data with proper
permissions not only on-site but also online. Online access is useful when
scientists need to combine data from different experiments for analysis. In
this work, we describe an approach to implementing a distributed data warehouse
system that allows scientists to acquire just the necessary data from different
experiments via the Internet on demand. The implementation is based on
CernVM-FS with additional components developed by us to search through the
whole available data sets and deliver their subsets to users' computers.
[7]
oai:arXiv.org:1812.01551 [pdf] - 1792007
Particle identification in ground-based gamma-ray astronomy using
convolutional neural networks
Postnikov, E. B.;
Bychkov, I. V.;
Dubenskaya, J. Y.;
Fedorov, O. L.;
Kazarina, Y. A.;
Korosteleva, E. E.;
Kryukov, A. P.;
Mikhailov, A. A.;
Nguyen, M. D.;
Polyakov, S. P.;
Shigarov, A. O.;
Shipilov, D. A.;
Zhurov, D. P.
Submitted: 2018-12-04
Modern detectors of cosmic gamma rays are a special type of imaging
telescope (air Cherenkov telescopes) equipped with cameras that have a relatively
large number of photomultiplier-based pixels. For example, the camera of the
TAIGA-IACT telescope has 560 pixels of hexagonal structure. Images in such
cameras can be analysed by deep learning techniques to extract numerous
physical and geometrical parameters and/or for incoming particle
identification. The most powerful deep learning technique for image analysis,
the so-called convolutional neural network (CNN), was implemented in this
study. Two open source libraries for machine learning, PyTorch and TensorFlow,
were tested as possible software platforms for particle identification in
imaging air Cherenkov telescopes. Monte Carlo simulation was performed to
analyse images of gamma rays and background particles (protons) as well as to
estimate identification accuracy. Further steps of implementation and
improvement of this technique are discussed.
[8]
oai:arXiv.org:1812.01324 [pdf] - 1791980
Using Binary File Format Description Languages for Documenting, Parsing,
and Verifying Raw Data in TAIGA Experiment
Bychkov, I.;
Demichev, A.;
Dubenskaya, J.;
Fedorov, O.;
Hmelnov, A.;
Kazarina, Y.;
Korosteleva, E.;
Kostunin, D.;
Kryukov, A.;
Mikhailov, A.;
Nguyen, M. D.;
Polyakov, S.;
Postnikov, E.;
Shigarov, A.;
Shipilov, D.;
Zhurov, D.
Submitted: 2018-12-04
The paper addresses the documenting, parsing, and verifying of raw binary data
in the astroparticle data life cycle. The long-term preservation of
raw data of astroparticle experiments as originally generated is essential for
re-running analyses and reproducing research results. The selected high-quality
raw data should have detailed documentation and be accompanied by open software
tools for accessing them. We consider the applicability of binary file format
description languages to specify, parse and verify raw data of the Tunka
Advanced Instrument for cosmic rays and Gamma Astronomy (TAIGA) experiment. The
formal specifications are implemented for five data formats of the experiment
and provide automatic generation of source code for data reading libraries in
target programming languages (e.g. C++, Java, and Python). These libraries were
tested on TAIGA data. They showed good performance and helped us to locate the
parts with corrupted data. The format specifications can be used as metadata
for exchanging raw astroparticle data. They can also simplify software
development for data aggregation from various sources for the multi-messenger
analysis.
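
To make the parse-and-verify idea concrete, here is a minimal hand-written Python stand-in for the kind of reading library such a specification might generate. It uses the standard `struct` module rather than a format description language, and the record layout (magic bytes `EV`, event number, sample count, samples) is invented for this sketch; it is not one of the five TAIGA formats.

```python
import struct

# Hypothetical record layout, declared once as data:
# little-endian magic (2 bytes), event number (uint32), sample count (uint16),
# followed by that many int16 samples.
HEADER = struct.Struct("<2sIH")
MAGIC = b"EV"


def parse_record(buf: bytes) -> dict:
    """Parse one record and verify it against the declared layout."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, number, count = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    expected = HEADER.size + 2 * count  # each int16 sample takes 2 bytes
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    samples = struct.unpack_from(f"<{count}h", buf, HEADER.size)
    return {"event": number, "samples": list(samples)}


# A well-formed record built from made-up values.
good = HEADER.pack(MAGIC, 7, 3) + struct.pack("<3h", 10, -2, 5)
print(parse_record(good))

try:
    parse_record(good[:-1])  # corrupted: one byte missing
except ValueError as err:
    print("rejected:", err)
```

Because the length and magic checks run before the samples are read, a damaged record in this sketch is reported rather than silently misparsed, which is the behaviour that makes locating corrupted parts of a file possible.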
[9]
oai:arXiv.org:1812.01212 [pdf] - 1791969
Application of HUBzero platform for the educational process in
astroparticle physics
Kazarina, Yulia;
Bychkov, Igor;
Kryukov, Alexander;
Dubenskaya, Julia;
Korosteleva, Elena;
Nguyen, Minh-Duc;
Polyakov, Stanislav;
Postnikov, Evgeny;
Mikhailov, Andrey;
Shigarov, Alexey;
Fedorov, Oleg;
Shipilov, Dmitry;
Zhurov, Dmitry
Submitted: 2018-12-03
Within the framework of the Karlsruhe-Russian Astroparticle Data Life Cycle
Initiative, it was proposed to deploy the educational resource
astroparticle.online for the training of students in the field of astroparticle
physics. This resource is based on HUBzero, an open-source software
platform for building powerful websites that support scientific discovery,
learning, and collaboration. HUBzero has been deployed on the servers of
the Matrosov Institute for System Dynamics and Control Theory. The educational
resource astroparticle.online is being filled with information covering
cosmic messengers, astroparticle physics experiments and educational courses
and schools on astroparticle physics. Furthermore, the educational resource
astroparticle.online can be used for online collaboration. We present the
current status of this project and our first experience of using this
service as a collaboration framework.
[10]
oai:arXiv.org:1811.12086 [pdf] - 1791770
Russian-German Astroparticle Data Life Cycle Initiative
Bychkov, Igor;
Demichev, Andrey;
Dubenskaya, Julia;
Fedorov, Oleg;
Haungs, Andreas;
Heiss, Andreas;
Kang, Donghwa;
Kazarina, Yulia;
Korosteleva, Elena;
Kostunin, Dmitriy;
Kryukov, Alexander;
Mikhailov, Andrey;
Nguyen, Minh-Duc;
Polyakov, Stanislav;
Postnikov, Evgeny;
Shigarov, Alexey;
Shipilov, Dmitry;
Streit, Achim;
Tokareva, Victoria;
Wochele, Doris;
Wochele, Jürgen;
Zhurov, Dmitry
Submitted: 2018-11-29
Modern large-scale astroparticle setups measure high-energy particles, gamma
rays, neutrinos, radio waves, and the recently discovered gravitational waves.
Ongoing and future experiments are located worldwide. The data acquired have
different formats, storage concepts, and publication policies. Such differences
are a crucial point in the era of Big Data and of multi-messenger analysis in
astroparticle physics. We propose an open science web platform called
ASTROPARTICLE.ONLINE which enables us to publish, store, search, select, and
analyze astroparticle data. In the first stage of the project, the following
components of a full data life cycle concept are under development: describing,
storing, and reusing astroparticle data; software to perform multi-messenger
analysis using deep learning; and outreach for students, post-graduate
students, and others who are interested in astroparticle physics. Here we
describe the concepts of the web platform and the first obtained results,
including the metadata structure for astroparticle data, data analysis
using convolutional neural networks, description of the binary data, and the
outreach platform for those interested in astroparticle physics. The
KASCADE-Grande and TAIGA cosmic-ray experiments were chosen as pilot examples.