Normalized to: Kryukov, A.
[1]
oai:arXiv.org:1908.01554 [pdf] - 1928286
Data Aggregation In The Astroparticle Physics Distributed Data Storage
Submitted: 2019-08-05
The German-Russian Astroparticle Data Life Cycle Initiative is an international
project whose aim is to develop a distributed data storage system that
aggregates data from the storage systems of different astroparticle
experiments. The prototype of such a system, which is called the Astroparticle
Physics Distributed Data Storage (APPDS), is under development. In this
paper, the Data Aggregation Service, one of the core services of APPDS, is
presented. The Data Aggregation Service connects all distributed services of
APPDS together to find the necessary data and deliver them to users on demand.
[2]
oai:arXiv.org:1907.13303 [pdf] - 1926539
German-Russian Astroparticle Data Life Cycle Initiative
Haungs, Andreas;
Bychkov, Igor;
Dubenskaya, Julia;
Fedorov, Oleg;
Heiss, Andreas;
Kang, Donghwa;
Kazarina, Yulia;
Korosteleva, Elena;
Kostunin, Dmitriy;
Kryukov, Alexander;
Mikhailov, Andrey;
Nguyen, Minh-Duc;
Polgart, Frank;
Polyakov, Stanislav;
Postnikov, Evgeny;
Shigarov, Alexey;
Shipilov, Dmitry;
Streit, Achim;
Tokareva, Victoria;
Wochele, Doris;
Wochele, Jürgen;
Zhurov, Dmitry
Submitted: 2019-07-31
A data life cycle (DLC) is a high-level data processing pipeline that
involves data acquisition, event reconstruction, data analysis, publication,
archiving, and sharing. For astroparticle physics a DLC is particularly
important due to the geographical and content diversity of the research field.
A dedicated and experiment-spanning analysis and data centre would ensure that
multi-messenger analyses can be carried out using state-of-the-art methods. The
German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI) is a joint
project of the KASCADE-Grande and TAIGA collaborations, aimed at developing a
concept and creating a DLC prototype that takes into account the data
processing features specific to the research field. An open science system
based on the KASCADE Cosmic Ray Data Centre (KCDC), which is a web-based
platform to provide the astroparticle physics data for the general public, must
also include effective methods for distributed data storage algorithms and
techniques to allow the community to perform simulations and analyses with
sophisticated machine learning methods. The aim is to achieve more efficient
analyses of the data collected in different, globally dispersed observatories,
as well as modern education of Big Data scientists in the synergy between
basic research and the information society. The contribution covers the status
and future plans of the initiative.
[3]
oai:arXiv.org:1907.10480 [pdf] - 1922562
Deep Learning for Energy Estimation and Particle Identification in
Gamma-ray Astronomy
Submitted: 2019-07-23
Deep learning techniques, namely convolutional neural networks (CNN), have
previously been adapted to select gamma-ray events in the TAIGA experiment,
having achieved a good quality of selection as compared with the conventional
Hillas approach. Another important task for the TAIGA data analysis was also
solved with CNN: gamma-ray energy estimation showed some improvement in
comparison with the conventional method based on the Hillas analysis.
Furthermore, our software was completely redeveloped for the graphics
processing unit (GPU), which led to significantly faster calculations in both
of these tasks. All the results have been obtained with the simulated data of
TAIGA Monte Carlo software; their experimental confirmation is envisaged for
the near future.
[4]
oai:arXiv.org:1907.06863 [pdf] - 1917095
Distributed data storage for modern astroparticle physics experiments
Submitted: 2019-07-16
The German-Russian Astroparticle Data Life Cycle Initiative is an
international project launched in 2018. The Initiative aims to develop
technologies that provide a unified approach to data management, as well as to
demonstrate their applicability on the example of two large astrophysical
experiments - KASCADE and TAIGA. One of the key points of the project is the
development of a distributed storage, which, on the one hand, will allow data
of several experiments to be combined into a single repository with unified
interface, and on the other hand, will provide data to all participants of
experimental groups for multi-messenger analysis. Our approach to storage
design is based on the single write-multiple read (SWMR) model for accessing
raw or centrally processed data for further analysis. The main feature of the
distributed storage is the ability to extract data either as a collection of
files or as aggregated events from different sources. In the latter case the
storage provides users with a special service that aggregates data from
different storages into a single sample. Thanks to this feature,
multi-messenger methods used for more sophisticated data exploration can be
applied. Users can access the storage through either a web interface or an
application programming interface (API). In this paper we describe the architecture of
a distributed data storage for astroparticle physics and discuss the current
status of our work.
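The aggregation of events from different storages into a single sample can be
illustrated with a minimal sketch. It assumes each storage yields events
already sorted by time; the Event type, its fields, and the function names are
hypothetical and are not taken from the paper.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical unified event record (not the project's actual schema)."""
    timestamp: float  # event time in seconds
    source: str       # name of the originating storage
    payload: dict     # experiment-specific content


def aggregate(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge time-sorted per-storage streams into one time-ordered sample."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


def select(events: Iterable[Event], t_min: float, t_max: float) -> List[Event]:
    """Keep only events with t_min <= timestamp < t_max."""
    return [e for e in events if t_min <= e.timestamp < t_max]


taiga = [Event(1.0, "TAIGA", {}), Event(4.0, "TAIGA", {})]
kascade = [Event(2.0, "KASCADE", {}), Event(3.0, "KASCADE", {})]
merged = list(aggregate(taiga, kascade))
window = select(merged, 2.0, 4.0)
```

Because the merge is lazy, the combined sample never has to be held in memory
unless the caller materializes it, as done here for illustration.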
[5]
oai:arXiv.org:1907.06183 [pdf] - 1916080
Metadata Extraction from Raw Astroparticle Data of TAIGA Experiment
Submitted: 2019-07-14
Today, the operating TAIGA (Tunka Advanced Instrument for cosmic rays and
Gamma Astronomy) experiment continuously produces and accumulates a large
volume of raw astroparticle data. To be available to the scientific community,
these data should be well described and formally characterized. The use of
metadata makes it possible to search for and to aggregate digital objects (e.g.
events and runs) by time and equipment through a unified interface to access
them. An important part of the metadata is hidden and scattered in
folder and file names and package headers. Such metadata should be extracted from
binary files, transformed to a unified form of digital objects, and loaded into
the catalog. To address this challenge we developed a concept of the metadata
extractor that can be extended by facility-specific extraction modules. It is
designed to automatically collect descriptive metadata from raw data files of
all TAIGA formats.
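The idea of an extractor extended by facility-specific modules can be sketched
as a small registry. This is an illustration only: the directory layout
(facility/date/runNNN.dat), the facility name "iact", and all function names
are assumptions, not the actual TAIGA conventions.

```python
import re
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional

Extractor = Callable[[PurePosixPath], Optional[dict]]

# Registry of facility-specific extraction modules, keyed by folder name.
EXTRACTORS: Dict[str, Extractor] = {}


def register(facility: str) -> Callable[[Extractor], Extractor]:
    """Decorator that plugs a facility-specific extractor into the registry."""
    def wrap(func: Extractor) -> Extractor:
        EXTRACTORS[facility] = func
        return func
    return wrap


# Hypothetical file-name convention: run<NNN>.dat
_RUN_PATTERN = re.compile(r"run(?P<run>\d+)\.dat$")


@register("iact")
def extract_iact(path: PurePosixPath) -> Optional[dict]:
    """Read run number from the file name and date from the parent folder."""
    match = _RUN_PATTERN.search(path.name)
    if match is None:
        return None
    return {
        "facility": "iact",
        "date": path.parent.name,
        "run": int(match.group("run")),
    }


def extract(path_str: str) -> Optional[dict]:
    """Dispatch to the first registered extractor whose facility is in the path."""
    path = PurePosixPath(path_str)
    for part in path.parts:
        extractor = EXTRACTORS.get(part)
        if extractor is not None:
            return extractor(path)
    return None


record = extract("iact/2019-01-15/run042.dat")
```

Supporting a new facility then only requires registering one more function;
the dispatch code stays unchanged.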
[6]
oai:arXiv.org:1906.10594 [pdf] - 1905978
Towards the Baikal Open Laboratory in Astroparticle Physics
Bezyazeekov, Pavel;
Bychkov, Igor;
Budnev, Nikolay;
Chernykh, Daria;
Kazarina, Yulia;
Kostunin, Dmitriy;
Kryukov, Alexander;
Monkhoev, Roman;
Shigarov, Alexey;
Shipilov, Dmitriy
Submitted: 2019-06-25
The open science framework defined in the German-Russian Astroparticle Data
Life Cycle Initiative (GRADLCI) has triggered educational and outreach
activities at the Irkutsk State University (ISU), which actively
participates in the two major astroparticle facilities in the region: the TAIGA
observatory and the Baikal-GVD neutrino telescope. We describe the ideas that grew out
of this unique environment and propose a new open science laboratory based on
education and outreach as well as on the development and testing of new methods
and techniques for multimessenger astronomy.
[7]
oai:arXiv.org:1812.01906 [pdf] - 1794146
A distributed data warehouse system for astroparticle physics
Nguyen, Minh-Duc;
Kryukov, Alexander;
Dubenskaya, Julia;
Korosteleva, Elena;
Polyakov, Stanislav;
Postnikov, Evgeny;
Bychkov, Igor;
Mikhailov, Andrey;
Shigarov, Alexey;
Fedorov, Oleg;
Kazarina, Yulia;
Shipilov, Dmitry;
Zhurov, Dmitry
Submitted: 2018-12-05
A distributed data warehouse system is one of the pressing issues in the field
of astroparticle physics. Well-known experiments, such as TAIGA and KASCADE-Grande,
produce tens of terabytes of data measured by their instruments. It is critical
to have a smart data warehouse system on-site to store the collected data for
further distribution effectively. It is also vital to provide scientists with a
handy and user-friendly interface to access the collected data with proper
permissions not only on-site but also online. The latter case is handy when
scientists need to combine data from different experiments for analysis. In
this work, we describe an approach to implementing a distributed data warehouse
system that allows scientists to acquire just the necessary data from different
experiments via the Internet on demand. The implementation is based on
CernVM-FS with additional components developed by us to search through the
whole available data sets and deliver their subsets to users' computers.
[8]
oai:arXiv.org:1812.02234 [pdf] - 1793637
A framework to monitor activities of satellite data processing in
real-time
Submitted: 2018-12-05
The Space Monitoring Data Center (SMDC) of SINP MSU is one of several centers
in the world that collect data on the radiation conditions in near-Earth
orbit from various Russian (Lomonosov, Electro-L1, Electro-L2, Meteor-M1,
Meteor-M2, etc.) and foreign (GOES 13, GOES 15, ACE, SDO, etc.) satellites. The
primary purposes of SMDC are: aggregating heterogeneous data from different
sources; providing a unified interface for data retrieval, visualization,
analysis, as well as development and testing of new space weather models; and
controlling the correctness and completeness of data. Space weather models rely
on data provided by SMDC to produce forecasts. Therefore, monitoring the whole
data processing cycle is crucial for further success in the modelling of
physical processes in near-Earth orbit based on the collected data. To solve
the problem described above, we have developed a framework called Live Monitor
at SMDC. Live Monitor allows watching all stages and program components
involved in each data processing cycle. All activities of each stage are logged
by Live Monitor and shown in real-time on a web interface. When an error
occurs, a notification message is sent to satellite operators via email
and the Telegram messenger service so that they can take measures in time.
The Live Monitor's API can be used to create a customized monitoring service
with minimum coding.
[9]
oai:arXiv.org:1812.01324 [pdf] - 1791980
Using Binary File Format Description Languages for Documenting, Parsing,
and Verifying Raw Data in TAIGA Experiment
Bychkov, I.;
Demichev, A.;
Dubenskaya, J.;
Fedorov, O.;
Hmelnov, A.;
Kazarina, Y.;
Korosteleva, E.;
Kostunin, D.;
Kryukov, A.;
Mikhailov, A.;
Nguyen, M. D.;
Polyakov, S.;
Postnikov, E.;
Shigarov, A.;
Shipilov, D.;
Zhurov, D.
Submitted: 2018-12-04
The paper is devoted to documenting, parsing, and verifying raw binary data
in the astroparticle data life cycle. The long-term preservation of
raw data of astroparticle experiments as originally generated is essential for
re-running analyses and reproducing research results. The selected high-quality
raw data should have detailed documentation and be accompanied by open software
tools for accessing them. We consider the applicability of binary file format
description languages to specify, parse and verify raw data of the Tunka
Advanced Instrument for cosmic rays and Gamma Astronomy (TAIGA) experiment. The
formal specifications are implemented for five data formats of the experiment
and provide automatic generation of source code for data reading libraries in
target programming languages (e.g. C++, Java, and Python). These libraries were
tested on TAIGA data. They showed good performance and helped us locate the
parts with corrupted data. The format specifications can be used as metadata
for the exchange of astroparticle raw data. They can also simplify software
development for data aggregation from various sources for the multi-messenger
analysis.
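How a declarative format description supports both parsing and verification
can be shown with a minimal sketch built on Python's struct module. The header
layout (2-byte magic, 16-bit station id, 32-bit payload length, little-endian)
and the magic value are invented for illustration; they are not one of the
five TAIGA formats, and the paper's description languages generate such
readers rather than having them written by hand.

```python
import struct
from typing import List, NamedTuple, Tuple

# Hypothetical package header: magic (2 bytes), station id (uint16),
# payload length (uint32), all little-endian. Total size is 8 bytes.
HEADER = struct.Struct("<2sHI")
MAGIC = b"TG"  # assumed marker, for illustration only


class Package(NamedTuple):
    station: int
    payload: bytes


def parse_packages(blob: bytes) -> Tuple[List[Package], List[int]]:
    """Parse consecutive packages; return them plus offsets of corrupted ones.

    Parsing stops at the first package whose magic is wrong or whose declared
    payload runs past the end of the data, since later offsets are unreliable.
    """
    packages: List[Package] = []
    corrupted: List[int] = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        magic, station, length = HEADER.unpack_from(blob, offset)
        end = offset + HEADER.size + length
        if magic != MAGIC or end > len(blob):
            corrupted.append(offset)
            break
        packages.append(Package(station, blob[offset + HEADER.size:end]))
        offset = end
    return packages, corrupted


good = HEADER.pack(MAGIC, 7, 3) + b"abc"
bad = HEADER.pack(b"XX", 1, 0)
packages, corrupted = parse_packages(good + bad)
```

The returned offsets point at the start of the first unreadable package, which
is the kind of information needed to locate corrupted parts of a file.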
[10]
oai:arXiv.org:1812.01551 [pdf] - 1792007
Particle identification in ground-based gamma-ray astronomy using
convolutional neural networks
Postnikov, E. B.;
Bychkov, I. V.;
Dubenskaya, J. Y.;
Fedorov, O. L.;
Kazarina, Y. A.;
Korosteleva, E. E.;
Kryukov, A. P.;
Mikhailov, A. A.;
Nguyen, M. D.;
Polyakov, S. P.;
Shigarov, A. O.;
Shipilov, D. A.;
Zhurov, D. P.
Submitted: 2018-12-04
Modern detectors of cosmic gamma-rays are a special type of imaging
telescopes (air Cherenkov telescopes) supplied with cameras with a relatively
large number of photomultiplier-based pixels. For example, the camera of the
TAIGA-IACT telescope has 560 pixels of hexagonal structure. Images in such
cameras can be analysed by deep learning techniques to extract numerous
physical and geometrical parameters and/or for incoming particle
identification. The most powerful deep learning technique for image analysis,
the so-called convolutional neural network (CNN), was implemented in this
study. Two open source libraries for machine learning, PyTorch and TensorFlow,
were tested as possible software platforms for particle identification in
imaging air Cherenkov telescopes. Monte Carlo simulation was performed to
analyse images of gamma-rays and background particles (protons) as well as
estimate identification accuracy. Further steps of implementation and
improvement of this technique are discussed.
[11]
oai:arXiv.org:1811.11822 [pdf] - 1886463
Gamma/Hadron Separation in Imaging Air Cherenkov Telescopes Using Deep
Learning Libraries TensorFlow and PyTorch
Submitted: 2018-11-28, last modified: 2018-12-04
In this work we compare two open source machine learning libraries, PyTorch
and TensorFlow, as software platforms for rejecting hadron background events
detected by imaging air Cherenkov telescopes (IACTs). Monte Carlo simulation
for the TAIGA-IACT telescope is used to estimate background rejection quality.
A wide variety of neural network algorithms provided by both libraries can
easily be tested on various types of data, which is useful for various imaging
air Cherenkov experiments. The work is a component of the Astroparticle.online
project, which collaborates with the TAIGA and KASCADE experiments and welcomes
any astroparticle experiment to join.
[12]
oai:arXiv.org:1812.01212 [pdf] - 1791969
Application of HUBzero platform for the educational process in
astroparticle physics
Kazarina, Yulia;
Bychkov, Igor;
Kryukov, Alexander;
Dubenskaya, Julia;
Korosteleva, Elena;
Nguyen, Minh-Duc;
Polyakov, Stanislav;
Postnikov, Evgeny;
Mikhailov, Andrey;
Shigarov, Alexey;
Fedorov, Oleg;
Shipilov, Dmitry;
Zhurov, Dmitry
Submitted: 2018-12-03
In the frame of the Karlsruhe-Russian Astroparticle Data Life Cycle
Initiative it was proposed to deploy an educational resource
astroparticle.online for the training of students in the field of astroparticle
physics. This resource is based on HUBzero, which is an open-source software
platform for building powerful websites, which supports scientific discovery,
learning, and collaboration. HUBzero has been deployed on the servers of
Matrosov Institute for System Dynamics and Control Theory. The educational
resource astroparticle.online is being filled with the information covering
cosmic messengers, astroparticle physics experiments and educational courses
and schools on astroparticle physics. Furthermore, the educational resource
astroparticle.online can be used for online collaboration. We present the
current status of this project and our first experience of application of this
service as a collaboration framework.
[13]
oai:arXiv.org:1811.12086 [pdf] - 1791770
Russian-German Astroparticle Data Life Cycle Initiative
Bychkov, Igor;
Demichev, Andrey;
Dubenskaya, Julia;
Fedorov, Oleg;
Haungs, Andreas;
Heiss, Andreas;
Kang, Donghwa;
Kazarina, Yulia;
Korosteleva, Elena;
Kostunin, Dmitriy;
Kryukov, Alexander;
Mikhailov, Andrey;
Nguyen, Minh-Duc;
Polyakov, Stanislav;
Postnikov, Evgeny;
Shigarov, Alexey;
Shipilov, Dmitry;
Streit, Achim;
Tokareva, Victoria;
Wochele, Doris;
Wochele, Jürgen;
Zhurov, Dmitry
Submitted: 2018-11-29
Modern large-scale astroparticle setups measure high-energy particles, gamma
rays, neutrinos, radio waves, and the recently discovered gravitational waves.
Ongoing and future experiments are located worldwide. The data acquired have
different formats, storage concepts, and publication policies. Such differences
are a crucial point in the era of Big Data and of multi-messenger analysis in
astroparticle physics. We propose an open science web platform called
ASTROPARTICLE.ONLINE which enables us to publish, store, search, select, and
analyze astroparticle data. In the first stage of the project, the following
components of a full data life cycle concept are under development: describing,
storing, and reusing astroparticle data; software to perform multi-messenger
analysis using deep learning; and outreach for students, post-graduate
students, and others who are interested in astroparticle physics. Here we
describe the concepts of the web platform and the first obtained results,
including the metadata structure for astroparticle data, data analysis
using convolutional neural networks, description of the binary data, and the
outreach platform for those interested in astroparticle physics. The
KASCADE-Grande and TAIGA cosmic-ray experiments were chosen as pilot examples.
[14]
oai:arXiv.org:1811.02403 [pdf] - 1785865
Architecture of Distributed Data Storage for Astroparticle Physics
Submitted: 2018-11-06
For the successful development of astrophysics and, accordingly, for
obtaining more complete knowledge of the Universe, it is extremely important to
combine and comprehensively analyze information of various types (e.g., about
charged cosmic particles, gamma rays, neutrinos, etc.) obtained by using diverse
large-scale experimental setups located throughout the world. It is obvious
that all kinds of activities must be performed continually across all stages of
the data life cycle to help support effective data management, in particular,
the collection and storage of data, its processing and analysis, refining the
physical model, making preparations for publication, and data reprocessing
taking refinement into account. In this paper we present a general approach to
the construction and architecture of a system that can collect, store, and
provide users with access to astrophysical data. We also suggest a new approach to
the construction of a metadata registry based on the blockchain technology.