Normalized to: Kempton, D.
[1]
oai:arXiv.org:2006.12224 [pdf] - 2119196
Machine Learning in Heliophysics and Space Weather Forecasting: A White
Paper of Findings and Recommendations
Nita, Gelu;
Georgoulis, Manolis;
Kitiashvili, Irina;
Sadykov, Viacheslav;
Camporeale, Enrico;
Kosovichev, Alexander;
Wang, Haimin;
Oria, Vincent;
Wang, Jason;
Angryk, Rafal;
Aydin, Berkay;
Ahmadzadeh, Azim;
Bai, Xiaoli;
Bastian, Timothy;
Boubrahimi, Soukaina Filali;
Chen, Bin;
Davey, Alisdair;
Fereira, Sheldon;
Fleishman, Gregory;
Gary, Dale;
Gerrard, Andrew;
Hellbourg, Gregory;
Herbert, Katherine;
Ireland, Jack;
Illarionov, Egor;
Kuroda, Natsuha;
Li, Qin;
Liu, Chang;
Liu, Yuexin;
Kim, Hyomin;
Kempton, Dustin;
Ma, Ruizhe;
Martens, Petrus;
McGranaghan, Ryan;
Semones, Edward;
Stefan, John;
Stejko, Andrey;
Collado-Vega, Yaireska;
Wang, Meiqi;
Xu, Yan;
Yu, Sijie
Submitted: 2020-06-22
The authors of this white paper met on 16-17 January 2020 at the New Jersey
Institute of Technology, Newark, NJ, for a 2-day workshop that brought together
a group of heliophysicists, data providers, expert modelers, and computer/data
scientists. Their objective was to discuss critical developments and prospects
of the application of machine and/or deep learning techniques for data
analysis, modeling and forecasting in Heliophysics, and to shape a strategy for
further developments in the field. The workshop combined a set of plenary
sessions featuring invited introductory talks interleaved with a set of open
discussion sessions. The outcome of the discussion is encapsulated in this
white paper, which also features a top-level list of recommendations agreed
upon by the participants.
[2]
oai:arXiv.org:1911.09061 [pdf] - 2001514
Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study
on Solar Flare Data
Submitted: 2019-11-20
In analyses of rare events, regardless of the domain of application, the
class-imbalance issue is intrinsic. Although the challenges are known to data
experts, their explicit impact on the analytics and on the decisions based on
the findings is often overlooked. This is particularly prevalent in
interdisciplinary research, where the theoretical aspects are sometimes
overshadowed by the challenges of the application. To showcase these
undesirable impacts, we conduct a series of experiments on a recently created
benchmark dataset named Space Weather ANalytics for Solar Flares (SWAN-SF). This
is a multivariate time series dataset of magnetic parameters of active regions.
As a remedy for the imbalance issue, we study the impact of data manipulation
(undersampling and oversampling) and model manipulation (using class weights).
Furthermore, we bring into focus the autocorrelation of time series that is
inherited from the use of a sliding window for monitoring flares' history.
Temporal coherence, as we call this phenomenon, invalidates the randomness
assumption, thus impacting all sampling practices including different
cross-validation techniques. We illustrate how overlooking this phenomenon
can artificially boost forecast performance and lead to
misleading findings. Throughout this study we use a Support Vector Machine
as the classifier and the True Skill Statistic as the verification metric for
comparing experiments. We conclude by specifying the correct
practice in each case, and we hope that this study will benefit researchers in
other domains where time series of rare events are of interest.
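
The two ideas in this abstract, a skill score that is not fooled by class imbalance and a split that respects time order, can be illustrated with a minimal sketch. The data below are made up, and the helper names (`confusion_counts`, `true_skill_statistic`, `chronological_split`) are hypothetical; this is not the authors' code, and the abstract does not say how their splits were built.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels; 1 marks the rare (flare) class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP / (TP + FN) - FP / (FP + TN); ranges from -1 to 1."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return recall - false_alarm_rate


def accuracy(tp, fn, fp, tn):
    """Fraction of all samples that were labelled correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def chronological_split(samples, train_fraction=0.7):
    """Split (timestamp, label) pairs so all training samples precede all test samples."""
    ordered = sorted(samples, key=lambda sample: sample[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Assumed toy data: 10 flaring samples among 1000 (a 1:99 imbalance).
y_true = [1] * 10 + [0] * 990
always_quiet = [0] * 1000  # a "model" that never predicts a flare

trivial_counts = confusion_counts(y_true, always_quiet)
trivial_accuracy = accuracy(*trivial_counts)          # high, despite no skill
trivial_tss = true_skill_statistic(*trivial_counts)   # zero: no flares caught

# Assumed toy windows indexed by start time; neighbouring windows overlap in
# practice, so a random shuffle would place near-duplicates on both sides.
windows = [(t, 1 if t % 50 == 0 else 0) for t in range(200)]
train, test = chronological_split(windows, train_fraction=0.7)
```

The never-flare model scores well on accuracy and zero on TSS, which is why accuracy alone misleads on rare events. The chronological split only removes random interleaving; windows on either side of the cut can still be correlated, so leaving a gap between the two sets may be advisable.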
[3]
oai:arXiv.org:1912.02743 [pdf] - 2010164
Toward Filament Segmentation Using Deep Neural Networks
Submitted: 2019-11-20
We use a well-known deep neural network framework, called Mask R-CNN, for
identification of solar filaments in full-disk H-alpha images from Big Bear
Solar Observatory (BBSO). The image data, collected from BBSO's archive, are
integrated with the spatiotemporal metadata of filaments retrieved from the
Heliophysics Events Knowledgebase (HEK) system. This integrated data is then
treated as the ground-truth in the training process of the model. The available
spatial metadata are the output of a currently running filament-detection
module developed and maintained by the Feature Finding Team, an international
consortium selected by NASA. Despite the known challenges in the identification
and characterization of filaments by the existing module, which are in turn
inherited by any other module that learns from its outputs, Mask
R-CNN shows promising results. Trained and validated on two years' worth of BBSO
data, the model is then tested on the following three years. Our case-by-case
and overall analyses show that Mask R-CNN can clearly compete with the existing
module and in some cases even perform better. Several cases in which the
existing module produces false positives or false negatives that this model
segments correctly are also shown.
The overall advantages of the proposed model are twofold. First, the
performance of deep neural networks generally improves as more annotated data or
better annotations are provided. Second, such a model can be scaled up to
detect other solar events, acting as a single multi-purpose module. The
results presented in this study provide a proof of concept of the benefits of
employing deep neural networks for the detection of solar events and, in
particular, filaments.
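
The abstract does not state which measure is used for the case-by-case comparison. One common way to compare a predicted segmentation mask against a reference mask is intersection over union (IoU), sketched below on made-up 4x4 masks. The function name `mask_iou` and the data are assumptions for illustration only.

```python
def mask_iou(predicted, reference):
    """Intersection over union of two equally sized binary masks (lists of rows of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(predicted, reference):
        for pred_pixel, ref_pixel in zip(pred_row, ref_row):
            if pred_pixel and ref_pixel:
                intersection += 1
            if pred_pixel or ref_pixel:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Assumed toy masks: the reference marks a 2x2 filament patch, and the
# prediction covers the same patch plus one extra column.
reference = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
iou = mask_iou(predicted, reference)  # 4 shared pixels out of 6 covered pixels
```

When the reference itself comes from an imperfect detection module, as the abstract describes, a low IoU does not necessarily mean the prediction is wrong, so such scores are best read alongside visual inspection.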
[4]
oai:arXiv.org:1906.01062 [pdf] - 1925073
A Curated Image Parameter Dataset from Solar Dynamics Observatory
Mission
Submitted: 2019-06-03
We provide a large image parameter dataset extracted from the Solar Dynamics
Observatory (SDO) mission's AIA instrument, for the period from January 2011
through the current date, at a cadence of six minutes, for nine wavelength
channels. The volume of the dataset for each year is just short of 1 TiB.
To achieve better results in the classification of active regions
and coronal holes, we improve upon the performance of a set of ten image
parameters through an in-depth evaluation of the assumptions that are
necessary for calculating these image parameters. Then, where possible, we
devise a method for finding appropriate settings for the parameter calculations,
as well as a validation task to demonstrate the improved results. In
addition, we include comparisons of JP2 and FITS image formats using supervised
classification models, by tuning the parameters specific to the format of the
images from which they are extracted, and specific to each wavelength. The
results of these comparisons show that utilizing JP2 images, which are
significantly smaller files, is not detrimental to the region classification
task that these parameters were originally intended for. Finally, we compute
the tuned parameters on the AIA images and provide a public API
(http://dmlab.cs.gsu.edu/dmlabapi) to access the dataset. This dataset can be
used in a range of studies on AIA images, such as content-based image retrieval
or tracking of solar events, where dimensionality reduction on the images is
necessary for feasibility of the tasks.
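
As a rough illustration of how image parameters reduce dimensionality, the sketch below tiles an image into square cells and summarises each cell with a few statistics. The abstract does not list the ten parameters or the grid layout, so the mean, standard deviation, and entropy used here, the cell size, and the function names are all illustrative assumptions rather than the dataset's definition.

```python
import math
from collections import Counter


def cell_parameters(pixels):
    """Return (mean, standard deviation, entropy in bits) for a flat list of intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = Counter(pixels)
    # Shannon entropy of the intensity histogram of this cell.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, math.sqrt(variance), entropy


def image_to_parameters(image, cell_size):
    """Tile a 2-D image (list of rows) into square cells and summarise each cell."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows, cell_size):
        grid_row = []
        for left in range(0, cols, cell_size):
            cell = [
                image[r][c]
                for r in range(top, min(top + cell_size, rows))
                for c in range(left, min(left + cell_size, cols))
            ]
            grid_row.append(cell_parameters(cell))
        grid.append(grid_row)
    return grid


# Assumed toy 4x4 image with integer intensities, summarised on 2x2 cells.
image = [
    [0, 0, 5, 5],
    [0, 0, 5, 7],
    [1, 2, 9, 9],
    [3, 4, 9, 9],
]
params = image_to_parameters(image, cell_size=2)
```

On this toy input the reduction is negligible, but with realistic cell sizes each cell of many pixels collapses to a handful of numbers, which is what makes retrieval and tracking tasks on full-resolution images tractable.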