Normalized to: Govada, A.
[1]
oai:arXiv.org:1606.07345 [pdf] - 1785772
A Communication Efficient and Scalable Distributed Data Mining for the
Astronomical Data
Submitted: 2016-06-23
In 2020, ~60 PB of archived data will be accessible to astronomers, but
analyzing such a massive volume of data will be a challenging task. This is
mainly due to the computational model in use, in which data is downloaded from
complex, geographically distributed archives to a central site and then
analyzed on local systems. Because the data has to be downloaded to the central
site, limited network bandwidth will hinder scientific discovery. Analyzing
PB-scale data on local machines in a centralized manner is also challenging. In
this context, the Virtual Observatory (VO) is a step towards solving this problem;
however, it does not provide a data mining model. Adding a distributed data
mining layer to the VO can be the solution: astronomers download the extracted
knowledge instead of the raw data, and can thereafter either reconstruct the
data from the downloaded knowledge or use the knowledge directly for further
analysis. Therefore, in this paper, we present
Distributed Load Balancing Principal Component Analysis for optimally
distributing the computation among the available nodes to minimize the
transmission cost and downloading cost for the end user. The experimental
analysis is done with Fundamental Plane (FP) data, Gadotti data, and complex
Mfeat data. In terms of transmission cost, our approach performs better than
those of Qi et al. and Yue et al. The analysis shows that, with the complex
Mfeat data, the downloading cost for the end user can be reduced by ~90% with
negligible loss in accuracy.
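
The idea of downloading knowledge instead of raw data can be illustrated with a
minimal sketch. This is not the Distributed Load Balancing PCA of the paper; it
is a generic covariance-aggregation PCA, shown only under stated assumptions.
All function names (local_summary, global_pca, compress, reconstruct, demo), the
synthetic data, the three-node split, and the choice k=3 are hypothetical. Each
node sends only small summary statistics, and the end user downloads the mean,
the top-k principal axes, and the projected scores, from which an approximation
of the data can be rebuilt.

```python
import numpy as np

def local_summary(block):
    """Per-node statistics: row count, column sums, and X^T X (no raw rows sent)."""
    return block.shape[0], block.sum(axis=0), block.T @ block

def global_pca(summaries, k):
    """Combine node summaries into the global mean and top-k principal axes."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    gram = sum(s[2] for s in summaries)
    mean = total / n
    # Covariance of the pooled data, recovered from the aggregated sums.
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]

def compress(block, mean, components):
    """Project a node's rows onto the principal axes (the 'knowledge')."""
    return (block - mean) @ components

def reconstruct(scores, mean, components):
    """Approximate the original rows from the downloaded scores."""
    return scores @ components.T + mean

def demo(seed=0):
    # Assumed synthetic data: 600 rows, 20 features, rank-3 signal plus noise.
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(600, 3))
    mixing = rng.normal(size=(3, 20))
    data = latent @ mixing + 0.01 * rng.normal(size=(600, 20))
    nodes = np.array_split(data, 3)             # three hypothetical archive sites

    mean, comps = global_pca([local_summary(b) for b in nodes], k=3)
    scores = [compress(b, mean, comps) for b in nodes]
    approx = np.vstack([reconstruct(s, mean, comps) for s in scores])

    rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
    downloaded = sum(s.size for s in scores) + mean.size + comps.size
    return rel_err, downloaded / data.size

if __name__ == "__main__":
    err, fraction = demo()
    print(f"relative reconstruction error: {err:.4f}")
    print(f"fraction of raw values downloaded: {fraction:.3f}")
```

In this sketch the saving comes purely from keeping k much smaller than the
number of features; how the paper balances computation across nodes and measures
transmission cost is not modeled here.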