Normalized to: Schear, N.
arXiv:1406.5751
Computing on Masked Data: a High Performance Method for Improving Big
Data Veracity
Submitted: 2014-06-22
The growing gap between data and users calls for innovative tools that
address the challenges posed by big data volume, velocity, and variety. Along
with these three standard V's of big data, an emerging fourth "V" is veracity,
which addresses the confidentiality, integrity, and availability of the data.
Traditional cryptographic techniques that ensure the veracity of data can impose
overheads too large to be practical for big data. This work introduces a new
technique called Computing on Masked Data (CMD), which improves data veracity
by allowing computations to be performed directly on masked data and ensuring
that only authorized recipients can unmask the data. Using the sparse linear
algebra of associative arrays, CMD can be performed with significantly less
overhead than other approaches while still supporting a wide range of linear
algebraic operations on the masked data. Databases with strong support for
sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this
technique. Examples show CMD applied to a complex DNA matching algorithm and
to database operations over social media data.
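The idea of computing on masked keys with associative-array linear algebra can be illustrated with a minimal Python sketch. It assumes that "masking" means replacing each row and column key of a sparse associative array with a deterministic keyed hash (HMAC-SHA256 here, a stand-in chosen for illustration; the abstract does not say which masking scheme CMD uses). Because equal keys map to equal masks, a sparse product C = A B can be computed entirely on masked keys, and only a holder of the mask-to-label table can unmask the result. The names `mask`, `mask_array`, `multiply`, `unmask_array`, and `SECRET_KEY`, as well as the toy user-by-word data, are hypothetical. Values are left in the clear, so this shows key masking only.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret (an assumption of this sketch); only authorized
# recipients would hold it in practice.
SECRET_KEY = b"example-key-not-for-production"


def mask(label: str) -> str:
    """Deterministically mask a key with HMAC-SHA256 (truncated to 64 bits).

    Equal labels always produce equal masks, so key matching still works.
    """
    return hmac.new(SECRET_KEY, label.encode(), hashlib.sha256).hexdigest()[:16]


def mask_array(A: dict, table: dict) -> dict:
    """Mask every (row, col) key of a sparse associative array.

    HMAC is one-way, so the mask -> label lookup is recorded in `table`,
    which stays with the authorized recipient.
    """
    masked = {}
    for (r, c), v in A.items():
        mr, mc = mask(r), mask(c)
        table[mr], table[mc] = r, c
        masked[(mr, mc)] = v
    return masked


def multiply(A: dict, B: dict) -> dict:
    """Sparse associative-array product: C(i, j) = sum_k A(i, k) * B(k, j)."""
    b_by_row = defaultdict(list)  # index B by its row key k
    for (k, j), v in B.items():
        b_by_row[k].append((j, v))
    C = defaultdict(int)
    for (i, k), a in A.items():
        for j, b in b_by_row.get(k, []):  # only keys that match exactly
            C[(i, j)] += a * b
    return dict(C)


def unmask_array(C: dict, table: dict) -> dict:
    """Map masked keys back to labels using the recipient's lookup table."""
    return {(table[i], table[j]): v for (i, j), v in C.items()}


# Toy user-by-word array; A times its transpose counts words shared by users.
A = {("alice", "cat"): 1, ("alice", "dog"): 1, ("bob", "dog"): 1}
At = {(w, u): v for (u, w), v in A.items()}

table = {}
C_masked = multiply(mask_array(A, table), mask_array(At, table))  # masked keys only
C = unmask_array(C_masked, table)
print(C)
```

The unmasked result is {('alice', 'alice'): 2, ('alice', 'bob'): 1, ('bob', 'alice'): 1, ('bob', 'bob'): 1}, identical to computing A times its transpose on the plaintext keys. Deterministic masking is what makes this work, since the join on the shared index k needs only equality tests; the trade-off is that equal plaintext keys stay recognizably equal after masking, so this sketch is not a security analysis.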