From NoSQL Accumulo to NewSQL Graphulo: design and utility of graph algorithms inside a BigTable database
September 13, 2016
Conference Paper
Author:
Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.
R&D Area:
R&D Group:
Summary
Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results shed insight into the conditions that determine when executing a graph algorithm is faster inside a database versus an external system—in short, that memory requirements and relative I/O are critical factors.