Distributed Graph Analytics with Faunus

 

Marko Rodriguez

 

Abstract:

Faunus is a graph analytics engine built atop the Hadoop distributed computing platform. It represents a graph as a distributed adjacency list in which each vertex is co-located with its incident edges. A Faunus graph is queried with a MapReduce variant of the Gremlin graph traversal language: a Gremlin expression compiles down to a sequence of MapReduce steps that is optimized and then executed by Hadoop. Results are stored either as transformations of the input graph (graph derivations) or as computational side-effects such as aggregates (graph statistics). Beyond querying, Faunus supports a collection of input/output formats for loading and storing graphs in the distributed graph database Titan, in various common text-based formats stored in HDFS, and through arbitrary user-defined functions. This presentation will focus primarily on Faunus, but will also review the satellite technologies that enable it.
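
To make the adjacency-list idea concrete, here is a minimal single-process sketch in Python. It is not Faunus code and does not use the Faunus, Gremlin, or Hadoop APIs. The names Vertex, map_edge_labels, reduce_counts, and edge_label_distribution are hypothetical, chosen only for illustration. Each record holds a vertex together with its incident edges, a map step emits one (label, 1) pair per outgoing edge, and a reduce step sums the pairs into an aggregate of the kind the abstract calls a graph statistic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One adjacency-list record: a vertex stored with its incident edges.

    Hypothetical structure for illustration; not the Faunus vertex class.
    """
    id: int
    out_edges: list = field(default_factory=list)  # (label, target_id) pairs
    in_edges: list = field(default_factory=list)   # (label, source_id) pairs


def map_edge_labels(vertex):
    """Map step: emit (label, 1) for every outgoing edge of one record."""
    for label, _target in vertex.out_edges:
        yield label, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def edge_label_distribution(vertices):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = (pair for v in vertices for pair in map_edge_labels(v))
    return reduce_counts(pairs)


# A three-vertex toy graph (made-up data).
graph = [
    Vertex(1, out_edges=[("knows", 2), ("created", 3)]),
    Vertex(2, out_edges=[("created", 3)], in_edges=[("knows", 1)]),
    Vertex(3, in_edges=[("created", 1), ("created", 2)]),
]

print(edge_label_distribution(graph))
```

Because every record carries its own edges, the map step needs no lookups outside the record it is given, which is what lets such a step run independently on each partition of the graph.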
 

 

The Speaker:

Dr. Marko A. Rodriguez has focused his academic and commercial career on graph theory, network science, and graph-system architecture and development. He is a TinkerPop cofounder and serves as the lead developer of the Gremlin graph traversal language. Marko received his Bachelor's in Cognitive Science from UC San Diego and his Master's and Ph.D. in Computer Science from UC Santa Cruz, and was a Director’s Fellow at the Center for Nonlinear Studies of the Los Alamos National Laboratory. Currently, Marko is CEO and an engineer at Aurelius, a graph computing firm based in Santa Fe, New Mexico.
 

Video of the talk:

https://www.youtube.com/watch?feature=player_embedded&v=ALhjzlNuZdA