How to use GraphFrames on Databricks

This article includes example notebooks to help you get started using GraphFrames on Databricks. GraphFrames is a package for Apache Spark that provides DataFrame-based graphs. It provides high-level APIs in Java, Python, and Scala. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
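As a rough illustration of DataFrame-based graphs and motif finding, the sketch below builds a small graph and searches for pairs of vertices connected in both directions. The toy data, the `relationship` column, and the function name `find_mutual_follows` are assumptions made for this example and do not come from the notebooks. It also assumes GraphFrames is installed on the cluster and that a `SparkSession` is passed in.

```python
# Toy data for illustration only; these names and values are hypothetical.
VERTICES = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
EDGES = [("a", "b", "follows"), ("b", "a", "follows"), ("b", "c", "follows")]


def find_mutual_follows(spark):
    """Build a GraphFrame and return pairs of vertices that point at each other."""
    # Imported lazily so this module still loads where GraphFrames is missing.
    from graphframes import GraphFrame

    # GraphFrames expects an "id" column on vertices and "src"/"dst" on edges.
    vertices = spark.createDataFrame(VERTICES, ["id", "name"])
    edges = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])
    graph = GraphFrame(vertices, edges)

    # Motif: an edge from x to y together with an edge from y back to x.
    motifs = graph.find("(x)-[e1]->(y); (y)-[e2]->(x)")

    # Each mutual pair matches twice, as (x, y) and (y, x); keep one ordering.
    return motifs.filter("x.id < y.id").selectExpr("x.id AS first", "y.id AS second")
```

In a Databricks notebook, where `spark` is already defined, calling `find_mutual_follows(spark).show()` would display the matching pairs.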

This article includes three example notebooks: an introductory notebook available in Python and in Scala, and a Python user guide. For additional examples using GraphFrames with Scala, see GraphFrames user guide - Scala.

Databricks Runtime recommendation for GraphFrames

Databricks recommends using a cluster running Databricks Runtime for Machine Learning, as it includes an optimized installation of GraphFrames.

If you are not using a cluster running Databricks Runtime ML, download the JAR file from the GraphFrames library, upload it to a volume, and install it on your cluster.
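Before deciding whether a manual install is needed, one quick check is whether the `graphframes` Python module resolves on the cluster. The helper below is a minimal sketch using only the standard library; the function name is hypothetical. It confirms only that the Python package can be found, not that the matching JAR is on the JVM classpath, so treat a `True` result as a hint rather than proof.

```python
import importlib.util


def graphframes_python_available() -> bool:
    """Return True if the `graphframes` Python module can be located here."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec("graphframes") is not None
```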

Get started with GraphFrames

The following notebooks show you how to use GraphFrames to perform graph analysis.

Graph Analysis with GraphFrames (Python)


Graph Analysis with GraphFrames (Scala)


GraphFrames user guide (Python)

The following notebook includes Python code examples of how to use GraphFrames.
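Independently of the notebook, here is a minimal sketch of two common graph analysis queries: counting incoming edges per vertex and running PageRank. The flight data and the names `FLIGHTS`, `airports`, and `analyze` are assumptions made for this example, and the PageRank settings are arbitrary illustrative values. GraphFrames must be installed on the cluster for `analyze` to run.

```python
# Hypothetical toy data: each tuple is a directed edge (origin, destination).
FLIGHTS = [("SFO", "JFK"), ("LAX", "JFK"), ("JFK", "SFO")]


def airports(flights):
    """Return the sorted, de-duplicated airport codes that appear in any edge."""
    return sorted({code for pair in flights for code in pair})


def analyze(spark):
    """Return (in-degree DataFrame, PageRank DataFrame) for the toy graph."""
    # Imported lazily so this module still loads where GraphFrames is missing.
    from graphframes import GraphFrame

    vertices = spark.createDataFrame([(code,) for code in airports(FLIGHTS)], ["id"])
    edges = spark.createDataFrame(FLIGHTS, ["src", "dst"])
    graph = GraphFrame(vertices, edges)

    # inDegrees is a DataFrame with columns "id" and "inDegree".
    in_degrees = graph.inDegrees

    # pageRank returns a new GraphFrame whose vertices carry a "pagerank" column.
    ranked = graph.pageRank(resetProbability=0.15, maxIter=10)
    return in_degrees, ranked.vertices.select("id", "pagerank")
```

In a notebook, `in_degrees, ranks = analyze(spark)` followed by `in_degrees.show()` and `ranks.show()` would display both results.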

GraphFrames Python notebook

