Neo4j

Experimental

The legacy query federation documentation has been retired and might not be updated. The configurations mentioned in this content are not officially endorsed or tested by Databricks. If Lakehouse Federation supports your source database, Databricks recommends using that instead.

Neo4j is a native graph database that leverages data relationships as first-class entities. You can connect a Databricks cluster to a Neo4j cluster using the neo4j-spark-connector, which offers Apache Spark APIs for RDD, DataFrame, and GraphFrames. The neo4j-spark-connector uses the binary Bolt protocol to transfer data to and from the Neo4j server.

This article describes how to deploy and configure Neo4j, and how to configure Databricks to access it.

Neo4j deployment and configuration

You can deploy Neo4j on various cloud providers.

To deploy Neo4j on AWS EC2 using a custom AMI, follow the instructions in Hosting Neo4j on EC2 on AWS. For other options, see the official Neo4j cloud deployment guide. This guide assumes Neo4j 3.2.2.

Change the Neo4j password from the default (Neo4j should prompt you the first time you access it), then modify conf/neo4j.conf to accept remote connections by enabling the Bolt connector and binding it to 0.0.0.0:7687.

ini
# conf/neo4j.conf

# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=0.0.0.0:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=0.0.0.0:7473

For more information, see Configuring Neo4j Connectors.

Databricks configuration

If your Neo4j cluster is running in AWS and you want to use private IPs, see the VPC Peering guide.

Install two libraries, neo4j-spark-connector and graphframes, as Spark Packages. See the libraries guide for instructions.

Create a cluster with these Spark configurations.

Bash
spark.neo4j.bolt.url bolt://<ip-of-neo4j-instance>:7687
spark.neo4j.bolt.user <username>
spark.neo4j.bolt.password <password>
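
Before running Cypher queries, it can help to confirm that the cluster can reach the Bolt port at all, since a blocked port or missing VPC peering route otherwise shows up only as a connector error. The following is a minimal sketch for a notebook cell that uses only the Java standard library. The helper name `canReach` and its default timeout are illustrative assumptions, not part of the connector.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper: returns true if a TCP connection to host:port
// succeeds within timeoutMs milliseconds, false on any connection error.
def canReach(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val socket = new Socket()
  try {
    Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
  } finally {
    socket.close()
  }
}
```

For example, `canReach("<ip-of-neo4j-instance>", 7687)` should return `true` when the Bolt listen address and the network rules are set up as described above. This only checks TCP reachability; it does not validate the username or password.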

Import libraries and test the connection.

Scala
import org.neo4j.spark._
import org.graphframes._

val neo = Neo4j(sc)

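// Simple Cypher query to check the connection. It returns node IDs so that
// the result matches the Long element type requested from loadRdd.
val testConnection = neo.cypher("MATCH (n) RETURN id(n) AS id LIMIT 1").loadRdd[Long]

// RDDs are lazy: run an action so that the query is actually sent to Neo4j.
testConnection.count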
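
Once the connection works, the same builder can return a DataFrame instead of an RDD. The following is a minimal sketch that assumes the 2.x neo4j-spark-connector API (`Neo4j(sc).cypher(...).loadDataFrame`) and a database that contains at least one node. The query and the name `idsDf` are illustrative only.

```scala
import org.neo4j.spark._

// `sc` is the SparkContext that Databricks notebooks provide.
val neo = Neo4j(sc)

// Return scalar columns rather than whole nodes so that the result
// maps onto a flat DataFrame schema.
val idsDf = neo.cypher("MATCH (n) RETURN id(n) AS id LIMIT 10").loadDataFrame

// Inspect the schema and a few rows.
idsDf.printSchema()
idsDf.show()
```

Keeping a `LIMIT` on exploratory queries avoids pulling the whole graph into the cluster while you are still validating the setup.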
