Neo4j is a native graph database that leverages data relationships as first-class entities. You can connect a cluster in Databricks to a Neo4j cluster using the neo4j-spark-connector, which offers Spark APIs for RDD, DataFrame, GraphX and GraphFrames. The neo4j-spark-connector uses the binary Bolt protocol to transfer data to and from the Neo4j server.
Neo4j can be deployed on various cloud providers, including Azure, DigitalOcean, and AWS EC2.
Make sure the Neo4j password has been changed from the default (you should be prompted when you first access Neo4j), and modify conf/neo4j.conf to accept remote connections. For more information, see Configuring Neo4j Connectors.
    # conf/neo4j.conf

    # Bolt connector
    dbms.connector.bolt.enabled=true
    #dbms.connector.bolt.tls_level=OPTIONAL
    dbms.connector.bolt.listen_address=0.0.0.0:7687

    # HTTP Connector. There must be exactly one HTTP connector.
    dbms.connector.http.enabled=true
    #dbms.connector.http.listen_address=0.0.0.0:7474

    # HTTPS Connector. There can be zero or one HTTPS connectors.
    dbms.connector.https.enabled=true
    #dbms.connector.https.listen_address=0.0.0.0:7473
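Before restarting Neo4j, it can help to confirm that the edited file really enables Bolt on all interfaces. The following is a minimal sketch, not part of Neo4j or the connector: the helper names `parse_neo4j_conf` and `bolt_listens_on_all_interfaces` are hypothetical, and the check is deliberately narrow. It assumes a missing `dbms.connector.bolt.enabled` key means "not enabled" and only accepts an explicit `0.0.0.0` listen address.

```python
def parse_neo4j_conf(text: str) -> dict:
    """Parse key=value settings from neo4j.conf text, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out settings are ignored
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def bolt_listens_on_all_interfaces(settings: dict) -> bool:
    """Return True if Bolt is enabled and explicitly bound to 0.0.0.0.

    Assumption for this sketch: a missing 'enabled' key counts as disabled,
    and any listen address other than 0.0.0.0 is rejected.
    """
    enabled = settings.get("dbms.connector.bolt.enabled", "").lower() == "true"
    address = settings.get("dbms.connector.bolt.listen_address", "")
    host = address.rpartition(":")[0]  # text before the final ':' (the port separator)
    return enabled and host == "0.0.0.0"
```

For example, passing the contents of conf/neo4j.conf to `parse_neo4j_conf` and then calling `bolt_listens_on_all_interfaces` on the result returns `True` for the settings shown above, and `False` if the listen address is left commented out.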
If your Neo4j cluster is running in AWS and you’d like to use private IPs, see the VPC Peering guide.
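Whichever network path you choose, a quick way to confirm that the Bolt port is reachable is to open a plain TCP connection to it from a notebook attached to the cluster. This is a minimal sketch using only the Python standard library; the function name `bolt_port_reachable` is hypothetical, and the check only proves that the port accepts TCP connections, not that authentication or the Bolt handshake will succeed.

```python
import socket


def bolt_port_reachable(host: str, port: int = 7687, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `bolt_port_reachable("<your-neo4j-host>")` with the public or private address of your Neo4j server returns `False` when a firewall, security group, or missing peering route blocks port 7687.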