%sh
echo "Success will show 5 packets sent with 0 packet loss"
ping -c 5 172.16.1.4
echo "Look for 0 packet loss"
echo "You should see: (UNKNOWN) [IP] PORT (?) open"
nc -nzv 172.16.1.4 9042
echo ""
Success will show 5 packets sent with 0 packet loss
PING 172.16.1.4 (172.16.1.4): 56 data bytes
64 bytes from 172.16.1.4: icmp_seq=0 ttl=63 time=2.287 ms
64 bytes from 172.16.1.4: icmp_seq=1 ttl=63 time=0.927 ms
64 bytes from 172.16.1.4: icmp_seq=2 ttl=63 time=1.078 ms
64 bytes from 172.16.1.4: icmp_seq=3 ttl=63 time=1.110 ms
64 bytes from 172.16.1.4: icmp_seq=4 ttl=63 time=1.128 ms
--- 172.16.1.4 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.927/1.306/2.287/0.496 ms
Look for 0 packet loss
You should see: (UNKNOWN) [IP] PORT (?) open
(UNKNOWN) [172.16.1.4] 9042 (?) open
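The check above relies on reading the ping summary by eye. As a minimal sketch, the same check could be turned into an exit status by parsing the summary line. The helper names `packet_loss_pct` and `ping_is_clean` are hypothetical and are not part of ping, nc, or Databricks. The pattern assumes a summary line ending in "N% packet loss", as in the output shown here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads ping output on stdin and prints the packet-loss
# percentage taken from the summary line (prints nothing if no summary is found).
packet_loss_pct() {
  sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: succeeds only when the summary reports zero loss.
ping_is_clean() {
  local loss
  loss="$(packet_loss_pct)"
  [ "$loss" = "0" ] || [ "$loss" = "0.0" ]
}

# Usage (same host and port as the cell above):
#   ping -c 5 172.16.1.4 | ping_is_clean && nc -nzv 172.16.1.4 9042
```

With this, the port check only runs if the ping summary reports no loss, and a failure of either step shows up as a non-zero exit status rather than something to spot in the output.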
val cassandraHostIP = "172.16.1.4"

// Create script folder
dbutils.fs.mkdirs("/databricks/scripts")

// Add init script that adds the Cassandra hostname to all worker nodes
dbutils.fs.put(s"/databricks/scripts/cassandra.sh",
  s"""
#!/usr/bin/bash
echo '[driver]."spark.cassandra.connection.host" = "$cassandraHostIP"' >> /home/ubuntu/databricks/common/conf/cassandra.conf
""".trim, true)
Wrote 139 bytes.
sparkClusterName: String = DavesCassandraCluster
cassandraHostIP: String = 172.16.1.4
res2: Boolean = true
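The init script written above appends its line with `>>` every time it runs. As a hedged alternative, not the notebook author's method, the append could be guarded so that the line is written only when it is not already present. `append_once` is a hypothetical helper name. The config line and path in the usage comment are copied from the cell above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line. Creates FILE if it does not exist.
append_once() {
  local line="$1" file="$2"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage inside an init script:
#   append_once '[driver]."spark.cassandra.connection.host" = "172.16.1.4"' \
#     /home/ubuntu/databricks/common/conf/cassandra.conf
```

Fixed-string matching matters here because the config line starts with `[driver]`, which grep would otherwise read as a bracket expression.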
// Start with a system table
val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "roles",
    "keyspace" -> "system_auth"))
  .load()

df.explain()
== Physical Plan ==
*(1) Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@1737cc8f [role#30,can_login#31,is_superuser#32,member_of#33,salted_hash#34] PushedFilters: [], ReadSchema: struct<role:string,can_login:boolean,is_superuser:boolean,member_of:array<string>,salted_hash:str...
df: org.apache.spark.sql.DataFrame = [role: string, can_login: boolean ... 3 more fields]
Connecting Azure Databricks to Cassandra