Package Cells

Package cells are special cells that get compiled when executed. These cells have no visibility with respect to the rest of the notebook; you may think of them as separate Scala files. This means that only class and object definitions may go inside a package cell. You may not have any variable or function definitions lying around by themselves.

If you wish to use custom classes and/or objects defined within notebooks reliably in Spark, and across notebook sessions, you must use package cells to define those classes. Unless you use package cells to define classes, you may also come across obscure bugs as follows:

// TestKey is defined directly in the notebook rather than in a package cell,
// and we use that class as a key in the groupBy.
case class TestKey(id: Long, str: String)

val rdd = sc.parallelize(Array(
  (TestKey(1L, "abd"), "dss"),
  (TestKey(2L, "ggs"), "dse"),
  (TestKey(1L, "abd"), "qrf")))
rdd.groupByKey().collect
rdd: org.apache.spark.rdd.RDD[(TestKey, String)] = ParallelCollectionRDD[11874] at parallelize at <console>:42
res10: Array[(TestKey, Iterable[String])] = Array((TestKey(2,ggs),CompactBuffer(dse)), (TestKey(1,abd),CompactBuffer(dss)), (TestKey(1,abd),CompactBuffer(qrf)))
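Notice that the two TestKey(1, "abd") records end up in separate groups even though the keys are identical. A minimal sketch (not part of the original notebook) that makes the symptom easier to spot; the expected counts are assumptions based on the output above:

// With a well-behaved key there are only two distinct keys, so both counts
// should be 2; with the notebook-defined TestKey they may come back as 3.
rdd.keys.distinct.count
rdd.groupByKey().count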
Defining the class inside a package cell fixes the problem:

package com.databricks.example

case class TestKey(id: Long, str: String)
Warning: classes defined within packages cannot be redefined without a cluster restart.
Compilation successful.
import com.databricks.example

val rdd = sc.parallelize(Array(
  (example.TestKey(1L, "abd"), "dss"),
  (example.TestKey(2L, "ggs"), "dse"),
  (example.TestKey(1L, "abd"), "qrf")))
rdd.groupByKey().collect
import com.databricks.example
rdd: org.apache.spark.rdd.RDD[(com.databricks.example.TestKey, String)] = ParallelCollectionRDD[11876] at parallelize at <console>:43
res12: Array[(com.databricks.example.TestKey, Iterable[String])] = Array((TestKey(2,ggs),CompactBuffer(dse)), (TestKey(1,abd),CompactBuffer(dss, qrf)))
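With the packaged TestKey, identical keys are now grouped together. The same consistency carries over to other key-based operations; a small illustrative sketch (not from the original notebook), with expected results noted as assumptions:

// Both TestKey(1, "abd") pairs should now collapse into a single entry.
rdd.keys.distinct.count                 // expected: 2
rdd.reduceByKey(_ + " " + _).collect    // expected: one combined value per distinct key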
Because only class and object definitions may go inside a package cell, the following cell will not work:

package x.y.z

val aNumber = 5 // won't work
def functionThatWillNotWork(a: Int): Int = a + 1
Compilation failed.
Wrapping the definitions in an object, however, works:

package x.y.z

object Utils {
  val aNumber = 5 // works!
  def functionThatWillWork(a: Int): Int = a + 1
}
Warning: classes defined within packages cannot be redefined without a cluster restart.
Compilation successful.
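Members of the packaged object can then be used from ordinary notebook cells. A minimal usage sketch (not part of the original notebook):

import x.y.z.Utils

Utils.aNumber                    // 5
Utils.functionThatWillWork(41)   // 42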
Classes and objects defined in a package cell can also use the SparkContext, either through a constructor parameter or as a method argument:

package x.y.zpackage

import org.apache.spark.SparkContext

case class IntArray(values: Array[Int])

class MyClass(sc: SparkContext) {
  def sparkSum(array: IntArray): Int = {
    sc.parallelize(array.values).reduce(_ + _)
  }
}

object MyClass {
  def sparkSum(sc: SparkContext, array: IntArray): Int = {
    sc.parallelize(array.values).reduce(_ + _)
  }
}
Warning: classes defined within packages cannot be redefined without a cluster restart.
Compilation successful.
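Both entry points defined above can be driven from a regular notebook cell using the notebook's SparkContext. A minimal usage sketch (not from the original notebook); the sample array is illustrative:

import x.y.zpackage.{IntArray, MyClass}

val numbers = IntArray(Array(1, 2, 3, 4))
new MyClass(sc).sparkSum(numbers)   // 10
MyClass.sparkSum(sc, numbers)       // 10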