Common Errors in Notebooks

Some issues crop up repeatedly when using notebooks. This section describes the most frequent errors and the practices you should follow to avoid them.

Spark job fails with java.lang.NoClassDefFoundError

Sometimes you may come across an error like:

java.lang.NoClassDefFoundError: Could not initialize class line.....$read$

This can occur with a Spark Scala 2.11 cluster and a Scala notebook if you mix a case class definition and Dataset/DataFrame operations in the same notebook cell, and later use the case class in a Spark job in a different cell. For example, in the first cell, say you define a case class MyClass and also create a Dataset.

case class MyClass(value: Int)

val dataset = spark.createDataset(Seq(1))

Then, in a later cell, you create instances of MyClass inside a Spark job.

dataset.map { i => MyClass(i) }.count()

Solution: Move the case class definition to a cell of its own.

case class MyClass(value: Int)   // no other code in this cell

Then, in a separate cell, create the Dataset and run the job.

val dataset = spark.createDataset(Seq(1))
dataset.map { i => MyClass(i) }.count()

Spark job fails with java.lang.UnsupportedOperationException

Sometimes you may come across an error like:

java.lang.UnsupportedOperationException: Accumulator must be registered before send to executor

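This can occur with a Spark Scala 2.10 cluster and a Scala notebook. The cause and the solution are the same as for "Spark job fails with java.lang.NoClassDefFoundError": a case class definition shares a cell with Dataset/DataFrame operations, and the fix is to move the case class definition to a cell of its own.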
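
As a rough illustration of that cell layout, the sketch below separates the definition from the Dataset code. It assumes the notebook provides the same predefined `spark` session used in the earlier example; the names `Reading`, `sensorId`, and `readings` are hypothetical and chosen only for this sketch, and the cell boundaries are marked with comments.

```scala
// Cell 1: only the case class definition, no Dataset/DataFrame code.
// `Reading` is a hypothetical name used for illustration.
case class Reading(sensorId: Int)

// Cell 2: Dataset operations that use the case class inside a Spark job.
// Assumes `spark` is the session predefined by the notebook.
import spark.implicits._  // encoders; harmless if the notebook already imports them

val readings = spark.createDataset(Seq(1, 2, 3))
val total = readings.map { id => Reading(id) }.count()
```

Keeping the definition alone in its cell means that later cells refer to a class that was not compiled together with any Dataset code, which is the separation the solution above calls for.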