The SparkSession

We have been getting a lot of questions about the relationship between SparkContext, SQLContext, and HiveContext in Spark 1.x. It was really strange to have “HiveContext” as an entry point when people want to use the DataFrame API. In Spark 2.0, we are introducing SparkSession, a new entry point that subsumes SQLContext and HiveContext. For backward compatibility, we keep the Hive and SQL Contexts.

While the notebook below is in Scala, similar (nearly identical) APIs exist in Python and Java.

To read the companion blog post, click here: