2.0.1-db1 cluster image

Important

This release has been deprecated. For more information about the Databricks Runtime deprecation policy and schedule, see Databricks support lifecycles.

The following release notes provide information about the Spark 2.0.1-db1 cluster image powered by Apache Spark.

Apache Spark

In this release, Spark refers to Apache Spark 2.0.1. For more information, see the Apache Spark 2.0.1 release notes. This image also includes the following additional bug fixes:

SPARK-17697 [ML]: Fixed a bug in summary calculations that pattern match against the label without casting.
SPARK-17721 [ML][MLLIB]: Fixed multiplication of a transposed SparseMatrix by a SparseVector.
SPARK-17712 [SQL]: Fixed invalid pushdown of data-independent filters beneath aggregates.
SPARK-16343 [SQL]: Improved the PushDownPredicate rule to push down predicates correctly when the condition is non-deterministic.
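
As an illustration of why a pushdown such as the one addressed by SPARK-17712 can be invalid, the following is a minimal sketch in plain Python, not Spark's optimizer code. It assumes a global aggregate (no grouping keys) and a constant-false filter; the helper names global_count and keep_if are hypothetical. A global aggregate returns one row even for empty input, so moving the filter beneath it changes the result.

```python
def global_count(rows):
    # A global aggregate (no GROUP BY) always returns exactly one row,
    # even when its input is empty.
    return [{"count": len(rows)}]


def keep_if(rows, condition):
    # A data-independent filter: the condition is a constant and
    # never looks at the contents of a row.
    return [row for row in rows if condition]


rows = [{"id": 1}, {"id": 2}, {"id": 3}]

# Original plan: aggregate first, then filter the single result row.
filter_above = keep_if(global_count(rows), False)

# Invalid rewrite: the filter is pushed beneath the aggregate.
filter_below = global_count(keep_if(rows, False))

print(filter_above)  # []
print(filter_below)  # [{'count': 0}]
```

The two plans disagree: the original returns no rows, while the rewritten plan returns a single row with a count of 0.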

Changes and Improvements

Fixed escaping of special characters with a backslash (\) in R notebooks.
On Scala 2.10 clusters, classes defined in one Scala notebook cell can now be used in other cells of the same notebook.
Improved the performance of queries that use percentile_approx by adding partial aggregation support.
Upgraded Apache Hadoop from 2.7.2 to 2.7.3.
The default value of mapreduce.fileoutputcommitter.algorithm.version changed from 1 to 2.
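
Jobs that rely on the version 1 commit behavior may want to pin the setting explicitly. The following is a minimal sketch, assuming the value is supplied as a Spark configuration property: Spark forwards properties prefixed with spark.hadoop. to the Hadoop configuration. The helper name committer_conf is hypothetical, and how the resulting entry is applied (cluster configuration, SparkConf, or a command-line flag) depends on your setup.

```python
# Hadoop property named in the release note above.
HADOOP_KEY = "mapreduce.fileoutputcommitter.algorithm.version"


def committer_conf(version):
    """Return a Spark conf entry that pins the commit algorithm version.

    Hypothetical helper for illustration; it only builds the key/value pair.
    """
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    # Spark copies any "spark.hadoop.*" property into the Hadoop configuration.
    return {"spark.hadoop." + HADOOP_KEY: str(version)}


# Pin the previous default (version 1).
conf = committer_conf(1)
for key, value in conf.items():
    print(key, value)
```

Version 2 moves task output directly into the final destination at task commit, which shortens job commit but can leave partial output visible if a job fails; version 1 defers that move to job commit.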

System Environment

Operating System: Ubuntu 15.10
Java: 1.8.0_66-internal
Scala: 2.10.6 (Scala 2.10 cluster version) or 2.11.8 (Scala 2.11 cluster version)
Python: 2.7.10
R: R version 3.2.3 (2015-12-10)