2.0.1-db1 Cluster Image

The following release notes provide information about the Spark 2.0.1-db1 cluster image powered by Apache Spark.

Apache Spark

In this release, Spark refers to Apache Spark 2.0.1. For more information, see the Apache Spark 2.0.1 release notes. This version includes the following additional bug fixes:

  • SPARK-17697 [ML]: Fixed a bug in summary calculations that pattern match against the label without casting
  • SPARK-17721 [ML][MLLIB]: Fix for multiplying transposed SparseMatrix with SparseVector
  • SPARK-17712 [SQL]: Fix invalid pushdown of data-independent filters beneath aggregates
  • SPARK-16343 [SQL]: Improved the PushDownPredicate rule to push down predicates correctly when the condition is non-deterministic

Changes and Improvements

  • Fixed escaping of special characters with \ in R notebooks.
  • On Scala 2.10 clusters, you can now use classes defined in one Scala notebook cell from another cell in the same notebook.
  • Improved performance of queries that use percentile_approx by adding partial aggregation support.
  • Apache Hadoop upgraded to 2.7.3 from 2.7.2.
  • The default value of mapreduce.fileoutputcommitter.algorithm.version changed from 1 to 2.
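
If a job depends on the previous committer behavior, the old default can be restored per cluster or per job. The following is a minimal sketch, not an official recipe: it relies on Spark copying any property prefixed with spark.hadoop. into the Hadoop configuration, and the helper names committer_conf and as_submit_args are hypothetical, introduced here only for illustration.

```python
# Hypothetical helpers for illustration; they are not part of any Spark or Databricks API.

COMMITTER_KEY = "mapreduce.fileoutputcommitter.algorithm.version"


def committer_conf(version):
    """Return a Spark conf entry that sets the Hadoop output committer algorithm version."""
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    # Spark forwards properties prefixed with "spark.hadoop." to the Hadoop configuration.
    return {"spark.hadoop." + COMMITTER_KEY: str(version)}


def as_submit_args(conf):
    """Render a conf dict as spark-submit style '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    # Build the arguments that pin the committer back to algorithm version 1.
    print(" ".join(as_submit_args(committer_conf(1))))
```

The resulting key/value pair can be supplied wherever your cluster accepts Spark configuration. Where exactly that is depends on your deployment and is assumed here rather than taken from these notes.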

System Environment

  • Operating System: Ubuntu 15.10
  • Java: 1.8.0_66-internal
  • Scala: 2.10.6 (Scala 2.10 cluster version) / 2.11.8 (Scala 2.11 cluster version)
  • Python: 2.7.10
  • R: R version 3.2.3 (2015-12-10)