2.0.2-db1 cluster image
Databricks released this image in November 2016.
This release has been deprecated. For more information about the Databricks Runtime deprecation policy and schedule, see Databricks support lifecycles.
The following release notes provide information about the Spark 2.0.2-db1 cluster image powered by Apache Spark.
Apache Spark
The 2.0.2-db1 cluster image includes Apache Spark 2.0.2, which contains stability fixes, Apache Kafka support for Structured Streaming, and improved metrics for Structured Streaming. For more information, see the Apache Spark 2.0.2 release notes. The 2.0.2-db1 cluster image also includes the following additional bug fixes and improvements:
SPARK-18280 [CORE]: Fix potential deadlock in StandaloneSchedulerBackend.dead.
SPARK-17703 [SQL]: Add unnamed version of addReferenceObj for minor objects.
SPARK-18137 [SQL]: Fix RewriteDistinctAggregates UnresolvedException when a UDAF has a foldable TypeCheck.
SPARK-17919 [R]: Make timeout to RBackend configurable in SparkR.
Changes and Improvements
Starting with this version, users can enable Spark Session Isolation when creating clusters. With Spark Session Isolation, different notebooks attached to a cluster run in different sessions, each with its own isolated runtime configuration and current database setting.
Operating system upgraded to Ubuntu 16.04.1 LTS from Ubuntu 15.10.
Java upgraded to 1.8.0_111 from 1.8.0_66-internal.
Python upgraded to 2.7.12 from 2.7.10.
Most pre-installed Python libraries upgraded. For the list of Python libraries and their versions, see Pre-installed Python Libraries.
Fixed issues with Unicode characters in Python notebooks.
Improved performance (better pipelining) when scanning files in S3.
Improved performance for queries that use percentile_approx.
Users can set
to enable Spark Session Isolation.
Databricks redirects executor Log4j logs to stderr. Users can access stderr on the executor page and on every worker page of the cluster. Worker pages do not show the Log4j link, and executors do not write logs to Log4j files.
System Environment
Operating System: Ubuntu 16.04.1 LTS
Java: 1.8.0_111
Scala: 2.10.6 (Scala 2.10 cluster version)/2.11.8 (Scala 2.11 cluster version)
Python: 2.7.12
R: R version 3.2.3 (2015-12-10)
Pre-installed Python Libraries
Library | Version | Library | Version | Library | Version |
ansi2html | 1.1.1 | argparse | 1.2.1 | boto | 2.42.0 |
boto3 | 1.4.1 | botocore | 1.4.70 | brewer2mpl | 1.4.1 |
certifi | 2016.2.28 | cffi | 1.7.0 | chardet | 2.3.0 |
colorama | 0.3.7 | configobj | 5.0.6 | cryptography | 1.5 |
cycler | 0.10.0 | Cython | 0.24.1 | decorator | 4.0.10 |
docutils | 0.12 | enum34 | 1.1.6 | et-xmlfile | 1.0.1 |
freetype-py | 1.0.2 | funcsigs | 1.0.2 | fusepy | 2.0.4 |
futures | 3.0.5 | ggplot | 0.6.8 | html5lib | 0.999 |
idna | 2.1 | ipaddress | 1.0.16 | ipython | 2.2.0 |
ipython-genutils | 0.1.0 | jdcal | 1.2 | Jinja2 | 2.8 |
jmespath | 0.9.0 | llvmlite | 0.13.0 | lxml | 3.6.4 |
MarkupSafe | 0.23 | matplotlib | 1.5.3 | mpld3 | 0.2 |
msgpack-python | 0.4.7 | ndg-httpsclient | 0.3.3 | numba | 0.28.1 |
numpy | 1.11.1 | openpyxl | 2.3.2 | pandas | 0.18.1 |
pathlib2 | 2.1.0 | patsy | 0.4.1 | pexpect | 4.0.1 |
pickleshare | 0.7.4 | Pillow | 3.3.1 | pip | 8.1.2 |
pkg_resources | 0.0.0 | ply | 3.9 | prompt-toolkit | 1.0.7 |
psycopg2 | 2.6.2 | ptyprocess | 0.5.1 | py4j | 0.10.3 |
pyasn1 | 0.1.9 | pycparser | 2.14 | Pygments | 2.1.3 |
PyGObject | 3.20.0 | pyOpenSSL | 16.0.0 | pyparsing | 2.1.4 |
pypng | 0.0.18 | Python | 2.7.12 | python-dateutil | 2.5.3 |
python-geohash | 0.8.5 | pytz | 2016.6.1 | requests | 2.11.1 |
s3transfer | 0.1.9 | scikit-learn | 0.17.1 | scipy | 0.18.1 |
scour | 0.32 | seaborn | 0.7.1 | setuptools | 28.6.0 |
simplejson | 3.8.2 | simples3 | 1.0 | singledispatch | |
six | 1.10.0 | statsmodels | 0.6.1 | traitlets | 4.3.0 |
urllib3 | 1.19.1 | virtualenv | 15.0.1 | wcwidth | 0.1.7 |
wheel | 0.30.0a0 | wsgiref | 0.1.2 |
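As a minimal sketch of how the versions listed above could be checked against what is actually installed in a notebook, the following compares a few rows of the table with the local environment. The names `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical helpers introduced for illustration and are not part of any Databricks API. The sketch assumes `importlib.metadata` is available (Python 3.8+) and falls back to `pkg_resources` from setuptools on Python 2.7, which is the version this image ships.

```python
try:
    from importlib import metadata as _metadata  # Python 3.8+
except ImportError:  # Python 2.7, as shipped in this image
    _metadata = None
    import pkg_resources  # provided by setuptools

# A few rows copied from the table above (hypothetical selection).
EXPECTED = {
    "numpy": "1.11.1",
    "pandas": "0.18.1",
    "py4j": "0.10.3",
}


def installed_version(name):
    """Return the installed version string for `name`, or None if absent."""
    if _metadata is not None:
        try:
            return _metadata.version(name)
        except _metadata.PackageNotFoundError:
            return None
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for each library whose version differs."""
    mismatches = {}
    for name, want in sorted(expected.items()):
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    # Report every library that does not match the release notes.
    for name, (want, found) in sorted(find_mismatches(EXPECTED).items()):
        print("{}: expected {}, found {}".format(name, want, found))
```

Passing the lookup function as a parameter keeps the comparison logic independent of the environment, so it can be exercised with a plain dictionary instead of a live cluster.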
Pre-installed R Libraries
Library | Version | Library | Version | Library | Version |
abind | 1.4-3 | assertthat | 0.1 | base | 3.2.3 |
BH | 1.60.0-2 | bitops | 1.0-6 | boot | 1.3-17 |
brew | 1.0-6 | car | 2.1-3 | caret | 6.0-71 |
chron | 2.3-47 | class | 7.3-14 | cluster | 2.0.5 |
codetools | 0.2-14 | colorspace | 1.2-4 | compiler | 3.2.3 |
crayon | 1.3.1 | curl | 2.2 | data.table | 1.9.6 |
datasets | 3.2.3 | | 0.5-1 | devtools | 1.12.0 |
dichromat | 2.0-0 | digest | 0.6.9 | doMC | 1.3.4 |
dplyr | 0.5.0 | foreach | 1.4.3 | foreign | 0.8-66 |
gbm | 2.1.1 | ggplot2 | 2.1.0 | git2r | 0.15.0 |
glmnet | 2.0-5 | graphics | 3.2.3 | grDevices | 3.2.3 |
grid | 3.2.3 | gsubfn | 0.6-6 | gtable | 0.1.2 |
h2o | | httr | 1.2.1 | hwriter | 1.3.2 |
hwriterPlus | 1.0-3 | iterators | 1.0.8 | jsonlite | 1.1 |
KernSmooth | 2.23-15 | labeling | 0.3 | lattice | 0.20-34 |
lazyeval | 0.2.0 | littler | 0.3.0 | lme4 | 1.1-12 |
lubridate | 1.6.0 | magrittr | 1.5 | mapproj | 1.2-4 |
maps | 3.0.2 | | 7.3-45 | Matrix | 1.2-7.1 |
MatrixModels | 0.4-1 | memoise | 1.0.0 | methods | 3.2.3 |
mgcv | 1.8-11 | mime | 0.5 | minqa | 1.2.4 |
multicore | 0.2 | munsell | 0.4.2 | mvtnorm | 1.0-5 |
nlme | 3.1-124 | nloptr | 1.0.4 | nnet | 7.3-12 |
openssl | 0.9.4 | parallel | 3.2.3 | pbkrtest | 0.4-6 |
pkgKitten | 0.1.3 | plyr | 1.8.4 | praise | 1.0.0 |
pROC | 1.8 | proto | 0.3-10 | quantreg | 5.29 |
R.methodsS3 | 1.7.1 | R.oo | 1.20.0 | R.utils | 2.4.0 |
R6 | 2.2.0 | randomForest | 4.6-12 | RColorBrewer | 1.1-2 |
Rcpp | 0.12.7 | RcppEigen | | RCurl | 1.95-4.8 |
reshape2 | 1.4.2 | | 1.3-12 | roxygen2 | 5.0.1 |
rpart | 4.1-10 | Rserve | 1.7-3 | RSQLite | 1.0.0 |
rstudioapi | 0.6 | scales | 0.3.0 | sp | 1.0-15 |
SparkR | 2.0.2 | SparseM | 1.72 | spatial | 7.3-11 |
splines | 3.2.3 | sqldf | 0.4-10 | statmod | 1.4.26 |
stats | 3.2.3 | stats4 | 3.2.3 | stringi | 1.0-1 |
stringr | 1.0.0 | survival | 2.38-3 | tcltk | 3.2.3 |
TeachingDemos | 2.10 | testthat | 1.0.2 | tibble | 1.2 |
tools | 3.2.3 | utils | 3.2.3 | whisker | 0.3-2 |
withr | 1.0.2 |