Databricks Runtime 3.0

Databricks released this image in early July 2017.

Important

This release was deprecated on September 5, 2017. For more information about the Databricks Runtime deprecation policy and schedule, see Databricks Runtime Versions.

The following release notes provide information about Databricks Runtime 3.0, powered by Apache Spark.

Changes and Improvements

Performance and DBIO

Databricks Runtime 3.0 includes a number of updates in DBIO that improve performance, data integrity, and security:

  • Higher S3 throughput: Improves read and write performance of your Spark jobs.
  • More efficient decoding: Boosts CPU efficiency when decoding nested Parquet data structures such as arrays and structs.
  • Data skipping: Lets users leverage statistics on data files to prune files more effectively during query processing. This feature is experimental, and its APIs are likely to change in the future. See the Data Skipping Index section for details.
  • Transactional writes to S3: Provides transactional (atomic) writes to S3 for both appends and new writes. Speculation can be turned on safely.
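
To make the data skipping idea above concrete, here is a minimal sketch in plain Python. It is not the DBIO implementation or its API. It assumes each data file carries minimum and maximum statistics for one column; the FileStats type, the file names, and the prune_files function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for a single column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: List[FileStats], lower: int, upper: int) -> List[str]:
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    A file whose range lies entirely outside the predicate cannot contain
    matching rows, so a query can skip reading it.
    """
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


files = [
    FileStats("part-0000", min_value=1, max_value=100),
    FileStats("part-0001", min_value=101, max_value=200),
    FileStats("part-0002", min_value=201, max_value=300),
]

# A predicate such as `col BETWEEN 150 AND 220` can only match rows
# in files whose value range overlaps [150, 220].
selected = prune_files(files, lower=150, upper=220)
print(selected)
```

In this sketch the first file is never read, because its maximum value is below the lower bound of the predicate.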

As part of DBIO, the Amazon Redshift connector includes the following enhancements:

  • Advanced push down into Redshift: Query fragments that contain limits, samples, and aggregations can now be pushed down into Redshift for execution, reducing data movement from Redshift clusters to Spark.
  • Automatic end-to-end encryption with Redshift: Data at rest and in transit can be encrypted automatically.

DBES

  • SQL ACLs Public Preview: SQL access controls are now available in public preview for enterprise customers. SQL ACLs let administrators set fine-grained permissions on tables, views, and functions, which suits analysts who should have access to only a subset of the data. Contact your account representative for pricing and for help getting started with this feature.

Serverless

  • Improved multi-tenancy: When multiple users run workloads concurrently on the same cluster, Databricks Runtime 3.0 ensures that each user receives a fair share of resources, so users running short, interactive queries are not blocked by users running large ETL jobs.
  • Auto scaling local storage: Databricks Runtime 3.0 can automatically configure local storage and scale it on demand. Users no longer need to estimate and provision EBS volumes.
  • Improved auto-scaling stability.
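
The fair-sharing behavior described above can be pictured with a small round-robin sketch in plain Python. This is only an illustration of the general idea, not how Databricks schedules work internally; the fair_schedule function, the user names, and the task names are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def fair_schedule(queues: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Interleave tasks round-robin so every user with pending work gets a turn."""
    # Copy each user's tasks into a deque; skip users with nothing to run.
    pending: Dict[str, Deque[str]] = {
        user: deque(tasks) for user, tasks in queues.items() if tasks
    }
    order: List[Tuple[str, str]] = []
    while pending:
        # Iterate over a snapshot of the keys because finished users are removed.
        for user in list(pending):
            order.append((user, pending[user].popleft()))
            if not pending[user]:
                del pending[user]
    return order


queues = {
    "etl_user": ["etl-1", "etl-2", "etl-3", "etl-4"],
    "analyst": ["query-1"],
}
order = fair_schedule(queues)
print(order)
```

Under a first-in, first-out policy the analyst's single query would wait behind all four ETL tasks. In this sketch it runs second.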

Streaming

Other Features and Improvements

  • Added Higher Order Functions support for manipulating array data.
  • Performance improvements for window functions.
  • Added a new CSV/JSON option, badRecordsPath, that provides a unified interface for handling both corrupt records and files to facilitate data cleaning and ETL. With this option, users can obtain the bad records, together with the reason and time of each failure, from the exception logs without interrupting their jobs.
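
The idea behind badRecordsPath, setting bad input aside with a reason and a timestamp instead of failing the job, can be sketched in plain Python. This is a conceptual illustration only and does not use or reproduce the Spark option; the parse_with_bad_records function and the layout of the bad-record entries are assumptions made for this example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def parse_with_bad_records(
    lines: List[str],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Parse JSON lines, diverting malformed ones instead of raising."""
    good: List[Dict[str, Any]] = []
    bad: List[Dict[str, str]] = []
    for line in lines:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            good.append(record)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            bad.append(
                {
                    "record": line,
                    "reason": str(exc),
                    "time": datetime.now(timezone.utc).isoformat(),
                }
            )
    return good, bad


lines = ['{"id": 1}', '{"id": ', '{"id": 3}']
good, bad = parse_with_bad_records(lines)
print(good)
print(bad)
```

The two well-formed lines are parsed normally, and the truncated line is recorded alongside the reason it failed and the time it was seen.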

Environment

  • Upgraded Ubuntu to 16.04.2 LTS.
  • Upgraded AWS SDK to 1.11.126.
  • Upgraded Jackson JSON library to 2.6.5.
  • Upgraded Python library futures to 3.1.1, pyparsing to 2.2.0, and setuptools to 36.0.1.

Apache Spark

Databricks Runtime 3.0 includes Apache Spark 2.2.0, the third release on the 2.x line. This release removes the experimental tag from Structured Streaming and otherwise focuses on usability, stability, and polish, resolving over 1100 tickets.

Core and Spark SQL

Structured Streaming

  • General Availability
    • [SPARK-20844] The Structured Streaming APIs are now GA and are no longer labeled experimental
  • Kafka Improvements
    • [SPARK-19719] Support for reading and writing data in streaming or batch to/from Apache Kafka
    • [SPARK-19968] Cached producer for lower-latency Kafka-to-Kafka streams
  • API updates
    • [SPARK-19067] Support for complex stateful processing and timeouts using [flat]MapGroupsWithState
    • [SPARK-19876] Support for one time triggers
  • Other notable changes

MLlib

  • New algorithms in DataFrame-based API
    • [SPARK-14709]: LinearSVC (Linear SVM Classifier) (Scala/Java/Python/R)
    • [SPARK-19635]: ChiSquare test in DataFrame-based API (Scala/Java/Python)
    • [SPARK-19636]: Correlation in DataFrame-based API (Scala/Java/Python)
    • [SPARK-13568]: Imputer feature transformer for imputing missing values (Scala/Java/Python)
    • [SPARK-18929]: Add Tweedie distribution for GLMs (Scala/Java/Python/R)
    • [SPARK-14503]: FPGrowth frequent pattern mining and AssociationRules (Scala/Java/Python/R)
  • Existing algorithms added to Python and R APIs
  • Other major features
    • [SPARK-19535]: ALSModel recommendForAllUsers, recommendForAllItems (Scala/Java/Python)
    • [SPARK-14489]: ALS coldStartStrategy Param for supporting new/unknown user and item IDs, and for improved compatibility with model tuning (CrossValidator and TrainValidationSplit)
    • [SPARK-20047]: Box-constrained Logistic Regression
  • Minor features
    • [SPARK-14567]: Add instrumentation logs to MLlib training algorithms
    • [SPARK-17471]: Matrix functionality: methods to compress, foreachActive, convert to col/row-major format
    • [SPARK-14272]: Add LogLikelihood to GaussianMixtureModel summary
    • [SPARK-19282]: RandomForestRegressionModel should expose getMaxDepth in R
    • [SPARK-17629]: local version of Word2Vec findSynonyms
    • [SPARK-17645]: ChiSqSelector adds support for feature selection using False Discovery Rate (FDR) and Family Wise Error rate (FWE)
    • [SPARK-14975]: GBTClassificationModel, GBTRegressionModel predict class conditional probabilities
  • Dependency changes
  • Major bug fixes
    • [SPARK-19110]: DistributedLDAModel.logPrior correctness fix
    • [SPARK-17975]: EMLDAOptimizer fails with ClassCastException (caused by GraphX checkpointing bug)
    • [SPARK-18715]: Fix wrong AIC calculation in Binomial GLM
    • [SPARK-16473]: BisectingKMeans failing during training with “java.util.NoSuchElementException: key not found” for certain inputs
    • [SPARK-19348]: pyspark.ml.Pipeline gets corrupted under multi-threaded use
  • Minor bug fixes
    • [SPARK-18274]: Memory leak in PySpark StringIndexer (follow-up to major fix)
    • [SPARK-19985]: Some ML Models fail to copy or do not set parent correctly
    • [SPARK-14772]: Make Python ML Params.copy treatment of uid and ParamMaps match Scala
    • [SPARK-19400]: GLM fails for intercept-only model
    • [SPARK-20214]: PySpark Vector conversion from scipy.sparse.dok_matrix fails to sort indices
    • [SPARK-20423]: fix MLOR coeffs centering when regParam is 0
    • [SPARK-18374]: Incorrect words in StopWords/english.txt for StopWordsRemover
    • [SPARK-18036]: Decision Tree edge case improvements, including constant features
    • [SPARK-20615]: SparseVector.argmax edge case
  • Optimizations
  • Improved null and type handling
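
The ALS coldStartStrategy item above (SPARK-14489) mentions improved compatibility with model tuning. The following plain-Python sketch shows why that matters: a single NaN prediction for an unseen user or item makes an error metric NaN, while dropping such rows keeps the metric usable. The values "nan" and "drop" mirror the two strategies of the Spark parameter; the rmse helper and the sample ratings are hypothetical and are not part of MLlib.

```python
import math
from typing import List, Tuple

# Each pair is (true rating, predicted rating). A NaN prediction stands for
# a user or item that the model did not see during training.
Pair = Tuple[float, float]


def rmse(pairs: List[Pair], cold_start_strategy: str = "nan") -> float:
    """Root-mean-square error with an optional drop of NaN predictions."""
    if cold_start_strategy == "drop":
        pairs = [(t, p) for t, p in pairs if not math.isnan(p)]
    elif cold_start_strategy != "nan":
        raise ValueError("cold_start_strategy must be 'nan' or 'drop'")
    if not pairs:
        return float("nan")
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


pairs = [(4.0, 3.0), (2.0, 3.0), (5.0, float("nan"))]

print(rmse(pairs, "nan"))   # NaN poisons the metric
print(rmse(pairs, "drop"))  # metric computed over the known rows only
```

With the "nan" strategy, a cross-validation loop would compare NaN scores and could not rank candidate models. With "drop", the score reflects only the rows the model could actually predict.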

GraphX

  • Bug fixes
    • [SPARK-18847]: PageRank gives incorrect results for graphs with sinks
    • [SPARK-14804]: Graph vertexRDD/EdgeRDD checkpoint results in ClassCastException
  • Optimizations
    • [SPARK-18845]: PageRank initial value improvement for faster convergence
    • [SPARK-5484]: Pregel should checkpoint periodically to avoid StackOverflowError
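
For readers unfamiliar with the sink problem mentioned in SPARK-18847, the sketch below shows one common textbook treatment in plain Python: the rank held by vertices with no outgoing edges is redistributed evenly, so the total rank is preserved. This is not the GraphX implementation and may differ from the actual fix; the pagerank function, its parameters, and the sample graph are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pagerank(
    edges: List[Tuple[str, str]], num_iters: int = 50, damping: float = 0.85
) -> Dict[str, float]:
    """Power-iteration PageRank that spreads sink rank over all vertices."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(num_iters):
        # Rank sitting on sinks would otherwise leak out of the graph.
        sink_mass = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * sink_mass / n for v in nodes
        }
        for v in nodes:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# "c" has no outgoing edges, so it is a sink.
ranks = pagerank([("a", "b"), ("b", "c")])
print(ranks)
```

Because the sink's rank is fed back into the graph on every iteration, the ranks continue to sum to 1 instead of shrinking.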

SparkR

  • New features
    • Spark SQL in R
    • MLlib in R
      • New algorithm APIs are listed in the MLlib section.
      • Fixes
        • [SPARK-19282]: RandomForestRegressionModel should expose getMaxDepth in R
        • [SPARK-19291]: spark.gaussianMixture should provide log-likelihood
  • Bug fixes
    • Spark Core / SQL
      • [SPARK-19925]: SparkR spark.getSparkFiles fails on executor
      • [SPARK-19342]: DataType Timestamp is converted to numeric in collect method
      • [SPARK-19232]: SparkR distribution cache location is wrong on Windows
    • MLlib
      • [SPARK-19395]: Model coefficients should be matrix type
      • [SPARK-19133]: SparkR GLM: allow specifying Gamma family
      • [SPARK-19319]: SparkR K-Means summary returns error when fewer clusters are found than requested
      • [SPARK-19066]: SparkR LDA does not set optimizer correctly. This is technically a behavior change, but it changes the behavior to match the documentation and Scala/Java/Python.
  • Other improvements
    • [SPARK-17838]: Strict type checking for arguments with better messages
    • [SPARK-19324]: Do not drop JVM stdout output in SparkR
    • [SPARK-19142]: spark.kmeans should take seed, initSteps, and tol as parameters

Deprecations

  • MLlib
    • [SPARK-18613]: spark.ml LDA classes should not expose spark.mllib in APIs. In spark.ml.LDAModel, deprecated oldLocalModel and getModel.
  • SparkR

Changes of Behavior

  • MLlib
    • [SPARK-19787]: DeveloperApi ALS.train() uses default regParam value 0.1 instead of 1.0, in order to match regular ALS API’s default regParam setting.
  • SparkR
    • [SPARK-19291]: This added log-likelihood for SparkR Gaussian Mixture Models, but doing so introduced a SparkR model persistence incompatibility: Gaussian Mixture Models saved from SparkR 2.1 may not be loaded into SparkR 2.2. We plan to put in place backwards compatibility guarantees for SparkR in the future.

Known Issues

  • Log links on the executor page are not set correctly. Please use the worker page to access stdout and stderr links of an executor for now.

System Environment

  • Operating System: Ubuntu 16.04.2 LTS
  • Java: 1.8.0_131
  • Scala: 2.10.6 (Scala 2.10 cluster version) or 2.11.8 (Scala 2.11 cluster version)
  • Python: 2.7.12 (or 3.5.2 if Python 3 support is enabled)
  • R: R version 3.2.3 (2015-12-10)

Pre-installed Python Libraries

Library Version Library Version Library Version
ansi2html 1.1.1 argparse 1.2.1 boto 2.42.0
boto3 1.4.1 botocore 1.4.70 brewer2mpl 1.4.1
certifi 2016.2.28 cffi 1.7.0 chardet 2.3.0
colorama 0.3.7 configobj 5.0.6 cryptography 1.5
cycler 0.10.0 Cython 0.24.1 decorator 4.0.10
docutils 0.13.1 enum34 1.1.6 et-xmlfile 1.0.1
freetype-py 1.0.2 funcsigs 1.0.2 fusepy 2.0.4
futures 3.1.1 ggplot 0.6.8 html5lib 0.999
idna 2.1 ipaddress 1.0.16 ipython 2.2.0
ipython-genutils 0.1.0 jdcal 1.2 Jinja2 2.8
jmespath 0.9.0 llvmlite 0.13.0 lxml 3.6.4
MarkupSafe 0.23 matplotlib 1.5.3 mpld3 0.2
msgpack-python 0.4.7 ndg-httpsclient 0.3.3 numba 0.28.1
numpy 1.11.1 openpyxl 2.3.2 pandas 0.18.1
pathlib2 2.1.0 patsy 0.4.1 pexpect 4.0.1
pickleshare 0.7.4 Pillow 3.3.1 pip 9.0.1
ply 3.9 prompt-toolkit 1.0.7 psycopg2 2.6.2
ptyprocess 0.5.1 py4j 0.10.3 pyasn1 0.1.9
pycparser 2.14 Pygments 2.1.3 PyGObject 3.20.0
pyOpenSSL 16.0.0 pyparsing 2.2.0 pypng 0.0.18
Python 2.7.12 python-dateutil 2.5.3 python-geohash 0.8.5
pytz 2016.6.1 requests 2.11.1 s3transfer 0.1.9
scikit-learn 0.17.1 scipy 0.18.1 scour 0.32
seaborn 0.7.1 setuptools 36.0.1 simplejson 3.8.2
simples3 1.0 singledispatch 3.4.0.3 six 1.10.0
statsmodels 0.6.1 traitlets 4.3.0 urllib3 1.19.1
virtualenv 15.0.1 wcwidth 0.1.7 wheel 0.30.0a0
wsgiref 0.1.2

Pre-installed R Libraries

Library Version Library Version Library Version
abind 1.4-3 assertthat 0.1 base 3.2.3
BH 1.60.0-2 bitops 1.0-6 boot 1.3-17
brew 1.0-6 car 2.1-3 caret 6.0-71
chron 2.3-47 class 7.3-14 cluster 2.0.5
codetools 0.2-14 colorspace 1.2-4 compiler 3.2.3
crayon 1.3.1 curl 2.2 data.table 1.9.6
datasets 3.2.3 DBI 0.5-1 devtools 1.12.0
dichromat 2.0-0 digest 0.6.9 doMC 1.3.4
dplyr 0.5.0 foreach 1.4.3 foreign 0.8-66
gbm 2.1.1 ggplot2 2.1.0 git2r 0.15.0
glmnet 2.0-5 graphics 3.2.3 grDevices 3.2.3
grid 3.2.3 gsubfn 0.6-6 gtable 0.1.2
h2o 3.10.0.8 httr 1.2.1 hwriter 1.3.2
hwriterPlus 1.0-3 iterators 1.0.8 jsonlite 1.1
KernSmooth 2.23-15 labeling 0.3 lattice 0.20-34
lazyeval 0.2.0 littler 0.3.0 lme4 1.1-12
lubridate 1.6.0 magrittr 1.5 mapproj 1.2-4
maps 3.0.2 MASS 7.3-45 Matrix 1.2-7.1
MatrixModels 0.4-1 memoise 1.0.0 methods 3.2.3
mgcv 1.8-11 mime 0.5 minqa 1.2.4
multicore 0.2 munsell 0.4.2 mvtnorm 1.0-5
nlme 3.1-124 nloptr 1.0.4 nnet 7.3-12
openssl 0.9.4 parallel 3.2.3 pbkrtest 0.4-6
pkgKitten 0.1.3 plyr 1.8.4 praise 1.0.0
pROC 1.8 proto 0.3-10 quantreg 5.29
R.methodsS3 1.7.1 R.oo 1.20.0 R.utils 2.4.0
R6 2.2.0 randomForest 4.6-12 RColorBrewer 1.1-2
Rcpp 0.12.7 RcppEigen 0.3.2.9.0 RCurl 1.95-4.8
reshape2 1.4.2 RODBC 1.3-12 roxygen2 5.0.1
rpart 4.1-10 Rserve 1.7-3 RSQLite 1.0.0
rstudioapi 0.6 scales 0.3.0 sp 1.0-15
SparkR 2.2.0 SparseM 1.72 spatial 7.3-11
splines 3.2.3 sqldf 0.4-10 statmod 1.4.26
stats 3.2.3 stats4 3.2.3 stringi 1.0-1
stringr 1.0.0 survival 2.38-3 tcltk 3.2.3
TeachingDemos 2.10 testthat 1.0.2 tibble 1.2
tools 3.2.3 utils 3.2.3 whisker 0.3-2
withr 1.0.2

Pre-installed Java and Scala libraries (Scala 2.10 cluster version)

Group ID Artifact ID Version
antlr antlr 2.7.7
com.amazonaws amazon-kinesis-client 1.7.3
com.amazonaws aws-java-sdk-autoscaling 1.11.126
com.amazonaws aws-java-sdk-cloudformation 1.11.126
com.amazonaws aws-java-sdk-cloudfront 1.11.126
com.amazonaws aws-java-sdk-cloudhsm 1.11.126
com.amazonaws aws-java-sdk-cloudsearch 1.11.126
com.amazonaws aws-java-sdk-cloudtrail 1.11.126
com.amazonaws aws-java-sdk-cloudwatch 1.11.126
com.amazonaws aws-java-sdk-cloudwatchmetrics 1.11.126
com.amazonaws aws-java-sdk-codedeploy 1.11.126
com.amazonaws aws-java-sdk-cognitoidentity 1.11.126
com.amazonaws aws-java-sdk-cognitosync 1.11.126
com.amazonaws aws-java-sdk-config 1.11.126
com.amazonaws aws-java-sdk-core 1.11.126
com.amazonaws aws-java-sdk-datapipeline 1.11.126
com.amazonaws aws-java-sdk-directconnect 1.11.126
com.amazonaws aws-java-sdk-directory 1.11.126
com.amazonaws aws-java-sdk-dynamodb 1.11.126
com.amazonaws aws-java-sdk-ec2 1.11.126
com.amazonaws aws-java-sdk-ecs 1.11.126
com.amazonaws aws-java-sdk-efs 1.11.126
com.amazonaws aws-java-sdk-elasticache 1.11.126
com.amazonaws aws-java-sdk-elasticbeanstalk 1.11.126
com.amazonaws aws-java-sdk-elasticloadbalancing 1.11.126
com.amazonaws aws-java-sdk-elastictranscoder 1.11.126
com.amazonaws aws-java-sdk-emr 1.11.126
com.amazonaws aws-java-sdk-glacier 1.11.126
com.amazonaws aws-java-sdk-iam 1.11.126
com.amazonaws aws-java-sdk-importexport 1.11.126
com.amazonaws aws-java-sdk-kinesis 1.11.126
com.amazonaws aws-java-sdk-kms 1.11.126
com.amazonaws aws-java-sdk-lambda 1.11.126
com.amazonaws aws-java-sdk-logs 1.11.126
com.amazonaws aws-java-sdk-machinelearning 1.11.126
com.amazonaws aws-java-sdk-opsworks 1.11.126
com.amazonaws aws-java-sdk-rds 1.11.126
com.amazonaws aws-java-sdk-redshift 1.11.126
com.amazonaws aws-java-sdk-route53 1.11.126
com.amazonaws aws-java-sdk-s3 1.11.126
com.amazonaws aws-java-sdk-ses 1.11.126
com.amazonaws aws-java-sdk-simpledb 1.11.126
com.amazonaws aws-java-sdk-simpleworkflow 1.11.126
com.amazonaws aws-java-sdk-sns 1.11.126
com.amazonaws aws-java-sdk-sqs 1.11.126
com.amazonaws aws-java-sdk-ssm 1.11.126
com.amazonaws aws-java-sdk-storagegateway 1.11.126
com.amazonaws aws-java-sdk-sts 1.11.126
com.amazonaws aws-java-sdk-support 1.11.126
com.amazonaws aws-java-sdk-swf-libraries 1.11.22
com.amazonaws aws-java-sdk-workspaces 1.11.126
com.amazonaws jmespath-java 1.11.126
com.chuusai shapeless_2.10 2.3.2
com.clearspring.analytics stream 2.7.0
com.databricks Rserve 1.8-3
com.databricks dbml-local_2.10 0.1.2-spark2.1
com.databricks dbml-local_2.10-tests 0.1.2-spark2.1
com.databricks jets3t 0.7.1-0
com.databricks.scalapb compilerplugin_2.10 0.4.15-9
com.databricks.scalapb scalapb-runtime_2.10 0.4.15-9
com.esotericsoftware kryo-shaded 3.0.3
com.esotericsoftware minlog 1.3.0
com.fasterxml classmate 1.0.0
com.fasterxml.jackson.core jackson-annotations 2.6.5
com.fasterxml.jackson.core jackson-core 2.6.5
com.fasterxml.jackson.core jackson-databind 2.6.5
com.fasterxml.jackson.dataformat jackson-dataformat-cbor 2.6.5
com.fasterxml.jackson.datatype jackson-datatype-joda 2.6.5
com.fasterxml.jackson.module jackson-module-paranamer 2.6.5
com.fasterxml.jackson.module jackson-module-scala_2.10 2.6.5
com.github.fommil jniloader 1.1
com.github.fommil.netlib core 1.1.2
com.github.fommil.netlib native_ref-java 1.1
com.github.fommil.netlib native_ref-java-natives 1.1
com.github.fommil.netlib native_system-java 1.1
com.github.fommil.netlib native_system-java-natives 1.1
com.github.fommil.netlib netlib-native_ref-linux-x86_64-natives 1.1
com.github.fommil.netlib netlib-native_system-linux-x86_64-natives 1.1
com.github.rwl jtransforms 2.4.0
com.google.code.findbugs jsr305 2.0.1
com.google.code.gson gson 2.2.4
com.google.guava guava 15.0
com.google.protobuf protobuf-java 2.6.1
com.googlecode.javaewah JavaEWAH 0.3.2
com.h2database h2 1.3.174
com.jamesmurty.utils java-xmlbuilder 1.0
com.jcraft jsch 0.1.50
com.jolbox bonecp 0.8.0.RELEASE
com.mchange c3p0 0.9.5.1
com.mchange mchange-commons-java 0.2.10
com.microsoft.azure azure-data-lake-store-sdk 2.0.11
com.microsoft.azure azure-storage 2.0.0
com.ning compress-lzf 1.0.3
com.sun.mail javax.mail 1.5.2
com.thoughtworks.paranamer paranamer 2.6
com.trueaccord.lenses lenses_2.10 0.3
com.twitter chill-java 0.8.0
com.twitter chill_2.10 0.8.0
com.twitter parquet-hadoop-bundle 1.6.0
com.twitter util-app_2.10 6.23.0
com.twitter util-core_2.10 6.23.0
com.twitter util-jvm_2.10 6.23.0
com.typesafe config 1.2.1
com.typesafe scalalogging-slf4j_2.10 1.1.0
com.univocity univocity-parsers 2.2.1
com.zaxxer HikariCP 2.4.1
commons-beanutils commons-beanutils 1.7.0
commons-beanutils commons-beanutils-core 1.8.0
commons-cli commons-cli 1.2
commons-codec commons-codec 1.10
commons-collections commons-collections 3.2.2
commons-configuration commons-configuration 1.6
commons-dbcp commons-dbcp 1.4
commons-digester commons-digester 1.8
commons-httpclient commons-httpclient 3.1
commons-io commons-io 2.4
commons-lang commons-lang 2.6
commons-logging commons-logging 1.1.3
commons-net commons-net 2.2
commons-pool commons-pool 1.5.4
info.ganglia.gmetric4j gmetric4j 1.0.7
io.dropwizard.metrics metrics-core 3.1.2
io.dropwizard.metrics metrics-ganglia 3.1.2
io.dropwizard.metrics metrics-graphite 3.1.2
io.dropwizard.metrics metrics-healthchecks 3.1.2
io.dropwizard.metrics metrics-jetty9 3.1.2
io.dropwizard.metrics metrics-json 3.1.2
io.dropwizard.metrics metrics-jvm 3.1.2
io.dropwizard.metrics metrics-log4j 3.1.2
io.dropwizard.metrics metrics-servlets 3.1.2
io.netty netty 3.9.9.Final
io.netty netty-all 4.0.43.Final
io.prometheus simpleclient 0.0.16
io.prometheus simpleclient_common 0.0.16
io.prometheus simpleclient_dropwizard 0.0.16
io.prometheus simpleclient_servlet 0.0.16
io.prometheus.jmx collector 0.7
javax.activation activation 1.1.1
javax.annotation javax.annotation-api 1.2
javax.el javax.el-api 2.2.4
javax.jdo jdo-api 3.0.1
javax.servlet javax.servlet-api 3.1.0
javax.servlet.jsp jsp-api 2.1
javax.transaction jta 1.1
javax.validation validation-api 1.1.0.Final
javax.ws.rs javax.ws.rs-api 2.0.1
javax.xml.bind jaxb-api 2.2.2
javax.xml.stream stax-api 1.0-2
javolution javolution 5.5.1
jline jline 2.11
joda-time joda-time 2.9.3
log4j apache-log4j-extras 1.2.17
log4j log4j 1.2.17
mx4j mx4j 3.0.2
mysql mysql-connector-java 5.1.27
net.hydromatic eigenbase-properties 1.1.5
net.iharder base64 2.3.8
net.java.dev.jets3t jets3t 0.9.3
net.jpountz.lz4 lz4 1.3.0
net.razorvine pyrolite 4.13
net.sf.jpam jpam 1.1
net.sf.opencsv opencsv 2.3
net.sf.py4j py4j 0.10.4
net.sf.supercsv super-csv 2.2.0
net.sourceforge.f2j arpack_combined_all 0.1
org.acplt oncrpc 1.0.7
org.antlr ST4 4.0.4
org.antlr antlr-runtime 3.4
org.antlr antlr4-runtime 4.5.3
org.antlr stringtemplate 3.2.1
org.apache.ant ant 1.9.2
org.apache.ant ant-jsch 1.9.2
org.apache.ant ant-launcher 1.9.2
org.apache.avro avro 1.7.7
org.apache.avro avro-ipc 1.7.7
org.apache.avro avro-ipc-tests 1.7.7
org.apache.avro avro-mapred-hadoop2 1.7.7
org.apache.calcite calcite-avatica 1.2.0-incubating
org.apache.calcite calcite-core 1.2.0-incubating
org.apache.calcite calcite-linq4j 1.2.0-incubating
org.apache.commons commons-compress 1.4.1
org.apache.commons commons-crypto 1.0.0
org.apache.commons commons-lang3 3.5
org.apache.commons commons-math3 3.4.1
org.apache.curator curator-client 2.6.0
org.apache.curator curator-framework 2.6.0
org.apache.curator curator-recipes 2.6.0
org.apache.derby derby 10.10.2.0
org.apache.directory.api api-asn1-api 1.0.0-M20
org.apache.directory.api api-util 1.0.0-M20
org.apache.directory.server apacheds-i18n 2.0.0-M15
org.apache.directory.server apacheds-kerberos-codec 2.0.0-M15
org.apache.hadoop hadoop-annotations 2.7.3
org.apache.hadoop hadoop-auth 2.7.3
org.apache.hadoop hadoop-azure 2.7.3
org.apache.hadoop hadoop-client 2.7.3
org.apache.hadoop hadoop-common 2.7.3
org.apache.hadoop hadoop-hdfs 2.7.3
org.apache.hadoop hadoop-mapreduce-client-app 2.7.3
org.apache.hadoop hadoop-mapreduce-client-common 2.7.3
org.apache.hadoop hadoop-mapreduce-client-core 2.7.3
org.apache.hadoop hadoop-mapreduce-client-jobclient 2.7.3
org.apache.hadoop hadoop-mapreduce-client-shuffle 2.7.3
org.apache.hadoop hadoop-yarn-api 2.7.3
org.apache.hadoop hadoop-yarn-client 2.7.3
org.apache.hadoop hadoop-yarn-common 2.7.3
org.apache.hadoop hadoop-yarn-server-common 2.7.3
org.apache.htrace htrace-core 3.1.0-incubating
org.apache.httpcomponents httpclient 4.5.2
org.apache.httpcomponents httpcore 4.4.4
org.apache.ivy ivy 2.4.0
org.apache.parquet parquet-column 1.8.2
org.apache.parquet parquet-common 1.8.2
org.apache.parquet parquet-encoding 1.8.2
org.apache.parquet parquet-format 2.3.1
org.apache.parquet parquet-hadoop 1.8.2
org.apache.parquet parquet-jackson 1.8.2
org.apache.thrift libfb303 0.9.3
org.apache.thrift libthrift 0.9.3
org.apache.xbean xbean-asm5-shaded 4.4
org.apache.zookeeper zookeeper 3.4.6
org.bouncycastle bcprov-jdk15on 1.51
org.codehaus.jackson jackson-core-asl 1.9.13
org.codehaus.jackson jackson-jaxrs 1.9.13
org.codehaus.jackson jackson-mapper-asl 1.9.13
org.codehaus.jackson jackson-xc 1.9.13
org.codehaus.janino commons-compiler 3.0.0
org.codehaus.janino janino 3.0.0
org.datanucleus datanucleus-api-jdo 3.2.6
org.datanucleus datanucleus-core 3.2.10
org.datanucleus datanucleus-rdbms 3.2.9
org.eclipse.jetty jetty-client 9.3.11.v20160721
org.eclipse.jetty jetty-continuation 9.3.11.v20160721
org.eclipse.jetty jetty-http 9.3.11.v20160721
org.eclipse.jetty jetty-io 9.3.11.v20160721
org.eclipse.jetty jetty-jndi 9.3.11.v20160721
org.eclipse.jetty jetty-plus 9.3.11.v20160721
org.eclipse.jetty jetty-proxy 9.3.11.v20160721
org.eclipse.jetty jetty-security 9.3.11.v20160721
org.eclipse.jetty jetty-server 9.3.11.v20160721
org.eclipse.jetty jetty-servlet 9.3.11.v20160721
org.eclipse.jetty jetty-servlets 9.3.11.v20160721
org.eclipse.jetty jetty-util 9.3.11.v20160721
org.eclipse.jetty jetty-webapp 9.3.11.v20160721
org.eclipse.jetty jetty-xml 9.3.11.v20160721
org.fusesource.jansi jansi 1.4
org.fusesource.leveldbjni leveldbjni-all 1.8
org.glassfish.hk2 hk2-api 2.4.0-b34
org.glassfish.hk2 hk2-locator 2.4.0-b34
org.glassfish.hk2 hk2-utils 2.4.0-b34
org.glassfish.hk2 osgi-resource-locator 1.0.1
org.glassfish.hk2.external aopalliance-repackaged 2.4.0-b34
org.glassfish.hk2.external javax.inject 2.4.0-b34
org.glassfish.jersey.bundles.repackaged jersey-guava 2.22.2
org.glassfish.jersey.containers jersey-container-servlet 2.22.2
org.glassfish.jersey.containers jersey-container-servlet-core 2.22.2
org.glassfish.jersey.core jersey-client 2.22.2
org.glassfish.jersey.core jersey-common 2.22.2
org.glassfish.jersey.core jersey-server 2.22.2
org.glassfish.jersey.media jersey-media-jaxb 2.22.2
org.hibernate hibernate-validator 5.1.1.Final
org.iq80.snappy snappy 0.2
org.javassist javassist 3.18.1-GA
org.jboss.logging jboss-logging 3.1.3.GA
org.jdbi jdbi 2.63.1
org.joda joda-convert 1.7
org.jodd jodd-core 3.5.2
org.jpmml pmml-model 1.2.15
org.jpmml pmml-schema 1.2.15
org.json4s json4s-ast_2.10 3.2.11
org.json4s json4s-core_2.10 3.2.11
org.json4s json4s-jackson_2.10 3.2.11
org.mockito mockito-all 1.9.5
org.objenesis objenesis 2.1
org.postgresql postgresql 9.4-1204-jdbc41
org.roaringbitmap RoaringBitmap 0.5.11
org.rosuda.REngine REngine 2.1.0
org.scala-lang jline 2.10.6
org.scala-lang scala-compiler_2.10 2.10.6
org.scala-lang scala-library_2.10 2.10.6
org.scala-lang scala-reflect_2.10 2.10.6
org.scala-lang scalap_2.10 2.10.6
org.scala-sbt test-interface 1.0
org.scalacheck scalacheck_2.10 1.12.5
org.scalamacros quasiquotes_2.10 2.0.0
org.scalanlp breeze-macros_2.10 0.13.1
org.scalanlp breeze_2.10 0.13.1
org.scalatest scalatest_2.10 2.2.6
org.slf4j jcl-over-slf4j 1.7.16
org.slf4j jul-to-slf4j 1.7.16
org.slf4j slf4j-api 1.7.16
org.slf4j slf4j-log4j12 1.7.16
org.spark-project.hive hive-beeline 1.2.1.spark2
org.spark-project.hive hive-cli 1.2.1.spark2
org.spark-project.hive hive-exec 1.2.1.spark2
org.spark-project.hive hive-jdbc 1.2.1.spark2
org.spark-project.hive hive-metastore 1.2.1.spark2
org.spark-project.spark unused 1.0.0
org.spire-math spire-macros_2.10 0.13.0
org.spire-math spire_2.10 0.13.0
org.springframework spring-core 4.1.4.RELEASE
org.springframework spring-test 4.1.4.RELEASE
org.tukaani xz 1.0
org.typelevel machinist_2.10 0.6.1
org.typelevel macro-compat_2.10 1.1.1
org.xerial sqlite-jdbc 3.8.11.2
org.xerial.snappy snappy-java 1.1.2.6
org.yaml snakeyaml 1.16
oro oro 2.0.8
software.amazon.ion ion-java 1.0.2
stax stax-api 1.0.1
xmlenc xmlenc 0.52

Pre-installed Java and Scala libraries (Scala 2.11 cluster version)

Group ID Artifact ID Version
antlr antlr 2.7.7
com.amazonaws amazon-kinesis-client 1.7.3
com.amazonaws aws-java-sdk-autoscaling 1.11.126
com.amazonaws aws-java-sdk-cloudformation 1.11.126
com.amazonaws aws-java-sdk-cloudfront 1.11.126
com.amazonaws aws-java-sdk-cloudhsm 1.11.126
com.amazonaws aws-java-sdk-cloudsearch 1.11.126
com.amazonaws aws-java-sdk-cloudtrail 1.11.126
com.amazonaws aws-java-sdk-cloudwatch 1.11.126
com.amazonaws aws-java-sdk-cloudwatchmetrics 1.11.126
com.amazonaws aws-java-sdk-codedeploy 1.11.126
com.amazonaws aws-java-sdk-cognitoidentity 1.11.126
com.amazonaws aws-java-sdk-cognitosync 1.11.126
com.amazonaws aws-java-sdk-config 1.11.126
com.amazonaws aws-java-sdk-core 1.11.126
com.amazonaws aws-java-sdk-datapipeline 1.11.126
com.amazonaws aws-java-sdk-directconnect 1.11.126
com.amazonaws aws-java-sdk-directory 1.11.126
com.amazonaws aws-java-sdk-dynamodb 1.11.126
com.amazonaws aws-java-sdk-ec2 1.11.126
com.amazonaws aws-java-sdk-ecs 1.11.126
com.amazonaws aws-java-sdk-efs 1.11.126
com.amazonaws aws-java-sdk-elasticache 1.11.126
com.amazonaws aws-java-sdk-elasticbeanstalk 1.11.126
com.amazonaws aws-java-sdk-elasticloadbalancing 1.11.126
com.amazonaws aws-java-sdk-elastictranscoder 1.11.126
com.amazonaws aws-java-sdk-emr 1.11.126
com.amazonaws aws-java-sdk-glacier 1.11.126
com.amazonaws aws-java-sdk-iam 1.11.126
com.amazonaws aws-java-sdk-importexport 1.11.126
com.amazonaws aws-java-sdk-kinesis 1.11.126
com.amazonaws aws-java-sdk-kms 1.11.126
com.amazonaws aws-java-sdk-lambda 1.11.126
com.amazonaws aws-java-sdk-logs 1.11.126
com.amazonaws aws-java-sdk-machinelearning 1.11.126
com.amazonaws aws-java-sdk-opsworks 1.11.126
com.amazonaws aws-java-sdk-rds 1.11.126
com.amazonaws aws-java-sdk-redshift 1.11.126
com.amazonaws aws-java-sdk-route53 1.11.126
com.amazonaws aws-java-sdk-s3 1.11.126
com.amazonaws aws-java-sdk-ses 1.11.126
com.amazonaws aws-java-sdk-simpledb 1.11.126
com.amazonaws aws-java-sdk-simpleworkflow 1.11.126
com.amazonaws aws-java-sdk-sns 1.11.126
com.amazonaws aws-java-sdk-sqs 1.11.126
com.amazonaws aws-java-sdk-ssm 1.11.126
com.amazonaws aws-java-sdk-storagegateway 1.11.126
com.amazonaws aws-java-sdk-sts 1.11.126
com.amazonaws aws-java-sdk-support 1.11.126
com.amazonaws aws-java-sdk-swf-libraries 1.11.22
com.amazonaws aws-java-sdk-workspaces 1.11.126
com.amazonaws jmespath-java 1.11.126
com.chuusai shapeless_2.11 2.3.2
com.clearspring.analytics stream 2.7.0
com.databricks Rserve 1.8-3
com.databricks dbml-local_2.11 0.1.2-spark2.1
com.databricks dbml-local_2.11-tests 0.1.2-spark2.1
com.databricks jets3t 0.7.1-0
com.databricks.scalapb compilerplugin_2.11 0.4.15-9
com.databricks.scalapb scalapb-runtime_2.11 0.4.15-9
com.esotericsoftware kryo-shaded 3.0.3
com.esotericsoftware minlog 1.3.0
com.fasterxml classmate 1.0.0
com.fasterxml.jackson.core jackson-annotations 2.6.5
com.fasterxml.jackson.core jackson-core 2.6.5
com.fasterxml.jackson.core jackson-databind 2.6.5
com.fasterxml.jackson.dataformat jackson-dataformat-cbor 2.6.5
com.fasterxml.jackson.datatype jackson-datatype-joda 2.6.5
com.fasterxml.jackson.module jackson-module-paranamer 2.6.5
com.fasterxml.jackson.module jackson-module-scala_2.11 2.6.5
com.github.fommil jniloader 1.1
com.github.fommil.netlib core 1.1.2
com.github.fommil.netlib native_ref-java 1.1
com.github.fommil.netlib native_ref-java-natives 1.1
com.github.fommil.netlib native_system-java 1.1
com.github.fommil.netlib native_system-java-natives 1.1
com.github.fommil.netlib netlib-native_ref-linux-x86_64-natives 1.1
com.github.fommil.netlib netlib-native_system-linux-x86_64-natives 1.1
com.github.rwl jtransforms 2.4.0
com.google.code.findbugs jsr305 2.0.1
com.google.code.gson gson 2.2.4
com.google.guava guava 15.0
com.google.protobuf protobuf-java 2.6.1
com.googlecode.javaewah JavaEWAH 0.3.2
com.h2database h2 1.3.174
com.jamesmurty.utils java-xmlbuilder 1.0
com.jcraft jsch 0.1.50
com.jolbox bonecp 0.8.0.RELEASE
com.mchange c3p0 0.9.5.1
com.mchange mchange-commons-java 0.2.10
com.microsoft.azure azure-data-lake-store-sdk 2.0.11
com.microsoft.azure azure-storage 2.0.0
com.ning compress-lzf 1.0.3
com.sun.mail javax.mail 1.5.2
com.thoughtworks.paranamer paranamer 2.6
com.trueaccord.lenses lenses_2.11 0.3
com.twitter chill-java 0.8.0
com.twitter chill_2.11 0.8.0
com.twitter parquet-hadoop-bundle 1.6.0
com.twitter util-app_2.11 6.23.0
com.twitter util-core_2.11 6.23.0
com.twitter util-jvm_2.11 6.23.0
com.typesafe config 1.2.1
com.typesafe.scala-logging scala-logging-api_2.11 2.1.2
com.typesafe.scala-logging scala-logging-slf4j_2.11 2.1.2
com.univocity univocity-parsers 2.2.1
com.zaxxer HikariCP 2.4.1
commons-beanutils commons-beanutils 1.7.0
commons-beanutils commons-beanutils-core 1.8.0
commons-cli commons-cli 1.2
commons-codec commons-codec 1.10
commons-collections commons-collections 3.2.2
commons-configuration commons-configuration 1.6
commons-dbcp commons-dbcp 1.4
commons-digester commons-digester 1.8
commons-httpclient commons-httpclient 3.1
commons-io commons-io 2.4
commons-lang commons-lang 2.6
commons-logging commons-logging 1.1.3
commons-net commons-net 2.2
commons-pool commons-pool 1.5.4
info.ganglia.gmetric4j gmetric4j 1.0.7
io.dropwizard.metrics metrics-core 3.1.2
io.dropwizard.metrics metrics-ganglia 3.1.2
io.dropwizard.metrics metrics-graphite 3.1.2
io.dropwizard.metrics metrics-healthchecks 3.1.2
io.dropwizard.metrics metrics-jetty9 3.1.2
io.dropwizard.metrics metrics-json 3.1.2
io.dropwizard.metrics metrics-jvm 3.1.2
io.dropwizard.metrics metrics-log4j 3.1.2
io.dropwizard.metrics metrics-servlets 3.1.2
io.netty netty 3.9.9.Final
io.netty netty-all 4.0.43.Final
io.prometheus simpleclient 0.0.16
io.prometheus simpleclient_common 0.0.16
io.prometheus simpleclient_dropwizard 0.0.16
io.prometheus simpleclient_servlet 0.0.16
io.prometheus.jmx collector 0.7
javax.activation activation 1.1.1
javax.annotation javax.annotation-api 1.2
javax.el javax.el-api 2.2.4
javax.jdo jdo-api 3.0.1
javax.servlet javax.servlet-api 3.1.0
javax.servlet.jsp jsp-api 2.1
javax.transaction jta 1.1
javax.validation validation-api 1.1.0.Final
javax.ws.rs javax.ws.rs-api 2.0.1
javax.xml.bind jaxb-api 2.2.2
javax.xml.stream stax-api 1.0-2
javolution javolution 5.5.1
jline jline 2.11
joda-time joda-time 2.9.3
log4j apache-log4j-extras 1.2.17
log4j log4j 1.2.17
mx4j mx4j 3.0.2
mysql mysql-connector-java 5.1.27
net.hydromatic eigenbase-properties 1.1.5
net.iharder base64 2.3.8
net.java.dev.jets3t jets3t 0.9.3
net.jpountz.lz4 lz4 1.3.0
net.razorvine pyrolite 4.13
net.sf.jpam jpam 1.1
net.sf.opencsv opencsv 2.3
net.sf.py4j py4j 0.10.4
net.sf.supercsv super-csv 2.2.0
net.sourceforge.f2j arpack_combined_all 0.1
org.acplt oncrpc 1.0.7
org.antlr ST4 4.0.4
org.antlr antlr-runtime 3.4
org.antlr antlr4-runtime 4.5.3
org.antlr stringtemplate 3.2.1
org.apache.ant ant 1.9.2
org.apache.ant ant-jsch 1.9.2
org.apache.ant ant-launcher 1.9.2
org.apache.avro avro 1.7.7
org.apache.avro avro-ipc 1.7.7
org.apache.avro avro-ipc-tests 1.7.7
org.apache.avro avro-mapred-hadoop2 1.7.7
org.apache.calcite calcite-avatica 1.2.0-incubating
org.apache.calcite calcite-core 1.2.0-incubating
org.apache.calcite calcite-linq4j 1.2.0-incubating
org.apache.commons commons-compress 1.4.1
org.apache.commons commons-crypto 1.0.0
org.apache.commons commons-lang3 3.5
org.apache.commons commons-math3 3.4.1
org.apache.curator curator-client 2.6.0
org.apache.curator curator-framework 2.6.0
org.apache.curator curator-recipes 2.6.0
org.apache.derby derby 10.10.2.0
org.apache.directory.api api-asn1-api 1.0.0-M20
org.apache.directory.api api-util 1.0.0-M20
org.apache.directory.server apacheds-i18n 2.0.0-M15
org.apache.directory.server apacheds-kerberos-codec 2.0.0-M15
org.apache.hadoop hadoop-annotations 2.7.3
org.apache.hadoop hadoop-auth 2.7.3
org.apache.hadoop hadoop-azure 2.7.3
org.apache.hadoop hadoop-client 2.7.3
org.apache.hadoop hadoop-common 2.7.3
org.apache.hadoop hadoop-hdfs 2.7.3
org.apache.hadoop hadoop-mapreduce-client-app 2.7.3
org.apache.hadoop hadoop-mapreduce-client-common 2.7.3
org.apache.hadoop hadoop-mapreduce-client-core 2.7.3
org.apache.hadoop hadoop-mapreduce-client-jobclient 2.7.3
org.apache.hadoop hadoop-mapreduce-client-shuffle 2.7.3
org.apache.hadoop hadoop-yarn-api 2.7.3
org.apache.hadoop hadoop-yarn-client 2.7.3
org.apache.hadoop hadoop-yarn-common 2.7.3
org.apache.hadoop hadoop-yarn-server-common 2.7.3
org.apache.htrace htrace-core 3.1.0-incubating
org.apache.httpcomponents httpclient 4.5.2
org.apache.httpcomponents httpcore 4.4.4
org.apache.ivy ivy 2.4.0
org.apache.parquet parquet-column 1.8.2
org.apache.parquet parquet-common 1.8.2
org.apache.parquet parquet-encoding 1.8.2
org.apache.parquet parquet-format 2.3.1
org.apache.parquet parquet-hadoop 1.8.2
org.apache.parquet parquet-jackson 1.8.2
org.apache.thrift libfb303 0.9.3
org.apache.thrift libthrift 0.9.3
org.apache.xbean xbean-asm5-shaded 4.4
org.apache.zookeeper zookeeper 3.4.6
org.bouncycastle bcprov-jdk15on 1.51
org.codehaus.jackson jackson-core-asl 1.9.13
org.codehaus.jackson jackson-jaxrs 1.9.13
org.codehaus.jackson jackson-mapper-asl 1.9.13
org.codehaus.jackson jackson-xc 1.9.13
org.codehaus.janino commons-compiler 3.0.0
org.codehaus.janino janino 3.0.0
org.datanucleus datanucleus-api-jdo 3.2.6
org.datanucleus datanucleus-core 3.2.10
org.datanucleus datanucleus-rdbms 3.2.9
org.eclipse.jetty jetty-client 9.3.11.v20160721
org.eclipse.jetty jetty-continuation 9.3.11.v20160721
org.eclipse.jetty jetty-http 9.3.11.v20160721
org.eclipse.jetty jetty-io 9.3.11.v20160721
org.eclipse.jetty jetty-jndi 9.3.11.v20160721
org.eclipse.jetty jetty-plus 9.3.11.v20160721
org.eclipse.jetty jetty-proxy 9.3.11.v20160721
org.eclipse.jetty jetty-security 9.3.11.v20160721
org.eclipse.jetty jetty-server 9.3.11.v20160721
org.eclipse.jetty jetty-servlet 9.3.11.v20160721
org.eclipse.jetty jetty-servlets 9.3.11.v20160721
org.eclipse.jetty jetty-util 9.3.11.v20160721
org.eclipse.jetty jetty-webapp 9.3.11.v20160721
org.eclipse.jetty jetty-xml 9.3.11.v20160721
org.fusesource.leveldbjni leveldbjni-all 1.8
org.glassfish.hk2 hk2-api 2.4.0-b34
org.glassfish.hk2 hk2-locator 2.4.0-b34
org.glassfish.hk2 hk2-utils 2.4.0-b34
org.glassfish.hk2 osgi-resource-locator 1.0.1
org.glassfish.hk2.external aopalliance-repackaged 2.4.0-b34
org.glassfish.hk2.external javax.inject 2.4.0-b34
org.glassfish.jersey.bundles.repackaged jersey-guava 2.22.2
org.glassfish.jersey.containers jersey-container-servlet 2.22.2
org.glassfish.jersey.containers jersey-container-servlet-core 2.22.2
org.glassfish.jersey.core jersey-client 2.22.2
org.glassfish.jersey.core jersey-common 2.22.2
org.glassfish.jersey.core jersey-server 2.22.2
org.glassfish.jersey.media jersey-media-jaxb 2.22.2
org.hibernate hibernate-validator 5.1.1.Final
org.iq80.snappy snappy 0.2
org.javassist javassist 3.18.1-GA
org.jboss.logging jboss-logging 3.1.3.GA
org.jdbi jdbi 2.63.1
org.joda joda-convert 1.7
org.jodd jodd-core 3.5.2
org.jpmml pmml-model 1.2.15
org.jpmml pmml-schema 1.2.15
org.json4s json4s-ast_2.11 3.2.11
org.json4s json4s-core_2.11 3.2.11
org.json4s json4s-jackson_2.11 3.2.11
org.mockito mockito-all 1.9.5
org.objenesis objenesis 2.1
org.postgresql postgresql 9.4-1204-jdbc41
org.roaringbitmap RoaringBitmap 0.5.11
org.rosuda.REngine REngine 2.1.0
org.scala-lang scala-compiler_2.11 2.11.8
org.scala-lang scala-library_2.11 2.11.8
org.scala-lang scala-reflect_2.11 2.11.8
org.scala-lang scalap_2.11 2.11.8
org.scala-lang.modules scala-parser-combinators_2.11 1.0.2
org.scala-lang.modules scala-xml_2.11 1.0.2
org.scala-sbt test-interface 1.0
org.scalacheck scalacheck_2.11 1.12.5
org.scalanlp breeze-macros_2.11 0.13.1
org.scalanlp breeze_2.11 0.13.1
org.scalatest scalatest_2.11 2.2.6
org.slf4j jcl-over-slf4j 1.7.16
org.slf4j jul-to-slf4j 1.7.16
org.slf4j slf4j-api 1.7.16
org.slf4j slf4j-log4j12 1.7.16
org.spark-project.hive hive-beeline 1.2.1.spark2
org.spark-project.hive hive-cli 1.2.1.spark2
org.spark-project.hive hive-exec 1.2.1.spark2
org.spark-project.hive hive-jdbc 1.2.1.spark2
org.spark-project.hive hive-metastore 1.2.1.spark2
org.spark-project.spark unused 1.0.0
org.spire-math spire-macros_2.11 0.13.0
org.spire-math spire_2.11 0.13.0
org.springframework spring-core 4.1.4.RELEASE
org.springframework spring-test 4.1.4.RELEASE
org.tukaani xz 1.0
org.typelevel machinist_2.11 0.6.1
org.typelevel macro-compat_2.11 1.1.1
org.xerial sqlite-jdbc 3.8.11.2
org.xerial.snappy snappy-java 1.1.2.6
org.yaml snakeyaml 1.16
oro oro 2.0.8
software.amazon.ion ion-java 1.0.2
stax stax-api 1.0.1
xmlenc xmlenc 0.52