Databricks Runtime 4.2

Databricks released this image in July 2018.

Important

This release was deprecated on March 5, 2019. For more information about the Databricks Runtime deprecation policy and schedule, see Databricks Runtime Versioning and Deprecation Policy.

The following release notes provide information about Databricks Runtime 4.2, powered by Apache Spark.

Databricks Delta

Databricks Runtime 4.2 adds major quality improvements and functionality to Databricks Delta. Databricks highly recommends that all Databricks Delta customers upgrade to the new runtime. This release remains in Private Preview, but it represents a candidate release in anticipation of the upcoming general availability (GA) release.

New features

  • Multi-cluster writes - Databricks Delta now supports transactional writes from multiple clusters. To use this feature, all writers must be running Databricks Runtime 4.2.
  • Streams can now be directly written to a Databricks Delta table registered in the Hive metastore using df.writeStream.table(...).

Improvements

  • All Databricks Delta commands and queries now support referring to a table using its path as an identifier (that is, delta.`/path/to/table`). Previously OPTIMIZE and VACUUM required non-standard use of string literals (that is, '/path/to/table').
  • DESCRIBE HISTORY now includes the commit ID and is ordered newest to oldest by default.
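
As a minimal sketch of the identifier syntax described above, the following Python helpers build the path-based identifier and the corresponding OPTIMIZE and VACUUM statements as plain strings. The helper names (delta_path_identifier, optimize_statement, vacuum_statement) are hypothetical and only illustrate the syntax; nothing here runs against a cluster.

```python
def delta_path_identifier(path):
    """Return the path-based table identifier, for example delta.`/path/to/table`."""
    if "`" in path:
        # Backticks delimit the identifier, so this sketch rejects them outright.
        raise ValueError("path must not contain a backtick")
    return "delta.`{}`".format(path)


def optimize_statement(path):
    """Build an OPTIMIZE statement that refers to the table by its path."""
    return "OPTIMIZE {}".format(delta_path_identifier(path))


def vacuum_statement(path):
    """Build a VACUUM statement that refers to the table by its path."""
    return "VACUUM {}".format(delta_path_identifier(path))


print(optimize_statement("/path/to/table"))
print(vacuum_statement("/path/to/table"))
```

In a notebook, strings like these could be passed to spark.sql(...); the path /path/to/table is the placeholder used in the notes above, not a real location.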

Bug fixes

  • Filtering based on partition predicates now operates correctly even when the case of the predicates differs from that of the table.
  • Fixed missing column AnalysisException when performing equality checks on boolean columns in Databricks Delta tables (that is, booleanValue = true).
  • CREATE TABLE no longer modifies the transaction log when creating a pointer to an existing table. This prevents unnecessary conflicts with concurrent streams and allows the creation of a metastore pointer to tables where the user only has read access to the data.
  • Calling display() on a stream with large amounts of data no longer causes an out-of-memory (OOM) error in the driver.
  • An AnalysisException is now thrown when an underlying Databricks Delta path is deleted, rather than returning empty results.
  • Databricks Delta configs that require a specific protocol version (for example, appendOnly) can only be applied to tables at an appropriate version.
  • When updating the state of a Databricks Delta table, long lineages are now automatically truncated to avoid a StackOverflowError.

Structured Streaming

New features

  • Databricks Delta, SQS, and Kafka now fully support Trigger.Once. Previously, rate limits (for example, maxOffsetsPerTrigger or maxFilesPerTrigger) specified as source options or defaults could cause only part of the available data to be processed. These options are now ignored when Trigger.Once is used, allowing all currently available data to be processed.

  • Added new streaming foreachBatch() in Scala, where you can define a function to process the output of every microbatch using DataFrame operations. This enables the following:

    • Using existing batch data sources to write microbatch outputs to systems that do not have a streaming data source yet (for example, use the Cassandra batch writer on every microbatch output).
    • Writing microbatch output to multiple locations.
    • Applying DataFrame and table operations on microbatch outputs that are not supported in streaming DataFrames yet (for example, upsert microbatch output into a Databricks Delta table).
  • Added from_avro/to_avro functions to read and write Avro data within a DataFrame instead of just files, similar to from_json/to_json.

    See Read and Write Avro Data Anywhere for more details.

  • Added support for streaming foreach() in Python (already available in Scala).

    See foreach and foreachBatch documentation for more details.
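
To illustrate the Python foreach() support listed above: in the Apache Spark API, DataStreamWriter.foreach accepts either a function or an object with a process(row) method and optional open(partition_id, epoch_id) and close(error) methods. The following is a minimal sketch of such a writer object. The class name CollectingWriter and the plain-Python drive() loop that stands in for Spark are assumptions made for illustration; check the foreach documentation for the exact contract in this runtime.

```python
class CollectingWriter(object):
    """Hypothetical writer that gathers rows in memory instead of calling an external system."""

    def __init__(self):
        self.rows = []
        self.closed_with = "not closed"

    def open(self, partition_id, epoch_id):
        # Return True to signal that rows for this partition and epoch should be processed.
        self.partition_id = partition_id
        self.epoch_id = epoch_id
        return True

    def process(self, row):
        # Called once per row; a real writer would send the row to a sink here.
        self.rows.append(row)

    def close(self, error):
        # error is None when processing finished without an exception.
        self.closed_with = error


def drive(writer, rows, partition_id=0, epoch_id=0):
    """Stand-in for Spark: call open, process, and close in the expected order."""
    error = None
    try:
        if writer.open(partition_id, epoch_id):
            for row in rows:
                writer.process(row)
    except Exception as exc:
        error = exc
    finally:
        writer.close(error)


writer = CollectingWriter()
drive(writer, [{"id": 1}, {"id": 2}])
print(len(writer.rows))
```

On a cluster, an object like this would be passed to writeStream.foreach(...) instead of drive(). Because the object is copied to the executors, state collected in memory as shown here would not be visible on the driver; a real writer would send each row to an external system.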

Improvements

  • Faster generation of output results and/or state cleanup with stateful operations (mapGroupsWithState, stream-stream join, streaming aggregation, streaming dropDuplicates) when there is no data in the input stream.

Bug fixes

  • Fixed correctness bug SPARK-24588 in stream-stream join, where the join reported fewer results when there was an explicit repartition before it (for example, df1.repartition("a", "b").join(df, "a")).

Other Changes and Improvements

  • Fixed the S3A filesystem to enable reading Parquet files over S3 with client-side encryption. Reads were previously throwing java.io.EOFException errors.
  • Added support for the SQL DENY command on clusters with table access control enabled. Users can now deny specific permissions in the same way that they grant them. A denied permission supersedes a granted one. Admins and owners of a particular object are still always allowed to perform actions.
  • New Azure Data Lake Storage Gen2 data source that uses the ABFS driver. See Azure Data Lake Storage Gen2.
  • Upgraded some installed Python libraries:
    • pip: from 10.0.0b2 to 10.0.1
    • setuptools: from 39.0.1 to 39.2.0
    • tornado: from 5.0.1 to 5.0.2
    • wheel: from 0.31.0 to 0.31.1
  • Upgraded several installed R libraries. See Installed R Libraries.
  • Improved Parquet support
  • Upgraded Apache ORC from 1.4.1 to 1.4.3
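
The precedence rules for DENY described in the list above can be modeled in a few lines. This is a toy sketch, not how Databricks implements table access control: the function is_allowed and its parameters are hypothetical, and permissions are represented as plain sets of strings.

```python
def is_allowed(action, granted, denied, is_admin=False, is_owner=False):
    """Toy model: admins and owners always pass, then DENY beats GRANT."""
    if is_admin or is_owner:
        return True
    if action in denied:
        return False
    return action in granted


# Granted and not denied.
print(is_allowed("SELECT", granted={"SELECT"}, denied=set()))
# Granted but also denied: the denial wins.
print(is_allowed("SELECT", granted={"SELECT"}, denied={"SELECT"}))
# Denied, but the user owns the object.
print(is_allowed("SELECT", granted=set(), denied={"SELECT"}, is_owner=True))
```

The order of the checks carries the rule: the admin and owner check comes first, and the denied set is consulted before the granted set.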

Apache Spark

Databricks Runtime 4.2 includes Apache Spark 2.3.1. This release includes all fixes and improvements included in Databricks Runtime 4.1, as well as the following additional bug fixes and improvements made to Spark:

  • [SPARK-24588][SS] streaming join should require HashClusteredPartitioning from children
  • [SPARK-23931][SQL] Make arrays_zip in function.scala @scala.annotation.varargs.
  • [SPARK-24633][SQL] Fix codegen when split is required for arrays_zip
  • [SPARK-24578][CORE] Cap sub-region’s size of returned nio buffer
  • [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches
  • [SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand
  • [SPARK-24565][SS] Add API in Structured Streaming for exposing output rows of each microbatch as a DataFrame
  • [SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for Python
  • [SPARK-24216][SQL] Spark TypedAggregateExpression uses getSimpleName that is not safe in Scala
  • [SPARK-24452][SQL][CORE] Avoid possible overflow in int add or multiple
  • [SPARK-24187][R][SQL] Add array_join function to SparkR
  • [SPARK-24525][SS] Provide an option to limit number of rows in a MemorySink
  • [SPARK-24331][SPARKR][SQL] Adding arrays_overlap, array_repeat, map_entries to SparkR
  • [SPARK-23931][SQL] Adds arrays_zip function to Spark SQL
  • [SPARK-24186][R][SQL] change reverse and concat to collection functions in R
  • [SPARK-24198][SPARKR][SQL] Adding slice function to SparkR
  • [SPARK-23920][SQL] add array_remove to remove all elements that equal element from array
  • [SPARK-24197][SPARKR][SQL] Adding array_sort function to SparkR
  • [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a standalone cluster
  • [SPARK-23935][SQL] Adding map_entries function
  • [SPARK-24500][SQL] Make sure streams are materialized during Tree transforms.
  • [SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys
  • [SPARK-24506][UI] Add UI filters to tabs added after binding
  • [SPARK-24468][SQL] Handle negative scale when adjusting precision for decimal operations
  • [SPARK-24313][SQL] Fix collection operations’ interpreted evaluation for complex types
  • [SPARK-23922][SQL] Add arrays_overlap function
  • [SPARK-24369][SQL] Correct handling for multiple distinct aggregations having the same argument set
  • [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment
  • [SPARK-24397][PYSPARK] Added TaskContext.getLocalProperty(key) in Python
  • [SPARK-24117][SQL] Unified the getSizePerRow
  • [SPARK-24156][SS] Fix error recovering from the failure in a no-data batch
  • [SPARK-24414][UI] Calculate the correct number of tasks for a stage.
  • [SPARK-23754][PYTHON] Re-raising StopIteration in client code
  • [SPARK-23991][DSTREAMS] Fix data loss when WAL write fails in allocateBlocksToBatch
  • [SPARK-24373][SQL] Add AnalysisBarrier to RelationalGroupedDataset’s and KeyValueGroupedDataset’s child
  • [SPARK-24392][PYTHON] Label pandas_udf as Experimental
  • [SPARK-24334] Fix race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator
  • [SPARK-19112][CORE] Add missing shortCompressionCodecNames to configuration.
  • [SPARK-24244][SPARK-24368][SQL] Passing only required columns to the CSV parser
  • [SPARK-24366][SQL] Improving of error messages for type converting
  • [SPARK-24371][SQL] Added isInCollection in DataFrame API for Scala an…
  • [SPARK-23925][SQL] Add array_repeat collection function
  • [MINOR] Add port SSL config in toString and scaladoc
  • [SPARK-24378][SQL] Fix date_trunc function incorrect examples
  • [SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path doesn’t exist
  • [SPARK-24257][SQL] LongToUnsafeRowMap calculate the new size may be wrong
  • [SPARK-24348][SQL] “element_at” error fix
  • [SPARK-23930][SQL] Add slice function
  • [SPARK-23416][SS] Add a specific stop method for ContinuousExecution.
  • [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
  • [SPARK-24350][SQL] Fixes ClassCastException in the “array_position” function
  • [SPARK-24321][SQL] Extract common code from Divide/Remainder to a base trait
  • [SPARK-24309][CORE] AsyncEventQueue should stop on interrupt.
  • [SPARK-23850][SQL] Add separate config for SQL options redaction.
  • [SPARK-22371][CORE] Return None instead of throwing an exception when an accumulator is garbage collected.
  • [SPARK-24002][SQL] Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes
  • [SPARK-23921][SQL] Add array_sort function
  • [SPARK-23923][SQL] Add cardinality function
  • [SPARK-24159][SS] Enable no-data micro batches for streaming mapGroupswithState
  • [SPARK-24158][SS] Enable no-data batches for streaming joins
  • [SPARK-24157][SS] Enabled no-data batches in MicroBatchExecution for streaming aggregation and deduplication
  • [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces wrong stats for STRING
  • [SPARK-17916][SQL] Fix empty string being parsed as null when nullValue is set.
  • [SPARK-23916][SQL] Add array_join function
  • [SPARK-23408][SS] Synchronize successive AddData actions in Streaming*JoinSuite
  • [SPARK-23780][R] Failed to use googleVis library with new SparkR
  • [SPARK-23821][SQL] Collection function: flatten
  • [SPARK-23627][SQL] Provide isEmpty in Dataset
  • [SPARK-24027][SQL] Support MapType with StringType for keys as the root type by from_json
  • [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warning
  • [SPARK-23736][SQL] Extending the concat function to support array columns
  • [SPARK-24246][SQL] Improve AnalysisException by setting the cause when it’s available
  • [SPARK-24263][R] SparkR java check breaks with openjdk
  • [SPARK-24262][PYTHON] Fix typo in UDF type match error message
  • [SPARK-24067][STREAMING][KAFKA] Allow non-consecutive offsets
  • [SPARK-10878][CORE] Fix race condition when multiple clients resolves artifacts at the same time
  • [SPARK-19181][CORE] Fixing flaky “SparkListenerSuite.local metrics”
  • [SPARK-24068] Propagating DataFrameReader’s options to Text datasource on schema inferring
  • [SPARK-24214][SS] Fix toJSON for StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation
  • [SPARK-23919][SPARK-23924][SPARK-24054][SQL] Add array_position/element_at function
  • [SPARK-23926][SQL] Extending reverse function to support ArrayType arguments
  • [SPARK-23809][SQL] Active SparkSession should be set by getOrCreate
  • [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support custom encoding for json files
  • [SPARK-24035][SQL] SQL syntax for Pivot
  • [SPARK-24069][R] Add array_min / array_max functions
  • [SPARK-23976][CORE] Detect length overflow in UTF8String.concat()/ByteArray.concat()
  • [SPARK-24188][CORE] Restore “/version” API endpoint.
  • [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error
  • [SPARK-23291][SQL][R] R’s substr should not reduce starting position by 1 when calling Scala API
  • [SPARK-23697][CORE] LegacyAccumulatorWrapper should define isZero correctly
  • [SPARK-24168][SQL] WindowExec should not access SQLConf at executor side
  • [SPARK-24143] filter empty blocks when convert mapstatus to (blockId, size) pair
  • [SPARK-23917][SPARK-23918][SQL] Add array_max/array_min function
  • [SPARK-23905][SQL] Add UDF weekday
  • [SPARK-16406][SQL] Improve performance of LogicalPlan.resolve
  • [SPARK-24013][SQL] Remove unneeded compress in ApproximatePercentile
  • [SPARK-23433][CORE] Late zombie task completions update all tasksets
  • [SPARK-24169][SQL] JsonToStructs should not access SQLConf at executor side
  • [SPARK-24133][SQL] Backport [SPARK-24133] Check for integer overflows when resizing WritableColumnVectors
  • [SPARK-24166][SQL] InMemoryTableScanExec should not access SQLConf at executor side
  • [SPARK-24133][SQL] Check for integer overflows when resizing WritableColumnVectors
  • [SPARK-24085][SQL] Query returns UnsupportedOperationException when scalar subquery is present in partitioning expression
  • [SPARK-24062][THRIFT SERVER] Fix SASL encryption cannot enabled issue in thrift server
  • [SPARK-23004][SS] Ensure StateStore.commit is called only once in a streaming aggregation task
  • [SPARK-23188][SQL] Make vectorized columnar reader batch size configurable
  • [SPARK-23375][SPARK-23973][SQL] Eliminate unneeded Sort in Optimizer
  • [SPARK-23877][SQL] Use filter predicates to prune partitions in metadata-only queries
  • [SPARK-24033][SQL] Fix Mismatched of Window Frame specifiedwindowframe(RowFrame, -1, -1)
  • [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
  • Fixed a missing null check that is more likely to be triggered by streamlined expression code generation. SPARK-23986 exposed the issue because it made the generated source code slightly longer, which triggered the problematic code path (code splitting by Expression.reduceCodeSize()).
  • [SPARK-23989][SQL] exchange should copy data before non-serialized shuffle
  • [SPARK-24021][CORE] fix bug in BlacklistTracker’s updateBlacklistForFetchFailure
  • [SPARK-24014][PYSPARK] Add onStreamingStarted method to StreamingListener
  • [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table
  • [SPARK-23948] Trigger mapstage’s job listener in submitMissingTasks
  • [SPARK-23986][SQL] freshName can generate non-unique names
  • [SPARK-23835][SQL] Add not-null check to Tuples’ arguments deserialization

Maintenance updates

Maintenance updates made to Databricks Runtime 4.2 since its initial release include:

  • Feb 26, 2019

    • Fixed a bug affecting JDBC/ODBC server.
  • Feb 12, 2019

    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Hidden files are now excluded when building HadoopRDD.
    • Fixed Parquet Filter Conversion for IN predicate when its value is empty.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (in HIPAA-Compliant Deployment or when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • Redacted credentials from RDD names in Spark UI.
    • [SPARK-26352] join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Databricks Delta is enabled.
    • Databricks IO Cache is enabled for the IO Cache Accelerated instance type.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when INSERT a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • [SPARK-26181] the hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting installing Python Wheels in environments without Internet access.
    • Fixed a performance issue in query analyzer.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
    • Fixed an issue affecting reading timestamp columns from Redshift.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in web UI.
    • [SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification.
  • Oct 9, 2018
    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Databricks Delta tables.
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
    • Filter reduction now handles null values correctly.
  • Aug 28, 2018
    • Fixed a bug in Databricks Delta Delete command that would incorrectly delete the rows where the condition evaluates to null.
  • Aug 23, 2018
    • Fixed NoClassDefFoundError for Delta Snapshot.
    • [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
    • [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. Average might return incorrect results. The CAST added in the Average operator is bypassed if the result of Divide has the same type that it is cast to.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark which could cause transient read failures.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
    • [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
    • [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.
    • Fixed secret manager redaction when a command partially succeeds.
    • Fixed nullable map issue in Parquet reader.
  • Aug 2, 2018
    • Added writeStream.table API in Python.
    • Fixed an issue affecting Delta checkpointing.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
    • Fixed an issue that could cause the mergeInto command to produce incorrect results.
    • Improved stability on accessing Azure Data Lake Storage Gen1.
    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
    • [SPARK-24878][SQL] Fix reverse function for array type of primitive type containing null.
  • July 11, 2018
    • Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
    • Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
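
The Aug 28, 2018 fix to the Databricks Delta Delete command listed above comes down to SQL three-valued logic: a row is deleted only when the condition evaluates to true, so rows where it evaluates to null are kept. The sketch below models that rule in plain Python, with None standing in for NULL. The function names and sample rows are hypothetical, and this is not the Delta implementation.

```python
def greater_than(value, threshold):
    """Three-valued comparison: return None (NULL) when the value is None."""
    if value is None:
        return None
    return value > threshold


def delete_where(rows, condition):
    """Keep every row unless the condition is exactly True for it."""
    return [row for row in rows if condition(row) is not True]


rows = [{"id": 1, "score": 10}, {"id": 2, "score": None}, {"id": 3, "score": 1}]
remaining = delete_where(rows, lambda row: greater_than(row["score"], 5))
print([row["id"] for row in remaining])
```

The row with id 2 survives because its condition is null rather than true; the behavior before the fix corresponds to treating that null as if it were true.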

System environment

  • Operating System: Ubuntu 16.04.4 LTS
  • Java: 1.8.0_162
  • Scala: 2.11.8
  • Python: 2.7.12 for Python 2 clusters and 3.5.2 for Python 3 clusters. For details, see Python Clusters.
  • R: R version 3.4.4 (2018-03-15)
  • GPU clusters: The following NVIDIA GPU libraries are installed:
    • Tesla driver 375.66
    • CUDA 9.0
    • cuDNN 7.0

Installed Python libraries

Library Version Library Version Library Version
ansi2html 1.1.1 argparse 1.2.1 backports-abc 0.5
boto 2.42.0 boto3 1.4.1 botocore 1.4.70
brewer2mpl 1.4.1 certifi 2016.2.28 cffi 1.7.0
chardet 2.3.0 colorama 0.3.7 configobj 5.0.6
cryptography 1.5 cycler 0.10.0 Cython 0.24.1
decorator 4.0.10 docutils 0.14 enum34 1.1.6
et-xmlfile 1.0.1 freetype-py 1.0.2 funcsigs 1.0.2
fusepy 2.0.4 futures 3.2.0 ggplot 0.6.8
html5lib 0.999 idna 2.1 ipaddress 1.0.16
ipython 2.2.0 ipython-genutils 0.1.0 jdcal 1.2
Jinja2 2.8 jmespath 0.9.0 llvmlite 0.13.0
lxml 3.6.4 MarkupSafe 0.23 matplotlib 1.5.3
mpld3 0.2 msgpack-python 0.4.7 ndg-httpsclient 0.3.3
numba 0.28.1 numpy 1.11.1 openpyxl 2.3.2
pandas 0.19.2 pathlib2 2.1.0 patsy 0.4.1
pexpect 4.0.1 pickleshare 0.7.4 Pillow 3.3.1
pip 10.0.1 ply 3.9 prompt-toolkit 1.0.7
psycopg2 2.6.2 ptyprocess 0.5.1 py4j 0.10.3
pyarrow 0.8.0 pyasn1 0.1.9 pycparser 2.14
Pygments 2.1.3 PyGObject 3.20.0 pyOpenSSL 16.0.0
pyparsing 2.2.0 pypng 0.0.18 Python 2.7.12
python-dateutil 2.5.3 python-geohash 0.8.5 pytz 2016.6.1
requests 2.11.1 s3transfer 0.1.9 scikit-learn 0.18.1
scipy 0.18.1 scour 0.32 seaborn 0.7.1
setuptools 39.2.0 simplejson 3.8.2 simples3 1.0
singledispatch 3.4.0.3 six 1.10.0 statsmodels 0.6.1
tornado 5.0.2 traitlets 4.3.0 urllib3 1.19.1
virtualenv 15.0.1 wcwidth 0.1.7 wheel 0.31.1
wsgiref 0.1.2

Installed R libraries

Library Version Library Version Library Version
abind 1.4-5 assertthat 0.2.0 backports 1.1.2
base 3.4.4 BH 1.66.0-1 bindr 0.1.1
bindrcpp 0.2.2 bit 1.1-12 bit64 0.9-7
bitops 1.0-6 blob 1.1.1 boot 1.3-20
brew 1.0-6 broom 0.4.4 car 3.0-0
carData 3.0-1 caret 6.0-79 cellranger 1.1.0
chron 2.3-52 class 7.3-14 cli 1.0.0
cluster 2.0.7-1 codetools 0.2-15 colorspace 1.3-2
commonmark 1.4 compiler 3.4.4 crayon 1.3.4
curl 3.2 CVST 0.2-1 data.table 1.10.4-3
datasets 3.4.4 DBI 0.8 ddalpha 1.3.1.1
DEoptimR 1.0-8 desc 1.1.1 devtools 1.13.5
dichromat 2.0-0 digest 0.6.15 dimRed 0.1.0
doMC 1.3.5 dplyr 0.7.4 DRR 0.0.3
forcats 0.3.0 foreach 1.4.4 foreign 0.8-70
gbm 2.1.3 ggplot2 2.2.1 git2r 0.21.0
glmnet 2.0-16 glue 1.2.0 gower 0.1.2
graphics 3.4.4 grDevices 3.4.4 grid 3.4.4
gsubfn 0.7 gtable 0.2.0 h2o 3.16.0.2
haven 1.1.1 hms 0.4.2 httr 1.3.1
hwriter 1.3.2 hwriterPlus 1.0-3 ipred 0.9-6
iterators 1.0.9 jsonlite 1.5 kernlab 0.9-25
KernSmooth 2.23-15 labeling 0.3 lattice 0.20-35
lava 1.6.1 lazyeval 0.2.1 littler 0.3.3
lme4 1.1-17 lubridate 1.7.3 magrittr 1.5
mapproj 1.2.6 maps 3.3.0 maptools 0.9-2
MASS 7.3-50 Matrix 1.2-14 MatrixModels 0.4-1
memoise 1.1.0 methods 3.4.4 mgcv 1.8-24
mime 0.5 minqa 1.2.4 mnormt 1.5-5
ModelMetrics 1.1.0 munsell 0.4.3 mvtnorm 1.0-7
nlme 3.1-137 nloptr 1.0.4 nnet 7.3-12
numDeriv 2016.8-1 openssl 1.0.1 openxlsx 4.0.17
parallel 3.4.4 pbkrtest 0.4-7 pillar 1.2.1
pkgconfig 2.0.1 pkgKitten 0.1.4 plogr 0.2.0
plyr 1.8.4 praise 1.0.0 prettyunits 1.0.2
pROC 1.11.0 prodlim 1.6.1 proto 1.0.0
psych 1.8.3.3 purrr 0.2.4 quantreg 5.35
R.methodsS3 1.7.1 R.oo 1.21.0 R.utils 2.6.0
R6 2.2.2 randomForest 4.6-14 RColorBrewer 1.1-2
Rcpp 0.12.16 RcppEigen 0.3.3.4.0 RcppRoll 0.2.2
RCurl 1.95-4.10 readr 1.1.1 readxl 1.0.0
recipes 0.1.2 rematch 1.0.1 reshape2 1.4.3
rio 0.5.10 rlang 0.2.0 robustbase 0.92-8
RODBC 1.3-15 roxygen2 6.0.1 rpart 4.1-13
rprojroot 1.3-2 Rserve 1.7-3 RSQLite 2.1.0
rstudioapi 0.7 scales 0.5.0 sfsmisc 1.1-2
sp 1.2-7 SparkR 2.3.1 SparseM 1.77
spatial 7.3-11 splines 3.4.4 sqldf 0.4-11
SQUAREM 2017.10-1 statmod 1.4.30 stats 3.4.4
stats4 3.4.4 stringi 1.1.7 stringr 1.3.0
survival 2.42-3 tcltk 3.4.4 TeachingDemos 2.10
testthat 2.0.0 tibble 1.4.2 tidyr 0.8.0
tidyselect 0.2.4 timeDate 3043.102 tools 3.4.4
utf8 1.1.3 utils 3.4.4 viridisLite 0.3.0
whisker 0.3-2 withr 2.1.2 xml2 1.2.0

Installed Java and Scala libraries (Scala 2.11 cluster version)

Group ID Artifact ID Version
antlr antlr 2.7.7
com.amazonaws amazon-kinesis-client 1.7.3
com.amazonaws aws-java-sdk-autoscaling 1.11.313
com.amazonaws aws-java-sdk-cloudformation 1.11.313
com.amazonaws aws-java-sdk-cloudfront 1.11.313
com.amazonaws aws-java-sdk-cloudhsm 1.11.313
com.amazonaws aws-java-sdk-cloudsearch 1.11.313
com.amazonaws aws-java-sdk-cloudtrail 1.11.313
com.amazonaws aws-java-sdk-cloudwatch 1.11.313
com.amazonaws aws-java-sdk-cloudwatchmetrics 1.11.313
com.amazonaws aws-java-sdk-codedeploy 1.11.313
com.amazonaws aws-java-sdk-cognitoidentity 1.11.313
com.amazonaws aws-java-sdk-cognitosync 1.11.313
com.amazonaws aws-java-sdk-config 1.11.313
com.amazonaws aws-java-sdk-core 1.11.313
com.amazonaws aws-java-sdk-datapipeline 1.11.313
com.amazonaws aws-java-sdk-directconnect 1.11.313
com.amazonaws aws-java-sdk-directory 1.11.313
com.amazonaws aws-java-sdk-dynamodb 1.11.313
com.amazonaws aws-java-sdk-ec2 1.11.313
com.amazonaws aws-java-sdk-ecs 1.11.313
com.amazonaws aws-java-sdk-efs 1.11.313
com.amazonaws aws-java-sdk-elasticache 1.11.313
com.amazonaws aws-java-sdk-elasticbeanstalk 1.11.313
com.amazonaws aws-java-sdk-elasticloadbalancing 1.11.313
com.amazonaws aws-java-sdk-elastictranscoder 1.11.313
com.amazonaws aws-java-sdk-emr 1.11.313
com.amazonaws aws-java-sdk-glacier 1.11.313
com.amazonaws aws-java-sdk-iam 1.11.313
com.amazonaws aws-java-sdk-importexport 1.11.313
com.amazonaws aws-java-sdk-kinesis 1.11.313
com.amazonaws aws-java-sdk-kms 1.11.313
com.amazonaws aws-java-sdk-lambda 1.11.313
com.amazonaws aws-java-sdk-logs 1.11.313
com.amazonaws aws-java-sdk-machinelearning 1.11.313
com.amazonaws aws-java-sdk-opsworks 1.11.313
com.amazonaws aws-java-sdk-rds 1.11.313
com.amazonaws aws-java-sdk-redshift 1.11.313
com.amazonaws aws-java-sdk-route53 1.11.313
com.amazonaws aws-java-sdk-s3 1.11.313
com.amazonaws aws-java-sdk-ses 1.11.313
com.amazonaws aws-java-sdk-simpledb 1.11.313
com.amazonaws aws-java-sdk-simpleworkflow 1.11.313
com.amazonaws aws-java-sdk-sns 1.11.313
com.amazonaws aws-java-sdk-sqs 1.11.313
com.amazonaws aws-java-sdk-ssm 1.11.313
com.amazonaws aws-java-sdk-storagegateway 1.11.313
com.amazonaws aws-java-sdk-sts 1.11.313
com.amazonaws aws-java-sdk-support 1.11.313
com.amazonaws aws-java-sdk-swf-libraries 1.11.22
com.amazonaws aws-java-sdk-workspaces 1.11.313
com.amazonaws jmespath-java 1.11.313
com.carrotsearch hppc 0.7.2
com.chuusai shapeless_2.11 2.3.2
com.clearspring.analytics stream 2.7.0
com.databricks Rserve 1.8-3
com.databricks dbml-local_2.11 0.4.1-db1-spark2.3
com.databricks dbml-local_2.11-tests 0.4.1-db1-spark2.3
com.databricks jets3t 0.7.1-0
com.databricks.scalapb compilerplugin_2.11 0.4.15-9
com.databricks.scalapb scalapb-runtime_2.11 0.4.15-9
com.esotericsoftware kryo-shaded 3.0.3
com.esotericsoftware minlog 1.3.0
com.fasterxml classmate 1.0.0
com.fasterxml.jackson.core jackson-annotations 2.6.7
com.fasterxml.jackson.core jackson-core 2.6.7
com.fasterxml.jackson.core jackson-databind 2.6.7.1
com.fasterxml.jackson.dataformat jackson-dataformat-cbor 2.6.7
com.fasterxml.jackson.datatype jackson-datatype-joda 2.6.7
com.fasterxml.jackson.module jackson-module-paranamer 2.6.7
com.fasterxml.jackson.module jackson-module-scala_2.11 2.6.7.1
com.github.fommil jniloader 1.1
com.github.fommil.netlib core 1.1.2
com.github.fommil.netlib native_ref-java 1.1
com.github.fommil.netlib native_ref-java-natives 1.1
com.github.fommil.netlib native_system-java 1.1
com.github.fommil.netlib native_system-java-natives 1.1
com.github.fommil.netlib netlib-native_ref-linux-x86_64-natives 1.1
com.github.fommil.netlib netlib-native_system-linux-x86_64-natives 1.1
com.github.luben zstd-jni 1.3.2-2
com.github.rwl jtransforms 2.4.0
com.google.code.findbugs jsr305 2.0.1
com.google.code.gson gson 2.2.4
com.google.guava guava 15.0
com.google.protobuf protobuf-java 2.6.1
com.googlecode.javaewah JavaEWAH 0.3.2
com.h2database h2 1.3.174
com.jamesmurty.utils java-xmlbuilder 1.1
com.jcraft jsch 0.1.50
com.jolbox bonecp 0.8.0.RELEASE
com.mchange c3p0 0.9.5.1
com.mchange mchange-commons-java 0.2.10
com.microsoft.azure azure-data-lake-store-sdk 2.2.8
com.microsoft.sqlserver mssql-jdbc 6.2.2.jre8
com.ning compress-lzf 1.0.3
com.sun.mail javax.mail 1.5.2
com.thoughtworks.paranamer paranamer 2.8
com.trueaccord.lenses lenses_2.11 0.3
com.twitter chill-java 0.8.4
com.twitter chill_2.11 0.8.4
com.twitter parquet-hadoop-bundle 1.6.0
com.twitter util-app_2.11 6.23.0
com.twitter util-core_2.11 6.23.0
com.twitter util-jvm_2.11 6.23.0
com.typesafe config 1.2.1
com.typesafe.scala-logging scala-logging-api_2.11 2.1.2
com.typesafe.scala-logging scala-logging-slf4j_2.11 2.1.2
com.univocity univocity-parsers 2.5.9
com.vlkan flatbuffers 1.2.0-3f79e055
com.zaxxer HikariCP 3.1.0
commons-beanutils commons-beanutils 1.7.0
commons-beanutils commons-beanutils-core 1.8.0
commons-cli commons-cli 1.2
commons-codec commons-codec 1.10
commons-collections commons-collections 3.2.2
commons-configuration commons-configuration 1.6
commons-dbcp commons-dbcp 1.4
commons-digester commons-digester 1.8
commons-httpclient commons-httpclient 3.1
commons-io commons-io 2.4
commons-lang commons-lang 2.6
commons-logging commons-logging 1.1.3
commons-net commons-net 2.2
commons-pool commons-pool 1.5.4
info.ganglia.gmetric4j gmetric4j 1.0.7
io.airlift aircompressor 0.8
io.dropwizard.metrics metrics-core 3.1.5
io.dropwizard.metrics metrics-ganglia 3.1.5
io.dropwizard.metrics metrics-graphite 3.1.5
io.dropwizard.metrics metrics-healthchecks 3.1.5
io.dropwizard.metrics metrics-jetty9 3.1.5
io.dropwizard.metrics metrics-json 3.1.5
io.dropwizard.metrics metrics-jvm 3.1.5
io.dropwizard.metrics metrics-log4j 3.1.5
io.dropwizard.metrics metrics-servlets 3.1.5
io.netty netty 3.9.9.Final
io.netty netty-all 4.1.17.Final
io.prometheus simpleclient 0.0.16
io.prometheus simpleclient_common 0.0.16
io.prometheus simpleclient_dropwizard 0.0.16
io.prometheus simpleclient_servlet 0.0.16
io.prometheus.jmx collector 0.7
javax.activation activation 1.1.1
javax.annotation javax.annotation-api 1.2
javax.el javax.el-api 2.2.4
javax.jdo jdo-api 3.0.1
javax.servlet javax.servlet-api 3.1.0
javax.servlet.jsp jsp-api 2.1
javax.transaction jta 1.1
javax.validation validation-api 1.1.0.Final
javax.ws.rs javax.ws.rs-api 2.0.1
javax.xml.bind jaxb-api 2.2.2
javax.xml.stream stax-api 1.0-2
javolution javolution 5.5.1
jline jline 2.11
joda-time joda-time 2.9.3
log4j apache-log4j-extras 1.2.17
log4j log4j 1.2.17
net.hydromatic eigenbase-properties 1.1.5
net.iharder base64 2.3.8
net.java.dev.jets3t jets3t 0.9.4
net.razorvine pyrolite 4.13
net.sf.jpam jpam 1.1
net.sf.opencsv opencsv 2.3
net.sf.supercsv super-csv 2.2.0
net.snowflake snowflake-jdbc 3.6.3
net.snowflake spark-snowflake_2.11 2.3.2
net.sourceforge.f2j arpack_combined_all 0.1
org.acplt oncrpc 1.0.7
org.antlr ST4 4.0.4
org.antlr antlr-runtime 3.4
org.antlr antlr4-runtime 4.7
org.antlr stringtemplate 3.2.1
org.apache.ant ant 1.9.2
org.apache.ant ant-jsch 1.9.2
org.apache.ant ant-launcher 1.9.2
org.apache.arrow arrow-format 0.8.0
org.apache.arrow arrow-memory 0.8.0
org.apache.arrow arrow-vector 0.8.0
org.apache.avro avro 1.7.7
org.apache.avro avro-ipc 1.7.7
org.apache.avro avro-ipc-tests 1.7.7
org.apache.avro avro-mapred-hadoop2 1.7.7
org.apache.calcite calcite-avatica 1.2.0-incubating
org.apache.calcite calcite-core 1.2.0-incubating
org.apache.calcite calcite-linq4j 1.2.0-incubating
org.apache.commons commons-compress 1.4.1
org.apache.commons commons-crypto 1.0.0
org.apache.commons commons-lang3 3.5
org.apache.commons commons-math3 3.4.1
org.apache.curator curator-client 2.7.1
org.apache.curator curator-framework 2.7.1
org.apache.curator curator-recipes 2.7.1
org.apache.derby derby 10.12.1.1
org.apache.directory.api api-asn1-api 1.0.0-M20
org.apache.directory.api api-util 1.0.0-M20
org.apache.directory.server apacheds-i18n 2.0.0-M15
org.apache.directory.server apacheds-kerberos-codec 2.0.0-M15
org.apache.hadoop hadoop-annotations 2.7.3
org.apache.hadoop hadoop-auth 2.7.3
org.apache.hadoop hadoop-client 2.7.3
org.apache.hadoop hadoop-common 2.7.3
org.apache.hadoop hadoop-hdfs 2.7.3
org.apache.hadoop hadoop-mapreduce-client-app 2.7.3
org.apache.hadoop hadoop-mapreduce-client-common 2.7.3
org.apache.hadoop hadoop-mapreduce-client-core 2.7.3
org.apache.hadoop hadoop-mapreduce-client-jobclient 2.7.3
org.apache.hadoop hadoop-mapreduce-client-shuffle 2.7.3
org.apache.hadoop hadoop-yarn-api 2.7.3
org.apache.hadoop hadoop-yarn-client 2.7.3
org.apache.hadoop hadoop-yarn-common 2.7.3
org.apache.hadoop hadoop-yarn-server-common 2.7.3
org.apache.htrace htrace-core 3.1.0-incubating
org.apache.httpcomponents httpclient 4.5.4
org.apache.httpcomponents httpcore 4.4.8
org.apache.ivy ivy 2.4.0
org.apache.orc orc-core-nohive 1.4.3
org.apache.orc orc-mapreduce-nohive 1.4.3
org.apache.parquet parquet-column 1.8.3-databricks2
org.apache.parquet parquet-common 1.8.3-databricks2
org.apache.parquet parquet-encoding 1.8.3-databricks2
org.apache.parquet parquet-format 2.3.1
org.apache.parquet parquet-hadoop 1.8.3-databricks2
org.apache.parquet parquet-jackson 1.8.3-databricks2
org.apache.thrift libfb303 0.9.3
org.apache.thrift libthrift 0.9.3
org.apache.xbean xbean-asm5-shaded 4.4
org.apache.zookeeper zookeeper 3.4.6
org.bouncycastle bcprov-jdk15on 1.58
org.codehaus.jackson jackson-core-asl 1.9.13
org.codehaus.jackson jackson-jaxrs 1.9.13
org.codehaus.jackson jackson-mapper-asl 1.9.13
org.codehaus.jackson jackson-xc 1.9.13
org.codehaus.janino commons-compiler 3.0.8
org.codehaus.janino janino 3.0.8
org.datanucleus datanucleus-api-jdo 3.2.6
org.datanucleus datanucleus-core 3.2.10
org.datanucleus datanucleus-rdbms 3.2.9
org.eclipse.jetty jetty-client 9.3.20.v20170531
org.eclipse.jetty jetty-continuation 9.3.20.v20170531
org.eclipse.jetty jetty-http 9.3.20.v20170531
org.eclipse.jetty jetty-io 9.3.20.v20170531
org.eclipse.jetty jetty-jndi 9.3.20.v20170531
org.eclipse.jetty jetty-plus 9.3.20.v20170531
org.eclipse.jetty jetty-proxy 9.3.20.v20170531
org.eclipse.jetty jetty-security 9.3.20.v20170531
org.eclipse.jetty jetty-server 9.3.20.v20170531
org.eclipse.jetty jetty-servlet 9.3.20.v20170531
org.eclipse.jetty jetty-servlets 9.3.20.v20170531
org.eclipse.jetty jetty-util 9.3.20.v20170531
org.eclipse.jetty jetty-webapp 9.3.20.v20170531
org.eclipse.jetty jetty-xml 9.3.20.v20170531
org.fusesource.leveldbjni leveldbjni-all 1.8
org.glassfish.hk2 hk2-api 2.4.0-b34
org.glassfish.hk2 hk2-locator 2.4.0-b34
org.glassfish.hk2 hk2-utils 2.4.0-b34
org.glassfish.hk2 osgi-resource-locator 1.0.1
org.glassfish.hk2.external aopalliance-repackaged 2.4.0-b34
org.glassfish.hk2.external javax.inject 2.4.0-b34
org.glassfish.jersey.bundles.repackaged jersey-guava 2.22.2
org.glassfish.jersey.containers jersey-container-servlet 2.22.2
org.glassfish.jersey.containers jersey-container-servlet-core 2.22.2
org.glassfish.jersey.core jersey-client 2.22.2
org.glassfish.jersey.core jersey-common 2.22.2
org.glassfish.jersey.core jersey-server 2.22.2
org.glassfish.jersey.media jersey-media-jaxb 2.22.2
org.hibernate hibernate-validator 5.1.1.Final
org.iq80.snappy snappy 0.2
org.javassist javassist 3.18.1-GA
org.jboss.logging jboss-logging 3.1.3.GA
org.jdbi jdbi 2.63.1
org.joda joda-convert 1.7
org.jodd jodd-core 3.5.2
org.json4s json4s-ast_2.11 3.2.11
org.json4s json4s-core_2.11 3.2.11
org.json4s json4s-jackson_2.11 3.2.11
org.lz4 lz4-java 1.4.0
org.mariadb.jdbc mariadb-java-client 2.1.2
org.mockito mockito-all 1.9.5
org.objenesis objenesis 2.1
org.postgresql postgresql 42.1.4
org.roaringbitmap RoaringBitmap 0.5.11
org.rocksdb rocksdbjni 5.2.1
org.rosuda.REngine REngine 2.1.0
org.scala-lang scala-compiler_2.11 2.11.8
org.scala-lang scala-library_2.11 2.11.8
org.scala-lang scala-reflect_2.11 2.11.8
org.scala-lang scalap_2.11 2.11.8
org.scala-lang.modules scala-parser-combinators_2.11 1.0.2
org.scala-lang.modules scala-xml_2.11 1.0.5
org.scala-sbt test-interface 1.0
org.scalacheck scalacheck_2.11 1.12.5
org.scalanlp breeze-macros_2.11 0.13.2
org.scalanlp breeze_2.11 0.13.2
org.scalatest scalatest_2.11 2.2.6
org.slf4j jcl-over-slf4j 1.7.16
org.slf4j jul-to-slf4j 1.7.16
org.slf4j slf4j-api 1.7.16
org.slf4j slf4j-log4j12 1.7.16
org.spark-project.hive hive-beeline 1.2.1.spark2
org.spark-project.hive hive-cli 1.2.1.spark2
org.spark-project.hive hive-exec 1.2.1.spark2
org.spark-project.hive hive-jdbc 1.2.1.spark2
org.spark-project.hive hive-metastore 1.2.1.spark2
org.spark-project.spark unused 1.0.0
org.spire-math spire-macros_2.11 0.13.0
org.spire-math spire_2.11 0.13.0
org.springframework spring-core 4.1.4.RELEASE
org.springframework spring-test 4.1.4.RELEASE
org.tukaani xz 1.0
org.typelevel machinist_2.11 0.6.1
org.typelevel macro-compat_2.11 1.1.1
org.xerial sqlite-jdbc 3.8.11.2
org.xerial.snappy snappy-java 1.1.2.6
org.yaml snakeyaml 1.16
oro oro 2.0.8
software.amazon.ion ion-java 1.0.2
stax stax-api 1.0.1
xmlenc xmlenc 0.52