Databricks Runtime 7.0 (Unsupported)

Databricks released this image in June 2020.

The following release notes provide information about Databricks Runtime 7.0, powered by Apache Spark 3.0.

New features

Databricks Runtime 7.0 includes the following new features:

  • Scala 2.12

    Databricks Runtime 7.0 upgrades Scala from 2.11.12 to 2.12.10. The change list between Scala 2.12 and 2.11 is in the Scala 2.12.0 release notes.

  • Auto Loader (Public Preview), released in Databricks Runtime 6.4, has been improved in Databricks Runtime 7.0

    Auto Loader gives you a more efficient way to process new data files incrementally as they arrive on a cloud blob store during ETL. This is an improvement over file-based structured streaming, which identifies new files by repeatedly listing the cloud directory and tracking the files that have been seen, and can be very inefficient as the directory grows. Auto Loader is also more convenient and effective than file-notification-based structured streaming, which requires that you manually configure file-notification services on the cloud and doesn’t let you backfill existing files. For details, see Auto Loader.
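    As a sketch of how Auto Loader is used (the paths and options below are illustrative placeholders, not defaults), a stream over newly arriving JSON files can be declared with the cloudFiles source:

    ```scala
    // Sketch only: input, checkpoint, and output paths are placeholders.
    val events = spark.readStream
      .format("cloudFiles")                 // the Auto Loader source
      .option("cloudFiles.format", "json")  // format of the incoming files
      .load("/mnt/raw/events")

    events.writeStream
      .format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/events")
      .start("/mnt/bronze/events")
    ```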

  • COPY INTO (Public Preview), which lets you load data into Delta Lake with idempotent retries, has been improved in Databricks Runtime 7.0

    Released as a Public Preview in Databricks Runtime 6.4, the COPY INTO SQL command lets you load data into Delta Lake with idempotent retries. Before COPY INTO, you had to use the Apache Spark DataFrame APIs to load data into Delta Lake and handle any load failures yourself. The new COPY INTO command provides a familiar declarative interface for loading data in SQL. The command keeps track of previously loaded files, so you can safely re-run it after a failure. For details, see COPY INTO.
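    As a hedged illustration of the declarative interface (the table and path names are placeholders), a load might look like:

    ```scala
    // Sketch only: idempotently load CSV files into a Delta table.
    spark.sql("""
      COPY INTO delta.`/mnt/bronze/sales`
      FROM '/mnt/landing/sales'
      FILEFORMAT = CSV
      FORMAT_OPTIONS ('header' = 'true')
    """)
    ```

    Because the command tracks previously loaded files, re-running the same statement after a failure loads only the files that have not already been committed.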

Improvements

  • More Amazon Kinesis concurrent streams:

    The Amazon Kinesis Structured Streaming source uses ListShards by default to get the list of shards in a Kinesis stream. This requires additional IAM permissions to successfully run your stream. In previous versions of Databricks Runtime, DescribeStream was used by default. ListShards has a significantly higher API limit than DescribeStream (100 requests per second per stream for ListShards versus 10 requests per second across your entire AWS account for DescribeStream). This change will allow users to run more than 10 concurrent Kinesis streams with Structured Streaming in Databricks.
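    A minimal sketch of a Kinesis source (the stream name and region are placeholders; the attached IAM role must now allow kinesis:ListShards in addition to the read permissions):

    ```scala
    // Sketch only: one of several concurrent Kinesis streams.
    val kinesisDF = spark.readStream
      .format("kinesis")
      .option("streamName", "clickstream-events")
      .option("region", "us-west-2")
      .option("initialPosition", "latest")
      .load()
    ```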

  • Azure Synapse (formerly SQL Data Warehouse) connector supports the COPY statement.

    The main benefit of COPY is that lower privileged users can write data to Azure Synapse without needing strict CONTROL permissions on Azure Synapse.

  • The %matplotlib inline magic command is no longer required to display Matplotlib objects inline in notebook cells. They are always displayed inline by default.

  • Matplotlib figures are now rendered with transparent=False, so that user-specified backgrounds are not lost. This behavior can be overridden by setting the Spark configuration spark.databricks.workspace.matplotlib.transparent to true.

  • When running Structured Streaming production jobs on High Concurrency mode clusters, restarts of a job would occasionally fail, because the previously running job wasn’t terminated properly. Databricks Runtime 6.3 introduced the ability to set the SQL configuration spark.sql.streaming.stopActiveRunOnRestart true on your cluster to ensure that the previous run stops. This configuration is set by default in Databricks Runtime 7.0.

Major library changes

Python packages

Major Python packages upgraded:

  • boto3 1.9.162 -> 1.12.0

  • matplotlib 3.0.3 -> 3.1.3

  • numpy 1.16.2 -> 1.18.1

  • pandas 0.24.2 -> 1.0.1

  • pip 19.0.3 -> 20.0.2

  • pyarrow 0.13.0 -> 0.15.1

  • psycopg2 2.7.6 -> 2.8.4

  • scikit-learn 0.20.3 -> 0.22.1

  • scipy 1.2.1 -> 1.4.1

  • seaborn 0.9.0 -> 0.10.0

Python packages removed:

  • boto (use boto3)

  • pycurl

Note

The Python environment in Databricks Runtime 7.0 uses Python 3.7, which is different from the installed Ubuntu system Python: /usr/bin/python and /usr/bin/python2 are linked to Python 2.7 and /usr/bin/python3 is linked to Python 3.6.

R packages

R packages added:

  • broom

  • highr

  • isoband

  • knitr

  • markdown

  • modelr

  • reprex

  • rmarkdown

  • rvest

  • selectr

  • tidyverse

  • tinytex

  • xfun

R packages removed:

  • abind

  • bitops

  • car

  • carData

  • doMC

  • gbm

  • h2o

  • littler

  • lme4

  • mapproj

  • maps

  • maptools

  • MatrixModels

  • minqa

  • mvtnorm

  • nloptr

  • openxlsx

  • pbkrtest

  • pkgKitten

  • quantreg

  • R.methodsS3

  • R.oo

  • R.utils

  • RcppEigen

  • RCurl

  • rio

  • sp

  • SparseM

  • statmod

  • zip

Java and Scala libraries

  • AWS SDK (aws-java-sdk) upgraded to 1.11.655.

  • Amazon Kinesis Client upgraded to 1.12.0.

  • Apache Hive version used for handling Hive user-defined functions and Hive SerDes upgraded to 2.3.

  • Previously Azure Storage and Key Vault jars were packaged as part of Databricks Runtime, which would prevent you from using different versions of those libraries attached to clusters. Classes under com.microsoft.azure.storage and com.microsoft.azure.keyvault are no longer on the class path in Databricks Runtime. If you depend on either of those class paths, you must now attach Azure Storage SDK or Azure Key Vault SDK to your clusters.

Behavior changes

This section lists behavior changes from Databricks Runtime 6.6 to Databricks Runtime 7.0. You should be aware of these as you migrate workloads from lower Databricks Runtime releases to Databricks Runtime 7.0 and above.

Spark behavior changes

Because Databricks Runtime 7.0 is the first Databricks Runtime built on Spark 3.0, there are many changes that you should be aware of when you migrate workloads from Databricks Runtime 5.5 LTS or 6.x, which are built on Spark 2.4. These changes are listed in the “Behavior changes” section of each functional area in the Apache Spark section of this release notes article.

Other behavior changes

  • The upgrade to Scala 2.12 involves the following changes:

    • Package cell serialization is handled differently. The following example illustrates the behavior change and how to handle it.

      Running foo.bar.MyObjectInPackageCell.run() as defined in the following package cell will trigger the error java.lang.NoClassDefFoundError: Could not initialize class foo.bar.MyObjectInPackageCell$

      package foo.bar
      
      case class MyIntStruct(int: Int)
      
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.Column
      
      object MyObjectInPackageCell extends Serializable {
      
        // Because SparkSession cannot be created in Spark executors,
        // the following line triggers the error
        // Could not initialize class foo.bar.MyObjectInPackageCell$
        val spark = SparkSession.builder.getOrCreate()
      
        def foo: Int => Option[MyIntStruct] = (x: Int) => Some(MyIntStruct(100))
      
        val theUDF = udf(foo)
      
        val df = {
          val myUDFInstance = theUDF(col("id"))
          spark.range(0, 1, 1, 1).withColumn("u", myUDFInstance)
        }
      
        def run(): Unit = {
          df.collect().foreach(println)
        }
      }
      

      To work around this error, you can wrap MyObjectInPackageCell inside a serializable class.
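      One way to do that, sketched below with illustrative names, is to move the object's members into a class that extends Serializable and instantiate it from the notebook:

      ```scala
      package foo.bar

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions._

      case class MyIntStruct(int: Int)

      // Wrapping the logic in a serializable class defers initialization
      // until the class is instantiated on the driver, avoiding the
      // object-initializer error described above.
      class MyWrappedObject extends Serializable {
        val spark = SparkSession.builder.getOrCreate()
        val theUDF = udf((x: Int) => Some(MyIntStruct(100)))

        def run(): Unit = {
          spark.range(0, 1, 1, 1)
            .withColumn("u", theUDF(col("id")))
            .collect()
            .foreach(println)
        }
      }

      // In another cell:
      // new foo.bar.MyWrappedObject().run()
      ```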

    • Certain cases using DataStreamWriter.foreachBatch will require a source code update. This change is needed because Scala 2.12 automatically converts lambda expressions to SAM (single abstract method) types, which can make overload resolution ambiguous.

      For example, the following Scala code can’t compile:

      streams
        .writeStream
        .foreachBatch { (df, id) => myFunc(df, id) }
      

      To fix the compilation error, change foreachBatch { (df, id) => myFunc(df, id) } to foreachBatch(myFunc _) or use the Java API explicitly: foreachBatch(new VoidFunction2 ...).
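      Both fixes, sketched here using the myFunc and streams names from the snippet above:

      ```scala
      // Sketch only: myFunc and streams are the names from the example above.
      import org.apache.spark.api.java.function.VoidFunction2
      import org.apache.spark.sql.DataFrame

      def myFunc(df: DataFrame, id: Long): Unit = { /* ... */ }

      // Option 1: pass a method reference so the target overload is unambiguous.
      streams.writeStream.foreachBatch(myFunc _)

      // Option 2: use the Java API explicitly.
      streams.writeStream.foreachBatch(new VoidFunction2[DataFrame, java.lang.Long] {
        override def call(df: DataFrame, id: java.lang.Long): Unit = myFunc(df, id)
      })
      ```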

  • With the AWS SDK upgrade to 1.11.655, the use of org.apache.hadoop.fs.s3native.NativeS3FileSystem requires AWS Signature v4 and bucket endpoint setup. A 403 Forbidden error may be thrown if a user has configured AWS Signature v2 to sign requests to S3 with the S3N file system, or if a user accesses an S3 path that contains “+” characters using the legacy S3N file system (for example, s3n://bucket/path/+file).

  • Because the Apache Hive version used for handling Hive user-defined functions and Hive SerDes is upgraded to 2.3, two changes are required:

    • Hive’s SerDe interface is replaced by an abstract class AbstractSerDe. For any custom Hive SerDe implementation, migrating to AbstractSerDe is required.

    • Setting spark.sql.hive.metastore.jars to builtin means that the Hive 2.3 metastore client will be used to access metastores for Databricks Runtime 7.0. If you need to access Hive 1.2 based external metastores, set spark.sql.hive.metastore.jars to the folder that contains Hive 1.2 jars.
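      For example, in the cluster's Spark config (a sketch, not a complete external-metastore setup; the jar folder path is a placeholder):

      ```properties
      spark.sql.hive.metastore.version 1.2.1
      spark.sql.hive.metastore.jars /dbfs/hive-jars/hive-1.2/*
      ```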

Deprecations and removals

  • Data skipping index was deprecated in Databricks Runtime 4.3 and removed in Databricks Runtime 7.0. We recommend that you use Delta tables instead, which offer improved data skipping capabilities.

  • In Databricks Runtime 7.0, the underlying version of Apache Spark uses Scala 2.12. Since libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status Skipped and a deprecation message that explains the changes in library handling. However, if you have a cluster that was created on an earlier version of Databricks Runtime before Databricks platform version 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.

  • org.apache.hadoop.fs.s3native.NativeS3FileSystem and org.apache.hadoop.fs.s3.S3FileSystem are no longer supported for accessing S3.

    We strongly encourage you to use com.databricks.s3a.S3AFileSystem, which is the default for s3a://, s3://, and s3n:// file system schemes in Databricks Runtime. If you need assistance with migration to com.databricks.s3a.S3AFileSystem, contact Databricks support or your Databricks representative.

  • The ability to use local file APIs was removed in Databricks Runtime 7.0 on Community Edition. We recommend that you use %fs cp to copy your data to and from a local directory instead.

Apache Spark

Databricks Runtime 7.0 includes Apache Spark 3.0.

Core, Spark SQL, Structured Streaming

Highlights

Performance enhancements

Extensibility enhancements

  • Catalog plugin API (SPARK-31121)

  • Data source V2 API refactoring (SPARK-25390)

  • Hive 3.0 and 3.1 metastore support (SPARK-27970, SPARK-24360)

  • Extend Spark plugin interface to driver (SPARK-29396)

  • Extend Spark metrics system with user-defined metrics using executor plugins (SPARK-28091)

  • Developer APIs for extended Columnar Processing Support (SPARK-27396)

  • Built-in source migration using DSV2: parquet, ORC, CSV, JSON, Kafka, Text, Avro (SPARK-27589)

  • Allow FunctionInjection in SparkExtensions (SPARK-25560)

  • Allows Aggregator to be registered as a UDAF (SPARK-27296)

Connector enhancements

  • Support High Performance S3A committers (SPARK-23977)

  • Column pruning through nondeterministic expressions (SPARK-29768)

  • Support spark.sql.statistics.fallBackToHdfs in data source tables (SPARK-25474)

  • Allow partition pruning with subquery filters on file source (SPARK-26893)

  • Avoid pushdown of subqueries in data source filters (SPARK-25482)

  • Recursive data loading from file sources (SPARK-27990)

  • Parquet/ORC

  • CSV

    • Support filters pushdown in CSV datasource (SPARK-30323)

  • Hive SerDe

    • No schema inference when reading Hive serde table with native data source (SPARK-27119)

    • Hive CTAS commands should use data source if it is convertible (SPARK-25271)

    • Use native data source to optimize inserting partitioned Hive table (SPARK-28573)

  • Apache Kafka

    • Add support for Kafka headers (SPARK-23539)

    • Add Kafka delegation token support (SPARK-25501)

    • Introduce new option to Kafka source: offset by timestamp (starting/ending) (SPARK-26848)

    • Support the minPartitions option in Kafka batch source and streaming source v1 (SPARK-30656)

    • Upgrade Kafka to 2.4.1 (SPARK-31126)

  • New built-in data sources

Feature enhancements

SQL compatibility enhancements

  • Switch to Proleptic Gregorian calendar (SPARK-26651)

  • Build Spark’s own datetime pattern definition (SPARK-31408)

  • Introduce ANSI store assignment policy for table insertion (SPARK-28495)

  • Follow ANSI store assignment rule in table insertion by default (SPARK-28885)

  • Add a SQLConf spark.sql.ansi.enabled (SPARK-28989)

  • Support ANSI SQL filter clause for aggregate expression (SPARK-27986)

  • Support ANSI SQL OVERLAY function (SPARK-28077)

  • Support ANSI nested bracketed comments (SPARK-28880)

  • Throw exception on overflow for integers (SPARK-26218)

  • Overflow check for interval arithmetic operations (SPARK-30341)

  • Throw Exception when invalid string is cast to numeric type (SPARK-30292)

  • Make interval multiply and divide’s overflow behavior consistent with other operations (SPARK-30919)

  • Add ANSI type aliases for char and decimal (SPARK-29941)

  • SQL Parser defines ANSI compliant reserved keywords (SPARK-26215)

  • Forbid reserved keywords as identifiers when ANSI mode is on (SPARK-26976)

  • Support ANSI SQL LIKE ... ESCAPE syntax (SPARK-28083)

  • Support ANSI SQL Boolean-Predicate syntax (SPARK-27924)

  • Better support for correlated subquery processing (SPARK-18455)

Monitoring and debugability enhancements

  • New Structured Streaming UI (SPARK-29543)

  • SHS: Allow event logs for running streaming apps to be rolled over (SPARK-28594)

  • Add an API that allows a user to define and observe arbitrary metrics on batch and streaming queries (SPARK-29345)

  • Instrumentation for tracking per-query planning time (SPARK-26129)

  • Put the basic shuffle metrics in the SQL exchange operator (SPARK-26139)

  • SQL statement is shown in SQL Tab instead of callsite (SPARK-27045)

  • Add tooltip to SparkUI (SPARK-29449)

  • Improve the concurrent performance of History Server (SPARK-29043)

  • EXPLAIN FORMATTED command (SPARK-27395)

  • Support Dumping truncated plans and generated code to a file (SPARK-26023)

  • Enhance describe framework to describe the output of a query (SPARK-26982)

  • Add SHOW VIEWS command (SPARK-31113)

  • Improve the error messages of SQL parser (SPARK-27901)

  • Support Prometheus monitoring natively (SPARK-29429)

PySpark enhancements

  • Redesigned pandas UDFs with type hints (SPARK-28264)

  • Pandas UDF pipeline (SPARK-26412)

  • Support StructType as arguments and return types for Scalar Pandas UDF (SPARK-27240)

  • Support Dataframe Cogroup via Pandas UDFs (SPARK-27463)

  • Add mapInPandas to allow an iterator of DataFrames (SPARK-28198)

  • Certain SQL functions should take column names as well (SPARK-26979)

  • Make PySpark SQL exceptions more Pythonic (SPARK-31849)

Documentation and test coverage enhancements

Other notable changes

  • Built-in Hive execution upgrade from 1.2.1 to 2.3.6 (SPARK-23710, SPARK-28723, SPARK-31381)

  • Use Apache Hive 2.3 dependency by default (SPARK-30034)

  • GA Scala 2.12 and remove 2.11 (SPARK-26132)

  • Improve logic for timing out executors in dynamic allocation (SPARK-20286)

  • Disk-persisted RDD blocks served by shuffle service and ignored for Dynamic Allocation (SPARK-27677)

  • Acquire new executors to avoid hang because of blocklisting (SPARK-22148)

  • Allow sharing of Netty’s memory pool allocators (SPARK-24920)

  • Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator (SPARK-27338)

  • Introduce AdmissionControl APIs for StructuredStreaming (SPARK-30669)

  • Spark History Main page performance improvement (SPARK-25973)

  • Speed up and slim down metric aggregation in SQL listener (SPARK-29562)

  • Avoid the network when shuffle blocks are fetched from the same host (SPARK-27651)

  • Improve file listing for DistributedFileSystem (SPARK-27801)

Behavior changes for Spark core, Spark SQL, and Structured Streaming

The following migration guides list behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions.

The following behavior changes are not covered in these migration guides:

  • In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once. (SPARK-28199)

  • In Databricks Runtime 7.0, when reading a Hive SerDe table, by default Spark disallows reading files under a subdirectory that is not a table partition. To enable such reads, set the configuration spark.databricks.io.hive.scanNonpartitionedDirectory.enabled to true. This does not affect Spark native table readers and file readers.
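    For example, as a cluster Spark config entry (a sketch):

    ```properties
    spark.databricks.io.hive.scanNonpartitionedDirectory.enabled true
    ```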

MLlib

Highlights

Behavior changes for MLlib

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions.

The following behavior changes are not covered in the migration guide:

  • In Spark 3.0, a multiclass logistic regression in PySpark will now (correctly) return LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The additional methods exposed by BinaryLogisticRegressionSummary would not work in this case anyway. (SPARK-31681)

  • In Spark 3.0, pyspark.ml.param.shared.Has* mixins no longer provide any set*(self, value) setter methods; use the respective self.set(self.*, value) instead. (SPARK-29093)

SparkR

  • Arrow optimization in SparkR’s interoperability (SPARK-26759)

  • Performance enhancement via vectorized R gapply(), dapply(), createDataFrame, collect()

  • “Eager execution” for R shell, IDE (SPARK-24572)

  • R API for Power Iteration Clustering (SPARK-19827)

Behavior changes for SparkR

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions.

Programming guide

GraphX

Programming guide: GraphX Programming Guide.

Deprecations

Known issues

  • Parsing day of year using pattern letter ‘D’ returns the wrong result if the year field is missing. This can happen in SQL functions such as to_timestamp, which parse a datetime string to datetime values using a pattern string. (SPARK-31939)

  • Join/Window/Aggregate inside subqueries may lead to wrong results if the keys have values -0.0 and 0.0. (SPARK-31958)

  • A window query may fail with ambiguous self-join error unexpectedly. (SPARK-31956)

  • Streaming queries with dropDuplicates operator may not be able to restart with the checkpoint written by Spark 2.x. (SPARK-31990)

System environment

  • Operating System: Ubuntu 18.04.4 LTS

  • Java: 1.8.0_252

  • Scala: 2.12.10

  • Python: 3.7.5

  • R: R version 3.6.3 (2020-02-29)

  • Delta Lake 0.7.0

Installed Python libraries

Library  Version

asn1crypto  1.3.0
backcall  0.1.0
boto3  1.12.0
botocore  1.15.0
certifi  2020.4.5
cffi  1.14.0
chardet  3.0.4
cryptography  2.8
cycler  0.10.0
Cython  0.29.15
decorator  4.4.1
docutils  0.15.2
entrypoints  0.3
idna  2.8
ipykernel  5.1.4
ipython  7.12.0
ipython-genutils  0.2.0
jedi  0.14.1
jmespath  0.9.4
joblib  0.14.1
jupyter-client  5.3.4
jupyter-core  4.6.1
kiwisolver  1.1.0
matplotlib  3.1.3
numpy  1.18.1
pandas  1.0.1
parso  0.5.2
patsy  0.5.1
pexpect  4.8.0
pickleshare  0.7.5
pip  20.0.2
prompt-toolkit  3.0.3
psycopg2  2.8.4
ptyprocess  0.6.0
pyarrow  0.15.1
pycparser  2.19
Pygments  2.5.2
PyGObject  3.26.1
pyOpenSSL  19.1.0
pyparsing  2.4.6
PySocks  1.7.1
python-apt  1.6.5+ubuntu0.3
python-dateutil  2.8.1
pytz  2019.3
pyzmq  18.1.1
requests  2.22.0
s3transfer  0.3.3
scikit-learn  0.22.1
scipy  1.4.1
seaborn  0.10.0
setuptools  45.2.0
six  1.14.0
ssh-import-id  5.7
statsmodels  0.11.0
tornado  6.0.3
traitlets  4.3.3
unattended-upgrades  0.1
urllib3  1.25.8
virtualenv  16.7.10
wcwidth  0.1.8
wheel  0.34.2

Installed R libraries

R libraries are installed from (Microsoft CRAN snapshot on 2020-04-22).

Library  Version

askpass  1.1
assertthat  0.2.1
backports  1.1.6
base  3.6.3
base64enc  0.1-3
BH  1.72.0-3
bit  1.1-15.2
bit64  0.9-7
blob  1.2.1
boot  1.3-25
brew  1.0-6
broom  0.5.6
callr  3.4.3
caret  6.0-86
cellranger  1.1.0
chron  2.3-55
class  7.3-17
cli  2.0.2
clipr  0.7.0
cluster  2.1.0
codetools  0.2-16
colorspace  1.4-1
commonmark  1.7
compiler  3.6.3
config  0.3
covr  3.5.0
crayon  1.3.4
crosstalk  1.1.0.1
curl  4.3
data.table  1.12.8
datasets  3.6.3
DBI  1.1.0
dbplyr  1.4.3
desc  1.2.0
devtools  2.3.0
digest  0.6.25
dplyr  0.8.5
DT  0.13
ellipsis  0.3.0
evaluate  0.14
fansi  0.4.1
farver  2.0.3
fastmap  1.0.1
forcats  0.5.0
foreach  1.5.0
foreign  0.8-76
forge  0.2.0
fs  1.4.1
generics  0.0.2
ggplot2  3.3.0
gh  1.1.0
git2r  0.26.1
glmnet  3.0-2
globals  0.12.5
glue  1.4.0
gower  0.2.1
graphics  3.6.3
grDevices  3.6.3
grid  3.6.3
gridExtra  2.3
gsubfn  0.7
gtable  0.3.0
haven  2.2.0
highr  0.8
hms  0.5.3
htmltools  0.4.0
htmlwidgets  1.5.1
httpuv  1.5.2
httr  1.4.1
hwriter  1.3.2
hwriterPlus  1.0-3
ini  0.3.1
ipred  0.9-9
isoband  0.2.1
iterators  1.0.12
jsonlite  1.6.1
KernSmooth  2.23-17
knitr  1.28
labeling  0.3
later  1.0.0
lattice  0.20-41
lava  1.6.7
lazyeval  0.2.2
lifecycle  0.2.0
lubridate  1.7.8
magrittr  1.5
markdown  1.1
MASS  7.3-51.6
Matrix  1.2-18
memoise  1.1.0
methods  3.6.3
mgcv  1.8-31
mime  0.9
ModelMetrics  1.2.2.2
modelr  0.1.6
munsell  0.5.0
nlme  3.1-147
nnet  7.3-14
numDeriv  2016.8-1.1
openssl  1.4.1
parallel  3.6.3
pillar  1.4.3
pkgbuild  1.0.6
pkgconfig  2.0.3
pkgload  1.0.2
plogr  0.2.0
plyr  1.8.6
praise  1.0.0
prettyunits  1.1.1
pROC  1.16.2
processx  3.4.2
prodlim  2019.11.13
progress  1.2.2
promises  1.1.0
proto  1.0.0
ps  1.3.2
purrr  0.3.4
r2d3  0.2.3
R6  2.4.1
randomForest  4.6-14
rappdirs  0.3.1
rcmdcheck  1.3.3
RColorBrewer  1.1-2
Rcpp  1.0.4.6
readr  1.3.1
readxl  1.3.1
recipes  0.1.10
rematch  1.0.1
rematch2  2.1.1
remotes  2.1.1
reprex  0.3.0
reshape2  1.4.4
rex  1.2.0
rjson  0.2.20
rlang  0.4.5
rmarkdown  2.1
RODBC  1.3-16
roxygen2  7.1.0
rpart  4.1-15
rprojroot  1.3-2
Rserve  1.8-6
RSQLite  2.2.0
rstudioapi  0.11
rversions  2.0.1
rvest  0.3.5
scales  1.1.0
selectr  0.4-2
sessioninfo  1.1.1
shape  1.4.4
shiny  1.4.0.2
sourcetools  0.1.7
sparklyr  1.2.0
SparkR  3.0.0
spatial  7.3-11
splines  3.6.3
sqldf  0.4-11
SQUAREM  2020.2
stats  3.6.3
stats4  3.6.3
stringi  1.4.6
stringr  1.4.0
survival  3.1-12
sys  3.3
tcltk  3.6.3
TeachingDemos  2.10
testthat  2.3.2
tibble  3.0.1
tidyr  1.0.2
tidyselect  1.0.0
tidyverse  1.3.0
timeDate  3043.102
tinytex  0.22
tools  3.6.3
usethis  1.6.0
utf8  1.1.4
utils  3.6.3
vctrs  0.2.4
viridisLite  0.3.0
whisker  0.4
withr  2.2.0
xfun  0.13
xml2  1.3.1
xopen  1.0.0
xtable  1.8-4
yaml  2.2.1

Installed Java and Scala libraries (Scala 2.12 cluster version)

Group ID  Artifact ID  Version

antlr  antlr  2.7.7
com.amazonaws  amazon-kinesis-client  1.12.0
com.amazonaws  aws-java-sdk-autoscaling  1.11.655
com.amazonaws  aws-java-sdk-cloudformation  1.11.655
com.amazonaws  aws-java-sdk-cloudfront  1.11.655
com.amazonaws  aws-java-sdk-cloudhsm  1.11.655
com.amazonaws  aws-java-sdk-cloudsearch  1.11.655
com.amazonaws  aws-java-sdk-cloudtrail  1.11.655
com.amazonaws  aws-java-sdk-cloudwatch  1.11.655
com.amazonaws  aws-java-sdk-cloudwatchmetrics  1.11.655
com.amazonaws  aws-java-sdk-codedeploy  1.11.655
com.amazonaws  aws-java-sdk-cognitoidentity  1.11.655
com.amazonaws  aws-java-sdk-cognitosync  1.11.655
com.amazonaws  aws-java-sdk-config  1.11.655
com.amazonaws  aws-java-sdk-core  1.11.655
com.amazonaws  aws-java-sdk-datapipeline  1.11.655
com.amazonaws  aws-java-sdk-directconnect  1.11.655
com.amazonaws  aws-java-sdk-directory  1.11.655
com.amazonaws  aws-java-sdk-dynamodb  1.11.655
com.amazonaws  aws-java-sdk-ec2  1.11.655
com.amazonaws  aws-java-sdk-ecs  1.11.655
com.amazonaws  aws-java-sdk-efs  1.11.655
com.amazonaws  aws-java-sdk-elasticache  1.11.655
com.amazonaws  aws-java-sdk-elasticbeanstalk  1.11.655
com.amazonaws  aws-java-sdk-elasticloadbalancing  1.11.655
com.amazonaws  aws-java-sdk-elastictranscoder  1.11.655
com.amazonaws  aws-java-sdk-emr  1.11.655
com.amazonaws  aws-java-sdk-glacier  1.11.655
com.amazonaws  aws-java-sdk-iam  1.11.655
com.amazonaws  aws-java-sdk-importexport  1.11.655
com.amazonaws  aws-java-sdk-kinesis  1.11.655
com.amazonaws  aws-java-sdk-kms  1.11.655
com.amazonaws  aws-java-sdk-lambda  1.11.655
com.amazonaws  aws-java-sdk-logs  1.11.655
com.amazonaws  aws-java-sdk-machinelearning  1.11.655
com.amazonaws  aws-java-sdk-opsworks  1.11.655
com.amazonaws  aws-java-sdk-rds  1.11.655
com.amazonaws  aws-java-sdk-redshift  1.11.655
com.amazonaws  aws-java-sdk-route53  1.11.655
com.amazonaws  aws-java-sdk-s3  1.11.655
com.amazonaws  aws-java-sdk-ses  1.11.655
com.amazonaws  aws-java-sdk-simpledb  1.11.655
com.amazonaws  aws-java-sdk-simpleworkflow  1.11.655
com.amazonaws  aws-java-sdk-sns  1.11.655
com.amazonaws  aws-java-sdk-sqs  1.11.655
com.amazonaws  aws-java-sdk-ssm  1.11.655
com.amazonaws  aws-java-sdk-storagegateway  1.11.655
com.amazonaws  aws-java-sdk-sts  1.11.655
com.amazonaws  aws-java-sdk-support  1.11.655
com.amazonaws  aws-java-sdk-swf-libraries  1.11.22
com.amazonaws  aws-java-sdk-workspaces  1.11.655
com.amazonaws  jmespath-java  1.11.655
com.chuusai  shapeless_2.12  2.3.3
com.clearspring.analytics  stream  2.9.6
com.databricks  Rserve  1.8-3
com.databricks  jets3t  0.7.1-0
com.databricks.scalapb  compilerplugin_2.12  0.4.15-10
com.databricks.scalapb  scalapb-runtime_2.12  0.4.15-10
com.esotericsoftware  kryo-shaded  4.0.2
com.esotericsoftware  minlog  1.3.0
com.fasterxml  classmate  1.3.4
com.fasterxml.jackson.core  jackson-annotations  2.10.0
com.fasterxml.jackson.core  jackson-core  2.10.0
com.fasterxml.jackson.core  jackson-databind  2.10.0
com.fasterxml.jackson.dataformat  jackson-dataformat-cbor  2.10.0
com.fasterxml.jackson.datatype  jackson-datatype-joda  2.10.0
com.fasterxml.jackson.module  jackson-module-paranamer  2.10.0
com.fasterxml.jackson.module  jackson-module-scala_2.12  2.10.0
com.github.ben-manes.caffeine  caffeine  2.3.4
com.github.fommil  jniloader  1.1
com.github.fommil.netlib  core  1.1.2
com.github.fommil.netlib  native_ref-java  1.1
com.github.fommil.netlib  native_ref-java-natives  1.1
com.github.fommil.netlib  native_system-java  1.1
com.github.fommil.netlib  native_system-java-natives  1.1
com.github.fommil.netlib  netlib-native_ref-linux-x86_64-natives  1.1
com.github.fommil.netlib  netlib-native_system-linux-x86_64-natives  1.1
com.github.joshelser  dropwizard-metrics-hadoop-metrics2-reporter  0.1.2
com.github.luben  zstd-jni  1.4.4-3
com.github.wendykierp  JTransforms  3.1
com.google.code.findbugs  jsr305  3.0.0
com.google.code.gson  gson  2.2.4
com.google.flatbuffers  flatbuffers-java  1.9.0
com.google.guava  guava  15.0
com.google.protobuf  protobuf-java  2.6.1
com.h2database  h2  1.4.195
com.helger  profiler  1.1.1
com.jcraft  jsch  0.1.50
com.jolbox  bonecp  0.8.0.RELEASE
com.microsoft.azure  azure-data-lake-store-sdk  2.2.8
com.microsoft.sqlserver  mssql-jdbc  8.2.1.jre8
com.ning  compress-lzf  1.0.3
com.sun.mail  javax.mail  1.5.2
com.tdunning  json  1.8
com.thoughtworks.paranamer  paranamer  2.8
com.trueaccord.lenses  lenses_2.12  0.4.12
com.twitter  chill-java  0.9.5
com.twitter  chill_2.12  0.9.5
com.twitter  util-app_2.12  7.1.0
com.twitter  util-core_2.12  7.1.0
com.twitter  util-function_2.12  7.1.0
com.twitter  util-jvm_2.12  7.1.0
com.twitter  util-lint_2.12  7.1.0
com.twitter  util-registry_2.12  7.1.0
com.twitter  util-stats_2.12  7.1.0
com.typesafe  config  1.2.1
com.typesafe.scala-logging  scala-logging_2.12  3.7.2
com.univocity  univocity-parsers  2.8.3
com.zaxxer  HikariCP  3.1.0
commons-beanutils  commons-beanutils  1.9.4
commons-cli  commons-cli  1.2
commons-codec  commons-codec  1.10
commons-collections  commons-collections  3.2.2
commons-configuration  commons-configuration  1.6
commons-dbcp  commons-dbcp  1.4
commons-digester  commons-digester  1.8
commons-fileupload  commons-fileupload  1.3.3
commons-httpclient  commons-httpclient  3.1
commons-io  commons-io  2.4
commons-lang  commons-lang  2.6
commons-logging  commons-logging  1.1.3
commons-net  commons-net  3.1
commons-pool  commons-pool  1.5.4
info.ganglia.gmetric4j  gmetric4j  1.0.10
io.airlift  aircompressor  0.10
io.dropwizard.metrics  metrics-core  4.1.1
io.dropwizard.metrics  metrics-graphite  4.1.1
io.dropwizard.metrics  metrics-healthchecks  4.1.1
io.dropwizard.metrics  metrics-jetty9  4.1.1
io.dropwizard.metrics  metrics-jmx  4.1.1
io.dropwizard.metrics  metrics-json  4.1.1
io.dropwizard.metrics  metrics-jvm  4.1.1
io.dropwizard.metrics  metrics-servlets  4.1.1
io.netty  netty-all  4.1.47.Final
jakarta.annotation  jakarta.annotation-api  1.3.5
jakarta.validation  jakarta.validation-api  2.0.2
jakarta.ws.rs  jakarta.ws.rs-api  2.1.6
javax.activation  activation  1.1.1
javax.el  javax.el-api  2.2.4
javax.jdo  jdo-api  3.0.1
javax.servlet  javax.servlet-api  3.1.0
javax.servlet.jsp  jsp-api  2.1
javax.transaction  jta  1.1
javax.transaction  transaction-api  1.1
javax.xml.bind  jaxb-api  2.2.2
javax.xml.stream  stax-api  1.0-2
javolution  javolution  5.5.1
jline  jline  2.14.6
joda-time  joda-time  2.10.5
log4j  apache-log4j-extras  1.2.17
log4j  log4j  1.2.17
net.razorvine  pyrolite  4.30
net.sf.jpam  jpam  1.1
net.sf.opencsv  opencsv  2.3
net.sf.supercsv  super-csv  2.2.0
net.snowflake  snowflake-ingest-sdk  0.9.6
net.snowflake  snowflake-jdbc  3.12.0
net.snowflake  spark-snowflake_2.12  2.5.9-spark_2.4
net.sourceforge.f2j  arpack_combined_all  0.1
org.acplt.remotetea  remotetea-oncrpc  1.1.2
org.antlr  ST4  4.0.4
org.antlr  antlr-runtime  3.5.2
org.antlr  antlr4-runtime  4.7.1
org.antlr  stringtemplate  3.2.1
org.apache.ant  ant  1.9.2
org.apache.ant  ant-jsch  1.9.2
org.apache.ant  ant-launcher  1.9.2
org.apache.arrow  arrow-format  0.15.1
org.apache.arrow  arrow-memory  0.15.1
org.apache.arrow  arrow-vector  0.15.1
org.apache.avro  avro  1.8.2
org.apache.avro  avro-ipc  1.8.2
org.apache.avro  avro-mapred-hadoop2  1.8.2
org.apache.commons  commons-compress  1.8.1
org.apache.commons  commons-crypto  1.0.0
org.apache.commons  commons-lang3  3.9
org.apache.commons  commons-math3  3.4.1
org.apache.commons  commons-text  1.6
org.apache.curator  curator-client  2.7.1
org.apache.curator  curator-framework  2.7.1
org.apache.curator  curator-recipes  2.7.1
org.apache.derby  derby  10.12.1.1
org.apache.directory.api  api-asn1-api  1.0.0-M20
org.apache.directory.api  api-util  1.0.0-M20
org.apache.directory.server  apacheds-i18n  2.0.0-M15
org.apache.directory.server  apacheds-kerberos-codec  2.0.0-M15
org.apache.hadoop  hadoop-annotations  2.7.4
org.apache.hadoop  hadoop-auth  2.7.4
org.apache.hadoop  hadoop-client  2.7.4
org.apache.hadoop  hadoop-common  2.7.4
org.apache.hadoop  hadoop-hdfs  2.7.4
org.apache.hadoop  hadoop-mapreduce-client-app  2.7.4
org.apache.hadoop  hadoop-mapreduce-client-common  2.7.4
org.apache.hadoop  hadoop-mapreduce-client-core  2.7.4
org.apache.hadoop  hadoop-mapreduce-client-jobclient  2.7.4
org.apache.hadoop  hadoop-mapreduce-client-shuffle  2.7.4
org.apache.hadoop  hadoop-yarn-api  2.7.4
org.apache.hadoop  hadoop-yarn-client  2.7.4
org.apache.hadoop  hadoop-yarn-common  2.7.4
org.apache.hadoop  hadoop-yarn-server-common  2.7.4
org.apache.hive  hive-beeline  2.3.7
org.apache.hive  hive-cli  2.3.7
org.apache.hive  hive-common  2.3.7
org.apache.hive  hive-exec-core  2.3.7
org.apache.hive  hive-jdbc  2.3.7
org.apache.hive  hive-llap-client  2.3.7
org.apache.hive  hive-llap-common  2.3.7
org.apache.hive  hive-metastore  2.3.7
org.apache.hive  hive-serde  2.3.7
org.apache.hive  hive-shims  2.3.7
org.apache.hive  hive-storage-api  2.7.1
org.apache.hive  hive-vector-code-gen  2.3.7
org.apache.hive.shims  hive-shims-0.23  2.3.7
org.apache.hive.shims  hive-shims-common  2.3.7
org.apache.hive.shims  hive-shims-scheduler  2.3.7
org.apache.htrace  htrace-core  3.1.0-incubating
org.apache.httpcomponents  httpclient  4.5.6
org.apache.httpcomponents  httpcore  4.4.12
org.apache.ivy  ivy  2.4.0
org.apache.orc  orc-core  1.5.10
org.apache.orc  orc-mapreduce  1.5.10
org.apache.orc  orc-shims  1.5.10
org.apache.parquet  parquet-column  1.10.1.2-databricks4
org.apache.parquet  parquet-common  1.10.1.2-databricks4

org.apache.parquet

parquet-encoding

1.10.1.2-databricks4

org.apache.parquet

parquet-format

2.4.0

org.apache.parquet

parquet-hadoop

1.10.1.2-databricks4

org.apache.parquet

parquet-jackson

1.10.1.2-databricks4

org.apache.thrift

libfb303

0.9.3

org.apache.thrift

libthrift

0.12.0

org.apache.velocity

velocity

1.5

org.apache.xbean

xbean-asm7-shaded

4.15

org.apache.yetus

audience-annotations

0.5.0

org.apache.zookeeper

zookeeper

3.4.14

org.codehaus.jackson

jackson-core-asl

1.9.13

org.codehaus.jackson

jackson-jaxrs

1.9.13

org.codehaus.jackson

jackson-mapper-asl

1.9.13

org.codehaus.jackson

jackson-xc

1.9.13

org.codehaus.janino

commons-compiler

3.0.16

org.codehaus.janino

janino

3.0.16

org.datanucleus

datanucleus-api-jdo

4.2.4

org.datanucleus

datanucleus-core

4.1.17

org.datanucleus

datanucleus-rdbms

4.1.19

org.datanucleus

javax.jdo

3.2.0-m3

org.eclipse.jetty

jetty-client

9.4.18.v20190429

org.eclipse.jetty

jetty-continuation

9.4.18.v20190429

org.eclipse.jetty

jetty-http

9.4.18.v20190429

org.eclipse.jetty

jetty-io

9.4.18.v20190429

org.eclipse.jetty

jetty-jndi

9.4.18.v20190429

org.eclipse.jetty

jetty-plus

9.4.18.v20190429

org.eclipse.jetty

jetty-proxy

9.4.18.v20190429

org.eclipse.jetty

jetty-security

9.4.18.v20190429

org.eclipse.jetty

jetty-server

9.4.18.v20190429

org.eclipse.jetty

jetty-servlet

9.4.18.v20190429

org.eclipse.jetty

jetty-servlets

9.4.18.v20190429

org.eclipse.jetty

jetty-util

9.4.18.v20190429

org.eclipse.jetty

jetty-webapp

9.4.18.v20190429

org.eclipse.jetty

jetty-xml

9.4.18.v20190429

org.fusesource.leveldbjni

leveldbjni-all

1.8

org.glassfish.hk2

hk2-api

2.6.1

org.glassfish.hk2

hk2-locator

2.6.1

org.glassfish.hk2

hk2-utils

2.6.1

org.glassfish.hk2

osgi-resource-locator

1.0.3

org.glassfish.hk2.external

aopalliance-repackaged

2.6.1

org.glassfish.hk2.external

jakarta.inject

2.6.1

org.glassfish.jersey.containers

jersey-container-servlet

2.30

org.glassfish.jersey.containers

jersey-container-servlet-core

2.30

org.glassfish.jersey.core

jersey-client

2.30

org.glassfish.jersey.core

jersey-common

2.30

org.glassfish.jersey.core

jersey-server

2.30

org.glassfish.jersey.inject

jersey-hk2

2.30

org.glassfish.jersey.media

jersey-media-jaxb

2.30

org.hibernate.validator

hibernate-validator

6.1.0.Final

org.javassist

javassist

3.25.0-GA

org.jboss.logging

jboss-logging

3.3.2.Final

org.jdbi

jdbi

2.63.1

org.joda

joda-convert

1.7

org.jodd

jodd-core

3.5.2

org.json4s

json4s-ast_2.12

3.6.6

org.json4s

json4s-core_2.12

3.6.6

org.json4s

json4s-jackson_2.12

3.6.6

org.json4s

json4s-scalap_2.12

3.6.6

org.lz4

lz4-java

1.7.1

org.mariadb.jdbc

mariadb-java-client

2.1.2

org.objenesis

objenesis

2.5.1

org.postgresql

postgresql

42.1.4

org.roaringbitmap

RoaringBitmap

0.7.45

org.roaringbitmap

shims

0.7.45

org.rocksdb

rocksdbjni

6.2.2

org.rosuda.REngine

REngine

2.1.0

org.scala-lang

scala-compiler_2.12

2.12.10

org.scala-lang

scala-library_2.12

2.12.10

org.scala-lang

scala-reflect_2.12

2.12.10

org.scala-lang.modules

scala-collection-compat_2.12

2.1.1

org.scala-lang.modules

scala-parser-combinators_2.12

1.1.2

org.scala-lang.modules

scala-xml_2.12

1.2.0

org.scala-sbt

test-interface

1.0

org.scalacheck

scalacheck_2.12

1.14.2

org.scalactic

scalactic_2.12

3.0.8

org.scalanlp

breeze-macros_2.12

1.0

org.scalanlp

breeze_2.12

1.0

org.scalatest

scalatest_2.12

3.0.8

org.slf4j

jcl-over-slf4j

1.7.30

org.slf4j

jul-to-slf4j

1.7.30

org.slf4j

slf4j-api

1.7.30

org.slf4j

slf4j-log4j12

1.7.30

org.spark-project.spark

unused

1.0.0

org.springframework

spring-core

4.1.4.RELEASE

org.springframework

spring-test

4.1.4.RELEASE

org.threeten

threeten-extra

1.5.0

org.tukaani

xz

1.5

org.typelevel

algebra_2.12

2.0.0-M2

org.typelevel

cats-kernel_2.12

2.0.0-M4

org.typelevel

machinist_2.12

0.6.8

org.typelevel

macro-compat_2.12

1.1.1

org.typelevel

spire-macros_2.12

0.17.0-M1

org.typelevel

spire-platform_2.12

0.17.0-M1

org.typelevel

spire-util_2.12

0.17.0-M1

org.typelevel

spire_2.12

0.17.0-M1

org.xerial

sqlite-jdbc

3.8.11.2

org.xerial.snappy

snappy-java

1.1.7.5

org.yaml

snakeyaml

1.24

oro

oro

2.0.8

pl.edu.icm

JLargeArrays

1.5

software.amazon.ion

ion-java

1.0.2

stax

stax-api

1.0.1

xmlenc

xmlenc

0.52