Databricks Runtime 14.0 (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

The following release notes provide information about Databricks Runtime 14.0, powered by Apache Spark 3.5.0.

Databricks released this version in September 2023.

New features and improvements

Row tracking is GA

Row tracking for Delta Lake is now generally available. See Use row tracking for Delta tables.

Predictive I/O for updates is GA

Predictive I/O for updates is now generally available. See What is predictive I/O?.

Deletion vectors are GA

Deletion vectors are now generally available. See What are deletion vectors?.

Spark 3.5.0 is GA

Apache Spark 3.5.0 is now generally available. See Spark Release 3.5.0.

Public preview for user-defined table functions for Python

User-defined table functions (UDTFs) allow you to register functions that return tables instead of scalar values. See Python user-defined table functions (UDTFs).
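
As a minimal sketch of what such a function can look like in PySpark 3.5: the handler class below yields one tuple per output row, and a helper registers it for SQL. The names SquareNumbers, register_square_numbers, and square_numbers are hypothetical and chosen only for illustration.

```python
class SquareNumbers:
    """Handler class for a Python UDTF: eval() yields one tuple per output row."""

    def eval(self, start: int, end: int):
        # Each yielded tuple becomes one row of the returned table.
        for num in range(start, end + 1):
            yield (num, num * num)


def register_square_numbers(spark):
    """Register the handler as the SQL function `square_numbers` (hypothetical name).

    pyspark is imported lazily so the handler class stays importable
    without a Spark installation.
    """
    from pyspark.sql.functions import udtf

    square_numbers = udtf(SquareNumbers, returnType="num: int, squared: int")
    spark.udtf.register("square_numbers", square_numbers)
    return square_numbers
```

Assuming a SparkSession is available as spark, calling register_square_numbers(spark) should allow a query such as SELECT * FROM square_numbers(1, 3) to return one row per number in the range.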

Public preview for row-level concurrency

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving competing changes in concurrent writes that update or delete different rows in the same data file. See Write conflicts with row-level concurrency.
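
The difference between file-level and row-level conflict detection can be illustrated with a toy model. This is only a conceptual sketch and not how Delta Lake implements the feature; the Write class and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A toy write operation: the data file it touches and the rows it changes."""
    file: str
    rows: frozenset


def conflicts_file_level(a: Write, b: Write) -> bool:
    # File-level detection: any two writes to the same data file conflict.
    return a.file == b.file


def conflicts_row_level(a: Write, b: Write) -> bool:
    # Row-level detection: writes conflict only if they change a common row.
    return a.file == b.file and not a.rows.isdisjoint(b.rows)


update = Write("part-0001.parquet", frozenset({1, 2}))
delete = Write("part-0001.parquet", frozenset({7}))
print(conflicts_file_level(update, delete))  # same file, so a file-level conflict
print(conflicts_row_level(update, delete))   # different rows, so no row-level conflict
```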

Default current working directory has changed

The default current working directory (CWD) for code executed locally is now the directory containing the notebook or script being run. This includes code such as %sh and Python or R code not using Spark. See What is the default current working directory?.
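
Code that opens files by relative path is affected by this change. The following sketch prints the CWD and shows how a relative path resolves against it; the helper name resolve_from_cwd and the file name config.json are hypothetical.

```python
import os
from pathlib import Path


def resolve_from_cwd(relative_path: str) -> Path:
    """Return the absolute path that a relative path resolves to from the CWD."""
    return (Path.cwd() / relative_path).resolve()


# In a notebook on this runtime, the CWD is expected to be the notebook's directory.
print("Current working directory:", os.getcwd())
print("config.json resolves to:", resolve_from_cwd("config.json"))
```

If code must behave the same regardless of the CWD, passing absolute paths is the safer choice.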

Known issue with sparklyr

The installed version of the sparklyr package (version 1.8.1) is not compatible with Databricks Runtime 14.0. To use sparklyr, install version 1.8.3 or above.

Introducing Spark Connect in shared cluster architecture

In Databricks Runtime 14.0 and above, shared clusters use Spark Connect by default for communication between the Python REPL and the Spark driver. Internal Spark APIs are no longer accessible from user code.

Spark Connect replaces the legacy REPL integration for interacting with the Spark driver.

List available Spark versions API update

Enable Photon by setting runtime_engine = PHOTON, and enable aarch64 by choosing a Graviton instance type. Databricks then sets the correct Databricks Runtime version. Previously, the Spark version API returned implementation-specific runtimes for each version. See GET /api/2.0/clusters/spark-versions in the REST API Reference.
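
A cluster specification that follows this approach might look like the sketch below. The function name, cluster name, version key, and instance type are illustrative assumptions; check the version keys returned by the API and the instance types available in your workspace.

```python
import json


def photon_cluster_spec(spark_version: str, node_type_id: str, num_workers: int) -> dict:
    """Build a cluster spec that selects Photon through runtime_engine."""
    return {
        "cluster_name": "photon-example",  # hypothetical name
        "spark_version": spark_version,    # plain runtime key, no Photon-specific variant
        "node_type_id": node_type_id,      # a Graviton type selects aarch64
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",
    }


# Assumed values for illustration only.
spec = photon_cluster_spec("14.0.x-scala2.12", "m6gd.xlarge", 2)
print(json.dumps(spec, indent=2))
```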

Breaking changes

In Databricks Runtime 14.0 and above, clusters with shared access mode use Spark Connect for client-server communication. This includes the following changes.

For more on shared access mode limitations, see Compute access mode limitations for Unity Catalog.

Python on clusters with shared access mode

  • sqlContext is not available. Databricks recommends using the spark variable for the SparkSession instance.

  • Spark Context (sc) is no longer available in Notebooks, or when using Databricks Connect on a cluster with shared access mode. The following sc functions are no longer available:

    • emptyRDD, range, init_batched_serializer, parallelize, pickleFile, textFile, wholeTextFiles, binaryFiles, binaryRecords, sequenceFile, newAPIHadoopFile, newAPIHadoopRDD, hadoopFile, hadoopRDD, union, runJob, setSystemProperty, uiWebUrl, stop, setJobGroup, setLocalProperty, getConf

  • The Dataset Info feature is no longer supported.

  • Querying Apache Spark no longer depends on the JVM. As a consequence, internal JVM-related APIs, such as _jsc, _jconf, _jvm, _jsparkSession, _jreader, _jc, _jseq, _jdf, _jmap, and _jcols, are no longer supported.

  • When you access configuration values using spark.conf, only dynamic runtime configuration values are accessible.

  • Delta Live Tables analysis commands are not supported on shared clusters yet.
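
Code that relied on sc can often be rewritten against the spark session. The helpers below are a minimal sketch of such replacements, assuming spark is the SparkSession provided in the notebook; the helper names are hypothetical.

```python
def numbers_df(spark, values):
    """Instead of sc.parallelize(values): build a single-column DataFrame."""
    return spark.createDataFrame([(v,) for v in values], schema="value long")


def lines_df(spark, path):
    """Instead of sc.textFile(path): read a text file, one row per line."""
    return spark.read.text(path)


def session_timezone(spark):
    """Runtime SQL configuration values remain readable through spark.conf."""
    return spark.conf.get("spark.sql.session.timeZone")
```

These helpers use only the public DataFrame API, so they do not depend on the JVM-backed internals listed above.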

Delta on clusters with shared access mode

  • In Python, querying Apache Spark no longer depends on the JVM. Internal JVM-related APIs, such as DeltaTable._jdt, DeltaTableBuilder._jbuilder, DeltaMergeBuilder._jbuilder, and DeltaOptimizeBuilder._jbuilder, are no longer supported.

SQL on clusters with shared access mode

  • DBCACHE and DBUNCACHE commands are no longer supported.

  • Rare use cases like cache table db as show databases are no longer supported.

Library upgrades

  • Upgraded Python libraries:

    • asttokens from 2.2.1 to 2.0.5

    • attrs from 21.4.0 to 22.1.0

    • botocore from 1.27.28 to 1.27.96

    • certifi from 2022.9.14 to 2022.12.7

    • cryptography from 37.0.1 to 39.0.1

    • debugpy from 1.6.0 to 1.6.7

    • docstring-to-markdown from 0.12 to 0.11

    • executing from 1.2.0 to 0.8.3

    • facets-overview from 1.0.3 to 1.1.1

    • googleapis-common-protos from 1.56.4 to 1.60.0

    • grpcio from 1.48.1 to 1.48.2

    • idna from 3.3 to 3.4

    • ipykernel from 6.17.1 to 6.25.0

    • ipython from 8.10.0 to 8.14.0

    • Jinja2 from 2.11.3 to 3.1.2

    • jsonschema from 4.16.0 to 4.17.3

    • jupyter_core from 4.11.2 to 5.2.0

    • kiwisolver from 1.4.2 to 1.4.4

    • MarkupSafe from 2.0.1 to 2.1.1

    • matplotlib from 3.5.2 to 3.7.0

    • nbconvert from 6.4.4 to 6.5.4

    • nbformat from 5.5.0 to 5.7.0

    • nest-asyncio from 1.5.5 to 1.5.6

    • notebook from 6.4.12 to 6.5.2

    • numpy from 1.21.5 to 1.23.5

    • packaging from 21.3 to 22.0

    • pandas from 1.4.4 to 1.5.3

    • pathspec from 0.9.0 to 0.10.3

    • patsy from 0.5.2 to 0.5.3

    • Pillow from 9.2.0 to 9.4.0

    • pip from 22.2.2 to 22.3.1

    • protobuf from 3.19.4 to 4.24.0

    • pytoolconfig from 1.2.2 to 1.2.5

    • pytz from 2022.1 to 2022.7

    • s3transfer from 0.6.0 to 0.6.1

    • seaborn from 0.11.2 to 0.12.2

    • setuptools from 63.4.1 to 65.6.3

    • soupsieve from 2.3.1 to 2.3.2.post1

    • stack-data from 0.6.2 to 0.2.0

    • statsmodels from 0.13.2 to 0.13.5

    • terminado from 0.13.1 to 0.17.1

    • traitlets from 5.1.1 to 5.7.1

    • typing_extensions from 4.3.0 to 4.4.0

    • urllib3 from 1.26.11 to 1.26.14

    • virtualenv from 20.16.3 to 20.16.7

    • wheel from 0.37.1 to 0.38.4

  • Upgraded R libraries:

    • arrow from 10.0.1 to 12.0.1

    • base from 4.2.2 to 4.3.1

    • blob from 1.2.3 to 1.2.4

    • broom from 1.0.3 to 1.0.5

    • bslib from 0.4.2 to 0.5.0

    • cachem from 1.0.6 to 1.0.8

    • caret from 6.0-93 to 6.0-94

    • chron from 2.3-59 to 2.3-61

    • class from 7.3-21 to 7.3-22

    • cli from 3.6.0 to 3.6.1

    • clock from 0.6.1 to 0.7.0

    • commonmark from 1.8.1 to 1.9.0

    • compiler from 4.2.2 to 4.3.1

    • cpp11 from 0.4.3 to 0.4.4

    • curl from 5.0.0 to 5.0.1

    • data.table from 1.14.6 to 1.14.8

    • datasets from 4.2.2 to 4.3.1

    • dbplyr from 2.3.0 to 2.3.3

    • digest from 0.6.31 to 0.6.33

    • downlit from 0.4.2 to 0.4.3

    • dplyr from 1.1.0 to 1.1.2

    • dtplyr from 1.2.2 to 1.3.1

    • evaluate from 0.20 to 0.21

    • fastmap from 1.1.0 to 1.1.1

    • fontawesome from 0.5.0 to 0.5.1

    • fs from 1.6.1 to 1.6.2

    • future from 1.31.0 to 1.33.0

    • future.apply from 1.10.0 to 1.11.0

    • gargle from 1.3.0 to 1.5.1

    • ggplot2 from 3.4.0 to 3.4.2

    • gh from 1.3.1 to 1.4.0

    • glmnet from 4.1-6 to 4.1-7

    • googledrive from 2.0.0 to 2.1.1

    • googlesheets4 from 1.0.1 to 1.1.1

    • graphics from 4.2.2 to 4.3.1

    • grDevices from 4.2.2 to 4.3.1

    • grid from 4.2.2 to 4.3.1

    • gtable from 0.3.1 to 0.3.3

    • hardhat from 1.2.0 to 1.3.0

    • haven from 2.5.1 to 2.5.3

    • hms from 1.1.2 to 1.1.3

    • htmltools from 0.5.4 to 0.5.5

    • htmlwidgets from 1.6.1 to 1.6.2

    • httpuv from 1.6.8 to 1.6.11

    • httr from 1.4.4 to 1.4.6

    • ipred from 0.9-13 to 0.9-14

    • jsonlite from 1.8.4 to 1.8.7

    • KernSmooth from 2.23-20 to 2.23-21

    • knitr from 1.42 to 1.43

    • later from 1.3.0 to 1.3.1

    • lattice from 0.20-45 to 0.21-8

    • lava from 1.7.1 to 1.7.2.1

    • lubridate from 1.9.1 to 1.9.2

    • markdown from 1.5 to 1.7

    • MASS from 7.3-58.2 to 7.3-60

    • Matrix from 1.5-1 to 1.5-4.1

    • methods from 4.2.2 to 4.3.1

    • mgcv from 1.8-41 to 1.8-42

    • modelr from 0.1.10 to 0.1.11

    • nnet from 7.3-18 to 7.3-19

    • openssl from 2.0.5 to 2.0.6

    • parallel from 4.2.2 to 4.3.1

    • parallelly from 1.34.0 to 1.36.0

    • pillar from 1.8.1 to 1.9.0

    • pkgbuild from 1.4.0 to 1.4.2

    • pkgload from 1.3.2 to 1.3.2.1

    • pROC from 1.18.0 to 1.18.4

    • processx from 3.8.0 to 3.8.2

    • prodlim from 2019.11.13 to 2023.03.31

    • profvis from 0.3.7 to 0.3.8

    • ps from 1.7.2 to 1.7.5

    • Rcpp from 1.0.10 to 1.0.11

    • readr from 2.1.3 to 2.1.4

    • readxl from 1.4.2 to 1.4.3

    • recipes from 1.0.4 to 1.0.6

    • rlang from 1.0.6 to 1.1.1

    • rmarkdown from 2.20 to 2.23

    • Rserve from 1.8-12 to 1.8-11

    • RSQLite from 2.2.20 to 2.3.1

    • rstudioapi from 0.14 to 0.15.0

    • sass from 0.4.5 to 0.4.6

    • shiny from 1.7.4 to 1.7.4.1

    • sparklyr from 1.7.9 to 1.8.1

    • SparkR from 3.4.1 to 3.5.0

    • splines from 4.2.2 to 4.3.1

    • stats from 4.2.2 to 4.3.1

    • stats4 from 4.2.2 to 4.3.1

    • survival from 3.5-3 to 3.5-5

    • sys from 3.4.1 to 3.4.2

    • tcltk from 4.2.2 to 4.3.1

    • testthat from 3.1.6 to 3.1.10

    • tibble from 3.1.8 to 3.2.1

    • tidyverse from 1.3.2 to 2.0.0

    • tinytex from 0.44 to 0.45

    • tools from 4.2.2 to 4.3.1

    • tzdb from 0.3.0 to 0.4.0

    • usethis from 2.1.6 to 2.2.2

    • utils from 4.2.2 to 4.3.1

    • vctrs from 0.5.2 to 0.6.3

    • viridisLite from 0.4.1 to 0.4.2

    • vroom from 1.6.1 to 1.6.3

    • waldo from 0.4.0 to 0.5.1

    • xfun from 0.37 to 0.39

    • xml2 from 1.3.3 to 1.3.5

    • zip from 2.2.2 to 2.3.0

  • Upgraded Java libraries:

    • com.fasterxml.jackson.core.jackson-annotations from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.core.jackson-core from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.core.jackson-databind from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.dataformat.jackson-dataformat-cbor from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.datatype.jackson-datatype-joda from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.datatype.jackson-datatype-jsr310 from 2.13.4 to 2.15.1

    • com.fasterxml.jackson.module.jackson-module-paranamer from 2.14.2 to 2.15.2

    • com.fasterxml.jackson.module.jackson-module-scala_2.12 from 2.14.2 to 2.15.2

    • com.github.luben.zstd-jni from 1.5.2-5 to 1.5.5-4

    • com.google.code.gson.gson from 2.8.9 to 2.10.1

    • com.google.crypto.tink.tink from 1.7.0 to 1.9.0

    • commons-codec.commons-codec from 1.15 to 1.16.0

    • commons-io.commons-io from 2.11.0 to 2.13.0

    • io.airlift.aircompressor from 0.21 to 0.24

    • io.dropwizard.metrics.metrics-core from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-graphite from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-healthchecks from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-jetty9 from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-jmx from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-json from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-jvm from 4.2.10 to 4.2.19

    • io.dropwizard.metrics.metrics-servlets from 4.2.10 to 4.2.19

    • io.netty.netty-all from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-buffer from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-codec from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-codec-http from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-codec-http2 from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-codec-socks from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-common from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-handler from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-handler-proxy from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-resolver from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-transport from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-transport-classes-epoll from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-transport-classes-kqueue from 4.1.87.Final to 4.1.93.Final

    • io.netty.netty-transport-native-epoll from 4.1.87.Final-linux-x86_64 to 4.1.93.Final-linux-x86_64

    • io.netty.netty-transport-native-kqueue from 4.1.87.Final-osx-x86_64 to 4.1.93.Final-osx-x86_64

    • io.netty.netty-transport-native-unix-common from 4.1.87.Final to 4.1.93.Final

    • org.apache.arrow.arrow-format from 11.0.0 to 12.0.1

    • org.apache.arrow.arrow-memory-core from 11.0.0 to 12.0.1

    • org.apache.arrow.arrow-memory-netty from 11.0.0 to 12.0.1

    • org.apache.arrow.arrow-vector from 11.0.0 to 12.0.1

    • org.apache.avro.avro from 1.11.1 to 1.11.2

    • org.apache.avro.avro-ipc from 1.11.1 to 1.11.2

    • org.apache.avro.avro-mapred from 1.11.1 to 1.11.2

    • org.apache.commons.commons-compress from 1.21 to 1.23.0

    • org.apache.hadoop.hadoop-client-runtime from 3.3.4 to 3.3.6

    • org.apache.logging.log4j.log4j-1.2-api from 2.19.0 to 2.20.0

    • org.apache.logging.log4j.log4j-api from 2.19.0 to 2.20.0

    • org.apache.logging.log4j.log4j-core from 2.19.0 to 2.20.0

    • org.apache.logging.log4j.log4j-slf4j2-impl from 2.19.0 to 2.20.0

    • org.apache.orc.orc-core from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf

    • org.apache.orc.orc-mapreduce from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf

    • org.apache.orc.orc-shims from 1.8.4 to 1.9.0

    • org.apache.xbean.xbean-asm9-shaded from 4.22 to 4.23

    • org.checkerframework.checker-qual from 3.19.0 to 3.31.0

    • org.glassfish.jersey.containers.jersey-container-servlet from 2.36 to 2.40

    • org.glassfish.jersey.containers.jersey-container-servlet-core from 2.36 to 2.40

    • org.glassfish.jersey.core.jersey-client from 2.36 to 2.40

    • org.glassfish.jersey.core.jersey-common from 2.36 to 2.40

    • org.glassfish.jersey.core.jersey-server from 2.36 to 2.40

    • org.glassfish.jersey.inject.jersey-hk2 from 2.36 to 2.40

    • org.javassist.javassist from 3.25.0-GA to 3.29.2-GA

    • org.mariadb.jdbc.mariadb-java-client from 2.7.4 to 2.7.9

    • org.postgresql.postgresql from 42.3.8 to 42.6.0

    • org.roaringbitmap.RoaringBitmap from 0.9.39 to 0.9.45

    • org.roaringbitmap.shims from 0.9.39 to 0.9.45

    • org.rocksdb.rocksdbjni from 7.8.3 to 8.3.2

    • org.scala-lang.modules.scala-collection-compat_2.12 from 2.4.3 to 2.9.0

    • org.slf4j.jcl-over-slf4j from 2.0.6 to 2.0.7

    • org.slf4j.jul-to-slf4j from 2.0.6 to 2.0.7

    • org.slf4j.slf4j-api from 2.0.6 to 2.0.7

    • org.xerial.snappy.snappy-java from 1.1.10.1 to 1.1.10.3

    • org.yaml.snakeyaml from 1.33 to 2.0

Apache Spark

Databricks Runtime 14.0 includes Apache Spark 3.5.0. This release includes all Spark fixes and improvements included in Databricks Runtime 13.3 LTS, as well as the following additional bug fixes and improvements made to Spark:

  • You can now set cluster environment variable SNOWFLAKE_SPARK_CONNECTOR_VERSION=2.12 to use Spark-snowflake connector v2.12.0.

  • [SPARK-44877] [DBRRM-482][SC-140437][CONNECT][PYTHON] Support python protobuf functions for Spark Connect

  • [SPARK-44882] [DBRRM-463][SC-140430][PYTHON][CONNECT] Remove function uuid/random/chr from PySpark

  • [SPARK-44740] [DBRRM-462][SC-140320][CONNECT][FOLLOW] Fix metadata values for Artifacts

  • [SPARK-44822] [DBRRM-464][PYTHON][SQL] Make Python UDTFs by default non-deterministic

  • [SPARK-44836] [DBRRM-468][SC-140228][PYTHON] Refactor Arrow Python UDTF

  • [SPARK-44738] [DBRRM-462][SC-139347][PYTHON][CONNECT] Add missing client metadata to calls

  • [SPARK-44722] [DBRRM-462][SC-139306][CONNECT] ExecutePlanResponseReattachableIterator._call_iter: AttributeError: 'NoneType' object has no attribute 'message'

  • [SPARK-44625] [DBRRM-396][SC-139535][CONNECT] SparkConnectExecutionManager to track all executions

  • [SPARK-44663] [SC-139020][DBRRM-420][PYTHON] Disable arrow optimization by default for Python UDTFs

  • [SPARK-44709] [DBRRM-396][SC-139250][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control

  • [SPARK-44656] [DBRRM-396][SC-138924][CONNECT] Make all iterators CloseableIterators

  • [SPARK-44671] [DBRRM-396][SC-138929][PYTHON][CONNECT] Retry ExecutePlan in case initial request didn’t reach server in Python client

  • [SPARK-44624] [DBRRM-396][SC-138919][CONNECT] Retry ExecutePlan in case initial request didn’t reach server

  • [SPARK-44574] [DBRRM-396][SC-138288][SQL][CONNECT] Errors that moved into sql/api should also use AnalysisException

  • [SPARK-44613] [DBRRM-396][SC-138473][CONNECT] Add Encoders object

  • [SPARK-44626] [DBRRM-396][SC-138828][SS][CONNECT] Followup on streaming query termination when client session is timed out for Spark Connect

  • [SPARK-44642] [DBRRM-396][SC-138882][CONNECT] ReleaseExecute in ExecutePlanResponseReattachableIterator after it gets error from server

  • [SPARK-41400] [DBRRM-396][SC-138287][CONNECT] Remove Connect Client Catalyst Dependency

  • [SPARK-44664] [DBRRM-396][PYTHON][CONNECT] Release the execute when closing the iterator in Python client

  • [SPARK-44631] [DBRRM-396][SC-138823][CONNECT][CORE][14.0.0] Remove session-based directory when the isolated session cache is evicted

  • [SPARK-42941] [DBRRM-396][SC-138389][SS][CONNECT] Python StreamingQueryListener

  • [SPARK-44636] [DBRRM-396][SC-138570][CONNECT] Leave no dangling iterators

  • [SPARK-44424] [DBRRM-396][CONNECT][PYTHON][14.0.0] Python client for reattaching to existing execute in Spark Connect

  • [SPARK-44637] [SC-138571] Synchronize accesses to ExecuteResponseObserver

  • [SPARK-44538] [SC-138178][CONNECT][SQL] Reinstate Row.jsonValue and friends

  • [SPARK-44421] [SC-138434][SPARK-44423][CONNECT] Reattachable execution in Spark Connect

  • [SPARK-44418] [SC-136807][PYTHON][CONNECT] Upgrade protobuf from 3.19.5 to 3.20.3

  • [SPARK-44587] [SC-138315][SQL][CONNECT] Increase protobuf marshaller recursion limit

  • [SPARK-44591] [SC-138292][CONNECT][SQL] Add jobTags to SparkListenerSQLExecutionStart

  • [SPARK-44610] [SC-138368][SQL] DeduplicateRelations should retain Alias metadata when creating a new instance

  • [SPARK-44542] [SC-138323][CORE] Eagerly load SparkExitCode class in exception handler

  • [SPARK-44264] [SC-138143][PYTHON] E2E Testing for Deepspeed

  • [SPARK-43997] [SC-138347][CONNECT] Add support for Java UDFs

  • [SPARK-44507] [SQL][CONNECT][14.x][14.0] Move AnalysisException to sql/api

  • [SPARK-44453] [SC-137013][PYTHON] Use difflib to display errors in assertDataFrameEqual

  • [SPARK-44394] [SC-138291][CONNECT][WEBUI][14.0] Add a Spark UI page for Spark Connect

  • [SPARK-44611] [SC-138415][CONNECT] Do not exclude scala-xml

  • [SPARK-44531] [SC-138044][CONNECT][SQL][14.x][14.0] Move encoder inference to sql/api

  • [SPARK-43744] [SC-138289][CONNECT][14.x][14.0] Fix class loading problem cau…

  • [SPARK-44590] [SC-138296][SQL][CONNECT] Remove the arrow batch record limit for SqlCommandResult

  • [SPARK-43968] [SC-138115][PYTHON] Improve error messages for Python UDTFs with wrong number of outputs

  • [SPARK-44432] [SC-138293][SS][CONNECT] Terminate streaming queries when a session times out in Spark Connect

  • [SPARK-44584] [SC-138295][CONNECT] Set client_type information for AddArtifactsRequest and ArtifactStatusesRequest in Scala Client

  • [SPARK-44552] [14.0][SC-138176][SQL] Remove private object ParseState definition from IntervalUtils

  • [SPARK-43660] [SC-136183][CONNECT][PS] Enable resample with Spark Connect

  • [SPARK-44287] [SC-136223][SQL] Use PartitionEvaluator API in RowToColumnarExec & ColumnarToRowExec SQL operators.

  • [SPARK-39634] [SC-137566][SQL] Allow file splitting in combination with row index generation

  • [SPARK-44533] [SC-138058][PYTHON] Add support for accumulator, broadcast, and Spark files in Python UDTF’s analyze

  • [SPARK-44479] [SC-138146][PYTHON] Fix ArrowStreamPandasUDFSerializer to accept no-column pandas DataFrame

  • [SPARK-44425] [SC-138177][CONNECT] Validate that user provided sessionId is an UUID

  • [SPARK-44535] [SC-138038][CONNECT][SQL] Move required Streaming API to sql/api

  • [SPARK-44264] [SC-136523][ML][PYTHON] Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor

  • [SPARK-42098] [SC-138164][SQL] Fix ResolveInlineTables can not handle with RuntimeReplaceable expression

  • [SPARK-44060] [SC-135693][SQL] Code-gen for build side outer shuffled hash join

  • [SPARK-44496] [SC-137682][SQL][CONNECT] Move Interfaces needed by SCSC to sql/api

  • [SPARK-44532] [SC-137893][CONNECT][SQL] Move ArrowUtils to sql/api

  • [SPARK-44413] [SC-137019][PYTHON] Clarify error for unsupported arg data type in assertDataFrameEqual

  • [SPARK-44530] [SC-138036][CORE][CONNECT] Move SparkBuildInfo to common/util

  • [SPARK-36612] [SC-133071][SQL] Support left outer join build left or right outer join build right in shuffled hash join

  • [SPARK-44519] [SC-137728][CONNECT] SparkConnectServerUtils generated incorrect parameters for jars

  • [SPARK-44449] [SC-137818][CONNECT] Upcasting for direct Arrow Deserialization

  • [SPARK-44131] [SC-136346][SQL] Add call_function and deprecate call_udf for Scala API

  • [SPARK-44541] [SQL] Remove useless function hasRangeExprAgainstEventTimeCol from UnsupportedOperationChecker

  • [SPARK-44523] [SC-137859][SQL] Filter’s maxRows/maxRowsPerPartition is 0 if condition is FalseLiteral

  • [SPARK-44540] [SC-137873][UI] Remove unused stylesheet and javascript files of jsonFormatter

  • [SPARK-44466] [SC-137856][SQL] Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs

  • [SPARK-44477] [SC-137508][SQL] Treat TYPE_CHECK_FAILURE_WITH_HINT as an error subclass

  • [SPARK-44509] [SC-137855][PYTHON][CONNECT] Add job cancellation API set in Spark Connect Python client

  • [SPARK-44059] [SC-137023] Add analyzer support of named arguments for built-in functions

  • [SPARK-38476] [SC-136448][CORE] Use error class in org.apache.spark.storage

  • [SPARK-44486] [SC-137817][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas

  • [SPARK-44361] [SC-137200][SQL] Use PartitionEvaluator API in MapInBatchExec

  • [SPARK-44510] [SC-137652][UI] Update dataTables to 1.13.5 and remove some unreached png files

  • [SPARK-44503] [SC-137808][SQL] Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls

  • [SPARK-38477] [SC-136319][CORE] Use error class in org.apache.spark.shuffle

  • [SPARK-44299] [SC-136088][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]

  • [SPARK-44422] [SC-137567][CONNECT] Spark Connect fine grained interrupt

  • [SPARK-44380] [SC-137415][SQL][PYTHON] Support for Python UDTF to analyze in Python

  • [SPARK-43923] [SC-137020][CONNECT] Post listenerBus events durin…

  • [SPARK-44303] [SC-136108][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]

  • [SPARK-44294] [SC-135885][UI] Fix HeapHistogram column shows unexpectedly w/ select-all-box

  • [SPARK-44409] [SC-136975][SQL] Handle char/varchar in Dataset.to to keep consistent with others

  • [SPARK-44334] [SC-136576][SQL][UI] Status in the REST API response for a failed DDL/DML with no jobs should be FAILED rather than COMPLETED

  • [SPARK-42309] [SC-136703][SQL] Introduce INCOMPATIBLE_DATA_TO_TABLE and sub classes.

  • [SPARK-44367] [SC-137418][SQL][UI] Show error message on UI for each failed query

  • [SPARK-44474] [SC-137195][CONNECT] Reenable “Test observe response” at SparkConnectServiceSuite

  • [SPARK-44320] [SC-136446][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]

  • [SPARK-44310] [SC-136055][CONNECT] The Connect Server startup log should display the hostname and port

  • [SPARK-44309] [SC-136193][UI] Display Add/Remove Time of Executors on Executors Tab

  • [SPARK-42898] [SC-137556][SQL] Mark that string/date casts do not need time zone id

  • [SPARK-44475] [SC-137422][SQL][CONNECT] Relocate DataType and Parser to sql/api

  • [SPARK-44484] [SC-137562][SS] Add batchDuration to StreamingQueryProgress json method

  • [SPARK-43966] [SC-137559][SQL][PYTHON] Support non-deterministic table-valued functions

  • [SPARK-44439] [SC-136973][CONNECT][SS] Fixed listListeners to only send ids back to client

  • [SPARK-44341] [SC-137054][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

  • [SPARK-43839] [SC-132680][SQL] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL

  • [SPARK-44244] [SC-135703][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]

  • [SPARK-44201] [SC-136778][CONNECT][SS] Add support for Streaming Listener in Scala for Spark Connect

  • [SPARK-44260] [SC-135618][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in CharVarcharSuite

  • [SPARK-42454] [SC-136913][SQL] SPJ: encapsulate all SPJ related parameters in BatchScanExec

  • [SPARK-44292] [SC-135844][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

  • [SPARK-44396] [SC-137221][CONNECT] Direct Arrow Deserialization

  • [SPARK-44324] [SC-137172][SQL][CONNECT] Move CaseInsensitiveMap to sql/api

  • [SPARK-44395] [SC-136744][SQL] Add test back to StreamingTableSuite

  • [SPARK-44481] [SC-137401][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

  • [SPARK-44278] [SC-137400][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties

  • [SPARK-44264] [SC-137211][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed

  • [SPARK-44430] [SC-136970][SQL] Add cause to AnalysisException when option is invalid

  • [SPARK-44264] [SC-137167][ML][PYTHON] Incorporating FunctionPickler Into TorchDistributor

  • [SPARK-44216] [SC-137046][PYTHON] Make assertSchemaEqual API public

  • [SPARK-44398] [SC-136720][CONNECT] Scala foreachBatch API

  • [SPARK-43203] [SC-134528][SQL] Move all Drop Table case to DataSource V2

  • [SPARK-43755] [SC-137171][CONNECT][MINOR] Open AdaptiveSparkPlanHelper.allChildren instead of using copy in MetricGenerator

  • [SPARK-44264] [SC-137187][ML][PYTHON] Refactoring TorchDistributor To Allow for Custom “run_training_on_file” Function Pointer

  • [SPARK-43755] [SC-136838][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

  • [SPARK-44411] [SC-137198][SQL] Use PartitionEvaluator API in ArrowEvalPythonExec and BatchEvalPythonExec

  • [SPARK-44375] [SC-137197][SQL] Use PartitionEvaluator API in DebugExec

  • [SPARK-43967] [SC-137057][PYTHON] Support regular Python UDTFs with empty return values

  • [SPARK-43915] [SC-134766][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]

  • [SPARK-43965] [SC-136929][PYTHON][CONNECT] Support Python UDTF in Spark Connect

  • [SPARK-44154] [SC-137050][SQL] Added more unit tests to BitmapExpressionUtilsSuite and made minor improvements to Bitmap Aggregate Expressions

  • [SPARK-44169] [SC-135497][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]

  • [SPARK-44353] [SC-136578][CONNECT][SQL] Remove StructType.toAttributes

  • [SPARK-43964] [SC-136676][SQL][PYTHON] Support arrow-optimized Python UDTFs

  • [SPARK-44321] [SC-136308][CONNECT] Decouple ParseException from AnalysisException

  • [SPARK-44348] [SAS-1910][SC-136644][CORE][CONNECT][PYTHON] Reenable test_artifact with relevant changes

  • [SPARK-44145] [SC-136698][SQL] Callback when ready for execution

  • [SPARK-43983] [SC-136404][PYTHON][ML][CONNECT] Enable cross validator estimator test

  • [SPARK-44399] [SC-136669][PYTHON][CONNECT] Import SparkSession in Python UDF only when useArrow is None

  • [SPARK-43631] [SC-135300][CONNECT][PS] Enable Series.interpolate with Spark Connect

  • [SPARK-44374] [SC-136544][PYTHON][ML] Add example code for distributed ML for spark connect

  • [SPARK-44282] [SC-135948][CONNECT] Prepare DataType parsing for use in Spark Connect Scala Client

  • [SPARK-44052] [SC-134469][CONNECT][PS] Add util to get proper Column or DataFrame class for Spark Connect.

  • [SPARK-43983] [SC-136404][PYTHON][ML][CONNECT] Implement cross validator estimator

  • [SPARK-44290] [SC-136300][CONNECT] Session-based files and archives in Spark Connect

  • [SPARK-43710] [SC-134860][PS][CONNECT] Support functions.date_part for Spark Connect

  • [SPARK-44036] [SC-134036][CONNECT][PS] Cleanup & consolidate tickets to simplify the tasks.

  • [SPARK-44150] [SC-135790][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF

  • [SPARK-43903] [SC-134754][PYTHON][CONNECT] Improve ArrayType input support in Arrow Python UDF

  • [SPARK-44250] [SC-135819][ML][PYTHON][CONNECT] Implement classification evaluator

  • [SPARK-44255] [SC-135704][SQL] Relocate StorageLevel to common/utils

  • [SPARK-42169] [SC-135735][SQL] Implement code generation for to_csv function (StructsToCsv)

  • [SPARK-44249] [SC-135719][SQL][PYTHON] Refactor PythonUDTFRunner to send its return type separately

  • [SPARK-43353] [SC-132734][PYTHON] Migrate remaining session errors into error class

  • [SPARK-44133] [SC-134795][PYTHON] Upgrade MyPy from 0.920 to 0.982

  • [SPARK-42941] [SC-134707][SS][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format

  • [SPARK-43353] Revert “[SC-132734][ES-729763][PYTHON] Migrate remaining session errors into error class”

  • [SPARK-44100] [SC-134576][ML][CONNECT][PYTHON] Move namespace from pyspark.mlv2 to pyspark.ml.connect

  • [SPARK-44220] [SC-135484][SQL] Move StringConcat to sql/api

  • [SPARK-43992] [SC-133645][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

  • [SPARK-43982] [SC-134529][ML][PYTHON][CONNECT] Implement pipeline estimator for ML on spark connect

  • [SPARK-43888] [SC-132893][CORE] Relocate Logging to common/utils

  • [SPARK-42941] Revert “[SC-134707][SS][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format”

  • [SPARK-43624] [SC-134557][PS][CONNECT] Add EWM to SparkConnectPlanner.

  • [SPARK-43981] [SC-134137][PYTHON][ML] Basic saving / loading implementation for ML on spark connect

  • [SPARK-43205] [SC-133371][SQL] fix SQLQueryTestSuite

  • [SPARK-43376] Revert “[SC-130433][SQL] Improve reuse subquery with table cache”

  • [SPARK-44040] [SC-134366][SQL] Fix compute stats when AggregateExec node above QueryStageExec

  • [SPARK-43919] [SC-133374][SQL] Extract JSON functionality out of Row

  • [SPARK-42618] [SC-134433][PYTHON][PS] Warning for the pandas-related behavior changes in next major release

  • [SPARK-43893] [SC-133381][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

  • [SPARK-43627] [SC-134290][SPARK-43626][PS][CONNECT] Enable pyspark.pandas.spark.functions.{kurt, skew} in Spark Connect.

  • [SPARK-43798] [SC-133990][SQL][PYTHON] Support Python user-defined table functions

  • [SPARK-43616] [SC-133849][PS][CONNECT] Enable pyspark.pandas.spark.functions.mode in Spark Connect

  • [SPARK-43133] [SC-133728] Scala Client DataStreamWriter Foreach support

  • [SPARK-43684] [SC-134107][SPARK-43685][SPARK-43686][SPARK-43691][CONNECT][PS] Fix (NullOps|NumOps).(eq|ne) for Spark Connect.

  • [SPARK-43645] [SC-134151][SPARK-43622][PS][CONNECT] Enable pyspark.pandas.spark.functions.{var, stddev} in Spark Connect

  • [SPARK-43617] [SC-133893][PS][CONNECT] Enable pyspark.pandas.spark.functions.product in Spark Connect

  • [SPARK-43610] [SC-133832][CONNECT][PS] Enable InternalFrame.attach_distributed_column in Spark Connect.

  • [SPARK-43621] [SC-133852][PS][CONNECT] Enable pyspark.pandas.spark.functions.repeat in Spark Connect

  • [SPARK-43921] [SC-133461][PROTOBUF] Generate Protobuf descriptor files at build time

  • [SPARK-43613] [SC-133727][PS][CONNECT] Enable pyspark.pandas.spark.functions.covar in Spark Connect

  • [SPARK-43376] [SC-130433][SQL] Improve reuse subquery with table cache

  • [SPARK-43612] [SC-132011][CONNECT][PYTHON] Implement SparkSession.addArtifact(s) in Python client

  • [SPARK-43920] [SC-133611][SQL][CONNECT] Create sql/api module

  • [SPARK-43097] [SC-133372][ML] New pyspark ML logistic regression estimator implemented on top of distributor

  • [SPARK-43783] [SC-133240][SPARK-43784][SPARK-43788][ML] Make MLv2 (ML on spark connect) supports pandas >= 2.0

  • [SPARK-43024] [SC-132716][PYTHON] Upgrade pandas to 2.0.0

  • [SPARK-43881] [SC-133140][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases

  • [SPARK-39281] [SC-131422][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source

  • [SPARK-43792] [SC-132887][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listCatalogs

  • [SPARK-43132] [SC-131623][SS][CONNECT] Python Client DataStreamWriter foreach() API

  • [SPARK-43545] [SC-132378][SQL][PYTHON] Support nested timestamp type

  • [SPARK-43353] [SC-132734][PYTHON] Migrate remaining session errors into error class

  • [SPARK-43304] [SC-129969][CONNECT][PYTHON] Migrate NotImplementedError into PySparkNotImplementedError

  • [SPARK-43516] [SC-132202][ML][PYTHON][CONNECT] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator

  • [SPARK-43128] Revert “[SC-131628][CONNECT][SS] Make recentProgress and lastProgress return StreamingQueryProgress consistent with the native Scala Api”

  • [SPARK-43543] [SC-131839][PYTHON] Fix nested MapType behavior in Pandas UDF

  • [SPARK-38469] [SC-131425][CORE] Use error class in org.apache.spark.network

  • [SPARK-43309] [SC-129746][SPARK-38461][CORE] Extend INTERNAL_ERROR with categories and add error class INTERNAL_ERROR_BROADCAST

  • [SPARK-43265] [SC-129653] Move Error framework to a common utils module

  • [SPARK-43440] [SC-131229][PYTHON][CONNECT] Support registration of an Arrow-optimized Python UDF

  • [SPARK-43528] [SC-131531][SQL][PYTHON] Support duplicated field names in createDataFrame with pandas DataFrame

  • [SPARK-43412] [SC-130990][PYTHON][CONNECT] Introduce SQL_ARROW_BATCHED_UDF EvalType for Arrow-optimized Python UDFs

  • [SPARK-40912] [SC-130986][CORE] Overhead of Exceptions in KryoDeserializationStream

  • [SPARK-39280] [SC-131206][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

  • [SPARK-43473] [SC-131372][PYTHON] Support struct type in createDataFrame from pandas DataFrame

  • [SPARK-43443] [SC-131024][SQL] Add benchmark for Timestamp type inference when use invalid value

  • [SPARK-41532] [SC-130523][CONNECT][CLIENT] Add check for operations that involve multiple data frames

  • [SPARK-43296] [SC-130627][CONNECT][PYTHON] Migrate Spark Connect session errors into error class

  • [SPARK-43324] [SC-130455][SQL] Handle UPDATE commands for delta-based sources

  • [SPARK-43347] [SC-130148][PYTHON] Remove Python 3.7 Support

  • [SPARK-43292] [SC-130525][CORE][CONNECT] Move ExecutorClassLoader to core module and simplify Executor#addReplClassLoaderIfNeeded

  • [SPARK-43081] [SC-129900][ML][CONNECT] Add torch distributor data loader that loads data from spark partition data

  • [SPARK-43331] [SC-130061][CONNECT] Add Spark Connect SparkSession.interruptAll

  • [SPARK-43306] [SC-130320][PYTHON] Migrate ValueError from Spark SQL types into error class

  • [SPARK-43261] [SC-129674][PYTHON] Migrate TypeError from Spark SQL types into error class.

  • [SPARK-42992] [SC-129465][PYTHON] Introduce PySparkRuntimeError

  • [SPARK-16484] [SC-129975][SQL] Add support for Datasketches HllSketch

  • [SPARK-43165] [SC-128823][SQL] Move canWrite to DataTypeUtils

  • [SPARK-43082] [SC-129112][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect

  • [SPARK-43084] [SC-128654][SS] Add applyInPandasWithState support for spark connect

  • [SPARK-42657] [SC-128621][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts

  • [SPARK-43098] [SC-77059][SQL] Fix correctness COUNT bug when scalar subquery has group by clause

  • [SPARK-42884] [SC-126662][CONNECT] Add Ammonite REPL integration

  • [SPARK-42994] [SC-128333][ML][CONNECT] PyTorch Distributor support Local Mode

  • [SPARK-41498] [SC-125343] Revert “Propagate metadata through Union”

  • [SPARK-42993] [SC-127829][ML][CONNECT] Make PyTorch Distributor compatible with Spark Connect

  • [SPARK-42683] [LC-75] Automatically rename conflicting metadata columns

  • [SPARK-42874] [SC-126442][SQL] Enable new golden file test framework for analysis for all input files

  • [SPARK-42779] [SC-126042][SQL] Allow V2 writes to indicate advisory shuffle partition size

  • [SPARK-42891] [SC-126458][CONNECT][PYTHON] Implement CoGrouped Map API

  • [SPARK-42791] [SC-126134][SQL] Create a new golden file test framework for analysis

  • [SPARK-42615] [SC-124237][CONNECT][PYTHON] Refactor the AnalyzePlan RPC and add session.version

  • [SPARK-41302] Revert “[ALL TESTS][SC-122423][SQL] Assign name to LEGACYERROR_TEMP_1185”

  • [SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch

  • [SPARK-40770] Revert “[ALL TESTS][SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”

  • [SPARK-42398] [SC-123500][SQL] Refine default column value DS v2 interface

  • [SPARK-40770] [ALL TESTS][SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch

  • [SPARK-40770] Revert “[SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”

  • [SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch

  • [SPARK-42038] [ALL TESTS] Revert “Revert “[SC-122533][SQL] SPJ: Support partially clustered distribution””

  • [SPARK-42038] Revert “[SC-122533][SQL] SPJ: Support partially clustered distribution”

  • [SPARK-42038] [SC-122533][SQL] SPJ: Support partially clustered distribution

  • [SPARK-40550] [SC-120989][SQL] DataSource V2: Handle DELETE commands for delta-based sources

  • [SPARK-40770] Revert “[SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”

  • [SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch

  • [SPARK-41302] Revert “[SC-122423][SQL] Assign name to LEGACYERROR_TEMP_1185”

  • [SPARK-40550] Revert “[SC-120989][SQL] DataSource V2: Handle DELETE commands for delta-based sources”

  • [SPARK-42123] Revert “[SC-121453][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output”

  • [SPARK-42146] [SC-121172][CORE] Refactor Utils#setStringField to make maven build pass when sql module use this method

  • [SPARK-42119] Revert “[SC-121342][SQL] Add built-in table-valued functions inline and inline_outer”

Highlights

  • Fix aes_decrypt and ln functions in Connect SPARK-45109

  • Fix inherited named tuples to work in createDataFrame SPARK-44980

  • CodeGenerator Cache is now classloader-specific SPARK-44795

  • Added SparkListenerConnectOperationStarted.planRequest SPARK-44861

  • Make Streaming Queries work with Connect’s artifact management SPARK-44794

  • ArrowDeserializer works with REPL generated classes SPARK-44791

  • Fixed Arrow-optimized Python UDF on Spark Connect SPARK-44876

  • Scala and Go client support in Spark Connect SPARK-42554 SPARK-43351

  • PyTorch-based distributed ML Support for Spark Connect SPARK-42471

  • Structured Streaming support for Spark Connect in Python and Scala SPARK-42938

  • Pandas API support for the Python Spark Connect Client SPARK-42497

  • Introduce Arrow Python UDFs SPARK-40307

  • Support Python user-defined table functions SPARK-43798

  • Migrate PySpark errors onto error classes SPARK-42986

  • PySpark Test Framework SPARK-44042

  • Add support for Datasketches HllSketch SPARK-16484

  • Built-in SQL Function Improvement SPARK-41231

  • IDENTIFIER clause SPARK-43205

  • Add SQL functions into Scala, Python and R API SPARK-43907

  • Add named argument support for SQL functions SPARK-43922

  • Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469

  • Distributed ML on Spark Connect SPARK-42471

  • DeepSpeed Distributor SPARK-44264

  • Implement changelog checkpointing for RocksDB state store SPARK-43421

  • Introduce watermark propagation among operators SPARK-42376

  • Introduce dropDuplicatesWithinWatermark SPARK-42931

  • RocksDB state store provider memory management enhancements SPARK-43311

Spark Connect

  • Refactored the sql module into sql and sql-api to produce a minimal set of dependencies that can be shared between the Scala Spark Connect client and Spark, which avoids pulling in all of the Spark transitive dependencies. SPARK-44273

  • Introducing the Scala client for Spark Connect SPARK-42554

  • Pandas API support for the Python Spark Connect Client SPARK-42497

  • PyTorch-based distributed ML Support for Spark Connect SPARK-42471

  • Structured Streaming support for Spark Connect in Python and Scala SPARK-42938

  • Initial version of the Go client SPARK-43351

  • Lots of compatibility improvements between Spark native and the Spark Connect clients across Python and Scala

  • Improved debuggability and request handling for client applications (asynchronous processing, retries, long-lived queries)

Spark SQL

Features

  • Add metadata column file block start and length SPARK-42423

  • Support positional parameters in Scala/Java sql() SPARK-44066

  • Add named parameter support in parser for function calls SPARK-43922

  • Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation SPARK-43071

  • Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls SPARK-44503

  • Include column default values in DESCRIBE and SHOW CREATE TABLE output SPARK-42123

  • Add optional pattern for Catalog.listCatalogs SPARK-43792

  • Add optional pattern for Catalog.listDatabases SPARK-43881

  • Callback when ready for execution SPARK-44145

  • Support Insert By Name statement SPARK-42750

  • Add call_function for Scala API SPARK-44131

  • Stable derived column aliases SPARK-40822

  • Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values SPARK-43529

  • Support subqueries with correlation through INTERSECT/EXCEPT SPARK-36124

  • IDENTIFIER clause SPARK-43205

  • ANSI MODE: Conv should return an error if the internal conversion overflows SPARK-42427

Functions

  • Add support for Datasketches HllSketch SPARK-16484

  • Support the CBC mode by aes_encrypt()/aes_decrypt() SPARK-43038

  • Support TABLE argument parser rule for TableValuedFunction SPARK-44200

  • Implement bitmap functions SPARK-44154

  • Add the try_aes_decrypt() function SPARK-42701

  • array_insert should fail with 0 index SPARK-43011

  • Add to_varchar alias for to_char SPARK-43815

  • High-order function: array_compact implementation SPARK-41235

  • Add analyzer support of named arguments for built-in functions SPARK-44059

  • Add NULLs for INSERTs with user-specified lists of fewer columns than the target table SPARK-42521

  • Adds support for aes_encrypt IVs and AAD SPARK-43290

  • DECODE function returns wrong results when passed NULL SPARK-41668

  • Support udf ‘luhn_check’ SPARK-42191

  • Support implicit lateral column alias resolution on Aggregate SPARK-41631

  • Support implicit lateral column alias in queries with Window SPARK-42217

  • Add 3-args function aliases DATE_ADD and DATE_DIFF SPARK-43492
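
The `luhn_check` entry above refers to the Luhn checksum commonly used to validate card and ID numbers. As a rough illustration of what that checksum computes, here is a pure-Python sketch. The helper name `luhn_valid` is hypothetical, and this is not Spark's implementation; the built-in function's handling of edge cases such as empty or non-digit input may differ.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digit string passes the Luhn checksum.

    Illustrative only: `luhn_valid` is a hypothetical helper, not a Spark API.
    """
    # Assumption for this sketch: empty or non-digit input is treated as invalid.
    if not number or not (number.isascii() and number.isdigit()):
        return False

    total = 0
    # Walk the digits from right to left and double every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            # A doubled digit above 9 contributes the sum of its two digits.
            if digit > 9:
                digit -= 9
        total += digit

    # The number is valid when the total is a multiple of 10.
    return total % 10 == 0
```

In Spark SQL the equivalent check would be written as a call such as `SELECT luhn_check('79927398713')`.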

Data Sources

  • Char/Varchar Support for JDBC Catalog SPARK-42904

  • Support Get SQL Keywords Dynamically Thru JDBC API and TVF SPARK-43119

  • DataSource V2: Handle MERGE commands for delta-based sources SPARK-43885

  • DataSource V2: Handle MERGE commands for group-based sources SPARK-43963

  • DataSource V2: Handle UPDATE commands for group-based sources SPARK-43975

  • DataSource V2: Allow representing updates as deletes and inserts SPARK-43775

  • Allow jdbc dialects to override the query used to create a table SPARK-41516

  • SPJ: Support partially clustered distribution SPARK-42038

  • DSv2 allows CTAS/RTAS to reserve schema nullability SPARK-43390

  • Add spark.sql.files.maxPartitionNum SPARK-44021

  • Handle UPDATE commands for delta-based sources SPARK-43324

  • Allow V2 writes to indicate advisory shuffle partition size SPARK-42779

  • Support lz4raw compression codec for Parquet SPARK-43273

  • Avro: writing complex unions SPARK-25050

  • Speed up Timestamp type inference with user-provided format in JSON/CSV data source SPARK-39280

  • Avro to Support custom decimal type backed by Long SPARK-43901

  • Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but join expressions are compatible SPARK-41413

  • Change binary to unsupported dataType in CSV format SPARK-42237

  • Allow Avro to convert union type to SQL with field name stable with type SPARK-43333

  • Speed up Timestamp type inference with legacy format in JSON/CSV data source SPARK-39281

Query Optimization

  • Subexpression elimination support shortcut expression SPARK-42815

  • Improve join stats estimation if one side can keep uniqueness SPARK-39851

  • Introduce the group limit of Window for rank-based filter to optimize top-k computation SPARK-37099

  • Fix behavior of null IN (empty list) in optimization rules SPARK-44431

  • Infer and push down window limit through window if partitionSpec is empty SPARK-41171

  • Remove the outer join if they are all distinct aggregate functions SPARK-42583

  • Collapse two adjacent windows with the same partition/order in subquery SPARK-42525

  • Push down limit through Python UDFs SPARK-42115

  • Optimize the order of filtering predicates SPARK-40045

Code Generation and Query Execution

  • Runtime filter should support multi level shuffle join side as filter creation side SPARK-41674

  • Codegen Support for HiveSimpleUDF SPARK-42052

  • Codegen Support for HiveGenericUDF SPARK-42051

  • Codegen Support for build side outer shuffled hash join SPARK-44060

  • Implement code generation for to_csv function (StructsToCsv) SPARK-42169

  • Make AQE support InMemoryTableScanExec SPARK-42101

  • Support left outer join build left or right outer join build right in shuffled hash join SPARK-36612

  • Respect RequiresDistributionAndOrdering in CTAS/RTAS SPARK-43088

  • Coalesce buckets in join applied on broadcast join stream side SPARK-43107

  • Set nullable correctly on coalesced join key in full outer USING join SPARK-44251

  • Fix IN subquery ListQuery nullability SPARK-43413

Other Notable Changes

  • Set nullable correctly for keys in USING joins SPARK-43718

  • Fix COUNT(*) is null bug in correlated scalar subquery SPARK-43156

  • Dataframe.joinWith outer-join should return a null value for unmatched row SPARK-37829

  • Automatically rename conflicting metadata columns SPARK-42683

  • Document the Spark SQL error classes in user-facing documentation SPARK-42706

PySpark

Features

  • Support positional parameters in Python sql() SPARK-44140

  • Support parameterized SQL by sql() SPARK-41666

  • Support Python user-defined table functions SPARK-43797

  • Support to set Python executable for UDF and pandas function APIs in workers during runtime SPARK-43574

  • Add DataFrame.offset to PySpark SPARK-43213

  • Implement dir() in pyspark.sql.dataframe.DataFrame to include columns SPARK-43270

  • Add option to use large variable width vectors for arrow UDF operations SPARK-39979

  • Make mapInPandas / mapInArrow support barrier mode execution SPARK-42896

  • Add JobTag APIs to PySpark SparkContext SPARK-44194

  • Support for Python UDTF to analyze in Python SPARK-44380

  • Expose TimestampNTZType in pyspark.sql.types SPARK-43759

  • Support nested timestamp type SPARK-43545

  • Support UserDefinedType in createDataFrame from pandas DataFrame and toPandas SPARK-43817 SPARK-43702

  • Add descriptor binary option to Pyspark Protobuf API SPARK-43799

  • Accept generics tuple as typing hints of Pandas UDF SPARK-43886

  • Add array_prepend function SPARK-41233

  • Add assertDataFrameEqual util function SPARK-44061

  • Support arrow-optimized Python UDTFs SPARK-43964

  • Allow custom precision for fp approx equality SPARK-44217

  • Make assertSchemaEqual API public SPARK-44216

  • Support fill_value for ps.Series SPARK-42094

  • Support struct type in createDataFrame from pandas DataFrame SPARK-43473
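
To make the Python user-defined table function entries above more concrete, here is a minimal sketch. It assumes PySpark 3.5 or above; the class name `SquareNumbers`, the helper `register_square_numbers`, and the SQL name `square_numbers` are hypothetical names chosen for illustration. The handler is a plain Python class whose `eval` method yields one tuple per output row, so it can be exercised without a Spark session.

```python
class SquareNumbers:
    """UDTF handler: yields one (num, squared) row per integer in [start, end]."""

    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)


def register_square_numbers(spark):
    """Wrap the handler as a UDTF and register it for use from SQL.

    `spark` is assumed to be an existing SparkSession on PySpark 3.5+.
    """
    # Imported lazily so the handler class stays usable without PySpark installed.
    from pyspark.sql.functions import udtf

    # returnType declares the schema of the rows that eval() yields.
    square_numbers = udtf(SquareNumbers, returnType="num: int, squared: int")
    spark.udtf.register("square_numbers", square_numbers)
    return square_numbers
```

After registration, a query such as `spark.sql("SELECT * FROM square_numbers(1, 3)")` should return one row per yielded tuple.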

Other Notable Changes

  • Add autocomplete support for df[|] in pyspark.sql.dataframe.DataFrame SPARK-43892

  • Deprecate & remove the APIs that will be removed in pandas 2.0 SPARK-42593

  • Make Python the first tab for code examples - Spark SQL, DataFrames and Datasets Guide SPARK-42493

  • Updating remaining Spark documentation code examples to show Python by default SPARK-42642

  • Use deduplicated field names when creating Arrow RecordBatch SPARK-41971

  • Support duplicated field names in createDataFrame with pandas DataFrame SPARK-43528

  • Allow columns parameter when creating DataFrame with Series SPARK-42194

Core

  • Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks SPARK-40082

  • Introduce PartitionEvaluator for SQL operator execution SPARK-43061

  • Allow ShuffleDriverComponent to declare if shuffle data is reliably stored SPARK-42689

  • Add max attempts limitation for stages to avoid potential infinite retry SPARK-42577

  • Support log level configuration with static Spark conf SPARK-43782

  • Optimize PercentileHeap SPARK-42528

  • Add reason argument to TaskScheduler.cancelTasks SPARK-42602

  • Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469

  • Fixing accumulator undercount in the case of the retry task with rdd cache SPARK-41497

  • Use RocksDB for spark.history.store.hybridStore.diskBackend by default SPARK-42277

  • NonFateSharingCache wrapper for Guava Cache SPARK-43300

  • Improve the performance of MapOutputTracker.updateMapOutput SPARK-43043

  • Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service SPARK-43179

  • Add SPARK_DRIVER_POD_IP env variable to executor pods SPARK-42769

  • Mounts the hadoop config map on the executor pod SPARK-43504

Structured Streaming

  • Add support for tracking pinned blocks memory usage for RocksDB state store SPARK-43120

  • Add RocksDB state store provider memory management enhancements SPARK-43311

  • Introduce dropDuplicatesWithinWatermark SPARK-42931

  • Introduce a new callback onQueryIdle() to StreamingQueryListener SPARK-43183

  • Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks SPARK-42968

  • Implement Changelog based Checkpointing for RocksDB State Store Provider SPARK-43421

  • Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators SPARK-42792

  • Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming SPARK-42819

  • RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD SPARK-42566

  • Introduce watermark propagation among operators SPARK-42376

  • Cleanup orphan sst and log files in RocksDB checkpoint directory SPARK-42353

  • Expand QueryTerminatedEvent to contain error class if it exists in exception SPARK-43482

ML

  • Support Distributed Training of Functions Using Deepspeed SPARK-44264

  • Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator SPARK-43516

  • Make MLv2 (ML on spark connect) supports pandas >= 2.0 SPARK-43783

  • Update MLv2 Transformer interfaces SPARK-43516

  • New pyspark ML logistic regression estimator implemented on top of distributor SPARK-43097

  • Add Classifier.getNumClasses back SPARK-42526

  • Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor SPARK-44264

  • Basic saving / loading implementation for ML on spark connect SPARK-43981

  • Improve logistic regression model saving SPARK-43097

  • Implement pipeline estimator for ML on spark connect SPARK-43982

  • Implement cross validator estimator SPARK-43983

  • Implement classification evaluator SPARK-44250

  • Make PyTorch Distributor compatible with Spark Connect SPARK-42993

UI

  • Add a Spark UI page for Spark Connect SPARK-44394

  • Support Heap Histogram column in Executors tab SPARK-44153

  • Show error message on UI for each failed query SPARK-44367

  • Display Add/Remove Time of Executors on Executors Tab SPARK-44309

Build and Others

Removals, Behavior Changes and Deprecations

Upcoming Removal

The following features will be removed in the next Spark major release:

  • Support for Java 8 and Java 11. The minimum supported Java version will be Java 17.

  • Support for Scala 2.12. The minimum supported Scala version will be 2.13.

Databricks ODBC/JDBC driver support

Databricks supports ODBC/JDBC drivers released in the past 2 years. Please download the recently released drivers and upgrade (download ODBC, download JDBC).

System environment

  • Operating System: Ubuntu 22.04.3 LTS

  • Java: Zulu 8.70.0.23-CA-linux64

  • Scala: 2.12.15

  • Python: 3.10.12

  • R: 4.3.1

  • Delta Lake: 2.4.0

Installed Python libraries

| Library | Version | Library | Version | Library | Version |
| --- | --- | --- | --- | --- | --- |
| anyio | 3.5.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |
| asttokens | 2.0.5 | attrs | 22.1.0 | backcall | 0.2.0 |
| beautifulsoup4 | 4.11.1 | black | 22.6.0 | bleach | 4.1.0 |
| blinker | 1.4 | boto3 | 1.24.28 | botocore | 1.27.96 |
| certifi | 2022.12.7 | cffi | 1.15.1 | chardet | 4.0.0 |
| charset-normalizer | 2.0.4 | click | 8.0.4 | comm | 0.1.2 |
| contourpy | 1.0.5 | cryptography | 39.0.1 | cycler | 0.11.0 |
| Cython | 0.29.32 | databricks-sdk | 0.1.6 | dbus-python | 1.2.18 |
| debugpy | 1.6.7 | decorator | 5.1.1 | defusedxml | 0.7.1 |
| distlib | 0.3.7 | docstring-to-markdown | 0.11 | entrypoints | 0.4 |
| executing | 0.8.3 | facets-overview | 1.1.1 | fastjsonschema | 2.18.0 |
| filelock | 3.12.2 | fonttools | 4.25.0 | GCC runtime library | 1.10.0 |
| googleapis-common-protos | 1.60.0 | grpcio | 1.48.2 | grpcio-status | 1.48.1 |
| httplib2 | 0.20.2 | idna | 3.4 | importlib-metadata | 4.6.4 |
| ipykernel | 6.25.0 | ipython | 8.14.0 | ipython-genutils | 0.2.0 |
| ipywidgets | 7.7.2 | jedi | 0.18.1 | jeepney | 0.7.1 |
| Jinja2 | 3.1.2 | jmespath | 0.10.0 | joblib | 1.2.0 |
| jsonschema | 4.17.3 | jupyter-client | 7.3.4 | jupyter-server | 1.23.4 |
| jupyter_core | 5.2.0 | jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 |
| keyring | 23.5.0 | kiwisolver | 1.4.4 | launchpadlib | 1.10.16 |
| lazr.restfulclient | 0.14.4 | lazr.uri | 1.0.6 | lxml | 4.9.1 |
| MarkupSafe | 2.1.1 | matplotlib | 3.7.0 | matplotlib-inline | 0.1.6 |
| mccabe | 0.7.0 | mistune | 0.8.4 | more-itertools | 8.10.0 |
| mypy-extensions | 0.4.3 | nbclassic | 0.5.2 | nbclient | 0.5.13 |
| nbconvert | 6.5.4 | nbformat | 5.7.0 | nest-asyncio | 1.5.6 |
| nodeenv | 1.8.0 | notebook | 6.5.2 | notebook_shim | 0.2.2 |
| numpy | 1.23.5 | oauthlib | 3.2.0 | packaging | 22.0 |
| pandas | 1.5.3 | pandocfilters | 1.5.0 | parso | 0.8.3 |
| pathspec | 0.10.3 | patsy | 0.5.3 | pexpect | 4.8.0 |
| pickleshare | 0.7.5 | Pillow | 9.4.0 | pip | 22.3.1 |
| platformdirs | 2.5.2 | plotly | 5.9.0 | pluggy | 1.0.0 |
| prometheus-client | 0.14.1 | prompt-toolkit | 3.0.36 | protobuf | 4.24.0 |
| psutil | 5.9.0 | psycopg2 | 2.9.3 | ptyprocess | 0.7.0 |
| pure-eval | 0.2.2 | pyarrow | 8.0.0 | pycparser | 2.21 |
| pydantic | 1.10.6 | pyflakes | 3.0.1 | Pygments | 2.11.2 |
| PyGObject | 3.42.1 | PyJWT | 2.3.0 | pyodbc | 4.0.32 |
| pyparsing | 3.0.9 | pyright | 1.1.294 | pyrsistent | 0.18.0 |
| python-dateutil | 2.8.2 | python-lsp-jsonrpc | 1.0.0 | python-lsp-server | 1.7.1 |
| pytoolconfig | 1.2.5 | pytz | 2022.7 | pyzmq | 23.2.0 |
| requests | 2.28.1 | rope | 1.7.0 | s3transfer | 0.6.1 |
| scikit-learn | 1.1.1 | seaborn | 0.12.2 | SecretStorage | 3.3.1 |
| Send2Trash | 1.8.0 | setuptools | 65.6.3 | six | 1.16.0 |
| sniffio | 1.2.0 | soupsieve | 2.3.2.post1 | ssh-import-id | 5.11 |
| stack-data | 0.2.0 | statsmodels | 0.13.5 | tenacity | 8.1.0 |
| terminado | 0.17.1 | threadpoolctl | 2.2.0 | tinycss2 | 1.2.1 |
| tokenize-rt | 4.2.1 | tomli | 2.0.1 | tornado | 6.1 |
| traitlets | 5.7.1 | typing_extensions | 4.4.0 | ujson | 5.4.0 |
| unattended-upgrades | 0.1 | urllib3 | 1.26.14 | virtualenv | 20.16.7 |
| wadllib | 1.3.6 | wcwidth | 0.2.5 | webencodings | 0.5.1 |
| websocket-client | 0.58.0 | whatthepatch | 1.0.2 | wheel | 0.38.4 |
| widgetsnbextension | 3.6.1 | yapf | 0.31.0 | zipp | 1.0.0 |
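
When debugging an environment mismatch, it can help to compare the Python library versions listed above with what is actually installed. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8 or above). The function names `compare_versions` and `mismatches` are hypothetical helpers, and the expected versions in the usage section are copied from the list above.

```python
from importlib import metadata


def compare_versions(expected):
    """Map each package name to a (expected, installed) version pair.

    `installed` is None when the package is not present in the environment.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


def mismatches(report):
    """Keep only the entries whose installed version differs from the expected one."""
    return {name: pair for name, pair in report.items() if pair[0] != pair[1]}


if __name__ == "__main__":
    # A few versions taken from the Databricks Runtime 14.0 list above.
    expected = {"pandas": "1.5.3", "numpy": "1.23.5", "pyarrow": "8.0.0"}
    for name, (wanted, found) in mismatches(compare_versions(expected)).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The comparison is a plain string match, which is enough to spot drift from the listed versions; it does not interpret version ordering.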

Installed R libraries

R libraries are installed from the Posit Package Manager CRAN snapshot on 2023-07-13.

| Library | Version | Library | Version | Library | Version |
| --- | --- | --- | --- | --- | --- |
| arrow | 12.0.1 | askpass | 1.1 | assertthat | 0.2.1 |
| backports | 1.4.1 | base | 4.3.1 | base64enc | 0.1-3 |
| bit | 4.0.5 | bit64 | 4.0.5 | blob | 1.2.4 |
| boot | 1.3-28 | brew | 1.0-8 | brio | 1.1.3 |
| broom | 1.0.5 | bslib | 0.5.0 | cachem | 1.0.8 |
| callr | 3.7.3 | caret | 6.0-94 | cellranger | 1.1.0 |
| chron | 2.3-61 | class | 7.3-22 | cli | 3.6.1 |
| clipr | 0.8.0 | clock | 0.7.0 | cluster | 2.1.4 |
| codetools | 0.2-19 | colorspace | 2.1-0 | commonmark | 1.9.0 |
| compiler | 4.3.1 | config | 0.3.1 | conflicted | 1.2.0 |
| cpp11 | 0.4.4 | crayon | 1.5.2 | credentials | 1.3.2 |
| curl | 5.0.1 | data.table | 1.14.8 | datasets | 4.3.1 |
| DBI | 1.1.3 | dbplyr | 2.3.3 | desc | 1.4.2 |
| devtools | 2.4.5 | diagram | 1.6.5 | diffobj | 0.3.5 |
| digest | 0.6.33 | downlit | 0.4.3 | dplyr | 1.1.2 |
| dtplyr | 1.3.1 | e1071 | 1.7-13 | ellipsis | 0.3.2 |
| evaluate | 0.21 | fansi | 1.0.4 | farver | 2.1.1 |
| fastmap | 1.1.1 | fontawesome | 0.5.1 | forcats | 1.0.0 |
| foreach | 1.5.2 | foreign | 0.8-82 | forge | 0.2.0 |
| fs | 1.6.2 | future | 1.33.0 | future.apply | 1.11.0 |
| gargle | 1.5.1 | generics | 0.1.3 | gert | 1.9.2 |
| ggplot2 | 3.4.2 | gh | 1.4.0 | gitcreds | 0.1.2 |
| glmnet | 4.1-7 | globals | 0.16.2 | glue | 1.6.2 |
| googledrive | 2.1.1 | googlesheets4 | 1.1.1 | gower | 1.0.1 |
| graphics | 4.3.1 | grDevices | 4.3.1 | grid | 4.3.1 |
| gridExtra | 2.3 | gsubfn | 0.7 | gtable | 0.3.3 |
| hardhat | 1.3.0 | haven | 2.5.3 | highr | 0.10 |
| hms | 1.1.3 | htmltools | 0.5.5 | htmlwidgets | 1.6.2 |
| httpuv | 1.6.11 | httr | 1.4.6 | httr2 | 0.2.3 |
| ids | 1.0.1 | ini | 0.3.1 | ipred | 0.9-14 |
| isoband | 0.2.7 | iterators | 1.0.14 | jquerylib | 0.1.4 |
| jsonlite | 1.8.7 | KernSmooth | 2.23-21 | knitr | 1.43 |
| labeling | 0.4.2 | later | 1.3.1 | lattice | 0.21-8 |
| lava | 1.7.2.1 | lifecycle | 1.0.3 | listenv | 0.9.0 |
| lubridate | 1.9.2 | magrittr | 2.0.3 | markdown | 1.7 |
| MASS | 7.3-60 | Matrix | 1.5-4.1 | memoise | 2.0.1 |
| methods | 4.3.1 | mgcv | 1.8-42 | mime | 0.12 |
| miniUI | 0.1.1.1 | ModelMetrics | 1.2.2.2 | modelr | 0.1.11 |
| munsell | 0.5.0 | nlme | 3.1-162 | nnet | 7.3-19 |
| numDeriv | 2016.8-1.1 | openssl | 2.0.6 | parallel | 4.3.1 |
| parallelly | 1.36.0 | pillar | 1.9.0 | pkgbuild | 1.4.2 |
| pkgconfig | 2.0.3 | pkgdown | 2.0.7 | pkgload | 1.3.2.1 |
| plogr | 0.2.0 | plyr | 1.8.8 | praise | 1.0.0 |
| prettyunits | 1.1.1 | pROC | 1.18.4 | processx | 3.8.2 |
| prodlim | 2023.03.31 | profvis | 0.3.8 | progress | 1.2.2 |
| progressr | 0.13.0 | promises | 1.2.0.1 | proto | 1.0.0 |
| proxy | 0.4-27 | ps | 1.7.5 | purrr | 1.0.1 |
| r2d3 | 0.2.6 | R6 | 2.5.1 | ragg | 1.2.5 |
| randomForest | 4.7-1.1 | rappdirs | 0.3.3 | rcmdcheck | 1.4.0 |
| RColorBrewer | 1.1-3 | Rcpp | 1.0.11 | RcppEigen | 0.3.3.9.3 |
| readr | 2.1.4 | readxl | 1.4.3 | recipes | 1.0.6 |
| rematch | 1.0.1 | rematch2 | 2.1.2 | remotes | 2.4.2 |
| reprex | 2.0.2 | reshape2 | 1.4.4 | rlang | 1.1.1 |
| rmarkdown | 2.23 | RODBC | 1.3-20 | roxygen2 | 7.2.3 |
| rpart | 4.1.19 | rprojroot | 2.0.3 | Rserve | 1.8-11 |
| RSQLite | 2.3.1 | rstudioapi | 0.15.0 | rversions | 2.1.2 |
| rvest | 1.0.3 | sass | 0.4.6 | scales | 1.2.1 |
| selectr | 0.4-2 | sessioninfo | 1.2.2 | shape | 1.4.6 |
| shiny | 1.7.4.1 | sourcetools | 0.1.7-1 | sparklyr | 1.8.1 |
| SparkR | 3.5.0 | spatial | 7.3-15 | splines | 4.3.1 |
| sqldf | 0.4-11 | SQUAREM | 2021.1 | stats | 4.3.1 |
| stats4 | 4.3.1 | stringi | 1.7.12 | stringr | 1.5.0 |
| survival | 3.5-5 | sys | 3.4.2 | systemfonts | 1.0.4 |
| tcltk | 4.3.1 | testthat | 3.1.10 | textshaping | 0.3.6 |
| tibble | 3.2.1 | tidyr | 1.3.0 | tidyselect | 1.2.0 |
| tidyverse | 2.0.0 | timechange | 0.2.0 | timeDate | 4022.108 |
| tinytex | 0.45 | tools | 4.3.1 | tzdb | 0.4.0 |
| urlchecker | 1.0.1 | usethis | 2.2.2 | utf8 | 1.2.3 |
| utils | 4.3.1 | uuid | 1.1-0 | vctrs | 0.6.3 |
| viridisLite | 0.4.2 | vroom | 1.6.3 | waldo | 0.5.1 |
| whisker | 0.4.1 | withr | 2.5.0 | xfun | 0.39 |
| xml2 | 1.3.5 | xopen | 1.0.0 | xtable | 1.8-4 |
| yaml | 2.3.7 | zip | 2.3.0 | | |

Installed Java and Scala libraries (Scala 2.12 cluster version)

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| antlr | antlr | 2.7.7 |
| com.amazonaws | amazon-kinesis-client | 1.12.0 |
| com.amazonaws | aws-java-sdk-autoscaling | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudformation | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudfront | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudhsm | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudsearch | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudtrail | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudwatch | 1.12.390 |
| com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.12.390 |
| com.amazonaws | aws-java-sdk-codedeploy | 1.12.390 |
| com.amazonaws | aws-java-sdk-cognitoidentity | 1.12.390 |
| com.amazonaws | aws-java-sdk-cognitosync | 1.12.390 |
| com.amazonaws | aws-java-sdk-config | 1.12.390 |
| com.amazonaws | aws-java-sdk-core | 1.12.390 |
| com.amazonaws | aws-java-sdk-datapipeline | 1.12.390 |
| com.amazonaws | aws-java-sdk-directconnect | 1.12.390 |
| com.amazonaws | aws-java-sdk-directory | 1.12.390 |
| com.amazonaws | aws-java-sdk-dynamodb | 1.12.390 |
| com.amazonaws | aws-java-sdk-ec2 | 1.12.390 |
| com.amazonaws | aws-java-sdk-ecs | 1.12.390 |
| com.amazonaws | aws-java-sdk-efs | 1.12.390 |
| com.amazonaws | aws-java-sdk-elasticache | 1.12.390 |
| com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.12.390 |
| com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.12.390 |
| com.amazonaws | aws-java-sdk-elastictranscoder | 1.12.390 |
| com.amazonaws | aws-java-sdk-emr | 1.12.390 |
| com.amazonaws | aws-java-sdk-glacier | 1.12.390 |
| com.amazonaws | aws-java-sdk-glue | 1.12.390 |
| com.amazonaws | aws-java-sdk-iam | 1.12.390 |
| com.amazonaws | aws-java-sdk-importexport | 1.12.390 |
| com.amazonaws | aws-java-sdk-kinesis | 1.12.390 |
| com.amazonaws | aws-java-sdk-kms | 1.12.390 |
| com.amazonaws | aws-java-sdk-lambda | 1.12.390 |
| com.amazonaws | aws-java-sdk-logs | 1.12.390 |
| com.amazonaws | aws-java-sdk-machinelearning | 1.12.390 |
| com.amazonaws | aws-java-sdk-opsworks | 1.12.390 |
| com.amazonaws | aws-java-sdk-rds | 1.12.390 |
| com.amazonaws | aws-java-sdk-redshift | 1.12.390 |
| com.amazonaws | aws-java-sdk-route53 | 1.12.390 |
| com.amazonaws | aws-java-sdk-s3 | 1.12.390 |
| com.amazonaws | aws-java-sdk-ses | 1.12.390 |
| com.amazonaws | aws-java-sdk-simpledb | 1.12.390 |
| com.amazonaws | aws-java-sdk-simpleworkflow | 1.12.390 |
| com.amazonaws | aws-java-sdk-sns | 1.12.390 |
| com.amazonaws | aws-java-sdk-sqs | 1.12.390 |
| com.amazonaws | aws-java-sdk-ssm | 1.12.390 |
| com.amazonaws | aws-java-sdk-storagegateway | 1.12.390 |
| com.amazonaws | aws-java-sdk-sts | 1.12.390 |
| com.amazonaws | aws-java-sdk-support | 1.12.390 |
| com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |
| com.amazonaws | aws-java-sdk-workspaces | 1.12.390 |
| com.amazonaws | jmespath-java | 1.12.390 |
| com.clearspring.analytics | stream | 2.9.6 |
| com.databricks | Rserve | 1.8-3 |
| com.databricks | databricks-sdk-java | 0.2.0 |
| com.databricks | jets3t | 0.7.1-0 |
| com.databricks.scalapb | compilerplugin_2.12 | 0.4.15-10 |
| com.databricks.scalapb | scalapb-runtime_2.12 | 0.4.15-10 |
| com.esotericsoftware | kryo-shaded | 4.0.2 |
| com.esotericsoftware | minlog | 1.3.0 |
| com.fasterxml | classmate | 1.3.4 |
| com.fasterxml.jackson.core | jackson-annotations | 2.15.2 |
| com.fasterxml.jackson.core | jackson-core | 2.15.2 |
| com.fasterxml.jackson.core | jackson-databind | 2.15.2 |
| com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.15.2 |
| com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.15.2 |
| com.fasterxml.jackson.datatype | jackson-datatype-jsr310 | 2.15.1 |
| com.fasterxml.jackson.module | jackson-module-paranamer | 2.15.2 |
| com.fasterxml.jackson.module | jackson-module-scala_2.12 | 2.15.2 |
| com.github.ben-manes.caffeine | caffeine | 2.9.3 |
| com.github.fommil | jniloader | 1.1 |
| com.github.fommil.netlib | native_ref-java | 1.1 |
| com.github.fommil.netlib | native_ref-java | 1.1-natives |
| com.github.fommil.netlib | native_system-java | 1.1 |
| com.github.fommil.netlib | native_system-java | 1.1-natives |
| com.github.fommil.netlib | netlib-native_ref-linux-x86_64 | 1.1-natives |
| com.github.fommil.netlib | netlib-native_system-linux-x86_64 | 1.1-natives |
| com.github.luben | zstd-jni | 1.5.5-4 |
| com.github.wendykierp | JTransforms | 3.1 |
| com.google.code.findbugs | jsr305 | 3.0.0 |
| com.google.code.gson | gson | 2.10.1 |
| com.google.crypto.tink | tink | 1.9.0 |
| com.google.errorprone | error_prone_annotations | 2.10.0 |
| com.google.flatbuffers | flatbuffers-java | 1.12.0 |
| com.google.guava | guava | 15.0 |
| com.google.protobuf | protobuf-java | 2.6.1 |
| com.helger | profiler | 1.1.1 |
| com.jcraft | jsch | 0.1.55 |
| com.jolbox | bonecp | 0.8.0.RELEASE |
| com.lihaoyi | sourcecode_2.12 | 0.1.9 |
| com.microsoft.azure | azure-data-lake-store-sdk | 2.3.9 |
| com.microsoft.sqlserver | mssql-jdbc | 11.2.2.jre8 |
| com.ning | compress-lzf | 1.1.2 |
| com.sun.mail | javax.mail | 1.5.2 |
| com.sun.xml.bind | jaxb-core | 2.2.11 |
| com.sun.xml.bind | jaxb-impl | 2.2.11 |
| com.tdunning | json | 1.8 |
| com.thoughtworks.paranamer | paranamer | 2.8 |
| com.trueaccord.lenses | lenses_2.12 | 0.4.12 |
| com.twitter | chill-java | 0.10.0 |
| com.twitter | chill_2.12 | 0.10.0 |
| com.twitter | util-app_2.12 | 7.1.0 |
| com.twitter | util-core_2.12 | 7.1.0 |
| com.twitter | util-function_2.12 | 7.1.0 |
| com.twitter | util-jvm_2.12 | 7.1.0 |
| com.twitter | util-lint_2.12 | 7.1.0 |
| com.twitter | util-registry_2.12 | 7.1.0 |
| com.twitter | util-stats_2.12 | 7.1.0 |
| com.typesafe | config | 1.2.1 |
| com.typesafe.scala-logging | scala-logging_2.12 | 3.7.2 |
| com.uber | h3 | 3.7.0 |
| com.univocity | univocity-parsers | 2.9.1 |
| com.zaxxer | HikariCP | 4.0.3 |
| commons-cli | commons-cli | 1.5.0 |
| commons-codec | commons-codec | 1.16.0 |
| commons-collections | commons-collections | 3.2.2 |
| commons-dbcp | commons-dbcp | 1.4 |
| commons-fileupload | commons-fileupload | 1.5 |
| commons-httpclient | commons-httpclient | 3.1 |
| commons-io | commons-io | 2.13.0 |

commons-lang

commons-lang

2.6

commons-logging

commons-logging

1.1.3

commons-pool

commons-pool

1.5.4

dev.ludovic.netlib

arpack

3.0.3

dev.ludovic.netlib

blas

3.0.3

dev.ludovic.netlib

lapack

3.0.3

info.ganglia.gmetric4j

gmetric4j

1.0.10

io.airlift

aircompressor

0.24

io.delta

delta-sharing-spark_2.12

0.7.1

io.dropwizard.metrics

metrics-annotation

4.2.19

io.dropwizard.metrics

metrics-core

4.2.19

io.dropwizard.metrics

metrics-graphite

4.2.19

io.dropwizard.metrics

metrics-healthchecks

4.2.19

io.dropwizard.metrics

metrics-jetty9

4.2.19

io.dropwizard.metrics

metrics-jmx

4.2.19

io.dropwizard.metrics

metrics-json

4.2.19

io.dropwizard.metrics

metrics-jvm

4.2.19

io.dropwizard.metrics

metrics-servlets

4.2.19

io.netty

netty-all

4.1.93.Final

io.netty

netty-buffer

4.1.93.Final

io.netty

netty-codec

4.1.93.Final

io.netty

netty-codec-http

4.1.93.Final

io.netty

netty-codec-http2

4.1.93.Final

io.netty

netty-codec-socks

4.1.93.Final

io.netty

netty-common

4.1.93.Final

io.netty

netty-handler

4.1.93.Final

io.netty

netty-handler-proxy

4.1.93.Final

io.netty

netty-resolver

4.1.93.Final

io.netty

netty-transport

4.1.93.Final

io.netty

netty-transport-classes-epoll

4.1.93.Final

io.netty

netty-transport-classes-kqueue

4.1.93.Final

io.netty

netty-transport-native-epoll

4.1.93.Final

io.netty

netty-transport-native-epoll

4.1.93.Final-linux-aarch_64

io.netty

netty-transport-native-epoll

4.1.93.Final-linux-x86_64

io.netty

netty-transport-native-kqueue

4.1.93.Final-osx-aarch_64

io.netty

netty-transport-native-kqueue

4.1.93.Final-osx-x86_64

io.netty

netty-transport-native-unix-common

4.1.93.Final

io.prometheus

simpleclient

0.7.0

io.prometheus

simpleclient_common

0.7.0

io.prometheus

simpleclient_dropwizard

0.7.0

io.prometheus

simpleclient_pushgateway

0.7.0

io.prometheus

simpleclient_servlet

0.7.0

io.prometheus.jmx

collector

0.12.0

jakarta.annotation

jakarta.annotation-api

1.3.5

jakarta.servlet

jakarta.servlet-api

4.0.3

jakarta.validation

jakarta.validation-api

2.0.2

jakarta.ws.rs

jakarta.ws.rs-api

2.1.6

javax.activation

activation

1.1.1

javax.el

javax.el-api

2.2.4

javax.jdo

jdo-api

3.0.1

javax.transaction

jta

1.1

javax.transaction

transaction-api

1.1

javax.xml.bind

jaxb-api

2.2.11

javolution

javolution

5.5.1

jline

jline

2.14.6

joda-time

joda-time

2.12.1

net.java.dev.jna

jna

5.8.0

net.razorvine

pickle

1.3

net.sf.jpam

jpam

1.1

net.sf.opencsv

opencsv

2.3

net.sf.supercsv

super-csv

2.2.0

net.snowflake

snowflake-ingest-sdk

0.9.6

net.snowflake

snowflake-jdbc

3.13.33

net.sourceforge.f2j

arpack_combined_all

0.1

org.acplt.remotetea

remotetea-oncrpc

1.1.2

org.antlr

ST4

4.0.4

org.antlr

antlr-runtime

3.5.2

org.antlr

antlr4-runtime

4.9.3

org.antlr

stringtemplate

3.2.1

org.apache.ant

ant

1.9.16

org.apache.ant

ant-jsch

1.9.16

org.apache.ant

ant-launcher

1.9.16

org.apache.arrow

arrow-format

12.0.1

org.apache.arrow

arrow-memory-core

12.0.1

org.apache.arrow

arrow-memory-netty

12.0.1

org.apache.arrow

arrow-vector

12.0.1

org.apache.avro

avro

1.11.2

org.apache.avro

avro-ipc

1.11.2

org.apache.avro

avro-mapred

1.11.2

org.apache.commons

commons-collections4

4.4

org.apache.commons

commons-compress

1.23.0

org.apache.commons

commons-crypto

1.1.0

org.apache.commons

commons-lang3

3.12.0

org.apache.commons

commons-math3

3.6.1

org.apache.commons

commons-text

1.10.0

org.apache.curator

curator-client

2.13.0

org.apache.curator

curator-framework

2.13.0

org.apache.curator

curator-recipes

2.13.0

org.apache.datasketches

datasketches-java

3.1.0

org.apache.datasketches

datasketches-memory

2.0.0

org.apache.derby

derby

10.14.2.0

org.apache.hadoop

hadoop-client-runtime

3.3.6

org.apache.hive

hive-beeline

2.3.9

org.apache.hive

hive-cli

2.3.9

org.apache.hive

hive-jdbc

2.3.9

org.apache.hive

hive-llap-client

2.3.9

org.apache.hive

hive-llap-common

2.3.9

org.apache.hive

hive-serde

2.3.9

org.apache.hive

hive-shims

2.3.9

org.apache.hive

hive-storage-api

2.8.1

org.apache.hive.shims

hive-shims-0.23

2.3.9

org.apache.hive.shims

hive-shims-common

2.3.9

org.apache.hive.shims

hive-shims-scheduler

2.3.9

org.apache.httpcomponents

httpclient

4.5.14

org.apache.httpcomponents

httpcore

4.4.16

org.apache.ivy

ivy

2.5.1

org.apache.logging.log4j

log4j-1.2-api

2.20.0

org.apache.logging.log4j

log4j-api

2.20.0

org.apache.logging.log4j

log4j-core

2.20.0

org.apache.logging.log4j

log4j-slf4j2-impl

2.20.0

org.apache.mesos

mesos

1.11.0-shaded-protobuf

org.apache.orc

orc-core

1.9.0-shaded-protobuf

org.apache.orc

orc-mapreduce

1.9.0-shaded-protobuf

org.apache.orc

orc-shims

1.9.0

org.apache.thrift

libfb303

0.9.3

org.apache.thrift

libthrift

0.12.0

org.apache.xbean

xbean-asm9-shaded

4.23

org.apache.yetus

audience-annotations

0.13.0

org.apache.zookeeper

zookeeper

3.6.3

org.apache.zookeeper

zookeeper-jute

3.6.3

org.checkerframework

checker-qual

3.31.0

org.codehaus.jackson

jackson-core-asl

1.9.13

org.codehaus.jackson

jackson-mapper-asl

1.9.13

org.codehaus.janino

commons-compiler

3.0.16

org.codehaus.janino

janino

3.0.16

org.datanucleus

datanucleus-api-jdo

4.2.4

org.datanucleus

datanucleus-core

4.1.17

org.datanucleus

datanucleus-rdbms

4.1.19

org.datanucleus

javax.jdo

3.2.0-m3

org.eclipse.jetty

jetty-client

9.4.51.v20230217

org.eclipse.jetty

jetty-continuation

9.4.51.v20230217

org.eclipse.jetty

jetty-http

9.4.51.v20230217

org.eclipse.jetty

jetty-io

9.4.51.v20230217

org.eclipse.jetty

jetty-jndi

9.4.51.v20230217

org.eclipse.jetty

jetty-plus

9.4.51.v20230217

org.eclipse.jetty

jetty-proxy

9.4.51.v20230217

org.eclipse.jetty

jetty-security

9.4.51.v20230217

org.eclipse.jetty

jetty-server

9.4.51.v20230217

org.eclipse.jetty

jetty-servlet

9.4.51.v20230217

org.eclipse.jetty

jetty-servlets

9.4.51.v20230217

org.eclipse.jetty

jetty-util

9.4.51.v20230217

org.eclipse.jetty

jetty-util-ajax

9.4.51.v20230217

org.eclipse.jetty

jetty-webapp

9.4.51.v20230217

org.eclipse.jetty

jetty-xml

9.4.51.v20230217

org.eclipse.jetty.websocket

websocket-api

9.4.51.v20230217

org.eclipse.jetty.websocket

websocket-client

9.4.51.v20230217

org.eclipse.jetty.websocket

websocket-common

9.4.51.v20230217

org.eclipse.jetty.websocket

websocket-server

9.4.51.v20230217

org.eclipse.jetty.websocket

websocket-servlet

9.4.51.v20230217

org.fusesource.leveldbjni

leveldbjni-all

1.8

org.glassfish.hk2

hk2-api

2.6.1

org.glassfish.hk2

hk2-locator

2.6.1

org.glassfish.hk2

hk2-utils

2.6.1

org.glassfish.hk2

osgi-resource-locator

1.0.3

org.glassfish.hk2.external

aopalliance-repackaged

2.6.1

org.glassfish.hk2.external

jakarta.inject

2.6.1

org.glassfish.jersey.containers

jersey-container-servlet

2.40

org.glassfish.jersey.containers

jersey-container-servlet-core

2.40

org.glassfish.jersey.core

jersey-client

2.40

org.glassfish.jersey.core

jersey-common

2.40

org.glassfish.jersey.core

jersey-server

2.40

org.glassfish.jersey.inject

jersey-hk2

2.40

org.hibernate.validator

hibernate-validator

6.1.7.Final

org.ini4j

ini4j

0.5.4

org.javassist

javassist

3.29.2-GA

org.jboss.logging

jboss-logging

3.3.2.Final

org.jdbi

jdbi

2.63.1

org.jetbrains

annotations

17.0.0

org.joda

joda-convert

1.7

org.jodd

jodd-core

3.5.2

org.json4s

json4s-ast_2.12

3.7.0-M11

org.json4s

json4s-core_2.12

3.7.0-M11

org.json4s

json4s-jackson_2.12

3.7.0-M11

org.json4s

json4s-scalap_2.12

3.7.0-M11

org.lz4

lz4-java

1.8.0

org.mariadb.jdbc

mariadb-java-client

2.7.9

org.mlflow

mlflow-spark

2.2.0

org.objenesis

objenesis

2.5.1

org.postgresql

postgresql

42.6.0

org.roaringbitmap

RoaringBitmap

0.9.45

org.roaringbitmap

shims

0.9.45

org.rocksdb

rocksdbjni

8.3.2

org.rosuda.REngine

REngine

2.1.0

org.scala-lang

scala-compiler_2.12

2.12.15

org.scala-lang

scala-library_2.12

2.12.15

org.scala-lang

scala-reflect_2.12

2.12.15

org.scala-lang.modules

scala-collection-compat_2.12

2.9.0

org.scala-lang.modules

scala-parser-combinators_2.12

1.1.2

org.scala-lang.modules

scala-xml_2.12

1.2.0

org.scala-sbt

test-interface

1.0

org.scalacheck

scalacheck_2.12

1.14.2

org.scalactic

scalactic_2.12

3.2.15

org.scalanlp

breeze-macros_2.12

2.1.0

org.scalanlp

breeze_2.12

2.1.0

org.scalatest

scalatest-compatible

3.2.15

org.scalatest

scalatest-core_2.12

3.2.15

org.scalatest

scalatest-diagrams_2.12

3.2.15

org.scalatest

scalatest-featurespec_2.12

3.2.15

org.scalatest

scalatest-flatspec_2.12

3.2.15

org.scalatest

scalatest-freespec_2.12

3.2.15

org.scalatest

scalatest-funspec_2.12

3.2.15

org.scalatest

scalatest-funsuite_2.12

3.2.15

org.scalatest

scalatest-matchers-core_2.12

3.2.15

org.scalatest

scalatest-mustmatchers_2.12

3.2.15

org.scalatest

scalatest-propspec_2.12

3.2.15

org.scalatest

scalatest-refspec_2.12

3.2.15

org.scalatest

scalatest-shouldmatchers_2.12

3.2.15

org.scalatest

scalatest-wordspec_2.12

3.2.15

org.scalatest

scalatest_2.12

3.2.15

org.slf4j

jcl-over-slf4j

2.0.7

org.slf4j

jul-to-slf4j

2.0.7

org.slf4j

slf4j-api

2.0.7

org.threeten

threeten-extra

1.7.1

org.tukaani

xz

1.9

org.typelevel

algebra_2.12

2.0.1

org.typelevel

cats-kernel_2.12

2.1.1

org.typelevel

spire-macros_2.12

0.17.0

org.typelevel

spire-platform_2.12

0.17.0

org.typelevel

spire-util_2.12

0.17.0

org.typelevel

spire_2.12

0.17.0

org.wildfly.openssl

wildfly-openssl

1.1.3.Final

org.xerial

sqlite-jdbc

3.42.0.0

org.xerial.snappy

snappy-java

1.1.10.3

org.yaml

snakeyaml

2.0

oro

oro

2.0.8

pl.edu.icm

JLargeArrays

1.5

software.amazon.cryptools

AmazonCorrettoCryptoProvider

1.6.1-linux-x86_64

software.amazon.ion

ion-java

1.0.2

stax

stax-api

1.0.1
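
Each row in the list above is a group ID, an artifact ID, and a version, which together correspond to a Maven-style coordinate. As a minimal sketch, assuming the cells have been copied into a flat list of strings in table order, the helper below joins each triple into `group:artifact:version` form. The function name `to_maven_coordinates` is hypothetical and is not part of any Databricks or Maven API; the three sample rows are taken from the list above.

```python
from typing import Iterable, List


def to_maven_coordinates(cells: Iterable[str]) -> List[str]:
    """Join flattened table cells into 'group:artifact:version' strings.

    Hypothetical helper for illustration only. It assumes the cells arrive
    in table order (group ID, artifact ID, version) and that blank lines
    between cells can be ignored.
    """
    # Drop blank separator lines and surrounding whitespace.
    values = [cell.strip() for cell in cells if cell.strip()]
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty cells")
    # Take the cells three at a time and join each triple with colons.
    return [":".join(values[i:i + 3]) for i in range(0, len(values), 3)]


# Three rows copied from the library list above, with blank separators kept.
rows = [
    "com.fasterxml.jackson.core", "", "jackson-databind", "", "2.15.2", "",
    "org.scala-lang", "", "scala-library_2.12", "", "2.12.15", "",
    "org.slf4j", "", "slf4j-api", "", "2.0.7",
]
coordinates = to_maven_coordinates(rows)
print(coordinates)
```

Coordinates in this form can be compared against a project's own dependency list to spot version mismatches with the libraries that the runtime already provides.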