Databricks Runtime 17.0 (Beta)
Databricks Runtime 17.0 is in Beta. The contents of the supported environments might change during the Beta, including the list of installed packages and their versions.
The following release notes provide information about Databricks Runtime 17.0 (Beta), powered by Apache Spark 4.0.0.
Databricks released this beta version in May 2025.
To see release notes for Databricks Runtime versions that have reached end-of-support (EoS), see End-of-support Databricks Runtime release notes. The EoS Databricks Runtime versions have been retired and might not be updated.
DBR 17.0 (Beta) new and updated features
- SQL procedure support
- Set a default collation for SQL Functions
- Recursive common table expressions (rCTE) support
- ANSI SQL enabled by default
- PySpark and Spark Connect now support the DataFrame `df.mergeInto` API
- Support `ALL CATALOGS` in `SHOW SCHEMAS`
- Liquid clustering now compacts deletion vectors more efficiently
- Allow non-deterministic expressions in `UPDATE`/`INSERT` column values for `MERGE` operations
- Ignore and rescue empty structs for Auto Loader ingestion (especially Avro)
- Change Delta MERGE Python and Scala APIs to return DataFrame instead of Unit
SQL procedure support
SQL scripts can now be encapsulated in a procedure stored as a reusable asset in Unity Catalog. You can create a procedure using the `CREATE PROCEDURE` command, and then call it using the `CALL` command.
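A minimal sketch of that flow, runnable in a notebook where `spark` is predefined; the procedure name, the target `main.default.events` table, and the procedure body are all illustrative, and the exact body syntax may differ:

```python
# Hypothetical names throughout; a sketch of CREATE PROCEDURE / CALL, not the definitive syntax.
spark.sql("""
CREATE OR REPLACE PROCEDURE main.default.log_event(IN msg STRING)
LANGUAGE SQL
AS BEGIN
  INSERT INTO main.default.events VALUES (msg, current_timestamp());
END
""")

# Invoke the stored procedure by name.
spark.sql("CALL main.default.log_event('nightly load started')")
```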
Set a default collation for SQL Functions
Using the new `DEFAULT COLLATION` clause in the `CREATE FUNCTION` command defines the default collation used for `STRING` parameters, the return type, and `STRING` literals in the function body.
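For example, a sketch that assumes the clause sits with the other routine characteristics and uses the `UTF8_LCASE` collation; the function name and body are illustrative:

```python
# With UTF8_LCASE as the default collation, comparisons on STRING parameters
# and literals in the body are case-insensitive. Names are illustrative.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.is_yes(answer STRING)
RETURNS BOOLEAN
DEFAULT COLLATION UTF8_LCASE
RETURN answer = 'yes'
""")

spark.sql("SELECT main.default.is_yes('YES')").show()  # true under UTF8_LCASE
```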
Recursive common table expressions (rCTE) support
Databricks now supports navigation of hierarchical data using recursive common table expressions (rCTEs).
Use a self-referencing CTE with `UNION ALL` to follow the recursive relationship.
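For example, a minimal sketch that walks a manager/report hierarchy, assuming an illustrative `main.default.employees(id, manager_id, name)` table:

```python
# Anchor: employees with no manager; recursive step: their direct reports, one level deeper.
spark.sql("""
WITH RECURSIVE reporting_chain(id, name, depth) AS (
  SELECT id, name, 0 FROM main.default.employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, c.depth + 1
  FROM main.default.employees e
  JOIN reporting_chain c ON e.manager_id = c.id
)
SELECT * FROM reporting_chain ORDER BY depth
""").show()
```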
ANSI SQL enabled by default
The default SQL dialect is now ANSI SQL. ANSI SQL is a well-established standard and will help protect users from unexpected or incorrect results. Read the Databricks ANSI enablement guide for more information.
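For example, under ANSI mode integer division by zero now raises an error instead of silently returning `NULL`, and the `try_`-prefixed functions keep the permissive behavior. A sketch of the toggle, using the standard `spark.sql.ansi.enabled` config:

```python
# ANSI mode is the default in DBR 17.0.
# spark.sql("SELECT 1 / 0").show()            # raises a DIVIDE_BY_ZERO error
spark.sql("SELECT try_divide(1, 0)").show()   # returns NULL instead of failing

# Opt out to restore the legacy, non-ANSI behavior:
spark.conf.set("spark.sql.ansi.enabled", "false")
```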
PySpark and Spark Connect now support the DataFrame `df.mergeInto` API
PySpark and Spark Connect now support the `df.mergeInto` API, which was previously only available for Scala.
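A minimal Python sketch with illustrative table names; the source DataFrame is aliased so the merge condition can reference both sides:

```python
from pyspark.sql.functions import expr

source = spark.table("updates").alias("src")
(source.mergeInto("target", expr("target.key = src.key"))
    .whenMatched().updateAll()      # update rows that match the condition
    .whenNotMatched().insertAll()   # insert rows with no match
    .merge())                       # execute the merge
```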
Support `ALL CATALOGS` in `SHOW SCHEMAS`
The `SHOW SCHEMAS` command is updated to accept the following syntax:
```sql
SHOW SCHEMAS [ { FROM | IN } { catalog_name | ALL CATALOGS } ] [ [ LIKE ] pattern ]
```
When `ALL CATALOGS` is specified in a `SHOW SCHEMAS` query, the execution iterates through all active catalogs that support namespaces using the catalog manager (DsV2). For each catalog, the output includes its top-level namespaces.
The output attributes and schema of the command have been modified to add a `catalog` column indicating the catalog of the corresponding namespace. The new column is added to the end of the output attributes, as shown below; an example query follows the tables:
Previous output
| Namespace |
|------------------|
| test-namespace-1 |
| test-namespace-2 |
New output
| Namespace | Catalog |
|------------------|----------------|
| test-namespace-1 | test-catalog-1 |
| test-namespace-2 | test-catalog-2 |
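For example, a query of this form produces the two-column output above; the `LIKE` pattern is optional and illustrative:

```python
# Iterates through all active catalogs and lists their top-level schemas.
spark.sql("SHOW SCHEMAS IN ALL CATALOGS LIKE 'test*'").show()
```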
Liquid clustering now compacts deletion vectors more efficiently
Delta tables with liquid clustering now apply physical changes from deletion vectors more efficiently when `OPTIMIZE` is running. For more details, see Apply changes to Parquet data files.
Allow non-deterministic expressions in `UPDATE`/`INSERT` column values for `MERGE` operations
Databricks now allows the use of non-deterministic expressions in updated and inserted column values of `MERGE` operations. However, non-deterministic expressions in the conditions of `MERGE` statements are not supported.
For example, you can now generate dynamic or random values for columns:
```sql
MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET target.value = source.value + rand()
```
This can be helpful for data privacy because it lets you obfuscate actual data while preserving the data's properties (such as mean values or other computed columns).
Ignore and rescue empty structs for Auto Loader ingestion (especially Avro)
Auto Loader now rescues Avro data types with an empty schema, because Delta tables do not support ingesting empty `struct`-type data.
Change Delta MERGE Python and Scala APIs to return DataFrame instead of Unit
The Scala and Python `MERGE` APIs (such as `DeltaMergeBuilder`) now also return a DataFrame like the SQL API does, with the same results.
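A sketch with illustrative table names; the only behavioral difference from earlier releases is the return value of `execute()`:

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.default.target")
result = (target.alias("t")
    .merge(spark.table("main.default.updates").alias("s"), "t.key = s.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())   # previously returned Unit/None; now a DataFrame of merge results
result.show()
```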
Behavioral changes
- Behavioral change for the Auto Loader incremental directory listing option
- Removed the "True cache misses" section in Spark UI
- Removed the "Cache Metadata Manager Peak Disk Usage" metric in the Spark UI
- Removed the "Rescheduled cache miss bytes" section in the Spark UI
- `CREATE VIEW` column-level clauses now throw errors when the clause would only apply to materialized views
Behavioral change for the Auto Loader incremental directory listing option
The deprecated Auto Loader `cloudFiles.useIncrementalListing` option now defaults to `false`. As a result, Auto Loader performs a full directory listing each time it runs. Previously, the default value of the `cloudFiles.useIncrementalListing` option was `auto`, which instructed Auto Loader to make a best-effort attempt at detecting whether an incremental listing could be used with a directory.
Databricks recommends against using this option. Instead, use file notification mode with file events. If you want to continue to use the incremental listing feature, set `cloudFiles.useIncrementalListing` to `auto` in your code. When you set this value to `auto`, Auto Loader makes a best-effort attempt to do a full listing once every seven incremental listings, which matches the behavior of this option before this change.
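A sketch of opting back in, with an illustrative source path and format:

```python
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useIncrementalListing", "auto")  # restore pre-17.0 behavior
      .load("/Volumes/main/default/landing"))
```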
To learn more about Auto Loader directory listing, see Auto Loader streams with directory listing mode.
Removed the "True cache misses" section in Spark UI
This change removes support for the "Cache true misses size" metric (for both compressed and uncompressed caches). The "Cache writes misses" metric measures the same information.
Use `numLocalScanTasks` as a proxy for this metric when you want to see how the cache performs when files are assigned to the right executor.
Removed the "Cache Metadata Manager Peak Disk Usage" metric in the Spark UI
This change removes support for the `cacheLocalityMgrDiskUsageInBytes` and `cacheLocalityMgrTimeMs` metrics from the Databricks Runtime and the Spark UI.
Removed the "Rescheduled cache miss bytes" section in the Spark UI
This change removes the cache rescheduled misses size and cache rescheduled misses size (uncompressed) metrics from the Databricks Runtime, because they measured how the cache performs when files are assigned to non-preferred executors. `numNonLocalScanTasks` is a good proxy for these metrics.
`CREATE VIEW` column-level clauses now throw errors when the clause would only apply to materialized views
`CREATE VIEW` commands that specify a column-level clause that is only valid for `MATERIALIZED VIEW`s now throw an error. The affected clauses for `CREATE VIEW` commands are listed below, followed by a sketch of the new behavior:
- `NOT NULL`
- A specified data type, such as `FLOAT` or `STRING`
- `DEFAULT`
- `COLUMN MASK`
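A sketch of the new behavior, with illustrative names:

```python
# Fails in DBR 17.0: NOT NULL is only valid for materialized views.
# spark.sql("CREATE VIEW main.default.v (id NOT NULL) AS SELECT 1 AS id")

# A plain column list without materialized-view-only clauses still works:
spark.sql("CREATE OR REPLACE VIEW main.default.v (id) AS SELECT 1 AS id")
```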
Library upgrades
Apache Spark
Databricks Runtime 17.0 includes Apache Spark 4.0.0. Many of the Spark 4.0.0 features were already available in Databricks Runtime 14.x, 15.x, and 16.x, and they now ship out of the box in Databricks Runtime 17.0.
Core and Spark SQL highlights
- [SPARK-45923] Spark Kubernetes Operator
- [SPARK-45869] Revisit and Improve Spark Standalone Cluster
- [SPARK-42849] Session Variables
- [SPARK-44444] Use ANSI SQL mode by default
- [SPARK-46057] Support SQL user-defined functions
- [SPARK-45827] Add Variant data type in Spark
- [SPARK-49555] SQL Pipe Syntax
- [SPARK-46830] String Collation support
- [SPARK-44265] Built-in XML data source support
Spark Core
- [SPARK-49524] Improve K8s support
- [SPARK-47240] SPIP: Structured Logging Framework for Apache Spark
- [SPARK-44893] `ThreadInfo` improvements for monitoring APIs
- [SPARK-46861] Avoid Deadlock in DAGScheduler
- [SPARK-47764] Cleanup shuffle dependencies based on `ShuffleCleanupMode`
- [SPARK-49459] Support CRC32C for Shuffle Checksum
- [SPARK-46383] Reduce Driver Heap Usage by shortening `TaskInfo.accumulables()` lifespan
- [SPARK-45527] Use fraction-based resource calculation
- [SPARK-47172] Add AES-GCM as an optional AES cipher mode for RPC encryption
- [SPARK-47448] Enable `spark.shuffle.service.removeShuffle` by default
- [SPARK-47674] Enable `spark.metrics.appStatusSource.enabled` by default
- [SPARK-48063] Enable `spark.stage.ignoreDecommissionFetchFailure` by default
- [SPARK-48268] Add `spark.checkpoint.dir` config
- [SPARK-48292] Revert SPARK-39195 (OutputCommitCoordinator) to fix duplication issues
- [SPARK-48518] Make LZF compression run in parallel
- [SPARK-46132] Support key password for JKS keys for RPC SSL
- [SPARK-46456] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout
- [SPARK-46256] Parallel Compression Support for ZSTD
- [SPARK-45544] Integrate SSL support into `TransportContext`
- [SPARK-45351] Change `spark.shuffle.service.db.backend` default value to `ROCKSDB`
- [SPARK-44741] Support regex-based `MetricFilter` in `StatsdSink`
- [SPARK-43987] Separate `finalizeShuffleMerge` Processing to Dedicated Thread Pools
- [SPARK-45439] Reduce memory usage of `LiveStageMetrics.accumIdsToMetricType`
Spark SQL
Features
- [SPARK-50541] Describe Table As JSON
- [SPARK-48031] Support view schema evolution
- [SPARK-50883] Support altering multiple columns in the same command
- [SPARK-47627] Add SQL `MERGE` syntax to enable schema evolution
- [SPARK-47430] Support `GROUP BY` for `MapType`
- [SPARK-49093] `GROUP BY` with `MapType` nested inside complex type
- [SPARK-49098] Add write options for `INSERT`
- [SPARK-49451] Allow duplicate keys in `parse_json`
- [SPARK-46536] Support `GROUP BY calendar_interval_type`
- [SPARK-46908] Support star clause in `WHERE` clause
- [SPARK-36680] Support dynamic table options via `WITH OPTIONS` syntax
- [SPARK-35553] Improve correlated subqueries
- [SPARK-47492] Widen whitespace rules in lexer to allow Unicode
- [SPARK-46246] `EXECUTE IMMEDIATE` SQL support
- [SPARK-46207] Support `MergeInto` in DataFrameWriterV2
Functions
- [SPARK-52016] New built-in functions
- [SPARK-44001] Add option to allow unwrapping protobuf well-known wrapper types
- [SPARK-43427] spark protobuf: allow upcasting unsigned integer types
- [SPARK-44983] Convert `binary` to `string` by `to_char` for the formats: hex, base64, utf-8
- [SPARK-44868] Convert `datetime` to `string` by `to_char`/`to_varchar`
- [SPARK-45796] Support `MODE() WITHIN GROUP (ORDER BY col)`
- [SPARK-48658] Encode/Decode functions report coding errors instead of mojibake
- [SPARK-45034] Support deterministic mode function
- [SPARK-44778] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
- [SPARK-47497] Make `to_csv` support arrays/maps/binary as pretty strings
- [SPARK-44840] Make `array_insert()` 1-based for negative indexes
Query optimization
- [SPARK-46946] Supporting broadcast of multiple filtering keys in `DynamicPruning`
- [SPARK-48445] Don't inline UDFs with expensive children
- [SPARK-41413] Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but expressions are compatible
- [SPARK-46941] Prevent insertion of window group limit node with `SizeBasedWindowFunction`
- [SPARK-46707] Add throwable field to expressions to improve predicate pushdown
- [SPARK-47511] Canonicalize `WITH` expressions by reassigning IDs
- [SPARK-46502] Support timestamp types in `UnwrapCastInBinaryComparison`
- [SPARK-46069] Support unwrap timestamp type to date type
- [SPARK-46219] Unwrap cast in join predicates
- [SPARK-45606] Release restrictions on multi-layer runtime filter
- [SPARK-45909] Remove `NumericType` cast if it can safely up-cast in `IsNotNull`
Query execution
- [SPARK-45592][SPARK-45282] Correctness issue in AQE with `InMemoryTableScanExec`
- [SPARK-50258] Fix output column order changed issue after AQE
- [SPARK-46693] Inject `LocalLimitExec` when matching `OffsetAndLimit` or `LimitAndOffset`
- [SPARK-48873] Use `UnsafeRow` in JSON parser
- [SPARK-41471] Reduce Spark shuffle when only one side of a join is `KeyGroupedPartitioning`
- [SPARK-45452] Improve `InMemoryFileIndex` to use `FileSystem.listFiles` API
- [SPARK-48649] Add `ignoreInvalidPartitionPaths` configs for skipping invalid partition paths
- [SPARK-45882] `BroadcastHashJoinExec` propagate partitioning should respect `CoalescedHashPartitioning`
Spark Connectors
DS v2 framework support changes
- [SPARK-45784] Introduce clustering mechanism to Spark
- [SPARK-50820] DSv2: Conditional nullification of metadata columns in DML
- [SPARK-51938] Improve Storage Partition Join
- [SPARK-50700] `spark.sql.catalog.spark_catalog` supports builtin magic value
- [SPARK-48781] Add Catalog APIs for loading stored procedures
- [SPARK-49246] `TableCatalog#loadTable` should indicate if it's for writing
- [SPARK-45965] Move DSv2 partitioning expressions into functions.partitioning
- [SPARK-46272] Support CTAS using DSv2 sources
- [SPARK-46043] Support create table using DSv2 sources
- [SPARK-48668] Support `ALTER NAMESPACE ... UNSET PROPERTIES` in v2
- [SPARK-46442] DS V2 supports push down `PERCENTILE_CONT` and `PERCENTILE_DISC`
- [SPARK-49078] Support show columns syntax in v2 table
Hive Catalog support changes
- [SPARK-45328] Remove Hive support prior to 2.0.0
- [SPARK-47101] Allow comma in top-level column names and relax HiveExternalCatalog schema check
- [SPARK-45265] Support Hive 4.0 metastore
XML support changes
CSV support changes
- [SPARK-46862] Disable CSV column pruning in multi-line mode
- [SPARK-46890] Fix CSV parsing bug with default values and column pruning
- [SPARK-50616] Add File Extension Option to CSV DataSource Writer
- [SPARK-49125] Allow duplicated column names in CSV writing
- [SPARK-49016] Restore behavior for queries from raw CSV files
- [SPARK-48807] Binary support for CSV datasource
- [SPARK-48602] Make csv generator support different output style via `spark.sql.binaryOutputStyle`
ORC support changes
- [SPARK-46648] Use zstd as the default ORC compression
- [SPARK-47456] Support ORC Brotli codec
- [SPARK-41858] Fix ORC reader perf regression due to DEFAULT value feature
Avro support changes
- [SPARK-47739] Register logical Avro type
- [SPARK-49082] Widening type promotions in `AvroDeserializer`
- [SPARK-46633] Fix Avro reader to handle zero-length blocks
- [SPARK-50350] Avro: add new function `schema_of_avro` (Scala side)
- [SPARK-46930] Add support for custom prefix for Union type fields in Avro
- [SPARK-46746] Attach codec extension to Avro datasource files
- [SPARK-46759] Support compression level for xz and zstandard in Avro
- [SPARK-46766] Add ZSTD Buffer Pool support for Avro datasource
- [SPARK-43380] Fix Avro data type conversion issues without causing performance regression
- [SPARK-48545] Create `to_avro` and `from_avro` SQL functions
- [SPARK-46990] Fix loading empty Avro files (infinite loop)
JDBC changes
- [SPARK-47361] Improve JDBC data sources
- [SPARK-44977] Upgrade Derby to 10.16.1.1
- [SPARK-47044] Add executed query for JDBC external datasources to explain output
- [SPARK-45139] Add `DatabricksDialect` to handle SQL type conversion
Other notable changes
- [SPARK-45905] Least common type between decimal types should retain integral digits first
- [SPARK-45786] Fix inaccurate Decimal multiplication and division results
- [SPARK-50705] Make `QueryPlan` lock-free
- [SPARK-46743] Fix corner-case with `COUNT` + constant folding subquery
- [SPARK-47509] Block subquery expressions in lambda/higher-order functions for correctness
- [SPARK-48498] Always do char padding in predicates
- [SPARK-45915] Treat decimal(x, 0) the same as IntegralType in PromoteStrings
- [SPARK-46220] Restrict charsets in decode()
- [SPARK-45816] Return `NULL` when overflowing during casting from timestamp to integers
- [SPARK-45586] Reduce compiler latency for plans with large expression trees
- [SPARK-45507] Correctness fix for nested correlated scalar subqueries with `COUNT` aggregates
- [SPARK-44550] Enable correctness fixes for null `IN` (empty list) under ANSI
- [SPARK-47911] Introduces a universal `BinaryFormatter` to make binary output consistent
PySpark
Below are the changes and improvements made to the PySpark libraries shipping in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-49530] Introducing PySpark Plotting API
- [SPARK-47540] SPIP: Pure Python Package (Spark Connect)
- [SPARK-50132] Add DataFrame API for Lateral Joins
- [SPARK-45981] Improve Python language test coverage
- [SPARK-46858] Upgrade Pandas to 2
- [SPARK-46910] Eliminate JDK Requirement in PySpark Installation
- [SPARK-47274] Provide more useful context for DataFrame API errors
- [SPARK-44076] SPIP: Python Data Source API
- [SPARK-43797] Python User-defined Table Functions
- [SPARK-46685] PySpark UDF Unified Profiling
DataFrame APIs features
- [SPARK-51079] Support large variable types in pandas UDF, `createDataFrame` and `toPandas` with Arrow
- [SPARK-50718] Support `addArtifact(s)` for PySpark
- [SPARK-50778] Add `metadataColumn` to PySpark DataFrame
- [SPARK-50719] Support `interruptOperation` for PySpark
- [SPARK-50790] Implement `parse_json` in PySpark
- [SPARK-49306] Create SQL function aliases for `zeroifnull` and `nullifzero`
- [SPARK-50132] Add DataFrame API for Lateral Joins
- [SPARK-43295] Support string type columns for `DataFrameGroupBy.sum`
- [SPARK-45575] Support time travel options for `df.read` API
- [SPARK-45755] Improve `Dataset.isEmpty()` by applying global limit 1
  - Improves performance of `isEmpty()` by pushing down a global limit of 1.
- [SPARK-48761] Introduce `clusterBy` DataFrameWriter API for Scala
- [SPARK-45929] Support `groupingSets` operation in DataFrame API
  - Extends `groupingSets(...)` to DataFrame/Dataset-level APIs.
- [SPARK-40178] Support coalesce hints with ease for PySpark and R
Pandas API on Spark features
- [SPARK-46931] Implement `{Frame, Series}.to_hdf`
- [SPARK-46936] Implement `Frame.to_feather`
- [SPARK-46955] Implement `Frame.to_stata`
- [SPARK-46976] Implement `DataFrameGroupBy.corr`
- [SPARK-49344] Support `json_normalize` for Pandas API on Spark
- [SPARK-42617] Support `isocalendar` from pandas 2
- [SPARK-45552] Introduce flexible parameters to `assertDataFrameEqual`
- [SPARK-47824] Fix nondeterminism in `pyspark.pandas.series.asof`
- [SPARK-46926] Add `convert_dtypes`, `infer_objects`, `set_axis` in fallback list
- [SPARK-48295] Turn on `compute.ops_on_diff_frames` by default
- [SPARK-48336] Implement `ps.sql` in Spark Connect
- [SPARK-45267] Change the default value for `numeric_only`
- [SPARK-44841] Support `value_counts` for pandas 2.0.0 and above
- [SPARK-44289][SPARK-43874][SPARK-43869][SPARK-43607] Support `indexer_between_time` for pandas 2.0.0
- [SPARK-44842][SPARK-43812] Support `stat` functions for pandas 2
- [SPARK-43563][SPARK-43459][SPARK-43451][SPARK-43506] Remove squeeze from `read_csv`
- [SPARK-42619] Add `show_counts` parameter for `DataFrame.info`
- [SPARK-43568][SPARK-43633] Support Categorical APIs for pandas 2
- [SPARK-42620] Add `inclusive` parameter for `(DataFrame|Series).between_time`
- [SPARK-42621] Add `inclusive` parameter for `pd.date_range`
- [SPARK-43245][SPARK-43705] Type match for `DatetimeIndex`/`TimedeltaIndex` with pandas 2
- [SPARK-43872] Support `(DataFrame|Series).plot` with pandas 2
- [SPARK-43476][SPARK-43477][SPARK-43478] Support `StringMethods` for pandas 2
- [SPARK-45553] Deprecate `assertPandasOnSparkEqual`
- [SPARK-45718] Remove remaining deprecated Pandas features from Spark 3.4.0
- [SPARK-45550] Remove deprecated APIs from Pandas API on Spark
- [SPARK-45634] Remove `DataFrame.get_dtype_counts` from Pandas API on Spark
- [SPARK-45165] Remove `inplace` parameter from CategoricalIndex APIs
- [SPARK-45177] Remove `col_space` parameter from `to_latex`
- [SPARK-45164] Remove deprecated Index APIs
- [SPARK-45180] Remove boolean inputs for `inclusive` parameter from `Series.between`
- [SPARK-43709] Remove `closed` parameter from `ps.date_range` & enable test
- [SPARK-43453] Ignore the names of `MultiIndex` when `axis=1` for `concat`
- [SPARK-43433] Match `GroupBy.nth` behavior to the latest pandas
Other notable PySpark changes
- [SPARK-50357] Support `Interrupt(Tag|All)` APIs for PySpark
- [SPARK-50392] DataFrame conversion to table argument in Spark Classic
- [SPARK-50752] Introduce configs for tuning Python UDF without Arrow
- [SPARK-47366] Add VariantVal for PySpark
- [SPARK-47683] Decouple PySpark core API to pyspark.core package
- [SPARK-47565] Improve PySpark worker pool crash resilience
- [SPARK-47933] Parent Column class for Spark Connect and Spark Classic
- [SPARK-50499] Expose metrics from `BasePythonRunner`
- [SPARK-50220] Support `listagg` in PySpark
- [SPARK-46910] Eliminate JDK Requirement in PySpark Installation
- [SPARK-46522] Block Python data source registration with name conflicts
- [SPARK-48996] Allow bare Python literals for `__and__` and `__or__` of Column
- [SPARK-48762] Introduce `clusterBy` DataFrameWriter API for Python
- [SPARK-49009] Make Column APIs accept Python Enums
- [SPARK-45891] Add interval types in Variant Spec
- [SPARK-48710] Use NumPy 2.0-compatible types
- [SPARK-48714] Implement `DataFrame.mergeInto` in PySpark
- [SPARK-48798] Introduce `spark.profile.render` for SparkSession-based profiling
- [SPARK-47346] Make daemon mode configurable for Python planner workers
- [SPARK-47366] Add `parse_json` alias in PySpark/dataframe
- [SPARK-48247] Use all `dict` pairs in `MapType` schema inference
- [SPARK-48340] Support `TimestampNTZ` schema inference with `prefer_timestamp_ntz`
- [SPARK-48220] Allow passing PyArrow Table to `createDataFrame()`
- [SPARK-48482] `dropDuplicates`, `dropDuplicatesWithinWatermark` accept var-args
- [SPARK-48372][SPARK-45716] Implement `StructType.treeString`
- [SPARK-50311] (`add`|`remove`|`get`|`clear`)Tag(s) APIs
- [SPARK-50238] Add Variant Support in PySpark UDFs/UDTFs/UDAFs
- [SPARK-50446] Concurrent level in Arrow-optimized Python UDF
- [SPARK-50310] Add a flag to disable DataFrameQueryContext
- [SPARK-50471] Support Arrow-based Python Data Source Writer
- [SPARK-49899] Support `deleteIfExists` for `TransformWithStateInPandas`
- [SPARK-45597] Support creating table using a Python data source in SQL (DSv2 exec)
- [SPARK-46424] Support Python metrics in Python Data Source
- [SPARK-45525] Support for Python data source write using DSv2
- [SPARK-41666] Support parameterized SQL by `sql()`
- [SPARK-45768] Make `faulthandler` a runtime configuration for Python execution in SQL
- [SPARK-45555] Includes a debuggable object for failed assertion
- [SPARK-45600] Make Python data source registration session level
- [SPARK-46048] Support `DataFrame.groupingSets` in PySpark
- [SPARK-46103] Enhancing PySpark documentation
- [SPARK-40559] Add `applyInArrow` to `groupBy` and `cogroup`
- [SPARK-45420] Add `DataType.fromDDL` into PySpark
- [SPARK-45554] Introduce flexible parameter to `assertSchemaEqual`
- [SPARK-44918] Support named arguments in scalar Python/Pandas UDFs
- [SPARK-45017] Add `CalendarIntervalType` to PySpark
- [SPARK-44952] Support named arguments in aggregate Pandas UDFs
- [SPARK-44665] Add support for pandas DataFrame `assertDataFrameEqual`
- [SPARK-44705] Make PythonRunner single-threaded
- [SPARK-45673] Enhancing clarity and usability of PySpark error messages
Spark Streaming
Below are the changes and improvements made to Spark Streaming in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-46815] Structured Streaming - Arbitrary State API v2
- For more details, see Introducing transformWithState for Apache Spark.
- [SPARK-45511] SPIP: State Data Source - Reader
- For more details, see Announcing the State Reader API and Announcing Simplified State Tracking with Apache Spark.
- [SPARK-46962] Implement python worker to run python streaming data source
Other notable streaming changes
- [SPARK-44865] Make StreamingRelationV2 support metadata column
- [SPARK-45080] Explicitly call out support for columnar in DSv2 streaming data sources
- [SPARK-45178] Fallback to execute a single batch for Trigger.AvailableNow with unsupported sources
- [SPARK-45415] Allow selective disabling of "fallocate" in RocksDB statestore
- [SPARK-45503] Add Conf to Set RocksDB Compression
- [SPARK-45511] State Data Source - Reader
- [SPARK-45558] Introduce a metadata file for streaming stateful operator
- [SPARK-45794] Introduce state metadata source to query the streaming state metadata information
- [SPARK-45815] Provide an interface for other Streaming sources to add `_metadata` columns
- [SPARK-45845] Add number of evicted state rows to streaming UI
- [SPARK-46641] Add `maxBytesPerTrigger` threshold
- [SPARK-46816] Add base support for new arbitrary state management operator (multiple state variables/column families)
- [SPARK-46865] Add Batch Support for TransformWithState Operator
- [SPARK-46906] Add a check for stateful operator change for streaming
- [SPARK-46961] Use `ProcessorContext` to store and retrieve handle
- [SPARK-46962] Add interface for Python streaming data source & worker
- [SPARK-47107] Partition reader for Python streaming data sources
- [SPARK-47273] Python data stream writer interface
- [SPARK-47553] Add Java support for `transformWithState` operator APIs
- [SPARK-47653] Add support for negative numeric types and range scan key encoder
- [SPARK-47733] Add custom metrics for transformWithState operator part of query progress
- [SPARK-47960] Allow chaining other stateful operators after transformWithState
- [SPARK-48447] Check `StateStoreProvider` class before constructor
- [SPARK-48569] Handle edge cases in `query.name` for streaming queries
- [SPARK-48589] Add `snapshotStartBatchId`/`snapshotPartitionId` options to state data source
- [SPARK-48726] Create StateSchemaV3 file for `TransformWithStateExec`
- [SPARK-48742] Virtual Column Family for RocksDB (arbitrary stateful API v2)
- [SPARK-48755] `transformWithState` pyspark base implementation and `ValueState` support
- [SPARK-48772] State Data Source Change Feed Reader Mode
- [SPARK-48836] Integrate SQL schema with state schema/metadata for TWS operator
- [SPARK-48849] Create OperatorStateMetadataV2 for the `TransformWithStateExec` operator
- [SPARK-48901][SPARK-48916] Introduce `clusterBy` DataStreamWriter API in Scala/PySpark
- [SPARK-48931] Reduce Cloud Store List API cost for state-store maintenance
- [SPARK-49021] Add support for reading `transformWithState` value state variables with state data source reader
- [SPARK-49048] Add support for reading operator metadata at given batch id
- [SPARK-49191] Read `transformWithState` map state with state data source
- [SPARK-49259] Size-based partition creation during Kafka read
- [SPARK-49411] Communicate State Store Checkpoint ID
- [SPARK-49463] ListState support in `TransformWithStateInPandas`
- [SPARK-49467] Add state data source reader for list state
- [SPARK-49513] Add timer support in `transformWithStateInPandas`
- [SPARK-49630] Add flatten option for collection types in state data source reader
- [SPARK-49656] Support state variables with value state collection types
- [SPARK-49676] Chaining of operators in `transformWithStateInPandas`
- [SPARK-49699] Disable `PruneFilters` for streaming workloads
- [SPARK-49744] TTL support for ListState in `TransformWithStateInPandas`
- [SPARK-49745] Read registered timers in `transformWithState`
- [SPARK-49802] Add support for read change feed for map/list types
- [SPARK-49846] Add `numUpdatedStateRows`/`numRemovedStateRows` metrics
- [SPARK-49883] State Store Checkpoint Structure V2 Integration with RocksDB and RocksDBFileManager
- [SPARK-50017] Support Avro encoding for `TransformWithState` operator
- [SPARK-50035] Explicit `handleExpiredTimer` function in the stateful processor
- [SPARK-50128] Add handle APIs using implicit encoders
- [SPARK-50152] Support handleInitialState with state data source reader
- [SPARK-50194] Integration of New Timer API and Initial State API
- [SPARK-50378] Add custom metric for time spent populating initial state
- [SPARK-50428] Support `TransformWithStateInPandas` in batch queries
- [SPARK-50573] Adding State Schema ID to State Rows for schema evolution
- [SPARK-50714] Enable schema evolution for `TransformWithState` with Avro encoding
Spark ML
- [SPARK-48463] Make `StringIndexer` and various other ML transformers support nested input columns
- [SPARK-45757] Avoid re-computation of NNZ in Binarizer
- [SPARK-45397] Add array assembler feature transformer
- [SPARK-45547] Validate Vectors with built-in function
Spark UX
- [SPARK-47240] SPIP: Structured Logging Framework for Apache Spark
- [SPARK-44893] `ThreadInfo` improvements for monitoring APIs
- [SPARK-45595] Expose `SQLSTATE` in error message
- [SPARK-45022] Provide context for dataset API errors
- [SPARK-45771] Enable `spark.eventLog.rolling.enabled` by default
Other notable Spark UX changes
- [SPARK-41685] Support Protobuf serializer for the KVStore in History server
- [SPARK-44770] Add a `displayOrder` variable to `WebUITab` to specify the order in which tabs appear
- [SPARK-44801] Capture analyzing failed queries in Listener and UI
- [SPARK-44838] `raise_error` improvement
- [SPARK-44863] Add a button to download thread dump as a txt in Spark UI
- [SPARK-44895] Add 'daemon', 'priority' for `ThreadStackTrace`
- [SPARK-45022] Provide context for dataset API errors
- [SPARK-45151] Task Level Thread Dump Support
- [SPARK-45207] Implement Error Enrichment for Scala Client
- [SPARK-45209] FlameGraph Support For Executor Thread Dump Page
- [SPARK-45240] Implement Error Enrichment for Python Client
- [SPARK-45248] Set the timeout for Spark UI server
- [SPARK-45274] Implementation of a new DAG drawing approach for job/stage/plan graphics
- [SPARK-45312] Support toggle display/hide plan svg on execution page
- [SPARK-45439] Reduce memory usage of `LiveStageMetrics.accumIdsToMetricType`
- [SPARK-45462] Show Duration in ApplicationPage
- [SPARK-45480] Selectable Spark Plan Node on UI
- [SPARK-45491] Add missing SQLSTATEs
- [SPARK-45500] Show the number of abnormally completed drivers in MasterPage
- [SPARK-45516] Include `QueryContext` in `SparkThrowable` proto message
- [SPARK-45581] Make `SQLSTATE` mandatory
- [SPARK-45595] Expose `SQLSTATE` in error message
- [SPARK-45609] Include `SqlState` in `SparkThrowable` proto message
- [SPARK-45641] Display the application start time on AllJobsPage
- [SPARK-45771] Enable `spark.eventLog.rolling.enabled` by default
- [SPARK-45774] Support `spark.master.ui.historyServerUrl` in ApplicationPage
- [SPARK-45955] Collapse Support for Flamegraph and thread dump details
- [SPARK-46003] Create a `ui-test` module with Jest to test UI JavaScript code
- [SPARK-46094] Support Executor JVM Profiling
- [SPARK-46399] Add exit status to the Application End event for the use of Spark Listener
- [SPARK-46886] Enable `spark.ui.prometheus.enabled` by default
- [SPARK-46893] Remove inline scripts from UI descriptions
- [SPARK-46903] Support Spark History Server Log UI
- [SPARK-46922] Do not wrap runtime user-facing errors
- [SPARK-46933] Add query execution time metric to connectors using JDBCRDD
- [SPARK-47253] Allow LiveEventBus to stop without draining the event queue
- [SPARK-47894] Add Environment page to Master UI
- [SPARK-48459] Implement `DataFrameQueryContext` in Spark Connect
- [SPARK-48597] Introduce marker for `isStreaming` in text representation of logical plan
- [SPARK-48628] Add task peak on/off heap memory metrics
- [SPARK-48716] Add `jobGroupId` to `SparkListenerSQLExecutionStart`
- [SPARK-49128] Support custom History Server UI title
- [SPARK-49206] Add Environment Variables table to Master EnvironmentPage
- [SPARK-49241] Add OpenTelemetryPush Sink with opentelemetry profile
- [SPARK-49445] Support show tooltip in the progress bar of UI
- [SPARK-50049] Support custom driver metrics in writing to v2 table
- [SPARK-50315] Support custom metrics for V1Fallback writes
- [SPARK-50915] Add `getCondition` and deprecate `getErrorClass` in PySparkException
- [SPARK-51021] Add log throttler
Spark Connect
Below are the changes and improvements made to Spark Connect in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-49248] Scala Client Parity with existing Dataset/DataFrame API
- [SPARK-48918] Create a unified SQL Scala interface shared by regular SQL and Connect
- [SPARK-50812] Support pyspark.ml on Connect
- [SPARK-47908] Parent classes for Spark Connect and Spark Classic
Other Spark Connect changes and improvements
- [SPARK-41065] Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems`
- [SPARK-41066] Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy`
- [SPARK-41067] Implement `DataFrame.stat.cov`
- [SPARK-41068] Implement `DataFrame.stat.corr`
- [SPARK-41069] Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
- [SPARK-41292][SPARK-41640][SPARK-41641] Implement `Window` functions
- [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`
- [SPARK-41364] Implement broadcast function
- [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube`, and `pivot`
- [SPARK-41434] Initial LambdaFunction implementation
- [SPARK-41440] Implement `DataFrame.randomSplit`
- [SPARK-41464] Implement `DataFrame.to`
- [SPARK-41473] Implement `format_number` function
- [SPARK-41503] Implement Partition Transformation Functions
- [SPARK-41529] Implement `SparkSession.stop`
- [SPARK-41534] Setup initial client module for Spark Connect
- [SPARK-41629] Support for Protocol Extensions in Relation and Expression
- [SPARK-41663] Implement the rest of Lambda functions
- [SPARK-41673] Implement `Column.astype`
- [SPARK-41690] Agnostic Encoders
- [SPARK-41707] Implement Catalog API in Spark Connect
- [SPARK-41710] Implement `Column.between`
- [SPARK-41722] Implement 3 missing time window functions
- [SPARK-41723] Implement sequence function
- [SPARK-41724] Implement `call_udf` function
- [SPARK-41728] Implement `unwrap_udt` function
- [SPARK-41731] Implement the column accessor (`getItem`, `getField`, `__getitem__`, etc.)
- [SPARK-41738] Mix `ClientId` in `SparkSession` cache
- [SPARK-41740] Implement `Column.name`
- [SPARK-41767] Implement `Column.{withField, dropFields}`
- [SPARK-41785] Implement `GroupedData.mean`
- [SPARK-41803] Add missing function `log(arg1, arg2)`
- [SPARK-41810] Infer names from a list of dictionaries in `SparkSession.createDataFrame`
- [SPARK-41811] Implement `SQLStringFormatter` with `WithRelations`
- [SPARK-42664] Support `bloomFilter` function for `DataFrameStatFunctions`
- [SPARK-43662] Support `merge_asof` in Spark Connect
- [SPARK-43704] Support `MultiIndex` for `to_series()` in Spark Connect
- [SPARK-44625] `SparkConnectExecutionManager` to track all executions
- [SPARK-44731] Make `TimestampNTZ` work with literals in Python Spark Connect
- [SPARK-44736] Add `Dataset.explode` to Spark Connect Scala Client
- [SPARK-44740] Support specifying `session_id` in `SPARK_REMOTE` connection string
- [SPARK-44747] Add missing `SparkSession.Builder` methods
- [SPARK-44750] Apply configuration to `SparkSession` during creation
- [SPARK-44761] Support `DataStreamWriter.foreachBatch(VoidFunction2)`
- [SPARK-44788] Add `from_xml` and `schema_of_xml` to pyspark, Spark Connect, and SQL functions
- [SPARK-44807] Add `Dataset.metadataColumn` to Scala Client
- [SPARK-44877] Support python protobuf functions for Spark Connect
- [SPARK-45000] Implement `DataFrame.foreach`
- [SPARK-45001] Implement `DataFrame.foreachPartition`
- [SPARK-45088] Make `__getitem__` work with duplicated columns
- [SPARK-45090] `DataFrame.{cube, rollup}` support column ordinals
- [SPARK-45091] Function `floor`/`round`/`bround` now accept Column type scale
- [SPARK-45121] Support `Series.empty` for Spark Connect
- [SPARK-45136] Enhance `ClosureCleaner` with Ammonite support
- [SPARK-45137] Support map/array parameters in parameterized `sql()`
- [SPARK-45143] Make PySpark compatible with PyArrow 13.0.0
- [SPARK-45190][SPARK-48897] Make `from_xml` support `StructType` schema
- [SPARK-45235] Support map and array parameters by `sql()`
- [SPARK-45485] User agent improvements: Use `SPARK_CONNECT_USER_AGENT` env variable and include environment specific attributes
- [SPARK-45506] Add ivy URI support to Spark Connect `addArtifact`
- [SPARK-45509] Fix df column reference behavior for Spark Connect
  - Aligns column resolution in Spark Connect with classic Spark and provides better error messages.
- [SPARK-45619] Apply the observed metrics to Observation object
- [SPARK-45680] Release session
- [SPARK-45733] Support multiple retry policies
- [SPARK-45770] Introduce plan `DataFrameDropColumns` for `Dataframe.drop`
- [SPARK-45851] Support multiple policies in Scala client
- [SPARK-46039] Upgrade `grpcio*` to 1.59.3 for Python 3.12
- [SPARK-46048] Support `DataFrame.groupingSets` in Python Spark Connect
- [SPARK-46085] `Dataset.groupingSets` in Scala Spark Connect client
- [SPARK-46202] Expose new `ArtifactManager` APIs to support custom target directories
- [SPARK-46229] Add `applyInArrow` to `groupBy` and `cogroup` in Spark Connect
- [SPARK-46255] Support complex type -> string conversion
- [SPARK-46620] Introduce a basic fallback mechanism for frame methods
- [SPARK-46812] Make `mapInPandas`/`mapInArrow` support `ResourceProfile`
- [SPARK-46919] Upgrade grpcio* and grpc-java to 1.62.x
- [SPARK-47014] Implement methods `dumpPerfProfiles` and `dumpMemoryProfiles` of SparkSession
- [SPARK-47069] Introduce `spark.profile.show`/`.dump` for SparkSession-based profiling
- [SPARK-47081] Support Query Execution Progress
- [SPARK-47137] Add `getAll` to `spark.conf` for feature parity with Scala
- [SPARK-47233] Client & Server logic for client-side streaming query listener
- [SPARK-47276] Introduce `spark.profile.clear` for SparkSession-based profiling
- [SPARK-47367] Support Python data sources with Spark Connect
- [SPARK-47543] Infer `dict` as `MapType` from Pandas DataFrame (via new config)
- [SPARK-47545] `Dataset.observe` for Scala Connect
- [SPARK-47694] Make max message size configurable on the client side
- [SPARK-47712] Allow connect plugins to create and process Datasets
- [SPARK-47812] Support Serialization of `SparkSession` for `ForEachBatch` worker
- [SPARK-47818] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
- [SPARK-47828] Fix `DataFrameWriterV2.overwrite` failure due to invalid plan
- [SPARK-47845] Support Column type in split function for Scala and Python
- [SPARK-47909] Parent DataFrame class for Spark Connect and Spark Classic
- [SPARK-48008] Support UDAFs in Spark Connect
- [SPARK-48048] Added client side listener support for Scala
- [SPARK-48058][SPARK-43727] `UserDefinedFunction.returnType` parse the DDL string
- [SPARK-48112] Expose session in `SparkConnectPlanner` to plugins
- [SPARK-48113] Allow Plugins to integrate with Spark Connect
- [SPARK-48258] `Checkpoint` and `localCheckpoint` in Spark Connect
- [SPARK-48278] Refine the string representation of Cast
- [SPARK-48310] Cached properties must return copies
- [SPARK-48336] Implement `ps.sql` in Spark Connect
- [SPARK-48370] `Checkpoint` and `localCheckpoint` in Scala Spark Connect client
- [SPARK-48510] Support UDAF `toColumn` API in Spark Connect
- [SPARK-48555] Support using Columns as parameters for several functions (`array_remove`, `array_position`, etc.)
- [SPARK-48569] Handle edge cases in `query.name` for streaming queries
- [SPARK-48638] Add `ExecutionInfo` support for DataFrame
- [SPARK-48639] Add `Origin` to `RelationCommon`
- [SPARK-48648] Make `SparkConnectClient.tags` properly thread-local
- [SPARK-48794] `DataFrame.mergeInto` support for Spark Connect (Scala & Python)
- [SPARK-48831] Make default column name of cast compatible with Spark Classic
- [SPARK-48960] Make spark-shell work with Spark Connect (`--remote` support)
- [SPARK-49025] Make Column implementation agnostic
- [SPARK-49027] Share Column API between Classic and Connect
- [SPARK-49028] Create a shared SparkSession
- [SPARK-49029] Create shared Dataset interface
- [SPARK-49087] Distinguish `UnresolvedFunction` calling internal functions
- [SPARK-49185] Reimplement kde plot with Spark SQL
- [SPARK-49201] Reimplement hist plot with Spark SQL
- [SPARK-49249][SPARK-49122] Add `addArtifact` API to the Spark SQL Core
- [SPARK-49273] Origin support for Spark Connect Scala client
- [SPARK-49282] Create a shared `SparkSessionBuilder` interface
- [SPARK-49284] Create a shared Catalog interface
- [SPARK-49413] Create a shared `RuntimeConfig` interface
- [SPARK-49416] Add shared `DataStreamReader` interface
- [SPARK-49417] Add shared `StreamingQueryManager` interface
- [SPARK-49419] Create shared DataFrameStatFunctions
- [SPARK-49429] Add shared `DataStreamWriter` interface
- [SPARK-49526] Support Windows-style paths in ArtifactManager
- [SPARK-49530] Support kde/density plots
- [SPARK-49531] Support line plot with plotly backend
- [SPARK-49595] Fix `DataFrame.unpivot` and `DataFrame.melt` in Spark Connect Scala Client
- [SPARK-49626] Support horizontal/vertical bar plots
- [SPARK-49907] Support spark.ml on Connect
- [SPARK-49948] Add "precision" parameter to pandas on Spark box plot
- [SPARK-50050] Make lit accept str/bool numpy ndarray
- [SPARK-50054] Support histogram plots
- [SPARK-50063] Add support for Variant in the Spark Connect Scala client
- [SPARK-50075] DataFrame APIs for table-valued functions
- [SPARK-50134][SPARK-50130] Support DataFrame API for `SCALAR` and `EXISTS` subqueries in Spark Connect
- [SPARK-50134][SPARK-50132] Support DataFrame API for Lateral Join in Spark Connect
- [SPARK-50227] Upgrade buf plugins to v28.3
- [SPARK-50298] Implement `verifySchema` parameter of `createDataFrame`
- [SPARK-50306] Support Python 3.13 in Spark Connect
- [SPARK-50373] Prohibit Variant from set operations
- [SPARK-50544] Implement `StructType.toDDL`
- [SPARK-50710] Add support for optional client reconnection to sessions after release
- [SPARK-50828] Deprecate `pyspark.ml.connect`
- [SPARK-46465] Add `Column.isNaN` in PySpark
  - Adds the `Column.isNaN` function to PySpark Connect, matching Scala API parity.
- [SPARK-41440] Implement `DataFrame.randomSplit`
  - Implements `DataFrame.randomSplit` for Spark Connect in Python.
- [SPARK-41434] Initial LambdaFunction implementation
  - Adds basic support for LambdaFunction and an initial exists function in Spark Connect.
- [SPARK-41464] Implement `DataFrame.to`
  - Implements `DataFrame.to` for Spark Connect in Python.
- [SPARK-41364] Implement broadcast function
  - Implements the broadcast function in the Spark Connect Python client.
- [SPARK-41663] Implement the rest of Lambda functions
  - Completes Lambda function support in the Spark Connect Python client (such as `filter`, `map`, etc.).
- [SPARK-41673] Implement `Column.astype`
  - Adds `Column.astype` to Spark Connect Python for type casting.
- [SPARK-41292][SPARK-41640][SPARK-41641] Implement `Window` functions
  - Adds support for window functions (`Window.partitionBy`, `Window.orderBy`, etc.) to Spark Connect.
- [SPARK-41534] Setup initial client module for Spark Connect
  - Sets up the initial Scala/JVM client module for Spark Connect.
- [SPARK-41503] Implement Partition Transformation Functions
  - Implements partition transformation functions for Spark Connect in Python.
- [SPARK-41710] Implement `Column.between`
  - Adds the `Column.between` method to Spark Connect in Python.
- [SPARK-41707] Implement Catalog API in Spark Connect
  - Implements the catalog API for Spark Connect (such as `listTables`, `listFunctions`, etc.).
- [SPARK-41690] Agnostic Encoders
  - Introduces "agnostic encoders" for mapping external types to Spark data types.
- [SPARK-41722] Implement 3 missing time window functions
  - Implements `window`, `window_time`, and `session_window` in Spark Connect Python.
- [SPARK-41723] Implement sequence function
  - Adds the `sequence` function for Spark Connect in Python.
- [SPARK-41473] Implement `format_number` function
  - Implements the `format_number` function in Spark Connect Python.
- [SPARK-41724] Implement `call_udf` function
  - Allows users to call a UDF by name: `call_udf("my_udf", col1, col2, ...)`.
- [SPARK-41529] Implement `SparkSession.stop`
  - Implements `SparkSession.stop` to shut down a Spark Connect session server side.
- [SPARK-41728] Implement `unwrap_udt` function
  - Adds the `unwrap_udt` function to Spark Connect in Python.
- [SPARK-41731] Implement the column accessor (`getItem`, `getField`, `__getitem__`, etc.)
  - Allows indexing into arrays and structs in Spark Connect columns.
- [SPARK-41740] Implement `Column.name`
  - Adds the `.name` method for columns in Spark Connect Python.
- [SPARK-41738] Mix `ClientId` in `SparkSession` cache
  - Fixes concurrency by mixing the client ID into the `SparkSession` cache on the server.
- [SPARK-41067] Implement `DataFrame.stat.cov`
  - Implements covariance calculation (`df.stat.cov`) for Spark Connect in Python.
- [SPARK-41767] Implement `Column.{withField, dropFields}`
  - Adds support for adding/dropping struct fields in Spark Connect columns.
- [SPARK-41292] Support Window in `pyspark.sql.window` namespace
  - Integrates Spark Connect's window functionality into `pyspark.sql.window`.
- [SPARK-41068] Implement `DataFrame.stat.corr`
  - Implements correlation calculation (`df.stat.corr`) for Spark Connect in Python.
- [SPARK-41629] Support for Protocol Extensions in Relation and Expression
  - Adds a plugin-based extension mechanism for custom Relation/Expression in Spark Connect.
- [SPARK-41785] Implement `GroupedData.mean`
  - Adds the `mean` function to grouped data in Spark Connect.
- [SPARK-41069] Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
  - Adds `approxQuantile` for Spark Connect DataFrame/stat in Python.
- [SPARK-41065] Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems`
  - Adds `freqItems` to Spark Connect DataFrame in Python.
- [SPARK-41066] Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy`
  - Adds `sampleBy` to Spark Connect DataFrame in Python.
- [SPARK-41810] Infer names from a list of dictionaries in `SparkSession.createDataFrame`
  - Improves column name inference when creating DataFrames from lists of dictionaries in Spark Connect.
- [SPARK-41803] Add missing function `log(arg1, arg2)`
  - Implements two-argument `log(base, expr)` in Spark Connect Python.
- [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube`, and `pivot`
  - Adds `DataFrame.rollup`, `DataFrame.cube`, and `pivot` to Spark Connect.
- [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`
  - Implements the standard aggregate functions on grouped data for Spark Connect.
- [SPARK-45680] Release session
  - Introduces the `ReleaseSession` RPC to cancel all running jobs and remove the session server side.
- [SPARK-45851] Support multiple policies in Scala client
  - Adds multiple retry policies to the Scala Spark Connect client.
- [SPARK-45990][SPARK-45987] Upgrade protobuf to 4.25.1 for Python 3.11 support
  - Updates the protobuf library to fix issues under Python 3.11.
- [SPARK-46202] Expose new `ArtifactManager` APIs to support custom target directories
  - Allows adding artifacts with a custom directory structure to remote Spark Connect sessions.
- [SPARK-46284] Add `session_user` function to Python
  - Exposes the `session_user` function in PySpark for Connect, matching Scala parity.
- [SPARK-46039] Upgrade `grpcio*` to 1.59.3 for Python 3.12
  - Updates gRPC libraries to support Python 3.12 and new grpc-inprocess.
- [SPARK-46048] Support `DataFrame.groupingSets` in Python Spark Connect
  - Allows calling `df.groupingSets(...)` in Python Spark Connect for multi-dimensional grouping.
- [SPARK-46085] `Dataset.groupingSets` in Scala Spark Connect client
  - Adds `groupingSets(...)` to Spark Connect in Scala.
- [SPARK-46229] Add `applyInArrow` to `groupBy` and `cogroup` in Spark Connect
  - Implements `applyInArrow` in Spark Connect for grouped/cogrouped DataFrame operations.
- [SPARK-46255] Support complex type -> string conversion
  - Allows string conversion of complex (list/struct) types in Spark Connect Python.
- [SPARK-44753] XML: pyspark SQL XML reader/writer
- [SPARK-48272] Add the `timestamp_diff` function
- [SPARK-48369] Add the `timestamp_add` function
System environment
- Operating System: Ubuntu 24.04.2 LTS
- Java: Zulu17.54+21-CA
- Scala: 2.13.16
- Python: 3.12.3
- R: 4.4.2
- Delta Lake: 3.3.1
Installed Python libraries
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
annotated-types | 0.7.0 | anyio | 4.6.2 | argon2-cffi | 21.3.0 |
argon2-cffi-bindings | 21.2.0 | arrow | 1.3.0 | asttokens | 2.0.5 |
astunparse | 1.6.3 | async-lru | 2.0.4 | attrs | 24.3.0 |
autocommand | 2.2.2 | azure-common | 1.1.28 | azure-core | 1.34.0 |
azure-identity | 1.20.0 | azure-mgmt-core | 1.5.0 | azure-mgmt-web | 8.0.0 |
azure-storage-blob | 12.23.0 | azure-storage-file-datalake | 12.17.0 | babel | 2.16.0 |
backports.tarfile | 1.2.0 | beautifulsoup4 | 4.12.3 | black | 24.10.0 |
bleach | 6.2.0 | blinker | 1.7.0 | boto3 | 1.36.2 |
botocore | 1.36.3 | cachetools | 5.5.1 | certifi | 2025.1.31 |
cffi | 1.17.1 | chardet | 4.0.0 | charset-normalizer | 3.3.2 |
click | 8.1.7 | cloudpickle | 3.0.0 | comm | 0.2.1 |
contourpy | 1.3.1 | cryptography | 43.0.3 | cycler | 0.11.0 |
Cython | 3.0.12 | databricks-sdk | 0.49.0 | dbus-python | 1.3.2 |
debugpy | 1.8.11 | decorator | 5.1.1 | defusedxml | 0.7.1 |
Deprecated | 1.2.13 | distlib | 0.3.9 | docstring-to-markdown | 0.11 |
executing | 0.8.3 | facets-overview | 1.1.1 | fastapi | 0.115.12 |
fastjsonschema | 2.21.1 | filelock | 3.18.0 | fonttools | 4.55.3 |
fqdn | 1.5.1 | fsspec | 2023.5.0 | gitdb | 4.0.11 |
GitPython | 3.1.43 | google-api-core | 2.20.0 | google-auth | 2.40.0 |
google-cloud-core | 2.4.3 | google-cloud-storage | 3.1.0 | google-crc32c | 1.7.1 |
google-resumable-media | 2.7.2 | googleapis-common-protos | 1.65.0 | grpcio | 1.67.0 |
grpcio-status | 1.67.0 | h11 | 0.14.0 | httpcore | 1.0.2 |
httplib2 | 0.20.4 | httpx | 0.27.0 | idna | 3.7 |
importlib-metadata | 6.6.0 | importlib_resources | 6.4.0 | inflect | 7.3.1 |
iniconfig | 1.1.1 | ipyflow-core | 0.0.209 | ipykernel | 6.29.5 |
ipython | 8.30.0 | ipython-genutils | 0.2.0 | ipywidgets | 7.8.1 |
isodate | 0.6.1 | isoduration | 20.11.0 | jaraco.context | 5.3.0 |
jaraco.functools | 4.0.1 | jaraco.text | 3.12.1 | jedi | 0.19.2 |
Jinja2 | 3.1.5 | jmespath | 1.0.1 | joblib | 1.4.2 |
json5 | 0.9.25 | jsonpointer | 3.0.0 | jsonschema | 4.23.0 |
jsonschema-specifications | 2023.7.1 | jupyter-events | 0.10.0 | jupyter-lsp | 2.2.0 |
jupyter_client | 8.6.3 | jupyter_core | 5.7.2 | jupyter_server | 2.14.1 |
jupyter_server_terminals | 0.4.4 | jupyterlab | 4.3.4 | jupyterlab-pygments | 0.1.2 |
jupyterlab-widgets | 1.0.0 | jupyterlab_server | 2.27.3 | kiwisolver | 1.4.8 |
launchpadlib | 1.11.0 | lazr.restfulclient | 0.14.6 | lazr.uri | 1.0.6 |
markdown-it-py | 2.2.0 | MarkupSafe | 3.0.2 | matplotlib | 3.10.0 |
matplotlib-inline | 0.1.7 | mccabe | 0.7.0 | mdurl | 0.1.0 |
mistune | 2.0.4 | mlflow-skinny | 2.22.0 | mmh3 | 5.1.0 |
more-itertools | 10.3.0 | msal | 1.32.3 | msal-extensions | 1.3.1 |
mypy-extensions | 1.0.0 | nbclient | 0.8.0 | nbconvert | 7.16.4 |
nbformat | 5.10.4 | nest-asyncio | 1.6.0 | nodeenv | 1.9.1 |
notebook | 7.3.2 | notebook_shim | 0.2.3 | numpy | 2.1.3 |
oauthlib | 3.2.2 | opentelemetry-api | 1.32.1 | opentelemetry-sdk | 1.32.1 |
opentelemetry-semantic-conventions | 0.53b1 | overrides | 7.4.0 | packaging | 24.1 |
pandas | 2.2.3 | pandocfilters | 1.5.0 | parso | 0.8.4 |
pathspec | 0.10.3 | patsy | 1.0.1 | pexpect | 4.8.0 |
pillow | 11.1.0 | pip | 24.2 | platformdirs | 3.10.0 |
plotly | 5.24.1 | pluggy | 1.5.0 | prometheus_client | 0.21.0 |
prompt-toolkit | 3.0.43 | proto-plus | 1.26.1 | protobuf | 5.29.4 |
psutil | 5.9.0 | psycopg2 | 2.9.3 | ptyprocess | 0.7.0 |
pure-eval | 0.2.2 | pyarrow | 19.0.1 | pyasn1 | 0.4.8 |
pyasn1-modules | 0.2.8 | pyccolo | 0.0.71 | pycparser | 2.21 |
pydantic | 2.10.6 | pydantic_core | 2.27.2 | pyflakes | 3.2.0 |
Pygments | 2.15.1 | PyGObject | 3.48.2 | pyiceberg | 0.9.0 |
PyJWT | 2.10.1 | pyodbc | 5.2.0 | pyparsing | 3.2.0 |
pyright | 1.1.394 | pytest | 8.3.5 | python-dateutil | 2.9.0.post0 |
python-json-logger | 3.2.1 | python-lsp-jsonrpc | 1.1.2 | python-lsp-server | 1.12.0 |
pytoolconfig | 1.2.6 | pytz | 2024.1 | PyYAML | 6.0.2 |
pyzmq | 26.2.0 | referencing | 0.30.2 | requests | 2.32.3 |
rfc3339-validator | 0.1.4 | rfc3986-validator | 0.1.1 | rich | 13.9.4 |
rope | 1.12.0 | rpds-py | 0.22.3 | rsa | 4.9.1 |
s3transfer | 0.11.3 | scikit-learn | 1.6.1 | scipy | 1.15.1 |
seaborn | 0.13.2 | Send2Trash | 1.8.2 | setuptools | 74.0.0 |
six | 1.16.0 | smmap | 5.0.0 | sniffio | 1.3.0 |
sortedcontainers | 2.4.0 | soupsieve | 2.5 | sqlparse | 0.5.3 |
ssh-import-id | 5.11 | stack-data | 0.2.0 | starlette | 0.46.2 |
statsmodels | 0.14.4 | strictyaml | 1.7.3 | tenacity | 9.0.0 |
terminado | 0.17.1 | threadpoolctl | 3.5.0 | tinycss2 | 1.4.0 |
tokenize_rt | 6.1.0 | tomli | 2.0.1 | tornado | 6.4.2 |
traitlets | 5.14.3 | typeguard | 4.3.0 | types-python-dateutil | 2.9.0.20241206 |
typing_extensions | 4.12.2 | tzdata | 2024.1 | ujson | 5.10.0 |
unattended-upgrades | 0.1 | uri-template | 1.3.0 | urllib3 | 2.3.0 |
uvicorn | 0.34.2 | virtualenv | 20.29.3 | wadllib | 1.3.6 |
wcwidth | 0.2.5 | webcolors | 24.11.1 | webencodings | 0.5.1 |
websocket-client | 1.8.0 | whatthepatch | 1.0.2 | wheel | 0.45.1 |
widgetsnbextension | 3.6.6 | wrapt | 1.17.0 | yapf | 0.40.2 |
zipp | 3.21.0 |
Installed R libraries
R libraries are installed from the Posit Package Manager CRAN snapshot on 2025-03-20.
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
arrow | 19.0.1 | askpass | 1.2.1 | assertthat | 0.2.1 |
backports | 1.5.0 | base | 4.4.2 | base64enc | 0.1-3 |
bigD | 0.3.0 | bit | 4.6.0 | bit64 | 4.6.0-1 |
bitops | 1.0-9 | blob | 1.2.4 | boot | 1.3-30 |
brew | 1.0-10 | brio | 1.1.5 | broom | 1.0.7 |
bslib | 0.9.0 | cachem | 1.1.0 | callr | 3.7.6 |
caret | 7.0-1 | cellranger | 1.1.0 | chron | 2.3-62 |
class | 7.3-22 | cli | 3.6.4 | clipr | 0.8.0 |
clock | 0.7.2 | cluster | 2.1.6 | codetools | 0.2-20 |
colorspace | 2.1-1 | commonmark | 1.9.5 | compiler | 4.4.2 |
config | 0.3.2 | conflicted | 1.2.0 | cpp11 | 0.5.2 |
crayon | 1.5.3 | credentials | 2.0.2 | curl | 6.2.1 |
data.table | 1.17.0 | datasets | 4.4.2 | DBI | 1.2.3 |
dbplyr | 2.5.0 | desc | 1.4.3 | devtools | 2.4.5 |
diagram | 1.6.5 | diffobj | 0.3.5 | digest | 0.6.37 |
downlit | 0.4.4 | dplyr | 1.1.4 | dtplyr | 1.3.1 |
e1071 | 1.7-16 | ellipsis | 0.3.2 | evaluate | 1.0.3 |
fansi | 1.0.6 | farver | 2.1.2 | fastmap | 1.2.0 |
fontawesome | 0.5.3 | forcats | 1.0.0 | foreach | 1.5.2 |
foreign | 0.8-86 | forge | 0.2.0 | fs | 1.6.5 |
future | 1.34.0 | future.apply | 1.11.3 | gargle | 1.5.2 |
generics | 0.1.3 | gert | 2.1.4 | ggplot2 | 3.5.1 |
gh | 1.4.1 | git2r | 0.35.0 | gitcreds | 0.1.2 |
glmnet | 4.1-8 | globals | 0.16.3 | glue | 1.8.0 |
googledrive | 2.1.1 | googlesheets4 | 1.1.1 | gower | 1.0.2 |
graphics | 4.4.2 | grDevices | 4.4.2 | grid | 4.4.2 |
gridExtra | 2.3 | gsubfn | 0.7 | gt | 0.11.1 |
gtable | 0.3.6 | hardhat | 1.4.1 | haven | 2.5.4 |
highr | 0.11 | hms | 1.1.3 | htmltools | 0.5.8.1 |
htmlwidgets | 1.6.4 | httpuv | 1.6.15 | httr | 1.4.7 |
httr2 | 1.1.1 | ids | 1.0.1 | ini | 0.3.1 |
ipred | 0.9-15 | isoband | 0.2.7 | iterators | 1.0.14 |
jquerylib | 0.1.4 | jsonlite | 1.9.1 | juicyjuice | 0.1.0 |
KernSmooth | 2.23-22 | knitr | 1.50 | labeling | 0.4.3 |
later | 1.4.1 | lattice | 0.22-5 | lava | 1.8.1 |
lifecycle | 1.0.4 | listenv | 0.9.1 | lubridate | 1.9.4 |
magrittr | 2.0.3 | markdown | 1.13 | MASS | 7.3-60.0.1 |
Matrix | 1.6-5 | memoise | 2.0.1 | methods | 4.4.2 |
mgcv | 1.9-1 | mime | 0.13 | miniUI | 0.1.1.1 |
mlflow | 2.20.4 | ModelMetrics | 1.2.2.2 | modelr | 0.1.11 |
munsell | 0.5.1 | nlme | 3.1-164 | nnet | 7.3-19 |
numDeriv | 2016.8-1.1 | openssl | 2.3.2 | parallel | 4.4.2 |
parallelly | 1.42.0 | pillar | 1.10.1 | pkgbuild | 1.4.6 |
pkgconfig | 2.0.3 | pkgdown | 2.1.1 | pkgload | 1.4.0 |
plogr | 0.2.0 | plyr | 1.8.9 | praise | 1.0.0 |
prettyunits | 1.2.0 | pROC | 1.18.5 | processx | 3.8.6 |
prodlim | 2024.06.25 | profvis | 0.4.0 | progress | 1.2.3 |
progressr | 0.15.1 | promises | 1.3.2 | proto | 1.0.0 |
proxy | 0.4-27 | ps | 1.9.0 | purrr | 1.0.4 |
R6 | 2.6.1 | ragg | 1.3.3 | randomForest | 4.7-1.2 |
rappdirs | 0.3.3 | rcmdcheck | 1.4.0 | RColorBrewer | 1.1-3 |
Rcpp | 1.0.14 | RcppEigen | 0.3.4.0.2 | reactable | 0.4.4 |
reactR | 0.6.1 | readr | 2.1.5 | readxl | 1.4.5 |
recipes | 1.2.0 | rematch | 2.0.0 | rematch2 | 2.1.2 |
remotes | 2.5.0 | reprex | 2.1.1 | reshape2 | 1.4.4 |
rlang | 1.1.5 | rmarkdown | 2.29 | RODBC | 1.3-26 |
roxygen2 | 7.3.2 | rpart | 4.1.23 | rprojroot | 2.0.4 |
Rserve | 1.8-15 | RSQLite | 2.3.9 | rstudioapi | 0.17.1 |
rversions | 2.1.2 | rvest | 1.0.4 | sass | 0.4.9 |
scales | 1.3.0 | selectr | 0.4-2 | sessioninfo | 1.2.3 |
shape | 1.4.6.1 | shiny | 1.10.0 | sourcetools | 0.1.7-1 |
sparklyr | 1.9.0 | SparkR | 4.0.0 | sparsevctrs | 0.3.1 |
spatial | 7.3-17 | splines | 4.4.2 | sqldf | 0.4-11 |
SQUAREM | 2021.1 | stats | 4.4.2 | stats4 | 4.4.2 |
stringi | 1.8.4 | stringr | 1.5.1 | survival | 3.5-8 |
swagger | 5.17.14.1 | sys | 3.4.3 | systemfonts | 1.2.1 |
tcltk | 4.4.2 | testthat | 3.2.3 | textshaping | 1.0.0 |
tibble | 3.2.1 | tidyr | 1.3.1 | tidyselect | 1.2.1 |
tidyverse | 2.0.0 | timechange | 0.3.0 | timeDate | 4041.110 |
tinytex | 0.56 | tools | 4.4.2 | tzdb | 0.5.0 |
urlchecker | 1.0.1 | usethis | 3.1.0 | utf8 | 1.2.4 |
utils | 4.4.2 | uuid | 1.2-1 | V8 | 6.0.2 |
vctrs | 0.6.5 | viridisLite | 0.4.2 | vroom | 1.6.5 |
waldo | 0.6.1 | whisker | 0.4.1 | withr | 3.0.2 |
xfun | 0.51 | xml2 | 1.3.8 | xopen | 1.0.1 |
xtable | 1.8-4 | yaml | 2.3.10 | zeallot | 0.1.0 |
zip | 2.3.2 |
Installed Java and Scala libraries (Scala 2.13 cluster version)
Group ID | Artifact ID | Version |
---|---|---|
antlr | antlr | 2.7.7 |
com.amazonaws | amazon-kinesis-client | 1.12.0 |
com.amazonaws | aws-java-sdk-autoscaling | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudformation | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudfront | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudhsm | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudsearch | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudtrail | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudwatch | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.12.638 |
com.amazonaws | aws-java-sdk-codedeploy | 1.12.638 |
com.amazonaws | aws-java-sdk-cognitoidentity | 1.12.638 |
com.amazonaws | aws-java-sdk-cognitosync | 1.12.638 |
com.amazonaws | aws-java-sdk-config | 1.12.638 |
com.amazonaws | aws-java-sdk-core | 1.12.638 |
com.amazonaws | aws-java-sdk-datapipeline | 1.12.638 |
com.amazonaws | aws-java-sdk-directconnect | 1.12.638 |
com.amazonaws | aws-java-sdk-directory | 1.12.638 |
com.amazonaws | aws-java-sdk-dynamodb | 1.12.638 |
com.amazonaws | aws-java-sdk-ec2 | 1.12.638 |
com.amazonaws | aws-java-sdk-ecs | 1.12.638 |
com.amazonaws | aws-java-sdk-efs | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticache | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.12.638 |
com.amazonaws | aws-java-sdk-elastictranscoder | 1.12.638 |
com.amazonaws | aws-java-sdk-emr | 1.12.638 |
com.amazonaws | aws-java-sdk-glacier | 1.12.638 |
com.amazonaws | aws-java-sdk-glue | 1.12.638 |
com.amazonaws | aws-java-sdk-iam | 1.12.638 |
com.amazonaws | aws-java-sdk-importexport | 1.12.638 |
com.amazonaws | aws-java-sdk-kinesis | 1.12.638 |
com.amazonaws | aws-java-sdk-kms | 1.12.638 |
com.amazonaws | aws-java-sdk-lambda | 1.12.638 |
com.amazonaws | aws-java-sdk-logs | 1.12.638 |
com.amazonaws | aws-java-sdk-machinelearning | 1.12.638 |
com.amazonaws | aws-java-sdk-opsworks | 1.12.638 |
com.amazonaws | aws-java-sdk-rds | 1.12.638 |
com.amazonaws | aws-java-sdk-redshift | 1.12.638 |
com.amazonaws | aws-java-sdk-route53 | 1.12.638 |
com.amazonaws | aws-java-sdk-s3 | 1.12.638 |
com.amazonaws | aws-java-sdk-ses | 1.12.638 |
com.amazonaws | aws-java-sdk-simpledb | 1.12.638 |
com.amazonaws | aws-java-sdk-simpleworkflow | 1.12.638 |
com.amazonaws | aws-java-sdk-sns | 1.12.638 |
com.amazonaws | aws-java-sdk-sqs | 1.12.638 |
com.amazonaws | aws-java-sdk-ssm | 1.12.638 |
com.amazonaws | aws-java-sdk-storagegateway | 1.12.638 |
com.amazonaws | aws-java-sdk-sts | 1.12.638 |
com.amazonaws | aws-java-sdk-support | 1.12.638 |
com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |
com.amazonaws | aws-java-sdk-workspaces | 1.12.638 |
com.amazonaws | jmespath-java | 1.12.638 |
com.clearspring.analytics | stream | 2.9.8 |
com.databricks | Rserve | 1.8-3 |
com.databricks | databricks-sdk-java | 0.27.0 |
com.databricks | jets3t | 0.7.1-0 |
com.databricks.scalapb | scalapb-runtime_2.12 | 0.4.15-10 |
com.esotericsoftware | kryo-shaded | 4.0.3 |
com.esotericsoftware | minlog | 1.3.0 |
com.fasterxml | classmate | 1.5.1 |
com.fasterxml.jackson.core | jackson-annotations | 2.18.2 |
com.fasterxml.jackson.core | jackson-core | 2.18.2 |
com.fasterxml.jackson.core | jackson-databind | 2.18.2 |
com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.18.2 |
com.fasterxml.jackson.dataformat | jackson-dataformat-yaml | 2.15.2 |
com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.18.2 |
com.fasterxml.jackson.datatype | jackson-datatype-jsr310 | 2.18.2 |
com.fasterxml.jackson.module | jackson-module-paranamer | 2.18.2 |
com.fasterxml.jackson.module | jackson-module-scala_2.12 | 2.18.2 |
com.github.ben-manes.caffeine | caffeine | 2.9.3 |
com.github.blemale | scaffeine_2.12 | 5.2.1 |
com.github.fommil | jniloader | 1.1 |
com.github.fommil.netlib | native_ref-java | 1.1 |
com.github.fommil.netlib | native_ref-java | 1.1-natives |
com.github.fommil.netlib | native_system-java | 1.1 |
com.github.fommil.netlib | native_system-java | 1.1-natives |
com.github.fommil.netlib | netlib-native_ref-linux-x86_64 | 1.1-natives |
com.github.fommil.netlib | netlib-native_system-linux-x86_64 | 1.1-natives |
com.github.luben | zstd-jni | 1.5.6-10 |
com.github.virtuald | curvesapi | 1.08 |
com.github.wendykierp | JTransforms | 3.1 |
com.google.api.grpc | proto-google-common-protos | 2.5.1 |
com.google.code.findbugs | jsr305 | 3.0.0 |
com.google.code.gson | gson | 2.11.0 |
com.google.crypto.tink | tink | 1.16.0 |
com.google.errorprone | error_prone_annotations | 2.36.0 |
com.google.flatbuffers | flatbuffers-java | 24.3.25 |
com.google.guava | failureaccess | 1.0.2 |
com.google.guava | guava | 33.4.0-jre |
com.google.guava | listenablefuture | 9999.0-empty-to-avoid-conflict-with-guava |
com.google.j2objc | j2objc-annotations | 3.0.0 |
com.google.protobuf | protobuf-java | 3.25.5 |
com.google.protobuf | protobuf-java-util | 3.25.5 |
com.helger | profiler | 1.1.1 |
com.ibm.icu | icu4j | 75.1 |
com.jcraft | jsch | 0.1.55 |
com.lihaoyi | sourcecode_2.12 | 0.1.9 |
com.microsoft.azure | azure-data-lake-store-sdk | 2.3.10 |
com.microsoft.sqlserver | mssql-jdbc | 12.8.0.jre11 |
com.microsoft.sqlserver | mssql-jdbc | 12.8.0.jre8 |
com.ning | compress-lzf | 1.1.2 |
com.sun.mail | javax.mail | 1.5.2 |
com.sun.xml.bind | jaxb-core | 2.2.11 |
com.sun.xml.bind | jaxb-impl | 2.2.11 |
com.tdunning | json | 1.8 |
com.thoughtworks.paranamer | paranamer | 2.8 |
com.trueaccord.lenses | lenses_2.12 | 0.4.12 |
com.twitter | chill-java | 0.10.0 |
com.twitter | chill_2.12 | 0.10.0 |
com.twitter | util-app_2.12 | 7.1.0 |
com.twitter | util-core_2.12 | 7.1.0 |
com.twitter | util-function_2.12 | 7.1.0 |
com.twitter | util-jvm_2.12 | 7.1.0 |
com.twitter | util-lint_2.12 | 7.1.0 |
com.twitter | util-registry_2.12 | 7.1.0 |
com.twitter | util-stats_2.12 | 7.1.0 |
com.typesafe | config | 1.4.3 |
com.typesafe.scala-logging | scala-logging_2.12 | 3.7.2 |
com.uber | h3 | 3.7.3 |
com.univocity | univocity-parsers | 2.9.1 |
com.zaxxer | HikariCP | 4.0.3 |
com.zaxxer | SparseBitSet | 1.3 |
commons-cli | commons-cli | 1.9.0 |
commons-codec | commons-codec | 1.17.2 |
commons-collections | commons-collections | 3.2.2 |
commons-dbcp | commons-dbcp | 1.4 |
commons-fileupload | commons-fileupload | 1.5 |
commons-httpclient | commons-httpclient | 3.1 |
commons-io | commons-io | 2.18.0 |
commons-lang | commons-lang | 2.6 |
commons-logging | commons-logging | 1.1.3 |
commons-pool | commons-pool | 1.5.4 |
dev.ludovic.netlib | arpack | 3.0.3 |
dev.ludovic.netlib | blas | 3.0.3 |
dev.ludovic.netlib | lapack | 3.0.3 |
info.ganglia.gmetric4j | gmetric4j | 1.0.10 |
io.airlift | aircompressor | 2.0.2 |
io.delta | delta-sharing-client_2.12 | 1.3.0 |
io.dropwizard.metrics | metrics-annotation | 4.2.30 |
io.dropwizard.metrics | metrics-core | 4.2.30 |
io.dropwizard.metrics | metrics-graphite | 4.2.30 |
io.dropwizard.metrics | metrics-healthchecks | 4.2.30 |
io.dropwizard.metrics | metrics-jetty9 | 4.2.30 |
io.dropwizard.metrics | metrics-jmx | 4.2.30 |
io.dropwizard.metrics | metrics-json | 4.2.30 |
io.dropwizard.metrics | metrics-jvm | 4.2.30 |
io.dropwizard.metrics | metrics-servlets | 4.2.30 |
io.netty | netty-all | 4.1.118.Final |
io.netty | netty-buffer | 4.1.118.Final |
io.netty | netty-codec | 4.1.118.Final |
io.netty | netty-codec-http | 4.1.118.Final |
io.netty | netty-codec-http2 | 4.1.118.Final |
io.netty | netty-codec-socks | 4.1.118.Final |
io.netty | netty-common | 4.1.118.Final |
io.netty | netty-handler | 4.1.118.Final |
io.netty | netty-handler-proxy | 4.1.118.Final |
io.netty | netty-resolver | 4.1.118.Final |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-linux-aarch_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-linux-x86_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-osx-aarch_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-osx-x86_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-windows-x86_64 |
io.netty | netty-tcnative-classes | 2.0.70.Final |
io.netty | netty-transport | 4.1.118.Final |
io.netty | netty-transport-classes-epoll | 4.1.118.Final |
io.netty | netty-transport-classes-kqueue | 4.1.118.Final |
io.netty | netty-transport-native-epoll | 4.1.118.Final |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-aarch_64 |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-riscv64 |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-x86_64 |
io.netty | netty-transport-native-kqueue | 4.1.118.Final-osx-aarch_64 |
io.netty | netty-transport-native-kqueue | 4.1.118.Final-osx-x86_64 |
io.netty | netty-transport-native-unix-common | 4.1.118.Final |
io.prometheus | simpleclient | 0.16.1-databricks |
io.prometheus | simpleclient_common | 0.16.1-databricks |
io.prometheus | simpleclient_dropwizard | 0.16.1-databricks |
io.prometheus | simpleclient_pushgateway | 0.16.1-databricks |
io.prometheus | simpleclient_servlet | 0.16.1-databricks |
io.prometheus | simpleclient_servlet_common | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_common | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_otel | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_otel_agent | 0.16.1-databricks |
io.prometheus.jmx | collector | 0.18.0 |
jakarta.annotation | jakarta.annotation-api | 1.3.5 |
jakarta.servlet | jakarta.servlet-api | 4.0.3 |
jakarta.validation | jakarta.validation-api | 2.0.2 |
jakarta.ws.rs | jakarta.ws.rs-api | 2.1.6 |
javax.activation | activation | 1.1.1 |
javax.annotation | javax.annotation-api | 1.3.2 |
javax.el | javax.el-api | 2.2.4 |
javax.jdo | jdo-api | 3.0.1 |
javax.transaction | jta | 1.1 |
javax.transaction | transaction-api | 1.1 |
javax.xml.bind | jaxb-api | 2.2.11 |
javolution | javolution | 5.5.1 |
jline | jline | 2.14.6 |
joda-time | joda-time | 2.13.0 |
net.java.dev.jna | jna | 5.8.0 |
net.razorvine | pickle | 1.5 |
net.sf.jpam | jpam | 1.1 |
net.sf.opencsv | opencsv | 2.3 |
net.sf.supercsv | super-csv | 2.2.0 |
net.snowflake | snowflake-ingest-sdk | 0.9.6 |
net.sourceforge.f2j | arpack_combined_all | 0.1 |
org.acplt.remotetea | remotetea-oncrpc | 1.1.2 |
org.antlr | ST4 | 4.0.4 |
org.antlr | antlr-runtime | 3.5.2 |
org.antlr | antlr4-runtime | 4.13.1 |
org.antlr | stringtemplate | 3.2.1 |
org.apache.ant | ant | 1.10.11 |
org.apache.ant | ant-jsch | 1.10.11 |
org.apache.ant | ant-launcher | 1.10.11 |
org.apache.arrow | arrow-format | 18.2.0 |
org.apache.arrow | arrow-memory-core | 18.2.0 |
org.apache.arrow | arrow-memory-netty | 18.2.0 |
org.apache.arrow | arrow-memory-netty-buffer-patch | 18.2.0 |
org.apache.arrow | arrow-vector | 18.2.0 |
org.apache.avro | avro | 1.12.0 |
org.apache.avro | avro-ipc | 1.12.0 |
org.apache.avro | avro-mapred | 1.12.0 |
org.apache.commons | commons-collections4 | 4.4 |
org.apache.commons | commons-compress | 1.27.1 |
org.apache.commons | commons-crypto | 1.1.0 |
org.apache.commons | commons-lang3 | 3.17.0 |
org.apache.commons | commons-math3 | 3.6.1 |
org.apache.commons | commons-text | 1.13.0 |
org.apache.curator | curator-client | 5.7.1 |
org.apache.curator | curator-framework | 5.7.1 |
org.apache.curator | curator-recipes | 5.7.1 |
org.apache.datasketches | datasketches-java | 6.1.1 |
org.apache.datasketches | datasketches-memory | 3.0.2 |
org.apache.derby | derby | 10.14.2.0 |
org.apache.hadoop | hadoop-client-runtime | 3.4.1 |
org.apache.hive | hive-beeline | 2.3.10 |
org.apache.hive | hive-cli | 2.3.10 |
org.apache.hive | hive-jdbc | 2.3.10 |
org.apache.hive | hive-llap-client | 2.3.10 |
org.apache.hive | hive-llap-common | 2.3.10 |
org.apache.hive | hive-serde | 2.3.10 |
org.apache.hive | hive-shims | 2.3.10 |
org.apache.hive | hive-storage-api | 2.8.1 |
org.apache.hive.shims | hive-shims-0.23 | 2.3.10 |
org.apache.hive.shims | hive-shims-common | 2.3.10 |
org.apache.hive.shims | hive-shims-scheduler | 2.3.10 |
org.apache.httpcomponents | httpclient | 4.5.14 |
org.apache.httpcomponents | httpcore | 4.4.16 |
org.apache.ivy | ivy | 2.5.3 |
org.apache.logging.log4j | log4j-1.2-api | 2.24.3 |
org.apache.logging.log4j | log4j-api | 2.24.3 |
org.apache.logging.log4j | log4j-core | 2.24.3 |
org.apache.logging.log4j | log4j-layout-template-json | 2.24.3 |
org.apache.logging.log4j | log4j-slf4j2-impl | 2.24.3 |
org.apache.orc | orc-core | 2.1.1-shaded-protobuf |
org.apache.orc | orc-format | 1.1.0-shaded-protobuf |
org.apache.orc | orc-mapreduce | 2.1.1-shaded-protobuf |
org.apache.orc | orc-shims | 2.1.1 |
org.apache.poi | poi | 5.4.1 |
org.apache.poi | poi-ooxml | 5.4.1 |
org.apache.poi | poi-ooxml-full | 5.4.1 |
org.apache.poi | poi-ooxml-lite | 5.4.1 |
org.apache.thrift | libfb303 | 0.9.3 |
org.apache.thrift | libthrift | 0.16.0 |
org.apache.ws.xmlschema | xmlschema-core | 2.3.1 |
org.apache.xbean | xbean-asm9-shaded | 4.26 |
org.apache.xmlbeans | xmlbeans | 5.3.0 |
org.apache.yetus | audience-annotations | 0.13.0 |
org.apache.zookeeper | zookeeper | 3.9.3 |
org.apache.zookeeper | zookeeper-jute | 3.9.3 |
org.checkerframework | checker-qual | 3.43.0 |
org.codehaus.janino | commons-compiler | 3.0.16 |
org.codehaus.janino | janino | 3.0.16 |
org.datanucleus | datanucleus-api-jdo | 4.2.4 |
org.datanucleus | datanucleus-core | 4.1.17 |
org.datanucleus | datanucleus-rdbms | 4.1.19 |
org.datanucleus | javax.jdo | 3.2.0-m3 |
org.eclipse.jetty | jetty-client | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-continuation | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-http | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-io | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-jndi | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-plus | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-proxy | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-security | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-server | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-servlet | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-servlets | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-util | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-util-ajax | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-webapp | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-xml | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-api | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-client | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-common | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-server | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-servlet | 9.4.53.v20231009 |
org.fusesource.leveldbjni | leveldbjni-all | 1.8 |
org.glassfish.hk2 | hk2-api | 2.6.1 |
org.glassfish.hk2 | hk2-locator | 2.6.1 |
org.glassfish.hk2 | hk2-utils | 2.6.1 |
org.glassfish.hk2 | osgi-resource-locator | 1.0.3 |
org.glassfish.hk2.external | aopalliance-repackaged | 2.6.1 |
org.glassfish.hk2.external | jakarta.inject | 2.6.1 |
org.glassfish.jersey.containers | jersey-container-servlet | 2.41 |
org.glassfish.jersey.containers | jersey-container-servlet-core | 2.41 |
org.glassfish.jersey.core | jersey-client | 2.41 |
org.glassfish.jersey.core | jersey-common | 2.41 |
org.glassfish.jersey.core | jersey-server | 2.41 |
org.glassfish.jersey.inject | jersey-hk2 | 2.41 |
org.hibernate.validator | hibernate-validator | 6.2.5.Final |
org.ini4j | ini4j | 0.5.4 |
org.javassist | javassist | 3.29.2-GA |
org.jboss.logging | jboss-logging | 3.4.1.Final |
org.jdbi | jdbi | 2.63.1 |
org.jetbrains | annotations | 17.0.0 |
org.joda | joda-convert | 1.7 |
org.jodd | jodd-core | 3.5.2 |
org.json4s | json4s-ast_2.12 | 4.0.7 |
org.json4s | json4s-core_2.12 | 4.0.7 |
org.json4s | json4s-jackson-core_2.12 | 4.0.7 |
org.json4s | json4s-jackson_2.12 | 4.0.7 |
org.json4s | json4s-scalap_2.12 | 4.0.7 |
org.lz4 | lz4-java | 1.8.0-databricks-1 |
org.mlflow | mlflow-spark_2.12 | 2.9.1 |
org.objenesis | objenesis | 3.3 |
org.postgresql | postgresql | 42.6.1 |
org.roaringbitmap | RoaringBitmap | 1.2.1 |
org.rocksdb | rocksdbjni | 9.8.4 |
org.rosuda.REngine | REngine | 2.1.0 |
org.scala-lang | scala-compiler_2.12 | 2.12.15 |
org.scala-lang | scala-library_2.12 | 2.12.15 |
org.scala-lang | scala-reflect_2.12 | 2.12.15 |
org.scala-lang.modules | scala-collection-compat_2.12 | 2.11.0 |
org.scala-lang.modules | scala-java8-compat_2.12 | 0.9.1 |
org.scala-lang.modules | scala-parser-combinators_2.12 | 2.4.0 |
org.scala-lang.modules | scala-xml_2.12 | 2.3.0 |
org.scala-sbt | test-interface | 1.0 |
org.scalacheck | scalacheck_2.12 | 1.18.0 |
org.scalactic | scalactic_2.12 | 3.2.19 |
org.scalanlp | breeze-macros_2.12 | 2.1.0 |
org.scalanlp | breeze_2.12 | 2.1.0 |
org.scalatest | scalatest-compatible | 3.2.19 |
org.scalatest | scalatest-core_2.12 | 3.2.19 |
org.scalatest | scalatest-diagrams_2.12 | 3.2.19 |
org.scalatest | scalatest-featurespec_2.12 | 3.2.19 |
org.scalatest | scalatest-flatspec_2.12 | 3.2.19 |
org.scalatest | scalatest-freespec_2.12 | 3.2.19 |
org.scalatest | scalatest-funspec_2.12 | 3.2.19 |
org.scalatest | scalatest-funsuite_2.12 | 3.2.19 |
org.scalatest | scalatest-matchers-core_2.12 | 3.2.19 |
org.scalatest | scalatest-mustmatchers_2.12 | 3.2.19 |
org.scalatest | scalatest-propspec_2.12 | 3.2.19 |
org.scalatest | scalatest-refspec_2.12 | 3.2.19 |
org.scalatest | scalatest-shouldmatchers_2.12 | 3.2.19 |
org.scalatest | scalatest-wordspec_2.12 | 3.2.19 |
org.scalatest | scalatest_2.12 | 3.2.19 |
org.slf4j | jcl-over-slf4j | 2.0.16 |
org.slf4j | jul-to-slf4j | 2.0.16 |
org.slf4j | slf4j-api | 2.0.16 |
org.slf4j | slf4j-simple | 1.7.25 |
org.threeten | threeten-extra | 1.8.0 |
org.tukaani | xz | 1.10 |
org.typelevel | algebra_2.12 | 2.0.1 |
org.typelevel | cats-kernel_2.12 | 2.1.1 |
org.typelevel | spire-macros_2.12 | 0.17.0 |
org.typelevel | spire-platform_2.12 | 0.17.0 |
org.typelevel | spire-util_2.12 | 0.17.0 |
org.typelevel | spire_2.12 | 0.17.0 |
org.wildfly.openssl | wildfly-openssl | 1.1.3.Final |
org.xerial | sqlite-jdbc | 3.42.0.0 |
org.xerial.snappy | snappy-java | 1.1.10.3 |
org.yaml | snakeyaml | 2.0 |
oro | oro | 2.0.8 |
pl.edu.icm | JLargeArrays | 1.5 |
software.amazon.cryptools | AmazonCorrettoCryptoProvider | 2.4.1-linux-x86_64 |
stax | stax-api | 1.0.1 |
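Each row in this table is a Maven coordinate of the form `groupId:artifactId:version`. As an illustration only, the sketch below assembles such a coordinate from one row of the table; the `spark.jars.packages` step shown in the comments applies to self-managed Spark environments, since on Databricks the libraries above ship with the runtime.

```python
# Illustrative sketch: build a Maven coordinate (groupId:artifactId:version)
# from one row of the table above.
group_id = "org.postgresql"
artifact_id = "postgresql"
lib_version = "42.6.1"
coordinate = f"{group_id}:{artifact_id}:{lib_version}"
print(coordinate)  # org.postgresql:postgresql:42.6.1

# In a self-managed Spark setup (not needed on Databricks, where these
# libraries are preinstalled), an extra artifact could be resolved at
# session startup via the standard spark.jars.packages setting:
#
#   spark = (SparkSession.builder
#            .config("spark.jars.packages", coordinate)
#            .getOrCreate())
```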