Databricks Runtime 17.0 (Beta)
Databricks Runtime 17.0 is in Beta. The contents of the supported environments might change during the Beta, including the list of installed packages and their versions.
The following release notes provide information about Databricks Runtime 17.0 (Beta), powered by Apache Spark 4.0.0.
Databricks released this beta version in May 2025.
To see release notes for Databricks Runtime versions that have reached end-of-support (EoS), see End-of-support Databricks Runtime release notes. The EoS Databricks Runtime versions have been retired and might not be updated.
DBR 17.0 (Beta) new and updated features
- SQL procedure support
- Set a default collation for SQL Functions
- Recursive common table expressions (rCTE) support
- ANSI SQL enabled by default
- PySpark and Spark Connect now support the DataFrame `df.mergeInto` API
- Support `ALL CATALOGS` in `SHOW SCHEMAS`
- Liquid clustering now compacts deletion vectors more efficiently
- Allow non-deterministic expressions in `UPDATE`/`INSERT` column values for `MERGE` operations
- Ignore and rescue empty structs for Auto Loader ingestion (especially Avro)
- Change Delta MERGE Python and Scala APIs to return DataFrame instead of Unit
SQL procedure support
SQL scripts can now be encapsulated in a procedure stored as a reusable asset in Unity Catalog. You can create a procedure using the `CREATE PROCEDURE` command, and then call it using the `CALL` command.
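A minimal sketch of that flow, runnable in a notebook where `spark` is predefined; the procedure name, the target `main.default.events` table, and the procedure body are all illustrative, and the exact body syntax may differ:

```python
# Hypothetical names throughout; a sketch of CREATE PROCEDURE / CALL, not the definitive syntax.
spark.sql("""
CREATE OR REPLACE PROCEDURE main.default.log_event(IN msg STRING)
LANGUAGE SQL
AS BEGIN
  INSERT INTO main.default.events VALUES (msg, current_timestamp());
END
""")

# Invoke the stored procedure by name.
spark.sql("CALL main.default.log_event('nightly load started')")
```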
Set a default collation for SQL Functions
Using the new `DEFAULT COLLATION` clause in the `CREATE FUNCTION` command defines the default collation used for `STRING` parameters, the return type, and `STRING` literals in the function body.
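For example, a sketch that assumes the clause sits with the other routine characteristics and uses the `UTF8_LCASE` collation; the function name and body are illustrative:

```python
# With UTF8_LCASE as the default collation, comparisons on STRING parameters
# and literals in the body are case-insensitive. Names are illustrative.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.is_yes(answer STRING)
RETURNS BOOLEAN
DEFAULT COLLATION UTF8_LCASE
RETURN answer = 'yes'
""")

spark.sql("SELECT main.default.is_yes('YES')").show()  # true under UTF8_LCASE
```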
Recursive common table expressions (rCTE) support
Databricks now supports navigation of hierarchical data using recursive common table expressions (rCTEs).
Use a self-referencing CTE with `UNION ALL` to follow the recursive relationship.
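For example, a minimal sketch that walks a manager/report hierarchy, assuming an illustrative `main.default.employees(id, manager_id, name)` table:

```python
# Anchor: employees with no manager; recursive step: their direct reports, one level deeper.
spark.sql("""
WITH RECURSIVE reporting_chain(id, name, depth) AS (
  SELECT id, name, 0 FROM main.default.employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, c.depth + 1
  FROM main.default.employees e
  JOIN reporting_chain c ON e.manager_id = c.id
)
SELECT * FROM reporting_chain ORDER BY depth
""").show()
```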
ANSI SQL enabled by default
The default SQL dialect is now ANSI SQL. ANSI SQL is a well-established standard and will help protect users from unexpected or incorrect results. Read the Databricks ANSI enablement guide for more information.
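For example, under ANSI mode integer division by zero now raises an error instead of silently returning `NULL`, and the `try_`-prefixed functions keep the permissive behavior. A sketch of the toggle, using the standard `spark.sql.ansi.enabled` config:

```python
# ANSI mode is the default in DBR 17.0.
# spark.sql("SELECT 1 / 0").show()            # raises a DIVIDE_BY_ZERO error
spark.sql("SELECT try_divide(1, 0)").show()   # returns NULL instead of failing

# Opt out to restore the legacy, non-ANSI behavior:
spark.conf.set("spark.sql.ansi.enabled", "false")
```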
PySpark and Spark Connect now support the DataFrame `df.mergeInto` API
PySpark and Spark Connect now support the `df.mergeInto` API, which was previously only available for Scala.
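A minimal Python sketch with illustrative table names; the source DataFrame is aliased so the merge condition can reference both sides:

```python
from pyspark.sql.functions import expr

source = spark.table("updates").alias("src")
(source.mergeInto("target", expr("target.key = src.key"))
    .whenMatched().updateAll()      # update rows that match the condition
    .whenNotMatched().insertAll()   # insert rows with no match
    .merge())                       # execute the merge
```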
Support `ALL CATALOGS` in `SHOW SCHEMAS`
The `SHOW SCHEMAS` command is updated to accept the following syntax:
```sql
SHOW SCHEMAS [ { FROM | IN } { catalog_name | ALL CATALOGS } ] [ [ LIKE ] pattern ]
```
When `ALL CATALOGS` is specified in a `SHOW SCHEMAS` query, the execution iterates through all active catalogs that support namespaces using the catalog manager (DsV2). For each catalog, the output includes its top-level namespaces.
The output attributes and schema of the command have been modified to add a `catalog` column indicating the catalog of the corresponding namespace. The new column is added to the end of the output attributes, as shown below; an example query follows the tables:
Previous output
| Namespace |
|------------------|
| test-namespace-1 |
| test-namespace-2 |
New output
| Namespace | Catalog |
|------------------|----------------|
| test-namespace-1 | test-catalog-1 |
| test-namespace-2 | test-catalog-2 |
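For example, a query of this form produces the two-column output above; the `LIKE` pattern is optional and illustrative:

```python
# Iterates through all active catalogs and lists their top-level schemas.
spark.sql("SHOW SCHEMAS IN ALL CATALOGS LIKE 'test*'").show()
```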
Liquid clustering now compacts deletion vectors more efficiently
Delta tables with liquid clustering now apply physical changes from deletion vectors more efficiently when `OPTIMIZE` is running. For more details, see Apply changes to Parquet data files.
Allow non-deterministic expressions in `UPDATE`/`INSERT` column values for `MERGE` operations
Databricks now allows the use of non-deterministic expressions in updated and inserted column values of `MERGE` operations. However, non-deterministic expressions in the conditions of `MERGE` statements are not supported.
For example, you can now generate dynamic or random values for columns:
```sql
MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET target.value = source.value + rand()
```
This can be helpful for data privacy because it lets you obfuscate actual data while preserving the data's properties (such as mean values or other computed columns).
Ignore and rescue empty structs for Auto Loader ingestion (especially Avro)
Auto Loader now rescues Avro data types with an empty schema, because Delta tables do not support ingesting empty `struct`-type data.
Change Delta MERGE Python and Scala APIs to return DataFrame instead of Unit
The Scala and Python `MERGE` APIs (such as `DeltaMergeBuilder`) now also return a DataFrame like the SQL API does, with the same results.
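A sketch with illustrative table names; the only behavioral difference from earlier releases is the return value of `execute()`:

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.default.target")
result = (target.alias("t")
    .merge(spark.table("main.default.updates").alias("s"), "t.key = s.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())   # previously returned Unit/None; now a DataFrame of merge results
result.show()
```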
Behavioral changes
- Behavioral change for the Auto Loader incremental directory listing option
- Removed the "True cache misses" section in Spark UI
- Removed the "Cache Metadata Manager Peak Disk Usage" metric in the Spark UI
- Removed the "Rescheduled cache miss bytes" section in the Spark UI
- `CREATE VIEW` column-level clauses now throw errors when the clause would only apply to materialized views
Behavioral change for the Auto Loader incremental directory listing option
The deprecated Auto Loader `cloudFiles.useIncrementalListing` option now defaults to `false`. As a result, Auto Loader performs a full directory listing each time it runs. Previously, the default value of the `cloudFiles.useIncrementalListing` option was `auto`, which instructed Auto Loader to make a best-effort attempt at detecting whether an incremental listing could be used with a directory.
Databricks recommends against using this option. Instead, use file notification mode with file events. If you want to continue to use the incremental listing feature, set `cloudFiles.useIncrementalListing` to `auto` in your code. When you set this value to `auto`, Auto Loader makes a best-effort attempt to do a full listing once every seven incremental listings, which matches the behavior of this option before this change.
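A sketch of opting back in, with an illustrative source path and format:

```python
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useIncrementalListing", "auto")  # restore pre-17.0 behavior
      .load("/Volumes/main/default/landing"))
```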
To learn more about Auto Loader directory listing, see Auto Loader streams with directory listing mode.
Removed the "True cache misses" section in Spark UI
This change removes support for the "Cache true misses size" metric (for both compressed and uncompressed caches). The "Cache writes misses" metric measures the same information.
Use `numLocalScanTasks` as a proxy for this metric when you want to see how the cache performs when files are assigned to the right executor.
Removed the "Cache Metadata Manager Peak Disk Usage" metric in the Spark UI
This change removes support for the `cacheLocalityMgrDiskUsageInBytes` and `cacheLocalityMgrTimeMs` metrics from the Databricks Runtime and the Spark UI.
Removed the "Rescheduled cache miss bytes" section in the Spark UI
This change removes the cache rescheduled misses size and cache rescheduled misses size (uncompressed) metrics from the Databricks Runtime, because they measured how the cache performs when files are assigned to non-preferred executors. `numNonLocalScanTasks` is a good proxy for these metrics.
`CREATE VIEW` column-level clauses now throw errors when the clause would only apply to materialized views
`CREATE VIEW` commands that specify a column-level clause that is only valid for `MATERIALIZED VIEW`s now throw an error. The affected clauses for `CREATE VIEW` commands are listed below, followed by a sketch of the new behavior:
- `NOT NULL`
- A specified data type, such as `FLOAT` or `STRING`
- `DEFAULT`
- `COLUMN MASK`
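A sketch of the new behavior, with illustrative names:

```python
# Fails in DBR 17.0: NOT NULL is only valid for materialized views.
# spark.sql("CREATE VIEW main.default.v (id NOT NULL) AS SELECT 1 AS id")

# A plain column list without materialized-view-only clauses still works:
spark.sql("CREATE OR REPLACE VIEW main.default.v (id) AS SELECT 1 AS id")
```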
Library upgrades
Apache Spark
Databricks Runtime 17.0 includes Apache Spark 4.0.0. Many of the Spark 4.0.0 features were already available in Databricks Runtime 14.x, 15.x, and 16.x, and they now ship out of the box in Databricks Runtime 17.0.
Core and Spark SQL highlights
- [SPARK-45923] Spark Kubernetes Operator
- [SPARK-45869] Revisit and Improve Spark Standalone Cluster
- [SPARK-42849] Session Variables
- [SPARK-44444] Use ANSI SQL mode by default
- [SPARK-46057] Support SQL user-defined functions
- [SPARK-45827] Add Variant data type in Spark
- [SPARK-49555] SQL Pipe Syntax
- [SPARK-46830] String Collation support
- [SPARK-44265] Built-in XML data source support
Spark Core
- [SPARK-49524] Improve K8s support
- [SPARK-47240] SPIP: Structured Logging Framework for Apache Spark
- [SPARK-44893] `ThreadInfo` improvements for monitoring APIs
- [SPARK-46861] Avoid Deadlock in DAGScheduler
- [SPARK-47764] Cleanup shuffle dependencies based on `ShuffleCleanupMode`
- [SPARK-49459] Support CRC32C for Shuffle Checksum
- [SPARK-46383] Reduce Driver Heap Usage by shortening `TaskInfo.accumulables()` lifespan
- [SPARK-45527] Use fraction-based resource calculation
- [SPARK-47172] Add AES-GCM as an optional AES cipher mode for RPC encryption
- [SPARK-47448] Enable `spark.shuffle.service.removeShuffle` by default
- [SPARK-47674] Enable `spark.metrics.appStatusSource.enabled` by default
- [SPARK-48063] Enable `spark.stage.ignoreDecommissionFetchFailure` by default
- [SPARK-48268] Add `spark.checkpoint.dir` config
- [SPARK-48292] Revert SPARK-39195 (OutputCommitCoordinator) to fix duplication issues
- [SPARK-48518] Make LZF compression run in parallel
- [SPARK-46132] Support key password for JKS keys for RPC SSL
- [SPARK-46456] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout
- [SPARK-46256] Parallel Compression Support for ZSTD
- [SPARK-45544] Integrate SSL support into `TransportContext`
- [SPARK-45351] Change `spark.shuffle.service.db.backend` default value to `ROCKSDB`
- [SPARK-44741] Support regex-based `MetricFilter` in `StatsdSink`
- [SPARK-43987] Separate `finalizeShuffleMerge` Processing to Dedicated Thread Pools
- [SPARK-45439] Reduce memory usage of `LiveStageMetrics.accumIdsToMetricType`
Spark SQL
Features
- [SPARK-50541] Describe Table As JSON
- [SPARK-48031] Support view schema evolution
- [SPARK-50883] Support altering multiple columns in the same command
- [SPARK-47627] Add SQL `MERGE` syntax to enable schema evolution
- [SPARK-47430] Support `GROUP BY` for `MapType`
- [SPARK-49093] `GROUP BY` with `MapType` nested inside complex type
- [SPARK-49098] Add write options for `INSERT`
- [SPARK-49451] Allow duplicate keys in `parse_json`
- [SPARK-46536] Support `GROUP BY calendar_interval_type`
- [SPARK-46908] Support star clause in `WHERE` clause
- [SPARK-36680] Support dynamic table options via `WITH OPTIONS` syntax
- [SPARK-35553] Improve correlated subqueries
- [SPARK-47492] Widen whitespace rules in lexer to allow Unicode
- [SPARK-46246] `EXECUTE IMMEDIATE` SQL support
- [SPARK-46207] Support `MergeInto` in DataFrameWriterV2
Functions
- [SPARK-52016] New built-in functions
- [SPARK-44001] Add option to allow unwrapping protobuf well-known wrapper types
- [SPARK-43427] spark protobuf: allow upcasting unsigned integer types
- [SPARK-44983] Convert `binary` to `string` by `to_char` for the formats: hex, base64, utf-8
- [SPARK-44868] Convert `datetime` to `string` by `to_char`/`to_varchar`
- [SPARK-45796] Support `MODE() WITHIN GROUP (ORDER BY col)`
- [SPARK-48658] Encode/Decode functions report coding errors instead of mojibake
- [SPARK-45034] Support deterministic mode function
- [SPARK-44778] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
- [SPARK-47497] Make `to_csv` support arrays/maps/binary as pretty strings
- [SPARK-44840] Make `array_insert()` 1-based for negative indexes
Query optimization
- [SPARK-46946] Supporting broadcast of multiple filtering keys in `DynamicPruning`
- [SPARK-48445] Don't inline UDFs with expensive children
- [SPARK-41413] Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but expressions are compatible
- [SPARK-46941] Prevent insertion of window group limit node with `SizeBasedWindowFunction`
- [SPARK-46707] Add throwable field to expressions to improve predicate pushdown
- [SPARK-47511] Canonicalize `WITH` expressions by reassigning IDs
- [SPARK-46502] Support timestamp types in `UnwrapCastInBinaryComparison`
- [SPARK-46069] Support unwrap timestamp type to date type
- [SPARK-46219] Unwrap cast in join predicates
- [SPARK-45606] Release restrictions on multi-layer runtime filter
- [SPARK-45909] Remove `NumericType` cast if it can safely up-cast in `IsNotNull`
Query execution
- [SPARK-45592][SPARK-45282] Correctness issue in AQE with `InMemoryTableScanExec`
- [SPARK-50258] Fix output column order changed issue after AQE
- [SPARK-46693] Inject `LocalLimitExec` when matching `OffsetAndLimit` or `LimitAndOffset`
- [SPARK-48873] Use `UnsafeRow` in JSON parser
- [SPARK-41471] Reduce Spark shuffle when only one side of a join is `KeyGroupedPartitioning`
- [SPARK-45452] Improve `InMemoryFileIndex` to use `FileSystem.listFiles` API
- [SPARK-48649] Add `ignoreInvalidPartitionPaths` configs for skipping invalid partition paths
- [SPARK-45882] `BroadcastHashJoinExec` propagate partitioning should respect `CoalescedHashPartitioning`
Spark Connectors
DS v2 framework support changes
- [SPARK-45784] Introduce clustering mechanism to Spark
- [SPARK-50820] DSv2: Conditional nullification of metadata columns in DML
- [SPARK-51938] Improve Storage Partition Join
- [SPARK-50700] `spark.sql.catalog.spark_catalog` supports builtin magic value
- [SPARK-48781] Add Catalog APIs for loading stored procedures
- [SPARK-49246] `TableCatalog#loadTable` should indicate if it's for writing
- [SPARK-45965] Move DSv2 partitioning expressions into functions.partitioning
- [SPARK-46272] Support CTAS using DSv2 sources
- [SPARK-46043] Support create table using DSv2 sources
- [SPARK-48668] Support `ALTER NAMESPACE ... UNSET PROPERTIES` in v2
- [SPARK-46442] DS V2 supports push down `PERCENTILE_CONT` and `PERCENTILE_DISC`
- [SPARK-49078] Support show columns syntax in v2 table
Hive Catalog support changes
- [SPARK-45328] Remove Hive support prior to 2.0.0
- [SPARK-47101] Allow comma in top-level column names and relax HiveExternalCatalog schema check
- [SPARK-45265] Support Hive 4.0 metastore
XML support changes
CSV support changes
- [SPARK-46862] Disable CSV column pruning in multi-line mode
- [SPARK-46890] Fix CSV parsing bug with default values and column pruning
- [SPARK-50616] Add File Extension Option to CSV DataSource Writer
- [SPARK-49125] Allow duplicated column names in CSV writing
- [SPARK-49016] Restore behavior for queries from raw CSV files
- [SPARK-48807] Binary support for CSV datasource
- [SPARK-48602] Make csv generator support different output style via `spark.sql.binaryOutputStyle`
ORC support changes
- [SPARK-46648] Use zstd as the default ORC compression
- [SPARK-47456] Support ORC Brotli codec
- [SPARK-41858] Fix ORC reader perf regression due to DEFAULT value feature
Avro support changes
- [SPARK-47739] Register logical Avro type
- [SPARK-49082] Widening type promotions in `AvroDeserializer`
- [SPARK-46633] Fix Avro reader to handle zero-length blocks
- [SPARK-50350] Avro: add new function `schema_of_avro` (Scala side)
- [SPARK-46930] Add support for custom prefix for Union type fields in Avro
- [SPARK-46746] Attach codec extension to Avro datasource files
- [SPARK-46759] Support compression level for xz and zstandard in Avro
- [SPARK-46766] Add ZSTD Buffer Pool support for Avro datasource
- [SPARK-43380] Fix Avro data type conversion issues without causing performance regression
- [SPARK-48545] Create `to_avro` and `from_avro` SQL functions
- [SPARK-46990] Fix loading empty Avro files (infinite loop)
JDBC changes
- [SPARK-47361] Improve JDBC data sources
- [SPARK-44977] Upgrade Derby to 10.16.1.1
- [SPARK-47044] Add executed query for JDBC external datasources to explain output
- [SPARK-45139] Add `DatabricksDialect` to handle SQL type conversion
Other notable changes
- [SPARK-45905] Least common type between decimal types should retain integral digits first
- [SPARK-45786] Fix inaccurate Decimal multiplication and division results
- [SPARK-50705] Make `QueryPlan` lock-free
- [SPARK-46743] Fix corner-case with `COUNT` + constant folding subquery
- [SPARK-47509] Block subquery expressions in lambda/higher-order functions for correctness
- [SPARK-48498] Always do char padding in predicates
- [SPARK-45915] Treat decimal(x, 0) the same as IntegralType in PromoteStrings
- [SPARK-46220] Restrict charsets in decode()
- [SPARK-45816] Return `NULL` when overflowing during casting from timestamp to integers
- [SPARK-45586] Reduce compiler latency for plans with large expression trees
- [SPARK-45507] Correctness fix for nested correlated scalar subqueries with `COUNT` aggregates
- [SPARK-44550] Enable correctness fixes for null `IN` (empty list) under ANSI
- [SPARK-47911] Introduces a universal `BinaryFormatter` to make binary output consistent
PySpark
Below are the changes and improvements made to the PySpark libraries shipping in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-49530] Introducing PySpark Plotting API
- [SPARK-47540] SPIP: Pure Python Package (Spark Connect)
- [SPARK-50132] Add DataFrame API for Lateral Joins
- [SPARK-45981] Improve Python language test coverage
- [SPARK-46858] Upgrade Pandas to 2
- [SPARK-46910] Eliminate JDK Requirement in PySpark Installation
- [SPARK-47274] Provide more useful context for DataFrame API errors
- [SPARK-44076] SPIP: Python Data Source API
- [SPARK-43797] Python User-defined Table Functions
- [SPARK-46685] PySpark UDF Unified Profiling
DataFrame APIs features
- [SPARK-51079] Support large variable types in pandas UDF, `createDataFrame` and `toPandas` with Arrow
- [SPARK-50718] Support `addArtifact(s)` for PySpark
- [SPARK-50778] Add `metadataColumn` to PySpark DataFrame
- [SPARK-50719] Support `interruptOperation` for PySpark
- [SPARK-50790] Implement `parse_json` in PySpark
- [SPARK-49306] Create SQL function aliases for `zeroifnull` and `nullifzero`
- [SPARK-50132] Add DataFrame API for Lateral Joins
- [SPARK-43295] Support string type columns for `DataFrameGroupBy.sum`
- [SPARK-45575] Support time travel options for `df.read` API
- [SPARK-45755] Improve `Dataset.isEmpty()` by applying global limit 1
  - Improves performance of `isEmpty()` by pushing down a global limit of 1.
- [SPARK-48761] Introduce `clusterBy` DataFrameWriter API for Scala
- [SPARK-45929] Support `groupingSets` operation in DataFrame API
  - Extends `groupingSets(...)` to DataFrame/Dataset-level APIs.
- [SPARK-40178] Support coalesce hints with ease for PySpark and R
Pandas API on Spark features
- [SPARK-46931] Implement `{Frame, Series}.to_hdf`
- [SPARK-46936] Implement `Frame.to_feather`
- [SPARK-46955] Implement `Frame.to_stata`
- [SPARK-46976] Implement `DataFrameGroupBy.corr`
- [SPARK-49344] Support `json_normalize` for Pandas API on Spark
- [SPARK-42617] Support `isocalendar` from pandas 2
- [SPARK-45552] Introduce flexible parameters to `assertDataFrameEqual`
- [SPARK-47824] Fix nondeterminism in `pyspark.pandas.series.asof`
- [SPARK-46926] Add `convert_dtypes`, `infer_objects`, `set_axis` in fallback list
- [SPARK-48295] Turn on `compute.ops_on_diff_frames` by default
- [SPARK-48336] Implement `ps.sql` in Spark Connect
- [SPARK-45267] Change the default value for `numeric_only`
- [SPARK-44841] Support `value_counts` for pandas 2.0.0 and above
- [SPARK-44289][SPARK-43874][SPARK-43869][SPARK-43607] Support `indexer_between_time` for pandas 2.0.0
- [SPARK-44842][SPARK-43812] Support `stat` functions for pandas 2
- [SPARK-43563][SPARK-43459][SPARK-43451][SPARK-43506] Remove squeeze from `read_csv`
- [SPARK-42619] Add `show_counts` parameter for `DataFrame.info`
- [SPARK-43568][SPARK-43633] Support Categorical APIs for pandas 2
- [SPARK-42620] Add `inclusive` parameter for `(DataFrame|Series).between_time`
- [SPARK-42621] Add `inclusive` parameter for `pd.date_range`
- [SPARK-43245][SPARK-43705] Type match for `DatetimeIndex`/`TimedeltaIndex` with pandas 2
- [SPARK-43872] Support `(DataFrame|Series).plot` with pandas 2
- [SPARK-43476][SPARK-43477][SPARK-43478] Support `StringMethods` for pandas 2
- [SPARK-45553] Deprecate `assertPandasOnSparkEqual`
- [SPARK-45718] Remove remaining deprecated Pandas features from Spark 3.4.0
- [SPARK-45550] Remove deprecated APIs from Pandas API on Spark
- [SPARK-45634] Remove `DataFrame.get_dtype_counts` from Pandas API on Spark
- [SPARK-45165] Remove `inplace` parameter from CategoricalIndex APIs
- [SPARK-45177] Remove `col_space` parameter from `to_latex`
- [SPARK-45164] Remove deprecated Index APIs
- [SPARK-45180] Remove boolean inputs for `inclusive` parameter from `Series.between`
- [SPARK-43709] Remove `closed` parameter from `ps.date_range` & enable test
- [SPARK-43453] Ignore the names of `MultiIndex` when `axis=1` for `concat`
- [SPARK-43433] Match `GroupBy.nth` behavior to the latest pandas
Other notable PySpark changes
- [SPARK-50357] Support `Interrupt(Tag|All)` APIs for PySpark
- [SPARK-50392] DataFrame conversion to table argument in Spark Classic
- [SPARK-50752] Introduce configs for tuning Python UDF without Arrow
- [SPARK-47366] Add VariantVal for PySpark
- [SPARK-47683] Decouple PySpark core API to pyspark.core package
- [SPARK-47565] Improve PySpark worker pool crash resilience
- [SPARK-47933] Parent Column class for Spark Connect and Spark Classic
- [SPARK-50499] Expose metrics from `BasePythonRunner`
- [SPARK-50220] Support `listagg` in PySpark
- [SPARK-46910] Eliminate JDK Requirement in PySpark Installation
- [SPARK-46522] Block Python data source registration with name conflicts
- [SPARK-48996] Allow bare Python literals for `__and__` and `__or__` of Column
- [SPARK-48762] Introduce `clusterBy` DataFrameWriter API for Python
- [SPARK-49009] Make Column APIs accept Python Enums
- [SPARK-45891] Add interval types in Variant Spec
- [SPARK-48710] Use NumPy 2.0-compatible types
- [SPARK-48714] Implement `DataFrame.mergeInto` in PySpark
- [SPARK-48798] Introduce `spark.profile.render` for SparkSession-based profiling
- [SPARK-47346] Make daemon mode configurable for Python planner workers
- [SPARK-47366] Add `parse_json` alias in PySpark/dataframe
- [SPARK-48247] Use all `dict` pairs in `MapType` schema inference
- [SPARK-48340] Support `TimestampNTZ` schema inference with `prefer_timestamp_ntz`
- [SPARK-48220] Allow passing PyArrow Table to `createDataFrame()`
- [SPARK-48482] `dropDuplicates`, `dropDuplicatesWithinWatermark` accept var-args
- [SPARK-48372][SPARK-45716] Implement `StructType.treeString`
- [SPARK-50311] (`add`|`remove`|`get`|`clear`)Tag(s) APIs
- [SPARK-50238] Add Variant Support in PySpark UDFs/UDTFs/UDAFs
- [SPARK-50446] Concurrent level in Arrow-optimized Python UDF
- [SPARK-50310] Add a flag to disable DataFrameQueryContext
- [SPARK-50471] Support Arrow-based Python Data Source Writer
- [SPARK-49899] Support `deleteIfExists` for `TransformWithStateInPandas`
- [SPARK-45597] Support creating table using a Python data source in SQL (DSv2 exec)
- [SPARK-46424] Support Python metrics in Python Data Source
- [SPARK-45525] Support for Python data source write using DSv2
- [SPARK-41666] Support parameterized SQL by `sql()`
- [SPARK-45768] Make `faulthandler` a runtime configuration for Python execution in SQL
- [SPARK-45555] Includes a debuggable object for failed assertion
- [SPARK-45600] Make Python data source registration session level
- [SPARK-46048] Support `DataFrame.groupingSets` in PySpark
- [SPARK-46103] Enhancing PySpark documentation
- [SPARK-40559] Add `applyInArrow` to `groupBy` and `cogroup`
- [SPARK-45420] Add `DataType.fromDDL` into PySpark
- [SPARK-45554] Introduce flexible parameter to `assertSchemaEqual`
- [SPARK-44918] Support named arguments in scalar Python/Pandas UDFs
- [SPARK-45017] Add `CalendarIntervalType` to PySpark
- [SPARK-44952] Support named arguments in aggregate Pandas UDFs
- [SPARK-44665] Add support for pandas DataFrame `assertDataFrameEqual`
- [SPARK-44705] Make PythonRunner single-threaded
- [SPARK-45673] Enhancing clarity and usability of PySpark error messages
Spark Streaming
Below are the changes and improvements made to Spark Streaming in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-46815] Structured Streaming - Arbitrary State API v2
- For more details, see Introducing transformWithState for Apache Spark.
- [SPARK-45511] SPIP: State Data Source - Reader
- For more details, see Announcing the State Reader API and Announcing Simplified State Tracking with Apache Spark.
- [SPARK-46962] Implement python worker to run python streaming data source
Other notable streaming changes
- [SPARK-44865] Make StreamingRelationV2 support metadata column
- [SPARK-45080] Explicitly call out support for columnar in DSv2 streaming data sources
- [SPARK-45178] Fallback to execute a single batch for Trigger.AvailableNow with unsupported sources
- [SPARK-45415] Allow selective disabling of "fallocate" in RocksDB statestore
- [SPARK-45503] Add Conf to Set RocksDB Compression
- [SPARK-45511] State Data Source - Reader
- [SPARK-45558] Introduce a metadata file for streaming stateful operator
- [SPARK-45794] Introduce state metadata source to query the streaming state metadata information
- [SPARK-45815] Provide an interface for other Streaming sources to add `_metadata` columns
- [SPARK-45845] Add number of evicted state rows to streaming UI
- [SPARK-46641] Add `maxBytesPerTrigger` threshold
- [SPARK-46816] Add base support for new arbitrary state management operator (multiple state variables/column families)
- [SPARK-46865] Add Batch Support for TransformWithState Operator
- [SPARK-46906] Add a check for stateful operator change for streaming
- [SPARK-46961] Use `ProcessorContext` to store and retrieve handle
- [SPARK-46962] Add interface for Python streaming data source & worker
- [SPARK-47107] Partition reader for Python streaming data sources
- [SPARK-47273] Python data stream writer interface
- [SPARK-47553] Add Java support for `transformWithState` operator APIs
- [SPARK-47653] Add support for negative numeric types and range scan key encoder
- [SPARK-47733] Add custom metrics for transformWithState operator part of query progress
- [SPARK-47960] Allow chaining other stateful operators after transformWithState
- [SPARK-48447] Check `StateStoreProvider` class before constructor
- [SPARK-48569] Handle edge cases in `query.name` for streaming queries
- [SPARK-48589] Add `snapshotStartBatchId`/`snapshotPartitionId` options to state data source
- [SPARK-48726] Create StateSchemaV3 file for `TransformWithStateExec`
- [SPARK-48742] Virtual Column Family for RocksDB (arbitrary stateful API v2)
- [SPARK-48755] `transformWithState` pyspark base implementation and `ValueState` support
- [SPARK-48772] State Data Source Change Feed Reader Mode
- [SPARK-48836] Integrate SQL schema with state schema/metadata for TWS operator
- [SPARK-48849] Create OperatorStateMetadataV2 for the `TransformWithStateExec` operator
- [SPARK-48901][SPARK-48916] Introduce `clusterBy` DataStreamWriter API in Scala/PySpark
- [SPARK-48931] Reduce Cloud Store List API cost for state-store maintenance
- [SPARK-49021] Add support for reading `transformWithState` value state variables with state data source reader
- [SPARK-49048] Add support for reading operator metadata at given batch id
- [SPARK-49191] Read `transformWithState` map state with state data source
- [SPARK-49259] Size-based partition creation during Kafka read
- [SPARK-49411] Communicate State Store Checkpoint ID
- [SPARK-49463] ListState support in `TransformWithStateInPandas`
- [SPARK-49467] Add state data source reader for list state
- [SPARK-49513] Add timer support in `transformWithStateInPandas`
- [SPARK-49630] Add flatten option for collection types in state data source reader
- [SPARK-49656] Support state variables with value state collection types
- [SPARK-49676] Chaining of operators in `transformWithStateInPandas`
- [SPARK-49699] Disable `PruneFilters` for streaming workloads
- [SPARK-49744] TTL support for ListState in `TransformWithStateInPandas`
- [SPARK-49745] Read registered timers in `transformWithState`
- [SPARK-49802] Add support for read change feed for map/list types
- [SPARK-49846] Add `numUpdatedStateRows`/`numRemovedStateRows` metrics
- [SPARK-49883] State Store Checkpoint Structure V2 Integration with RocksDB and RocksDBFileManager
- [SPARK-50017] Support Avro encoding for `TransformWithState` operator
- [SPARK-50035] Explicit `handleExpiredTimer` function in the stateful processor
- [SPARK-50128] Add handle APIs using implicit encoders
- [SPARK-50152] Support handleInitialState with state data source reader
- [SPARK-50194] Integration of New Timer API and Initial State API
- [SPARK-50378] Add custom metric for time spent populating initial state
- [SPARK-50428] Support `TransformWithStateInPandas` in batch queries
- [SPARK-50573] Adding State Schema ID to State Rows for schema evolution
- [SPARK-50714] Enable schema evolution for `TransformWithState` with Avro encoding
Spark ML
- [SPARK-48463] Make `StringIndexer` and various other ML transformers support nested input columns
- [SPARK-45757] Avoid re-computation of NNZ in Binarizer
- [SPARK-45397] Add array assembler feature transformer
- [SPARK-45547] Validate Vectors with built-in function
Spark UX
- [SPARK-47240] SPIP: Structured Logging Framework for Apache Spark
- [SPARK-44893] `ThreadInfo` improvements for monitoring APIs
- [SPARK-45595] Expose `SQLSTATE` in error message
- [SPARK-45022] Provide context for dataset API errors
- [SPARK-45771] Enable `spark.eventLog.rolling.enabled` by default
Other notable Spark UX changes
- [SPARK-41685] Support Protobuf serializer for the KVStore in History server
- [SPARK-44770] Add a `displayOrder` variable to `WebUITab` to specify the order in which tabs appear
- [SPARK-44801] Capture analyzing failed queries in Listener and UI
- [SPARK-44838] `raise_error` improvement
- [SPARK-44863] Add a button to download thread dump as a txt in Spark UI
- [SPARK-44895] Add 'daemon', 'priority' for `ThreadStackTrace`
- [SPARK-45022] Provide context for dataset API errors
- [SPARK-45151] Task Level Thread Dump Support
- [SPARK-45207] Implement Error Enrichment for Scala Client
- [SPARK-45209] FlameGraph Support For Executor Thread Dump Page
- [SPARK-45240] Implement Error Enrichment for Python Client
- [SPARK-45248] Set the timeout for Spark UI server
- [SPARK-45274] Implementation of a new DAG drawing approach for job/stage/plan graphics
- [SPARK-45312] Support toggle display/hide plan svg on execution page
- [SPARK-45439] Reduce memory usage of `LiveStageMetrics.accumIdsToMetricType`
- [SPARK-45462] Show Duration in ApplicationPage
- [SPARK-45480] Selectable Spark Plan Node on UI
- [SPARK-45491] Add missing SQLSTATEs
- [SPARK-45500] Show the number of abnormally completed drivers in MasterPage
- [SPARK-45516] Include `QueryContext` in `SparkThrowable` proto message
- [SPARK-45581] Make `SQLSTATE` mandatory
- [SPARK-45595] Expose `SQLSTATE` in error message
- [SPARK-45609] Include `SqlState` in `SparkThrowable` proto message
- [SPARK-45641] Display the application start time on AllJobsPage
- [SPARK-45771] Enable `spark.eventLog.rolling.enabled` by default
- [SPARK-45774] Support `spark.master.ui.historyServerUrl` in ApplicationPage
- [SPARK-45955] Collapse Support for Flamegraph and thread dump details
- [SPARK-46003] Create a `ui-test` module with Jest to test UI JavaScript code
- [SPARK-46094] Support Executor JVM Profiling
- [SPARK-46399] Add exit status to the Application End event for the use of Spark Listener
- [SPARK-46886] Enable `spark.ui.prometheus.enabled` by default
- [SPARK-46893] Remove inline scripts from UI descriptions
- [SPARK-46903] Support Spark History Server Log UI
- [SPARK-46922] Do not wrap runtime user-facing errors
- [SPARK-46933] Add query execution time metric to connectors using JDBCRDD
- [SPARK-47253] Allow LiveEventBus to stop without draining the event queue
- [SPARK-47894] Add Environment page to Master UI
- [SPARK-48459] Implement `DataFrameQueryContext` in Spark Connect
- [SPARK-48597] Introduce marker for `isStreaming` in text representation of logical plan
- [SPARK-48628] Add task peak on/off heap memory metrics
- [SPARK-48716] Add `jobGroupId` to `SparkListenerSQLExecutionStart`
- [SPARK-49128] Support custom History Server UI title
- [SPARK-49206] Add Environment Variables table to Master EnvironmentPage
- [SPARK-49241] Add OpenTelemetryPush Sink with opentelemetry profile
- [SPARK-49445] Support show tooltip in the progress bar of UI
- [SPARK-50049] Support custom driver metrics in writing to v2 table
- [SPARK-50315] Support custom metrics for V1Fallback writes
- [SPARK-50915] Add `getCondition` and deprecate `getErrorClass` in PySparkException
- [SPARK-51021] Add log throttler
Spark Connect
Below are the changes and improvements made to Spark Connect in Databricks Runtime 17.0 (Beta).
Highlights
- [SPARK-49248] Scala Client Parity with existing Dataset/DataFrame API
- [SPARK-48918] Create a unified SQL Scala interface shared by regular SQL and Connect
- [SPARK-50812] Support pyspark.ml on Connect
- [SPARK-47908] Parent classes for Spark Connect and Spark Classic
Other Spark Connect changes and improvements
- [SPARK-41065] Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems`
- [SPARK-41066] Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy`
- [SPARK-41067] Implement `DataFrame.stat.cov`
- [SPARK-41068] Implement `DataFrame.stat.corr`
- [SPARK-41069] Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
- [SPARK-41292][SPARK-41640][SPARK-41641] Implement `Window` functions
- [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`
- [SPARK-41364] Implement broadcast function
- [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube`, and `pivot`
- [SPARK-41434] Initial LambdaFunction implementation
- [SPARK-41440] Implement `DataFrame.randomSplit`
- [SPARK-41464] Implement `DataFrame.to`
- [SPARK-41473] Implement `format_number` function
- [SPARK-41503] Implement Partition Transformation Functions
- [SPARK-41529] Implement `SparkSession.stop`
- [SPARK-41534] Setup initial client module for Spark Connect
- [SPARK-41629] Support for Protocol Extensions in Relation and Expression
- [SPARK-41663] Implement the rest of Lambda functions
- [SPARK-41673] Implement `Column.astype`
- [SPARK-41690] Agnostic Encoders
- [SPARK-41707] Implement Catalog API in Spark Connect
- [SPARK-41710] Implement `Column.between`
- [SPARK-41722] Implement 3 missing time window functions
- [SPARK-41723] Implement sequence function
- [SPARK-41724] Implement `call_udf` function
- [SPARK-41728] Implement `unwrap_udt` function
- [SPARK-41731] Implement the column accessor (`getItem`, `getField`, `__getitem__`, etc.)
- [SPARK-41738] Mix `ClientId` in `SparkSession` cache
- [SPARK-41740] Implement `Column.name`
- [SPARK-41767] Implement `Column.{withField, dropFields}`
- [SPARK-41785] Implement `GroupedData.mean`
- [SPARK-41803] Add missing function `log(arg1, arg2)`
- [SPARK-41810] Infer names from a list of dictionaries in `SparkSession.createDataFrame`
- [SPARK-41811] Implement `SQLStringFormatter` with `WithRelations`
- [SPARK-42664] Support `bloomFilter` function for `DataFrameStatFunctions`
- [SPARK-43662] Support `merge_asof` in Spark Connect
- [SPARK-43704] Support `MultiIndex` for `to_series()` in Spark Connect
- [SPARK-44625] `SparkConnectExecutionManager` to track all executions
- [SPARK-44731] Make `TimestampNTZ` work with literals in Python Spark Connect
- [SPARK-44736] Add `Dataset.explode` to Spark Connect Scala Client
- [SPARK-44740] Support specifying `session_id` in `SPARK_REMOTE` connection string
- [SPARK-44747] Add missing `SparkSession.Builder` methods
- [SPARK-44750] Apply configuration to `SparkSession` during creation
- [SPARK-44761] Support `DataStreamWriter.foreachBatch(VoidFunction2)`
- [SPARK-44788] Add `from_xml` and `schema_of_xml` to pyspark, Spark Connect, and SQL functions
- [SPARK-44807] Add `Dataset.metadataColumn` to Scala Client
- [SPARK-44877] Support python protobuf functions for Spark Connect
- [SPARK-45000] Implement `DataFrame.foreach`
- [SPARK-45001] Implement `DataFrame.foreachPartition`
- [SPARK-45088] Make `__getitem__` work with duplicated columns
- [SPARK-45090] `DataFrame.{cube, rollup}` support column ordinals
- [SPARK-45091] Function `floor`/`round`/`bround` now accept Column type scale
- [SPARK-45121] Support `Series.empty` for Spark Connect
- [SPARK-45136] Enhance `ClosureCleaner` with Ammonite support
- [SPARK-45137] Support map/array parameters in parameterized `sql()`
- [SPARK-45143] Make PySpark compatible with PyArrow 13.0.0
- [SPARK-45190][SPARK-48897] Make `from_xml` support `StructType` schema
- [SPARK-45235] Support map and array parameters by `sql()`
- [SPARK-45485] User agent improvements: Use `SPARK_CONNECT_USER_AGENT` env variable and include environment specific attributes
- [SPARK-45506] Add ivy URI support to Spark Connect `addArtifact`
- [SPARK-45509] Fix df column reference behavior for Spark Connect
  - Aligns column resolution in Spark Connect with classic Spark and provides better error messages.
- [SPARK-45619] Apply the observed metrics to Observation object
- [SPARK-45680] Release session
- [SPARK-45733] Support multiple retry policies
- [SPARK-45770] Introduce plan `DataFrameDropColumns` for `Dataframe.drop`
- [SPARK-45851] Support multiple policies in Scala client
- [SPARK-46039] Upgrade `grpcio*` to 1.59.3 for Python 3.12
- [SPARK-46048] Support `DataFrame.groupingSets` in Python Spark Connect
- [SPARK-46085] `Dataset.groupingSets` in Scala Spark Connect client
- [SPARK-46202] Expose new `ArtifactManager` APIs to support custom target directories
- [SPARK-46229] Add `applyInArrow` to `groupBy` and `cogroup` in Spark Connect
- [SPARK-46255] Support complex type -> string conversion
- [SPARK-46620] Introduce a basic fallback mechanism for frame methods
- [SPARK-46812] Make `mapInPandas`/`mapInArrow` support `ResourceProfile`
- [SPARK-46919] Upgrade grpcio* and grpc-java to 1.62.x
- [SPARK-47014] Implement methods `dumpPerfProfiles` and `dumpMemoryProfiles` of SparkSession
- [SPARK-47069] Introduce `spark.profile.show`/`.dump` for SparkSession-based profiling
- [SPARK-47081] Support Query Execution Progress
- [SPARK-47137] Add `getAll` to `spark.conf` for feature parity with Scala
- [SPARK-47233] Client & Server logic for client-side streaming query listener
- [SPARK-47276] Introduce `spark.profile.clear` for SparkSession-based profiling
- [SPARK-47367] Support Python data sources with Spark Connect
- [SPARK-47543] Infer `dict` as `MapType` from Pandas DataFrame (via new config)
- [SPARK-47545] `Dataset.observe` for Scala Connect
- [SPARK-47694] Make max message size configurable on the client side
- [SPARK-47712] Allow connect plugins to create and process Datasets
- [SPARK-47812] Support Serialization of `SparkSession` for `ForEachBatch` worker
- [SPARK-47818] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
- [SPARK-47828] Fix `DataFrameWriterV2.overwrite` failure due to invalid plan
- [SPARK-47845] Support Column type in split function for Scala and Python
- [SPARK-47909] Parent DataFrame class for Spark Connect and Spark Classic
- [SPARK-48008] Support UDAFs in Spark Connect
- [SPARK-48048] Added client side listener support for Scala
- [SPARK-48058][SPARK-43727] `UserDefinedFunction.returnType` parse the DDL string
- [SPARK-48112] Expose session in `SparkConnectPlanner` to plugins
- [SPARK-48113] Allow Plugins to integrate with Spark Connect
- [SPARK-48258] `Checkpoint` and `localCheckpoint` in Spark Connect
- [SPARK-48278] Refine the string representation of Cast
- [SPARK-48310] Cached properties must return copies
- [SPARK-48336] Implement `ps.sql` in Spark Connect
- [SPARK-48370] `Checkpoint` and `localCheckpoint` in Scala Spark Connect client
- [SPARK-48510] Support UDAF `toColumn` API in Spark Connect
- [SPARK-48555] Support using Columns as parameters for several functions (`array_remove`, `array_position`, etc.)
- [SPARK-48569] Handle edge cases in `query.name` for streaming queries
- [SPARK-48638] Add `ExecutionInfo` support for DataFrame
- [SPARK-48639] Add `Origin` to `RelationCommon`
- [SPARK-48648] Make `SparkConnectClient.tags` properly thread-local
- [SPARK-48794] `DataFrame.mergeInto` support for Spark Connect (Scala & Python)
- [SPARK-48831] Make default column name of cast compatible with Spark Classic
- [SPARK-48960] Make spark-shell work with Spark Connect (`--remote` support)
- [SPARK-49025] Make Column implementation agnostic
- [SPARK-49027] Share Column API between Classic and Connect
- [SPARK-49028] Create a shared SparkSession
- [SPARK-49029] Create shared Dataset interface
- [SPARK-49087] Distinguish `UnresolvedFunction` calling internal functions
- [SPARK-49185] Reimplement kde plot with Spark SQL
- [SPARK-49201] Reimplement hist plot with Spark SQL
- [SPARK-49249][SPARK-49122] Add `addArtifact` API to the Spark SQL Core
- [SPARK-49273] Origin support for Spark Connect Scala client
- [SPARK-49282] Create a shared `SparkSessionBuilder` interface
- [SPARK-49284] Create a shared Catalog interface
- [SPARK-49413] Create a shared `RuntimeConfig` interface
- [SPARK-49416] Add shared `DataStreamReader` interface
- [SPARK-49417] Add shared `StreamingQueryManager` interface
- [SPARK-49419] Create shared DataFrameStatFunctions
- [SPARK-49429] Add shared `DataStreamWriter` interface
- [SPARK-49526] Support Windows-style paths in ArtifactManager
- [SPARK-49530] Support kde/density plots
- [SPARK-49531] Support line plot with plotly backend
- [SPARK-49595] Fix `DataFrame.unpivot` and `DataFrame.melt` in Spark Connect Scala Client
- [SPARK-49626] Support horizontal/vertical bar plots
- [SPARK-49907] Support spark.ml on Connect
- [SPARK-49948] Add "precision" parameter to pandas on Spark box plot
- [SPARK-50050] Make lit accept str/bool numpy ndarray
- [SPARK-50054] Support histogram plots
- [SPARK-50063] Add support for Variant in the Spark Connect Scala client
- [SPARK-50075] DataFrame APIs for table-valued functions
- [SPARK-50134][SPARK-50130] Support DataFrame API for `SCALAR` and `EXISTS` subqueries in Spark Connect
- [SPARK-50134][SPARK-50132] Support DataFrame API for Lateral Join in Spark Connect
- [SPARK-50227] Upgrade buf plugins to v28.3
- [SPARK-50298] Implement `verifySchema` parameter of `createDataFrame`
- [SPARK-50306] Support Python 3.13 in Spark Connect
- [SPARK-50373] Prohibit Variant from set operations
- [SPARK-50544] Implement `StructType.toDDL`
- [SPARK-50710] Add support for optional client reconnection to sessions after release
- [SPARK-50828] Deprecate `pyspark.ml.connect`
- [SPARK-46465] Add `Column.isNaN` in PySpark
  - Adds the `Column.isNaN` function to PySpark Connect, matching Scala API parity.
- [SPARK-41440] Implement `DataFrame.randomSplit`
  - Implements `DataFrame.randomSplit` for Spark Connect in Python.
- [SPARK-41434] Initial LambdaFunction implementation
  - Adds basic support for LambdaFunction and an initial exists function in Spark Connect.
- [SPARK-41464] Implement `DataFrame.to`
  - Implements `DataFrame.to` for Spark Connect in Python.
- [SPARK-41364] Implement broadcast function
  - Implements the broadcast function in the Spark Connect Python client.
- [SPARK-41663] Implement the rest of Lambda functions
  - Completes Lambda function support in the Spark Connect Python client (such as `filter`, `map`, etc.).
- [SPARK-41673] Implement `Column.astype`
  - Adds `Column.astype` to Spark Connect Python for type casting.
- [SPARK-41292][SPARK-41640][SPARK-41641] Implement `Window` functions
  - Adds support for window functions (`Window.partitionBy`, `Window.orderBy`, etc.) to Spark Connect.
- [SPARK-41534] Setup initial client module for Spark Connect
  - Sets up the initial Scala/JVM client module for Spark Connect.
- [SPARK-41503] Implement Partition Transformation Functions
  - Implements partition transformation functions for Spark Connect in Python.
- [SPARK-41710] Implement `Column.between`
  - Adds the `Column.between` method to Spark Connect in Python.
- [SPARK-41707] Implement Catalog API in Spark Connect
  - Implements the catalog API for Spark Connect (such as `listTables`, `listFunctions`, etc.).
- [SPARK-41690] Agnostic Encoders
  - Introduces "agnostic encoders" for mapping external types to Spark data types.
- [SPARK-41722] Implement 3 missing time window functions
  - Implements `window`, `window_time`, and `session_window` in Spark Connect Python.
- [SPARK-41723] Implement sequence function
  - Adds the `sequence` function for Spark Connect in Python.
- [SPARK-41473] Implement `format_number` function
  - Implements the `format_number` function in Spark Connect Python.
- [SPARK-41724] Implement `call_udf` function
  - Allows users to call a UDF by name: `call_udf("my_udf", col1, col2, ...)`.
- [SPARK-41529] Implement `SparkSession.stop`
  - Implements `SparkSession.stop` to shut down a Spark Connect session server side.
- [SPARK-41728] Implement `unwrap_udt` function
  - Adds the `unwrap_udt` function to Spark Connect in Python.
- [SPARK-41731] Implement the column accessor (`getItem`, `getField`, `__getitem__`, etc.)
  - Allows indexing into arrays and structs in Spark Connect columns.
- [SPARK-41740] Implement `Column.name`
  - Adds the `.name` method for columns in Spark Connect Python.
- [SPARK-41738] Mix `ClientId` in `SparkSession` cache
  - Fixes concurrency by mixing the client ID into the `SparkSession` cache on the server.
- [SPARK-41067] Implement `DataFrame.stat.cov`
  - Implements covariance calculation (`df.stat.cov`) for Spark Connect in Python.
- [SPARK-41767] Implement `Column.{withField, dropFields}`
  - Adds support for adding/dropping struct fields in Spark Connect columns.
- [SPARK-41292] Support Window in `pyspark.sql.window` namespace
  - Integrates Spark Connect's window functionality into `pyspark.sql.window`.
- [SPARK-41068] Implement `DataFrame.stat.corr`
  - Implements correlation calculation (`df.stat.corr`) for Spark Connect in Python.
- [SPARK-41629] Support for Protocol Extensions in Relation and Expression
  - Adds a plugin-based extension mechanism for custom Relation/Expression in Spark Connect.
- [SPARK-41785] Implement `GroupedData.mean`
  - Adds the `mean` function to grouped data in Spark Connect.
- [SPARK-41069] Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
  - Adds `approxQuantile` for Spark Connect DataFrame/stat in Python.
- [SPARK-41065] Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems`
  - Adds `freqItems` to Spark Connect DataFrame in Python.
- [SPARK-41066] Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy`
  - Adds `sampleBy` to Spark Connect DataFrame in Python.
- [SPARK-41810] Infer names from a list of dictionaries in `SparkSession.createDataFrame`
  - Improves column name inference when creating DataFrames from lists of dictionaries in Spark Connect.
- [SPARK-41803] Add missing function `log(arg1, arg2)`
  - Implements two-argument `log(base, expr)` in Spark Connect Python.
- [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube`, and `pivot`
  - Adds `DataFrame.rollup`, `DataFrame.cube`, and `pivot` to Spark Connect.
- [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`
  - Implements the standard aggregate functions on grouped data for Spark Connect.
- [SPARK-45680] Release session
  - Introduces the `ReleaseSession` RPC to cancel all running jobs and remove the session server side.
- [SPARK-45851] Support multiple policies in Scala client
  - Adds multiple retry policies to the Scala Spark Connect client.
- [SPARK-45990][SPARK-45987] Upgrade protobuf to 4.25.1 for Python 3.11 support
  - Updates the protobuf library to fix issues under Python 3.11.
- [SPARK-46202] Expose new `ArtifactManager` APIs to support custom target directories
  - Allows adding artifacts with a custom directory structure to remote Spark Connect sessions.
- [SPARK-46284] Add `session_user` function to Python
  - Exposes the `session_user` function in PySpark for Connect, matching Scala parity.
- [SPARK-46039] Upgrade `grpcio*` to 1.59.3 for Python 3.12
  - Updates gRPC libraries to support Python 3.12 and new grpc-inprocess.
- [SPARK-46048] Support `DataFrame.groupingSets` in Python Spark Connect
  - Allows calling `df.groupingSets(...)` in Python Spark Connect for multi-dimensional grouping.
- [SPARK-46085] `Dataset.groupingSets` in Scala Spark Connect client
  - Adds `groupingSets(...)` to Spark Connect in Scala.
- [SPARK-46229] Add `applyInArrow` to `groupBy` and `cogroup` in Spark Connect
  - Implements `applyInArrow` in Spark Connect for grouped/cogrouped DataFrame operations.
- [SPARK-46255] Support complex type -> string conversion
  - Allows string conversion of complex (list/struct) types in Spark Connect Python.
- [SPARK-44753] XML: pyspark SQL XML reader/writer
- [SPARK-48272] Add the `timestamp_diff` function
- [SPARK-48369] Add the `timestamp_add` function
System environment
- Operating System: Ubuntu 24.04.2 LTS
- Java: Zulu17.54+21-CA
- Scala: 2.13.16
- Python: 3.12.3
- R: 4.4.2
- Delta Lake: 3.3.1
Installed Python libraries
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
annotated-types | 0.7.0 | anyio | 4.6.2 | argon2-cffi | 21.3.0 |
argon2-cffi-bindings | 21.2.0 | arrow | 1.3.0 | asttokens | 2.0.5 |
astunparse | 1.6.3 | async-lru | 2.0.4 | attrs | 24.3.0 |
autocommand | 2.2.2 | azure-common | 1.1.28 | azure-core | 1.34.0 |
azure-identity | 1.20.0 | azure-mgmt-core | 1.5.0 | azure-mgmt-web | 8.0.0 |
azure-storage-blob | 12.23.0 | azure-storage-file-datalake | 12.17.0 | babel | 2.16.0 |
backports.tarfile | 1.2.0 | beautifulsoup4 | 4.12.3 | black | 24.10.0 |
bleach | 6.2.0 | blinker | 1.7.0 | boto3 | 1.36.2 |
botocore | 1.36.3 | cachetools | 5.5.1 | certifi | 2025.1.31 |
cffi | 1.17.1 | chardet | 4.0.0 | charset-normalizer | 3.3.2 |
click | 8.1.7 | cloudpickle | 3.0.0 | comm | 0.2.1 |
contourpy | 1.3.1 | cryptography | 43.0.3 | cycler | 0.11.0 |
Cython | 3.0.12 | databricks-sdk | 0.49.0 | dbus-python | 1.3.2 |
debugpy | 1.8.11 | decorator | 5.1.1 | defusedxml | 0.7.1 |
Deprecated | 1.2.13 | distlib | 0.3.9 | docstring-to-markdown | 0.11 |
executing | 0.8.3 | facets-overview | 1.1.1 | fastapi | 0.115.12 |
fastjsonschema | 2.21.1 | filelock | 3.18.0 | fonttools | 4.55.3 |
fqdn | 1.5.1 | fsspec | 2023.5.0 | gitdb | 4.0.11 |
GitPython | 3.1.43 | google-api-core | 2.20.0 | google-auth | 2.40.0 |
google-cloud-core | 2.4.3 | google-cloud-storage | 3.1.0 | google-crc32c | 1.7.1 |
google-resumable-media | 2.7.2 | googleapis-common-protos | 1.65.0 | grpcio | 1.67.0 |
grpcio-status | 1.67.0 | h11 | 0.14.0 | httpcore | 1.0.2 |
httplib2 | 0.20.4 | httpx | 0.27.0 | idna | 3.7 |
importlib-metadata | 6.6.0 | importlib_resources | 6.4.0 | inflect | 7.3.1 |
iniconfig | 1.1.1 | ipyflow-core | 0.0.209 | ipykernel | 6.29.5 |
ipython | 8.30.0 | ipython-genutils | 0.2.0 | ipywidgets | 7.8.1 |
isodate | 0.6.1 | isoduration | 20.11.0 | jaraco.context | 5.3.0 |
jaraco.functools | 4.0.1 | jaraco.text | 3.12.1 | jedi | 0.19.2 |
Jinja2 | 3.1.5 | jmespath | 1.0.1 | joblib | 1.4.2 |
json5 | 0.9.25 | jsonpointer | 3.0.0 | jsonschema | 4.23.0 |
jsonschema-specifications | 2023.7.1 | jupyter-events | 0.10.0 | jupyter-lsp | 2.2.0 |
jupyter_client | 8.6.3 | jupyter_core | 5.7.2 | jupyter_server | 2.14.1 |
jupyter_server_terminals | 0.4.4 | jupyterlab | 4.3.4 | jupyterlab-pygments | 0.1.2 |
jupyterlab-widgets | 1.0.0 | jupyterlab_server | 2.27.3 | kiwisolver | 1.4.8 |
launchpadlib | 1.11.0 | lazr.restfulclient | 0.14.6 | lazr.uri | 1.0.6 |
markdown-it-py | 2.2.0 | MarkupSafe | 3.0.2 | matplotlib | 3.10.0 |
matplotlib-inline | 0.1.7 | mccabe | 0.7.0 | mdurl | 0.1.0 |
mistune | 2.0.4 | mlflow-skinny | 2.22.0 | mmh3 | 5.1.0 |
more-itertools | 10.3.0 | msal | 1.32.3 | msal-extensions | 1.3.1 |
mypy-extensions | 1.0.0 | nbclient | 0.8.0 | nbconvert | 7.16.4 |
nbformat | 5.10.4 | nest-asyncio | 1.6.0 | nodeenv | 1.9.1 |
notebook | 7.3.2 | notebook_shim | 0.2.3 | numpy | 2.1.3 |
oauthlib | 3.2.2 | opentelemetry-api | 1.32.1 | opentelemetry-sdk | 1.32.1 |
opentelemetry-semantic-conventions | 0.53b1 | overrides | 7.4.0 | packaging | 24.1 |
pandas | 2.2.3 | pandocfilters | 1.5.0 | parso | 0.8.4 |
pathspec | 0.10.3 | patsy | 1.0.1 | pexpect | 4.8.0 |
pillow | 11.1.0 | pip | 24.2 | platformdirs | 3.10.0 |
plotly | 5.24.1 | pluggy | 1.5.0 | prometheus_client | 0.21.0 |
prompt-toolkit | 3.0.43 | proto-plus | 1.26.1 | protobuf | 5.29.4 |
psutil | 5.9.0 | psycopg2 | 2.9.3 | ptyprocess | 0.7.0 |
pure-eval | 0.2.2 | pyarrow | 19.0.1 | pyasn1 | 0.4.8 |
pyasn1-modules | 0.2.8 | pyccolo | 0.0.71 | pycparser | 2.21 |
pydantic | 2.10.6 | pydantic_core | 2.27.2 | pyflakes | 3.2.0 |
Pygments | 2.15.1 | PyGObject | 3.48.2 | pyiceberg | 0.9.0 |
PyJWT | 2.10.1 | pyodbc | 5.2.0 | pyparsing | 3.2.0 |
pyright | 1.1.394 | pytest | 8.3.5 | python-dateutil | 2.9.0.post0 |
python-json-logger | 3.2.1 | python-lsp-jsonrpc | 1.1.2 | python-lsp-server | 1.12.0 |
pytoolconfig | 1.2.6 | pytz | 2024.1 | PyYAML | 6.0.2 |
pyzmq | 26.2.0 | referencing | 0.30.2 | requests | 2.32.3 |
rfc3339-validator | 0.1.4 | rfc3986-validator | 0.1.1 | rich | 13.9.4 |
rope | 1.12.0 | rpds-py | 0.22.3 | rsa | 4.9.1 |
s3transfer | 0.11.3 | scikit-learn | 1.6.1 | scipy | 1.15.1 |
seaborn | 0.13.2 | Send2Trash | 1.8.2 | setuptools | 74.0.0 |
six | 1.16.0 | smmap | 5.0.0 | sniffio | 1.3.0 |
sortedcontainers | 2.4.0 | soupsieve | 2.5 | sqlparse | 0.5.3 |
ssh-import-id | 5.11 | stack-data | 0.2.0 | starlette | 0.46.2 |
statsmodels | 0.14.4 | strictyaml | 1.7.3 | tenacity | 9.0.0 |
terminado | 0.17.1 | threadpoolctl | 3.5.0 | tinycss2 | 1.4.0 |
tokenize_rt | 6.1.0 | tomli | 2.0.1 | tornado | 6.4.2 |
traitlets | 5.14.3 | typeguard | 4.3.0 | types-python-dateutil | 2.9.0.20241206 |
typing_extensions | 4.12.2 | tzdata | 2024.1 | ujson | 5.10.0 |
unattended-upgrades | 0.1 | uri-template | 1.3.0 | urllib3 | 2.3.0 |
uvicorn | 0.34.2 | virtualenv | 20.29.3 | wadllib | 1.3.6 |
wcwidth | 0.2.5 | webcolors | 24.11.1 | webencodings | 0.5.1 |
websocket-client | 1.8.0 | whatthepatch | 1.0.2 | wheel | 0.45.1 |
widgetsnbextension | 3.6.6 | wrapt | 1.17.0 | yapf | 0.40.2 |
zipp | 3.21.0 |
Installed R libraries
R libraries are installed from the Posit Package Manager CRAN snapshot on 2025-03-20.
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
arrow | 19.0.1 | askpass | 1.2.1 | assertthat | 0.2.1 |
backports | 1.5.0 | base | 4.4.2 | base64enc | 0.1-3 |
bigD | 0.3.0 | bit | 4.6.0 | bit64 | 4.6.0-1 |
bitops | 1.0-9 | blob | 1.2.4 | boot | 1.3-30 |
brew | 1.0-10 | brio | 1.1.5 | broom | 1.0.7 |
bslib | 0.9.0 | cachem | 1.1.0 | callr | 3.7.6 |
caret | 7.0-1 | cellranger | 1.1.0 | chron | 2.3-62 |
class | 7.3-22 | cli | 3.6.4 | clipr | 0.8.0 |
clock | 0.7.2 | cluster | 2.1.6 | codetools | 0.2-20 |
colorspace | 2.1-1 | commonmark | 1.9.5 | compiler | 4.4.2 |
config | 0.3.2 | conflicted | 1.2.0 | cpp11 | 0.5.2 |
crayon | 1.5.3 | credentials | 2.0.2 | curl | 6.2.1 |
data.table | 1.17.0 | datasets | 4.4.2 | DBI | 1.2.3 |
dbplyr | 2.5.0 | desc | 1.4.3 | devtools | 2.4.5 |
diagram | 1.6.5 | diffobj | 0.3.5 | digest | 0.6.37 |
downlit | 0.4.4 | dplyr | 1.1.4 | dtplyr | 1.3.1 |
e1071 | 1.7-16 | ellipsis | 0.3.2 | evaluate | 1.0.3 |
fansi | 1.0.6 | farver | 2.1.2 | fastmap | 1.2.0 |
fontawesome | 0.5.3 | forcats | 1.0.0 | foreach | 1.5.2 |
foreign | 0.8-86 | forge | 0.2.0 | fs | 1.6.5 |
future | 1.34.0 | future.apply | 1.11.3 | gargle | 1.5.2 |
generics | 0.1.3 | gert | 2.1.4 | ggplot2 | 3.5.1 |
gh | 1.4.1 | git2r | 0.35.0 | gitcreds | 0.1.2 |
glmnet | 4.1-8 | globals | 0.16.3 | glue | 1.8.0 |
googledrive | 2.1.1 | googlesheets4 | 1.1.1 | gower | 1.0.2 |
graphics | 4.4.2 | grDevices | 4.4.2 | grid | 4.4.2 |
gridExtra | 2.3 | gsubfn | 0.7 | gt | 0.11.1 |
gtable | 0.3.6 | hardhat | 1.4.1 | haven | 2.5.4 |
highr | 0.11 | hms | 1.1.3 | htmltools | 0.5.8.1 |
htmlwidgets | 1.6.4 | httpuv | 1.6.15 | httr | 1.4.7 |
httr2 | 1.1.1 | ids | 1.0.1 | ini | 0.3.1 |
ipred | 0.9-15 | isoband | 0.2.7 | iterators | 1.0.14 |
jquerylib | 0.1.4 | jsonlite | 1.9.1 | juicyjuice | 0.1.0 |
KernSmooth | 2.23-22 | knitr | 1.50 | labeling | 0.4.3 |
later | 1.4.1 | lattice | 0.22-5 | lava | 1.8.1 |
lifecycle | 1.0.4 | listenv | 0.9.1 | lubridate | 1.9.4 |
magrittr | 2.0.3 | markdown | 1.13 | MASS | 7.3-60.0.1 |
Matrix | 1.6-5 | memoise | 2.0.1 | methods | 4.4.2 |
mgcv | 1.9-1 | mime | 0.13 | miniUI | 0.1.1.1 |
mlflow | 2.20.4 | ModelMetrics | 1.2.2.2 | modelr | 0.1.11 |
munsell | 0.5.1 | nlme | 3.1-164 | nnet | 7.3-19 |
numDeriv | 2016.8-1.1 | openssl | 2.3.2 | parallel | 4.4.2 |
parallelly | 1.42.0 | pillar | 1.10.1 | pkgbuild | 1.4.6 |
pkgconfig | 2.0.3 | pkgdown | 2.1.1 | pkgload | 1.4.0 |
plogr | 0.2.0 | plyr | 1.8.9 | praise | 1.0.0 |
prettyunits | 1.2.0 | pROC | 1.18.5 | processx | 3.8.6 |
prodlim | 2024.06.25 | profvis | 0.4.0 | progress | 1.2.3 |
progressr | 0.15.1 | promises | 1.3.2 | proto | 1.0.0 |
proxy | 0.4-27 | ps | 1.9.0 | purrr | 1.0.4 |
R6 | 2.6.1 | ragg | 1.3.3 | randomForest | 4.7-1.2 |
rappdirs | 0.3.3 | rcmdcheck | 1.4.0 | RColorBrewer | 1.1-3 |
Rcpp | 1.0.14 | RcppEigen | 0.3.4.0.2 | reactable | 0.4.4 |
reactR | 0.6.1 | readr | 2.1.5 | readxl | 1.4.5 |
recipes | 1.2.0 | rematch | 2.0.0 | rematch2 | 2.1.2 |
remotes | 2.5.0 | reprex | 2.1.1 | reshape2 | 1.4.4 |
rlang | 1.1.5 | rmarkdown | 2.29 | RODBC | 1.3-26 |
roxygen2 | 7.3.2 | rpart | 4.1.23 | rprojroot | 2.0.4 |
Rserve | 1.8-15 | RSQLite | 2.3.9 | rstudioapi | 0.17.1 |
rversions | 2.1.2 | rvest | 1.0.4 | sass | 0.4.9 |
scales | 1.3.0 | selectr | 0.4-2 | sessioninfo | 1.2.3 |
shape | 1.4.6.1 | shiny | 1.10.0 | sourcetools | 0.1.7-1 |
sparklyr | 1.9.0 | SparkR | 4.0.0 | sparsevctrs | 0.3.1 |
spatial | 7.3-17 | splines | 4.4.2 | sqldf | 0.4-11 |
SQUAREM | 2021.1 | stats | 4.4.2 | stats4 | 4.4.2 |
stringi | 1.8.4 | stringr | 1.5.1 | survival | 3.5-8 |
swagger | 5.17.14.1 | sys | 3.4.3 | systemfonts | 1.2.1 |
tcltk | 4.4.2 | testthat | 3.2.3 | textshaping | 1.0.0 |
tibble | 3.2.1 | tidyr | 1.3.1 | tidyselect | 1.2.1 |
tidyverse | 2.0.0 | timechange | 0.3.0 | timeDate | 4041.110 |
tinytex | 0.56 | tools | 4.4.2 | tzdb | 0.5.0 |
urlchecker | 1.0.1 | usethis | 3.1.0 | utf8 | 1.2.4 |
utils | 4.4.2 | uuid | 1.2-1 | V8 | 6.0.2 |
vctrs | 0.6.5 | viridisLite | 0.4.2 | vroom | 1.6.5 |
waldo | 0.6.1 | whisker | 0.4.1 | withr | 3.0.2 |
xfun | 0.51 | xml2 | 1.3.8 | xopen | 1.0.1 |
xtable | 1.8-4 | yaml | 2.3.10 | zeallot | 0.1.0 |
zip | 2.3.2 |
Installed Java and Scala libraries (Scala 2.13 cluster version)
Group ID | Artifact ID | Version |
---|---|---|
antlr | antlr | 2.7.7 |
com.amazonaws | amazon-kinesis-client | 1.12.0 |
com.amazonaws | aws-java-sdk-autoscaling | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudformation | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudfront | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudhsm | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudsearch | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudtrail | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudwatch | 1.12.638 |
com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.12.638 |
com.amazonaws | aws-java-sdk-codedeploy | 1.12.638 |
com.amazonaws | aws-java-sdk-cognitoidentity | 1.12.638 |
com.amazonaws | aws-java-sdk-cognitosync | 1.12.638 |
com.amazonaws | aws-java-sdk-config | 1.12.638 |
com.amazonaws | aws-java-sdk-core | 1.12.638 |
com.amazonaws | aws-java-sdk-datapipeline | 1.12.638 |
com.amazonaws | aws-java-sdk-directconnect | 1.12.638 |
com.amazonaws | aws-java-sdk-directory | 1.12.638 |
com.amazonaws | aws-java-sdk-dynamodb | 1.12.638 |
com.amazonaws | aws-java-sdk-ec2 | 1.12.638 |
com.amazonaws | aws-java-sdk-ecs | 1.12.638 |
com.amazonaws | aws-java-sdk-efs | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticache | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.12.638 |
com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.12.638 |
com.amazonaws | aws-java-sdk-elastictranscoder | 1.12.638 |
com.amazonaws | aws-java-sdk-emr | 1.12.638 |
com.amazonaws | aws-java-sdk-glacier | 1.12.638 |
com.amazonaws | aws-java-sdk-glue | 1.12.638 |
com.amazonaws | aws-java-sdk-iam | 1.12.638 |
com.amazonaws | aws-java-sdk-importexport | 1.12.638 |
com.amazonaws | aws-java-sdk-kinesis | 1.12.638 |
com.amazonaws | aws-java-sdk-kms | 1.12.638 |
com.amazonaws | aws-java-sdk-lambda | 1.12.638 |
com.amazonaws | aws-java-sdk-logs | 1.12.638 |
com.amazonaws | aws-java-sdk-machinelearning | 1.12.638 |
com.amazonaws | aws-java-sdk-opsworks | 1.12.638 |
com.amazonaws | aws-java-sdk-rds | 1.12.638 |
com.amazonaws | aws-java-sdk-redshift | 1.12.638 |
com.amazonaws | aws-java-sdk-route53 | 1.12.638 |
com.amazonaws | aws-java-sdk-s3 | 1.12.638 |
com.amazonaws | aws-java-sdk-ses | 1.12.638 |
com.amazonaws | aws-java-sdk-simpledb | 1.12.638 |
com.amazonaws | aws-java-sdk-simpleworkflow | 1.12.638 |
com.amazonaws | aws-java-sdk-sns | 1.12.638 |
com.amazonaws | aws-java-sdk-sqs | 1.12.638 |
com.amazonaws | aws-java-sdk-ssm | 1.12.638 |
com.amazonaws | aws-java-sdk-storagegateway | 1.12.638 |
com.amazonaws | aws-java-sdk-sts | 1.12.638 |
com.amazonaws | aws-java-sdk-support | 1.12.638 |
com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |
com.amazonaws | aws-java-sdk-workspaces | 1.12.638 |
com.amazonaws | jmespath-java | 1.12.638 |
com.clearspring.analytics | stream | 2.9.8 |
com.databricks | Rserve | 1.8-3 |
com.databricks | databricks-sdk-java | 0.27.0 |
com.databricks | jets3t | 0.7.1-0 |
com.databricks.scalapb | scalapb-runtime_2.12 | 0.4.15-10 |
com.esotericsoftware | kryo-shaded | 4.0.3 |
com.esotericsoftware | minlog | 1.3.0 |
com.fasterxml | classmate | 1.5.1 |
com.fasterxml.jackson.core | jackson-annotations | 2.18.2 |
com.fasterxml.jackson.core | jackson-core | 2.18.2 |
com.fasterxml.jackson.core | jackson-databind | 2.18.2 |
com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.18.2 |
com.fasterxml.jackson.dataformat | jackson-dataformat-yaml | 2.15.2 |
com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.18.2 |
com.fasterxml.jackson.datatype | jackson-datatype-jsr310 | 2.18.2 |
com.fasterxml.jackson.module | jackson-module-paranamer | 2.18.2 |
com.fasterxml.jackson.module | jackson-module-scala_2.12 | 2.18.2 |
com.github.ben-manes.caffeine | caffeine | 2.9.3 |
com.github.blemale | scaffeine_2.12 | 5.2.1 |
com.github.fommil | jniloader | 1.1 |
com.github.fommil.netlib | native_ref-java | 1.1 |
com.github.fommil.netlib | native_ref-java | 1.1-natives |
com.github.fommil.netlib | native_system-java | 1.1 |
com.github.fommil.netlib | native_system-java | 1.1-natives |
com.github.fommil.netlib | netlib-native_ref-linux-x86_64 | 1.1-natives |
com.github.fommil.netlib | netlib-native_system-linux-x86_64 | 1.1-natives |
com.github.luben | zstd-jni | 1.5.6-10 |
com.github.virtuald | curvesapi | 1.08 |
com.github.wendykierp | JTransforms | 3.1 |
com.google.api.grpc | proto-google-common-protos | 2.5.1 |
com.google.code.findbugs | jsr305 | 3.0.0 |
com.google.code.gson | gson | 2.11.0 |
com.google.crypto.tink | tink | 1.16.0 |
com.google.errorprone | error_prone_annotations | 2.36.0 |
com.google.flatbuffers | flatbuffers-java | 24.3.25 |
com.google.guava | failureaccess | 1.0.2 |
com.google.guava | guava | 33.4.0-jre |
com.google.guava | listenablefuture | 9999.0-empty-to-avoid-conflict-with-guava |
com.google.j2objc | j2objc-annotations | 3.0.0 |
com.google.protobuf | protobuf-java | 3.25.5 |
com.google.protobuf | protobuf-java-util | 3.25.5 |
com.helger | profiler | 1.1.1 |
com.ibm.icu | icu4j | 75.1 |
com.jcraft | jsch | 0.1.55 |
com.lihaoyi | sourcecode_2.12 | 0.1.9 |
com.microsoft.azure | azure-data-lake-store-sdk | 2.3.10 |
com.microsoft.sqlserver | mssql-jdbc | 12.8.0.jre11 |
com.microsoft.sqlserver | mssql-jdbc | 12.8.0.jre8 |
com.ning | compress-lzf | 1.1.2 |
com.sun.mail | javax.mail | 1.5.2 |
com.sun.xml.bind | jaxb-core | 2.2.11 |
com.sun.xml.bind | jaxb-impl | 2.2.11 |
com.tdunning | json | 1.8 |
com.thoughtworks.paranamer | paranamer | 2.8 |
com.trueaccord.lenses | lenses_2.12 | 0.4.12 |
com.twitter | chill-java | 0.10.0 |
com.twitter | chill_2.12 | 0.10.0 |
com.twitter | util-app_2.12 | 7.1.0 |
com.twitter | util-core_2.12 | 7.1.0 |
com.twitter | util-function_2.12 | 7.1.0 |
com.twitter | util-jvm_2.12 | 7.1.0 |
com.twitter | util-lint_2.12 | 7.1.0 |
com.twitter | util-registry_2.12 | 7.1.0 |
com.twitter | util-stats_2.12 | 7.1.0 |
com.typesafe | config | 1.4.3 |
com.typesafe.scala-logging | scala-logging_2.12 | 3.7.2 |
com.uber | h3 | 3.7.3 |
com.univocity | univocity-parsers | 2.9.1 |
com.zaxxer | HikariCP | 4.0.3 |
com.zaxxer | SparseBitSet | 1.3 |
commons-cli | commons-cli | 1.9.0 |
commons-codec | commons-codec | 1.17.2 |
commons-collections | commons-collections | 3.2.2 |
commons-dbcp | commons-dbcp | 1.4 |
commons-fileupload | commons-fileupload | 1.5 |
commons-httpclient | commons-httpclient | 3.1 |
commons-io | commons-io | 2.18.0 |
commons-lang | commons-lang | 2.6 |
commons-logging | commons-logging | 1.1.3 |
commons-pool | commons-pool | 1.5.4 |
dev.ludovic.netlib | arpack | 3.0.3 |
dev.ludovic.netlib | blas | 3.0.3 |
dev.ludovic.netlib | lapack | 3.0.3 |
info.ganglia.gmetric4j | gmetric4j | 1.0.10 |
io.airlift | aircompressor | 2.0.2 |
io.delta | delta-sharing-client_2.12 | 1.3.0 |
io.dropwizard.metrics | metrics-annotation | 4.2.30 |
io.dropwizard.metrics | metrics-core | 4.2.30 |
io.dropwizard.metrics | metrics-graphite | 4.2.30 |
io.dropwizard.metrics | metrics-healthchecks | 4.2.30 |
io.dropwizard.metrics | metrics-jetty9 | 4.2.30 |
io.dropwizard.metrics | metrics-jmx | 4.2.30 |
io.dropwizard.metrics | metrics-json | 4.2.30 |
io.dropwizard.metrics | metrics-jvm | 4.2.30 |
io.dropwizard.metrics | metrics-servlets | 4.2.30 |
io.netty | netty-all | 4.1.118.Final |
io.netty | netty-buffer | 4.1.118.Final |
io.netty | netty-codec | 4.1.118.Final |
io.netty | netty-codec-http | 4.1.118.Final |
io.netty | netty-codec-http2 | 4.1.118.Final |
io.netty | netty-codec-socks | 4.1.118.Final |
io.netty | netty-common | 4.1.118.Final |
io.netty | netty-handler | 4.1.118.Final |
io.netty | netty-handler-proxy | 4.1.118.Final |
io.netty | netty-resolver | 4.1.118.Final |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-linux-aarch_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-linux-x86_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-osx-aarch_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-osx-x86_64 |
io.netty | netty-tcnative-boringssl-static | 2.0.70.Final-db-r0-windows-x86_64 |
io.netty | netty-tcnative-classes | 2.0.70.Final |
io.netty | netty-transport | 4.1.118.Final |
io.netty | netty-transport-classes-epoll | 4.1.118.Final |
io.netty | netty-transport-classes-kqueue | 4.1.118.Final |
io.netty | netty-transport-native-epoll | 4.1.118.Final |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-aarch_64 |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-riscv64 |
io.netty | netty-transport-native-epoll | 4.1.118.Final-linux-x86_64 |
io.netty | netty-transport-native-kqueue | 4.1.118.Final-osx-aarch_64 |
io.netty | netty-transport-native-kqueue | 4.1.118.Final-osx-x86_64 |
io.netty | netty-transport-native-unix-common | 4.1.118.Final |
io.prometheus | simpleclient | 0.16.1-databricks |
io.prometheus | simpleclient_common | 0.16.1-databricks |
io.prometheus | simpleclient_dropwizard | 0.16.1-databricks |
io.prometheus | simpleclient_pushgateway | 0.16.1-databricks |
io.prometheus | simpleclient_servlet | 0.16.1-databricks |
io.prometheus | simpleclient_servlet_common | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_common | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_otel | 0.16.1-databricks |
io.prometheus | simpleclient_tracer_otel_agent | 0.16.1-databricks |
io.prometheus.jmx | collector | 0.18.0 |
jakarta.annotation | jakarta.annotation-api | 1.3.5 |
jakarta.servlet | jakarta.servlet-api | 4.0.3 |
jakarta.validation | jakarta.validation-api | 2.0.2 |
jakarta.ws.rs | jakarta.ws.rs-api | 2.1.6 |
javax.activation | activation | 1.1.1 |
javax.annotation | javax.annotation-api | 1.3.2 |
javax.el | javax.el-api | 2.2.4 |
javax.jdo | jdo-api | 3.0.1 |
javax.transaction | jta | 1.1 |
javax.transaction | transaction-api | 1.1 |
javax.xml.bind | jaxb-api | 2.2.11 |
javolution | javolution | 5.5.1 |
jline | jline | 2.14.6 |
joda-time | joda-time | 2.13.0 |
net.java.dev.jna | jna | 5.8.0 |
net.razorvine | pickle | 1.5 |
net.sf.jpam | jpam | 1.1 |
net.sf.opencsv | opencsv | 2.3 |
net.sf.supercsv | super-csv | 2.2.0 |
net.snowflake | snowflake-ingest-sdk | 0.9.6 |
net.sourceforge.f2j | arpack_combined_all | 0.1 |
org.acplt.remotetea | remotetea-oncrpc | 1.1.2 |
org.antlr | ST4 | 4.0.4 |
org.antlr | antlr-runtime | 3.5.2 |
org.antlr | antlr4-runtime | 4.13.1 |
org.antlr | stringtemplate | 3.2.1 |
org.apache.ant | ant | 1.10.11 |
org.apache.ant | ant-jsch | 1.10.11 |
org.apache.ant | ant-launcher | 1.10.11 |
org.apache.arrow | arrow-format | 18.2.0 |
org.apache.arrow | arrow-memory-core | 18.2.0 |
org.apache.arrow | arrow-memory-netty | 18.2.0 |
org.apache.arrow | arrow-memory-netty-buffer-patch | 18.2.0 |
org.apache.arrow | arrow-vector | 18.2.0 |
org.apache.avro | avro | 1.12.0 |
org.apache.avro | avro-ipc | 1.12.0 |
org.apache.avro | avro-mapred | 1.12.0 |
org.apache.commons | commons-collections4 | 4.4 |
org.apache.commons | commons-compress | 1.27.1 |
org.apache.commons | commons-crypto | 1.1.0 |
org.apache.commons | commons-lang3 | 3.17.0 |
org.apache.commons | commons-math3 | 3.6.1 |
org.apache.commons | commons-text | 1.13.0 |
org.apache.curator | curator-client | 5.7.1 |
org.apache.curator | curator-framework | 5.7.1 |
org.apache.curator | curator-recipes | 5.7.1 |
org.apache.datasketches | datasketches-java | 6.1.1 |
org.apache.datasketches | datasketches-memory | 3.0.2 |
org.apache.derby | derby | 10.14.2.0 |
org.apache.hadoop | hadoop-client-runtime | 3.4.1 |
org.apache.hive | hive-beeline | 2.3.10 |
org.apache.hive | hive-cli | 2.3.10 |
org.apache.hive | hive-jdbc | 2.3.10 |
org.apache.hive | hive-llap-client | 2.3.10 |
org.apache.hive | hive-llap-common | 2.3.10 |
org.apache.hive | hive-serde | 2.3.10 |
org.apache.hive | hive-shims | 2.3.10 |
org.apache.hive | hive-storage-api | 2.8.1 |
org.apache.hive.shims | hive-shims-0.23 | 2.3.10 |
org.apache.hive.shims | hive-shims-common | 2.3.10 |
org.apache.hive.shims | hive-shims-scheduler | 2.3.10 |
org.apache.httpcomponents | httpclient | 4.5.14 |
org.apache.httpcomponents | httpcore | 4.4.16 |
org.apache.ivy | ivy | 2.5.3 |
org.apache.logging.log4j | log4j-1.2-api | 2.24.3 |
org.apache.logging.log4j | log4j-api | 2.24.3 |
org.apache.logging.log4j | log4j-core | 2.24.3 |
org.apache.logging.log4j | log4j-layout-template-json | 2.24.3 |
org.apache.logging.log4j | log4j-slf4j2-impl | 2.24.3 |
org.apache.orc | orc-core | 2.1.1-shaded-protobuf |
org.apache.orc | orc-format | 1.1.0-shaded-protobuf |
org.apache.orc | orc-mapreduce | 2.1.1-shaded-protobuf |
org.apache.orc | orc-shims | 2.1.1 |
org.apache.poi | poi | 5.4.1 |
org.apache.poi | poi-ooxml | 5.4.1 |
org.apache.poi | poi-ooxml-full | 5.4.1 |
org.apache.poi | poi-ooxml-lite | 5.4.1 |
org.apache.thrift | libfb303 | 0.9.3 |
org.apache.thrift | libthrift | 0.16.0 |
org.apache.ws.xmlschema | xmlschema-core | 2.3.1 |
org.apache.xbean | xbean-asm9-shaded | 4.26 |
org.apache.xmlbeans | xmlbeans | 5.3.0 |
org.apache.yetus | audience-annotations | 0.13.0 |
org.apache.zookeeper | zookeeper | 3.9.3 |
org.apache.zookeeper | zookeeper-jute | 3.9.3 |
org.checkerframework | checker-qual | 3.43.0 |
org.codehaus.janino | commons-compiler | 3.0.16 |
org.codehaus.janino | janino | 3.0.16 |
org.datanucleus | datanucleus-api-jdo | 4.2.4 |
org.datanucleus | datanucleus-core | 4.1.17 |
org.datanucleus | datanucleus-rdbms | 4.1.19 |
org.datanucleus | javax.jdo | 3.2.0-m3 |
org.eclipse.jetty | jetty-client | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-continuation | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-http | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-io | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-jndi | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-plus | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-proxy | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-security | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-server | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-servlet | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-servlets | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-util | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-util-ajax | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-webapp | 9.4.53.v20231009 |
org.eclipse.jetty | jetty-xml | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-api | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-client | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-common | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-server | 9.4.53.v20231009 |
org.eclipse.jetty.websocket | websocket-servlet | 9.4.53.v20231009 |
org.fusesource.leveldbjni | leveldbjni-all | 1.8 |
org.glassfish.hk2 | hk2-api | 2.6.1 |
org.glassfish.hk2 | hk2-locator | 2.6.1 |
org.glassfish.hk2 | hk2-utils | 2.6.1 |
org.glassfish.hk2 | osgi-resource-locator | 1.0.3 |
org.glassfish.hk2.external | aopalliance-repackaged | 2.6.1 |
org.glassfish.hk2.external | jakarta.inject | 2.6.1 |
org.glassfish.jersey.containers | jersey-container-servlet | 2.41 |
org.glassfish.jersey.containers | jersey-container-servlet-core | 2.41 |
org.glassfish.jersey.core | jersey-client | 2.41 |
org.glassfish.jersey.core | jersey-common | 2.41 |
org.glassfish.jersey.core | jersey-server | 2.41 |
org.glassfish.jersey.inject | jersey-hk2 | 2.41 |
org.hibernate.validator | hibernate-validator | 6.2.5.Final |
org.ini4j | ini4j | 0.5.4 |
org.javassist | javassist | 3.29.2-GA |
org.jboss.logging | jboss-logging | 3.4.1.Final |
org.jdbi | jdbi | 2.63.1 |
org.jetbrains | annotations | 17.0.0 |
org.joda | joda-convert | 1.7 |
org.jodd | jodd-core | 3.5.2 |
org.json4s | json4s-ast_2.12 | 4.0.7 |
org.json4s | json4s-core_2.12 | 4.0.7 |
org.json4s | json4s-jackson-core_2.12 | 4.0.7 |
org.json4s | json4s-jackson_2.12 | 4.0.7 |
org.json4s | json4s-scalap_2.12 | 4.0.7 |
org.lz4 | lz4-java | 1.8.0-databricks-1 |
org.mlflow | mlflow-spark_2.12 | 2.9.1 |
org.objenesis | objenesis | 3.3 |
org.postgresql | postgresql | 42.6.1 |
org.roaringbitmap | RoaringBitmap | 1.2.1 |
org.rocksdb | rocksdbjni | 9.8.4 |
org.rosuda.REngine | REngine | 2.1.0 |
org.scala-lang | scala-compiler_2.12 | 2.12.15 |
org.scala-lang | scala-library_2.12 | 2.12.15 |
org.scala-lang | scala-reflect_2.12 | 2.12.15 |
org.scala-lang.modules | scala-collection-compat_2.12 | 2.11.0 |
org.scala-lang.modules | scala-java8-compat_2.12 | 0.9.1 |
org.scala-lang.modules | scala-parser-combinators_2.12 | 2.4.0 |
org.scala-lang.modules | scala-xml_2.12 | 2.3.0 |
org.scala-sbt | test-interface | 1.0 |
org.scalacheck | scalacheck_2.12 | 1.18.0 |
org.scalactic | scalactic_2.12 | 3.2.19 |
org.scalanlp | breeze-macros_2.12 | 2.1.0 |
org.scalanlp | breeze_2.12 | 2.1.0 |
org.scalatest | scalatest-compatible | 3.2.19 |
org.scalatest | scalatest-core_2.12 | 3.2.19 |
org.scalatest | scalatest-diagrams_2.12 | 3.2.19 |
org.scalatest | scalatest-featurespec_2.12 | 3.2.19 |
org.scalatest | scalatest-flatspec_2.12 | 3.2.19 |
org.scalatest | scalatest-freespec_2.12 | 3.2.19 |
org.scalatest | scalatest-funspec_2.12 | 3.2.19 |
org.scalatest | scalatest-funsuite_2.12 | 3.2.19 |
org.scalatest | scalatest-matchers-core_2.12 | 3.2.19 |
org.scalatest | scalatest-mustmatchers_2.12 | 3.2.19 |
org.scalatest | scalatest-propspec_2.12 | 3.2.19 |
org.scalatest | scalatest-refspec_2.12 | 3.2.19 |
org.scalatest | scalatest-shouldmatchers_2.12 | 3.2.19 |
org.scalatest | scalatest-wordspec_2.12 | 3.2.19 |
org.scalatest | scalatest_2.12 | 3.2.19 |
org.slf4j | jcl-over-slf4j | 2.0.16 |
org.slf4j | jul-to-slf4j | 2.0.16 |
org.slf4j | slf4j-api | 2.0.16 |
org.slf4j | slf4j-simple | 1.7.25 |
org.threeten | threeten-extra | 1.8.0 |
org.tukaani | xz | 1.10 |
org.typelevel | algebra_2.12 | 2.0.1 |
org.typelevel | cats-kernel_2.12 | 2.1.1 |
org.typelevel | spire-macros_2.12 | 0.17.0 |
org.typelevel | spire-platform_2.12 | 0.17.0 |
org.typelevel | spire-util_2.12 | 0.17.0 |
org.typelevel | spire_2.12 | 0.17.0 |
org.wildfly.openssl | wildfly-openssl | 1.1.3.Final |
org.xerial | sqlite-jdbc | 3.42.0.0 |
org.xerial.snappy | snappy-java | 1.1.10.3 |
org.yaml | snakeyaml | 2.0 |
oro | oro | 2.0.8 |
pl.edu.icm | JLargeArrays | 1.5 |
software.amazon.cryptools | AmazonCorrettoCryptoProvider | 2.4.1-linux-x86_64 |
stax | stax-api | 1.0.1 |
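Each row in this table is a Maven coordinate of the form `groupId:artifactId:version`. As an illustration only, the sketch below assembles such a coordinate from one row of the table; the `spark.jars.packages` step shown in the comments applies to self-managed Spark environments, since on Databricks the libraries above ship with the runtime.

```python
# Illustrative sketch: build a Maven coordinate (groupId:artifactId:version)
# from one row of the table above.
group_id = "org.postgresql"
artifact_id = "postgresql"
lib_version = "42.6.1"
coordinate = f"{group_id}:{artifact_id}:{lib_version}"
print(coordinate)  # org.postgresql:postgresql:42.6.1

# In a self-managed Spark setup (not needed on Databricks, where these
# libraries are preinstalled), an extra artifact could be resolved at
# session startup via the standard spark.jars.packages setting:
#
#   spark = (SparkSession.builder
#            .config("spark.jars.packages", coordinate)
#            .getOrCreate())
```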