Databricks Runtime maintenance updates

This article lists maintenance updates for supported Databricks Runtime versions. To add a maintenance update to an existing cluster, restart the cluster. For maintenance updates on unsupported Databricks Runtime versions, see Maintenance updates for Databricks Runtime (archived).

Note

Releases are staged. Your Databricks account might not update for a few days after the initial release date.
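
As a rough illustration of restarting a cluster programmatically, the sketch below builds (but does not send) an HTTP request to the Clusters REST API restart endpoint. It assumes the `POST /api/2.0/clusters/restart` endpoint with a `cluster_id` field and personal access token authentication; the workspace URL, token, and cluster ID are hypothetical placeholders, and `build_restart_request` is a helper name invented for this example. Check the REST API reference for your workspace before relying on it.

```python
import json
import urllib.request


def build_restart_request(workspace_url: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Databricks to restart a cluster."""
    # Assumed endpoint: the Clusters API restart call.
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/restart"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder values; substitute your own workspace URL, token, and cluster ID.
request = build_restart_request(
    "https://example.cloud.databricks.com",
    "dapi-example-token",
    "0123-456789-abcdefgh",
)

# To actually restart the cluster, send the request:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload first. Restarting from the cluster UI has the same effect.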

Databricks Runtime releases

Databricks Runtime 14.3 LTS

See Databricks Runtime 14.3 LTS.

  • January 3, 2024

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • [SPARK-46954] XML: Wrap InputStreamReader with BufferedReader.

    • [SPARK-46655] Skip query context catching in DataFrame methods.

    • [SPARK-44815] Cache df.schema to avoid extra RPC.

    • [SPARK-46952] XML: Limit size of corrupt record.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-46736] Retain empty message field in Protobuf connector.

    • [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.

    • [SPARK-46414] Use prependBaseUri to render javascript imports.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46954] XML: Optimize schema index lookup.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-46644] Change add and merge in SQLMetric to use isZero.

    • [SPARK-46731] Manage state store provider instance by state data source - reader.

    • [SPARK-46677] Fix dataframe["*"] resolution.

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if it contains SizeBasedWindowFunction.

    • [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.

    • [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.

    • [SPARK-46227] Backport to 14.3.

    • [SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.

    • Operating system security updates.

Databricks Runtime 14.2

See Databricks Runtime 14.2.

  • February 8, 2024

    • [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.

    • [SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.

    • [SPARK-46952] XML: Limit size of corrupt record.

    • [SPARK-46644] Change add and merge in SQLMetric to use isZero.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if it contains SizeBasedWindowFunction.

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46382] XML: Update doc for ignoreSurroundingSpaces.

    • [SPARK-46382] XML: Capture values interspersed between elements.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • Revert [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-46677] Fix dataframe["*"] resolution.

    • [SPARK-46382] XML: Default ignoreSurroundingSpaces to true.

    • [SPARK-46633] Fix Avro reader to handle zero-length blocks.

    • [SPARK-45964] Remove private sql accessor in XML and JSON package under catalyst package.

    • [SPARK-46581] Update comment on isZero in AccumulatorV2.

    • [SPARK-45912] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility.

    • [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.

    • [SPARK-46660] ReattachExecute requests updates aliveness of SessionHolder.

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-45962] Remove treatEmptyValuesAsNulls and use nullValue option instead in XML.

    • [SPARK-46541] Fix the ambiguous column reference in self join.

    • [SPARK-46599] XML: Use TypeCoercion.findTightestCommonType for compatibility check.

    • Operating system security updates.

  • January 17, 2024

    • The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

    • [SPARK-46484] Make resolveOperators helper functions keep the plan id.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.

    • [SPARK-46446] Disable subqueries with correlated OFFSET to fix correctness bug.

    • [SPARK-46152] XML: Add DecimalType support in XML schema inference.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.

    • [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-46153] XML: Add TimestampNTZType support.

    • [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.

    • [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.

    • [SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.

    • [SPARK-46036] Removing error-class from raise_error function.

    • [SPARK-46294] Clean up semantics of init vs zero value.

    • [SPARK-46173] Skipping trimAll call during date parsing.

    • [SPARK-46250] Deflake test_parity_listener.

    • [SPARK-46587] XML: Fix XSD big integer conversion.

    • [SPARK-46396] Timestamp inference should not throw exception.

    • [SPARK-46241] Fix error handling routine so it wouldn’t fall into infinite recursion.

    • [SPARK-46355] XML: Close InputStreamReader on read completion.

    • [SPARK-46370] Fix bug when querying from table after changing column defaults.

    • [SPARK-46265] Assertions in AddArtifact RPC make the connect client incompatible with older clusters.

    • [SPARK-46308] Forbid recursive error handling.

    • [SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.

  • December 14, 2023

    • [SPARK-46141] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED.

    • [SPARK-45730] Make ReloadingX509TrustManagerSuite less flaky.

    • [SPARK-45852] Gracefully deal with recursion error during logging.

    • [SPARK-45808] Better error handling for SQL Exceptions.

    • [SPARK-45920] group by ordinal should be idempotent.

    • Revert “[SPARK-45649] Unify the prepare framework for OffsetWindowFunctionFrame”.

    • [SPARK-45733] Support multiple retry policies.

    • [SPARK-45509] Fix df column reference behavior for Spark Connect.

    • [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.

    • [SPARK-45905] Least common type between decimal types should retain integral digits first.

    • [SPARK-45136] Enhance ClosureCleaner with Ammonite support.

    • [SPARK-46255] Support complex type -> string conversion.

    • [SPARK-45859] Make UDF objects in ml.functions lazy.

    • [SPARK-46028] Make Column.__getitem__ accept input column.

    • [SPARK-45798] Assert server-side session ID.

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45844] Implement case-insensitivity for XML.

    • [SPARK-45770] Introduce plan DataFrameDropColumns for Dataframe.drop.

    • [SPARK-44790] XML: to_xml implementation and bindings for python, connect and SQL.

    • [SPARK-45851] Support multiple policies in scala client.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45852] The Python client for Spark Connect now catches recursion errors during text conversion.

    • [SPARK-45808] Improved error handling for SQL exceptions.

    • [SPARK-45920] GROUP BY ordinal doesn’t replace the ordinal.

    • Revert [SPARK-45649].

    • [SPARK-45733] Added support for multiple retry policies.

    • [SPARK-45509] Fixed df column reference behavior for Spark Connect.

    • [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.

    • [SPARK-45905] The least common type between decimal types now retains integral digits first.

    • [SPARK-45136] Enhance ClosureCleaner with Ammonite support.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-46028] Column.__getitem__ accepts input columns.

    • [SPARK-45798] Assert server-side session ID.

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45844] Implement case-insensitivity for XML.

    • [SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.

    • [SPARK-44790] Added to_xml implementation and bindings for Python, Spark Connect, and SQL.

    • [SPARK-45851] Added support for multiple policies in the Scala client.

    • Operating system security updates.

Databricks Runtime 14.1

See Databricks Runtime 14.1.

  • February 8, 2024

    • [SPARK-46952] XML: Limit size of corrupt record.

    • [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.

    • [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if it contains SizeBasedWindowFunction.

    • [SPARK-46396] Timestamp inference should not throw exception.

    • [SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.

    • [SPARK-45957] Avoid generating execution plan for non-executable commands.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • [SPARK-45498] Followup: Ignore task completion from old stage attempts.

    • [SPARK-46382] XML: Update doc for ignoreSurroundingSpaces.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46382] XML: Default ignoreSurroundingSpaces to true.

    • [SPARK-46677] Fix dataframe["*"] resolution.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-46633] Fix Avro reader to handle zero-length blocks.

    • [SPARK-45912] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility.

    • [SPARK-46599] XML: Use TypeCoercion.findTightestCommonType for compatibility check.

    • [SPARK-46382] XML: Capture values interspersed between elements.

    • [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • [SPARK-45964] Remove private sql accessor in XML and JSON package under catalyst package.

    • Revert [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-45962] Remove treatEmptyValuesAsNulls and use nullValue option instead in XML.

    • [SPARK-46541] Fix the ambiguous column reference in self join.

    • Operating system security updates.

  • January 17, 2024

    • The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-46484] Make resolveOperators helper functions keep the plan id.

    • [SPARK-46153] XML: Add TimestampNTZType support.

    • [SPARK-46152] XML: Add DecimalType support in XML schema inference.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

    • [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.

    • [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.

    • [SPARK-46587] XML: Fix XSD big integer conversion.

    • [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46173] Skipping trimAll call during date parsing.

    • [SPARK-46355] XML: Close InputStreamReader on read completion.

    • [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.

    • [SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.

    • [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.

    • [SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.

    • [SPARK-46250] Deflake test_parity_listener.

    • [SPARK-46370] Fix bug when querying from table after changing column defaults.

    • [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • [SPARK-45509] Fix df column reference behavior for Spark Connect.

    • [SPARK-45844] Implement case-insensitivity for XML.

    • [SPARK-46141] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED.

    • [SPARK-46028] Make Column.__getitem__ accept input column.

    • [SPARK-46255] Support complex type -> string conversion.

    • [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.

    • [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.

    • [SPARK-45316] Add new parameters ignoreCorruptFiles/ignoreMissingFiles to HadoopRDD and NewHadoopRDD.

    • [SPARK-45852] Gracefully deal with recursion error during logging.

    • [SPARK-45920] group by ordinal should be idempotent.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • When ingesting CSV data using Auto Loader or Streaming Tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45620] APIs related to Python UDF now use camelCase.

    • [SPARK-44790] Added to_xml implementation and bindings for Python, Spark Connect, and SQL.

    • [SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-44784] Made SBT testing hermetic.

    • Operating system security updates.

  • November 10, 2023

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45250] Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.

    • [SPARK-44753] Added XML DataFrame reader and writer for PySpark SQL.

    • [SPARK-45396] Added a doc entry for the pyspark.ml.connect module.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45577] Fixed UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments.

    • [SPARK-45562] Made ‘rowTag’ a required option.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-43380] Fixed slowdown in Avro read.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-45386] Fixed an issue where StorageLevel.NONE would incorrectly return 0.

    • [SPARK-44219] Added per-rule validation checks for optimization rewrites.

    • [SPARK-45543] Fixed an issue where InferWindowGroupLimit caused an issue if the other window functions didn’t have the same window frame as the rank-like functions.

    • Operating system security updates.

  • September 27, 2023

    • [SPARK-44823] Updated black to 23.9.1 and fixed erroneous check.

    • [SPARK-45339] PySpark now logs errors it retries.

    • Revert [SPARK-42946] Redacted sensitive data nested under variable substitutions.

    • [SPARK-44551] Edited comments to sync with OSS.

    • [SPARK-45360] Spark session builder supports initialization from SPARK_REMOTE.

    • [SPARK-45279] Attached plan_id to all logical plans.

    • [SPARK-45425] Mapped TINYINT to ShortType for MsSqlServerDialect.

    • [SPARK-45419] Removed file version map entry of larger versions to avoid reusing rocksdb sst file IDs.

    • [SPARK-45488] Added support for value in rowTag element.

    • [SPARK-42205] Removed logging of Accumulables in Task/Stage start events in JsonProtocol event logs.

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • [SPARK-45256] Fixed an issue where DurationWriter failed when writing more values than the initial capacity.

    • [SPARK-43380] Fixed Avro data type conversion issues without causing performance regression.

    • [SPARK-45182] Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.

    • [SPARK-45399] Added XML Options using newOption.

    • Operating system security updates.

Databricks Runtime 14.0

See Databricks Runtime 14.0.

  • February 8, 2024

    • [SPARK-46396] Timestamp inference should not throw exception.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • [SPARK-45957] Avoid generating execution plan for non-executable commands.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if it contains SizeBasedWindowFunction.

    • [SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46541] Fix the ambiguous column reference in self join.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-45498] Followup: Ignore task completion from old stage attempts.

    • Revert [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46633] Fix Avro reader to handle zero-length blocks.

    • [SPARK-46677] Fix dataframe["*"] resolution.

    • [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • Operating system security updates.

  • January 17, 2024

    • The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46250] Deflake test_parity_listener.

    • [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.

    • [SPARK-46173] Skipping trimAll call during date parsing.

    • [SPARK-46484] Make resolveOperators helper functions keep the plan id.

    • [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.

    • [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.

    • [SPARK-46370] Fix bug when querying from table after changing column defaults.

    • [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.

    • [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • [SPARK-46255] Support complex type -> string conversion.

    • [SPARK-46028] Make Column.__getitem__ accept input column.

    • [SPARK-45920] group by ordinal should be idempotent.

    • [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.

    • [SPARK-45509] Fix df column reference behavior for Spark Connect.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • When ingesting CSV data using Auto Loader or Streaming Tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.

    • Spark-snowflake connector is upgraded to 2.12.0.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • Revert [SPARK-45592].

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45592] Fixed correctness issue in AQE with InMemoryTableScanExec.

    • [SPARK-45620] APIs related to Python UDF now use camelCase.

    • [SPARK-44784] Made SBT testing hermetic.

    • [SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • Operating system security updates.

  • November 10, 2023

    • Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-44219] Added extra per-rule validations for optimization rewrites.

    • [SPARK-45543] Fixed an issue where InferWindowGroupLimit generated an error if the other window functions didn’t have the same window frame as the rank-like functions.

    • Operating system security updates.

  • October 23, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • [SPARK-45396] Added a doc entry for the pyspark.ml.connect module, and added Evaluator to __all__ at ml.connect.

    • [SPARK-45256] Fixed an issue where DurationWriter failed when writing more values than initial capacity.

    • [SPARK-45279] Attached plan_id to all logical plans.

    • [SPARK-45250] Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.

    • [SPARK-45182] Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.

    • [SPARK-45419] Avoid reusing rocksdb sst files in a different rocksdb instance by removing file version map entries of larger versions.

    • [SPARK-45386] Fixed an issue where StorageLevel.NONE would incorrectly return 0.

    • Operating system security updates.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • The array_insert function is now 1-based for both positive and negative indexes. Previously, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. To restore the previous behavior, set spark.sql.legacy.negativeIndexInArrayInsert to true.

    • Databricks no longer ignores corrupt files during CSV schema inference with Auto Loader when ignoreCorruptFiles is enabled.

    • [SPARK-45227] Fixed a subtle thread-safety issue with CoarseGrainedExecutorBackend.

    • [SPARK-44658] ShuffleStatus.getMapStatus should return None instead of Some(null).

    • [SPARK-44910] Encoders.bean does not support superclasses with generic type arguments.

    • [SPARK-45346] Parquet schema inference respects case-sensitive flags when merging schema.

    • Revert [SPARK-42946].

    • [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in task or stage start events.

    • [SPARK-45360] Spark session builder supports initialization from SPARK_REMOTE.

    • [SPARK-45316] Add new parameters ignoreCorruptFiles/ignoreMissingFiles to HadoopRDD and NewHadoopRDD.

    • [SPARK-44909] Skip running the torch distributor log streaming server when it is not available.

    • [SPARK-45084] StateOperatorProgress now uses accurate shuffle partition number.

    • [SPARK-45371] Fixed shading issues in the Spark Connect Scala Client.

    • [SPARK-45178] Fall back to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-44840] Make array_insert() 1-based for negative indexes.

    • [SPARK-44551] Edited comments to sync with OSS.

    • [SPARK-45078] The ArrayInsert function now performs an explicit cast when the element type does not equal the derived component type.

    • [SPARK-45339] PySpark now logs retry errors.

    • [SPARK-45057] Avoid acquiring read lock when keepReadLock is false.

    • [SPARK-44908] Fixed cross-validator foldCol param functionality.

    • Operating system security updates.
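
The array_insert indexing change in the October 13, 2023 update can be illustrated with a small pure-Python model. This is a sketch of the indexing rule only, not Spark's implementation: `array_insert_model` and its `legacy_negative_index` parameter are hypothetical names for this example, the legacy branch reflects a reading of "0-based for negative indexes" as "index -1 inserts before the last element", and positions beyond the array bounds are deliberately left out of scope.

```python
def array_insert_model(arr, pos, value, legacy_negative_index=False):
    """Illustrative model of array_insert indexing for in-range positions only."""
    if pos == 0:
        raise ValueError("pos must not be 0: indexes are 1-based")
    n = len(arr)
    if pos > 0:
        # Positive indexes are 1-based in both the old and new behavior.
        idx = pos - 1
    elif legacy_negative_index:
        # Assumed legacy rule: -1 places the new element before the last one.
        idx = n + pos
    else:
        # New rule: -1 places the new element at the end of the array.
        idx = n + pos + 1
    if idx < 0 or idx > n:
        raise ValueError("out-of-range positions are outside the scope of this sketch")
    return arr[:idx] + [value] + arr[idx:]


new_result = array_insert_model([5, 4, 3, 2], -1, 1)
legacy_result = array_insert_model([5, 4, 3, 2], -1, 1, legacy_negative_index=True)
print(new_result)     # [5, 4, 3, 2, 1]
print(legacy_result)  # [5, 4, 3, 1, 2]
```

In this model, the legacy flag plays the role that setting spark.sql.legacy.negativeIndexInArrayInsert to true plays on a cluster.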

Databricks Runtime 13.3 LTS

See Databricks Runtime 13.3 LTS.

  • February 8, 2024

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • [SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.

    • [SPARK-46396] Timestamp inference should not throw exception.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if it contains SizeBasedWindowFunction.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • Operating system security updates.

  • January 17, 2024

    • The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46173] Skipping trimAll call during date parsing.

    • [SPARK-46370] Fix bug when querying from table after changing column defaults.

    • [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.

    • [SPARK-46250] Deflake test_parity_listener.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

    • [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • [SPARK-45920] group by ordinal should be idempotent.

    • [SPARK-44582] Skip iterator on SMJ if it was cleaned up.

    • [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.

    • [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Spark-snowflake connector is upgraded to 2.12.0.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • Operating system security updates.

  • November 10, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-44219] Added extra per-rule validations for optimization rewrites.

    • [SPARK-45543] Fixed an issue in InferWindowGroupLimit that occurred when the other window functions didn’t have the same window frame as the rank-like functions.

    • Operating system security updates.

  • October 23, 2023

    • [SPARK-45256] Fixed an issue where DurationWriter failed when writing more values than initial capacity.

    • [SPARK-45419] Avoid reusing rocksdb sst files in a different rocksdb instance by removing file version map entries of larger versions.

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Miscellaneous fixes.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • The array_insert function is now 1-based for both positive and negative indexes. Previously, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. To restore the previous behavior, set spark.sql.legacy.negativeIndexInArrayInsert to true.

    • Fixed an issue where corrupt files were not ignored when ignoreCorruptFiles was enabled during CSV schema inference with Auto Loader.

    • Revert [SPARK-42946].

    • [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in task or stage start events.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-45316] Add new parameters ignoreCorruptFiles and ignoreMissingFiles to HadoopRDD and NewHadoopRDD.

    • [SPARK-44740] Fixed metadata values for Artifacts.

    • [SPARK-45360] Initialized Spark session builder configuration from SPARK_REMOTE.

    • [SPARK-44551] Edited comments to sync with OSS.

    • [SPARK-45346] Parquet schema inference now respects case-sensitive flags when merging schema.

    • [SPARK-44658] ShuffleStatus.getMapStatus now returns None instead of Some(null).

    • [SPARK-44840] Made array_insert() 1-based for negative indexes.
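
The array_insert index change listed above for October 13, 2023 can be pictured with a small pure-Python model. This is only a sketch of the index arithmetic, not Spark's implementation: the function name, the legacy_negative_index parameter (standing in for spark.sql.legacy.negativeIndexInArrayInsert), and the decision to reject index 0 and out-of-range positions are assumptions made for illustration.

```python
def array_insert(arr, pos, item, legacy_negative_index=False):
    """Illustrative model of array_insert index handling (not Spark code).

    Covers only positions that fall inside or directly next to the array.
    """
    n = len(arr)
    if pos == 0:
        # Assumption of this sketch: positions are 1-based, so 0 is rejected.
        raise ValueError("index 0 is not valid in this model")
    if pos > 0:
        idx = pos - 1          # 1-based positive index
    elif legacy_negative_index:
        idx = n + pos          # old behavior: 0-based from the end
    else:
        idx = n + pos + 1      # new behavior: 1-based from the end
    if not 0 <= idx <= n:
        raise ValueError("position out of range for this simplified model")
    return arr[:idx] + [item] + arr[idx:]


print(array_insert([1, 2, 3], -1, 4))
print(array_insert([1, 2, 3], -1, 4, legacy_negative_index=True))
```

With the new behavior, index -1 appends the element. With the legacy flag, the same call places it before the last element.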

  • September 14, 2023

    • [SPARK-44873] Added support for alter view with nested columns in Hive client.

    • [SPARK-44878] Turned off strict limit for RocksDB write manager to avoid insertion exception on cache complete.

  • August 30, 2023

    • The dbutils cp command (dbutils.fs.cp) has been optimized for faster copying. With this improvement, copy operations can take up to 100x less time, depending on the file size. The feature is available across all clouds and file systems accessible in Databricks, including Unity Catalog Volumes and DBFS mounts.

    • [SPARK-44455] Quote identifiers with backticks in the SHOW CREATE TABLE result.

    • [SPARK-44763] Fixed an issue that showed a string as a double in binary arithmetic with interval.

    • [SPARK-44871] Fixed percentile_disc behavior.

    • [SPARK-44714] Ease restriction of LCA resolution regarding queries.

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • [SPARK-44505] Added override for columnar support in Scan for DSv2.

    • [SPARK-44479] Fixed protobuf conversion from an empty struct type.

    • [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.

    • [SPARK-42941] Added support for StreamingQueryListener in Python.

    • [SPARK-44558] Export PySpark’s Spark Connect Log Level.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • [SPARK-44643] Fixed Row.__repr__ when the field is an empty row.

    • Operating system security updates.

Databricks Runtime 12.2 LTS

See Databricks Runtime 12.2 LTS.

  • February 13, 2024

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • Operating system security updates.

  • December 25, 2023

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-39440] Add a config to disable event timeline.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-43067] Correct the location of error class resource file in Kafka connector.

    • [SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-42852] Revert NamedLambdaVariable related changes from EquivalentExpressions.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • [SPARK-44582] Skip iterator on SMJ if it was cleaned up.

    • [SPARK-45920] group by ordinal should be idempotent.

    • [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • Operating system security updates.

  • November 14, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Miscellaneous fixes.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • [SPARK-42553] Ensure at least one time unit after interval.

    • [SPARK-45346] Parquet schema inference respects case sensitive flag when merging schema.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

  • September 12, 2023

    • [SPARK-44873] Added support for alter view with nested columns in the Hive client.

    • [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.

    • [SPARK-43799] Added descriptor binary option to PySpark Protobuf API.

    • Miscellaneous fixes.

  • August 30, 2023

  • August 15, 2023

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • Operating system security updates.

  • July 29, 2023

    • Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • Operating system security updates.

  • July 24, 2023

    • [SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.

    • [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43779] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count error test with decorrelateInnerQuery turned off.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.

    • [SPARK-43522] Fixed creating struct column name with index of array.

    • [SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43527] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43123] Internal field metadata no longer leaks to catalogs.

    • [SPARK-43340] Fixed missing stack trace field in eventlogs.

    • [SPARK-42444] DataFrame.drop now handles duplicated columns correctly.

    • [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • [SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.

    • [SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.
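
The failOnUnknownFields note in the June 2, 2023 list above describes two outcomes for a record with unexpected fields, depending on the parser mode. The toy model below sketches that difference in plain Python. It does not call Spark: parse_json_lines and known_fields are hypothetical names, and treating any field outside known_fields as "unknown" is an assumption of this sketch.

```python
import json


def parse_json_lines(lines, known_fields, mode):
    """Toy model: handle records that carry fields outside known_fields."""
    parsed = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(known_fields)
        if unknown:
            if mode == "FAILFAST":
                # Fail directly on the first offending record.
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            if mode == "DROPMALFORMED":
                # Drop the whole record and keep going.
                continue
            raise ValueError(f"mode not covered by this sketch: {mode}")
        parsed.append(record)
    return parsed


lines = ['{"a": 1}', '{"a": 2, "b": 3}']
print(parse_json_lines(lines, ["a"], "DROPMALFORMED"))
```

In this model, DROPMALFORMED keeps only the first record, while FAILFAST raises as soon as it reaches the second one.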

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the rescuedDataColumn option.

    • Auto Loader now does the following.

      • Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.

      • Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.

      • Prevents reading Decimal types with lower precision.

    • [SPARK-43172] Exposes host and token from Spark Connect client.

    • [SPARK-43293] __qualified_access_only is ignored in normal columns.

    • [SPARK-43098] Fixed correctness COUNT bug when scalar subquery has a group by clause.

    • [SPARK-43085] Support for column DEFAULT assignment for multi-part table names.

    • [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.

    • [SPARK-43192] Removed user agent charset validation.

    • Operating system security updates.

  • April 25, 2023

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.

    • [SPARK-43009] Parameterized sql() with Any constants

    • [SPARK-42406] Terminate Protobuf recursive fields by dropping the field

    • [SPARK-43038] Support the CBC mode by aes_encrypt()/aes_decrypt()

    • [SPARK-42971] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event

    • [SPARK-43018] Fix bug for INSERT commands with timestamp literals

    • Operating system security updates.

  • April 11, 2023

    • Support legacy data source formats in the SYNC command.

    • Fixes an issue in the %autoreload behavior in notebooks outside of a repo.

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42928] Makes resolvePersistentFunction synchronized.

    • [SPARK-42936] Fixes an LCA issue when the clause can be resolved directly by its child aggregate.

    • [SPARK-42967] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is canceled.

    • Operating system security updates.

  • March 29, 2023

    • Databricks SQL now supports specifying default values for columns of Delta Lake tables, either at table creation time or afterward. Subsequent INSERT, UPDATE, DELETE, and MERGE commands can refer to any column’s default value using the explicit DEFAULT keyword. In addition, if any INSERT assignment has an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).

      For example:

      CREATE TABLE t (first INT, second DATE DEFAULT CURRENT_DATE()) USING delta;
      INSERT INTO t VALUES (0, DEFAULT);
      INSERT INTO t VALUES (1, DEFAULT);
      SELECT first, second FROM t;
      > 0, 2023-03-28
      1, 2023-03-28
      
    • Auto Loader now initiates at least one synchronous RocksDB log cleanup for Trigger.AvailableNow streams to check that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but it will save you storage costs and improve the Auto Loader experience in future runs.

    • You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).

    • [SPARK-42794] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming

    • [SPARK-42521] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table

    • [SPARK-42702][SPARK-42623] Support parameterized query in subquery and CTE

    • [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop

    • [SPARK-42403] JsonProtocol should handle null JSON strings
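
The column-default note in the March 29, 2023 list above says that an INSERT with an explicit list of fewer columns fills the remaining columns with their defaults, or NULL when no default is defined. The sketch below illustrates that rule with Python's built-in sqlite3 module. SQLite is used only as a stand-in engine, so this does not exercise Delta Lake or Databricks SQL, and it does not attempt to reproduce the explicit DEFAULT keyword. The table and column names are made up for the example.

```python
import sqlite3

# SQLite stands in for the real engine here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, status TEXT DEFAULT 'new', note TEXT)"
)

# Explicit column lists with fewer columns than the target table.
conn.execute("INSERT INTO t (id) VALUES (1)")
conn.execute("INSERT INTO t (id, status) VALUES (2, 'done')")

rows = conn.execute("SELECT id, status, note FROM t ORDER BY id").fetchall()
conn.close()
print(rows)
```

The first row picks up the declared default for status, and note is NULL (None in Python) in both rows because it has no default.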

  • March 8, 2023

    • The error message “Failure to initialize configuration” has been improved to provide more context for the customer.

    • There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backward compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.

    • Starting from this release, it is possible to create/replace a table with an additional table property delta.ignoreProtocolDefaults to ignore protocol-related Spark configs, which includes default reader and writer versions and table features supported by default.

    • [SPARK-42070] Change the default value of the argument of the Mask function from -1 to NULL

    • [SPARK-41793] Incorrect result for window frames defined by a range clause on significant decimals

    • [SPARK-42484] UnsafeRowUtils better error message

    • [SPARK-42516] Always capture the session time zone config while creating views

    • [SPARK-42635] Fix the TimestampAdd expression.

    • [SPARK-42622] Turned off substitution in values

    • [SPARK-42534] Fix DB2Dialect Limit clause

    • [SPARK-42121] Add built-in table-valued functions posexplode, posexplode_outer, json_tuple and stack

    • [SPARK-42045] ANSI SQL mode: Round/Bround should return an error on tiny/small/significant integer overflow

    • Operating system security updates.

Databricks Runtime 11.3 LTS

See Databricks Runtime 11.3 LTS.

  • February 13, 2024

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • Operating system security updates.

  • January 31, 2024

    • Operating system security updates.

  • December 25, 2023

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-39440] Add a config to disable event timeline.

    • [SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • Operating system security updates.

  • November 14, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • Revert [SPARK-33861].

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Miscellaneous fixes.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

    • [SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • [SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.

    • Operating system security updates.

  • August 15, 2023

    • [SPARK-44485] Optimized TreeNode.generateTreeString.

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • Operating system security updates.

  • July 27, 2023

    • Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • Operating system security updates.

  • July 24, 2023

    • [SPARK-44136] Fixed an issue that StateManager can get materialized in executor instead of driver in FlatMapGroupsWithStateExec.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43779] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count bug test with decorrelateInnerQuery turned off.

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43527] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.

    • [SPARK-43340] Fixed missing stack trace field in eventlogs.

Databricks Runtime 10.4 LTS

See Databricks Runtime 10.4 LTS.

  • February 13, 2024

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • Operating system security updates.

  • January 31, 2024

    • Operating system security updates.

  • December 25, 2023

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-39440] Add a config to disable event timeline.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

  • December 14, 2023

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • Operating system security updates.

  • November 14, 2023

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Operating system security updates.

  • October 13, 2023

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • Operating system security updates.

  • August 15, 2023

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count test with decorrelateInnerQuery turned off.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43413] Fixed IN subquery ListQuery nullability.

    • Operating system security updates.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • [SPARK-41520] Split AND_OR tree pattern to separate AND and OR.

    • [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.

    • Operating system security updates.

  • April 25, 2023

    • [SPARK-42928] Make resolvePersistentFunction synchronized.

    • Operating system security updates.

  • April 11, 2023

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.

  • March 29, 2023

    • [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop

    • [SPARK-42635] Fix the …

    • Operating system security updates.

  • March 14, 2023

    • [SPARK-41162] Fix anti- and semi-join for self-join with aggregations

    • [SPARK-33206] Fix shuffle index cache weight calculation for small index files

    • [SPARK-42484] Improved the UnsafeRowUtils error message

    • Miscellaneous fixes.

  • February 28, 2023

    • Support generated column for yyyy-MM-dd date_format. This change supports partition pruning for yyyy-MM-dd as a date_format in generated columns.

    • Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • Operating system security updates.

  • February 16, 2023

    • [SPARK-30220] Enable using Exists/In subqueries outside of the Filter node

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

  • January 18, 2023

    • Azure Synapse connector returns a more descriptive error message when a column name contains invalid characters such as whitespace or semicolons. In such cases, the following message is returned: Azure Synapse Analytics failed to run the JDBC query produced by the connector. Check column names do not include not valid characters such as ';' or white space.

    • [SPARK-38277] Clear write batch after RocksDB state store’s commit

    • [SPARK-41199] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41198] Fix metrics in streaming query having CTE and DSv1 streaming source.

    • [SPARK-41339] Close and recreate RocksDB write batch instead of just clearing.

    • [SPARK-41732] Apply tree-pattern based pruning for the rule SessionWindowing.

    • Operating system security updates.

  • November 29, 2022

    • Users can now configure how leading and trailing whitespace is handled when writing data using the Redshift connector. The following options have been added to control whitespace handling:

      • csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespace is retained when the option is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespace is retained when the option is set to false. By default, the value is true.

    • Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • Operating system security updates.

  • November 15, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so that when part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is turned off by default to preserve the original behavior.

    • [SPARK-40292] Fix column names in arrays_zip function when arrays are referenced from nested structs

    • Operating system security updates.

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue with Auto Loader where a file could be duplicated in the same micro-batch when allowOverwrites is enabled.

    • [SPARK-40697] Add read-side char padding to cover external data files

    • [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • [SPARK-40468] Fix column pruning in CSV when _corrupt_record is selected.

    • Operating system security updates.

  • September 22, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but doing so might have led to increased storage costs for customers.

    • [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40213] Support ASCII value conversion for Latin-1 characters

    • [SPARK-40380] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-38404] Improve CTE resolution when a nested CTE references an outer CTE

    • [SPARK-40089] Fix sorting for some Decimal types

    • [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

  • September 6, 2022

    • [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies().

    • [SPARK-40218] GROUPING SETS should preserve the grouping columns.

    • [SPARK-39976] ArrayIntersect should handle null in left expression correctly.

    • [SPARK-40053] Add assume to dynamic cancel cases which require Python runtime environment.

    • [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols cannot be loaded after saving it.

    • [SPARK-40079] Add Imputer inputCols validation for empty input case.

  • August 24, 2022

    • [SPARK-39983] Do not cache unserialized broadcast relations on the driver.

    • [SPARK-39775] Disable validate default values when parsing Avro schemas.

    • [SPARK-39962] Apply projection when group attributes are empty

    • [SPARK-37643] When charVarcharAsString is true, predicate queries on the char datatype should skip the rpadding rule.

    • Operating system security updates.

  • August 9, 2022

    • [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if the caller thread is interrupted

    • [SPARK-39731] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy

    • Operating system security updates.

  • July 27, 2022

    • [SPARK-39625] Add Dataset.as(StructType).

    • [SPARK-39689] Support 2-chars lineSep in CSV data source.

    • [SPARK-39104] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe.

    • [SPARK-39570] Inline table should allow expressions with alias.

    • [SPARK-39702] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel.

    • [SPARK-39575] Add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.

    • [SPARK-39476] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float.

    • [SPARK-38868] Don’t propagate exceptions from filter predicate when optimizing outer joins.

    • Operating system security updates.

  • July 20, 2022

    • Make Delta MERGE operation results consistent when the source is non-deterministic.

    • [SPARK-39355] Single column uses quoted to construct UnresolvedAttribute.

    • [SPARK-39548] CreateView command with a window clause query hits a wrong window definition not found issue.

    • [SPARK-39419] Fix ArraySort to throw an exception when the comparator returns null.

    • Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • [SPARK-39376] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

    • Operating system security updates.

  • June 15, 2022

    • [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.

    • [SPARK-39285] Spark should not check field names when reading files.

    • [SPARK-34096] Improve performance for nth_value ignore nulls over offset window.

    • [SPARK-36718] Fix the isExtractOnly check in CollapseProject.

  • June 2, 2022

    • [SPARK-39093] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral.

    • [SPARK-38990] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference.

    • Operating system security updates.

  • May 18, 2022

    • Fixed a potential built-in memory leak in Auto Loader.

    • [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation.

    • [SPARK-37593] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used.

    • [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion.

    • [SPARK-32268] Add ColumnPruning in injectBloomFilter.

    • [SPARK-38974] Filter registered functions with a given database name in list functions.

    • [SPARK-38931] Create root dfs directory for RocksDBFileManager with an unknown number of keys on 1st checkpoint.

    • Operating system security updates.

  • April 19, 2022

    • Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.

    • Fixed an issue with notebook-scoped libraries not working in batch streaming jobs.

    • [SPARK-38616] Keep track of SQL query text in Catalyst TreeNode

    • Operating system security updates.

  • April 6, 2022

    • The following Spark SQL functions are now available with this release:

      • timestampadd() and dateadd(): Add a time duration in a specified unit to a timestamp expression.

      • timestampdiff() and datediff(): Calculate the time difference between two timestamp expressions in a specified unit.

    • Parquet-MR has been upgraded to 1.12.2.

    • Improved support for comprehensive schemas in Parquet files.

    • [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack.

    • [SPARK-38509][SPARK-38481] Cherry-pick three timestampadd/diff changes.

    • [SPARK-38523] Fix referring to the corrupt record column from CSV.

    • [SPARK-38237] Allow ClusteredDistribution to require full clustering keys.

    • [SPARK-38437] Lenient serialization of datetime from datasource.

    • [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates.

    • [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates.

    • Operating system security updates.
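
The Redshift connector options described in the November 29, 2022 entry above are plain string settings. As a minimal sketch, the hypothetical helper below (redshift_whitespace_options is an illustrative name, not part of any Databricks API) assembles them into a dictionary. Only the option names, the tempformat values, and the default of true come from the release note; rejecting other tempformat values is a choice made for this sketch.

```python
from typing import Dict

# tempformat values for which the release note says the whitespace options apply.
CSV_TEMPFORMATS = {"CSV", "CSV GZIP"}


def redshift_whitespace_options(
    tempformat: str = "CSV",
    ignore_leading: bool = True,
    ignore_trailing: bool = True,
) -> Dict[str, str]:
    """Build the whitespace-related writer options as strings.

    Hypothetical helper for illustration only. The defaults mirror the
    documented default of true for both options.
    """
    normalized = tempformat.upper()
    if normalized not in CSV_TEMPFORMATS:
        # Assumption of this sketch: fail loudly rather than pass options
        # that the note only describes for CSV-based temp formats.
        raise ValueError(
            f"whitespace options are described only for {sorted(CSV_TEMPFORMATS)}"
        )
    return {
        "tempformat": normalized,
        "csvignoreleadingwhitespace": str(ignore_leading).lower(),
        "csvignoretrailingwhitespace": str(ignore_trailing).lower(),
    }


# Retain both leading and trailing whitespace during writes.
opts = redshift_whitespace_options(ignore_leading=False, ignore_trailing=False)
print(opts)
```

A dictionary like this could be passed to a DataFrame writer with .options(**opts), alongside the connector's other required options, which are not shown here.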

Databricks Runtime 9.1 LTS

See Databricks Runtime 9.1 LTS.

  • February 13, 2024

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • Operating system security updates.

  • January 31, 2024

    • Operating system security updates.

  • December 25, 2023

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-39440] Add a config to disable event timeline.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

  • December 14, 2023

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix to remediate a PyArrow RCE vulnerability.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • Operating system security updates.

  • November 14, 2023

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Operating system security updates.

  • October 13, 2023

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • Operating system security updates.

  • August 15, 2023

    • Operating system security updates.

  • June 23, 2023

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • Operating system security updates.

  • June 15, 2023

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause.

    • [SPARK-43156][SPARK-43098] Extend scalar subquery count bug test with decorrelateInnerQuery turned off.

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-37520] Add the startswith() and endswith() string functions

    • [SPARK-43413] Fixed IN subquery ListQuery nullability.

    • Operating system security updates.

  • May 17, 2023

    • Operating system security updates.

  • April 25, 2023

    • Operating system security updates.

  • April 11, 2023

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.

  • March 29, 2023

    • Operating system security updates.

  • March 14, 2023

    • [SPARK-42484] Improved error message for UnsafeRowUtils.

    • Miscellaneous fixes.

  • February 28, 2023

    • Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • Operating system security updates.

  • February 16, 2023

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

  • January 18, 2023

    • Operating system security updates.

  • November 29, 2022

    • Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • Operating system security updates.

  • November 15, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Operating system security updates.

    • Miscellaneous fixes.

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue with Auto Loader where a file could be duplicated in the same micro-batch when allowOverwrites is enabled.

    • [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • Miscellaneous fixes.

    • Operating system security updates.

  • September 22, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but doing so might have led to increased storage costs for customers.

    • [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40089] Fix sorting for some Decimal types

    • [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

  • September 6, 2022

    • [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols cannot be loaded after saving it

    • [SPARK-40079] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39666] Use UnsafeProjection.create to respect spark.sql.codegen.factoryMode in ExpressionEncoder

    • [SPARK-39962] Apply projection when group attributes are empty

    • Operating system security updates.

  • August 9, 2022

    • Operating system security updates.

  • July 27, 2022

    • Make Delta MERGE operation results consistent when the source is non-deterministic.

    • [SPARK-39689] Support for 2-chars lineSep in CSV data source

    • [SPARK-39575] Added ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.

    • [SPARK-37392] Fixed a performance error in the Catalyst optimizer.

    • Operating system security updates.

  • July 13, 2022

    • [SPARK-39419] ArraySort throws an exception when the comparator returns null.

    • Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • Operating system security updates.

    • Miscellaneous fixes.

  • June 15, 2022

    • [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.

  • June 2, 2022

    • [SPARK-34554] Implement the copy() method in ColumnarMap.

    • Operating system security updates.

  • May 18, 2022

    • Fixed a potential built-in memory leak in Auto Loader.

    • Upgraded AWS SDK version from 1.11.655 to 1.11.678.

    • [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • Operating system security updates.

  • April 19, 2022

    • Operating system security updates.

    • Miscellaneous fixes.

  • April 6, 2022

    • [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack.

    • Operating system security updates.

  • March 22, 2022

    • Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the active directory was /databricks/driver.

    • [SPARK-38437] Lenient serialization of datetime from datasource

    • [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-27442] Removed the field name check when reading or writing data in Parquet.

  • March 14, 2022

    • [SPARK-38236] Absolute file paths specified in the create/alter table are treated as relative

    • [SPARK-34069] Interrupt task thread if local property SPARK_JOB_INTERRUPT_ON_CANCEL is set to true.

  • February 23, 2022

    • [SPARK-37859] SQL tables created with JDBC with Spark 3.1 are not readable with Spark 3.2.

  • February 8, 2022

    • [SPARK-27442] Removed the field name check when reading or writing data in Parquet.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed an issue where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.

    • Fixed an issue where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Minor fixes and security enhancements.

    • Operating system security updates.

  • November 4, 2021

    • Fixed an issue that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.

  • October 20, 2021

    • Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for the BigNumeric type.
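
The September 22, 2022 entry above re-enables built-in listing for Auto Loader on ADLS Gen2 through a single Spark configuration key. The following is a minimal sketch of setting that key. The names enable_builtin_listing, ConfLike, and RecordingConf are hypothetical and exist only so the example runs without a cluster; the key and the value "true" are the only parts taken from the release note.

```python
from typing import Dict, Protocol

# Configuration key quoted in the September 22, 2022 entry.
LIST_KEYS_FLAG = "spark.databricks.io.listKeysWithPrefix.azure.enabled"


class ConfLike(Protocol):
    """Anything with a set(key, value) method, such as a session's runtime conf."""

    def set(self, key: str, value: str) -> None: ...


def enable_builtin_listing(conf: ConfLike) -> None:
    """Re-enable built-in listing by setting the documented key to "true"."""
    conf.set(LIST_KEYS_FLAG, "true")


class RecordingConf:
    """Stand-in for a runtime conf that records the values it is given."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.values[key] = value


conf = RecordingConf()
enable_builtin_listing(conf)
print(conf.values)
```

In a notebook attached to a cluster, the same function could presumably be called as enable_builtin_listing(spark.conf), assuming spark is the active session.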