Compute access mode limitations

Databricks recommends using Unity Catalog and shared access mode for most workloads. Limitations apply to all access mode configurations. This article outlines various limitations for each access mode, including additional limitations added by Unity Catalog.

Databricks recommends using compute policies to simplify configuration options for most users. See Create and manage compute policies.

Note

No-isolation shared is a legacy access mode that does not support Unity Catalog.

Important

Init scripts and libraries have different support across access modes and Databricks Runtime versions. See Compute compatibility with libraries and init scripts.

Cluster limitations for Unity Catalog

  • On Databricks Runtime 13.2 and below, Scala is supported only on clusters that use single user access mode. To use Scala on a cluster that uses shared access mode, the cluster must be on Databricks Runtime 13.3 or above.

  • Workloads that use Databricks Runtime for Machine Learning are supported only on clusters that use single user access mode.

  • R is supported only on clusters that use single user access mode.

  • Spark-submit jobs are supported in single user access mode but not in shared access mode. See Access modes.

Single user access mode limitations on Unity Catalog

Single user access mode on Unity Catalog has the following limitations:

  • To read from a view, you must have SELECT on all referenced tables and views (see the sketch after this list).

  • Dynamic views are not supported.

  • Cannot use a single user cluster to access a table that has a row filter or column mask.

  • When used with credential passthrough, Unity Catalog features are disabled.

  • Cannot use a single user cluster to query tables created by a Unity Catalog-enabled Delta Live Tables pipeline, including streaming tables and materialized views created in Databricks SQL. To query tables created by a Delta Live Tables pipeline, you must use a shared cluster running Databricks Runtime 13.1 or above.
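
For the view limitation above, a minimal sketch of the required grants. The catalog, schema, table, view, and principal names are all hypothetical:

    # Hypothetical objects: table main.sales.orders and a view
    # main.sales.orders_v defined on top of it.
    # On a single user cluster, reading the view requires SELECT on the
    # view and on every table or view it references.
    spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analyst@example.com`")
    spark.sql("GRANT SELECT ON TABLE main.sales.orders_v TO `analyst@example.com`")

    # With both grants in place, this query succeeds on a single user cluster:
    spark.sql("SELECT * FROM main.sales.orders_v LIMIT 10").show()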

Shared access mode limitations on Unity Catalog

Shared access mode on Unity Catalog has the following limitations:

  • When used with credential passthrough, Unity Catalog features are disabled.

  • Custom containers are not supported.

  • Spark-submit jobs are not supported.

  • Databricks Runtime ML and the Spark Machine Learning Library (MLlib) are not supported.

  • Cannot use R, RDD APIs, or clients that read data directly from cloud storage, such as DBUtils.

  • Spark Context (sc) and sqlContext are not supported for Scala in any Databricks Runtime and are not supported for Python in Databricks Runtime 14.0 and above.

    • Databricks recommends using the spark variable to interact with the SparkSession instance, as shown in the first sketch following this list.

    • The following sc functions are also not supported: emptyRDD, range, init_batched_serializer, parallelize, pickleFile, textFile, wholeTextFiles, binaryFiles, binaryRecords, sequenceFile, newAPIHadoopFile, newAPIHadoopRDD, hadoopFile, hadoopRDD, union, runJob, setSystemProperty, uiWebUrl, stop, setJobGroup, setLocalProperty, getConf.

  • Scala is supported only on Databricks Runtime 13.3 and above. Due to user isolation, Scala code cannot access the Spark driver JVM's internal state or system files. Additionally, the SparkContext and SQLContext classes and their methods are not available.

  • Instance profiles cannot be used to configure access to streaming sources.

  • User-defined functions (UDFs) have the following limitations:

    • Cannot use Hive UDFs.

    • In Databricks Runtime 14.2 and above, you can use Scala scalar UDFs. Other Scala UDFs and UDAFs are not supported.

    • In Databricks Runtime 13.2 and above, Python scalar UDFs and Pandas UDFs are supported.

    • In Databricks Runtime 13.1 and below, you cannot use Python UDFs, including UDAFs, UDTFs, and Pandas on Spark.

    See User-defined functions (UDFs) in Unity Catalog. A sketch of supported Python UDFs follows this list.

  • applyInPandas and mapInPandas are not supported.

  • Must run commands on cluster nodes as a low-privilege user forbidden from accessing sensitive parts of the filesystem. In Databricks Runtime 11.3 and below, you can only create network connections to ports 80 and 443.

  • Cannot connect to the instance metadata service (IMDS), other EC2 instances, or any other services running in the Databricks VPC. This prevents access to any service that uses the IMDS, such as boto3 and the AWS CLI.

Attempts to get around these restrictions will fail. These restrictions are in place so that users cannot access unauthorized data through the cluster.
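
As a concrete illustration of the sc and sqlContext restriction, the sketch below replaces the unsupported entry points with equivalent SparkSession calls. The sample dataset path is an assumption:

    # Unsupported on shared access mode clusters:
    #   rdd = sc.parallelize([1, 2, 3])
    #   df = sqlContext.sql("SELECT 1")

    # Supported equivalents through the `spark` (SparkSession) variable:
    df_values = spark.createDataFrame([(1,), (2,), (3,)], ["value"])  # replaces sc.parallelize
    df_range = spark.range(3)                                         # replaces sc.range
    df_sql = spark.sql("SELECT 1")                                    # replaces sqlContext.sql
    df_text = spark.read.text("/databricks-datasets/README.md")      # replaces sc.textFile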
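
The next sketch shows the UDF styles that the version notes above describe as supported on shared access mode clusters: a Python scalar UDF and a Pandas UDF. It is a minimal illustration, not an exhaustive list of what is allowed:

    import pandas as pd
    from pyspark.sql.functions import udf, pandas_udf
    from pyspark.sql.types import LongType

    # Python scalar UDF: supported on shared clusters in Databricks Runtime 13.2 and above.
    @udf(returnType=LongType())
    def plus_one(x):
        return x + 1

    # Pandas UDF: also supported on shared clusters in Databricks Runtime 13.2 and above.
    @pandas_udf(LongType())
    def pandas_plus_one(s: pd.Series) -> pd.Series:
        return s + 1

    df = spark.range(5)
    df.select(plus_one("id"), pandas_plus_one("id")).show()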

Structured Streaming limitations on Unity Catalog-enabled compute

Support for Structured Streaming on Unity Catalog tables depends on the Databricks Runtime version that you are running and on whether you are using shared or single user access mode.

Shared limitations

Support for shared clusters requires Databricks Runtime 12.2 LTS and above, with the following limitations:

  • Python only.

  • Apache Spark continuous processing mode is not supported. See Continuous Processing in the Spark Structured Streaming Programming Guide.

  • applyInPandasWithState is not supported.

  • Working with socket sources is not supported.

  • StreamingQueryListener cannot use credentials or interact with objects managed by Unity Catalog.

  • The sourceArchiveDir must be in the same external location as the source when you use option("cleanSource", "archive") with a data source managed by Unity Catalog, as shown in the first sketch following this list.

  • For Kafka sources and sinks, the following options are unsupported:

    • kafka.sasl.client.callback.handler.class

    • kafka.sasl.login.callback.handler.class

    • kafka.sasl.login.class

    • kafka.partition.assignment.strategy

  • The following Kafka options are supported in Databricks Runtime 13.0 and above but unsupported in Databricks Runtime 12.2 LTS. You can only specify external locations managed by Unity Catalog for these options (see the second sketch following this list):

    • kafka.ssl.truststore.location

    • kafka.ssl.keystore.location

  • You cannot use instance profiles to configure access to external sources such as Kafka or Kinesis for streaming workloads in shared access mode.
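
For the cleanSource archive option above, a minimal sketch. Both paths are hypothetical and must resolve to the same Unity Catalog external location:

    # Hypothetical paths inside one Unity Catalog external location.
    source_dir = "s3://my-bucket/landing/events"
    archive_dir = "s3://my-bucket/landing/archive"

    df = (spark.readStream
        .format("json")
        .schema("id LONG, ts TIMESTAMP")
        .option("cleanSource", "archive")         # archive files after processing
        .option("sourceArchiveDir", archive_dir)  # must share the source's external location
        .load(source_dir))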
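
And for the Kafka truststore and keystore options, a sketch for Databricks Runtime 13.0 and above. The broker address, topic, and certificate paths are hypothetical; the certificate files must live in a Unity Catalog external location:

    df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1.example.com:9093")
        .option("subscribe", "events")
        .option("kafka.security.protocol", "SSL")
        # These paths must point into a Unity Catalog external location:
        .option("kafka.ssl.truststore.location", "s3://my-bucket/certs/kafka.truststore.jks")
        .option("kafka.ssl.keystore.location", "s3://my-bucket/certs/kafka.keystore.jks")
        .load())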

Single user limitations

Support for single user access mode is available on Databricks Runtime 11.3 LTS and above, with the following limitations:

  • Apache Spark continuous processing mode is not supported. See Continuous Processing in the Spark Structured Streaming Programming Guide.

  • StreamingQueryListener cannot use credentials or interact with objects managed by Unity Catalog.

  • Asynchronous checkpointing is not supported in Databricks Runtime 11.3 LTS and below. It is supported in Databricks Runtime 12.0 and above.
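
As a sketch of the Databricks Runtime 12.0 and above behavior, asynchronous state checkpointing can be enabled through Spark configuration. The configuration keys below are assumptions based on the asynchronous state checkpointing feature, which requires the RocksDB state store provider:

    # Assumed configuration for asynchronous state checkpointing;
    # it requires the RocksDB state store provider.
    spark.conf.set(
        "spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled",
        "true",
    )
    spark.conf.set(
        "spark.sql.streaming.stateStore.providerClass",
        "com.databricks.sql.streaming.state.RocksDBStateStoreProvider",
    )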

See also Using Unity Catalog with Structured Streaming.