SQL Analytics release notes

Preview

This feature is in Public Preview. Contact your Databricks representative to request access.

May 13, 2021

  • SQL Analytics no longer tries to guess column types. Previously, a column with the format xxxx-yy-dd was automatically treated as a date, even if it was an identification code. Now that column is no longer automatically treated as a date; if you want it interpreted as a date, you must cast it explicitly in the query (see the sketch after this list). This change may cause some visualizations that relied on the previous behavior to stop working. In this release, you can change the User Settings Icon > Settings > Backwards Compatibility option to return to the previous behavior. In a future release we will remove that capability.

  • The query editor now has a query progress indicator. State changes are now visible in a continually updated progress bar.

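The following is a minimal sketch of casting a column explicitly when you do want it treated as a date; the table and column names are hypothetical:

    -- item_code looks like a date (for example, '2021-05-13') but is an
    -- identification code, so it is left as a string.
    -- sale_date is stored as a 'yyyy-MM-dd' string and is cast explicitly
    -- because it should be treated as a date.
    SELECT
      item_code,
      CAST(sale_date AS DATE) AS sale_date
    FROM daily_sales;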

May 6, 2021

  • You can now download the contents of the dashboard as a PDF. See Download as PDF.
  • Admin users now have view access to all queries and dashboards. In this view, an admin can view and delete any query or dashboard, but cannot edit a query or dashboard unless it has been shared with the admin. See Query admin view and Dashboard admin view.
  • The ability to increase endpoint concurrency with multi-cluster load balancing is now available for all accounts. You can create endpoints that autoscale between specified minimum and maximum cluster counts. Overloaded endpoints will scale up and underloaded endpoints will scale down.

April 29, 2021

  • Query options and details are now organized in a set of tabs to the left of the query editor.

April 22, 2021

  • Fixed an issue in which endpoints were inaccessible and appeared to be deleted due to an internal error.

April 16, 2021

SQL Analytics maintains compatibility with Apache Spark SQL semantics. This release updates those semantics to match Apache Spark 3.1. Previously, SQL Analytics was aligned with Apache Spark 3.0 semantics.

  • Statistical aggregation functions, including std, stddev, stddev_samp, variance, var_samp, skewness, kurtosis, covar_samp, and corr, return NULL instead of Double.NaN when a divide-by-zero occurs during expression evaluation, for example, when stddev_samp is applied to a single-element set (see the sketch after this list). Prior to this release, these functions returned Double.NaN.
  • grouping_id() returns long values. Prior to this release, this function returned int values.
  • Query plan explain results are now formatted.
  • from_unixtime, unix_timestamp, to_unix_timestamp, to_timestamp, and to_date fail if the specified datetime pattern is invalid. Prior to this release, they returned NULL.
  • The Parquet, ORC, Avro, and JSON data sources throw the exception org.apache.spark.sql.AnalysisException: “Found duplicate column(s) in the data schema” during reads if they detect duplicate names in top-level columns as well as in nested structures.
  • Structs and maps are wrapped in {} brackets when cast to strings. For instance, the show() action and the CAST expression use these brackets. Prior to this release, [] brackets were used for the same purpose.
  • NULL elements of structs, arrays, and maps are converted to “null” when cast to strings. Prior to this release, NULL elements were converted to empty strings.
  • When the sum of a decimal type column overflows, the result is NULL. Prior to this release, the sum might return NULL or an incorrect result, or even fail at runtime (depending on the actual query plan execution).
  • IllegalArgumentException is returned for incomplete interval literals, for example, INTERVAL '1' and INTERVAL '1 DAY 2', which are invalid. Prior to this release, these literals resulted in NULLs.
  • Loading and saving timestamps to and from Parquet files fails if the timestamps are before 1900-01-01 00:00:00Z and are loaded (saved) as the INT96 type. Prior to this release, these operations did not fail, but they could shift the input timestamps due to rebasing between the Julian and Proleptic Gregorian calendars.
  • The schema_of_json and schema_of_csv functions return the schema in the SQL format, in which field names are quoted. Prior to this release, these functions returned a catalog string without field quoting and in lowercase.
  • CHAR, CHARACTER, and VARCHAR types are supported in table schemas. Table scans and insertions respect char/varchar semantics. If char/varchar is used in places other than table schemas, an exception is thrown (CAST is an exception; it simply treats char/varchar as string, as before).
  • The following exceptions are thrown for tables from Hive external catalog:
    • ALTER TABLE .. ADD PARTITION throws PartitionsAlreadyExistException if the new partitions already exist.
    • ALTER TABLE .. DROP PARTITION throws NoSuchPartitionsException for partitions that do not exist.
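
The following is a minimal sketch of two of these changes, using hypothetical literal values; the comments describe the behavior after this release:

    -- stddev_samp applied to a single-element set now returns NULL
    -- (prior to this release it returned NaN).
    SELECT stddev_samp(col) FROM VALUES (1) AS t(col);

    -- Structs are wrapped in {} brackets when cast to strings (prior to this
    -- release [] brackets were used), so this returns '{1, 2}'.
    SELECT CAST(named_struct('a', 1, 'b', 2) AS STRING);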

April 13, 2021

  • Improved query throughput with SQL endpoint queuing. Queries submitted to a SQL endpoint now queue when the endpoint is already saturated with running queries. This improves query throughput by not overloading the endpoint with requests. You can view the improved performance in the endpoint monitoring screen.

April 01, 2021

  • Quickly find the time spent in compilation, execution, and result fetching for a query in Query History. See View query time. Previously this information was only available by clicking a query and opening the Execution Details tab.
  • SQL endpoints no longer scale beyond the specified maximum number of clusters. All clusters allocated to a SQL endpoint are recycled after 24 hours, which can create a brief window in which there is one additional cluster.

March 18, 2021

  • Autocomplete in the query editor now supports Databricks SQL syntax and is context and alias aware. See Construct a query.
  • JDBC and ODBC requests no longer fail with invalid session errors after the session times out on the server. BI clients are now able to seamlessly recover when session timeouts occur.

March 11, 2021

  • Administrators and users in workspaces newly enabled for SQL Analytics no longer automatically have access to SQL Analytics. To enable access to SQL Analytics, the administrator must:
    1. Go to the admin console.
    2. Click the Users tab.
    3. In the row for their account, click the SQL Analytics access checkbox.
    4. Click Confirm.
    5. Repeat steps 3 and 4 to grant users access to SQL Analytics or follow the instructions in Grant a group access to SQL Analytics to grant access to groups.
  • Easily create queries, dashboards, and alerts by selecting Create Icon > [Query | Dashboard | Alert] at the top of the sidebar.
  • Query Editor now saves drafts, and you can revert to a saved query. See Revert to a saved query.
  • You can no longer create external data sources.
  • The reliability of the SQL endpoint monitoring chart has been improved. The chart no longer intermittently shows spurious error messages.

March 04, 2021

  • The Queries and Dashboards API documentation is now available. See Queries and Dashboards API.
  • Scheduled dashboard refreshes are now always performed. Refreshes run even when the dashboard is not open in the web application, so you no longer need to keep the dashboard open in a browser. See Automatically refresh a dashboard.
  • New SQL endpoints created using the SQL Endpoints API now have Auto Stop enabled with a default timeout of two hours.
  • Tableau Online users can now connect to SQL endpoints. See the new Tableau Online quickstart.
  • SQL endpoints no longer fail to launch due to inadequate AWS resources in a single availability zone.

February 26, 2021

The new Power BI connector for Azure Databricks, released in public preview in September 2020, is now GA. It provides:

  • Simple connection configuration: the new Power BI Databricks connector is integrated into Power BI, and you configure it using a simple dialog with a couple of clicks.
  • Faster imports and optimized metadata calls, thanks to the new Databricks ODBC driver, which comes with significant performance improvements.
  • Access to Databricks data through Power BI respects Databricks table access control.

For more information, see Power BI.

February 25, 2021

  • Setting permissions on a SQL endpoint is now faster. It is presented as a step right after you create a new SQL endpoint and is easily accessible when you edit an existing endpoint. See Create a SQL endpoint and Edit endpoint permissions.
  • To reuse visualization settings you can now duplicate a visualization. See Duplicate a visualization.
  • Query results are now stored in your account instead of the Databricks account.
  • To prevent leaking information by listing all permissions defined on an object, to run SHOW GRANTS [<user>] <object> you must be one of the following (see the sketch after this list):
    • A SQL Analytics administrator or the owner of <object>.
    • The user specified in [<user>].
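
The following is a minimal sketch, assuming the <object> clause takes the form ON TABLE <table-name>; the user and table names are hypothetical:

    -- Must be run by a SQL Analytics administrator, by the owner of
    -- default.orders, or by alice@example.com.
    SHOW GRANTS `alice@example.com` ON TABLE default.orders;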

January 07, 2021

  • To reduce spending on idle endpoints, new SQL endpoints now have Auto Stop enabled with a default timeout of two hours. After the timeout is reached, the endpoint is stopped. You can edit the timeout period or disable Auto Stop at any time.
  • Except for TEXT type query parameters, quotation marks are no longer added to query parameters. If you have used Dropdown List, Query Based Dropdown List, or any Date type query parameters, you must add quotation marks in order for the query to work. For example, if your query is SELECT {{ d }}, this query must now be SELECT '{{ d }}', as shown in the sketch below.
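
The following is a minimal sketch of quoting a Date parameter explicitly; the table, column, and parameter names are hypothetical:

    -- Quotation marks around the Date parameter are now required; they are
    -- no longer added automatically.
    SELECT *
    FROM orders
    WHERE order_date = '{{ order_date }}';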

November 18, 2020

Databricks is pleased to introduce the Public Preview of SQL Analytics, an intuitive environment for running ad-hoc queries and creating dashboards on data stored in your data lake. SQL Analytics empowers your organization to operate a multi-cloud lakehouse architecture that provides data warehousing performance with data lake economics while providing a delightful SQL analytics user experience. SQL Analytics:

  • Integrates with the BI tools you use today, like Tableau and Microsoft Power BI, to query the most complete and recent data in your data lake.
  • Complements existing BI tools with a SQL-native interface that allows data analysts and data scientists to query data lake data directly within Databricks.
  • Enables you to share query insights through rich visualizations and drag-and-drop dashboards with automatic alerting for important data changes.
  • Uses SQL endpoints to bring reliability, quality, scale, security, and performance to your data lake, so you can run traditional analytics workloads using your most recent and complete data.
  • Introduces the USAGE privilege to simplify data access administration. To use an object in a database, you must be granted the USAGE privilege on that database in addition to any privileges you need to perform the action. The USAGE privilege can be granted to databases or to the catalog. For workspaces that already use table access control, the USAGE privilege is granted automatically to the users group on the root CATALOG. See Data access control for details, and see the sketch after this list for an example.
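
The following is a minimal sketch of granting the USAGE privilege together with an object privilege; the database, table, and group names are hypothetical:

    -- USAGE on the database is required in addition to the privilege on the table.
    GRANT USAGE ON DATABASE sales_db TO `analysts`;
    GRANT SELECT ON TABLE sales_db.orders TO `analysts`;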

See the Databricks SQL Analytics guide for details. Contact your Databricks representative to request access.

Known issues

April 16, 2021

  • To use an external metastore, you must set spark.hadoop.javax.jdo.option.ConnectionDriverName in the data access configuration (see the sketch after this list).
  • You cannot use a MySQL 8.0 external metastore. Queries fail with Auto reconnect is not allowed for this connection.
  • DESCRIBE DETAIL commands on Delta tables fail with java.lang.ClassCastException: java.sql.Timestamp cannot be cast to java.time.Instant.
  • Reading Parquet files with INT96 timestamps fails. INT96 timestamps can be present in Parquet files written by Apache Spark 2.x.
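
The following is a minimal sketch of the relevant data access configuration line; the driver class shown is a typical choice for MySQL-compatible metastores and is an assumption, not part of the original note:

    spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver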

November 18, 2020

  • Reads from data sources other than Delta Lake in multi-cluster load balanced SQL endpoints can be inconsistent.
  • Photon is disabled for writes (for example, CREATE TABLE AS SELECT).
  • When a user with Can Run permission runs a query created by another user, the query history displays the creator of the query, not the user who ran it, as the user.
  • Delta tables accessed in SQL Analytics upload their schema and table properties to the configured metastore. If you are using an external metastore, you will be able to see Delta Lake information in the metastore. Delta Lake tries to keep this information as up-to-date as possible on a best-effort basis. You can also use the DESCRIBE <table> command to ensure that the information is updated in your metastore.

Frequently asked questions (FAQ)

How are SQL Analytics workloads charged?

SQL Analytics workloads are charged according to the SQL Compute (Preview) SKU.

Is the SQL Analytics Public Preview supported in production?

Yes, it is fully supported. See Databricks preview releases.

Where do SQL endpoints run?

Like Databricks Workspace clusters, SQL endpoints are created and managed in your AWS account. SQL endpoints manage SQL-optimized clusters automatically in your account and scale to match end-user demand.

Can I use SQL endpoints from Databricks Workspace SQL notebooks?

No. You can use SQL endpoints from SQL Analytics queries, BI tools, and other JDBC/ODBC clients.

I have been granted access to data using a cloud provider credential. Why can’t I access this data in SQL Analytics?

In SQL Analytics, all access to data is subject to data access control, and an administrator or data owner must first grant you the appropriate privileges.