July 2018

These features and Databricks platform improvements were released in July 2018.

Note

Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.

Libraries API supports Python wheel files

July 31-August 7, 2018: Version 2.77

You can now install wheel libraries using the Libraries API. When you install a wheel library on a cluster running Databricks Runtime 4.2 or above, all of the dependencies specified in the library's setup.py file are also installed. When you install a wheel library on a cluster running Databricks Runtime 4.1 or below, only the wheel file itself is added to the PYTHONPATH; its dependencies are not installed.
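
As a rough sketch of how this can be used, the example below attaches a wheel that has already been uploaded to DBFS to a cluster through the Libraries API. The workspace URL, token, cluster ID, and wheel path are placeholders.

    # Sketch: attach a wheel stored in DBFS to a cluster via the Libraries API.
    # The workspace URL, token, cluster ID, and wheel path are placeholders.
    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    resp = requests.post(
        f"{DOMAIN}/api/2.0/libraries/install",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_id": "<cluster-id>",
            "libraries": [
                {"whl": "dbfs:/FileStore/jars/my_package-0.1.0-py3-none-any.whl"}
            ],
        },
    )
    resp.raise_for_status()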

IPython notebook export

July 31-August 7, 2018: Version 2.77

When you export a Databricks notebook to the IPython notebook format, results are now included in the export.
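
For reference, the Workspace API can also produce this format. The sketch below exports a notebook as an .ipynb file; the workspace URL, token, and notebook path are placeholders.

    # Sketch: export a notebook in IPython (Jupyter) format via the Workspace API.
    # The workspace URL, token, and notebook path are placeholders.
    import base64
    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    resp = requests.get(
        f"{DOMAIN}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": "/Users/someone@example.com/my-notebook", "format": "JUPYTER"},
    )
    resp.raise_for_status()

    # The exported notebook is returned base64-encoded in the "content" field.
    with open("my-notebook.ipynb", "wb") as f:
        f.write(base64.b64decode(resp.json()["content"]))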

New instance types (beta)

July 17-24, 2018: Version 2.76

We’re pleased to announce the availability of the following AWS EC2 instance types on Databricks clusters:

  • c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge

  • i3.16xlarge

  • m4.16xlarge

  • m5.large, m5.xlarge, m5.2xlarge, m5.4xlarge, m5.12xlarge

Warning

The new instance types may be unstable. Databricks does not recommend running production workloads on beta instance types.
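
If you want to try one of the new instance types outside the cluster creation UI, the Clusters API accepts the instance type as node_type_id. The sketch below is illustrative only; the workspace URL, token, runtime version, and worker count are assumptions, not recommendations.

    # Sketch: create a small test cluster on one of the new instance types.
    # All values (workspace URL, token, runtime version, worker count) are illustrative.
    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    resp = requests.post(
        f"{DOMAIN}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "m5-beta-test",
            "spark_version": "4.2.x-scala2.11",
            "node_type_id": "m5.xlarge",
            "num_workers": 2,
        },
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])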

Cluster mode and High Concurrency clusters

July 17-24, 2018: Version 2.76

When you create a cluster, the Cluster Type option is now called Cluster Mode, and the Serverless Pool option has been replaced by the High Concurrency cluster mode. High Concurrency clusters are tuned to provide efficient resource utilization, isolation, security, and the best performance when shared by multiple concurrently active users. A High Concurrency cluster supports only the SQL, Python, and R languages. High Concurrency clusters provide all the benefits of serverless pools while also giving you flexibility in Spark and resource configuration. For further information, see Compute configuration reference.
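
When creating clusters through the Clusters API rather than the UI, High Concurrency mode has historically been selected with a cluster profile entry in spark_conf. The sketch below shows that pattern; treat the exact keys and values as assumptions to verify against the Clusters API documentation for your deployment.

    # Sketch: request a High Concurrency cluster via the Clusters API.
    # The spark_conf keys that select the mode are assumptions; verify them
    # against the Clusters API documentation.
    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    resp = requests.post(
        f"{DOMAIN}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "high-concurrency-demo",
            "spark_version": "4.2.x-scala2.11",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
            "spark_conf": {
                # "serverless" is the legacy profile value for High Concurrency mode.
                "spark.databricks.cluster.profile": "serverless",
                "spark.databricks.repl.allowedLanguages": "sql,python,r",
            },
        },
    )
    resp.raise_for_status()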

Table access control

July 17-24, 2018: Version 2.76

The Table Access Control checkbox is available only for High Concurrency clusters.
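
For clusters created through the API rather than the checkbox, table access control has historically been enabled by adding flags to spark_conf on a High Concurrency cluster. The fragment below is a sketch of that pattern; treat the keys as assumptions and confirm them against the current documentation.

    # Sketch: spark_conf fragment that has historically enabled table access control
    # on a High Concurrency cluster created through the Clusters API.
    # Treat these keys as assumptions to confirm against the documentation.
    spark_conf = {
        "spark.databricks.cluster.profile": "serverless",
        "spark.databricks.acl.dfAclsEnabled": "true",
        "spark.databricks.repl.allowedLanguages": "sql,python",
    }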


RStudio integration

July 3-10, 2018: Version 2.75

Databricks now integrates with RStudio Server, the popular IDE for R. With this powerful new integration, you can:

  • Launch the RStudio UI directly from Databricks.

  • Import the SparkR and sparklyr packages inside the RStudio IDE.

  • Access, explore, and transform large datasets from the RStudio IDE using Apache Spark.

  • Execute and monitor Spark jobs on a Databricks cluster.

  • Manage your code using version control.

  • Use either the Open Source or Pro edition of RStudio Server on Databricks.

RStudio integration requires the Premium plan or above. You must install the integration on a High Concurrency cluster. For details, see RStudio on Databricks.

R Markdown support

July 3-10, 2018: Version 2.75

Databricks R notebooks can be exported to R Markdown format, and R Markdown documents can be imported as Databricks notebooks.

Home page redesign, with ability to drop files to import data

July 3-10, 2018: Version 2.75

The redesigned home page has a cleaner, simpler interface, with links to an improved Getting Started tutorial and the ability to drag and drop files to import data. See Explore and create tables in DBFS.

Widget default behavior

July 3-10, 2018: Version 2.75

The default execution behavior when a new value is selected for a widget is now Do Nothing. If you want a widget value change to rerun the entire notebook, or only the commands that use that value, update the widget settings. See Configure widget settings.
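
As a quick refresher on the notebook side, widgets are created with dbutils.widgets; what happens when a value changes (Do Nothing, rerun the notebook, or rerun the commands that use the widget) is chosen in the widget panel settings, not in code. The widget name and choices below are illustrative.

    # Sketch: create a dropdown widget in a Databricks Python notebook and read its value.
    # dbutils is provided by the notebook environment; the widget name and choices
    # are illustrative. What reruns when the value changes is controlled by the
    # widget panel settings (default: Do Nothing).
    dbutils.widgets.dropdown("stage", "dev", ["dev", "staging", "prod"], "Deployment stage")

    stage = dbutils.widgets.get("stage")
    print(f"Selected stage: {stage}")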

Table creation UI

July 3-10, 2018: Version 2.75

When you create a table in the UI, you now select Add Data from the Data page.


See Explore and create tables in DBFS.

Multi-line JSON data import

July 3-10, 2018: Version 2.75

You can now import multi-line JSON data files when you are creating tables. Previously, JSON files had to be formatted with one record per line. See Explore and create tables in DBFS.
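
The same capability is available when reading data with Spark directly: the multiLine option of the JSON reader (Spark 2.2 and above) parses records that span multiple lines. The file path and table name below are placeholders.

    # Sketch: read a multi-line JSON file with Spark and save it as a table.
    # The file path and table name are placeholders; spark is the SparkSession
    # provided in Databricks notebooks.
    df = (
        spark.read
        .option("multiLine", "true")
        .json("dbfs:/FileStore/tables/events.json")
    )

    df.write.saveAsTable("events")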