TensorFlow is an open-source framework for machine learning created by Google. It supports deep learning and general numerical computation on CPUs, GPUs, and clusters of GPUs. It is subject to the terms and conditions of the Apache 2.0 License.

In the sections below, we provide guidance on installing TensorFlow on Databricks and give an example of running TensorFlow programs. See Integrating Deep Learning Libraries with Apache Spark for an example of integrating a deep learning library with Spark.


This guide is not a comprehensive guide to TensorFlow. For complete documentation, see the TensorFlow website.

Install TensorFlow

TensorFlow versions included in Databricks Runtime ML

Databricks Runtime ML includes TensorFlow and TensorBoard, so you can use these libraries without installing any packages. The following TensorFlow versions are included:

  • Databricks Runtime 5.5 ML: 1.13.1
  • Databricks Runtime 5.1 - 5.4 ML: 1.12.0
  • Databricks Runtime 5.0 ML: 1.10.0
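If a notebook needs to know which TensorFlow version ships with the runtime it targets, the list above can be kept as a small lookup table. This is a minimal sketch; the table name, the function name bundled_tensorflow_version, and the "major.minor" input format are assumptions for illustration, not a Databricks API.

```python
# Hypothetical lookup table: TensorFlow versions bundled with each
# Databricks Runtime ML release, keyed by "major.minor" runtime version.
BUNDLED_TENSORFLOW = {
    "5.0": "1.10.0",
    "5.1": "1.12.0",
    "5.2": "1.12.0",
    "5.3": "1.12.0",
    "5.4": "1.12.0",
    "5.5": "1.13.1",
}


def bundled_tensorflow_version(runtime_ml_version: str) -> str:
    """Return the TensorFlow version bundled with a Runtime ML release.

    Accepts inputs such as "5.5" or "5.5 ML". Raises KeyError for
    releases that the table does not cover.
    """
    # Keep only the leading "major.minor" token.
    key = runtime_ml_version.split()[0]
    return BUNDLED_TENSORFLOW[key]


print(bundled_tensorflow_version("5.3 ML"))  # 1.12.0
```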

Install TensorFlow on Databricks Runtime ML

Install TensorFlow 1.14 and 2.0 Beta on Databricks Runtime 5.5 ML

Install TensorFlow as a Databricks PyPI library. Specify it as follows, replacing <tensorflow version> with either 1.14.0 or 2.0.0-beta1:

  • CPU: tensorflow==<tensorflow version>
  • GPU: tensorflow-gpu==<tensorflow version>
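The CPU and GPU requirement strings differ only in the package name. As a sketch, the specifier to enter in the PyPI library form could be built as shown below; the helper name tensorflow_requirement is hypothetical, and the allowed versions are the two named in this section.

```python
SUPPORTED_VERSIONS = ("1.14.0", "2.0.0-beta1")


def tensorflow_requirement(version: str, gpu: bool = False) -> str:
    """Build the PyPI requirement string for a CPU or GPU TensorFlow build."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"expected one of {SUPPORTED_VERSIONS}, got {version!r}")
    # GPU builds of these releases are published as the separate
    # "tensorflow-gpu" package.
    package = "tensorflow-gpu" if gpu else "tensorflow"
    return f"{package}=={version}"


print(tensorflow_requirement("1.14.0"))                 # tensorflow==1.14.0
print(tensorflow_requirement("2.0.0-beta1", gpu=True))  # tensorflow-gpu==2.0.0-beta1
```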

Install TensorFlow 1.14 and 2.0 Beta on Databricks Runtime 5.4 ML

Databricks provides instructions for installing newer releases of TensorFlow on Databricks Runtime ML so that you can try out the latest TensorFlow features. On Databricks Runtime 5.4 ML, Databricks recommends installing newer TensorFlow versions using init scripts. In the scripts below, replace <tensorflow version> with either 1.14.0 or 2.0.0-beta1:

  • Init script for CPU clusters:

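    #!/bin/bash
    # Stop at the first failing command
    set -e
    # Log the Python version used by the cluster
    /databricks/python/bin/python -V
    # Activate the conda environment used by notebooks
    . /databricks/conda/etc/profile.d/conda.sh
    conda activate /databricks/python
    # Install wrapt with conda before pip installs TensorFlow, which depends on it
    conda install -y -c anaconda wrapt=1.11.1
    pip install tensorflow==<tensorflow version>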
  • Init script for GPU clusters:

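    #!/bin/bash
    # Stop at the first failing command
    set -e
    /databricks/python/bin/python -V
    # Activate the conda environment used by notebooks
    . /databricks/conda/etc/profile.d/conda.sh
    conda activate /databricks/python
    # Remove the conda cuDNN package and install the system libcudnn7 package instead
    conda uninstall -y cudnn
    # Replace <cudnn version> with a libcudnn7 package version built for CUDA 10.0
    sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
            libcudnn7=<cudnn version>
    # The TensorFlow 1.14 and 2.0 Beta GPU builds target CUDA 10.0
    conda install -y -c anaconda wrapt=1.11.1 cudatoolkit=10.0
    pip install tensorflow-gpu==<tensorflow version>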
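Init scripts are often generated from a notebook rather than typed by hand. The sketch below builds the CPU init script for a given TensorFlow version; the function name cpu_init_script is hypothetical, and only the CPU variant is covered because the GPU variant needs a cuDNN package version that you must choose yourself.

```python
def cpu_init_script(tensorflow_version: str) -> str:
    """Build the CPU init script for the given TensorFlow version."""
    lines = [
        "#!/bin/bash",
        "set -e",
        "/databricks/python/bin/python -V",
        ". /databricks/conda/etc/profile.d/conda.sh",
        "conda activate /databricks/python",
        "conda install -y -c anaconda wrapt=1.11.1",
        f"pip install tensorflow=={tensorflow_version}",
    ]
    # Join into a bash script that ends with a trailing newline.
    return "\n".join(lines) + "\n"


script = cpu_init_script("1.14.0")
print(script.splitlines()[-1])  # pip install tensorflow==1.14.0
```

In a notebook, the generated text could then be written to DBFS with dbutils.fs.put("dbfs:/databricks/init-scripts/install-tensorflow.sh", script, True) and referenced in the cluster's init script settings; that path is only an example.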

Install TensorFlow on Databricks Runtime

Install TensorFlow on Databricks Runtime using Databricks Library Utilities by executing the command below in a notebook cell. Replace <tensorflow version> with either 1.14.0 or 2.0.0-beta1 below:

  • CPU: dbutils.library.installPyPI("tensorflow", version="<tensorflow version>")
  • GPU: dbutils.library.installPyPI("tensorflow-gpu", version="<tensorflow version>")

Because of package dependencies, these TensorFlow versions might have compatibility issues with other libraries installed on the cluster.
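After installing (and reattaching the notebook if needed), you can confirm which TensorFlow build the notebook actually imports. This is a minimal sketch; the helper name is hypothetical, and it relies only on tf.__version__ and tf.test.is_built_with_cuda(), which distinguishes a tensorflow-gpu install from a CPU-only one.

```python
def installed_tensorflow():
    """Return (version, built_with_cuda) for the importable TensorFlow,
    or None if TensorFlow cannot be imported in this Python process."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # __version__ is the installed release string, e.g. "1.14.0".
    return tf.__version__, tf.test.is_built_with_cuda()


info = installed_tensorflow()
if info is None:
    print("TensorFlow is not installed in this environment")
else:
    print("TensorFlow version:", info[0], "| built with CUDA:", info[1])
```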


TensorBoard is TensorFlow’s suite of visualization tools for debugging, optimizing, and understanding TensorFlow programs.

Using TensorBoard

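To start TensorBoard from your notebook, use the dbutils.tensorboard utility and pass it the log directory that TensorBoard should read, for example dbutils.tensorboard.start("/tmp/tensorflow_log_dir").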


This command displays a link that, when clicked, opens TensorBoard in a new tab.


TensorBoard reads from the same log directory that you write to in TensorFlow (for example, tf.summary.FileWriter("/tmp/tensorflow_log_dir", graph=sess.graph)). For the best performance, we recommend that you store your log files in a local directory on the driver, such as /tmp/tensorflow_log_dir, and copy them to persistent storage as needed.
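Copying logs from the driver's local directory to persistent storage can be a plain directory copy. A sketch, assuming persistent storage is reachable through a local path on the driver (on Databricks, DBFS is typically mounted at /dbfs); the function name and the destination path in the comment are hypothetical.

```python
import os
import shutil


def persist_tensorboard_logs(local_dir: str, persistent_dir: str) -> int:
    """Copy every file under local_dir into persistent_dir, keeping the
    relative layout. Existing files are overwritten, so this can be called
    repeatedly during training. Returns the number of files copied."""
    copied = 0
    for root, _dirs, files in os.walk(local_dir):
        rel = os.path.relpath(root, local_dir)
        target_root = os.path.normpath(os.path.join(persistent_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # copyfile copies contents only, avoiding metadata operations
            # that some mounted file systems may not support.
            shutil.copyfile(os.path.join(root, name), os.path.join(target_root, name))
            copied += 1
    return copied


# Example (paths are illustrative):
# persist_tensorboard_logs("/tmp/tensorflow_log_dir", "/dbfs/tensorboard/my_run")
```

The walk uses os.walk rather than shutil.copytree(..., dirs_exist_ok=True) because that argument requires Python 3.8 or later.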

TensorBoard continues to run until you either stop it with dbutils.tensorboard.stop() or you shut down your cluster. Only one instance of TensorBoard can run on a cluster at a time.


If you attach TensorFlow to your cluster as a Databricks library, you may need to reattach your notebook before starting TensorBoard.

Set up TensorBoard on Databricks Runtime 3.3 to 4.2


  • We recommend using TensorBoard with Databricks Runtime 4.3 or above, which do not require the setup steps described below.
  • Databricks Runtime 4.2 and lower (including Databricks Runtime 4.1 ML (Beta)) do not support TensorBoard on Spark clusters with public IP addresses disabled or on Community Edition accounts.

To run TensorBoard on your Databricks cluster using Databricks Runtime 3.3 to 4.2, you must update the Databricks security group in your AWS account to allow incoming TensorBoard connections. You must specify which IP addresses are allowed to connect to TensorBoard: either an individual IP address or a range that covers your entire office network. You or your admin only need to complete this step once. To set it up:

  1. In your AWS console, find the Databricks security group. It will have a label similar to <databricks-instance>-worker-unmanaged. For example, dbc-fb3asdddd3-worker-unmanaged.

  2. Edit the security group and add an inbound TCP rule that allows port 6006 to the worker machines. The source can be the single IP address of your machine or a range. Make sure your laptop and office network allow outgoing TCP traffic on port 6006.

  3. Click Save.


    Anyone with an allowed IP address will be able to access TensorBoard.
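If you manage the security group with the AWS SDK instead of the console, the inbound rule from step 2 corresponds to one IpPermissions entry for TCP port 6006. This sketch builds and validates that entry with the standard-library ipaddress module; the helper name and the example addresses are hypothetical, and applying the rule assumes boto3 and AWS credentials allowed to modify the group.

```python
import ipaddress

TENSORBOARD_PORT = 6006


def tensorboard_ingress_permission(cidr: str) -> dict:
    """Build an EC2 IpPermissions entry allowing TCP 6006 from cidr.

    A bare address such as "198.51.100.7" becomes the /32 range for
    that single machine.
    """
    # Validate the range; raises ValueError for malformed input or for a
    # network range with host bits set (e.g. "203.0.113.7/24").
    network = ipaddress.ip_network(cidr)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return {
        "IpProtocol": "tcp",
        "FromPort": TENSORBOARD_PORT,
        "ToPort": TENSORBOARD_PORT,
        "IpRanges": [{"CidrIp": str(network), "Description": "TensorBoard"}],
    }


print(tensorboard_ingress_permission("198.51.100.7")["IpRanges"][0]["CidrIp"])  # 198.51.100.7/32
```

With boto3, the entry could be applied as boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=[tensorboard_ingress_permission("203.0.113.0/24")]), using the ID of the security group found in step 1.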

Use TensorFlow on a single node

To test and migrate single-machine TensorFlow workflows, you can start with a driver-only cluster on Databricks by setting the number of workers to zero. Although Apache Spark is not functional in this setting, it is a cost-effective way to run single-machine TensorFlow workflows. This example shows how you can run TensorFlow with TensorBoard monitoring on a driver-only cluster.
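If you create clusters through the Databricks Clusters API (the /api/2.0/clusters/create endpoint) rather than the UI, a driver-only cluster is one whose num_workers is 0. A sketch of the request body as a Python dict; the helper name is hypothetical, and the cluster name, runtime version key, and node type are placeholders you would replace with values valid in your workspace.

```python
import json


def driver_only_cluster_spec(name: str, spark_version: str, node_type_id: str) -> dict:
    """Request body for a cluster with a driver and zero workers."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # Databricks Runtime ML version key
        "node_type_id": node_type_id,    # instance type, also used for the driver
        "num_workers": 0,                # driver-only: no worker nodes
    }


spec = driver_only_cluster_spec("tf-single-node", "<runtime version key>", "<node type>")
print(json.dumps(spec, indent=2))
```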

Spark-TensorFlow data conversion

spark-tensorflow-connector is a library within the TensorFlow ecosystem that enables conversion between Spark DataFrames and TFRecords (a popular format for storing data for TensorFlow). With spark-tensorflow-connector, you can use Spark DataFrame APIs to read TFRecords files into DataFrames and write DataFrames as TFRecords.
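With the connector attached, TFRecords are read and written through the "tfrecords" data source, whose recordType option selects tf.train.Example records (the default) or tf.train.SequenceExample records. A sketch assuming an existing SparkSession spark and DataFrame df; the helper names and the path in the usage comment are hypothetical.

```python
VALID_RECORD_TYPES = ("Example", "SequenceExample")


def tfrecord_options(record_type: str = "Example") -> dict:
    """Options passed to the spark-tensorflow-connector data source."""
    if record_type not in VALID_RECORD_TYPES:
        raise ValueError(f"unsupported recordType: {record_type!r}")
    return {"recordType": record_type}


def write_tfrecords(df, path: str, record_type: str = "Example") -> None:
    """Write a Spark DataFrame as TFRecord files under path."""
    (df.write.format("tfrecords")
        .options(**tfrecord_options(record_type))
        .mode("overwrite")
        .save(path))


def read_tfrecords(spark, path: str, record_type: str = "Example"):
    """Read TFRecord files under path into a Spark DataFrame."""
    return (spark.read.format("tfrecords")
            .options(**tfrecord_options(record_type))
            .load(path))


# Usage in a notebook (path is illustrative):
# write_tfrecords(df, "/tmp/example.tfrecord")
# df2 = read_tfrecords(spark, "/tmp/example.tfrecord")
```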



The spark-tensorflow-connector library is included in Databricks Runtime ML, a machine learning runtime that provides a ready-to-go environment for machine learning and data science. Instead of installing the library using the instructions below, you can simply create a cluster using Databricks Runtime ML. See Overview of Databricks Runtime for Machine Learning.

To use spark-tensorflow-connector on Databricks, you’ll need to build the project JAR locally, upload it to Databricks, and attach it to your cluster as a library.

  1. Ensure you have Maven in your PATH (see the Maven installation instructions if needed).

  2. Clone the TensorFlow ecosystem repository and cd into the spark-tensorflow-connector subdirectory:

    git clone https://github.com/tensorflow/ecosystem
    cd ecosystem/spark/spark-tensorflow-connector
  3. Follow the instructions in the README to build the project locally. For the build to succeed, you may need to modify the test configuration so that tests run serially. You can do this by adding a <configuration> tag to the scalatest plugin in ecosystem/spark/spark-tensorflow-connector/pom.xml:


    The build command prints the path of the spark-tensorflow-connector JAR, for example:

    Installing /Users/<yourusername>/ecosystem/spark/spark-tensorflow-connector/target/spark-tensorflow-connector_2.11-1.6.0.jar
    to /Users/<yourusername>/.m2/repository/org/tensorflow/spark-tensorflow-connector_2.11/1.6.0/spark-tensorflow-connector_2.11-1.6.0.jar
  4. Upload this JAR to Databricks as a library and attach it to your cluster. You should now be able to run the example notebook (adapted from the spark-tensorflow-connector usage examples):