Databricks Connect for Python tutorial

This article demonstrates how to quickly get started with Databricks Connect by using Python and PyCharm. For the Scala version of this tutorial, see the Databricks Connect for Scala tutorial.

Databricks Connect enables you to connect popular IDEs such as PyCharm, notebook servers, and other custom applications to Databricks clusters.


This article covers Databricks Connect for Databricks Runtime 13.0 and above.

For information beyond this tutorial about Databricks Connect for Databricks Runtime 13.0 and above, see the Databricks Connect reference.

For information about Databricks Connect for prior Databricks Runtime versions, see Databricks Connect for Databricks Runtime 12.2 LTS and below.


Requirements

  • You have PyCharm installed.

  • You have a Databricks workspace and its corresponding account that are enabled for Unity Catalog. See Get started using Unity Catalog and Enable a workspace for Unity Catalog.

  • You have a Databricks cluster in the workspace. The cluster runs Databricks Runtime 13.0 or above and uses the Assigned or Shared cluster access mode. See Access modes.

  • You have Python 3 installed on your development machine, and the minor version of your client Python installation is the same as the minor Python version of your Databricks cluster. The following table shows the Python version installed with each Databricks Runtime.

    Databricks Runtime version                  Python version
    ------------------------------------------  --------------
    13.0 ML - 13.3 ML LTS, 13.0 - 13.3 LTS      3.10

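Because Databricks Connect requires the client's minor Python version to match the cluster's, it can help to verify the local interpreter before continuing. The following is a minimal sketch; the helper name `python_matches` and the `(3, 10)` default (the Python version bundled with Databricks Runtime 13.x) are illustrative, so adjust the target to your cluster's runtime:

```python
import sys

def python_matches(cluster_minor=(3, 10)):
    """Return True if the local interpreter's (major, minor) version matches
    the cluster's Python version. (3, 10) is assumed for Databricks Runtime 13.x."""
    return tuple(sys.version_info[:2]) == tuple(cluster_minor)

# Print the local version and whether it matches the assumed cluster version.
print(sys.version_info[:2], python_matches())
```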

To complete this tutorial, follow these steps:

Step 1: Create a personal access token

This tutorial uses Databricks personal access token authentication and a Databricks configuration profile for authenticating with your Databricks workspace. If you already have a Databricks personal access token and a matching Databricks configuration profile, skip ahead to Step 3.

To create a personal access token:

  1. In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down menu.

  2. Click Developer.

  3. Next to Access tokens, click Manage.

  4. Click Generate new token.

  5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token’s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).

  6. Click Generate.

  7. Copy the displayed token to a secure location, and then click Done.

    Be sure to save the copied token in a secure location, and do not share it with others. If you lose the copied token, you cannot regenerate that exact same token; you must repeat this procedure to create a new one. If you lose the token, or you believe that it has been compromised, Databricks strongly recommends that you immediately delete it from your workspace by clicking the X next to the token on the Access tokens page.


    If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. Contact your workspace administrator.
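As an alternative to storing the token in a file, Databricks client unified authentication also recognizes the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. The following sketch shows one way to read them; the helper name `databricks_credentials` is illustrative, not part of any Databricks library:

```python
import os

def databricks_credentials():
    """Read the workspace URL and personal access token from the standard
    DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, raising if
    either is unset so that a misconfiguration fails early."""
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if not host or not token:
        raise RuntimeError("Set DATABRICKS_HOST and DATABRICKS_TOKEN first.")
    return host, token
</imports>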

Step 2: Create an authentication configuration profile

Create a Databricks authentication configuration profile to store necessary information about your personal access token on your local machine. Databricks developer tools and SDKs can use this configuration profile to quickly authenticate with your Databricks workspace.

To create a profile:

  1. Create a file named .databrickscfg in the root of your user’s home directory on your machine, if this file does not already exist. For Linux and macOS, the path is ~/.databrickscfg. For Windows, the path is %USERPROFILE%\.databrickscfg.

  2. Use a text editor to add the following configuration profile to this file and then save the file:

    [DEFAULT]
    host = <my-workspace-url>
    token = <my-personal-access-token-value>
    cluster_id = <my-cluster-id>

    Replace the following placeholders:

      • Replace <my-workspace-url> with your Databricks workspace instance URL.

      • Replace <my-personal-access-token-value> with the personal access token value from Step 1.

      • Replace <my-cluster-id> with the ID of your cluster.

    For example:

    [DEFAULT]
    host =
    token = dapi...
    cluster_id = abc123...


    The preceding fields host and token are for Databricks personal access token authentication, which is the most common type of Databricks authentication. Some Databricks developer tools and SDKs also use the cluster_id field in some scenarios. For other supported Databricks authentication types and scenarios, see your tool’s or SDK’s documentation or Databricks client unified authentication.

Step 3: Create the project

  1. Start PyCharm.

  2. Click File > New Project.

  3. For Location, click the folder icon, and complete the on-screen directions to specify the path to your new Python project.

  4. Expand Python interpreter: New environment.

  5. Click the New environment using option.

  6. In the drop-down list, select Virtualenv.

  7. Leave Location with the suggested path to the venv folder.

  8. For Base interpreter, use the drop-down list or click the ellipses to specify the path to the Python interpreter from the preceding requirements.

  9. Click Create.

Step 4: Add the Databricks Connect package

  1. On PyCharm’s main menu, click View > Tool Windows > Python Packages.

  2. In the search box, enter databricks-connect.

  3. In the PyPI repository list, click databricks-connect.

  4. In the result pane’s latest drop-down list, select the version that matches your cluster’s Databricks Runtime version. For example, if your cluster has Databricks Runtime 13.2 installed, select 13.2.0.

  5. Click Install.

  6. After the package installs, you can close the Python Packages window.
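You can confirm from code which databricks-connect version the environment actually has, which is useful because its major.minor version must match your cluster's Databricks Runtime version. A minimal sketch using stdlib importlib.metadata; the helper name `installed_version` is illustrative:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string for a package, or None if the
    package is not installed in the current environment."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# For a cluster on Databricks Runtime 13.2, this should start with "13.2".
print(installed_version("databricks-connect"))
```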

Step 5: Add code

  1. In the Project tool window, right-click the project’s root folder, and click New > Python File.

  2. Enter, and then click Python file.

  3. Enter the following code into the file and then save the file:

    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
    df ="samples.nyctaxi.trips")

Step 6: Run the code

  1. Start the target cluster in your remote Databricks workspace.

  2. After the cluster has started, on the main menu, click Run > Run. If prompted, select main > Run.

  3. In the Run tool window (View > Tool Windows > Run), in the Run tab’s main pane, the first 5 rows of the samples.nyctaxi.trips table appear.

Step 7: Debug the code

  1. With the cluster still running, in the preceding code, click the gutter next to the last line of code to set a breakpoint.

  2. On the main menu, click Run > Debug. If prompted, select main > Debug.

  3. In the Debug tool window (View > Tool Windows > Debug), in the Debugger tab’s Variables pane, expand the df and spark variable nodes to browse information about the code’s df and spark variables.

  4. In the Debug tool window’s sidebar, click the green arrow (Resume Program) icon.

  5. In the Debugger tab’s Console pane, the first 5 rows of the samples.nyctaxi.trips table appear.

Next steps

To learn more about Databricks Connect and experiment with a more complex code example, see the Databricks Connect reference. This reference article includes guidance for the following topics:

  • Supported Databricks authentication types in addition to Databricks personal access token authentication.

  • How to use the Spark shell, and how to use IDEs in addition to PyCharm, such as JupyterLab, classic Jupyter Notebook, Visual Studio Code, and Eclipse with PyDev.

  • How to migrate from Databricks Connect for Databricks Runtime 12.2 LTS and below to Databricks Connect for Databricks Runtime 13.0 and above.

  • How to use Databricks Connect to access Databricks Utilities.

  • Troubleshooting information.

  • The limitations of Databricks Connect.