Work with Python and R modules

This article describes how you can use relative paths to import custom Python and R modules stored alongside your Databricks notebooks. You must have Files in Repos enabled and use Databricks Runtime 8.4 or above.

Files in Repos shortens the development cycle: you can modularize your code, convert %run commands to import statements, and refactor Python wheels into modules that are versioned together with your notebooks. You can also use the built-in Databricks web terminal to test your code.

Import Python and R modules

The current working directories of your repo and notebook are automatically added to the Python path. When you work in the repo root, you can import modules from the root directory and all subdirectories.

To import modules from another repo, you must add that repo to sys.path. For example:

import os
import sys

# Option 1: use the absolute path of the other repo
sys.path.append("/Workspace/Repos/<user-name>/<repo-name>")

# Option 2: use a path relative to the current working directory
sys.path.append(os.path.abspath('..'))
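Rerunning a cell that calls sys.path.append adds the same entry again each time. The following is a minimal sketch of one way to guard against that; the helper name add_repo_to_path is hypothetical and is not part of any Databricks API.

```python
import os
import sys


def add_repo_to_path(repo_path):
    """Append repo_path to sys.path once and return the absolute path used.

    Hypothetical helper for illustration; not part of any Databricks API.
    """
    abs_path = os.path.abspath(repo_path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


# Calling the helper twice does not add a second entry for the same path.
parent = add_repo_to_path("..")
add_repo_to_path("..")
print(parent in sys.path)
```

Resolving the path with os.path.abspath before the membership check keeps a relative path such as '..' from being added in several different spellings.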

You import functions from a module in a repo just as you would from a module saved as a cluster library or notebook-scoped library:

# Python
from sample import power
power.powerOfTwo(3)

# R
source("sample.R")
power.powerOfTwo(3)

Important

When you use an import statement in a notebook in a repo, the library in the repo takes precedence over a library or wheel with the same name that is installed on the cluster.
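When two copies of a module share a name, it can help to confirm which file Python would actually load. The sketch below uses the standard library's importlib.util.find_spec to do that; the helper name module_origin is hypothetical, and json is only a stand-in for your own module name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration; `name` is a top-level module name.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# "json" is a standard-library stand-in; replace it with your own module name.
print(module_origin("json"))
```

If the printed path points into your repo, the repo copy is the one being imported.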

Example notebook for working with non-notebook files in Repos

This notebook shows examples of working with arbitrary files in Databricks Repos.

Arbitrary Files in Repos example notebook

Autoreload for Python modules

If you edit multiple files while developing Python code, run the following commands in any cell to force a reload of all modules.

%load_ext autoreload
%autoreload 2
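The autoreload commands above are IPython magics, so they work only in a notebook cell. Outside a notebook, a single module can be reloaded explicitly with importlib.reload from the standard library. The sketch below illustrates the idea under stated assumptions: the module name demo_settings and its contents are hypothetical, and it is written to a temporary directory only so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so the demo always re-reads the edited source file.
sys.dont_write_bytecode = True

# Create a throwaway module (hypothetical name) in a temporary directory.
module_dir = tempfile.mkdtemp()
module_file = os.path.join(module_dir, "demo_settings.py")
with open(module_file, "w") as f:
    f.write("VALUE = 1\n")

sys.path.append(module_dir)
import demo_settings

first = demo_settings.VALUE

# Simulate editing the file, then reload the module explicitly.
with open(module_file, "w") as f:
    f.write("VALUE = 2\n")
importlib.reload(demo_settings)

print(first, demo_settings.VALUE)
```

Unlike %autoreload 2, importlib.reload affects only the module you pass to it, and names imported earlier with from ... import still refer to the old objects.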

Refactor code

A best practice for code development is to modularize code so it can be easily reused. You can create custom Python files in a repo and make the code in those files available to a notebook using the import statement. See the example notebook.

To refactor notebook code into reusable files:

  1. From the Repos UI, create a new branch.

  2. Create a new source code file for your code.

  3. Add Python import statements to the notebook to make the code in your new file available to the notebook.

  4. Commit and push your changes to your Git provider.

Migrate from %run commands

If you are using %run commands to make Python or R functions defined in a notebook available to another notebook, or are installing custom .whl files on a cluster, consider including those custom modules in a Databricks repo. In this way, you can keep your notebooks and other code modules in sync, ensuring that your notebook always uses the correct version.

%run commands let you include one notebook within another and are often used to make supporting Python or R code available to a notebook. In this example, a notebook named power.py includes the code below.

# This code is in a notebook named "power.py".
def n_to_mth(n,m):
  print(n, "to the", m, "th power is", n**m)

You can then make functions defined in power.py available to a different notebook with a %run command:

# This notebook uses a %run command to access the code in "power.py".
%run ./power
n_to_mth(3, 4)

Using Files in Repos, you can directly import the module that contains the Python code and run the function.

from power import n_to_mth
n_to_mth(3, 4)

Refactor Python .whl files to relative libraries

You can install custom .whl files onto a cluster and then import them into a notebook attached to that cluster. For code that is frequently updated, this process is cumbersome and error-prone. Files in Repos lets you keep these Python files in the same repo with the notebooks that use the code, ensuring that your notebook always uses the correct version.

For more information about packaging Python projects, see this tutorial.

Use Databricks web terminal for testing

You can use Databricks web terminal to test modifications to your Python or R code without having to import the file to a notebook and execute the notebook.

  1. Open web terminal.

  2. Change to the repo directory: cd /Workspace/Repos/<path_to_repo>/

  3. Run the Python or R file: python file_name.py or Rscript file_name.r.
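A file that you run from the web terminal can remain importable from a notebook if its test code sits behind a main guard. The following is a sketch, assuming file_name.py holds a variant of the n_to_mth function that returns its result instead of printing it; the contents are illustrative and not taken from the example notebook.

```python
# file_name.py (hypothetical contents)
def n_to_mth(n, m):
    """Return n raised to the m-th power."""
    return n ** m


def main():
    # Quick self-check that runs only when the file is executed directly.
    result = n_to_mth(3, 4)
    assert result == 81
    print("3 to the 4th power is", result)


if __name__ == "__main__":
    main()
```

Running python file_name.py executes main(), while from file_name import n_to_mth in a notebook imports the function without running the check.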