This article describes how you can use relative paths to import custom Python and R modules stored in workspace files alongside your Databricks notebooks. Workspace files can facilitate tighter development lifecycles, allowing you to modularize your code, convert %run commands to import statements, and refactor Python wheels to co-versioned modules. You can also use the built-in Databricks web terminal to test your code.
In Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This is a change in behavior from Databricks Runtime 13.3 LTS and below. See What is the default current working directory in Databricks Runtime 14.0 and above?.
In Databricks Runtime 13.0 and above, directories added to the Python sys.path are automatically distributed to all executors in the cluster. In Databricks Runtime 12.2 LTS and below, libraries added to the sys.path must be explicitly installed on executors.
In Databricks Runtime 11.2 and above, the current working directory of your notebook is automatically added to the Python path. If you’re using Repos, the root repo directory and all subdirectories are added.
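To see how these defaults apply on a given cluster, you can inspect the working directory and the Python path from a notebook cell. The following is a minimal sketch that uses only the Python standard library; the values it prints depend on your runtime version and on where the notebook is stored.

```python
import os
import sys

# Directory that relative paths resolve against.
cwd = os.getcwd()
print("Current working directory:", cwd)

# Check whether the working directory is already importable.
# An empty string on sys.path also means "the current directory".
cwd_on_path = cwd in sys.path or "" in sys.path
print("CWD on sys.path:", cwd_on_path)
```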
To import modules from another directory, you must add the directory containing the module to sys.path. For example:
import sys
sys.path.append("/Workspace/Users/<user-name>/<repo-name>")

# to use a relative path
import sys
import os
sys.path.append(os.path.abspath('..'))
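If a cell that appends to sys.path runs more than once, the same directory is added repeatedly. One way to avoid that is a small wrapper such as the sketch below. The name add_to_sys_path is a hypothetical helper introduced here for illustration; it is not part of Databricks or the Python standard library.

```python
import os
import sys


def add_to_sys_path(path):
    """Resolve `path` to an absolute path and add it to sys.path once.

    Hypothetical helper for illustration, not a Databricks API.
    """
    abs_path = os.path.abspath(path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


# Add the parent of the current working directory, as in the example above.
parent_dir = add_to_sys_path("..")
# Calling it again does not create a duplicate entry.
add_to_sys_path("..")
```

Resolving the path with os.path.abspath before the membership check keeps the comparison consistent regardless of how the relative path was written.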
You import functions from a module stored in workspace files just as you would from a module saved as a cluster library or notebook-scoped library:
from sample import power
power.powerOfTwo(3)
When you use an import statement, Databricks follows a set precedence if multiple libraries of the same name exist. See Python library precedence.
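When two libraries share a name, it can help to confirm which file Python would actually load. The sketch below uses the standard library's importlib.util.find_spec to report a module's origin without importing it. The helper name resolve_module_origin is hypothetical, and json is only a stand-in for your own module name.

```python
import importlib.util


def resolve_module_origin(module_name):
    """Return the file path Python would load for `module_name`, or None.

    Hypothetical helper for illustration.
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# "json" is a standard-library module used here only as a stand-in;
# replace it with the name of your own module.
print(resolve_module_origin("json"))
# A top-level name that cannot be found returns None.
print(resolve_module_origin("module_that_does_not_exist_123"))
```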
While developing Python code, if you are editing multiple files, you can use the following commands in any cell to force a reload of all modules.
%load_ext autoreload
%autoreload 2
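The autoreload extension is an IPython feature. To reload a single module explicitly, or to see what a reload does, you can use the standard library's importlib.reload. The following sketch simulates editing a file by writing a throwaway module to a temporary directory; the module name reload_demo and the temporary directory are assumptions made for illustration, standing in for a workspace file.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module in a temporary directory
# (a stand-in for a workspace file).
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "reload_demo.py")
with open(module_path, "w") as f:
    f.write("VALUE = 1\n")

sys.path.append(module_dir)
importlib.invalidate_caches()
import reload_demo

first_value = reload_demo.VALUE

# Edit the file, as you would in the workspace editor.
with open(module_path, "w") as f:
    f.write("VALUE = 'edited'\n")

# Without a reload, the already-imported module still holds the old value.
stale_value = reload_demo.VALUE

# Reloading re-executes the module source in place.
importlib.reload(reload_demo)
fresh_value = reload_demo.VALUE

print(first_value, stale_value, fresh_value)
```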
A best practice for code development is to modularize code so it can be easily reused. You can create custom Python files with workspace files and make the code in those files available to a notebook using the import statement.
To refactor notebook code into reusable files:
Create a new source code file for your code.
Add Python import statements to the notebook to make the code in your new file available to the notebook.
If you are using %run commands to make Python or R functions defined in a notebook available to another notebook, or are installing custom .whl files on a cluster, consider including those custom modules as workspace files. In this way, you can keep your notebooks and other code modules in sync, ensuring that your notebook always uses the correct version.
%run commands let you include one notebook within another and are often used to make supporting Python or R code available to a notebook. In this example, a notebook named power.py includes the code below.
# This code is in a notebook named "power.py".
def n_to_mth(n, m):
    print(n, "to the", m, "th power is", n**m)
You can then make functions defined in power.py available to a different notebook with a %run command:
# This notebook uses a %run command to access the code in "power.py".
%run ./power
n_to_mth(3, 4)
Using workspace files, you can directly import the module that contains the Python code and run the function.
from power import n_to_mth
n_to_mth(3, 4)
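Once the code lives in a plain Python file, one possible layout is to separate the computation from the printing and guard any direct-execution code, so the same file can be imported by a notebook and also run on its own. The following is a sketch of what such a power.py might contain; the format_power helper and the __main__ guard are additions made here for illustration and are not part of the original example.

```python
# Hypothetical contents of a workspace file named "power.py".


def format_power(n, m):
    """Return the message describing n raised to the m-th power."""
    return f"{n} to the {m} th power is {n**m}"


def n_to_mth(n, m):
    """Print n raised to the m-th power, matching the notebook version."""
    print(format_power(n, m))


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    n_to_mth(3, 4)
```

Returning the message from a separate function makes the logic easy to check with an assertion, while n_to_mth keeps the same name and printing behavior that the importing notebook expects.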
You can install custom .whl files onto a cluster and then import them into a notebook attached to that cluster. For code that is frequently updated, this process might be cumbersome and error-prone. Workspace files let you keep these Python files in the same directory as the notebooks that use the code, ensuring that your notebook always uses the correct version.
For more information about packaging Python projects, see this tutorial.
You can use Databricks web terminal to test modifications to your Python or R code without having to import the file to a notebook and execute the notebook.
Open web terminal.
Change to the directory:
Run the Python or R file: