
Manage Python dependencies for Lakeflow Declarative Pipelines

Lakeflow Declarative Pipelines supports external dependencies in your pipelines. Databricks recommends using one of two patterns to install Python packages:

  1. Use the Environment settings to add packages to the pipeline environment for all source files in a pipeline.
  2. Import modules or libraries from source code stored in workspace files. See Import Python modules from Git folders or workspace files.
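The second pattern can be sketched in plain Python. This is a minimal illustration, not the documented procedure: it assumes the shared code lives in a directory that you add to `sys.path`, and it uses a temporary directory as a stand-in for a workspace folder. The module name `cleaning_utils`, the function `normalize_name`, and the helper `add_import_root` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_import_root(directory: str) -> None:
    """Make modules in `directory` importable, without adding duplicates."""
    if directory not in sys.path:
        sys.path.append(directory)


# Stand-in for a workspace folder. In a pipeline this would be a folder you
# control in the workspace (the exact location is an assumption here).
shared_dir = tempfile.mkdtemp()
Path(shared_dir, "cleaning_utils.py").write_text(
    "def normalize_name(value):\n"
    "    return value.strip().lower()\n"
)

add_import_root(shared_dir)
importlib.invalidate_caches()  # the module file was created after startup
cleaning_utils = importlib.import_module("cleaning_utils")

print(cleaning_utils.normalize_name("  Alice "))  # prints: alice
```

Guarding the `sys.path` update keeps the search path from growing when the same source file runs more than once.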

Lakeflow Declarative Pipelines also supports global and cluster-scoped init scripts. However, external dependencies, particularly init scripts, increase the risk of issues with runtime upgrades. To mitigate this risk, minimize the use of init scripts in your pipelines. If your processing requires init scripts, Databricks recommends automating tests for your pipeline and running them more frequently so that problems are detected early.

Important

Because JVM libraries are not supported in Lakeflow Declarative Pipelines, do not use an init script to install JVM libraries. However, you can install other library types, such as Python libraries, with an init script.

Python libraries

To specify external Python libraries, edit the environment for your pipeline.

  1. From the pipeline editor, click Settings.
  2. Under Pipeline environment, select Edit environment (pencil icon).
  3. Click Add dependency (plus icon).
  4. Type the name of the dependency. Databricks recommends pinning the version of the library. For example, to add a dependency on simplejson version 3.19, type simplejson==3.19.*.
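A pin such as `simplejson==3.19.*` matches on the release prefix, so it accepts any 3.19.x release and rejects 3.20 and later. The sketch below illustrates that matching rule for plain numeric versions only. The helper `matches_pin` is hypothetical and is not part of Lakeflow Declarative Pipelines or pip; it does not handle pre-releases, post-releases, or epochs.

```python
def _pad(parts: list[int], width: int) -> list[int]:
    """Right-pad a release tuple with zeros so '3.19' compares like '3.19.0'."""
    return parts + [0] * (width - len(parts))


def matches_pin(installed: str, pin: str) -> bool:
    """Return True if `installed` satisfies a pin such as '3.19.*' or '3.19.2'.

    Handles plain numeric release versions only.
    """
    installed_parts = [int(p) for p in installed.split(".")]
    pin_parts = pin.split(".")

    if pin_parts[-1] == "*":
        # Wildcard pin: compare only the leading components.
        prefix = [int(p) for p in pin_parts[:-1]]
        padded = _pad(installed_parts, len(prefix))
        return padded[: len(prefix)] == prefix

    # Exact pin: compare all components, ignoring trailing zeros.
    exact = [int(p) for p in pin_parts]
    width = max(len(installed_parts), len(exact))
    return _pad(installed_parts, width) == _pad(exact, width)


print(matches_pin("3.19.2", "3.19.*"))   # True
print(matches_pin("3.20.0", "3.19.*"))   # False
print(matches_pin("3.190.0", "3.19.*"))  # False: components are compared as numbers
```

To check the version that was actually resolved at run time, `importlib.metadata.version("simplejson")` from the Python standard library returns the installed version string, which you could pass to a check like this one.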

You can also install a Python wheel package from a Unity Catalog volume by specifying its path, such as /Volumes/my_catalog/my_schema/my_dlt_volume/dltfns-1.0-py3-none-any.whl.
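Before adding a wheel path to the environment, it can help to confirm that the path has the expected shape. The following sketch assumes the layout shown in the example above (`/Volumes/<catalog>/<schema>/<volume>/.../<file>.whl`) and the standard wheel file naming convention (`distribution-version[-build]-python-abi-platform.whl`). The `VolumeWheel` type and `parse_volume_wheel` function are hypothetical helpers, not a Databricks API.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class VolumeWheel(NamedTuple):
    catalog: str
    schema: str
    volume: str
    distribution: str
    version: str


def parse_volume_wheel(path: str) -> VolumeWheel:
    """Split a volume wheel path into its location and package parts."""
    parts = PurePosixPath(path).parts
    # Expected parts: ('/', 'Volumes', catalog, schema, volume, ..., 'file.whl')
    if len(parts) < 6 or parts[0] != "/" or parts[1] != "Volumes":
        raise ValueError(f"not a volume file path: {path}")

    filename = parts[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")

    # A wheel name has 5 dash-separated fields, or 6 with a build tag.
    fields = filename[: -len(".whl")].split("-")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")

    return VolumeWheel(parts[2], parts[3], parts[4], fields[0], fields[1])


wheel = parse_volume_wheel(
    "/Volumes/my_catalog/my_schema/my_dlt_volume/dltfns-1.0-py3-none-any.whl"
)
print(wheel.distribution, wheel.version)  # prints: dltfns 1.0
```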

Can I use Scala or Java libraries in Lakeflow Declarative Pipelines?

No. Lakeflow Declarative Pipelines supports only SQL and Python, so you cannot use JVM libraries in a pipeline. Installing JVM libraries causes unpredictable behavior and may break with future Lakeflow Declarative Pipelines releases. If your pipeline uses an init script, you must also ensure that the script does not install JVM libraries.
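One rough way to review an init script for JVM library installs is to flag every line that mentions a `.jar` file. The sketch below is a heuristic only and assumes a shell init script. It will miss JVM libraries fetched by other means, and the `find_jar_references` helper, the example script, and its paths are all hypothetical.

```python
def find_jar_references(script_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines that mention a .jar file."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # Naive comment stripping: drops everything after the first '#',
        # including a '#' that appears inside a quoted string.
        code = line.split("#", 1)[0]
        if ".jar" in code.lower():
            hits.append((number, line.strip()))
    return hits


# Hypothetical init script used only to exercise the check.
example_script = "\n".join(
    [
        "#!/bin/bash",
        "pip install simplejson==3.19.*",
        "# cp /tmp/old-udfs.jar /opt/example/jars/",
        "cp /tmp/custom-udfs.jar /opt/example/jars/",
    ]
)

for number, line in find_jar_references(example_script):
    print(f"line {number}: {line}")
# prints: line 4: cp /tmp/custom-udfs.jar /opt/example/jars/
```

A check like this is a prompt for manual review, not proof that a script is free of JVM libraries.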