In addition to using notebooks or the file editor in your Databricks workspace to implement pipeline code that uses the Delta Live Tables Python interface, you can develop your code in your local development environment, for example, in your favorite integrated development environment (IDE) such as Visual Studio Code or PyCharm. After writing your pipeline code locally, you can manually move it into your Databricks workspace or use Databricks tools to operationalize your pipeline, including deploying and running it.
This article describes the tools and methods available to develop your Python pipelines locally and deploy those pipelines to your Databricks workspace. Links to articles that provide more details on using these tools and methods are also provided.
Databricks provides a Python module that you can install in your local environment to assist with developing code for your Delta Live Tables pipelines. The module includes the interfaces and docstrings for the Delta Live Tables Python interface, enabling syntax checking, autocomplete, and data type checking as you write code in your IDE.
This module includes interfaces but no functional implementations. You cannot use this library to create or run a Delta Live Tables pipeline locally. Instead, use one of the methods described below to deploy your code.
The Python module for local development is available on PyPI. For installation and usage instructions, see Python stub for Delta Live Tables.
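With the stub module installed, a locally developed pipeline source file might look like the following sketch. The dataset names and source path are placeholders, and the `spark` session is provided by the pipeline runtime; locally, the stub supplies only interfaces for editor tooling, so this code runs only after it is deployed to a pipeline.

```python
import dlt
from pyspark.sql.functions import col


@dlt.table(comment="Raw trip data loaded from a sample dataset (placeholder path).")
def raw_trips():
    # spark is injected by the pipeline runtime; it is undefined locally.
    return spark.read.format("json").load("/databricks-datasets/nyctaxi/sample/json/")


@dlt.table(comment="Trips filtered to rows with a positive fare amount.")
def cleaned_trips():
    return dlt.read("raw_trips").where(col("fare_amount") > 0)
```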
After implementing your Delta Live Tables pipeline code, Databricks recommends using Databricks Asset Bundles to operationalize the code. Databricks Asset Bundles provide CI/CD capabilities to your pipeline development lifecycle, including validation of the pipeline artifacts, packaging of all pipeline artifacts such as source code and configuration, deployment of the code to your Databricks workspace, and starting pipeline updates.
To learn how to create a bundle to manage your pipeline code using Databricks Asset Bundles, see Develop a Delta Live Tables pipeline by using Databricks Asset Bundles.
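As a rough illustration, a bundle declares the pipeline and its source files in a `databricks.yml` configuration. The following is a minimal sketch, not a complete configuration; the bundle name, paths, workspace URL, and target schema are all placeholders to replace with your own values.

```yaml
# databricks.yml (minimal sketch; all names and paths are placeholders)
bundle:
  name: my-dlt-bundle

resources:
  pipelines:
    my_pipeline:
      name: my-pipeline
      libraries:
        - notebook:
            path: ./src/dlt_pipeline.py
      target: my_schema

targets:
  dev:
    mode: development
    workspace:
      host: https://my-workspace.cloud.databricks.com
```

You can then validate, deploy, and start the pipeline from your local terminal with the `databricks bundle validate`, `databricks bundle deploy`, and `databricks bundle run` commands.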
If you use the Visual Studio Code IDE for development, you can use the Python module to develop your code and then use the Databricks extension for Visual Studio Code to sync your code directly from Visual Studio Code to your workspace. See What is the Databricks extension for Visual Studio Code?.
To learn how to create a pipeline using the code you synced to your workspace using the Databricks extension for Visual Studio Code, see Import Python modules from Databricks repos or workspace files.
Instead of creating a bundle using Databricks Asset Bundles or using the Databricks extension for Visual Studio Code, you can sync your code to your Databricks workspace and use that code to create a pipeline inside the workspace. This can be particularly useful during development and testing stages when you want to iterate on code quickly. Databricks supports several methods to move code from your local environment to your workspace.
To learn how to create a pipeline using the code you synced to your workspace with one of the methods below, see Import Python modules from Databricks repos or workspace files.
Workspace files: You can use Databricks workspace files to upload your pipeline source code to your Databricks workspace and then import that code into a pipeline. To learn how to use workspace files, see What are workspace files?.
Databricks Repos: To facilitate collaboration and version control, Databricks recommends using Databricks Repos to sync code between your local environment and your Databricks workspace. Databricks Repos integrates with your Git provider, allowing you to push code from your local environment and then import that code into a pipeline in your workspace. To learn how to use Databricks Repos, see Git integration with Databricks Repos.
Manually copy your code: You can copy the code from your local environment, paste the code into a Databricks notebook, and use the Delta Live Tables UI to create a new pipeline with the notebook. To learn how to create a pipeline in the UI, see Tutorial: Run your first Delta Live Tables pipeline.
If you prefer to write scripts to manage your pipelines, Databricks provides a REST API, the Databricks command line interface (CLI), and software development kits (SDKs) for popular programming languages. You can also use the databricks_pipeline resource in the Databricks Terraform provider.
To learn how to use the REST API, see Delta Live Tables in the Databricks REST API Reference.
To learn how to use the Databricks CLI, see What is the Databricks CLI?.
To learn how to use Databricks SDKs for other languages, see Use SDKs with Databricks.
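For example, a script can call the REST API's create pipeline endpoint (`POST /api/2.0/pipelines`) directly. The following Python sketch builds the request using only the standard library; the workspace URL, access token, pipeline name, and notebook path are placeholders to replace with your own values.

```python
import json
import urllib.request

# Placeholders: substitute your workspace URL and a valid access token.
host = "https://my-workspace.cloud.databricks.com"
token = "dapi-example-token"

# Request body for the create pipeline endpoint. The notebook path is a
# placeholder pointing at pipeline source code already in the workspace.
payload = {
    "name": "my-pipeline",
    "development": True,
    "libraries": [{"notebook": {"path": "/Workspace/Shared/dlt/pipeline"}}],
}

request = urllib.request.Request(
    url=f"{host}/api/2.0/pipelines",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) submits the request; the JSON response
# includes the ID of the newly created pipeline.
```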