
DLT Python language reference

This section provides details about the DLT Python programming interface.

dlt module overview

DLT Python functions are defined in the dlt module. Pipelines implemented with the Python API must import this module:

Python
import dlt

Functions for dataset definitions

DLT uses Python decorators to define datasets such as materialized views and streaming tables. See Functions to define datasets.
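For example, both a materialized view and a streaming table can be defined by decorating a function that returns a DataFrame. This is a minimal sketch: the source table samples.nyctaxi.trips and the fare_amount column are placeholder names, and the spark session is provided by the pipeline runtime.

Python
import dlt
from pyspark.sql.functions import col

# Materialized view: the decorated function returns a batch DataFrame.
# `spark` is provided by the pipeline runtime.
@dlt.table(comment="Trips with a positive fare amount.")
def valid_trips():
    return spark.read.table("samples.nyctaxi.trips").where(col("fare_amount") > 0)

# Streaming table: reading the source with spark.readStream makes the dataset streaming.
@dlt.table(comment="Trips ingested incrementally.")
def trips_raw():
    return spark.readStream.table("samples.nyctaxi.trips")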

API reference

Considerations for Python DLT

The following are important considerations when you implement pipelines with the DLT Python interface:

  • DLT evaluates the code that defines a pipeline multiple times during planning and pipeline runs. Python functions that define datasets should include only the code required to define the table or view. Arbitrary Python logic included in dataset definitions might lead to unexpected behavior.
  • Do not try to implement custom monitoring logic in your dataset definitions. See Define custom monitoring of DLT pipelines with event hooks.
  • The function used to define a dataset must return a Spark DataFrame. Do not include logic in your dataset definitions that does not relate to a returned DataFrame.
  • Never use methods that save or write to files or tables as part of your DLT dataset code.

Examples of Apache Spark operations that should never be used in DLT code (a compliant pattern is sketched after this list):

  • collect()
  • count()
  • toPandas()
  • save()
  • saveAsTable()
  • start()
  • toTable()
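To illustrate these rules, the sketch below defines a dataset that only transforms and returns a DataFrame, leaving all writes and materialization to DLT; the commented-out lines show the kinds of calls from the list above that must not appear in a dataset definition. The table and column names are hypothetical placeholders, and spark is provided by the pipeline runtime.

Python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Cleaned orders; DLT manages materialization and writes.")
def orders_clean():
    # Only build the transformation; do not evaluate or persist it here.
    df = spark.read.table("examples.sales.orders_raw").where(col("order_id").isNotNull())

    # Do NOT trigger actions or writes inside the definition, for example:
    # df.count()                        # eager action evaluated during planning
    # df.collect()                      # pulls data to the driver
    # df.write.saveAsTable("orders")    # writes outside of DLT's management

    # Return the DataFrame; DLT decides how and when to materialize it.
    return df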