Expectations
This page contains Python reference documentation for pipeline expectations.
Expectation decorators declare data quality constraints on materialized views, streaming tables, or temporary views created in a pipeline.
The dp module includes six decorators that control expectation behavior. The following table describes the dimensions on which these decorators differ:
| Behavior | Options |
|---|---|
| Action on violation | Warn and retain the invalid record (expect), drop the invalid record (expect_or_drop), or fail the update (expect_or_fail). |
| Number of expectations | A single expectation or multiple expectations. |
You can add multiple expectation decorators to a dataset, giving you flexibility in how strictly each data quality constraint is enforced.
When you use the expect_all decorators, each expectation has its own description and reports its own granular metrics.
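For example, the following sketch (the source table and column names are hypothetical, and it assumes the pipeline-provided spark session) combines a warn-style and a drop-style expectation on one materialized view:

```python
from pyspark import pipelines as dp

@dp.materialized_view()
@dp.expect("valid_amount", "amount >= 0")                     # warn: violating records are kept and reported
@dp.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop: violating records are removed
def orders_clean():
    # Hypothetical upstream table; replace with your own query.
    return spark.read.table("orders_raw")
```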
Syntax
Expectation decorators come after a @dp.table(), @dp.materialized_view(), or @dp.temporary_view() decorator and before a dataset definition function, as in the following example:
from pyspark import pipelines as dp
@dp.table()
@dp.expect(description, constraint)
@dp.expect_or_drop(description, constraint)
@dp.expect_or_fail(description, constraint)
@dp.expect_all({description: constraint, ...})
@dp.expect_all_or_drop({description: constraint, ...})
@dp.expect_all_or_fail({description: constraint, ...})
def <function-name>():
return (<query>)
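As a concrete, hedged instance of this pattern (table and column names are hypothetical), the following sketch fails the pipeline update if any record violates the constraint:

```python
from pyspark import pipelines as dp

@dp.table()
@dp.expect_or_fail("valid_event_time", "event_ts IS NOT NULL")  # fail: the update stops on a violating record
def events():
    # Hypothetical source table; replace with your own query.
    return spark.read.table("raw_events")
```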
Parameters
| Parameter | Type | Description |
|---|---|---|
| description | str | Required. A description that identifies the constraint. Constraint descriptions must be unique for each dataset. |
| constraint | str | Required. The constraint clause is a SQL conditional statement that must evaluate to true or false. |
The expect_all decorators require descriptions and constraints to be passed as a Python dict, with each description as a key and its constraint as the value.
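For example, a minimal sketch of expect_all_or_drop with two constraints (table and column names are hypothetical) could look like this:

```python
from pyspark import pipelines as dp

@dp.materialized_view()
@dp.expect_all_or_drop({
    "valid_customer_id": "customer_id IS NOT NULL",  # each entry reports its own metric
    "valid_quantity": "quantity > 0",
})
def sales_clean():
    # Hypothetical source table; replace with your own query.
    return spark.read.table("sales_raw")
```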