WATERMARK clause

Applies to: Databricks Runtime 12.0 and above

Adds a watermark to a relation in a SELECT statement. The WATERMARK clause applies only to stateful queries on streaming data, such as stream-stream joins and aggregations.

Syntax

from_item
{ table_name [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] |
  JOIN clause |
  [ LATERAL ] table_valued_function [ table_alias ] |
  VALUES clause |
  [ LATERAL ] ( query ) [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] }

watermark_clause
  WATERMARK named_expression DELAY OF interval

Parameters

  • named_expression

    An expression that produces a value of type timestamp. The expression must be either a reference to an existing column or a deterministic transformation of one or more existing columns. The expression adds a column of timestamp type, which is used to track the watermark. The added column is available to the rest of the query.

  • interval

    An interval literal that defines the delay threshold of the watermark. The interval must be positive and less than one month.

Examples

Assume a streaming relation is defined with the DataFrame API, and a temporary view named stream_relation is created from the relation.

-- define watermark in SELECT statement
> SELECT * FROM stream_relation WATERMARK to_timestamp(ts) AS event_time DELAY OF INTERVAL 10 SECONDS AS stream_with_watermark;
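
Because the clause targets stateful queries, it may help to see the watermarked column used in an aggregation and in a stream-stream join. The statements below are a minimal sketch, not taken from the reference itself. The first reuses the stream_relation view and its ts column from the example above. The second assumes two hypothetical streaming views, impressions and clicks, with hypothetical columns ad_id, impression_ts, and click_ts. The window sizes and delay thresholds are arbitrary illustrative values.

```sql
-- Sketch: count events per 5-minute window, keyed on the column added by
-- the WATERMARK clause (event_time). The 10-second delay is illustrative.
SELECT window(event_time, '5 minutes') AS time_window,
       count(*) AS event_count
FROM stream_relation
  WATERMARK to_timestamp(ts) AS event_time DELAY OF INTERVAL 10 SECONDS
GROUP BY window(event_time, '5 minutes');

-- Sketch: stream-stream join with a watermark on each side.
-- impressions, clicks, ad_id, impression_ts, and click_ts are hypothetical.
SELECT i.ad_id, i.impression_time, c.click_time
FROM impressions
  WATERMARK impression_ts AS impression_time DELAY OF INTERVAL 2 HOURS AS i
JOIN clicks
  WATERMARK click_ts AS click_time DELAY OF INTERVAL 3 HOURS AS c
  ON i.ad_id = c.ad_id
  AND c.click_time BETWEEN i.impression_time
                       AND i.impression_time + INTERVAL 1 HOUR;
```

In both statements the watermark clause comes before the table alias, matching the from_item syntax above. The time-range condition in the join is what would let the engine bound how long each side's state must be kept.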