cloud_files_state table-valued function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 11.3 and above

Returns the file-level state of an Auto Loader or read_files stream.

Syntax

cloud_files_state( { TABLE ( table_name ) | checkpoint } )

Arguments

  • table_name: The identifier of the streaming table that’s being written to by read_files. The name must not include a temporal specification. Available in Databricks Runtime 13.1 and above.

  • checkpoint: A STRING literal. The checkpoint directory for a stream using the Auto Loader source. See What is Auto Loader?.

Returns

Returns a table with the following schema:

  • path STRING NOT NULL PRIMARY KEY

    The path of a file.

  • size BIGINT NOT NULL

    The size of a file in bytes.

  • create_time TIMESTAMP NOT NULL

    The time that a file was created.

  • discovery_time TIMESTAMP NOT NULL

    Preview

    This feature is in Private Preview. To try it, reach out to your Databricks contact.

    The time that a file was discovered.

  • commit_time TIMESTAMP

    Preview

    This feature is in Private Preview. To try it, reach out to your Databricks contact.

    The time that a file was committed to the checkpoint after processing. NULL if the file is not yet processed. A file might be processed, but might be marked as committed arbitrarily later. Marking the file as committed means that Auto Loader does not require the file for processing again.

  • archive_time TIMESTAMP

    Preview

    This feature is in Private Preview. To try it, reach out to your Databricks contact.

    The time that a file was archived. NULL if the file has not been archived.

  • source_id STRING

    The ID of the Auto Loader source in the streaming query. This value is '0' for streams that ingest from a single cloud object store location.

Permissions

You need to have:

  • OWNER privileges on the streaming table if using a streaming table identifier.

    • READ FILES privileges on the checkpoint location if providing a checkpoint under an external location.

Examples

- Simple example from checkpoint
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');
  /some/input/path
  /other/input/path

- Simple example from source subdir
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');
  /some/input/path
  /other/input/path