cloud_files_state table-valued function
Applies to: Databricks SQL and Databricks Runtime 11.3 LTS and above
Returns the file-level state of an Auto Loader or `read_files` stream.
Syntax
```sql
cloud_files_state( { TABLE ( table_name ) | checkpoint } )
```
Arguments
- `table_name`: The identifier of the streaming table that is being written to by `read_files`. The name must not include a temporal specification. Available in Databricks Runtime 13.3 LTS and above.
- `checkpoint`: A `STRING` literal. The checkpoint directory for a stream using the Auto Loader source. See What is Auto Loader?.
Returns
Returns a table with the following schema:
- `path STRING NOT NULL PRIMARY KEY`: The path of a file.
- `size BIGINT NOT NULL`: The size of a file in bytes.
- `create_time TIMESTAMP NOT NULL`: The time that a file was created.
- `discovery_time TIMESTAMP NOT NULL`: The time that a file was discovered. Applies to: Databricks SQL and Databricks Runtime 16.4 and above.
- `processed_time TIMESTAMP NOT NULL`: The time that a file was processed. If a batch encounters a failure and is retried, a file might be processed multiple times; when retries happen, this field contains the most recent processing time. Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
- `commit_time TIMESTAMP`: The time that a file was committed to the checkpoint after processing. `NULL` if the file is not yet processed. There is no guaranteed latency for marking a file as committed; a file might be processed but marked as committed arbitrarily later. Marking the file as committed means that Auto Loader does not require the file for processing again. Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
- `archive_time TIMESTAMP`: The time that a file was archived. `NULL` if the file has not been archived. Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
- `archive_mode STRING`: `MOVE` if `cloudFiles.cleanSource` was set to `MOVE` when the file was archived. `DELETE` if `cloudFiles.cleanSource` was set to `DELETE` when the file was archived. `NULL` if `cloudFiles.cleanSource` was set to `OFF` (default). Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
- `move_location STRING`: The full path that the file was moved to during the archival operation when `cloudFiles.cleanSource` was set to `MOVE`. `NULL` if the file has not been archived or if `cloudFiles.cleanSource` is `DELETE` or `OFF`. Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
- `source_id STRING`: The ID of the Auto Loader source in the streaming query. This value is `'0'` for streams that ingest from a single cloud object store location.
- `flow_name STRING`: Represents a specific streaming flow in Lakeflow Declarative Pipelines that contains one or more cloud files sources. `NULL` if no `table_name` was given. Applies to: Databricks SQL and Databricks Runtime 13.3 and above.
- `ingestion_state STRING`: Whether the file has been ingested, indicated by one of the following states. Applies to: Databricks SQL and Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled. See Auto Loader options.
  - `NULL`: The file has not been processed yet, or the file state cannot be determined by Auto Loader.
  - `PROCESSING`: The file is being processed.
  - `SKIPPED_CORRUPTED`: The file was not ingested because it was corrupt.
  - `SKIPPED_MISSING`: The file was not ingested because it was not found during processing.
  - `INGESTED`: The file has been processed by the sink at least once. It may be processed again by non-idempotent sinks like `foreachBatch` in case of failures in the stream. Only files with a non-`NULL` `commit_time` field that are in the `INGESTED` state have completed processing.
  - `NOT_RECOGNIZED_BY_DBR`: Reserved for version compatibility. This state is displayed for states introduced in later Databricks Runtime versions that are unrecognized by earlier Databricks Runtime versions.
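As a sketch of how these columns combine, the following query lists files that have not yet completed processing. The checkpoint path is hypothetical, and the `ingestion_state` and `commit_time` columns require Databricks Runtime 16.4 and above with `cloudFiles.cleanSource` enabled:

```sql
-- Hypothetical checkpoint path. A file has completed processing only when
-- ingestion_state = 'INGESTED' AND commit_time IS NOT NULL.
SELECT path, ingestion_state, commit_time
FROM cloud_files_state('/some/checkpoint')
WHERE ingestion_state IS DISTINCT FROM 'INGESTED'
   OR commit_time IS NULL;
```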
Permissions
You need to have:
- If using a streaming table identifier:
  - Databricks Runtime 17.1 and below: `OWNER` privileges on the streaming table.
  - Databricks SQL and Databricks Runtime 17.2 and above: `SELECT` and `MODIFY` privileges on the streaming table.
- Databricks Runtime 17.1 and below: `READ FILES` privileges on the checkpoint location if providing a checkpoint under an external location.
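For example, on Databricks SQL or Databricks Runtime 17.2 and above, the streaming-table privileges could be granted as follows. The table and principal names are hypothetical, and this assumes the Unity Catalog privilege model:

```sql
-- Hypothetical table and principal names.
GRANT SELECT, MODIFY ON TABLE my_streaming_table TO `data_engineers`;
```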
Examples
```sql
-- Simple example from checkpoint
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');
  /some/input/path
  /other/input/path

-- Simple example from source subdirectory
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');
  /some/input/path
  /other/input/path

-- Simple example from streaming table
> SELECT path FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));
  /some/input/path
  /other/input/path
```
Limitations
- Users reading a view that references `cloud_files_state` over a streaming table must have both the `SELECT` privilege on the view and the required privileges on the streaming table.
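As an illustration of that limitation, a view over `cloud_files_state` might be shared like this (the view, table, and principal names are hypothetical); granting `SELECT` on the view alone is not sufficient for readers:

```sql
-- Hypothetical names. Readers need SELECT on the view AND the
-- required privileges on my_streaming_table itself.
CREATE VIEW file_ingestion_status AS
SELECT path, ingestion_state, commit_time
FROM cloud_files_state(TABLE(my_streaming_table));

GRANT SELECT ON VIEW file_ingestion_status TO `analysts`;
```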