Cache Select (Delta Lake on Databricks)

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See CACHE SELECT.

CACHE SELECT column_name[, column_name, ...] FROM [db_name.]table_name [ WHERE boolean_expression ]

Caches the data accessed by the specified simple SELECT query in the disk cache. To cache a subset of columns, provide a list of column names; to cache a subset of rows, provide a predicate. Subsequent queries can then avoid scanning the original files as much as possible. This construct applies only to Parquet tables. Views are also supported, provided that the expanded query is itself a simple query of the form shown in the syntax above.

See Automatic and manual caching for the differences between the RDD cache and the Databricks IO cache.

Examples

CACHE SELECT * FROM boxes
CACHE SELECT width, length FROM boxes WHERE height=3
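
The view case described above might look like the following sketch. The view name tall_boxes is hypothetical and is introduced only for illustration; it is assumed to be defined over the same Parquet-backed boxes table, so that it expands to a simple column list with a predicate.

```sql
-- Hypothetical view over the boxes table; it expands to a simple
-- SELECT with a column list and a WHERE predicate.
CREATE VIEW tall_boxes AS
SELECT width, length
FROM boxes
WHERE height > 3;

-- Cache the data that the expanded view query reads.
CACHE SELECT * FROM tall_boxes;
```

A view whose definition goes beyond this simple form (for example, one containing a join or an aggregation) would fall outside the restriction stated above and should not be expected to work with CACHE SELECT.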