Cache

CACHE SELECT column_name[, column_name, ...] FROM [db_name.]table_name [ WHERE boolean_expression ]

Cache the data accessed by the specified simple SELECT query in the Databricks IO Cache. To cache only a subset of columns, list the column names; to cache only a subset of rows, supply a WHERE predicate. Subsequent queries that access the cached data avoid scanning the original files wherever possible.

See the DBIO cache documentation page to find out about the differences between the RDD cache and the Databricks IO cache.

Examples:

CACHE SELECT * FROM boxes
CACHE SELECT width, length FROM boxes WHERE height=3
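
As a rough sketch of how a later query might benefit, assume the boxes table from the examples above has width, length, and height columns. The second statement reads only columns and rows that the first statement cached, so it is the kind of query that can be served from the cache rather than from the original files:

```sql
-- Cache only the columns and rows that later queries are expected to read.
CACHE SELECT width, length FROM boxes WHERE height = 3;

-- A follow-up query over the same columns and the same predicate;
-- it touches only data that the statement above asked to cache.
SELECT width, length FROM boxes WHERE height = 3;
```

A query that reads other columns of boxes, or rows outside the predicate, would still need to scan the original files for the data that was not cached.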