CACHE TABLE

Applies to: Databricks Runtime

Caches the contents of a table or the output of a query with the given storage level in the Apache Spark cache. If a query is cached, a temporary view is created for it. This reduces scanning of the original files in future queries.

Syntax

CACHE [ LAZY ] TABLE table_name
  [ OPTIONS ( 'storageLevel' [ = ] value ) ] [ [ AS ] query ]

See Disk cache vs. Spark cache for the differences between disk caching and the Apache Spark cache.

Parameters

  • LAZY

    Only cache the table when it is first used, instead of immediately (see the examples below).

  • table_name

    Identifies the Delta table or view to cache. The name must not include a temporal specification. If the table cannot be found, Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

  • OPTIONS ( 'storageLevel' [ = ] value )

    An OPTIONS clause with a storageLevel key-value pair. A warning is issued when a key other than storageLevel is used. The valid options for storageLevel are:

      • NONE

      • DISK_ONLY

      • DISK_ONLY_2

      • MEMORY_ONLY

      • MEMORY_ONLY_2

      • MEMORY_ONLY_SER

      • MEMORY_ONLY_SER_2

      • MEMORY_AND_DISK

      • MEMORY_AND_DISK_2

      • MEMORY_AND_DISK_SER

      • MEMORY_AND_DISK_SER_2

      • OFF_HEAP

    An exception is thrown when an invalid value is set for storageLevel. If storageLevel is not explicitly set using the OPTIONS clause, the default storageLevel is MEMORY_AND_DISK.

  • query

    A query that produces the rows to be cached. It can be in one of the following formats:

    • A SELECT statement

    • A TABLE statement

    • A FROM statement

Examples

> CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testData;
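
The following is a minimal sketch of the LAZY keyword, assuming the same testData table as above; the table is only materialized in the cache when it is first queried, and the default MEMORY_AND_DISK storage level applies.

> CACHE LAZY TABLE testData;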
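
A sketch combining an explicit storageLevel with the TABLE form of query; the cached name testCache2 is hypothetical.

> CACHE TABLE testCache2 OPTIONS ('storageLevel' = 'MEMORY_ONLY') TABLE testData;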
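
A sketch of the [ AS ] query form without an OPTIONS clause, so storageLevel defaults to MEMORY_AND_DISK; the cached name testCache3 is hypothetical.

> CACHE TABLE testCache3 AS SELECT * FROM testData;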