Catalog
User-facing catalog API, accessible through SparkSession.catalog. This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.
Syntax
```python
# Access through SparkSession
spark.catalog
```
Methods
| Method | Description |
|---|---|
| `currentCatalog()` | Returns the current default catalog in this session. |
| `setCurrentCatalog(catalogName)` | Sets the current default catalog in this session. |
| `listCatalogs(pattern)` | Returns a list of catalogs in this session. |
| `currentDatabase()` | Returns the current default database in this session. |
| `setCurrentDatabase(dbName)` | Sets the current default database in this session. |
| `listDatabases(pattern)` | Returns a list of databases available across all sessions. |
| `getDatabase(dbName)` | Gets the database with the specified name. Throws an AnalysisException when the database cannot be found. |
| `databaseExists(dbName)` | Checks if the database with the specified name exists. |
| `listTables(dbName, pattern)` | Returns a list of tables and views in the specified database. Includes all temporary views. |
| `getTable(tableName)` | Gets the table or view with the specified name. Throws an AnalysisException when no table can be found. |
| `tableExists(tableName, dbName)` | Checks if the table or view with the specified name exists. |
| `listColumns(tableName, dbName)` | Returns a list of columns for the given table or view in the specified database. |
| `listFunctions(dbName, pattern)` | Returns a list of functions registered in the specified database. Includes all temporary functions. |
| `functionExists(functionName, dbName)` | Checks if the function with the specified name exists. Includes temporary functions. |
| `getFunction(functionName)` | Gets the function with the specified name. Throws an AnalysisException when the function cannot be found. |
| `createTable(tableName, path, source, schema, description, **options)` | Creates a table based on the dataset in a data source and returns the associated DataFrame. |
| `dropTempView(viewName)` | Drops the local temporary view with the given name. Also uncaches the view if it was cached. |
| `dropGlobalTempView(viewName)` | Drops the global temporary view with the given name. Also uncaches the view if it was cached. |
| `isCached(tableName)` | Returns true if the table is currently cached in-memory. |
| `cacheTable(tableName, storageLevel)` | Caches the specified table in-memory or with the given storage level. Defaults to MEMORY_AND_DISK. |
| `uncacheTable(tableName)` | Removes the specified table from the in-memory cache. |
| `clearCache()` | Removes all cached tables from the in-memory cache. |
| `refreshTable(tableName)` | Invalidates and refreshes all cached data and metadata of the given table. |
| `recoverPartitions(tableName)` | Recovers all the partitions of the given table and updates the catalog. Only works with partitioned tables. |
| `refreshByPath(path)` | Invalidates and refreshes all cached data and metadata for any DataFrame containing the given data source path. |
Examples
```python
spark.catalog.currentDatabase()
# 'default'

spark.catalog.listDatabases()
# [Database(name='default', catalog='spark_catalog', description='default database', ...)]

_ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
spark.catalog.tableExists("tbl1")
# True

spark.catalog.cacheTable("tbl1")
spark.catalog.isCached("tbl1")
# True

spark.catalog.uncacheTable("tbl1")
spark.catalog.isCached("tbl1")
# False
```