
Catalog

User-facing catalog API, accessible through SparkSession.catalog. This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.

Syntax

# Access through the active SparkSession
spark.catalog

Methods

currentCatalog()
    Returns the current default catalog in this session.

setCurrentCatalog(catalogName)
    Sets the current default catalog in this session.

listCatalogs(pattern)
    Returns a list of catalogs in this session.

currentDatabase()
    Returns the current default database in this session.

setCurrentDatabase(dbName)
    Sets the current default database in this session.

listDatabases(pattern)
    Returns a list of databases available across all sessions.

getDatabase(dbName)
    Gets the database with the specified name. Throws an AnalysisException when the database cannot be found.

databaseExists(dbName)
    Checks if the database with the specified name exists.

listTables(dbName, pattern)
    Returns a list of tables and views in the specified database, including all temporary views.

getTable(tableName)
    Gets the table or view with the specified name. Throws an AnalysisException when no table can be found.

tableExists(tableName, dbName)
    Checks if the table or view with the specified name exists.

listColumns(tableName, dbName)
    Returns a list of columns for the given table or view in the specified database.

listFunctions(dbName, pattern)
    Returns a list of functions registered in the specified database, including all temporary functions.

functionExists(functionName, dbName)
    Checks if the function with the specified name exists, including temporary functions.

getFunction(functionName)
    Gets the function with the specified name. Throws an AnalysisException when the function cannot be found.

createTable(tableName, path, source, schema, description, **options)
    Creates a table based on the dataset in a data source and returns the associated DataFrame.

dropTempView(viewName)
    Drops the local temporary view with the given name, uncaching it first if it was cached.

dropGlobalTempView(viewName)
    Drops the global temporary view with the given name, uncaching it first if it was cached.

isCached(tableName)
    Returns true if the table is currently cached in memory.

cacheTable(tableName, storageLevel)
    Caches the specified table in memory or with the given storage level; defaults to MEMORY_AND_DISK.

uncacheTable(tableName)
    Removes the specified table from the in-memory cache.

clearCache()
    Removes all cached tables from the in-memory cache.

refreshTable(tableName)
    Invalidates and refreshes all cached data and metadata of the given table.

recoverPartitions(tableName)
    Recovers all the partitions of the given table and updates the catalog; only works with partitioned tables.

refreshByPath(path)
    Invalidates and refreshes all cached data and metadata for any DataFrame that contains the given data source path.

Examples

>>> spark.catalog.currentDatabase()
'default'
>>> spark.catalog.listDatabases()
[Database(name='default', catalog='spark_catalog', description='default database', ...)]
>>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
>>> spark.catalog.tableExists("tbl1")
True
>>> spark.catalog.cacheTable("tbl1")
>>> spark.catalog.isCached("tbl1")
True
>>> spark.catalog.uncacheTable("tbl1")
>>> spark.catalog.isCached("tbl1")
False