PySpark reference
This page provides an overview of the reference documentation available for PySpark, the Python API for Apache Spark. For more information about PySpark, see PySpark on Databricks.
Data types
For a complete list of PySpark data types, see PySpark data types.
Classes
| Reference | Description |
|---|---|
| Avro | Support for reading and writing data in Apache Avro format. |
| Catalog | Interface for managing databases, tables, functions, and other catalog metadata. |
| Column | Operations for working with DataFrame columns, including transformations and expressions. |
| Data Types | Available data types in PySpark SQL, including primitive types, complex types, and user-defined types. |
| DataFrame | Distributed collection of data organized into named columns, similar to a table in a relational database. |
| DataFrameNaFunctions | Functionality for working with missing data in a DataFrame. |
| DataFrameReader | Interface used to load a DataFrame from external storage systems. |
| DataFrameStatFunctions | Functionality for statistical functions with a DataFrame. |
| DataFrameWriter | Interface used to write a DataFrame to external storage systems. |
| DataFrameWriterV2 | Interface used to write a DataFrame to external storage (version 2). |
| DataSource | APIs for implementing custom data sources to read from external systems. For information about custom data sources, see PySpark custom data sources. |
| DataSourceArrowWriter | A base class for data source writers that process data using PyArrow's `RecordBatch`. |
| DataSourceRegistration | A wrapper for data source registration. |
| DataSourceReader | A base class for data source readers. |
| DataSourceStreamArrowWriter | A base class for data stream writers that process data using PyArrow's `RecordBatch`. |
| DataSourceStreamReader | A base class for streaming data source readers. |
| DataSourceStreamWriter | A base class for data stream writers. |
| GroupedData | Methods for grouping data and performing aggregation operations on grouped DataFrames. |
| Observation | Collects metrics and observes DataFrames during query execution for monitoring and debugging. |
| Plotting | Accessor for DataFrame plotting functionality in PySpark. |
| Protobuf | Support for serializing and deserializing data using Protocol Buffers format. |
| Row | Represents a row of data in a DataFrame, providing access to individual field values. |
| RuntimeConfig | Runtime configuration options for Spark SQL, including execution and optimizer settings. For information on configuration that is only available on Databricks, see Set Spark configuration properties on Databricks. |
| SparkSession | The entry point for reading data and executing SQL queries in PySpark applications. |
| GroupState | Manages state across streaming batches for complex stateful operations in structured streaming. |
| UDF | User-defined functions for applying custom Python logic to DataFrame columns. |
| UDFRegistration | Wrapper for user-defined function registration. This instance can be accessed by `spark.udf`. |
| UDTF | User-defined table functions that return multiple rows for each input row. |
| UDTFRegistration | Wrapper for user-defined table function registration. This instance can be accessed by `spark.udtf`. |
| VariantVal | Represents semi-structured data with flexible schema, which supports dynamic types and nested structures. |
| Window | Window functions for performing calculations across a set of table rows related to the current row. |
| WindowSpec | Window functions for performing calculations across a set of table rows related to the current row. |
Functions
For a complete list of available built-in functions, see PySpark functions.