SparkSession
The entry point to programming Spark with the Dataset and DataFrame API. A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files.
Syntax
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
```
Properties
| Property | Description |
|---|---|
| `version` | The version of Spark on which this application is running. |
| `conf` | Runtime configuration interface for Spark. |
| `catalog` | Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc. |
| `udf` | Returns a UDFRegistration for UDF registration. |
| `udtf` | Returns a UDTFRegistration for UDTF registration. |
| `dataSource` | Returns a DataSourceRegistration for data source registration. |
| `profile` | Returns a Profile for performance/memory profiling. |
| `sparkContext` | Returns the underlying SparkContext. Classic mode only. |
| `read` | Returns a DataFrameReader that can be used to read data as a DataFrame. |
| `readStream` | Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame. |
| `streams` | Returns a StreamingQueryManager that allows managing all active streaming queries. |
| `tvf` | Returns a TableValuedFunction for calling table-valued functions (TVFs). |
Methods
| Method | Description |
|---|---|
| `createDataFrame` | Creates a DataFrame from an RDD, a list, a pandas DataFrame, a numpy ndarray, or a pyarrow Table. |
| `sql` | Returns a DataFrame representing the result of the given query. |
| `table` | Returns the specified table as a DataFrame. |
| `range` | Creates a DataFrame with a single LongType column named `id`, containing elements in a range from start to end (exclusive) with the given step. |
| `newSession` | Returns a new SparkSession with separate SQLConf, registered temporary views, and UDFs, but shared SparkContext and table cache. Classic mode only. |
| `getActiveSession` | Returns the active SparkSession for the current thread. |
| `active` | Returns the active or default SparkSession for the current thread. |
| `stop` | Stops the underlying SparkContext. |
| `addArtifacts` | Adds artifact(s) to the client session. |
| `interruptAll` | Interrupts all operations of this session currently running on the server. |
| `interruptTag` | Interrupts all operations of this session with the given tag. |
| `interruptOperation` | Interrupts an operation of this session with the given operationId. |
| `addTag` | Adds a tag to be assigned to all operations started by this thread in this session. |
| `removeTag` | Removes a tag previously added for operations started by this thread. |
| `getTags` | Gets the tags currently set to be assigned to all operations started by this thread. |
| `clearTags` | Clears the current thread's operation tags. |
Builder
| Method | Description |
|---|---|
| `config` | Sets a config option. Options are automatically propagated to both SparkConf and SparkSession's own configuration. |
| `master` | Sets the Spark master URL to connect to. |
| `remote` | Sets the Spark remote URL to connect via Spark Connect. |
| `appName` | Sets a name for the application, which will be shown in the Spark web UI. |
| `enableHiveSupport` | Enables Hive support, including connectivity to a persistent Hive metastore. |
| `getOrCreate` | Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder. |
| `create` | Creates a new SparkSession. |
Examples
```python
spark = (
    SparkSession.builder
    .master("local")
    .appName("Word Count")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

spark.sql("SELECT * FROM range(10) WHERE id > 7").show()
# +---+
# | id|
# +---+
# |  8|
# |  9|
# +---+

spark.createDataFrame([('Alice', 1)], ['name', 'age']).show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice|  1|
# +-----+---+

spark.range(1, 7, 2).show()
# +---+
# | id|
# +---+
# |  1|
# |  3|
# |  5|
# +---+
```