DataFrameWriterV2 class
Interface used to write a DataFrame to external storage using the v2 API.
Supports Spark Connect
Syntax
Use DataFrame.writeTo(table) to access this interface.
Methods
| Method | Description |
|---|---|
| using(provider) | Specifies a provider for the underlying output data source. |
| option(key, value) | Add a write option. |
| options(**options) | Add write options. |
| tableProperty(property, value) | Add table property. |
| partitionedBy(col, *cols) | Partition the output table created by create, createOrReplace, or replace using the given columns or transforms. |
| clusterBy(col, *cols) | Clusters the data by the given columns to optimize query performance. |
| create() | Create a new table from the contents of the data frame. |
| replace() | Replace an existing table with the contents of the data frame. |
| createOrReplace() | Create a new table or replace an existing table with the contents of the data frame. |
| append() | Append the contents of the data frame to the output table. |
| overwrite(condition) | Overwrite rows matching the given filter condition with the contents of the data frame in the output table. |
| overwritePartitions() | Overwrite all partitions for which the data frame contains at least one row with the contents of the data frame in the output table. |
Examples
Creating a new table
# Create a new table with DataFrame contents
df = spark.createDataFrame([{"name": "Alice", "age": 30}])
df.writeTo("my_table").create()
# Create with a specific provider
df.writeTo("my_table").using("parquet").create()
Partitioning data
# Partition by single column
df.writeTo("my_table") \
.partitionedBy("year") \
.create()
# Partition by multiple columns
df.writeTo("my_table") \
.partitionedBy("year", "month") \
.create()
# Partition using transform functions
from pyspark.sql.functions import years, months
df.writeTo("my_table") \
.partitionedBy(years("date"), months("date")) \
.create()
Setting table properties
# Add table properties
df.writeTo("my_table") \
.tableProperty("key1", "value1") \
.tableProperty("key2", "value2") \
.create()
Using options
# Add write options
df.writeTo("my_table") \
.option("compression", "snappy") \
.option("maxRecordsPerFile", "10000") \
.create()
# Add multiple options at once
df.writeTo("my_table") \
.options(compression="snappy", maxRecordsPerFile="10000") \
.create()
Clustering data
# Cluster by columns for query optimization
df.writeTo("my_table") \
.clusterBy("user_id", "timestamp") \
.create()
Replace operations
# Replace existing table
df.writeTo("my_table") \
.using("parquet") \
.replace()
# Create the table if it does not exist, otherwise replace it
df.writeTo("my_table") \
.using("parquet") \
.createOrReplace()
Append operations
# Append to existing table
df.writeTo("my_table").append()
Overwrite operations
from pyspark.sql.functions import col
# Overwrite specific rows based on condition
df.writeTo("my_table") \
.overwrite(col("date") == "2025-01-01")
# Overwrite entire partitions
df.writeTo("my_table") \
.overwritePartitions()
Method chaining
# Combine multiple configurations
df.writeTo("my_table") \
.using("parquet") \
.option("compression", "snappy") \
.tableProperty("description", "User data table") \
.partitionedBy("year", "month") \
.clusterBy("user_id") \
.createOrReplace()
Differences from DataFrameWriter
DataFrameWriterV2 is the newer v2 API that provides:
- Better table property support
- More fine-grained control over partitioning
- Conditional overwrite capabilities
- Support for clustering
- Clearer semantics for create/replace operations
For most use cases with Databricks tables and Delta Lake, DataFrameWriterV2 provides more powerful and flexible options than the original DataFrameWriter.