DataFrameWriterV2 class

Interface used to write a DataFrame to external storage using the v2 API.

Supports Spark Connect

Syntax

Use DataFrame.writeTo(table) to access this interface.
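For example (a minimal sketch; the table name is hypothetical, and nothing is written until a terminal method runs):

Python
# writeTo returns a DataFrameWriterV2; the write only happens when a
# terminal method such as create(), append(), or overwrite() is called.
writer = df.writeTo("catalog.schema.my_table")  # hypothetical table name
writer.using("parquet").createOrReplace()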

Methods

using(provider)
    Specifies a provider for the underlying output data source.

option(key, value)
    Add a write option.

options(**options)
    Add write options.

tableProperty(property, value)
    Add a table property.

partitionedBy(col, *cols)
    Partition the output table created by create, createOrReplace, or replace using the given columns or transforms.

clusterBy(col, *cols)
    Clusters the data by the given columns to optimize query performance.

create()
    Create a new table from the contents of the data frame.

replace()
    Replace an existing table with the contents of the data frame.

createOrReplace()
    Create a new table or replace an existing table with the contents of the data frame.

append()
    Append the contents of the data frame to the output table.

overwrite(condition)
    Overwrite rows matching the given filter condition with the contents of the data frame in the output table.

overwritePartitions()
    Overwrite all partitions for which the data frame contains at least one row with the contents of the data frame in the output table.

Examples

Creating a new table

Python
# Create a new table with DataFrame contents
df = spark.createDataFrame([{"name": "Alice", "age": 30}])
df.writeTo("my_table").create()

# Create with a specific provider
df.writeTo("my_table").using("parquet").create()

Partitioning data

Python
# Partition by a single column
df.writeTo("my_table") \
    .partitionedBy("year") \
    .create()

# Partition by multiple columns
df.writeTo("my_table") \
    .partitionedBy("year", "month") \
    .create()

# Partition using transform functions
from pyspark.sql.functions import years, months

df.writeTo("my_table") \
    .partitionedBy(years("date"), months("date")) \
    .create()

Setting table properties

Python
# Add table properties
df.writeTo("my_table") \
    .tableProperty("key1", "value1") \
    .tableProperty("key2", "value2") \
    .create()

Using options

Python
# Add write options
df.writeTo("my_table") \
    .option("compression", "snappy") \
    .option("maxRecordsPerFile", "10000") \
    .create()

# Add multiple options at once
df.writeTo("my_table") \
    .options(compression="snappy", maxRecordsPerFile="10000") \
    .create()

Clustering data

Python
# Cluster by columns for query optimization
df.writeTo("my_table") \
    .clusterBy("user_id", "timestamp") \
    .create()

Replace operations

Python
# Replace an existing table (raises an error if the table does not exist)
df.writeTo("my_table") \
    .using("parquet") \
    .replace()

# Create or replace (succeeds whether or not the table already exists)
df.writeTo("my_table") \
    .using("parquet") \
    .createOrReplace()

Append operations

Python
# Append to an existing table (raises an error if the table does not exist)
df.writeTo("my_table").append()

Overwrite operations

Python
from pyspark.sql.functions import col

# Overwrite specific rows based on a condition
df.writeTo("my_table") \
    .overwrite(col("date") == "2025-01-01")

# Overwrite the partitions in which the data frame has at least one row
df.writeTo("my_table") \
    .overwritePartitions()

Method chaining

Python
# Combine multiple configurations
# (clusterBy is omitted here: Spark does not allow specifying both
# clustering and partitioning on the same table)
df.writeTo("my_table") \
    .using("parquet") \
    .option("compression", "snappy") \
    .tableProperty("description", "User data table") \
    .partitionedBy("year", "month") \
    .createOrReplace()

Differences from DataFrameWriter

DataFrameWriterV2 is the newer v2 API that provides:

  • Better table property support
  • More fine-grained control over partitioning
  • Conditional overwrite capabilities
  • Support for clustering
  • Clearer semantics for create/replace operations

For most use cases with Databricks tables and Delta Lake, DataFrameWriterV2 provides more powerful and flexible options than the original DataFrameWriter.
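
As an illustration, here is the same overwrite-style write expressed in both APIs (a minimal sketch; the table name is hypothetical):

Python
# v1 API: behavior is controlled by a string mode
df.write.format("parquet").mode("overwrite").saveAsTable("my_table")

# v2 API: the method name states the create/replace semantics explicitly
df.writeTo("my_table").using("parquet").createOrReplace()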