to
Returns a new DataFrame where each row is reconciled to match the specified schema.
Syntax
to(schema: StructType)
Parameters
Parameter | Type | Description |
|---|---|---|
| StructType | Specified schema. |
Returns
DataFrame: Reconciled DataFrame.
Notes
- Reorder columns and/or inner fields by name to match the specified schema.
- Project away columns and/or inner fields that are not needed by the specified schema. Missing columns and/or inner fields (present in the specified schema but not input DataFrame) lead to failures.
- Cast the columns and/or inner fields to match the data types in the specified schema, if the types are compatible, e.g., numeric to numeric (error if overflows), but not string to int.
- Carry over the metadata from the specified schema, while the columns and/or inner fields still keep their own metadata if not overwritten by the specified schema.
- Fail if the nullability is not compatible. For example, the column and/or inner field is nullable but the specified schema requires them to be not nullable.
Supports Spark Connect.
Examples
Python
from pyspark.sql.types import StructField, StringType
df = spark.createDataFrame([("a", 1)], ["i", "j"])
df.schema
# StructType([StructField('i', StringType(), True), StructField('j', LongType(), True)])
schema = StructType([StructField("j", StringType()), StructField("i", StringType())])
df2 = df.to(schema)
df2.schema
# StructType([StructField('j', StringType(), True), StructField('i', StringType(), True)])
df2.show()
# +---+---+
# | j| i|
# +---+---+
# | 1| a|
# +---+---+