
from_csv

Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.

Syntax

Python
from pyspark.sql import functions as sf

sf.from_csv(col, schema, options=None)

Parameters

col : pyspark.sql.Column or str
    A column or column name in CSV format.
schema : pyspark.sql.Column or str
    A column, or a Python string literal with a schema in DDL format, to use when parsing the CSV column.
options : dict, optional
    Options to control parsing. Accepts the same options as the CSV datasource.

Returns

pyspark.sql.Column: A column of parsed CSV values.

Examples

Example 1: Parsing a simple CSV string

Python
from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
Output
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 2: Using schema_of_csv to infer the schema

Python
from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
value = data[0][0]
df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show()
Output
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 3: Ignoring leading white space in the CSV string

Python
from pyspark.sql import functions as sf
data = [(" abc",)]
df = spark.createDataFrame(data, ("value",))
options = {'ignoreLeadingWhiteSpace': True}
df.select(sf.from_csv(df.value, "s string", options)).show()
Output
+---------------+
|from_csv(value)|
+---------------+
|          {abc}|
+---------------+

Example 4: Parsing a CSV string with a missing value

Python
from pyspark.sql import functions as sf
data = [("1,2,",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
Output
+---------------+
|from_csv(value)|
+---------------+
|   {1, 2, NULL}|
+---------------+

Example 5: Parsing a CSV string with a different delimiter

Python
from pyspark.sql import functions as sf
data = [("1;2;3",)]
df = spark.createDataFrame(data, ("value",))
options = {'delimiter': ';'}
df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show()
Output
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+