schema (DataSource)

Returns the schema of the data source.

The method can use any field initialized in __init__ to infer the data source's schema when the user does not specify one explicitly. Spark invokes it once, when spark.read.format(...).load() is called, to obtain the schema for the read operation. If the method is not implemented and the user does not provide a schema when reading the data source, an exception is raised.

Syntax

schema()

Returns

StructType or str

The schema of this data source or a DDL string representing the schema.

Examples

Returns a DDL string:

Python
def schema(self):
    return "a INT, b STRING"

Returns a StructType:

Python
from pyspark.sql.types import StructType

def schema(self):
    return StructType().add("a", "int").add("b", "string")
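
Infers the schema from a field initialized in __init__. This is a minimal sketch rather than a definitive implementation: the class name RangeDataSource, the format name range_example, and the columnName option are hypothetical, and the fallback base class exists only so the sketch can run without PySpark installed.

```python
try:
    from pyspark.sql.datasource import DataSource  # Python Data Source API
except ImportError:
    # Assumption: fallback so the sketch runs without PySpark installed.
    DataSource = object


class RangeDataSource(DataSource):
    """Hypothetical data source whose schema depends on a reader option."""

    def __init__(self, options):
        # Spark passes the reader options to the data source as a dictionary.
        self.options = options
        # Hypothetical option: the name of the single output column.
        self.column_name = options.get("columnName", "value")

    @classmethod
    def name(cls):
        # Hypothetical short name used with spark.read.format(...).
        return "range_example"

    def schema(self):
        # Build the DDL string from the field set in __init__.
        return f"{self.column_name} INT"


source = RangeDataSource({"columnName": "id"})
print(source.schema())  # id INT
```

Constructing the class directly, as above, only demonstrates the method. In a real read, Spark creates the instance and calls schema() itself when the user does not supply a schema.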