
DataSourceRegistration

A wrapper for data source registration.

This instance can be accessed via spark.dataSource. Use it to register a custom DataSource subclass so it can be referenced by name in spark.read.format() and df.write.format().

Syntax

Python
spark.dataSource.register(MyDataSource)

Methods

Method | Description
register(dataSource) | Registers a Python user-defined data source. dataSource must be a subclass of DataSource.
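To illustrate what registering by name means, here is a minimal pure-Python sketch. It is a toy model under stated assumptions, not Spark's implementation: ToyDataSource and ToyRegistry are hypothetical stand-ins, and the fallback to the class name when name() is not overridden is assumed to mirror the DataSource default.

```python
# Toy model only: ToyDataSource and ToyRegistry are hypothetical stand-ins,
# not PySpark classes, and this is not how Spark implements registration.


class ToyDataSource:
    @classmethod
    def name(cls):
        # Assumed default: use the class name when name() is not overridden.
        return cls.__name__


class ToyRegistry:
    def __init__(self):
        # Maps a short name to the registered data source class.
        self._sources = {}

    def register(self, data_source):
        # Accept only classes that subclass ToyDataSource.
        if not (isinstance(data_source, type) and issubclass(data_source, ToyDataSource)):
            raise TypeError("data_source must be a subclass of ToyDataSource")
        self._sources[data_source.name()] = data_source

    def lookup(self, fmt):
        # Resolve the short name (as passed to format()) back to the class.
        return self._sources[fmt]


class MyDataSource(ToyDataSource):
    @classmethod
    def name(cls):
        return "my_data_source"


class Unnamed(ToyDataSource):
    pass


registry = ToyRegistry()
registry.register(MyDataSource)
registry.register(Unnamed)
```

In this sketch, registry.lookup("my_data_source") returns MyDataSource, while Unnamed is found under its class name because it does not override name().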

Examples

Register a custom data source and read from it:

Python
from pyspark.sql.datasource import DataSource, DataSourceReader

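class MyDataSource(DataSource):
    @classmethod
    def name(cls):
        # Short name used in spark.read.format(...)
        return "my_data_source"

    def schema(self):
        # Schema expressed as a DDL string
        return "id INT, value STRING"

    def reader(self, schema):
        return MyDataSourceReader(schema)

class MyDataSourceReader(DataSourceReader):
    def __init__(self, schema):
        # Store the schema passed in by MyDataSource.reader()
        self.schema = schema

    def read(self, partition):
        # Yield one tuple per row, matching the declared schema
        yield (1, "hello")
        yield (2, "world")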

spark.dataSource.register(MyDataSource)
df = spark.read.format("my_data_source").load()
df.show()