jdbc (DataFrameReader)

Constructs a DataFrame representing the database table accessible via JDBC URL url. Partitions of the table are retrieved in parallel if either column or predicates is specified. If both column and predicates are specified, column takes precedence.

Syntax

jdbc(url, table, column=None, lowerBound=None, upperBound=None,
     numPartitions=None, predicates=None, properties=None)
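
As an illustration of the signature above, the following sketch shows the three call shapes it allows: a single-partition read, a range-partitioned read, and a predicate-partitioned read. The URL, table name, column names, and credentials are placeholders, and `spark` is assumed to be an existing SparkSession with a matching JDBC driver available.

```python
# All connection details below are placeholders, not real endpoints.
URL = "jdbc:postgresql://db.example.com:5432/sales"
PROPS = {"user": "reader", "password": "secret"}


def read_whole_table(spark):
    """Single-partition read: neither column nor predicates is given."""
    return spark.read.jdbc(URL, "orders", properties=PROPS)


def read_by_range(spark):
    """Parallel read split on a numeric column into 8 range partitions."""
    return spark.read.jdbc(
        URL,
        "orders",
        column="order_id",        # hypothetical numeric column
        lowerBound=1,
        upperBound=1_000_000,
        numPartitions=8,
        properties=PROPS,
    )


def read_by_predicates(spark):
    """Parallel read with one partition per WHERE clause."""
    predicates = [
        "region = 'EMEA'",
        "region = 'APAC'",
        # Catch-all clause so rows outside the first two are not dropped.
        "region NOT IN ('EMEA', 'APAC') OR region IS NULL",
    ]
    return spark.read.jdbc(URL, "orders", predicates=predicates, properties=PROPS)
```

The functions take the session as an argument so that nothing connects to a database until one of them is called.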

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | str | The JDBC URL of the form jdbc:subprotocol:subname. |
| table | str | The name of the table in the external database. |
| column | str, optional | The column to use for partitioning (alias for the partitionColumn option). Requires lowerBound, upperBound, and numPartitions. |
| lowerBound | int or str, optional | The lower bound of column used to compute the partition stride. It does not filter rows. Required when column is specified. |
| upperBound | int or str, optional | The upper bound of column used to compute the partition stride. It does not filter rows. Required when column is specified. |
| numPartitions | int, optional | The number of partitions. Required when column is specified. |
| predicates | list, optional | A list of WHERE clause expressions, each defining one partition of the DataFrame. Ignored if column is specified. |
| properties | dict, optional | JDBC connection arguments, typically including 'user' and 'password'. For example, {'user': 'SYSTEM', 'password': 'mypassword'}. |
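
To make the relationship between column, lowerBound, upperBound, and numPartitions more concrete, here is a simplified pure-Python sketch of how a range-partitioned read can be split into per-partition WHERE clauses. The function name `stride_predicates` is hypothetical, and the arithmetic is an approximation for integer bounds only. Spark's own boundary calculation may differ in detail.

```python
def stride_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clauses of a range-partitioned JDBC read.

    Illustration only: this is not Spark's implementation.
    An empty list means a single, unfiltered partition.
    """
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be less than upper_bound")

    # This sketch never creates more partitions than integer values in range.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions <= 1:
        return []

    stride = (upper_bound - lower_bound) // num_partitions
    clauses = []
    boundary = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {boundary}" if i > 0 else None
        boundary += stride
        upper = f"{column} < {boundary}" if i < num_partitions - 1 else None

        if lower and upper:
            clauses.append(f"{lower} AND {upper}")
        elif upper:
            # NULL values are assigned to the first partition.
            clauses.append(f"{upper} OR {column} IS NULL")
        else:
            clauses.append(lower)
    return clauses


for clause in stride_predicates("id", 0, 100, 4):
    print(clause)
# id < 25 OR id IS NULL
# id >= 25 AND id < 50
# id >= 50 AND id < 75
# id >= 75
```

Because the first and last clauses are open-ended, the bounds shape how evenly rows are spread across partitions rather than which rows are returned. Bounds that sit far from the real minimum and maximum of the column tend to produce skewed partitions.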

Returns

DataFrame

Notes

Avoid creating too many partitions in parallel on a large cluster, as this can crash external database systems.
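
One way to follow this advice is to clamp the requested partition count to a connection budget agreed with the database owner. The sketch below assumes that each partition typically reads over its own JDBC connection. The helper name `capped_partitions` and the budget value are hypothetical.

```python
def capped_partitions(requested, max_db_connections):
    """Clamp a requested partition count to a connection budget."""
    if requested < 1 or max_db_connections < 1:
        raise ValueError("both arguments must be positive integers")
    return min(requested, max_db_connections)


# Example: ask for 64 partitions, but the database tolerates only 16 readers.
num_partitions = capped_partitions(64, 16)
print(num_partitions)  # 16
```

The result can then be passed as numPartitions when calling jdbc with a partitioning column.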