Skip to main content

sql

Returns a DataFrame representing the result of the given query.

When kwargs is specified, this method formats the given string using the Python standard formatter. The method binds named parameters to SQL literals or positional parameters from args. Named and positional parameters cannot be mixed in the same SQL query.

Syntax

sql(sqlQuery, args=None, **kwargs)

Parameters

Parameter

Type

Description

sqlQuery

str

SQL query string.

args

dict or list, optional

A dictionary of parameter names to Python objects, or a list of Python objects that can be converted to SQL literal expressions. Use :param_name syntax for named parameters or ? for positional parameters in the query string.

**kwargs

optional

Variables that can be referenced in the query using Python formatter syntax (for example, {varname}). This feature is experimental and unstable.

Returns

DataFrame

Notes

In Spark Classic, a temporary view referenced in spark.sql is resolved immediately. In Spark Connect, it is lazily analyzed, so if a view is dropped, modified, or replaced after spark.sql, the execution may fail or generate different results.

Examples

Python
# Execute a basic SQL query.
spark.sql("SELECT * FROM range(10) where id > 7").show()
# +---+
# | id|
# +---+
# | 8|
# | 9|
# +---+

# Use Python formatter variables.
spark.sql(
"SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}", bound1=7, bound2=9
).show()
# +---+
# | id|
# +---+
# | 8|
# +---+

# Use named parameters with the : prefix.
from pyspark.sql.functions import create_map, lit
mydf = spark.createDataFrame([(1, 4), (2, 4), (3, 6)], ["A", "B"])
spark.sql(
"SELECT *, element_at(:m, 'a') AS C FROM {df} WHERE {df[B]} > :minB",
{"minB": 5, "m": create_map(lit('a'), lit(1))}, df=mydf).show()
# +---+---+---+
# | A| B| C|
# +---+---+---+
# | 3| 6| 1|
# +---+---+---+

# Use positional parameters marked by ?.
from pyspark.sql.functions import array
spark.sql(
"SELECT *, element_at(?, 1) AS C FROM {df} WHERE {df[B]} > ? and ? < {df[A]}",
args=[array(lit(1), lit(2), lit(3)), 5, 2], df=mydf).show()
# +---+---+---+
# | A| B| C|
# +---+---+---+
# | 3| 6| 1|
# +---+---+---+