Parameter markers
Applies to: Databricks Runtime 12.1 and later
Parameter markers are named and typed placeholder variables used to supply values from the API invoking the SQL statement.
Parameter markers can be provided by:
Python using its pyspark.sql.SparkSession.sql() API.
Scala using its org.apache.spark.sql.SparkSession.sql() API.
Java using its org.apache.spark.sql.SparkSession.sql() API.
Using parameter markers protects your code from SQL injection attacks since it clearly separates provided values from the SQL statements.
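For instance, a value interpolated directly into the SQL text is parsed as part of the statement, while a value bound through a parameter marker is only ever treated as a single constant. A minimal Python sketch of the difference; the orders table and user_input value are hypothetical, and an existing SparkSession named spark is assumed, as in the examples below:

# Unsafe: the user-supplied string becomes part of the SQL text,
# so input such as "1 OR 1=1" rewrites the WHERE clause.
user_input = "1 OR 1=1"
spark.sql(f"SELECT * FROM orders WHERE id = {user_input}").show()

# Safe: the bound value is parsed only as a constant; input like
# "1 OR 1=1" raises an error instead of altering the statement.
spark.sql("SELECT * FROM orders WHERE id = :id",
  args = { "id": user_input }).show()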
Notes
You can reference the same parameter marker multiple times within the same SQL statement. If no value has been bound to the parameter marker, an UNBOUND_SQL_PARAMETER error is raised (see the sketch after these notes). You are not required to reference all supplied parameter markers.
The mandatory preceding : (colon) differentiates the namespace of parameter markers from that of column names and SQL parameters.
You must not reference a parameter marker in a DDL statement, such as a generated column or DEFAULT definition, a view, or a SQL function.
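To illustrate the unbound case from the notes above, a minimal Python sketch; the exact error text may vary by runtime:

# :x is referenced in the statement, but only "y" is supplied a value,
# so the statement fails with error class UNBOUND_SQL_PARAMETER.
spark.sql("SELECT :x * :x AS square", args = { "y": "15.0" })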
Examples
The following example defines two parameter markers:

later: An INTERVAL HOUR with value 3.
x: A FLOAT with value 15.0.

x is referenced multiple times, while later is referenced once.

Scala
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark named parameter marker example")
.getOrCreate()
spark.sql(
sqlText = "SELECT current_timestamp() + :later, :x * :x AS square",
args = Map("later" -> "INTERVAL'3' HOUR",
"x" -> "15.0")).show()
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
Java

import org.apache.spark.sql.*;
import java.util.Map;
import static java.util.Map.entry;

SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark named parameter marker example")
  .getOrCreate();

Map<String, String> argMap = Map.ofEntries(
  entry("later", "INTERVAL'3' HOUR"),
  entry("x", "15.0")
);

spark.sql(
  "SELECT current_timestamp() + :later, :x * :x AS square",
  argMap).show();
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
spark.sql("SELECT current_timestamp() + :later, :x * :x AS square",
args = { "later", "INTERVAL'3' HOUR",
"x" , "15.0" }).show()
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+