
array_insert

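Inserts an item into a given array at a specified index. Array indices start at 1, and a negative index counts from the end of the array. If the index is larger than the array size, the array is padded with null elements and then the item is appended. If a negative index reaches past the start of the array, the item is prepended and the gap is filled with null elements.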
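To make these index rules concrete, here is a minimal pure-Python model of the behavior described above. It operates on Python lists rather than Spark columns, and the function name `array_insert_model` is hypothetical, not part of PySpark. It assumes the convention shown in Example 2 below, where -1 refers to the slot after the last element. The number of nulls padded for a large negative position is an assumption, chosen so the result has |pos| elements and mirrors the positive case.

```python
from typing import Any, List, Optional


def array_insert_model(arr: Optional[List[Any]], pos: Optional[int], value: Any) -> Optional[List[Any]]:
    """Hypothetical pure-Python model of array_insert on a list (not the PySpark API)."""
    # A NULL array or NULL position yields NULL (compare Example 5).
    if arr is None or pos is None:
        return None
    if pos == 0:
        # Positions are 1-based, so 0 does not name a slot.
        raise ValueError("position must not be 0")
    n = len(arr)
    if pos > 0:
        idx = pos - 1  # convert the 1-based position to a 0-based slot
        if idx > n:
            # Past the end: pad with None (NULL) up to the slot, then append.
            return arr + [None] * (idx - n) + [value]
        return arr[:idx] + [value] + arr[idx:]
    # Negative positions count from the end; -1 is the slot after the last element.
    idx = n + 1 + pos
    if idx < 0:
        # Assumption: before the start, put the value first and fill the gap
        # with None so the result has abs(pos) elements, mirroring the positive case.
        return [value] + [None] * (-idx) + arr
    return arr[:idx] + [value] + arr[idx:]
```

For example, `array_insert_model(['a', 'b', 'c'], 5, 'e')` returns `['a', 'b', 'c', None, 'e']`, matching Example 3.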

Syntax

Python
from pyspark.sql import functions as sf

sf.array_insert(arr, pos, value)

Parameters

Parameter | Type | Description
arr | pyspark.sql.Column or str | Column, or name of the column, containing an array.
pos | pyspark.sql.Column, str, or int | Position of insertion, given as a numeric-type column, the name of such a column, or an int. Positions start at 1; a negative position counts from the end of the array.
value | Any | A literal value, or a Column expression.

Returns

pyspark.sql.Column: a new array containing the original values with the specified value inserted.

Examples

Example 1: Inserting a value at a specific position

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
df.select(sf.array_insert(df.data, 2, 'd')).show()
Output
+------------------------+
|array_insert(data, 2, d)|
+------------------------+
|            [a, d, b, c]|
+------------------------+

Example 2: Inserting a value at a negative position

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
df.select(sf.array_insert(df.data, -2, 'd')).show()
Output
+-------------------------+
|array_insert(data, -2, d)|
+-------------------------+
|             [a, b, d, c]|
+-------------------------+

Example 3: Inserting a value at a position greater than the array size

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
df.select(sf.array_insert(df.data, 5, 'e')).show()
Output
+------------------------+
|array_insert(data, 5, e)|
+------------------------+
|      [a, b, c, NULL, e]|
+------------------------+

Example 4: Inserting a NULL value

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
df.select(sf.array_insert(df.data, 2, sf.lit(None))).show()
Output
+---------------------------+
|array_insert(data, 2, NULL)|
+---------------------------+
|            [a, NULL, b, c]|
+---------------------------+

Example 5: Inserting a value into a NULL array

Python
from pyspark.sql import functions as sf
from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField
schema = StructType([StructField("data", ArrayType(IntegerType()), True)])
df = spark.createDataFrame([(None,)], schema=schema)
df.select(sf.array_insert(df.data, 1, 5)).show()
Output
+------------------------+
|array_insert(data, 1, 5)|
+------------------------+
|                    NULL|
+------------------------+