slice

Returns a new array column by slicing each array in the input column, taking up to length elements starting at index start. Array indices start at 1 and can be negative to index from the end of the array. The length specifies the number of elements in the resulting array.

Syntax

Python
from pyspark.sql import functions as sf

sf.slice(x, start, length)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| x | pyspark.sql.Column or str | Input array column or column name to be sliced. |
| start | pyspark.sql.Column, str, or int | The start index for the slice operation. If negative, starts the index from the end of the array. |
| length | pyspark.sql.Column, str, or int | The length of the slice, representing the number of elements in the resulting array. |

Returns

pyspark.sql.Column: A new Column object of Array type, where each value is a slice of the corresponding list from the input column.

Examples

Example 1: Basic usage of the slice function.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
df.select(sf.slice(df.x, 2, 2)).show()
Output
+--------------+
|slice(x, 2, 2)|
+--------------+
|        [2, 3]|
|           [5]|
+--------------+

Example 2: Slicing with negative start index.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
df.select(sf.slice(df.x, -1, 1)).show()
Output
+---------------+
|slice(x, -1, 1)|
+---------------+
|            [3]|
|            [5]|
+---------------+

Example 3: Slice function with column inputs for start and length.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3], 2, 2), ([4, 5], 1, 3)], ['x', 'start', 'length'])
df.select(sf.slice(df.x, df.start, df.length)).show()
Output
+-----------------------+
|slice(x, start, length)|
+-----------------------+
|                 [2, 3]|
|                 [4, 5]|
+-----------------------+
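As a cross-check on the 1-based indexing rules used in the examples above, the per-array semantics can be modeled in plain Python. This is only a sketch of the behavior, not Spark's implementation; the helper name slice_1_based is made up for illustration.

```python
def slice_1_based(xs, start, length):
    """Model Spark's slice semantics on a plain Python list (illustrative only).

    start is 1-based; a negative start counts from the end of the list.
    Out-of-range slices return fewer (possibly zero) elements rather
    than raising, matching what the examples above show.
    """
    if start == 0:
        # Spark's slice rejects a start index of 0, since SQL arrays are 1-based.
        raise ValueError("start must not be 0")
    # Convert the 1-based (or negative) start into a 0-based Python index.
    i = start - 1 if start > 0 else len(xs) + start
    if i < 0:
        return []
    return xs[i:i + length]

# Mirrors Example 1: slice(x, 2, 2) on [1, 2, 3] and [4, 5]
print(slice_1_based([1, 2, 3], 2, 2))  # [2, 3]
print(slice_1_based([4, 5], 2, 2))     # [5]

# Mirrors Example 2: slice(x, -1, 1)
print(slice_1_based([1, 2, 3], -1, 1))  # [3]

# Mirrors Example 3: per-row start and length
print(slice_1_based([4, 5], 1, 3))  # [4, 5]
```

Note how a length larger than the remaining elements (as in the last call) simply returns everything from start to the end of the array.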