st_collect
Applies to: Databricks Runtime 18 LTS and above
This feature is in Public Preview.
Collects an array of Geography or Geometry values into a single multipoint, multilinestring, multipolygon, or geometry collection.
For the corresponding Databricks SQL function, see st_collect function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.st_collect(col=<col>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | | An array of Geography values, or an array of Geometry values. |
Returns
pyspark.sql.Column: A Geography or Geometry value, representing a multipoint, multilinestring, multipolygon, or geometry collection.
Any None values in the input array are ignored. The type of the output depends on the types of the non-None input geometries:
- If all non-None elements are points, returns a multipoint.
- If all non-None elements are linestrings, returns a multilinestring.
- If all non-None elements are polygons, returns a multipolygon.
- Otherwise, returns a geometry collection.
Each output contains one element per non-None array element.
Multi-typed inputs (multipoint, multilinestring, multipolygon) and geometry collection inputs are preserved as elements of the resulting geometry collection; they are not flattened.
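The output-type rule above can be modeled in plain Python. This is only a sketch of the documented behavior, not the Databricks implementation: the function name collected_kind and the use of uppercase kind strings in place of real Geography or Geometry values are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical kind labels standing in for real Geography/Geometry values.
# Only the three single-typed kinds map to a multi-typed output.
_MULTI_OF = {
    "POINT": "MULTIPOINT",
    "LINESTRING": "MULTILINESTRING",
    "POLYGON": "MULTIPOLYGON",
}


def collected_kind(kinds: Sequence[Optional[str]]) -> str:
    """Return the kind of output the rules above describe for the given input kinds."""
    # None elements are ignored.
    present = [k.upper() for k in kinds if k is not None]
    if not present:
        # Empty array, or only None values: the empty geometry collection.
        return "GEOMETRYCOLLECTION"
    distinct = set(present)
    if len(distinct) == 1:
        (only,) = distinct
        # Multi-typed and collection inputs are not in the map, so they fall
        # through to a geometry collection (they are not flattened).
        return _MULTI_OF.get(only, "GEOMETRYCOLLECTION")
    return "GEOMETRYCOLLECTION"
```

Under this model, collected_kind(["POINT", None, "POINT"]) yields "MULTIPOINT", while an array of two multipoints yields "GEOMETRYCOLLECTION" because multipoints are not points.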
The SRID value of the output is the common SRID value of the non-None input geometries.
The dimension of the output is the maximum common dimension of the non-None input geometries.
If the input array is empty or contains only None values, the 2D empty geometry collection is returned. In this case, the SRID of the output is determined as follows:
- If the input array's element type is GEOGRAPHY(ANY), the SRID of the output is 4326.
- If the input array's element type is GEOMETRY(ANY), the SRID of the output is 0.
- Otherwise, the SRID of the output is that of the input array's element type.
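As a rough illustration of the empty-input SRID rule, the following plain-Python sketch maps an element type to the SRID of the empty result. The function name empty_result_srid is hypothetical, and writing a fixed-SRID element type as a string such as GEOMETRY(3857) is an assumption made here by analogy with GEOMETRY(ANY); the function itself does not run on Spark.

```python
import re


def empty_result_srid(element_type: str) -> int:
    """Model the SRID of the empty geometry collection for a given element type."""
    normalized = element_type.strip().upper().replace(" ", "")
    if normalized == "GEOGRAPHY(ANY)":
        return 4326
    if normalized == "GEOMETRY(ANY)":
        return 0
    # Assumed spelling of a fixed-SRID type: GEOGRAPHY(<srid>) or GEOMETRY(<srid>).
    match = re.fullmatch(r"(?:GEOGRAPHY|GEOMETRY)\((\d+)\)", normalized)
    if match is None:
        raise ValueError(f"unrecognized element type: {element_type!r}")
    # Otherwise the output takes the SRID of the element type itself.
    return int(match.group(1))
```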
If any two non-None input geometries have different SRID values, the function raises an ST_DIFFERENT_SRID_VALUES error.
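The SRID agreement check can be sketched in plain Python as follows. This models the documented rule only: the names common_srid and DifferentSridValues are hypothetical stand-ins, and the real function reports the mismatch as an ST_DIFFERENT_SRID_VALUES error rather than a Python exception class defined here.

```python
from typing import Optional, Sequence


class DifferentSridValues(ValueError):
    """Hypothetical stand-in for the ST_DIFFERENT_SRID_VALUES error condition."""


def common_srid(srids: Sequence[Optional[int]]) -> Optional[int]:
    """Return the SRID shared by all non-None inputs, or raise if they differ."""
    # None stands for a None array element, which is ignored.
    present = [s for s in srids if s is not None]
    if not present:
        # No non-None inputs: the empty-input rule decides the SRID instead.
        return None
    first = present[0]
    for srid in present[1:]:
        if srid != first:
            raise DifferentSridValues(f"SRID {srid} differs from SRID {first}")
    return first
```

For example, common_srid([4326, None, 4326]) returns 4326, whereas common_srid([4326, 3857]) raises the stand-in error.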
The function returns None if the input is None.
Examples
Collects an array of points into a multipoint.
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POINT(1 2)', 'POINT(3 4)')], ['wkt1', 'wkt2'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt1'), dbf.st_geomfromtext('wkt2')))).alias('result')).collect()
[Row(result='MULTIPOINT((1 2),(3 4))')]
Collects an array of polygons into a multipolygon.
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POLYGON((0 0,10 0,10 10,0 10,0 0))',)], ['wkt'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt')))).alias('result')).collect()
[Row(result='MULTIPOLYGON(((0 0,10 0,10 10,0 10,0 0)))')]
Collects an array of mixed geometry kinds into a geometry collection.
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POLYGON((0 0,10 0,10 10,0 10,0 0))', 'LINESTRING(1 2,3 4)')], ['wkt1', 'wkt2'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt1'), dbf.st_geomfromtext('wkt2')))).alias('result')).collect()
[Row(result='GEOMETRYCOLLECTION(POLYGON((0 0,10 0,10 10,0 10,0 0)),LINESTRING(1 2,3 4))')]
Returns the 2D empty geometry collection for an empty input array.
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(dbf.st_astext(dbf.st_collect(sf.array())).alias('result')).collect()
[Row(result='GEOMETRYCOLLECTION EMPTY')]
Returns None for a None input.
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(dbf.st_collect(sf.lit(None)).alias('result')).collect()
[Row(result=None)]