
zstd_compress

Returns the value of input compressed with Zstandard at the specified compression level. The default level is 3. By default, compression runs in single-pass mode.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.zstd_compress(input=<input>, level=<level>, streaming_mode=<streaming_mode>)

Parameters

| Parameter | Type | Description |
|---|---|---|
| input | pyspark.sql.Column or str | The binary value to compress. |
| level | pyspark.sql.Column or int, optional | The compression level, which controls the trade-off between compression speed and compression ratio. Valid values are 1 through 22 inclusive: 1 is fastest with the lowest ratio, 22 is slowest with the highest ratio. Defaults to 3. |
| streaming_mode | pyspark.sql.Column or bool, optional | Whether to compress in streaming mode. Defaults to false. |

Returns

A pyspark.sql.Column containing the compressed binary value.

Examples

Example 1: Compress data using Zstandard

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input)).alias("result")).show(truncate=False)
Output
+----------------------------------------+
|result |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
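As a quick sanity check outside of Spark, the base64 result above can be decoded with only the Python standard library. Per the Zstandard format specification, every frame begins with the magic number 0x28 0xB5 0x2F 0xFD, so the decoded bytes should start with that sequence:

```python
import base64

# base64-encoded result from the example above
encoded = "KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU="
frame = base64.b64decode(encoded)

# Every Zstandard frame starts with the little-endian magic number 0xFD2FB528
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"
assert frame[:4] == ZSTD_MAGIC
print("valid zstd frame header,", len(frame), "bytes total")
```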

Example 2: Compress data using Zstandard with a given compression level

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(5))).alias("result")).show(truncate=False)
Output
+----------------------------------------+
|result |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+

Example 3: Compress data using Zstandard in streaming mode

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(3), dbf.lit(True))).alias("result")).show(truncate=False)
Output
+--------------------------------------------+
|result |
+--------------------------------------------+
|KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=|
+--------------------------------------------+
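Note that the streaming-mode output differs from the single-pass output in Example 1, even though the input and level are identical; both are still valid Zstandard frames beginning with the same magic number. A minimal stdlib-only comparison of the two results shown above:

```python
import base64

# base64-encoded results from Example 1 (single-pass) and Example 3 (streaming)
single_pass = base64.b64decode("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = base64.b64decode("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=")

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"
# Both frames carry the Zstandard magic number...
assert single_pass[:4] == ZSTD_MAGIC and streaming[:4] == ZSTD_MAGIC
# ...but the frame headers and overall lengths differ between the two modes
assert single_pass != streaming
print(len(single_pass), "vs", len(streaming), "bytes")
```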