zstd_compress
Returns the value of input compressed with Zstandard at the specified compression level. The default level is 3, and compression runs in single-pass mode unless streaming mode is enabled.
Syntax
```python
from pyspark.databricks.sql import functions as dbf

dbf.zstd_compress(input=<input>, level=<level>, streaming_mode=<streaming_mode>)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `input` | | The binary value to compress. |
| `level` | | Optional integer argument that represents the compression level. The compression level controls the trade-off between compression speed and compression ratio. Valid values: between 1 and 22 inclusive, where 1 means fastest but lowest compression ratio, and 22 means slowest but highest compression ratio. The default level is 3 if not specified. |
| `streaming_mode` | | Optional boolean argument that represents whether to use streaming mode. If true, the function will compress in streaming mode. The default value is false. |
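The documented range for the compression level can be checked on the driver before the expression is built. The sketch below is one way to do that, not part of the API: `check_zstd_level` and `compress_column` are hypothetical helper names, and the call to `zstd_compress` simply mirrors the positional form used in the examples on this page. How the function itself reacts to an out-of-range level is not stated here, so the helper makes no assumption about it.

```python
def check_zstd_level(level: int) -> int:
    """Hypothetical helper: accept only the documented levels 1 through 22."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError(f"level must be an int, got {type(level).__name__}")
    if not 1 <= level <= 22:
        raise ValueError(f"level must be between 1 and 22 inclusive, got {level}")
    return level


def compress_column(df, column_name: str, level: int = 3, streaming_mode: bool = False):
    """Hypothetical wrapper: compress one column of df after validating the level."""
    # Imported lazily so check_zstd_level can be used without a Databricks runtime.
    from pyspark.databricks.sql import functions as dbf

    return df.select(
        dbf.zstd_compress(
            df[column_name],
            dbf.lit(check_zstd_level(level)),
            dbf.lit(streaming_mode),
        ).alias("compressed")
    )
```

With a DataFrame such as the one in the examples, `compress_column(df, "input", level=19)` would favor compression ratio over speed, while `compress_column(df, "input", level=1)` would favor speed.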
Returns
pyspark.sql.Column: A new column that contains a compressed value.
Examples
Example 1: Compress data using Zstandard
```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input)).alias("result")).show(truncate=False)
```

```
+----------------------------------------+
|result                                  |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
```
Example 2: Compress data using Zstandard with given compression level
```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(5))).alias("result")).show(truncate=False)
```

```
+----------------------------------------+
|result                                  |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
```
Example 3: Compress data using Zstandard in streaming mode
```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(3), dbf.lit(True))).alias("result")).show(truncate=False)
```

```
+--------------------------------------------+
|result                                      |
+--------------------------------------------+
|KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=|
+--------------------------------------------+
```
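The Base64 results from Examples 1 and 3 can be decoded locally to see how the single-pass and streaming outputs differ. The following sketch uses only the Python standard library and reads the frame header according to the public Zstandard frame format (RFC 8878). `describe_frame` is a hypothetical helper name, and the interpretation applies to these two sample outputs only; it is not a statement about how the function is implemented.

```python
import base64

# Every Zstandard frame starts with this 4-byte magic number (RFC 8878).
ZSTD_MAGIC = bytes.fromhex("28b52ffd")


def describe_frame(b64_text: str) -> dict:
    """Hypothetical helper: summarize the header of a Base64-encoded Zstandard frame."""
    raw = base64.b64decode(b64_text)
    descriptor = raw[4]                        # Frame_Header_Descriptor byte
    single_segment = bool(descriptor & 0x20)   # bit 5: Single_Segment_flag
    size_flag = descriptor >> 6                # bits 7-6: Frame_Content_Size_flag
    # With Single_Segment_flag set and size flag 0, the next byte holds the
    # uncompressed size. Other layouts are not decoded in this sketch.
    declared_size = raw[5] if single_segment and size_flag == 0 else None
    return {
        "has_magic": raw[:4] == ZSTD_MAGIC,
        "single_segment": single_segment,
        "declared_size": declared_size,
        "total_bytes": len(raw),
    }


single_pass = describe_frame("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = describe_frame("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=")

print(single_pass)
print(streaming)
```

For these samples, the single-pass frame is 29 bytes and records the 130-byte uncompressed length in its header, while the streaming frame is 32 bytes and does not declare a content size in the layout this sketch decodes.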