partitionBy (DataStreamWriter)

ファイルシステム上で、指定された列に基づいて出力を分割します。出力はHiveのパーティショニング方式と同様のレイアウトになっています。

構文

partitionBy(*cols)

パラメーター

パラメーター	Type	説明
`*cols`	文字列またはリスト	パーティション分割に使用する列の名前。

戻り値

DataStreamWriter

例

Python
df = spark.readStream.format("rate").load()
df.writeStream.partitionBy("value")
# <...streaming.readwriter.DataStreamWriter object ...>

レートソースストリームをタイムスタンプで分割し、 Parquetに書き込みます。

Python
import tempfile
import time
with tempfile.TemporaryDirectory(prefix="partitionBy1") as d:
    with tempfile.TemporaryDirectory(prefix="partitionBy2") as cp:
        df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
        q = df.writeStream.partitionBy(
            "timestamp").format("parquet").option("checkpointLocation", cp).start(d)
        time.sleep(5)
        q.stop()
        spark.read.schema(df.schema).parquet(d).show()

構文​

パラメーター​

戻り値​

例​

構文

パラメーター

戻り値

例