Skip to main content

inputFiles

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

Syntax

inputFiles()

Returns

list: List of file paths.

Examples

Python
import os
import tempfile
with tempfile.TemporaryDirectory(prefix="inputFiles") as d:
spark.createDataFrame(
[{"age": 100, "name": "Alice"}]
).repartition(1).write.json(d, mode="overwrite")

df = spark.read.format("json").load(d)

if os.environ.get('PYTEST_DBCONNECT_MODE') is None:
len(df.inputFiles())
else:
1 # dbconnect doesn't support inputFiles.
# 1