Data format options

Databricks has built-in keyword bindings for all of the data formats natively supported by Apache Spark. Databricks uses Delta Lake as the default protocol for reading and writing data and tables, whereas Apache Spark uses Parquet.
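As a minimal sketch of this difference, saving a DataFrame as a table without an explicit format produces a Delta table on Databricks, while open source Apache Spark defaults to Parquet; the table names here are placeholders:

```python
# On Databricks, saveAsTable with no format keyword creates a Delta table;
# on open source Apache Spark, the same call defaults to Parquet.
df = spark.range(10)
df.write.saveAsTable("events_default")

# Passing the format keyword explicitly overrides the default on either platform.
df.write.format("parquet").saveAsTable("events_parquet")
```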

The articles in this section provide an overview of the options and configurations available when you query data on Databricks.

The following data formats have built-in keyword configurations in Apache Spark DataFrames and SQL:

  • Delta Lake

  • Parquet

  • ORC

  • JSON

  • CSV

  • Avro

  • Text

  • Binary file

  • XML
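As a brief illustration of the keyword pattern, the format name selects the data source in both the DataFrame reader and SQL; the paths below are placeholders:

```python
# The format keyword selects the data source for the DataFrame reader.
df_csv = spark.read.format("csv").option("header", "true").load("/path/to/data.csv")
df_json = spark.read.format("json").load("/path/to/data.json")

# Built-in formats also have shorthand reader methods.
df_parquet = spark.read.parquet("/path/to/data.parquet")

# The same keywords work in SQL by prefixing the path with the format name.
spark.sql("SELECT * FROM json.`/path/to/data.json`")
```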

Databricks also provides a custom keyword for loading MLflow experiments.
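As a sketch, this loader is exposed through the mlflow-experiment keyword, which takes an MLflow experiment ID as the load path; the ID below is a placeholder:

```python
# Load the runs of an MLflow experiment as a DataFrame.
# "1234567890" stands in for a real MLflow experiment ID.
df = spark.read.format("mlflow-experiment").load("1234567890")
df.printSchema()
```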

Data formats with special considerations

Some data formats require additional configuration or special considerations for use:

  • Databricks recommends loading images as binary data; see the first sketch after this list.

  • Databricks can directly read compressed files in many file formats, and you can unzip compressed files on Databricks if necessary; see the second sketch after this list.
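As a minimal sketch of the image recommendation, the binaryFile data source loads each file as raw bytes alongside its path and size; the directory and glob pattern are placeholders:

```python
# Load images as binary data with the binaryFile data source.
# Each row carries path, modificationTime, length, and content (raw bytes).
images = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.png")  # restrict to image files
    .load("/path/to/images")            # placeholder directory
)
images.select("path", "length").show(truncate=False)
```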
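And a sketch of working with compressed files: Spark detects common codecs such as gzip from the file extension and reads those files directly, while zip archives must be expanded first (the paths and the shell step are illustrative):

```python
# Gzip-compressed text formats such as CSV are read directly;
# the codec is inferred from the .gz extension.
df = spark.read.format("csv").option("header", "true").load("/path/to/data.csv.gz")

# Zip is an archive rather than a streaming codec, so expand it first,
# for example with a shell cell in a notebook:
#   %sh unzip /path/to/archive.zip -d /path/to/extracted/
df_unzipped = spark.read.format("csv").load("/path/to/extracted/")
```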

For more information about Apache Spark data sources, see Generic Load/Save Functions and Generic File Source Options.