Introduction to importing, reading, and modifying data

This article describes how to import data into Databricks using the UI, read imported data using the Spark and local APIs, and modify imported data using Databricks File System (DBFS) commands.

Import data

If you have small data files on your local machine that you want to analyze with Databricks, you can import them to DBFS using the UI.


This feature may be disabled by admin users. To enable or disable this setting, see Manage data upload.

There are two ways to upload data to DBFS with the UI:

  • Upload files to the FileStore in the Upload Data UI.
  • Upload data to a table with the Create table UI, which is also accessible via the Import & Explore Data box on the landing page.

Files imported to DBFS using these methods are stored in FileStore.

For production environments, we recommend that you upload files to DBFS explicitly, using the DBFS CLI, the DBFS API 2.0, or the Databricks file system utility (dbutils.fs).
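As a rough illustration of an explicit upload through the DBFS API 2.0: its put endpoint expects file contents base64-encoded in a JSON request body. The helper name `dbfs_put_payload` and the sample path below are ours, not part of any Databricks API; the HTTP call itself (e.g. a POST to `/api/2.0/dbfs/put` with a bearer token) is omitted because it needs a live workspace.

```python
import base64
import json

def dbfs_put_payload(dbfs_path: str, data: bytes, overwrite: bool = True) -> str:
    """Build the JSON body for a DBFS API 2.0 put request.

    The API expects file contents as a base64-encoded string; this
    helper performs only that local encoding step.
    """
    return json.dumps({
        "path": dbfs_path,
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    })

payload = dbfs_put_payload("/FileStore/tables/example.csv",
                           b"state,income\nCA,84097\n")
```

Note that the single-call put route only suits small payloads; larger files are uploaded in chunks through the API's streaming create/add-block/close calls.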

You can also access data in Databricks from a wide variety of other data sources.

Read data on cluster nodes using Spark APIs

You can read data imported to DBFS into Apache Spark DataFrames using Spark APIs. For example, if you import a CSV file, you can read the data using one of the following examples.


For easier access, we recommend that you create a table. See Databases and tables for more information.

Python

sparkDF = spark.read.csv("/FileStore/tables/state_income-9f7c5.csv", header="true", inferSchema="true")

R

sparkDF <- read.df(source = "csv", path = "/FileStore/tables/state_income-9f7c5.csv", header = "true", inferSchema = "true")

Scala

val sparkDF = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/state_income-9f7c5.csv")
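The `header` and `inferSchema` options tell Spark to treat the first row as column names and to guess each column's type from its values. As a small-scale illustration of that behavior using only the Python standard library (this is not Spark itself, and the sample data is hypothetical):

```python
import csv
import io

def read_csv_infer(text: str):
    """Mimic header=true + inferSchema=true on a tiny scale: the first
    row becomes the column names, and each remaining value is coerced
    to int or float when possible, otherwise kept as a string."""
    def coerce(value):
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value

    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return [dict(zip(header, map(coerce, row))) for row in body]

sample = "state,median_income\nCA,84097\nTX,67321\n"
records = read_csv_infer(sample)
# records[0] == {"state": "CA", "median_income": 84097}
```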

Read data on cluster nodes using local APIs

You can also read data imported to DBFS in programs running on the Spark driver node using local file APIs. For example:

Python

import pandas as pd

pandas_df = pd.read_csv("/dbfs/FileStore/tables/state_income-9f7c5.csv", header='infer')

R

df = read.csv("/dbfs/FileStore/tables/state_income-9f7c5.csv", header = TRUE)
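Note that the same file has two addresses: Spark APIs use the DBFS path (`/FileStore/...` or `dbfs:/FileStore/...`), while local file APIs on the driver see DBFS under the `/dbfs` mount point. A small sketch of that mapping, assuming a hypothetical helper name `dbfs_to_local` (this is not a Databricks API):

```python
def dbfs_to_local(path: str) -> str:
    """Map a DBFS path, as used by Spark APIs, to the /dbfs mount
    path that local file APIs on the driver node can open."""
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]       # drop the scheme prefix
    if not path.startswith("/"):
        path = "/" + path                # normalize to an absolute path
    return "/dbfs" + path

dbfs_to_local("dbfs:/FileStore/tables/state_income-9f7c5.csv")
# -> "/dbfs/FileStore/tables/state_income-9f7c5.csv"
```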

Modify uploaded data

You cannot edit imported data directly within Databricks, but you can overwrite a data file using Spark APIs, the DBFS CLI, the DBFS API 2.0, or the Databricks file system utility (dbutils.fs).

To delete data from DBFS, use the same APIs and tools. For example, you can use the Databricks utilities command dbutils.fs.rm:

dbutils.fs.rm("/FileStore/tables/state_income-9f7c5.csv")

Deleted data cannot be recovered.