Importing Data

The Create Table UI provides a simple way to upload small files into Databricks to get started.

Tip

For production use, we recommend loading your data into Databricks with Databricks File System - DBFS rather than the Create Table UI. You can use a wide variety of Spark Data Sources to import data directly in your notebooks.

Uploading Data

If you have a small file on your local machine that you want to analyze with Databricks, upload it to the FileStore in Databricks File System - DBFS. To do so:

  1. Click Data to open the data panel, then click the Add Table icon at the top of the Tables panel.
  2. Upload your file either by dragging it onto the dropzone or by clicking the dropzone and choosing your files.
  3. After the upload completes, a path is displayed. You can use this path in a notebook to read the data into your cluster (see below). The path will look something like /FileStore/tables/2esy8tnj1455052720017/.

Loading Data

You can read your raw data into Spark directly. For example, if you uploaded a CSV, you can read your data using one of these examples.

Tip

For easier access, we recommend that you create a table from your uploaded data by clicking Preview Table. See the documentation on Databases and Tables for more information.

Scala:

val sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/2esy8tnj1455052720017/")

Python:

sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/2esy8tnj1455052720017/")

R:

sparkDF <- read.df(sqlContext, source = "csv", path = "/FileStore/tables/2esy8tnj1455052720017/")

Scala RDD:

val rdd = sc.textFile("/FileStore/tables/2esy8tnj1455052720017/")

Python RDD:

rdd = sc.textFile("/FileStore/tables/2esy8tnj1455052720017/")
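Note that sc.textFile yields raw text lines, so each record still needs to be parsed. A minimal sketch of a line parser you could map over the RDD (a plain split; it does not handle quoted fields that contain commas):

```python
def parse_csv_line(line):
    # Naive CSV parsing: strip the trailing newline and split on commas.
    # Quoted fields with embedded commas would need Python's csv module.
    return line.rstrip("\n").split(",")

# On a cluster:
# parsed = rdd.map(parse_csv_line)
```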

If the data is small enough, you can also load this data directly onto the driver node. For example:

Python:

import pandas as pd

pandas_df = pd.read_csv("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv", sep="\t", header=0)

R:

df <- read.csv("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv", sep = "\t", header = TRUE)
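Note the path difference between the two modes: Spark reads use the DBFS path (/FileStore/...), while driver-local reads go through the /dbfs mount point. A small helper (illustrative only, not part of any Databricks API) can convert one to the other:

```python
def to_driver_path(dbfs_path):
    # Map a DBFS path such as /FileStore/tables/... (or dbfs:/FileStore/...)
    # to the /dbfs/... path visible to driver-local code like pandas or R.
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if not dbfs_path.startswith("/dbfs"):
        dbfs_path = "/dbfs" + dbfs_path
    return dbfs_path
```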

Editing Data

You cannot edit data directly within Databricks, but you can overwrite a data file in Databricks File System - DBFS using Databricks Utilities - dbutils.
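Because DBFS files are also visible on the driver under /dbfs (as in the pandas example above), one way to overwrite a file is plain file I/O; this is a minimal sketch, and the path in the usage comment is the hypothetical one from the earlier examples:

```python
def overwrite_file(path, contents):
    # Opening in mode "w" truncates the file first, so the old contents
    # are replaced entirely; there is no in-place partial edit.
    with open(path, "w") as f:
        f.write(contents)

# On the driver, for example:
# overwrite_file("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv", new_text)
```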

Deleting Data

You can use the following Databricks File System - DBFS command to delete the data.

Warning

Deleting data cannot be undone.

dbutils.fs.rm("dbfs:/FileStore/tables/2esy8tnj1455052720017/", true)