The Create Table UI provides a simple way to upload small files into Databricks to get started.
For production workloads, we recommend loading data through the Databricks File System (DBFS) rather than the Create Table UI. You can also use a wide variety of Spark data sources to import data directly in your notebooks.
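For instance, the same reader API handles many formats. A minimal Python sketch, assuming a JSON file already in DBFS at a hypothetical path:

```python
# Hypothetical path; swap "json" for any supported source (parquet, orc, jdbc, ...).
events_df = sqlContext.read.format("json").load("/FileStore/tables/events.json")
events_df.printSchema()
```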
To upload a file through the UI:

- Click Data in the sidebar to open the data panel, then click the Add Data button at the top of the panel.
- Upload your file either by dragging it onto the dropzone or by clicking the dropzone and choosing your files.
- After the upload completes, the UI displays a path. You can use this path in a notebook to read the data into your cluster (see below); it will be something like /FileStore/tables/2esy8tnj1455052720017/. You can also verify the upload from a notebook, as sketched after this list.
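To confirm the upload, you can list the directory with the Databricks utilities. A minimal sketch, reusing the example path shown by the UI:

```python
# List the uploaded files; the path is the one displayed after the upload.
display(dbutils.fs.ls("/FileStore/tables/2esy8tnj1455052720017/"))
```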
You can read your raw data into Spark directly. For example, if you uploaded a CSV file, you can read it with any of the following examples.
For easier access, we recommend that you create a table from your uploaded data by clicking Preview Table (or programmatically, as sketched after the examples below). See the documentation on Databases and Tables for more information.
Scala:

```scala
// Read the uploaded CSV into a Spark DataFrame.
val sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/2esy8tnj1455052720017/")
```

Python:

```python
# Read the uploaded CSV into a Spark DataFrame.
sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/2esy8tnj1455052720017/")
```

R:

```r
# Read the uploaded CSV into a Spark DataFrame.
sparkDF <- read.df(sqlContext, source = "csv", path = "/FileStore/tables/2esy8tnj1455052720017/")
```

If you prefer to work with the raw lines, read the directory as an RDD instead:

```scala
// Scala: read the raw text lines as an RDD.
val rdd = sc.textFile("/FileStore/tables/2esy8tnj1455052720017/")
```

```python
# Python: read the raw text lines as an RDD.
rdd = sc.textFile("/FileStore/tables/2esy8tnj1455052720017/")
```
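If you would rather create the table from a notebook than through Preview Table, one option is to save the DataFrame as a table. A minimal Python sketch, assuming the sparkDF read above and a hypothetical table name:

```python
# Register the uploaded data as a table; the table name is hypothetical.
sparkDF.write.saveAsTable("my_uploaded_data")

# The table can then be queried by name.
sqlContext.sql("SELECT COUNT(*) FROM my_uploaded_data").show()
```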
If the data is small enough, you can also load it directly onto the driver node with single-machine libraries; DBFS paths are exposed to local file APIs under the /dbfs mount point. For example:
Python:

```python
import pandas as pd

# The uploaded file is tab-separated; header=0 treats the first row as column names.
pandas_df = pd.read_csv("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv", sep="\t", header=0)
```

R:

```r
# read.delim reads tab-separated files and treats the first row as a header by default.
df <- read.delim("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv", header = TRUE)
```
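Because DBFS is mounted at /dbfs on the driver, any local file API can read the uploaded file the same way; a sketch with Python's standard library:

```python
# Peek at the first line of the uploaded file through the local /dbfs mount.
with open("/dbfs/FileStore/tables/2esy8tnj1455052720017/part_001-86465.tsv") as f:
    print(f.readline())
```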