DBFS(Python)

Overview

This notebook will show you how to create and query a table or DataFrame that you uploaded to DBFS. DBFS is the Databricks File System, which lets you store data for querying inside Databricks. This notebook assumes that you already have a file in DBFS that you would like to read from.

Step 1: File location and type

This notebook is written in Python, so the default cell type is Python. However, you can use other languages with the %LANGUAGE magic syntax; Python, Scala, SQL, and R are all supported.

First we'll need to set the location and type of the file; you set the file location when you uploaded the file. We'll do this using widgets, which allow us to parameterize the execution of this entire notebook. First we'll create them, then we'll be able to reference them throughout the notebook.

dbutils.widgets.text("file_location", "/uploads/data", "Upload Location")
dbutils.widgets.dropdown("file_type", "csv", ["csv", "parquet", "json"])
# This can be csv, parquet, json, or any other Spark data source.
# See: https://docs.databricks.com/spark/latest/data-sources/index.html
# for more information.

Step 2: Reading the data

Now that we've specified our file metadata, we can create a DataFrame. You'll notice that we use an option to tell Spark to infer the schema from the file. We can also supply an explicit schema if we already have one.
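
For example, if you already know the schema, a sketch like the following avoids the extra pass over the data that inference requires. The column names here are hypothetical placeholders; replace them with the columns in your file.

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# A hypothetical schema for illustration; substitute your own column names and types.
example_schema = StructType([
    StructField("EXAMPLE_GROUP", StringType(), True),
    StructField("EXAMPLE_AGG", DoubleType(), True),
])

# Pass the schema explicitly instead of inferring it.
df_with_schema = (spark.read.format(dbutils.widgets.get("file_type"))
    .schema(example_schema)
    .load(dbutils.widgets.get("file_location")))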

Let's create a DataFrame in Python; notice how we programmatically reference the widget values we defined above.

df = (spark.read.format(dbutils.widgets.get("file_type"))
    .option("inferSchema", "true")
    .load(dbutils.widgets.get("file_location")))
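
At this point it's worth a quick sanity check on what Spark inferred. Assuming the DataFrame was created as above, a minimal sketch:

# Inspect the inferred schema and preview a few rows before querying further.
df.printSchema()
display(df.limit(10))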

Step 3: Querying the data

Now that we've created our DataFrame, we can query it. For instance, you can select particular columns and display them within Databricks.

display(df.select("EXAMPLE_COLUMN"))
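
You can also filter with the DataFrame API before displaying. The sketch below uses the same placeholder column name; substitute your own column and predicate.

from pyspark.sql import functions as F

# A hypothetical filter on a placeholder column.
filtered = df.where(F.col("EXAMPLE_COLUMN").isNotNull())
display(filtered)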

Step 4: (Optional) Create a view or table

If you'd like to query this data as a table, it is simple to register it as a view or a table.

df.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")

We can query this view using Spark SQL, for instance to perform a simple aggregation. Notice how we use %sql to query the view from a SQL cell.

%sql

SELECT EXAMPLE_GROUP, SUM(EXAMPLE_AGG) FROM YOUR_TEMP_VIEW_NAME GROUP BY EXAMPLE_GROUP
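
The same query can also be run from a Python cell with spark.sql, which returns a DataFrame; a minimal sketch, assuming the view registered above:

# Equivalent of the SQL cell above, run from Python.
agg_df = spark.sql("""
    SELECT EXAMPLE_GROUP, SUM(EXAMPLE_AGG)
    FROM YOUR_TEMP_VIEW_NAME
    GROUP BY EXAMPLE_GROUP
""")
display(agg_df)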

With this registered as a temp view, it will only be available to this particular notebook. If you'd like other users to be able to query it, you can instead create a table from the DataFrame.

df.write.format("delta").saveAsTable("MY_PERMANENT_TABLE_NAME")

This table will persist across cluster restarts and allow users in different notebooks to query this data.
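
From any notebook attached to the workspace, a sketch like this reads the saved table back, assuming the table name used above:

# Read the saved table back as a DataFrame; this works from other notebooks too.
permanent_df = spark.table("MY_PERMANENT_TABLE_NAME")
display(permanent_df)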