%md
## Overview
This notebook will show you how to create and query a table or DataFrame that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) is the Databricks File System, which lets you store data for querying inside of Databricks. This notebook assumes you already have a file in DBFS that you would like to read from.
%md
### Step 1: File location and type
Of note, this notebook is written in **Python**, so the default cell type is Python. However, you can use different languages with the `%LANGUAGE` magic syntax; Python, Scala, SQL, and R are all supported.

First we'll need to set the location and type of the file. You set the file location when you uploaded the file. We'll do this using [widgets](https://docs.databricks.com/user-guide/notebooks/widgets.html). Widgets allow us to parameterize the execution of this entire notebook. First we'll create them; then we'll be able to reference them throughout the notebook.
dbutils.widgets.text("file_location", "/uploads/data", "Upload Location")
# The file type can be csv, parquet, json, or any other Spark data source.
# See https://docs.databricks.com/spark/latest/data-sources/index.html
# for more information.
dbutils.widgets.dropdown("file_type", "csv", ["csv", "parquet", "json"])
%md
### Step 2: Reading the data
Now that we've specified our file metadata, we can create a DataFrame. You'll notice that we use an *option* to specify that we'd like to infer the schema from the file. We can also explicitly set a particular schema if we already have one.

First, let's create the DataFrame in Python. Notice how we programmatically reference the widget values we defined above.
df = (spark.read.format(dbutils.widgets.get("file_type"))
      .option("inferSchema", "true")
      .load(dbutils.widgets.get("file_location")))
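%md
If you already know the schema, you can supply it explicitly instead of inferring it. The sketch below is one way to do that; the field names and types are placeholders for whatever your file actually contains.
# A sketch of reading with an explicit schema. The fields below
# ("name", "value") are placeholders; adjust them to match your file.
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

explicit_schema = StructType([
    StructField("name", StringType(), True),
    StructField("value", DoubleType(), True),
])

df_with_schema = (spark.read.format(dbutils.widgets.get("file_type"))
                  .schema(explicit_schema)
                  .load(dbutils.widgets.get("file_location")))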
%md
### Step 3: Querying the data
Now that we've created our DataFrame, we can query it. For instance, you can select particular columns and display them within Databricks.
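%md
As a minimal sketch (the column names below are placeholders, since they depend on your uploaded file):
# Select a couple of columns and display the result in Databricks.
# "col_a" and "col_b" are placeholder names; substitute columns that
# actually exist in your DataFrame.
display(df.select("col_a", "col_b"))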
%md
### Step 4: (Optional) Create a view or table
If you'd like to be able to query this data as a table, it is simple to register it as a *view* or a table.
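%md
For example, a minimal sketch that registers the DataFrame as a temporary view (the view name `uploaded_data` is an arbitrary placeholder):
# Register the DataFrame as a temporary view scoped to this notebook.
df.createOrReplaceTempView("uploaded_data")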
%md
We can query this using Spark SQL. For instance, we can perform a simple aggregation. Notice how we can use `%sql` to query the view from SQL.
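%md
For example, a sketch of a simple count aggregation against the placeholder view name from the previous step:
%sql
-- Count the rows in the temporary view registered above.
-- "uploaded_data" is the placeholder view name from Step 4.
SELECT COUNT(*) AS row_count FROM uploaded_data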
%md
Because this is registered as a temp view, it will only be available to this particular notebook. If you'd like other users to be able to query this data, you can instead create a table from the DataFrame.
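%md
As a sketch (the table name `uploaded_data_table` is a placeholder):
# Save the DataFrame as a permanent table in the metastore so it is
# visible to other notebooks and users.
df.write.format("parquet").saveAsTable("uploaded_data_table")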