Task 1: Explore Databricks fundamentals

Estimated time to complete: 35 minutes

In this first task, we cover the fundamentals of how the Databricks Unified Data Analytics Platform works and the types of problems it solves.

An introduction to Databricks

This video gives you a quick introduction to Databricks.

Databricks workspace user interface

Next is a tour of the Databricks workspace, the web application where you and your colleagues access Databricks functionality. Before you get started, have your credentials ready so that you can log in to your Databricks workspace and follow along.

Now that you’ve toured the workspace, let’s review one of its components in greater detail: Databricks notebooks.

Databricks notebooks

Databricks notebooks are the primary space where data practitioners perform their daily work. In this video, we review notebook basics: how to access them, how to use them, and how to manage them.

Hands-on practice with notebooks

Now it’s your turn to practice performing basic tasks in the Databricks workspace. Use your credentials to log into your Databricks workspace and follow this guide to:

  • Create a cluster
  • Create a notebook
  • Create a table
  • Query the table
  • Display data
  • Schedule a job

Tip

As a supplement to this guide, check out the Quickstart Tutorial notebook, available on your Databricks workspace landing page, for a 5-minute hands-on introduction to Databricks. Simply log into your Databricks workspace and click Explore the Quickstart Tutorial.

Step 1: Create a cluster

A cluster is a collection of Databricks computation resources. To create a cluster:

  1. In the sidebar, click Compute.

  2. On the Compute page, click Create Cluster.

  3. On the Create Cluster page, enter the cluster name Quickstart and select 7.3 LTS (Scala 2.12, Spark 3.0.1) in the Databricks Runtime Version drop-down.

  4. Click Create Cluster.

Step 2: Create a notebook

A notebook is a collection of cells that run computations on a Databricks Runtime cluster. To create a notebook in the workspace:

  1. In the sidebar, click Workspace.

  2. In the Workspace folder, select Create > Notebook.

  3. On the Create Notebook dialog, enter a name and select SQL in the Language drop-down. This selection determines the default language of the notebook.

  4. Click Create. The notebook opens with an empty cell at the top.

Step 3: Create a table

Create a Spark table from a sample CSV file in Databricks datasets, a collection of datasets mounted to the Databricks File System (DBFS), a distributed file system installed on Databricks clusters.

Copy and paste this code snippet into a notebook cell, then press SHIFT + ENTER to run it:

-- Remove any existing diamonds table, then create it from the sample CSV
DROP TABLE IF EXISTS diamonds;

CREATE TABLE diamonds
USING CSV
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true");
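
To confirm that the table was created, you can run a quick check in a new cell. This check is an optional extra, not part of the guide's numbered steps:

-- Optional check: preview a few rows of the newly created table
SELECT * FROM diamonds LIMIT 5;

You should see a handful of rows, including the color and price columns used in the next step.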

Step 4: Query the table

Run a SQL statement to query the table for the average diamond price by color.

  1. To add a cell to the notebook, hover over the bottom of the cell and click the Add Cell icon.

  2. Copy this snippet and paste it in the cell.

    SELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY color;
    
  3. Press SHIFT + ENTER. The notebook displays a table of diamond color and average price.

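Once you have the result, you can extend the query to explore the data further. For example, this sketch breaks the average price down by clarity as well (clarity is another column in the sample diamonds CSV). If you try it, re-run the original query afterward so the chart in the next step is built from the average price by color:

-- Variation on the query above: average price by color and clarity
SELECT color, clarity, avg(price) AS price
FROM diamonds
GROUP BY color, clarity
ORDER BY color, clarity;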

Step 5: Display the data

Display a chart of the average diamond price by color.

  1. Click the Bar chart icon.

  2. Click Plot Options.

    • Drag color into the Keys box.

    • Drag price into the Values box.

    • In the Aggregation drop-down, select AVG.

  3. Click Apply to display the bar chart.

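Because the query in Step 4 already computes one average price per color, the AVG aggregation in Plot Options has only a single value per key, so it leaves the numbers unchanged. If you prefer to let the chart do the aggregation instead, one alternative is to return the raw rows and keep the same Plot Options settings:

-- Alternative: return raw rows and let the chart's AVG aggregation
-- compute the average price per color
SELECT color, price FROM diamonds;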

Step 6: Create a job

  1. In the sidebar, click Jobs.
  2. Enter a name in the text field to replace the placeholder text Untitled.
  3. Next to Task, click Select Notebook.
  4. Navigate to your user folder and select the notebook you were just working in.
  5. Next to Schedule, click Edit.
  6. Specify how often you want this job to run.

Continue onboarding