Quickstart

This quickstart gets you going with Databricks: you create a cluster and a notebook, create a table from a dataset, query the table, and display the query results.

Step 1: Log in to your Databricks account

  1. Go to your Databricks account URL.

  2. Log in with your Databricks username and password. You’ll see the Databricks landing page.

    ../_images/landing-aws.png

From the left sidebar and the New list, you can access fundamental Databricks entities: the Workspace, clusters, tables, notebooks, jobs, and libraries. The Workspace is the special root folder that stores your Databricks assets, such as notebooks and libraries, and the data that you import.

To get help, click the question icon at the top right-hand corner.


Step 2: Create a cluster

A cluster is a collection of Databricks computation resources. To create a cluster:

  1. In the sidebar, click the Clusters button.

  2. On the Clusters page, click Create Cluster.

    ../_images/new-cluster.png
  3. On the New Cluster page, specify a cluster name and select 4.0 (includes Apache Spark 2.3.0, Scala 2.11) in the Databricks Runtime Version drop-down.

  4. Click Create Cluster.

Step 3: Create a notebook

A notebook is a collection of cells that run computations on a Spark cluster. To create a notebook in the Workspace:

  1. In the sidebar, click the Workspace button.

  2. In the Workspace folder, select Create > Notebook.

    ../_images/create-notebook.png
  3. On the Create Notebook dialog, enter a name and select Python in the Language drop-down.

  4. Click Create. The notebook opens with an empty cell at the top.

Step 4: Create a table

Run a SQL statement to create a temporary table using data from a sample CSV data file available in Databricks Datasets.

  1. Copy and paste this code snippet into the notebook cell. The path value denotes the location of the sample CSV file.

    %sql
    DROP TABLE IF EXISTS diamonds;
    
    CREATE TEMPORARY TABLE diamonds
      USING csv
      OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
    
  2. Press SHIFT + ENTER. The notebook automatically attaches to the cluster you created in Step 2, creates the table, loads the data, and returns OK.

    ../_images/quick-start-load-data.png
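The %sql cell above drops any existing diamonds table and recreates it from the CSV file. As a rough local sanity check of the same DROP/CREATE pattern, here is a sketch using Python's built-in SQLite module with a few made-up sample rows (SQLite has no USING csv data source, so the rows are inserted by hand; this runs anywhere, not just on a Databricks cluster):

```python
import sqlite3

# In-memory SQLite database standing in for the cluster-backed table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same DROP/CREATE TEMPORARY TABLE pattern as the %sql cell above.
cur.execute("DROP TABLE IF EXISTS diamonds")
cur.execute("CREATE TEMPORARY TABLE diamonds (carat REAL, color TEXT, price INTEGER)")

# Hypothetical sample rows; the real dataset is loaded from the CSV path.
sample_rows = [
    (0.23, "E", 326),
    (0.21, "E", 326),
    (0.29, "I", 334),
]
cur.executemany("INSERT INTO diamonds VALUES (?, ?, ?)", sample_rows)

# Confirm the table was created and populated.
count = cur.execute("SELECT COUNT(*) FROM diamonds").fetchone()[0]
print(count)  # 3
```

In the notebook itself none of this setup is needed: the USING csv data source reads the file and infers the columns from the header row.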

Step 5: Query the table

Run a SQL statement to query the table for the average diamond price by color.

  1. To add a cell to the notebook, mouse over the bottom of the cell and click the Add Cell icon.

    ../_images/quick-start-new-cell.png
  2. Copy this snippet and paste it in the cell.

    %sql
    SELECT color, AVG(price) AS price FROM diamonds GROUP BY color
    
  3. Press SHIFT + ENTER. The notebook displays a table of diamond color and average price.

    ../_images/diamonds-table.png
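The GROUP BY query computes one average price per color. The same aggregation can be sketched in plain Python, assuming a handful of hypothetical (color, price) rows in place of the real dataset:

```python
from collections import defaultdict

# Hypothetical (color, price) rows standing in for the diamonds table.
rows = [
    ("E", 326), ("E", 326), ("I", 334),
    ("J", 336), ("J", 338),
]

# Equivalent of: SELECT color, AVG(price) AS price FROM diamonds GROUP BY color
totals = defaultdict(lambda: [0, 0])  # color -> [sum of prices, row count]
for color, price in rows:
    totals[color][0] += price
    totals[color][1] += 1

avg_price = {color: s / n for color, (s, n) in totals.items()}
print(avg_price)  # {'E': 326.0, 'I': 334.0, 'J': 337.0}
```

Spark performs this grouping in parallel across the cluster; the logic per group is the same running sum and count.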

Step 6: Display the data

Display a chart of the average diamond price by color.

  1. Click the bar chart icon.

  2. Click Plot Options.

    • Drag color into the Keys box.

    • Drag price into the Values box.

    • In the Aggregation drop-down, select AVG.

      ../_images/diamonds-plot-options.png
  3. Click Apply to display the bar chart.

    ../_images/diamonds-bar-chart.png

What’s Next

We’ve now covered the basics of Databricks, including creating a cluster and a notebook, running SQL commands in the notebook, and displaying results.

To see some interesting applications of the Databricks platform, watch the videos below:

  • Data Exploration

  • Data Visualization

To read more about the primary tools you can use and tasks you can perform with the Databricks platform, see: