Get started with Databricks as a data engineer

The goal of a data engineer is to take data in its most raw form, enrich it, and make it easily available to other authorized users, typically data scientists and data analysts. This quickstart walks you through ingesting data, transforming it, and writing it to a table for easy consumption.

Before you begin

Before you can run through this quickstart, you must have:

Data Science & Engineering UI

Landing page

From the sidebar at the left and the Common Tasks list on the landing page, you access fundamental Databricks Data Science & Engineering entities: the Workspace, clusters, tables, notebooks, jobs, and libraries. The Workspace is the special root folder that stores your Databricks assets, such as notebooks and libraries, and the data that you import.

Use the sidebar

You can access all of your Databricks assets using the sidebar. The sidebar’s contents depend on the selected persona: Data Science & Engineering, Machine Learning, or SQL.

  • By default, the sidebar appears in a collapsed state and only the icons are visible. Move your cursor over the sidebar to expand to the full view.

  • To change the persona, click the icon below the Databricks logo and select a persona.

  • To pin a persona so that it appears the next time you log in, click the pin icon next to the persona. Click it again to remove the pin.

  • Use the menu at the bottom of the sidebar to set the sidebar mode to Auto (default behavior), Expand, or Collapse.

Get help

To get help, click Help in the lower left corner.


Step 1: Create a cluster

To do exploratory data analysis and data engineering, first create a cluster of compute resources to execute commands against.

  1. Log into Databricks and make sure you’re in the Data Science & Engineering workspace.

    See Data Science & Engineering UI.

  2. In the sidebar, click Compute.

  3. On the Compute page, click Create Cluster.

  4. On the Create Cluster page, specify the cluster name Quickstart, accept the remaining defaults, and click Create Cluster.
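If you prefer automation over the UI, clusters can also be created programmatically against the Databricks Clusters API (POST /api/2.0/clusters/create). The payload below is an illustrative sketch only; the runtime version and node type are placeholders you would replace with values available in your workspace.

```python
import json

# Illustrative payload for POST /api/2.0/clusters/create.
# spark_version and node_type_id are placeholders; list the values
# available in your workspace before using them.
payload = {
    "cluster_name": "Quickstart",
    "spark_version": "<runtime-version>",  # e.g. a current LTS runtime
    "node_type_id": "<node-type>",         # e.g. a small general-purpose node
    "num_workers": 1,
}

print(json.dumps(payload, indent=2))
```

Sending this payload (with real values) to the Clusters API endpoint of your workspace creates the same cluster the UI steps above do.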

Step 2: Ingest data

The easiest way to ingest your data into Databricks is to use the Create Table Wizard. In the sidebar, click Data and then click the Create Table button.


On the Create New Table dialog, drag and drop a CSV file from your computer into the Files section. If you need an example file to test, download the diamonds dataset to your local computer and drag it to upload.

  1. Click the Create Table with UI button.

  2. Select the Quickstart cluster you created in Step 1.

  3. Click the Preview Table button.

  4. Scroll down to see the Specify Table Attributes section and preview the data.

  5. Select the First row is header option.

  6. Select the Infer Schema option.

  7. Click Create Table.

You have successfully created a Delta Lake table that can be queried.
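The two wizard options you selected map to common CSV-parsing behavior: the first row supplies the column names, and each column's type is inferred from its values. The plain-Python sketch below illustrates that idea on a tiny inline sample; it is not the wizard's actual implementation.

```python
import csv
import io

# A tiny stand-in for an uploaded CSV file.
sample = io.StringIO("carat,cut,price\n0.23,Ideal,326\n0.21,Premium,327\n")

rows = list(csv.reader(sample))
header, data = rows[0], rows[1:]          # "First row is header"

def infer(value):
    """Naive type inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

records = [dict(zip(header, map(infer, row))) for row in data]  # "Infer Schema"
print(records[0])   # {'carat': 0.23, 'cut': 'Ideal', 'price': 326}
```

The wizard performs the equivalent work at scale and writes the result out as a Delta Lake table.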

Additional data ingestion options

Alternatively, you can click the Create Table in Notebook button to inspect and modify code in a notebook to create a table. You can use this technique to generate code for ingesting data from other data sources such as Redshift, Kinesis, or JDBC by clicking the Other Data Sources selector.

If there are other data sources to ingest data from, such as Salesforce, you can use a Databricks partner solution by clicking Partner Connect in the sidebar. When you select a partner from Partner Connect, you can connect the partner’s application to Databricks and even start a free trial if you are not already a customer of the partner. See the Databricks Partner Connect guide.

Step 3: Query data

A notebook is a collection of cells that run computations on a cluster. To create a notebook in the workspace:

  1. In the sidebar, click Workspace.

  2. In the Workspace folder, select Create > Notebook.

  3. On the Create Notebook dialog, enter a name and select Python in the Default Language drop-down.

  4. Click Create. The notebook opens with an empty cell at the top.

  5. Enter the following code in the first cell and run it by pressing SHIFT+ENTER.

     df = table("diamonds_csv")
     display(df)

    The notebook displays the contents of the diamonds_csv table.

  6. Create another cell, this time using the %sql magic command to enter a SQL query:

     %sql
     select * from diamonds_csv

    You can use the %sql, %r, %python, or %scala magic commands at the beginning of a cell to override the notebook’s default language.

  7. Press SHIFT+ENTER to run the command.

Step 4: Visualize data

Display a chart of the average diamond price by color.

  1. Click the bar chart icon.

  2. Click Plot Options.

    • Drag color into the Keys box.

    • Drag price into the Values box.

    • In the Aggregation drop-down, select AVG.

  3. Click Apply to display the bar chart.

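The chart configuration above (Keys: color, Values: price, Aggregation: AVG) corresponds to a GROUP BY average. A small plain-Python sketch of the same computation, on hypothetical (color, price) rows standing in for the diamonds table:

```python
from collections import defaultdict

# Hypothetical (color, price) rows standing in for the diamonds table.
rows = [("E", 326), ("E", 554), ("I", 334), ("J", 335), ("J", 2757)]

totals = defaultdict(lambda: [0, 0])      # color -> [running sum, count]
for color, price in rows:
    totals[color][0] += price
    totals[color][1] += 1

avg_price = {color: s / n for color, (s, n) in totals.items()}
print(avg_price)   # {'E': 440.0, 'I': 334.0, 'J': 1546.0}
```

The plot options produce the same aggregation server-side, so you never have to write this loop yourself.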

Step 5: Transform data

The best way to create trusted and scalable data pipelines is to use Delta Live Tables.

To learn how to build an effective pipeline and run it end to end, follow the steps in the Delta Live Tables quickstart.
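To give a flavor of what such a pipeline looks like, here is a minimal sketch of a Delta Live Tables dataset definition. This code runs only inside a DLT pipeline, where the dlt module and the spark session are provided by the runtime; the table names are illustrative.

```python
# Sketch of a Delta Live Tables dataset definition. Runs only inside a
# DLT pipeline, where `dlt` and `spark` are provided by the runtime.
import dlt

@dlt.table(comment="Cleaned diamonds data")
def diamonds_cleaned():
    # Read the table created earlier and drop rows with missing values.
    return spark.table("diamonds_csv").dropna()
```

The Delta Live Tables quickstart walks through defining, validating, and scheduling a full pipeline of such tables.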

Step 6: Set up data governance

To control access to a table in Databricks:

  1. Use the persona switcher in the sidebar to switch to the Databricks SQL environment.

    Click the icon below the Databricks logo and select SQL.

  2. Click Data in the sidebar.

  3. In the drop-down list at the top right, select a SQL endpoint, such as Starter Endpoint.

  4. Filter for the diamonds_csv table you created in Step 2.

    Type dia in the text box next to the default database.

  5. On the Permissions tab, click the Grant button.

  6. Give All Users the SELECT and READ_METADATA privileges for the table.

  7. Click OK.

Now all users can query the table that you created.
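The same grants can also be issued as SQL statements, for example from the Databricks SQL query editor. The statement below assumes the table landed in the default database, as in this quickstart; `users` is the built-in group containing all workspace users.

```sql
-- Grant all workspace users read access to the quickstart table.
GRANT SELECT, READ_METADATA ON TABLE default.diamonds_csv TO `users`;
```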

Step 7: Schedule a job

You can schedule a job to run a data processing task in a Databricks cluster with scalable resources. Your job can consist of a single task or be a large, multi-task application with complex dependencies.

To learn how to create a job that orchestrates tasks to read and process a sample dataset, follow the steps in the Jobs quickstart.
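As with clusters, jobs can also be defined programmatically. The sketch below builds a single-task job payload for the Jobs API (POST /api/2.1/jobs/create); the notebook path and cluster ID are placeholders you would replace with your own.

```python
import json

# Illustrative single-task payload for POST /api/2.1/jobs/create.
# notebook_path and existing_cluster_id are placeholders.
job = {
    "name": "Quickstart job",
    "tasks": [
        {
            "task_key": "process_diamonds",
            "notebook_task": {"notebook_path": "/Users/<you>/quickstart"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # daily at 06:00
        "timezone_id": "UTC",
    },
}

print(json.dumps(job, indent=2))
```

A multi-task job adds more entries to the tasks list, each with a task_key and optional depends_on references, which is how the complex dependencies mentioned above are expressed.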