Welcome to Databricks
This page shows you around the Databricks UI and gets you started running code. You can copy and paste the code cells below into Databricks or you can import the Introduction to Apache Spark notebook series to get started.
To get help at any time, click the question mark button at the top right-hand corner.
Creating a Cluster
At the heart of Databricks sit Apache Spark clusters. To execute code (including Spark code) or import data, you first need to create a cluster. Luckily, doing so is simple!
To create a cluster, click the Clusters icon in the left side menu. The Clusters page lets you manage your clusters.
Once on the Clusters page, click Create Cluster in the upper left corner. Then enter a name for the cluster, along with any configuration you'd like.
Once you’ve created a cluster, you can start executing code. To read more, see the Clusters documentation.
Creating a Notebook
On the left side, click the Workspace icon in the main menu to begin. Then click the down arrow on the right side of Workspace, select a folder, and choose Create > Notebook.
The Create Notebook dialog will appear:
- Enter a unique name for your notebook.
- For language, click the drop-down and choose the language you'd like.
- For cluster, click the drop-down and choose the cluster you created in the step above.
Using a Notebook
Now that you've created a notebook, it's time to start using it. First, attach it to a cluster; you'll see an option to do this right under the name of the notebook. Once the notebook is attached, you can run some example commands.
To execute a command, type it into a cell and press Shift+Enter.
In Databricks, notebooks already have some of the most useful Apache Spark variables that you’re going to need.
Do not create a SparkSession, SparkContext or SQLContext yourself in Databricks. Creating multiple contexts is not supported and can cause inconsistent behavior. Use the existing contexts provided with the notebook.
| Variable | Notebook name |
| --- | --- |
| Spark Context | `sc` |
| SQL Context / Hive Context | `sqlContext` |
| SparkSession (2.0 only) | `spark` |
Running some Code
First, inspect the pre-defined variables by typing each one into a cell.
```python
# A SparkSession is already created for you.
# Do not create another or unspecified behavior may occur.
spark
```

```python
# A SQLContext is also already created for you.
# Do not create another or unspecified behavior may occur.
# As you can see below, the sqlContext provided is a HiveContext.
sqlContext
```

```python
# A Spark Context is already created for you.
# Do not create another or unspecified behavior may occur.
sc
```
Now that we’ve seen the pre-defined variables, let’s go ahead and run some real code!
```python
1 + 1  # => 2
```
You should be able to see the answer to this immediately!
Creating a DataFrame
Now that we've executed some simple code, let's create a DataFrame. The code below does this in Python.
```python
myDataFrame = sc.parallelize([('a', 1), ('b', 2), ('c', 3)]).toDF()
display(myDataFrame)
```
Initially the result is rendered as a table; click the chart icon below the results and choose the bar chart, and you'll see a nice visualization of the values and their associated counts. Learn more about Visualizations.
Where to Go Next
We’ve now covered the basics of Databricks, including creating a cluster and using notebooks. To learn more about the platform, watch the videos below:
Getting Started Videos
- Introducing Databricks https://vimeo.com/130273206
- Cluster Manager and Jobs https://vimeo.com/156886719
- Collaboration https://vimeo.com/156886720
- Data Exploration https://vimeo.com/137874931
- Data Visualization https://vimeo.com/156886721