Notebooks are one interface for interacting with Databricks. If you are on the Professional or Enterprise tier of Databricks, you can use access control (see Managing Access Control) to manage sharing of notebooks and folders in the workspace.
Creating a Notebook¶
Creating a notebook in Databricks is simple. First, click the Workspace button or the Home icon in the sidebar on the left. From there, click the dropdown menu to the right of any folder and choose Create > Notebook. As seen below, you are not required to create it from the root of the workspace; you can do this within other folders as well.
A dialog will appear where you can enter a name and choose the notebook's primary language. Notebooks support Python, Scala, SQL, and R as their primary language.
Importing a Notebook¶
Importing a notebook is easy. Whether you're importing from a URL or from a file, the basic steps are the same.
In your Databricks workspace, click the Workspace button on the left, select the caret at the top of any folder hierarchy, and then select Import from the dropdown menu.
Attaching a Notebook to a Cluster¶
Now that you've created a notebook, it's time to start using it. You'll first need to attach your notebook to a cluster; do so by clicking “Detached” under the notebook's name at the top left. From the dropdown, select the cluster you'd like to attach to, or create a new cluster.
Now that you created a notebook and attached it to a cluster, you can run some Spark code!
To execute a command, type the code you would like to run into a cell and press Shift+Enter.
To add cells, press the + icon that appears between cells, or open the notebook cell menu at the far right of a cell.
Keyboard shortcuts make it much easier to use notebooks and execute code. They are available from the menu at the top right of the notebook.
In Databricks, notebooks already have some of the most useful Apache Spark variables pre-defined for you. Do not create a SQLContext yourself in Databricks; doing so will lead to inconsistent behavior. Use the existing contexts provided within the notebook (and cluster). The pre-defined variables are:
Description | Variable Name
--- | ---
Spark Context | sc
SQL Context / Hive Context | sqlContext
SparkSession (2.0 Only) | spark
In order to run code in a notebook, type the code you would like to execute in a cell and either click the run (“▶”) icon at the top right of the cell or press Shift+Enter. This will execute the code cell. For example, try executing the Python code below.
# A SparkSession is already created for you.
# Do not create another or unspecified behavior may occur.
spark

# A SQLContext is also already created for you.
# Do not create another or unspecified behavior may occur.
# As you can see below, the sqlContext provided is a HiveContext.
sqlContext

# A Spark Context is already created for you.
# Do not create another or unspecified behavior may occur.
sc
Now that we’ve seen the pre-defined variables, let’s go ahead and run some real code!
1+1 # => 2
Mixing Languages in a Notebook¶
While a notebook has a default language, in Databricks you can mix languages by using the language magic command.
For example, in any notebook you can execute code in any of the other supported languages by specifying one of the strings below at the beginning of a cell.
- %python - This allows you to execute Python code in a notebook (even if that notebook is not Python).
- %sql - This allows you to execute SQL code in a notebook (even if that notebook is not SQL).
- %r - This allows you to execute R code in a notebook (even if that notebook is not R).
- %scala - This allows you to execute Scala code in a notebook (even if that notebook is not Scala).
- %sh - This allows you to execute shell code in your notebook. Add the -e option in order to fail this cell (and subsequently a job or a “run all” command) if the shell command does not succeed. By default, %sh alone will not fail a job even if the %sh command does not completely succeed; only %sh -e will fail if the shell command has a non-zero exit status.
- %fs - This allows you to use Databricks Utilities (dbutils) filesystem commands. Read more on the Databricks File System (DBFS) and Databricks Utilities (dbutils) pages.
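As a plain-shell illustration of the exit-status behavior that %sh -e keys on (a sketch run outside Databricks, not a Databricks cell):

```shell
# `false` always exits with a non-zero status (1). Without -e, %sh would
# ignore this failure; with -e (like `set -e` in a script), the cell fails.
false
echo "exit status: $?"
# prints: exit status: 1
```

Any command whose exit status is non-zero is what %sh -e treats as a failure.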
Markdown and HTML in Notebooks¶
Another option is to include rendered Markdown in your notebooks via the %md magic command. For example, the code below will render as a Markdown title.
%md # Hello This is a Title
Lastly, you can also include raw HTML in your notebooks by using the displayHTML function. Please check out the HTML, D3 & SVG notebook for an example of how to do this.
Downloading Results¶
Once you've run your code, you may want to download the results to your local machine. The simplest way is to click the download-all-results button at the bottom of a cell that contains tabular output. You'll see an option to download either a preview of the results or the full results.
You can try this out by running %sql SELECT 1 and downloading the results.
Running a Notebook from Another Notebook¶
You can run a notebook from another notebook by using %run. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in that other notebook will become available in your current notebook.
For example, suppose you have Notebook A and Notebook B. Notebook A contains one cell with the following Python code:
x = 5
After using %run to include Notebook A, running the code below in Notebook B will work even though x was never explicitly created there.
print(x) # => 5
If you would like to specify a relative path, you need to preface it with ./ or ../. For example, if Notebook A and Notebook B are in the same directory, you can alternatively run Notebook A from a relative path:
%run ./notebookA
print(x) # => 5

%run ../someDirectory/notebookA # up a directory and into another directory
print(x) # => 5
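As a rough analogy in plain Python (this sketch is not how Databricks implements %run, just a mental model): %run effectively executes the other notebook's code so its names land in the caller's namespace, much like exec() below. The `notebook_a_code` string stands in for Notebook A's single cell.

```python
# Stand-in for Notebook A's single cell.
notebook_a_code = "x = 5"

# Execute it into a namespace, analogous to: %run ./notebookA
namespace = {}
exec(notebook_a_code, namespace)

# x is now defined, even though this "notebook" never assigned it directly.
print(namespace["x"])  # => 5
```

This is why, after %run, variables from the included notebook can be used as if they were defined locally.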
Exporting and Publishing Notebooks¶
You can also export notebooks from Databricks via the file menu. If you’re in the Community Edition tier, you can also publish a notebook so that you can share a URL. Any subsequent “publish” actions will update the notebook at that URL.
Variable and Class Isolation¶
In Databricks notebooks, variables and classes that are not defined in a Scala package cell are only available in the current notebook. For example, two notebooks attached to the same cluster can define different variables and classes with the same name.
To define a class that is visible to all notebooks using the same cluster, you can define this class in a package cell. Then, you can access this class by using its fully qualified name, which is the same as accessing a class in an attached Scala/Java library.
Spark Session Isolation¶
Spark Session Isolation is available in Spark 2.0.2-db1 and higher versions.
For a cluster running Apache Spark 2.0.0 or a higher version, every notebook has a pre-defined variable called spark. SparkSession is the entry point for using the different APIs in Spark as well as for setting runtime configurations. For Spark 2.0.0 and Spark 2.0.1-db1, all notebooks attached to a cluster share the same SparkSession. Starting from Spark 2.0.2-db1, the creator of a cluster has the option to enable Spark Session Isolation, which puts every notebook attached to the cluster in its own session, i.e. every notebook uses its own SparkSession.
To enable Spark Session Isolation, the creator of a cluster can set spark.databricks.session.share to false in the Spark Config field on the cluster creation page. By default, users of Spark 2.0.2-db1 share a single SparkSession (spark.databricks.session.share is true by default). Spark 2.1.0 and higher versions have Spark Session Isolation enabled by default. When spark.databricks.session.share is false, every attached notebook is in its own session, which means:
- Runtime configurations set using spark.conf.set or using SQL's SET command only affect the current notebook. Please note that configurations for the metastore connection are not runtime configurations; all notebooks attached to a cluster share these configurations.
- Setting the current database only affects the current notebook.
- Temporary views created by the CREATE TEMPORARY VIEW command are only visible in the current notebook.
Starting from Apache Spark 2.1, in order to share temporary views across notebooks, you can use global temporary views.
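A rough mental model of session isolation, sketched in plain Python (the class and keys below are hypothetical illustrations, not real Spark APIs): each notebook holds its own runtime-config store, while cluster-level settings such as the metastore connection remain shared.

```python
class NotebookSession:
    """Hypothetical model of one notebook's isolated session."""

    def __init__(self, shared_cluster_conf):
        self.shared = shared_cluster_conf  # shared across the cluster
        self.runtime_conf = {}             # per-notebook, like spark.conf.set

    def set(self, key, value):
        self.runtime_conf[key] = value

    def get(self, key, default=None):
        # Per-notebook settings win; otherwise fall back to shared config.
        return self.runtime_conf.get(key, self.shared.get(key, default))


cluster_conf = {"metastore.uri": "thrift://example:9083"}  # hypothetical key
nb_a = NotebookSession(cluster_conf)
nb_b = NotebookSession(cluster_conf)

nb_a.set("shuffle.partitions", "8")   # only affects notebook A's session
print(nb_b.get("shuffle.partitions"))  # => None (runtime configs are isolated)
print(nb_b.get("metastore.uri"))       # => thrift://example:9083 (shared)
```

The same split explains the bullet points above: spark.conf.set-style settings stay in one notebook's session, while metastore configuration is visible to every attached notebook.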
It is worth noting that cells that trigger commands in other languages (i.e. cells using %sql) and cells that include other notebooks (i.e. cells using %run) are part of the current notebook. Thus, these cells are in the same session as other regular notebook cells. In contrast, Notebook Workflows run a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible to other notebooks.
Version Control¶
Databricks has basic version control for notebooks. To access version control, click the Revision History menu at the top right of every notebook. You can save revisions with comments, and those will be permanently stored. For more explicit versioning, Databricks also integrates with GitHub Version Control to store these revisions in GitHub. Please see that section for more details.
Deleting a Notebook¶
Since notebooks live inside the workspace (and in folders in the workspace), they follow the same rules as folders. See Accessing The Workspace Menu for more information about how to access the workspace menu and delete notebooks or other items in the workspace.
- GitHub Version Control
- Notebook Workflows
- Package Cells