Data analysis with notebooks
In SAP Databricks, you can use Databricks notebooks to perform data science and analytics tasks. Notebooks build on the platform's capabilities, so you can work with both SAP and external data.
Notebooks are a common tool in data science and machine learning for developing code and presenting results. In Databricks, notebooks are the primary tool for creating data science and machine learning workflows and collaborating with colleagues. Databricks notebooks provide real-time coauthoring in multiple languages, automatic versioning, and built-in data visualizations.
Databricks notebooks in SAP Databricks support Python and SQL, and let you embed visualizations alongside links, images, and commentary written in Markdown.
This page details specific guidance for using notebooks in SAP Databricks.
Features
The following features related to notebooks are included in SAP Databricks:
- Databricks notebooks
- Serverless compute
- Visualizations in Databricks notebooks
- Interactive Python debugger
- Scheduled notebooks
- Git folders
- Databricks Assistant
- Web terminal
Create and edit a notebook
A Databricks notebook is a web-based code editor that allows you to write code and view results for interactive data analysis.
To create a new notebook in your default folder, click + New in the left sidebar and select Notebook from the menu.
Databricks creates and opens a new, blank notebook in your default folder. The default language is the language you most recently used, and the notebook is automatically attached to the compute resource that you most recently used.
Connect to serverless compute resources
In SAP Databricks, serverless compute allows you to quickly connect your notebook to on-demand computing resources.
To attach a notebook to serverless compute, click the Connect drop-down menu in the notebook and select Serverless. You can also connect to any serverless SQL warehouse that you have access to. To learn more about the types of serverless compute in SAP Databricks, see Serverless compute.
Import SAP data into a notebook
An active SAP data product can be analyzed in a notebook once it is mounted to a catalog in Unity Catalog. To analyze this data, you need READ access to the catalog and schema that contain the target dataset.
Below are example queries for reading the mounted data product.

SQL:

    SELECT * FROM sap_data.cashflow.cashflowforecast

Python:

    display(spark.read.table("sap_data.cashflow.cashflowforecast"))
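To continue the analysis in Python, you can load the mounted data product into a DataFrame. The following is a minimal sketch: the table name comes from the example above, the preview size is arbitrary, and `spark` and `display` are the objects predefined in Databricks notebooks.

    # Load the mounted SAP data product into a DataFrame for further analysis.
    df = spark.read.table("sap_data.cashflow.cashflowforecast")

    # Inspect the columns the data product exposes.
    df.printSchema()

    # Preview a sample of rows in the notebook's results table.
    display(df.limit(100))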
Create visualizations
Databricks has built-in support for charts and visualizations in both Databricks SQL and notebooks. Use the built-in visualization tool inside a Databricks notebook to quickly analyze your data and generate visualizations.
To create a visualization:
- After running a notebook cell with tabular results (such as the example cell shown after these steps), click + above a result and select Visualization. The visualization editor appears.
- In the visualization editor, enter a name for the visualization.
- In the Visualization Type drop-down, select your chart type.
- Configure the visualization properties: select the columns to plot and how to group the data, then customize the appearance as desired. The available fields depend on the selected chart type.
- Click Save.
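For example, a cell like the following produces tabular results that you can chart with the visualization editor. This is an illustrative sketch: it reuses the mounted table from the earlier example, and the column names "company_code" and "amount" are assumptions to be replaced with columns from your own dataset.

    from pyspark.sql import functions as F

    # Aggregate the forecast data into a small summary table.
    # "company_code" and "amount" are assumed column names used for illustration.
    forecast = spark.read.table("sap_data.cashflow.cashflowforecast")
    summary = forecast.groupBy("company_code").agg(F.sum("amount").alias("total_amount"))

    # Render the results; click + above the rendered table to open the visualization editor.
    display(summary)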
Debug notebooks
If you're working in Python, you can use the built-in interactive debugger in the Databricks notebook to help you debug your code. The interactive debugger provides breakpoints, step-by-step execution, variable inspection, and more tools to help you develop code in notebooks more efficiently.
Use the following steps to enable the debugger:
- Click your username at the upper-right of the workspace and select Settings from the dropdown list.
- In the Settings sidebar, select Developer.
- In the Editor settings section, toggle Python Notebook Interactive Debugger.
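With the debugger enabled, you can set breakpoints in a Python cell and step through it, inspecting variables as you go. The cell below is a simple illustrative example to try the debugger on; the function and values are placeholders.

    # An illustrative function to step through with the interactive debugger.
    def running_total(values):
        total = 0
        for value in values:
            # Set a breakpoint here to inspect `total` on each iteration.
            total += value
        return total

    running_total([10, 20, 30])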
Schedule a notebook
You can create and manage notebook jobs directly in the notebook UI. If a notebook is already assigned to one or more jobs, you can create and manage schedules for those jobs. If a notebook is not assigned to a job, you can create a job and a schedule to run the notebook. See Schedule a notebook.
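As an alternative to the notebook UI, a scheduled notebook job can also be created programmatically. The following is a sketch using the Databricks SDK for Python, assuming the SDK and the Jobs API are available in your environment and that the job runs on serverless compute; the job name, notebook path, and cron expression are placeholders.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()

    # Create a job that runs a notebook on a daily schedule.
    # The job name, notebook path, and cron expression below are placeholders.
    job = w.jobs.create(
        name="daily-cashflow-analysis",
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/Users/someone@example.com/cashflow_analysis"
                ),
            )
        ],
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 9 * * ?",  # every day at 09:00
            timezone_id="UTC",
        ),
    )
    print(f"Created job {job.job_id}")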
Git folders
Databricks Git folders is a visual Git client and API in Databricks. It supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visual comparison of diffs when committing.
Within Git folders you can develop code in notebooks or other files and follow data science and engineering code development best practices using Git for version control, collaboration, and CI/CD.
Databricks Assistant
Databricks Assistant is a context-aware AI assistant that helps you work with data and code. You can access the assistant in the SQL editor and in notebooks. The assistant offers:
- AI-based autocomplete.
- Data filtering with natural language prompts.
- Code debugging with Diagnose Error.
- Quick Fix, which presents automatic recommendations for fixing code errors that you can Accept and run.
Web terminal
The Databricks web terminal provides a convenient and highly interactive way to run shell commands. It's especially useful for advanced use cases, such as batch operations on multiple files, which existing user interfaces (UIs) might not fully support.
If the web terminal has been enabled by an account admin, you can launch it from notebooks attached to serverless compute with environment version 2.
To launch the web terminal from a notebook:
- Connect the notebook to compute.
- At the bottom of the notebook's right sidebar, click the Open bottom panel icon.
- Alternatively, click the attached compute drop-down, hover over the attached compute, then click Web Terminal.
The web terminal opens in a panel at the bottom of the screen. The buttons at the upper-right of the panel allow you to:
- Open a new terminal session in a new tab.
- Reload a terminal session.
- Close the bottom panel. To reopen the panel, click the Open bottom panel icon at the bottom of the right sidebar.