Use Genie Code for data science

This page introduces Genie Code for data science. Designed specifically for Databricks notebooks and the SQL Editor, Genie Code in Agent mode can explore data, generate and run code, and fix errors—all from a single prompt.

What is Genie Code for data science?

Genie Code’s Agent mode can automate entire multi-step data science workflows in Databricks notebooks and the SQL Editor.

Compared to Genie Code Chat mode, Agent mode has expanded capabilities: planning a solution, retrieving relevant assets, running code, using cell outputs to improve results, fixing errors automatically, and more.

Genie Code can plan and generate code to run in notebooks or queries to run in the SQL Editor. It asks you to approve its plans and confirm its next steps before proceeding. With your approval, Genie Code can use tools to perform tasks like searching tables, editing a notebook, running cells, and reading cell outputs.

Genie Code's access and actions are governed by your permissions. It can only access data that you have access to and perform operations that you have permissions for.

Requirements

This feature is in Public Preview.

To use Genie Code's agentic data science capabilities, your workspace needs the following:

Use Genie Code for data science

To use Genie Code for data science tasks:

  1. From a Databricks notebook or the SQL Editor, open the Genie Code side panel.

  2. In the bottom right corner, select Agent. This toggles on Genie Code’s Agent mode, allowing you to interact with its agentic data science capabilities.

  3. Enter a prompt for Genie Code. For example, "Analyze @sales_transactions from samples.bakehouse to identify the top-selling product."

    tip

    Reference specific tables by using @table_name. The agent uses that table and any associated metadata to inform its response. The agent respects your Unity Catalog permissions, so it can only access the data that you have access to.

  4. As Genie Code generates its response, it often pauses to get your input:

    • For more complex tasks, Genie Code may create a step-by-step plan and ask clarifying questions. Answer its clarifying questions to help it hone its plan.

    • When Genie Code needs to run code, it asks for your approval before proceeding. Select Allow or Decline. You can also select Allow in this thread (which applies to the current Genie Code conversation thread) or Always allow.

      important

      Genie Code can generate and run code in your notebook. While it has guardrails to prevent dangerous actions, there is still risk. You should only use it with code and data you trust.

    • As Genie Code continues its work, you may be prompted to select Continue or Reject. Review Genie Code's existing work, then select Continue to allow it to continue to its next steps or Reject to tell it to try something else.

    • To stop Genie Code while it is working, click the red Stop icon.

Genie Code can create new notebook cells (or queries), generate text and code, run the notebook cells, and access the cell output to interpret the results.
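To make the example prompt from step 3 concrete, the following is a minimal sketch of the kind of aggregation such a request might lead to. It is not the code Genie Code actually generates. It runs on a few hypothetical in-memory rows using only the Python standard library, and the column names product and quantity are assumptions for illustration. In a notebook, the agent would more likely query the table directly with SQL, PySpark, or pandas.

```python
from collections import defaultdict

# Hypothetical rows standing in for a sales transactions table.
# The column names "product" and "quantity" are assumptions.
rows = [
    {"product": "croissant", "quantity": 3},
    {"product": "sourdough", "quantity": 5},
    {"product": "croissant", "quantity": 4},
    {"product": "bagel", "quantity": 2},
    {"product": "sourdough", "quantity": 1},
]


def top_selling_product(rows):
    """Return (product, total_quantity) for the product with the most units sold."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["product"]] += row["quantity"]
    # On a tie, max() keeps the product that was seen first.
    return max(totals.items(), key=lambda item: item[1])


best_product, best_quantity = top_selling_product(rows)
print(f"Top-selling product: {best_product} ({best_quantity} units)")
```

Whether "top-selling" means units sold or revenue is the kind of clarifying question the agent might ask before it runs anything. This sketch assumes units sold.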

note

For Genie Code to continue its work and take next steps, you need to stay on the tab that it's working in.

tip

You can add instructions for Genie Code in Agent mode to use in most responses. For example, if you have code conventions or preferred libraries to use, you can add these guidelines to instructions for Genie Code. You can also create skills to extend Genie Code with specialized capabilities for your domain-specific tasks. For more details and other tips, see Tips to improve Genie Code responses.

Use cases

In Agent mode, Genie Code has expanded capabilities, such as finding data, interpreting outputs, and performing cell actions.

Genie Code can help with complex data science tasks, including exploratory data analysis, forecasting, and machine learning. You can even create a new data analysis notebook from scratch with Genie Code. For better results, provide the agent with context by referencing tables, pipelines, notebooks, queries, and files with @<resource_name>. You can also click Add context (the @ icon) to manually select context to provide. Each referenced asset persists in the chat context.

Try the following prompts to get started:

  • Data discovery:
    • "Which table contains bakehouse transaction data?"
    • "I want to see the weather data for the date 2025-01-01 in the city of Los Angeles, CA."
    • "Find a table that contains New York City taxi data and show me the first 10 rows."
  • Exploratory data analysis:
    • "Help me parse the JSON string in column A."
    • "Create a visualization of the data from this table."
    • "Interpret this bar chart."
    • "Describe the @sales_transactions dataset. Perform some EDA to help me understand the column statistics and visualize the distribution of values. Think like a data scientist."
    • "Analyze @workload_insights to find the top 5 customers for Databricks SQL workloads last week by revenue. Then plot how many users those customers had for Databricks SQL per week for the last 6 weeks."
  • Forecasting:
    • "Using the @incidents dataset, build a forecast of the daily number of incidents for the next 2 weeks. When you’re done, give me a data table and an interactive chart to display the results."
    • "Using the @website_traffic dataset, predict daily visitor counts for the upcoming month. Highlight any seasonal patterns."
    • "Generate a forecast of product demand for the next 6 months from the @inventory dataset, including confidence intervals."
  • Machine learning:
    • "Perform some data preparation and feature engineering to prepare this dataset for model training."
    • "Train a classification model on the @customer_data dataset to predict churn. Evaluate the model with accuracy and AUC metrics."
    • "Perform hyperparameter tuning on a regression model using the @housing_prices dataset to improve prediction error."
    • "Build a clustering model on the @sales_leads dataset to identify customer segments and provide a summary of each cluster's characteristics."
  • Notebook organization:
    • "Create a new cell that summarizes the results from this notebook."
    • "Give this notebook a relevant name."
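As one illustration of the forecasting prompts above, here is a minimal baseline sketch: a seasonal-naive forecast that predicts each future day as the average of past counts on the same weekday. This is a swapped-in simple technique chosen for clarity, not a description of how Genie Code builds forecasts. The function name forecast_daily_counts and the sample history are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def forecast_daily_counts(history, horizon_days=14):
    """Forecast daily counts with a weekday-average (seasonal-naive) baseline.

    history: list of (date, count) pairs for consecutive past days.
    Returns a list of (date, predicted_count) pairs for the next horizon_days.
    """
    by_weekday = {}
    for day, count in history:
        by_weekday.setdefault(day.weekday(), []).append(count)

    # Fall back to the overall mean if a weekday never appears in the history.
    overall = mean(count for _, count in history)
    last_day = history[-1][0]

    forecast = []
    for offset in range(1, horizon_days + 1):
        day = last_day + timedelta(days=offset)
        past = by_weekday.get(day.weekday())
        forecast.append((day, mean(past) if past else overall))
    return forecast


# Hypothetical history: four weeks of daily incident counts with a weekend dip
# and a slow upward drift of one incident per week.
start = date(2025, 1, 6)  # a Monday
history = []
for i in range(28):
    day = start + timedelta(days=i)
    base = 20 if day.weekday() < 5 else 8
    history.append((day, base + i // 7))

forecast = forecast_daily_counts(history, horizon_days=14)
for day, predicted in forecast:
    print(day.isoformat(), predicted)
```

A baseline like this is useful mainly as a reference point: a more capable model that the agent proposes should beat it on held-out data.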

Exploratory data analysis

Use Genie Code to perform exploratory data analysis on a dataset. For example, try asking it to help you create a new notebook that analyzes the samples.bakehouse.sales_transactions dataset.

In an empty notebook tab, open the Genie Code panel, select Agent mode, and enter the following prompt: "Describe the dataset, @sales_transactions from samples.bakehouse. I want to do some EDA so I can understand the column statistics and visualize the distribution of values. Think like a data scientist."

The agent creates a plan to answer your prompt and might ask clarifying questions. With your approval, it generates new notebook cells that include code to explore the data and text that explains its process and findings.
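For a sense of what "column statistics" and "distribution of values" mean in practice, the sketch below computes both for a handful of hypothetical rows using only the Python standard library. The column names totalPrice and paymentMethod and the helper names are assumptions for illustration. The cells that the agent generates would typically work on the real table and render charts rather than text bars.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical rows; the column names "totalPrice" and "paymentMethod" are assumptions.
rows = [
    {"totalPrice": 3, "paymentMethod": "visa"},
    {"totalPrice": 6, "paymentMethod": "visa"},
    {"totalPrice": 9, "paymentMethod": "mastercard"},
    {"totalPrice": 12, "paymentMethod": "amex"},
    {"totalPrice": 15, "paymentMethod": "visa"},
]


def describe_numeric(values):
    """Summary statistics for a numeric column (population standard deviation)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
    }


def value_distribution(values):
    """Frequency of each distinct value, most common first."""
    return Counter(values).most_common()


stats = describe_numeric([row["totalPrice"] for row in rows])
distribution = value_distribution([row["paymentMethod"] for row in rows])

print(stats)
# A plain-text stand-in for a bar chart of the categorical column.
for value, count in distribution:
    print(f"{value:<12} {'#' * count} ({count})")
```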