Ingest or connect raw data

Preview

This feature is in Private Preview. To try it, reach out to your Databricks contact.


This guide walks you through ingesting raw data for your RAG Studio application.

Important

The default 📥 Data Ingestor downloads the Databricks documentation.

To ingest from another source, modify the code in src/notebooks/ingest_data.py. To use data that already exists in a Unity Catalog Volume, adjust config/rag-config.yml instead.

The default 🗃️ Data Processor that ships with RAG Studio only supports HTML files. If you have other file types in your Unity Catalog Volume, follow the steps in Creating a 🗃️ Data Processor version to adjust the 🗃️ Data Processor code.
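
Before pointing RAG Studio at an existing Unity Catalog Volume, it can help to check whether the volume contains anything other than HTML. The following is a minimal sketch, not part of RAG Studio: the function names are hypothetical, and it assumes that only the `.html` extension counts as supported and that the volume is readable as a directory (for example under a `/Volumes/<catalog>/<schema>/<volume>` path).

```python
from collections import Counter
from pathlib import Path


def count_file_types(root: str) -> Counter:
    """Count files under `root` (recursively) by lowercased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return counts


def unsupported_types(counts: Counter, supported=(".html",)) -> dict:
    """Return the extensions (and counts) outside the supported set.

    Assumption: only ".html" is treated as supported here; adjust the
    tuple if your Data Processor version handles other extensions.
    """
    return {ext: n for ext, n in counts.items() if ext not in supported}
```

Calling `unsupported_types(count_file_types(volume_dir))` with your own volume path returns an empty dictionary when every file is HTML. Any other result suggests the 🗃️ Data Processor code needs adjusting before ingestion.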

  1. Run the following command to start the data ingestion process. This step takes approximately 10 minutes.

    ./rag ingest-data -e dev
    
  2. When the ingestion completes, the following message appears in your console.

    -------------------------
    Run URL: <URL to the deployment Databricks Job>
    
    <timestamp> "[dev e] [databricks-docs-bot][dev] ingest_data" RUNNING
    <timestamp> "[dev e] [databricks-docs-bot][dev] ingest_data" TERMINATED SUCCESS
    Successfully downloaded and uploaded Databricks documentation articles to UC Volume '`catalog`.`schema`.`raw_databricks_docs`'
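
If you run the ingestion from a script, you may want to check the result programmatically rather than by eye. The sketch below parses captured console text shaped like the sample output above. The function name and regular expressions are hypothetical, and the log format is assumed from that sample only, so it may not hold for other RAG Studio versions.

```python
import re

# Assumed line shapes, taken from the sample console output:
#   Run URL: <url>
#   <timestamp> "[...] ingest_data" TERMINATED SUCCESS
RUN_URL_RE = re.compile(r"^\s*Run URL:\s*(\S+)", re.MULTILINE)
STATUS_RE = re.compile(r'ingest_data"[ \t]+([A-Z_ ]+?)[ \t]*$', re.MULTILINE)


def summarize_ingest_output(output: str) -> dict:
    """Extract the run URL and the last reported job status from console text."""
    url_match = RUN_URL_RE.search(output)
    statuses = STATUS_RE.findall(output)
    final_status = statuses[-1] if statuses else None
    return {
        "run_url": url_match.group(1) if url_match else None,
        "final_status": final_status,
        # Success means the last status line reads TERMINATED SUCCESS.
        "succeeded": final_status == "TERMINATED SUCCESS",
    }
```

One way to use this is to capture the output of `./rag ingest-data -e dev` (for instance with `subprocess.run(..., capture_output=True, text=True)`) and pass the captured text to `summarize_ingest_output`.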
    

Continue with the next tutorial:

Deploy a version of a RAG Application