Create a Google Analytics Raw Data ingestion pipeline

Preview

The Google Analytics Raw Data connector is in Public Preview.

This article describes how to create a Google Analytics Raw Data ingestion pipeline using Databricks Lakeflow Connect and Google BigQuery. You can create the pipeline using the Databricks UI or Databricks APIs.

Before you begin

To create an ingestion pipeline, you must meet the following requirements:

  • Your workspace is enabled for Unity Catalog.

  • Serverless compute is enabled for your workspace. See Enable serverless compute.

  • If you plan to create a connection: You have CREATE CONNECTION privileges on the metastore.

    If you plan to use an existing connection: You have USE CONNECTION privileges or ALL PRIVILEGES on the connection object.

  • You have USE CATALOG privileges on the target catalog.

  • You have USE SCHEMA and CREATE TABLE privileges on an existing schema or CREATE SCHEMA privileges on the target catalog.

To ingest from GA4 using BigQuery, see Set up Google Analytics 4 and Google BigQuery for Databricks ingestion.
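If you need to grant the privileges listed above, a metastore admin or object owner can do so with SQL. The following is a minimal sketch that assumes a notebook in a Unity Catalog-enabled workspace where `spark` is available; the catalog `main`, schema `main.ingest`, connection `ga4_connection`, and principal `data-engineers` are placeholders.

```python
# Minimal sketch: grant the privileges required for ingestion.
# All catalog, schema, connection, and principal names are placeholders.
principal = "`data-engineers`"  # group or user that will create or run the pipeline

# Allow creating new connections (metastore-level privilege).
spark.sql(f"GRANT CREATE CONNECTION ON METASTORE TO {principal}")

# Or, to reuse an existing connection instead:
spark.sql(f"GRANT USE CONNECTION ON CONNECTION ga4_connection TO {principal}")

# Privileges on the destination catalog and schema.
spark.sql(f"GRANT USE CATALOG ON CATALOG main TO {principal}")
spark.sql(f"GRANT USE SCHEMA, CREATE TABLE ON SCHEMA main.ingest TO {principal}")
```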

Configure networking

If you have serverless egress control enabled, allowlist the following URLs. Otherwise, skip this step. See Managing network policies for serverless egress control.

  • bigquery.googleapis.com
  • oauth2.googleapis.com
  • bigquerystorage.googleapis.com
  • googleapis.com

Create the ingestion pipeline

Permissions required: USE CONNECTION or ALL PRIVILEGES on a connection.

This section describes how to create the ingestion pipeline in the UI; a sketch of the equivalent API call appears after these steps. Each ingested table is written to a streaming table with the same name.

  1. In the sidebar of the Databricks workspace, click Data Ingestion.

  2. On the Add data page, under Databricks connectors, click Google Analytics 4.

    The ingestion wizard opens.

  3. On the Ingestion pipeline page of the wizard, enter a unique name for the pipeline.

  4. In the Destination catalog drop-down menu, select a catalog. Ingested data and event logs will be written to this catalog. You’ll select a destination schema later.

  5. Select the Unity Catalog connection that stores the credentials required to access the source data.

    If there are no existing connections to the source, click Create connection and enter the authentication details you obtained in Set up Google Analytics 4 and Google BigQuery for Databricks ingestion. You must have CREATE CONNECTION privileges on the metastore.

  6. Click Create pipeline and continue.

  7. On the Source page, select the tables to ingest into Databricks, and then click Next.

  8. On the Destination page, select the Unity Catalog catalog and schema to write to.

    If you don't want to use an existing schema, click Create schema. You must have USE CATALOG and CREATE SCHEMA privileges on the parent catalog.

  9. Click Save pipeline and continue.

  10. (Optional) On the Settings page, click Create schedule, and then set how often to refresh the destination tables.

  11. (Optional) Set email notifications for pipeline operation success or failure.

  12. Click Save and run pipeline.
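
You can also create the pipeline programmatically. The sketch below uses the Databricks SDK for Python and is only illustrative: the connection name, GCP project ID, BigQuery dataset, table name, and destination catalog and schema are placeholders, and the exact object specification for the Google Analytics Raw Data connector may differ from this generic ingestion example, so check the connector's API reference before relying on it.

```python
# Illustrative sketch only: create a managed ingestion pipeline with the
# Databricks SDK for Python. All names below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

created = w.pipelines.create(
    name="ga4-raw-data-ingestion",
    serverless=True,  # managed connectors run on serverless compute
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        connection_name="ga4_connection",  # Unity Catalog connection
        objects=[
            pipelines.IngestionConfig(
                table=pipelines.TableSpec(
                    source_catalog="my-gcp-project",      # GCP project ID
                    source_schema="analytics_123456789",  # BigQuery dataset
                    source_table="events",                # GA4 export table
                    destination_catalog="main",
                    destination_schema="ingest",
                )
            )
        ],
    ),
)
print(f"Created pipeline {created.pipeline_id}")
```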

Update your pipeline schedule and notifications

  1. After the pipeline is created, return to the Databricks workspace sidebar, and then click Pipelines.

    The new pipeline appears in the pipeline list.

  2. To view the pipeline details, click the pipeline name.

  3. On the pipeline details page, you can schedule the pipeline by clicking Schedule.

  4. To set notifications on the pipeline, click Settings, and then add a notification.
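
Notifications can also be set through the pipelines API. The sketch below uses the Databricks SDK for Python and is only illustrative: the pipeline ID and email address are placeholders, and because updating a pipeline replaces its settings, the existing fields you want to keep are carried over from the current spec.

```python
# Illustrative sketch only: add email notifications to an existing pipeline
# with the Databricks SDK for Python. IDs and addresses are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()
pipeline_id = "00000000-0000-0000-0000-000000000000"  # placeholder

current = w.pipelines.get(pipeline_id).spec  # current pipeline settings

w.pipelines.update(
    pipeline_id=pipeline_id,
    name=current.name,
    serverless=current.serverless,
    ingestion_definition=current.ingestion_definition,
    notifications=[
        pipelines.Notifications(
            email_recipients=["data-team@example.com"],
            alerts=["on-update-failure", "on-update-fatal-failure"],
        )
    ],
)
```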