
Google Drive connector

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

The managed Google Drive connector in Lakeflow Connect allows you to ingest files from Google Drive into Databricks. You can ingest unstructured files as binary data, parse structured formats (CSV, JSON, XML, Excel, and more) into Delta tables, or capture file metadata without loading file contents.

For the standard Google Drive connector that uses Spark reader APIs (read_files, spark.read, Auto Loader), see Ingest files from Google Drive.

What to know before you start

  • Databricks user persona: The workflow depends on your Databricks user persona:
      • Single-user: An administrator user creates a Unity Catalog connection and an ingestion pipeline.
      • Multi-user: An administrator user creates a connection that non-administrator users can then use to create pipelines.
  • Authentication method: The steps to create a connection depend on the authentication method you select.
  • Interface: The steps to create a pipeline depend on the interface you use (for example, the UI, the API, or Declarative Automation Bundles).
  • Ingestion frequency: The pipeline schedule depends on your latency and cost requirements.
  • Common patterns: Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector. See Feature availability.
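To make the latency and cost trade-off concrete, the sketch below assumes the pipeline is triggered by a Databricks job on a Quartz cron schedule (the cron syntax that Databricks job schedules use). It picks the longest, and therefore cheapest, run interval that still meets a data-staleness target. The helper names, the restriction to intervals that divide a day evenly, and the staleness model (a file that lands just after a run starts waits up to one interval plus one run) are illustrative assumptions, not connector behavior.

```python
# Hypothetical helpers for choosing a pipeline schedule; not part of any Databricks API.

# Intervals (in hours) that divide a day evenly, so a Quartz "0/n" hours field
# produces uniformly spaced runs.
EVEN_INTERVALS_HOURS = (1, 2, 3, 4, 6, 8, 12, 24)


def quartz_cron_every_n_hours(n_hours: int, minute: int = 0) -> str:
    """Return a Quartz cron expression that fires every n_hours at the given minute."""
    if n_hours not in EVEN_INTERVALS_HOURS:
        raise ValueError(f"n_hours must be one of {EVEN_INTERVALS_HOURS}")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    hours_field = "0" if n_hours == 24 else f"0/{n_hours}"
    # Quartz field order: seconds minutes hours day-of-month month day-of-week
    return f"0 {minute} {hours_field} * * ?"


def worst_case_staleness_hours(n_hours: float, max_run_hours: float) -> float:
    """Assumed model: a file that lands just after a run starts is picked up by the next run."""
    return n_hours + max_run_hours


def pick_interval_hours(max_staleness_hours: float, max_run_hours: float) -> int:
    """Pick the longest (cheapest) even interval that still meets the staleness target."""
    for n in sorted(EVEN_INTERVALS_HOURS, reverse=True):
        if worst_case_staleness_hours(n, max_run_hours) <= max_staleness_hours:
            return n
    raise ValueError("No hourly interval meets this target; consider a more frequent trigger.")


if __name__ == "__main__":
    # Example: data may be at most 8 hours stale, and a run takes up to 1 hour.
    interval = pick_interval_hours(max_staleness_hours=8, max_run_hours=1)
    print(interval, quartz_cron_every_n_hours(interval))  # prints: 6 0 0 0/6 * * ?
```

Choosing the longest interval that meets the target keeps the number of pipeline runs, and therefore compute cost, as low as the latency requirement allows.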

Start ingesting from Google Drive

The following overview shows the end-to-end Google Drive ingestion flow for each user type:

  • Administrator:
      1. Configure OAuth 2.0 and create a Unity Catalog connection. See Set up Google Drive for managed ingestion.
      2. Use Catalog Explorer to create a connection to Google Drive so that non-administrators can create pipelines. See Connect to managed ingestion sources.
  • Non-administrator: Use any supported interface to create a pipeline from an existing connection. See Ingest data from Google Drive.

Feature availability

  • UI-based pipeline authoring: Supported
  • API-based pipeline authoring: Supported
  • Declarative Automation Bundles: Supported
  • Incremental ingestion: Supported
  • Unity Catalog governance: Supported
  • Orchestration using Databricks Workflows: Supported
  • SCD type 2: Not supported
  • Schema evolution: Supported. Configurable via schema_evolution_mode. See Google Drive connector reference.
  • API-based column selection and deselection: Not supported
  • API-based row filtering: Not supported
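If you plan pipelines programmatically, a pre-flight check against this feature list can catch a design that depends on an unsupported feature (for example, history tracking with SCD type 2) before you create anything. The following is a minimal sketch: the dictionary is hand-copied from the feature availability list, the function name is hypothetical, and because the connector is in Beta, the documentation, not this snippet, remains the source of truth.

```python
# Hypothetical pre-flight check; the support matrix is transcribed by hand from
# the Google Drive connector's feature availability list and may go out of date.
FEATURE_SUPPORT = {
    "UI-based pipeline authoring": True,
    "API-based pipeline authoring": True,
    "Declarative Automation Bundles": True,
    "Incremental ingestion": True,
    "Unity Catalog governance": True,
    "Orchestration using Databricks Workflows": True,
    "SCD type 2": False,
    "Schema evolution": True,
    "API-based column selection and deselection": False,
    "API-based row filtering": False,
}


def unsupported_features(requested: list[str]) -> list[str]:
    """Return the requested features that the connector does not support."""
    unknown = [f for f in requested if f not in FEATURE_SUPPORT]
    if unknown:
        # Fail loudly on typos instead of silently treating them as supported.
        raise ValueError(f"Unknown feature name(s): {unknown}")
    return [f for f in requested if not FEATURE_SUPPORT[f]]


if __name__ == "__main__":
    plan = ["Incremental ingestion", "Schema evolution", "SCD type 2"]
    print(unsupported_features(plan))  # prints: ['SCD type 2']
```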

Authentication methods

  • OAuth U2M: Supported
  • OAuth M2M: Not supported
  • OAuth (manual refresh token): Not supported
  • Basic authentication (username/password): Not supported
  • Basic authentication (API key): Not supported
  • Basic authentication (service account JSON key): Not supported