Ingest files from Google Drive

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

Compliance

The Google Drive connector supports use in workspaces with the Configure enhanced security and compliance settings enabled.

The standard Google Drive connector in Lakeflow Connect allows you to use Databricks Spark and SQL functions (read_files, spark.read, COPY INTO, and Auto Loader) to create Spark dataframes, materialized views, and streaming tables directly from files in Google Drive.

With this approach, you can create custom pipelines for common file ingestion use cases:

Streaming file ingestion (unstructured): Ingesting many source files (for example, PDFs, Google Docs, and Google Slides) into a single target table as binary data, ideal for RAG pipelines.
Streaming file ingestion (structured): Merging many source files (for example, CSV and JSON files) into a single, structured target table.
Batch file ingestion: Ingesting a single, specific file (such as a Google Sheet) or a batch of files into one target table.

These interfaces are supported:

Declarative Automation Bundles
Databricks APIs
Databricks SDKs
Databricks CLI

Choose your Google Drive connector

Lakeflow Connect offers two Google Drive connectors. They both access data in Google Drive, but differ in their level of management.

Connector	Description
Managed Google Drive connector	A fully-managed connector. Simple, low-maintenance connector for enterprise applications that ingests data into Delta tables and keeps them in sync with the source.
Standard Google Drive connector	Build custom ingestion pipelines with SQL, PySpark, or Lakeflow pipelines using batch and streaming APIs such as `read_files`, `spark.read`, `COPY INTO`, and Auto Loader. Offers the flexibility to perform complex transformations during ingestion, while giving you greater responsibility for managing and maintaining your pipelines.

Connector	Description
Managed Google Drive connector	A fully-managed connector. Simple, low-maintenance connector for enterprise applications that ingests data into Delta tables and keeps them in sync with the source.
Standard Google Drive connector	Build custom ingestion pipelines with SQL, PySpark, or Lakeflow pipelines using batch and streaming APIs such as `read_files`, `spark.read`, `COPY INTO`, and Auto Loader. Offers the flexibility to perform complex transformations during ingestion, while giving you greater responsibility for managing and maintaining your pipelines.

tip

Databricks recommends the managed Google Drive connector for most use cases. It provides a fully-managed experience with automatic incremental ingestion, broad format support, and flexible file and folder selection.

Limitations

The connector is API-only and does not support pipeline creation in the Databricks UI.
You can use the pathGlobFilter option to filter files by name (for example, pathGlobFilter => '*.csv'). Built-in Google formats (for example, Google Docs or Google Sheets) can't be filtered using this option. Folder path filtering is also not supported.
Unsupported formats include Google Forms, Google Sites, Google Jams, and Google Vids. The ingestion process skips unsupported formats.
Setting recursiveFileLookup=false is not supported for Spark batch reads. Using recursiveFileLookup=false will behave the same as recursiveFileLookup=true.

Requirements

Before you begin, verify that you have:

A Unity Catalog-enabled workspace.
CREATE CONNECTION privileges to create a Google Drive connection, or the appropriate privilege to use an existing one based on your cluster access mode:
- Dedicated Access Mode: MANAGE CONNECTION.
- Standard Access Mode: USE CONNECTION.
Databricks Runtime 17.3 or above.
The Excel file format Beta feature enabled, if you want to ingest Google Sheets or Excel files. See Read and stream Excel files.
A Google account with the necessary permissions to create a Google Cloud project.

Configure OAuth 2.0

Set up a Google Cloud project and enable the Google Drive API

Go to the Google Cloud console.
Create a new project. You might be prompted to set up two-factor authentication.
Go to APIs & Services > Library.
Search for "Google Drive API".
Select Google Drive API.
Select Enable.

On the Google Cloud console home screen, go to APIs & Services > OAuth Consent Screen. You will see a message that says "Google Auth Platform not configured yet".
Select Get Started.
Fill in the App Information section. Enter any name for the app name (for example, Databricks connection). The support email can be any email in your organization.
Select Next.
In the Audience section, select External, then select Next.
Fill in the Contact Information section, then select Next.
Review the Google API Services User Data Policy, then select Create.
Go back to Google Auth Platform > Data Access.
Select Add or remove scopes.
Add the following scope in the Manually add scopes section, select Add to table, then select Update: https://www.googleapis.com/auth/drive.readonly
Select Save.

Create OAuth 2.0 client credentials

On the Google Cloud console home screen, go to APIs & Services > Credentials.
Select Create Credentials > OAuth Client ID.
Choose Web Application and set a custom name.
In Authorized Redirect URIs, select Add URI.
Add a redirect URI to <databricks-instance-url>/login/oauth/google.html, replacing <databricks-instance-url> with your Databricks instance URL. For example: https://instance-name.databricks.com/login/oauth/google.html
Select Create. A dialog with your credentials appears.
Record the following values. Alternatively, you can download the OAuth Client JSON file, which contains this information:
- Client ID (format: 0123******-********************************.apps.googleusercontent.com)
- Client secret (format: ABCD**-****************************)

Add test users to your project

Go to Google Auth Platform > Audience.
Under Test users, select Add users.
Add the email address of the Google account you will use to create the connection.

Create a connection

In the Databricks workspace, select Catalog > External locations > Connections > Create connection.
On the Connection basics page of the Set up connection wizard, specify a unique Connection name.
In the Connection type dropdown menu, search for and select Google Drive.
(Optional) Add a comment.
Select Next.
On the Authentication page, enter the following:
- OAuth scope: https://www.googleapis.com/auth/drive.readonly
- Client secret: The client secret from Create OAuth 2.0 client credentials.
- Client ID: The client ID from Create OAuth 2.0 client credentials.
Select Sign in with Google and sign in with the Google account from Add test users to your project.
Select Continue, then select Continue again.
After you're redirected back to the Databricks workspace, select Create connection.

Ingest files from Google Drive

Ingest files from Google Drive using the connector and read_files (Databricks SQL), Auto Loader (.readStream with cloudFiles), COPY INTO, and spark.read. You must provide the following values:

The Google Drive URL as the path.
The Unity Catalog connection using the databricks.connection data source option.
The URL of the Google Drive resource you want to access. The URL can refer to a specific file, a folder, or an entire drive. For example:
- https://docs.google.com/spreadsheets/d/12345/edit?random_query_params_here
- https://drive.google.com/drive/u/0/folders/12345
- https://docs.google.com/document/d/12345/edit
- https://drive.google.com/file/d/1kiXnHmU4Y8X66ijULky5EPDNCGtT14Ps/view?usp=drive_link
- https://drive.google.com/drive/
- https://drive.google.com/drive/my-drive
- https://drive.google.com/

Stream Google Drive files using Auto Loader

Auto Loader is the most efficient way to incrementally ingest structured files from Google Drive. It automatically detects new files and processes them as they arrive. It can also ingest structured and semi-structured files such as CSV and JSON with automatic schema inference and evolution. For details about Auto Loader usage, see Common data loading patterns.

Python
# Incrementally ingest new PDF files
df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .option("databricks.connection", "my_gdrive_conn")
    .option("cloudFiles.schemaLocation", <path to a schema location>)
    .option("pathGlobFilter", "*.pdf")
    .load("https://drive.google.com/drive/folders/1a2b3c4d...")
    .select("*", "_metadata")
)

# Incrementally ingest CSV files with automatic schema inference and evolution
df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("databricks.connection", "my_gdrive_conn")
    .option("pathGlobFilter", "*.csv")
    .option("inferColumnTypes", True)
    .option("header", True)
    .load("https://drive.google.com/drive/folders/1a2b3c4d...")
)

Read Google Drive files using Spark batch read

The following example shows how to ingest Google Drive files in Python using the spark.read function. For a list of supported file formats and Spark reader options, see Generic File Source Options in the Apache Spark documentation.

Setting recursiveFileLookup=false is not supported for Spark batch reads. Using recursiveFileLookup=false will behave the same as recursiveFileLookup=true.

Python
# Read unstructured data as binary files
df = (spark.read
        .format("binaryFile")
        .option("databricks.connection", "my_gdrive_conn")
        .option("recursiveFileLookup", True)
        .option("pathGlobFilter", "*.pdf")
        .load("https://drive.google.com/drive/folders/1a2b3c4d..."))

# Read a batch of CSV files, infer the schema, and load the data into a DataFrame
df = (spark.read
        .format("csv")
        .option("databricks.connection", "my_gdrive_conn")
        .option("pathGlobFilter", "*.csv")
        .option("recursiveFileLookup", True)
        .option("inferSchema", True)
        .option("header", True)
        .load("https://drive.google.com/drive/folders/1a2b3c4d..."))

Read a single structured Google Drive file

The following example reads a single tab of a Google Sheet and loads it into a DataFrame. It demonstrates a few common parsing options.

For a full list of supported parsing options for Excel files and Google Sheets, see Read and stream Excel files. For a full list of all other supported file formats and Spark reader options, see Generic File Source Options in the Apache Spark documentation.

Python
df = (spark.read
  .format("excel")  # use 'excel' for Google Sheets
  .option("databricks.connection", "my_gdrive_conn")
  .option("headerRows", 1) # optional
  .option("inferColumns", True) # optional
  .option("dataAddress", "Sheet1!A1:Z10")  # optional
  .load("https://docs.google.com/spreadsheets/d/9k8j7i6f..."))

Read Google Drive files using Spark SQL

The following example shows how to ingest Google Drive files in SQL using the read_files table-valued function. For details about read_files usage, see read_files table-valued function.

SQL
-- Read pdf files
CREATE TABLE my_table AS
SELECT * FROM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  `databricks.connection` => "my_gdrive_conn",
  format => "binaryFile",
  pathGlobFilter => "*.pdf", -- optional. Example: only ingest PDFs
  schemaEvolutionMode => "none"
);

-- Read a Google Sheet and range
CREATE TABLE my_sheet_table AS
SELECT * FROM read_files(
  "https://docs.google.com/spreadsheets/d/9k8j7i6f...",
  `databricks.connection` => "my_gdrive_conn",
  format => "excel",
  headerRows => 1,  -- optional
  dataAddress => "Sheet1!A2:D10", -- optional
  schemaEvolutionMode => "none"
);

Ingest Google Drive files using Lakeflow pipelines

note

The Google Drive connector requires Databricks Runtime 17.3 or later. To use the connector, set "CHANNEL" = "PREVIEW" in your pipeline settings. For more information on previews, see Pipeline properties reference.

This examples shows how to read Google Drive files using Auto Loader in Lakeflow pipelines. For more information, see Spark Declarative Pipelines

SQL
Python

SQL
-- Incrementally ingest new PDF files
CREATE OR REFRESH STREAMING TABLE gdrive_pdf_table
AS SELECT * FROM STREAM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  format => "binaryFile",
  `databricks.connection` => "my_gdrive_conn",
  pathGlobFilter => "*.pdf");

-- Incrementally ingest CSV files with automatic schema inference and evolution
CREATE OR REFRESH STREAMING TABLE gdrive_csv_table
AS SELECT * FROM STREAM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  format => "csv",
  `databricks.connection` => "my_gdrive_conn",
  pathGlobFilter => "*.csv",
  "header", "true");

-- Read a specific Excel file from Google Drive in a materialized view
CREATE OR REFRESH MATERIALIZED VIEW gdrive_excel_table
AS SELECT * FROM read_files(
  "https://docs.google.com/spreadsheets/d/9k8j7i6f...",
  `databricks.connection` => "my_gdrive_conn",
  format => "excel",
  headerRows => 1,  -- optional
  dataAddress => "Sheet1!A2:D10", -- optional
  `cloudFiles.schemaEvolutionMode` => "none"
);

Python
from pyspark import pipelines as dp

# Incrementally ingest new PDF files
@dp.table
def gdrive_pdf_table():
  return (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .option("databricks.connection", "my_gdrive_conn")
    .option("pathGlobFilter", "*.pdf")
    .load("https://drive.google.com/drive/folders/1a2b3c4d...")
  )

# Incrementally ingest CSV files with automatic schema inference and evolution
@dp.table
def gdrive_csv_table():
  return (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("databricks.connection", "my_gdrive_conn")
      .option("pathGlobFilter", "*.csv")
      .option("inferColumnTypes", True)
      .option("header", True)
      .load("https://drive.google.com/drive/folders/1a2b3c4d...")
  )

# Read a specific Excel file from Google Drive in a materialized view
@dp.table
def gdrive_excel_table():
  return (spark.read.format("excel")
    .option("databricks.connection", "my_gdrive_conn")
    .option("headerRows", 1) # optional
    .option("inferColumnTypes", True) # optional
    .option("dataAddress", "Sheet1!A1:M20") # optional
    .load("https://docs.google.com/spreadsheets/d/9k8j7i6f..."))

Parse unstructured files with `ai_parse_document`

When you ingest unstructured files from Google Drive (such as PDFs, Word documents, or PowerPoint files) using the standard Google Drive connector with binaryFile format, the file contents are stored as raw binary data. To prepare these files for AI workloads—such as RAG, search, classification, or document understanding—you can parse the binary content into structured, queryable output using ai_parse_document.

The following example shows how to parse unstructured documents stored in a bronze Delta table named documents, adding a new column with parsed content:

SQL
CREATE TABLE documents AS
SELECT *, _metadata FROM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  `databricks.connection` => "my_gdrive_conn",
  format => "binaryFile",
  pathGlobFilter => "*.{pdf,jpeg}"
);
SELECT *, ai_parse_document(content) AS parsed_content
FROM documents;

The parsed_content column contains extracted text, tables, layout information, and metadata that can be directly used for downstream AI pipelines.

Incremental parsing with Lakeflow pipelines

You can also use ai_parse_document within Lakeflow pipelines to enable incremental parsing. As new files stream in from Google Drive, they are automatically parsed as your pipeline updates.

For example, you can define a materialized view that continuously parses newly ingested documents:

SQL
CREATE OR REFRESH STREAMING TABLE documents
AS SELECT *, "_metadata" FROM STREAM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  format => "binaryFile",
  `databricks.connection` => "my_gdrive_conn",
  pathGlobFilter => "*.{pdf,jpeg}");

CREATE OR REFRESH MATERIALIZED VIEW documents_parsed
AS
SELECT *, ai_parse_document(content) AS parsed_content
FROM documents;

This approach ensures that:

Newly ingested Google Drive files are parsed automatically whenever the materialized view refreshes
Parsed outputs stay in sync with incoming data
Downstream AI pipelines always operate on up-to-date document representations

For supported formats and advanced options, see ai_parse_document function.

How built-in Google formats are handled

You do not need to export your built-in Google files (Docs, Sheets) manually. The connector automatically exports them to an open format during ingestion.

Google format	Exported as (default)
Google Docs	`application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX)
Google Sheets	`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX)
Google Slides	`application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX)
Google Drawings	`application/pdf` (PDF)

Google format	Exported as (default)
Google Docs	`application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX)
Google Sheets	`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX)
Google Slides	`application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX)
Google Drawings	`application/pdf` (PDF)

Google Drive export format configuration

You can configure how Google Drive native files are exported by setting Spark configurations using spark.conf.set(). These configurations determine the MIME type used when exporting Google Docs, Sheets, Slides, and Drawings.

Configuration keys:

fs.gdrive.format.document.export: Google Docs export format.
fs.gdrive.format.spreadsheet.export: Google Sheets export format.
fs.gdrive.format.presentation.export: Google Slides export format.
fs.gdrive.format.drawing.export: Google Drawings export format.

For a complete list of supported export formats, see Export MIME types for Google Workspace documents in the Google Workspace documentation.

The following example exports a Google Docs file as TXT.

Python
spark.conf.set("fs.gdrive.format.document.export", "text/plain")
df = spark.read.text("https://docs.google.com/document/d/1a2b3c4d...")

The following example exports a Google Sheets file as CSV.

Python
spark.conf.set("fs.gdrive.format.spreadsheet.export", "text/csv")
df = spark.read.option("header", "true").csv("https://docs.google.com/spreadsheets/d/1a2b3c4d...")

Schemas

binaryFile format

When you use format => 'binaryFile', the resulting table has the following schema:

path (string): The full URL to the file.
modificationTime (timestamp): The last modified time of the file.
length (long): The size of the file in bytes.
content (binary): The raw binary content of the file.

_metadata column

You can get metadata information for input files with the _metadata column such as file_name, file_path, file_size, and file_modification_time. The _metadata column is a hidden column, and is available for all input file formats. To include the _metadata column in the returned DataFrame, you must explicitly select it in the read query where you specify the source. For more information, see File metadata column.

For Google Drive native files (such as Google Docs, Google Sheets, and Google Slides), the file_size field refers to the size of the file stored on Google Drive, not the size of the file in the exported format (for example, DOCX, XLSX, or PPTX).

Example of selecting _metadata:

SQL
SELECT *, _metadata FROM read_files(
  "https://drive.google.com/",
  `databricks.connection` => "my_connection",
  format => "binaryFile"
);

Google Drive metadata column

The _gdrive_metadata column is a hidden metadata column that provides access to Google Drive-specific properties of ingested files, sourced from the Google Drive files resource. It requires Databricks Runtime 18.1 or above and is available for all file formats when reading from Google Drive. To include the _gdrive_metadata column in the returned DataFrame, you must explicitly select it in the read query.

If the data source contains a column named _gdrive_metadata, the Google Drive metadata column is renamed to __gdrive_metadata (with an extra leading underscore) to deduplicate. Additional underscores are added until the name is unique.

Common file metadata like the file path or size can be queried using the _metadata column. For more information, see File metadata column.

warning

New fields might be added to the _gdrive_metadata column in future releases. To prevent schema evolution errors if the _gdrive_metadata column is updated, you can select specific fields from the column in your queries. See Examples.

Schema

The _gdrive_metadata column is a STRUCT containing the following fields. All fields are nullable.

Name	Type	Description	Example	Minimum Databricks Runtime release
id	`STRING`	The Google Drive file ID.	`1pCzwOApmvUJCtXtav265-i4E7mYf2feF`	18.1
drive_id	`STRING`	The ID of the shared drive that contains the file. `null` for files in a user's My Drive.	`0ABpL6n51HPGXUk9PVA`	18.1
parent_id	`STRING`	The ID of the file's parent folder. Google Drive files can have multiple parents; this returns the first.	`1a2b3c4d5e6f7g8h9i0j`	18.1
web_url	`STRING`	The browser URL of the file on Google Drive.	`https://drive.google.com/file/d/1pCzwOApmvUJCtXtav265-i4E7mYf2feF/view`	18.1
mime_type	`STRING`	The MIME type of the file. For Google Workspace files, this is the native Google type (for example, `application/vnd.google-apps.document`), not the exported type.	`text/csv`	18.1
md5_checksum	`STRING`	The MD5 checksum of the file's content. Only populated for binary files; `null` for Google Workspace files.	`06ffb3e392fc5459e5322aad81b4f78b`	18.1
version	`STRING`	A monotonically increasing version number for the file.	`12`	18.1
created_timestamp	`TIMESTAMP`	The time the file was created.	`2025-12-01 10:16:19`	18.1
last_modified_by_email	`STRING`	The email of the user who last modified the file.	`alice@example.com`	18.1
last_modified_by_name	`STRING`	The display name of the user who last modified the file.	`Alice Example`	18.1
shared	`BOOLEAN`	Whether the file has been shared with users other than the owner.	`true`	18.1
properties	`VARIANT`	Custom public properties set on the file. See Properties in the Drive API.	`{"department":"finance"}`	18.1
additional_metadata	`VARIANT`	Any other file resource fields returned by the Drive API but not extracted above.	`{"capabilities":{"canEdit":true},...}`	18.1

Name	Type	Description	Example	Minimum Databricks Runtime release
id	`STRING`	The Google Drive file ID.	`1pCzwOApmvUJCtXtav265-i4E7mYf2feF`	18.1
drive_id	`STRING`	The ID of the shared drive that contains the file. `null` for files in a user's My Drive.	`0ABpL6n51HPGXUk9PVA`	18.1
parent_id	`STRING`	The ID of the file's parent folder. Google Drive files can have multiple parents; this returns the first.	`1a2b3c4d5e6f7g8h9i0j`	18.1
web_url	`STRING`	The browser URL of the file on Google Drive.	`https://drive.google.com/file/d/1pCzwOApmvUJCtXtav265-i4E7mYf2feF/view`	18.1
mime_type	`STRING`	The MIME type of the file. For Google Workspace files, this is the native Google type (for example, `application/vnd.google-apps.document`), not the exported type.	`text/csv`	18.1
md5_checksum	`STRING`	The MD5 checksum of the file's content. Only populated for binary files; `null` for Google Workspace files.	`06ffb3e392fc5459e5322aad81b4f78b`	18.1
version	`STRING`	A monotonically increasing version number for the file.	`12`	18.1
created_timestamp	`TIMESTAMP`	The time the file was created.	`2025-12-01 10:16:19`	18.1
last_modified_by_email	`STRING`	The email of the user who last modified the file.	`alice@example.com`	18.1
last_modified_by_name	`STRING`	The display name of the user who last modified the file.	`Alice Example`	18.1
shared	`BOOLEAN`	Whether the file has been shared with users other than the owner.	`true`	18.1
properties	`VARIANT`	Custom public properties set on the file. See Properties in the Drive API.	`{"department":"finance"}`	18.1
additional_metadata	`VARIANT`	Any other file resource fields returned by the Drive API but not extracted above.	`{"capabilities":{"canEdit":true},...}`	18.1

note

The properties and additional_metadata fields are returned as VARIANT. See VARIANT type.

Examples

The following examples show how to include the _gdrive_metadata column in a read query, select specific fields from the column, and extract values from VARIANT fields.

Python
SQL

Python
df = (spark.read
  .format("binaryFile")
  .option("databricks.connection", "my_gdrive_conn")
  .load("https://drive.google.com/drive/folders/1a2b3c4d...")
  .select("*", "_metadata", "_gdrive_metadata"))

SQL
SELECT *, _gdrive_metadata
FROM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  `databricks.connection` => "my_gdrive_conn",
  format => "binaryFile"
);

Select specific fields from the _gdrive_metadata struct:

Python
df = (spark.read
  .format("binaryFile")
  .option("databricks.connection", "my_gdrive_conn")
  .load("https://drive.google.com/drive/folders/1a2b3c4d...")
  .select("_gdrive_metadata.id", "_gdrive_metadata.md5_checksum"))

Extract values from the properties or additional_metadata VARIANT fields using the :: cast operator:

SQL
SELECT
  *,
  _gdrive_metadata.properties:department::STRING AS department
FROM read_files(
  "https://drive.google.com/drive/folders/1a2b3c4d...",
  `databricks.connection` => "my_gdrive_conn",
  format => "binaryFile"
);

FAQ

I have a folder of multiple structured files (for example, multiple Google Sheets). How do I load each sheet or file as its own Delta table?

You must create a new ingestion query for each file to ingest into its own Delta table.

My files require custom parsing. How do I provide these parsing parameters to ensure my files are read correctly?

The connector supports all file format options available in Auto Loader, COPY INTO, and Spark. For details, see the following:

Are files in subfolders recursively ingested?

When you use Auto Loader APIs (spark.readStream and read_files), all subfolders are recursively discovered and ingested. This is also true for batch spark.read, where files in the provided folder path are always read recursively.

My Google Sheet has many irregularities and requires specific parsing and cell-range extraction (for example, multiple tables per sheet). The schema cannot be inferred automatically. How do I handle this?

You can use Excel file format parsing options to parse your Google sheet file to your desired format. See Read and stream Excel files.

Alternatively, you can disable schema inference on Auto Loader, PySpark, or read_files. The resulting table has default column names, all data types are cast to a string, and the table might be sparse. You can then perform any necessary transformations downstream.

Next steps

Explore Common data loading patterns
Review ai_parse_document function

Choose your Google Drive connector​

Limitations​

Requirements​

Configure OAuth 2.0​

Set up a Google Cloud project and enable the Google Drive API​

Set up the OAuth consent screen for your project​

Create OAuth 2.0 client credentials​

Add test users to your project​

Create a connection​

Ingest files from Google Drive​

Stream Google Drive files using Auto Loader​

Read Google Drive files using Spark batch read​

Read a single structured Google Drive file​

Read Google Drive files using Spark SQL​

Ingest Google Drive files using Lakeflow pipelines​

Parse unstructured files with ai_parse_document​

Incremental parsing with Lakeflow pipelines​

How built-in Google formats are handled​

Google Drive export format configuration​

Schemas​

Google Drive metadata column​

Schema​

Examples​

FAQ​

Next steps​

Choose your Google Drive connector

Limitations

Requirements

Configure OAuth 2.0

Set up a Google Cloud project and enable the Google Drive API

Set up the OAuth consent screen for your project

Create OAuth 2.0 client credentials

Add test users to your project

Create a connection

Ingest files from Google Drive

Stream Google Drive files using Auto Loader

Read Google Drive files using Spark batch read

Read a single structured Google Drive file

Read Google Drive files using Spark SQL

Ingest Google Drive files using Lakeflow pipelines

Parse unstructured files with `ai_parse_document`

Incremental parsing with Lakeflow pipelines

How built-in Google formats are handled

Google Drive export format configuration

Schemas

Google Drive metadata column

Schema

Examples

FAQ

Next steps