Google Drive connector reference

This page contains reference documentation for the Google Drive connector in Databricks Lakeflow Connect.

`gdrive_options` parameters

Set these options inside the `connector_options.gdrive_options` block of each table in your pipeline definition.

Parameter	Type	Required	Description
`entity_type`	String	Yes	The type of entity to ingest. Supported values: `FILE` (ingest file content and metadata) or `FILE_METADATA` (ingest metadata only, without downloading file contents).
`url`	String	Yes	The URL of the Google Drive folder or shared drive to ingest from: either a folder URL such as `https://drive.google.com/drive/folders/<folder_id>`, or a shared drive URL.
`file_ingestion_options`	Object	Yes	Controls file format and ingestion behavior. See `file_ingestion_options` parameters.
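
The following sketch shows where the three required keys sit in a table entry and runs a minimal client-side check on them. It is a plain Python dictionary, not a call to any Databricks API: the helper `validate_gdrive_options` is hypothetical, and every table key other than `connector_options.gdrive_options` is omitted because this page does not list them.

```python
# Hypothetical client-side check. Only the nesting documented on this page is
# modeled (connector_options.gdrive_options); every other key a real pipeline
# table needs is omitted.
SUPPORTED_ENTITY_TYPES = {"FILE", "FILE_METADATA"}
REQUIRED_GDRIVE_KEYS = ("entity_type", "url", "file_ingestion_options")


def validate_gdrive_options(table: dict) -> list:
    """Return a list of problems found in a table's gdrive_options block."""
    opts = table.get("connector_options", {}).get("gdrive_options")
    if opts is None:
        return ["missing connector_options.gdrive_options"]
    problems = [f"missing required key: {key}"
                for key in REQUIRED_GDRIVE_KEYS if key not in opts]
    entity_type = opts.get("entity_type")
    if entity_type is not None and entity_type not in SUPPORTED_ENTITY_TYPES:
        problems.append(f"unsupported entity_type: {entity_type}")
    return problems


table = {
    "connector_options": {
        "gdrive_options": {
            "entity_type": "FILE",
            "url": "https://drive.google.com/drive/folders/<folder_id>",
            "file_ingestion_options": {"format": "BINARYFILE"},
        }
    }
}

print(validate_gdrive_options(table))  # []
```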

`file_ingestion_options` parameters

Set these options inside the `gdrive_options.file_ingestion_options` block.

Parameter	Type	Required	Description
`format`	String	Yes	The file format to ingest. Supported values: `BINARYFILE`, `CSV`, `JSON`, `XML`, `EXCEL`, `PARQUET`, `AVRO`, `ORC`. Use `BINARYFILE` for unstructured ingestion (PDFs, Office files, images). Use a structured format to parse file contents into rows.
`file_filters`	Array of objects	No	Filters that restrict which files are ingested. Each filter object contains one of the following keys: `path_filter` (string), a glob pattern matched against file paths, using Spark path glob filter syntax; `modified_before` (string), a timestamp in `YYYY-MM-DDTHH:mm:ss` format, so that only files modified before this time are ingested; `modified_after` (string), a timestamp in `YYYY-MM-DDTHH:mm:ss` format, so that only files modified after this time are ingested.
`schema_evolution_mode`	String	No	Controls how new columns in incoming files are handled. Modes match Auto Loader schema evolution modes. Supported values: `ADD_NEW_COLUMNS_WITH_TYPE_WIDENING` (default), `ADD_NEW_COLUMNS`, `RESCUE`, `FAIL_ON_NEW_COLUMNS`, `NONE`.
`schema_hints`	String	No	Overrides inferred column types. Specify as a comma-separated list of `column_name TYPE` pairs, for example `order_id INT, amount DOUBLE`. See Override schema inference with schema hints.
`format_options`	Object	No	Format-specific parsing options. Keys are standard Auto Loader format option names. See Format options.
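
As an illustration of the value shapes in `file_ingestion_options`, the sketch below builds a `file_filters` list and a `schema_hints` string in Python. The helpers `check_file_filters` and `build_schema_hints` are hypothetical. The check assumes each filter object holds a single key, as the table describes, and only validates the timestamp layout; it does not reproduce Spark glob matching, and this page does not state how multiple filters combine.

```python
from datetime import datetime

# strptime equivalent of the documented YYYY-MM-DDTHH:mm:ss layout.
FILTER_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"
FILTER_KEYS = {"path_filter", "modified_before", "modified_after"}


def check_file_filters(filters: list) -> None:
    """Hypothetical pre-flight check: one known key per filter, parseable timestamps."""
    for f in filters:
        if len(f) != 1 or not set(f) <= FILTER_KEYS:
            raise ValueError(f"expected exactly one of {sorted(FILTER_KEYS)}: {f}")
        key, value = next(iter(f.items()))
        if key in ("modified_before", "modified_after"):
            # strptime raises ValueError if the value does not match the layout.
            datetime.strptime(value, FILTER_TIMESTAMP_FORMAT)


def build_schema_hints(hints: dict) -> str:
    """Join {column: TYPE} pairs into a comma-separated schema_hints string."""
    return ", ".join(f"{column} {sql_type}" for column, sql_type in hints.items())


file_ingestion_options = {
    "format": "CSV",
    "file_filters": [
        {"path_filter": "*.csv"},
        {"modified_after": "2024-01-01T00:00:00"},
    ],
    "schema_hints": build_schema_hints({"order_id": "INT", "amount": "DOUBLE"}),
}
check_file_filters(file_ingestion_options["file_filters"])
print(file_ingestion_options["schema_hints"])  # order_id INT, amount DOUBLE
```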

`table_configuration` parameters

Set these options inside the `table_configuration` block of each table in your pipeline definition. `table_configuration` is a sibling of `connector_options`, not nested inside it.

Parameter	Type	Required	Description
`storage_mode`	String	No	The storage mode for the destination table. Supported values: `SCD_TYPE_1` (default for `BINARYFILE`), which overwrites records when files change or are deleted; `APPEND_ONLY` (default for structured formats), which appends new rows from new or updated files. Because these are the only supported values and each is already the default for its format type, setting `storage_mode` explicitly is optional. Do not use the `scd_type` field; setting it causes an error.
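
The sketch below encodes the defaults from this table and the two placement rules described above: `table_configuration` sits beside `connector_options`, and `scd_type` is not used. The helper names are hypothetical, and the dictionaries model only the keys documented on this page.

```python
from typing import Optional

STRUCTURED_FORMATS = {"CSV", "JSON", "XML", "EXCEL", "PARQUET", "AVRO", "ORC"}


def default_storage_mode(file_format: str) -> str:
    """Documented default storage_mode for a file_ingestion_options format."""
    file_format = file_format.upper()
    if file_format == "BINARYFILE":
        return "SCD_TYPE_1"
    if file_format in STRUCTURED_FORMATS:
        return "APPEND_ONLY"
    raise ValueError(f"unsupported format: {file_format}")


def build_table(gdrive_options: dict, storage_mode: Optional[str] = None) -> dict:
    """Put table_configuration next to connector_options, not inside it."""
    table = {"connector_options": {"gdrive_options": gdrive_options}}
    if storage_mode is not None:
        table["table_configuration"] = {"storage_mode": storage_mode}
    return table


def check_table_configuration(table: dict) -> None:
    """Reject the two mistakes this page warns about."""
    if "table_configuration" in table.get("connector_options", {}):
        raise ValueError("table_configuration must be a sibling of connector_options")
    if "scd_type" in table.get("table_configuration", {}):
        raise ValueError("use storage_mode; scd_type is not supported")


gdrive_options = {
    "entity_type": "FILE",
    "url": "https://drive.google.com/drive/folders/<folder_id>",
    "file_ingestion_options": {"format": "CSV"},
}
mode = default_storage_mode(gdrive_options["file_ingestion_options"]["format"])
table = build_table(gdrive_options, storage_mode=mode)
check_table_configuration(table)
print(table["table_configuration"])  # {'storage_mode': 'APPEND_ONLY'}
```

Because `storage_mode` here only restates the default for `CSV`, omitting it would give the same result.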

Format options

The `format_options` block accepts standard Auto Loader format option keys, listed below by file format. For details, see Auto Loader.

JSON

Key	Description
`allowBackslashEscapingAnyCharacter`	Allows backslashes to escape any character.
`allowComments`	Allows Java- and C++-style comments in JSON content.
`allowNonNumericNumbers`	Allows `NaN` and `Infinity` as valid float values.
`allowNumericLeadingZeros`	Allows leading zeros in integer values.
`allowSingleQuotes`	Allows single quotes as string delimiters in addition to double quotes.
`allowUnquotedControlChars`	Allows unquoted control characters in JSON strings.
`allowUnquotedFieldNames`	Allows unquoted field names.
`badRecordsPath`	Path to store corrupt or unparseable records instead of failing the pipeline.
`charset` / `encoding`	Character encoding of the file (for example, `UTF-8`, `ISO-8859-1`).
`dateFormat`	Pattern for parsing date strings (for example, `yyyy-MM-dd`).
`dropFieldIfAllNull`	Ignores columns where all values are null or empty during schema inference.
`inferTimestamp`	Infers `TimestampType` for strings that match a timestamp pattern.
`lineSep`	Line separator character or string.
`locale`	Locale for parsing dates and numbers (for example, `en-US`).
`mode`	Behavior for malformed records: `PERMISSIVE` (default), `DROPMALFORMED`, or `FAILFAST`.
`multiLine`	Parses records that span multiple lines.
`prefersDecimal`	Infers `DecimalType` instead of `FloatType` or `DoubleType` where possible.
`primitivesAsString`	Infers all primitive values as `StringType`.
`readerCaseSensitive`	Enables case-sensitive column name matching against the schema.
`timestampFormat`	Pattern for parsing timestamp strings (for example, `yyyy-MM-dd'T'HH:mm:ss`).
`timeZone`	Time zone for parsing timestamps (for example, `UTC`, `America/New_York`).
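
The following local Python analogy (not how Auto Loader parses files) illustrates what `multiLine` changes: JSON Lines input, where every line is a complete record, reads fine line by line, while a single pretty-printed record spread across several lines does not. The sample payloads and helpers are hypothetical, and passing option values as strings such as `"true"` is an assumption based on how Spark reader options are usually written.

```python
import json

# Hypothetical sample payloads.
json_lines = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
pretty_printed = '{\n  "id": 3,\n  "name": "c"\n}\n'


def parse_line_by_line(text: str) -> list:
    """Mimic a line-oriented reader: each non-empty line must be one full record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def needs_multiline(text: str) -> bool:
    """True when the text parses only if a record may span several lines.

    Toy check: assumes the whole text is a single top-level JSON value.
    """
    try:
        parse_line_by_line(text)
        return False
    except json.JSONDecodeError:
        json.loads(text)  # re-raises if the text is not valid JSON at all
        return True


format_options = {
    "multiLine": "true" if needs_multiline(pretty_printed) else "false",
    "mode": "FAILFAST",
}
print(format_options)  # {'multiLine': 'true', 'mode': 'FAILFAST'}
```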

CSV

Supports all JSON options above, plus the following CSV-specific options:

Key	Description
`charToEscapeQuoteEscaping`	Escape character used before a quote character inside a quoted field.
`comment`	Character that marks a line as a comment; lines beginning with this character are skipped.
`delimiter` / `sep`	Column delimiter character (default: `,`).
`emptyValue`	String to use for empty values when writing.
`enforceSchema`	Applies the declared schema to CSV data, ignoring header names.
`escape`	Escape character (default: `\`).
`header`	Whether the first row contains column names (default: `false`).
`ignoreLeadingWhiteSpace`	Trims leading whitespace from values.
`ignoreTrailingWhiteSpace`	Trims trailing whitespace from values.
`maxCharsPerColumn`	Maximum number of characters allowed per column value.
`maxColumns`	Maximum number of columns allowed in a record.
`mergeSchema`	Merges schema across multiple CSV files.
`nanValue`	String representation of `NaN`.
`negativeInf`	String representation of negative infinity.
`nullValue`	String that represents a null value.
`parserCaseSensitive`	Enables case-sensitive matching between header names and schema field names.
`positiveInf`	String representation of positive infinity.
`preferDate`	Infers `DateType` for date-like strings instead of `TimestampType`.
`quote`	Quote character used to enclose field values that contain the delimiter (default: `"`).
`skipRows`	Number of rows to skip at the beginning of the file before the header or data.
`unescapedQuoteHandling`	How to handle unescaped quote characters inside quoted fields.
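
To make `header`, `delimiter`, `skipRows`, and `nullValue` concrete, the following sketch previews how they shape rows, using Python's standard `csv` module as a stand-in for the Auto Loader CSV reader. It is a rough local approximation under stated assumptions: values stay as strings, no schema inference happens, option values are written as strings, and `preview_csv` is a hypothetical helper.

```python
import csv


def preview_csv(text: str, format_options: dict) -> list:
    """Rough local preview of header, delimiter/sep, skipRows, and nullValue."""
    delimiter = format_options.get("delimiter", format_options.get("sep", ","))
    skip_rows = int(format_options.get("skipRows", "0"))
    has_header = format_options.get("header", "false").lower() == "true"
    null_value = format_options.get("nullValue")

    # Drop the leading rows first, then split the rest on the delimiter.
    rows = list(csv.reader(text.splitlines()[skip_rows:], delimiter=delimiter))
    if has_header:
        names, data = rows[0], rows[1:]
    else:
        # Spark names header-less CSV columns _c0, _c1, ...
        names, data = [f"_c{i}" for i in range(len(rows[0]))], rows
    return [
        {name: (None if value == null_value else value) for name, value in zip(names, row)}
        for row in data
    ]


sample = "exported 2024-01-01\norder_id;amount\n1;9.5\n2;NA\n"
options = {"header": "true", "delimiter": ";", "skipRows": "1", "nullValue": "NA"}
print(preview_csv(sample, options))
# [{'order_id': '1', 'amount': '9.5'}, {'order_id': '2', 'amount': None}]
```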

XML

Key	Description
`arrayElementName`	Name of the XML element wrapping each array item when writing.
`attributePrefix`	Prefix added to XML attribute names to distinguish them from element names (default: `_`).
`compression`	Compression codec for reading (for example, `gzip`, `bzip2`).
`declaration`	XML declaration string to prepend when writing.
`encoding`	Character encoding of the XML file.
`excludeAttribute`	Excludes XML element attributes from parsing.
`ignoreSurroundingSpaces`	Ignores whitespace surrounding element values.
`ignoreNamespace`	Ignores XML namespace prefixes during parsing.
`locale`	Locale for parsing dates and numbers.
`mode`	Behavior for malformed records: `PERMISSIVE`, `DROPMALFORMED`, or `FAILFAST`.
`nullValue`	String that represents a null value.
`rootTag`	Root element tag name.
`rowTag`	XML element tag that identifies each row (required).
`rowValidationXSDPath`	Path to an XSD schema file for validating each row element.
`samplingRatio`	Fraction of rows sampled for schema inference (default: `1.0`).
`timestampFormat`	Pattern for parsing timestamp strings.
`timestampNTZFormat`	Pattern for parsing timestamp-without-timezone strings.
`timeZone`	Time zone for parsing timestamps.
`validateName`	Validates that XML element names conform to the XML specification.
`valueTag`	Tag name used for text values in elements that also have attributes (default: `_VALUE`).
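
The following local analogy uses Python's `xml.etree.ElementTree` to show how `rowTag`, `attributePrefix`, and `valueTag` relate to the resulting rows: each element named by `rowTag` becomes one row, attributes get the `_` prefix, and the text of an element that also has attributes is stored under `_VALUE`. `xml_rows` is a simplified, hypothetical helper that keeps every value as a string; the connector itself infers column types as described for structured formats.

```python
import xml.etree.ElementTree as ET


def xml_rows(text: str, row_tag: str, attribute_prefix: str = "_",
             value_tag: str = "_VALUE") -> list:
    """Rough local analogy: one dict per row_tag element, values kept as strings."""
    rows = []
    for elem in ET.fromstring(text).iter(row_tag):
        # Attributes of the row element become prefixed top-level fields.
        row = {attribute_prefix + k: v for k, v in elem.attrib.items()}
        for child in elem:
            if child.attrib:
                # Element with attributes: nested dict, text stored under value_tag.
                field = {attribute_prefix + k: v for k, v in child.attrib.items()}
                if child.text and child.text.strip():
                    field[value_tag] = child.text.strip()
                row[child.tag] = field
            else:
                row[child.tag] = child.text
        rows.append(row)
    return rows


doc = (
    '<orders>'
    '<order id="1"><amount currency="USD">9.50</amount><status>open</status></order>'
    '<order id="2"><amount currency="EUR">3.00</amount><status>closed</status></order>'
    '</orders>'
)
for row in xml_rows(doc, row_tag="order"):
    print(row)
# {'_id': '1', 'amount': {'_currency': 'USD', '_VALUE': '9.50'}, 'status': 'open'}
# {'_id': '2', 'amount': {'_currency': 'EUR', '_VALUE': '3.00'}, 'status': 'closed'}
```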

Parquet

Key	Description
`datetimeRebaseMode`	Handling for dates and timestamps written in Julian calendar format: `EXCEPTION`, `CORRECTED`, or `LEGACY`.
`int96RebaseMode`	Handling for INT96 timestamps written in Julian calendar format: `EXCEPTION`, `CORRECTED`, or `LEGACY`.
`mergeSchema`	Merges schema across multiple Parquet files.

Avro

Key	Description
`avroSchema`	Avro schema in JSON string format. Use to enforce a specific schema during reads.
`datetimeRebaseMode`	Handling for dates and timestamps written in Julian calendar format: `EXCEPTION`, `CORRECTED`, or `LEGACY`.
`mergeSchema`	Merges schema across multiple Avro files.
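
Because `avroSchema` takes the schema as a JSON string, one way to avoid quoting mistakes is to build the schema as a Python dictionary and serialize it with `json.dumps`. The `Order` record below is a hypothetical example schema.

```python
import json

# Hypothetical reader schema for order records.
order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "amount", "type": "double"},
        # Nullable field: "null" listed first in the union so the default can be null.
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}

file_ingestion_options = {
    "format": "AVRO",
    "format_options": {"avroSchema": json.dumps(order_schema)},
}
print(file_ingestion_options["format_options"]["avroSchema"])
```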

Ingested data format

The schema of the destination table depends on the `entity_type` and `format` you configure.

BINARYFILE entity type (FILE)

When `entity_type` is `FILE` and `format` is `BINARYFILE`, each ingested file becomes one row with the following columns:

Field	Type	Description
`path`	`string`	The path of the file.
`modificationTime`	`timestamp`	The time the file was last modified.
`length`	`bigint`	The size of the file in bytes.
`content`	`binary`	The file content.
`_file_id`	`string`	The Google Drive identifier of the file.
`_gdrive_metadata`	`struct`	Google Drive-specific metadata for the file, with the following fields: `id` (`string`): the Google Drive identifier of the file; `drive_id` (`string`): the Google Drive identifier of the shared drive, or null for files in My Drive; `parent_id` (`string`): the identifier of the parent folder; `web_url` (`string`): a link to the file in Google Drive; `mime_type` (`string`): the MIME type of the file; `md5_checksum` (`string`): the MD5 checksum of the file content; `version` (`string`): the version of the file; `created_timestamp` (`timestamp`): the time the file was created in Google Drive; `last_modified_by_email` (`string`): the email address of the user who last modified the file, or null if not available; `last_modified_by_name` (`string`): the name of the user who last modified the file, or null if not available; `shared` (`boolean`): whether the file is shared; `properties` (`variant`): custom file properties; `additional_metadata` (`variant`): additional file metadata.
`_file_metadata`	`struct`	Standard file metadata added by Databricks during ingestion, with the following fields: `file_path` (`string`): the source path of the file; `file_name` (`string`): the name of the file; `file_size` (`bigint`): the size of the file in bytes; `file_block_start` (`bigint`): the start byte offset of the file block; `file_block_length` (`bigint`): the length of the file block in bytes; `file_modification_time` (`timestamp`): the time the file was last modified.
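
As one possible use of these columns, the sketch below checks a row's `content` against `_gdrive_metadata.md5_checksum` with Python's `hashlib`. It assumes the checksum is the hex-encoded string that the Google Drive API reports as `md5Checksum`, and it returns `None` ("cannot verify") when no checksum is present; the Drive API populates that field only for files with binary content stored in Drive, so Google Docs editor files may have none. The row is a hand-built dictionary with placeholder values, not a query against the destination table, and `verify_content` is a hypothetical helper.

```python
import hashlib
from typing import Optional


def verify_content(row: dict) -> Optional[bool]:
    """Compare a row's content with its _gdrive_metadata.md5_checksum.

    Returns None when there is no checksum or no content to compare.
    """
    expected = (row.get("_gdrive_metadata") or {}).get("md5_checksum")
    if expected is None or row.get("content") is None:
        return None
    return hashlib.md5(row["content"]).hexdigest() == expected.lower()


row = {
    "path": "<path>",
    "length": 5,
    "content": b"hello",
    "_gdrive_metadata": {"md5_checksum": "5d41402abc4b2a76b9719d911017c592"},
}
print(verify_content(row))  # True
```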

Structured entity type (FILE with structured format)

When `entity_type` is `FILE` and `format` is a structured format (`CSV`, `JSON`, `XML`, `EXCEL`, `PARQUET`, `AVRO`, or `ORC`), the destination table schema matches the schema of the source files. Columns are inferred from the file contents, subject to the `schema_evolution_mode` and `schema_hints` settings.

FILE_METADATA entity type

When `entity_type` is `FILE_METADATA`, file content is not downloaded. The destination table contains only the metadata columns from the `_gdrive_metadata` and `_file_metadata` structs described above, plus `_file_id`.