
Google Drive connector reference

This page contains reference documentation for the Google Drive connector in Databricks Lakeflow Connect.

gdrive_options parameters

Set these options inside the connector_options.gdrive_options block of each table in your pipeline definition.

entity_type (String, required)
The type of entity to ingest. Supported values:
  • FILE: Ingest file content and metadata.
  • FILE_METADATA: Ingest metadata only, without downloading file contents.

url (String, required)
The URL of the Google Drive folder or shared drive to ingest from. Examples:
  • https://drive.google.com/drive/folders/<folder_id>
  • A shared drive URL

file_ingestion_options (Object, required)
Controls file format and ingestion behavior. See file_ingestion_options parameters.
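As a minimal sketch, the Python snippet below assembles the connector_options.gdrive_options block for one table as a dictionary and prints it as JSON. It uses only the keys documented above. The helper name build_table_options is hypothetical, <folder_id> is a placeholder, and the rest of the pipeline definition that would contain this fragment is not shown.

```python
import json

def build_table_options(folder_url: str, entity_type: str = "FILE", fmt: str = "BINARYFILE") -> dict:
    """Return the connector_options block for one table (hypothetical helper)."""
    if entity_type not in ("FILE", "FILE_METADATA"):
        raise ValueError(f"unsupported entity_type: {entity_type}")
    return {
        "connector_options": {
            "gdrive_options": {
                "entity_type": entity_type,
                "url": folder_url,
                # file_ingestion_options is required; format is its only required key.
                "file_ingestion_options": {"format": fmt},
            }
        }
    }

# Placeholder URL: replace <folder_id> with the ID of your own folder.
options = build_table_options("https://drive.google.com/drive/folders/<folder_id>")
print(json.dumps(options, indent=2))
```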

file_ingestion_options parameters

Set these options inside gdrive_options.file_ingestion_options.

format (String, required)
The file format to ingest. Supported values: BINARYFILE, CSV, JSON, XML, EXCEL, PARQUET, AVRO, ORC. Use BINARYFILE for unstructured ingestion (PDFs, Office files, images). Use a structured format to parse file contents into rows.

file_filters (Array of objects, optional)
Filters that restrict which files are ingested. Each filter object can contain one of the following keys:
  • path_filter (string): A glob pattern matched against file paths, following the syntax of the Spark pathGlobFilter option.
  • modified_before (string): A timestamp in YYYY-MM-DDTHH:mm:ss format. Only files modified before this time are ingested.
  • modified_after (string): A timestamp in YYYY-MM-DDTHH:mm:ss format. Only files modified after this time are ingested.

schema_evolution_mode (String, optional)
Controls how new columns in incoming files are handled. The modes correspond to the Auto Loader schema evolution modes. Supported values: ADD_NEW_COLUMNS_WITH_TYPE_WIDENING (default), ADD_NEW_COLUMNS, RESCUE, FAIL_ON_NEW_COLUMNS, NONE.

schema_hints (String, optional)
Overrides inferred column types. Specify a comma-separated list of column_name TYPE pairs, for example order_id INT, amount DOUBLE. See Override schema inference with schema hints.

format_options (Object, optional)
Format-specific parsing options. Keys are standard Auto Loader format option names. See Format options.
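The sketch below shows a file_ingestion_options block for CSV files, together with a small client-side check of file_filters. The check reads "one of the following keys" as exactly one key per filter object, and it converts the documented YYYY-MM-DDTHH:mm:ss pattern to the strptime pattern %Y-%m-%dT%H:%M:%S. Both of these are assumptions. The helper check_file_filters is hypothetical and is not part of the connector.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # strptime form of YYYY-MM-DDTHH:mm:ss
FILTER_KEYS = {"path_filter", "modified_before", "modified_after"}

def check_file_filters(filters: list) -> None:
    """Raise ValueError if a filter object is malformed (hypothetical client-side check)."""
    for f in filters:
        # Assumption: each filter object holds exactly one of the documented keys.
        if len(f) != 1 or not FILTER_KEYS.issuperset(f):
            raise ValueError(f"each filter needs exactly one of {sorted(FILTER_KEYS)}: {f}")
        key, value = next(iter(f.items()))
        if key in ("modified_before", "modified_after"):
            datetime.strptime(value, TIMESTAMP_FORMAT)  # raises ValueError on a bad timestamp

file_ingestion_options = {
    "format": "CSV",
    "file_filters": [
        {"path_filter": "*.csv"},
        {"modified_after": "2024-01-01T00:00:00"},
    ],
    "schema_evolution_mode": "ADD_NEW_COLUMNS",
    "schema_hints": "order_id INT, amount DOUBLE",
}
check_file_filters(file_ingestion_options["file_filters"])
```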

table_configuration parameters

Set these options inside the table_configuration block of each table in your pipeline definition. table_configuration is a sibling of connector_options, not nested inside it.

storage_mode (String, optional)
The storage mode for the destination table. Supported values:
  • SCD_TYPE_1 (default for BINARYFILE): Overwrites existing records when source files change or are deleted.
  • APPEND_ONLY (default for structured formats): Appends new rows from new or updated files.

These two values are the only supported values, and each one is already the default for its formats, so setting storage_mode explicitly is optional. Do not use the scd_type field; it causes an error.
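The following minimal sketch places table_configuration next to connector_options (not inside it) and derives storage_mode from the format by following the defaults listed for storage_mode. The helper default_storage_mode is hypothetical. It covers only the format values and does not cover FILE_METADATA tables, whose default is not stated here.

```python
def default_storage_mode(fmt: str) -> str:
    """Default storage_mode for a file_ingestion_options.format value (hypothetical helper)."""
    return "SCD_TYPE_1" if fmt.upper() == "BINARYFILE" else "APPEND_ONLY"

table = {
    "connector_options": {
        "gdrive_options": {
            "entity_type": "FILE",
            "url": "https://drive.google.com/drive/folders/<folder_id>",
            "file_ingestion_options": {"format": "JSON"},
        }
    },
    # Sibling of connector_options, not nested inside it. Setting storage_mode is optional.
    "table_configuration": {"storage_mode": default_storage_mode("JSON")},
}
assert "scd_type" not in table["table_configuration"]  # scd_type causes an error
```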

Format options

The format_options block accepts standard Auto Loader format option keys, organized below by file format. For details, see Auto Loader.

JSON

  • allowBackslashEscapingAnyCharacter: Allows backslashes to escape any character.
  • allowComments: Allows Java- and C++-style comments in JSON content.
  • allowNonNumericNumbers: Allows NaN and Infinity as valid float values.
  • allowNumericLeadingZeros: Allows leading zeros in integer values.
  • allowSingleQuotes: Allows single quotes as string delimiters in addition to double quotes.
  • allowUnquotedControlChars: Allows unquoted control characters in JSON strings.
  • allowUnquotedFieldNames: Allows unquoted field names.
  • badRecordsPath: Path to store corrupt or unparseable records instead of failing the pipeline.
  • charset / encoding: Character encoding of the file (for example, UTF-8, ISO-8859-1).
  • dateFormat: Pattern for parsing date strings (for example, yyyy-MM-dd).
  • dropFieldIfAllNull: Ignores columns whose values are all null, or are all empty arrays or structs, during schema inference.
  • inferTimestamp: Infers TimestampType for strings that match a timestamp pattern.
  • lineSep: Line separator character or string.
  • locale: Locale for parsing dates and numbers (for example, en-US).
  • mode: Behavior for malformed records: PERMISSIVE (default), DROPMALFORMED, or FAILFAST.
  • multiLine: Parses records that span multiple lines.
  • prefersDecimal: Infers DecimalType instead of FloatType or DoubleType where possible.
  • primitivesAsString: Infers all primitive values as StringType.
  • readerCaseSensitive: Enables case-sensitive column name matching against the schema.
  • timestampFormat: Pattern for parsing timestamp strings (for example, yyyy-MM-dd'T'HH:mm:ss).
  • timeZone: Time zone for parsing timestamps (for example, UTC, America/New_York).
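The sketch below is a simplified pure-Python model of how the three mode values treat malformed records in line-delimited JSON, which is the case when multiLine is off. It only illustrates the semantics. The real Auto Loader reader also applies the inferred schema and other options. Here the raw text of a bad record is kept under _corrupt_record, which is Spark's default corrupt-record column name.

```python
import json

MODES = {"PERMISSIVE", "DROPMALFORMED", "FAILFAST"}

def read_json_lines(text: str, mode: str = "PERMISSIVE") -> list:
    """Simplified model of the mode option on line-delimited JSON (multiLine off)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines hold no record
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            if mode == "FAILFAST":
                raise ValueError(f"malformed record: {line!r}") from None
            if mode == "PERMISSIVE":
                # Keep the raw text of the bad record instead of dropping it.
                rows.append({"_corrupt_record": line})
            # DROPMALFORMED: the bad record is skipped.
    return rows

data = '{"id": 1}\n{"id": 2\n{"id": 3}'
print(read_json_lines(data))                   # 3 rows, one holding the corrupt text
print(read_json_lines(data, "DROPMALFORMED"))  # 2 rows
```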

CSV

CSV accepts the JSON options above that are not JSON-specific: badRecordsPath, charset / encoding, dateFormat, lineSep, locale, mode, multiLine, readerCaseSensitive, timestampFormat, and timeZone. It also accepts the following CSV-specific options:

  • charToEscapeQuoteEscaping: Character used to escape the escape character that precedes a quote character.
  • comment: Character that marks a line as a comment; lines beginning with this character are skipped.
  • delimiter / sep: Column delimiter character (default: ,).
  • emptyValue: String representation of an empty value.
  • enforceSchema: Applies the declared schema to CSV data, ignoring header names.
  • escape: Escape character (default: \).
  • header: Whether the first row contains column names (default: false).
  • ignoreLeadingWhiteSpace: Trims leading whitespace from values.
  • ignoreTrailingWhiteSpace: Trims trailing whitespace from values.
  • maxCharsPerColumn: Maximum number of characters allowed per column value.
  • maxColumns: Maximum number of columns allowed in a record.
  • mergeSchema: Merges schema across multiple CSV files.
  • nanValue: String representation of NaN.
  • negativeInf: String representation of negative infinity.
  • nullValue: String that represents a null value.
  • parserCaseSensitive: Enables case-sensitive matching between header names and schema field names.
  • positiveInf: String representation of positive infinity.
  • preferDate: Infers DateType for date-like strings instead of TimestampType.
  • quote: Quote character used to enclose field values that contain the delimiter (default: ").
  • skipRows: Number of rows to skip at the beginning of the file, before the header or data.
  • unescapedQuoteHandling: How to handle unescaped quote characters inside quoted fields: STOP_AT_CLOSING_QUOTE, BACK_TO_DELIMITER, STOP_AT_DELIMITER, SKIP_VALUE, or RAISE_ERROR.
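To show how sep, header, skipRows, and nullValue interact, here is a simplified model built on Python's csv module. It is an illustration under stated assumptions and not the Spark CSV parser. For example, this model counts skipRows as raw lines and ignores the comment option. The function name read_csv is hypothetical.

```python
import csv

def read_csv(text, sep=",", header=False, skip_rows=0, null_value=""):
    """Simplified model of the sep, header, skipRows, and nullValue options."""
    lines = text.splitlines()[skip_rows:]  # skipRows: ignore leading lines
    records = list(csv.reader(lines, delimiter=sep, quotechar='"'))
    if not records:
        return []
    if header:
        # header: the first remaining row supplies the column names.
        names, records = records[0], records[1:]
    else:
        # Without a header, Spark names CSV columns _c0, _c1, ...
        names = [f"_c{i}" for i in range(len(records[0]))]
    # nullValue: values equal to null_value become None (Python's null).
    return [
        {name: (None if value == null_value else value) for name, value in zip(names, record)}
        for record in records
    ]

sample = '# exported file\nid;name;city\n1;Ada;\n2;"Grace; H";NYC'
print(read_csv(sample, sep=";", header=True, skip_rows=1))
# [{'id': '1', 'name': 'Ada', 'city': None}, {'id': '2', 'name': 'Grace; H', 'city': 'NYC'}]
```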

XML

  • arrayElementName: Name of the XML element wrapping each array item when writing.
  • attributePrefix: Prefix added to XML attribute names to distinguish them from element names (default: _).
  • compression: Compression codec for reading (for example, gzip, bzip2).
  • declaration: XML declaration string to prepend when writing.
  • encoding: Character encoding of the XML file.
  • excludeAttribute: Excludes XML element attributes from parsing.
  • ignoreNamespace: Ignores XML namespace prefixes during parsing.
  • ignoreSurroundingSpaces: Ignores whitespace surrounding element values.
  • locale: Locale for parsing dates and numbers.
  • mode: Behavior for malformed records: PERMISSIVE, DROPMALFORMED, or FAILFAST.
  • nullValue: String that represents a null value.
  • rootTag: Root element tag name.
  • rowTag: XML element tag that identifies each row (required).
  • rowValidationXSDPath: Path to an XSD schema file for validating each row element.
  • samplingRatio: Fraction of rows sampled for schema inference (default: 1.0).
  • timestampFormat: Pattern for parsing timestamp strings.
  • timestampNTZFormat: Pattern for parsing timestamp-without-timezone strings.
  • timeZone: Time zone for parsing timestamps.
  • validateName: Validates that XML element names conform to the XML specification.
  • valueTag: Tag name used for text values in elements that also have attributes (default: _VALUE).
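The following simplified model, built on xml.etree.ElementTree, illustrates how rowTag, attributePrefix, and valueTag shape rows. It assumes flat, non-repeating child elements, strips surrounding whitespace, and keeps every value as a string. The real reader also infers types and handles nesting and arrays. The function name xml_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

def xml_rows(text, row_tag, attribute_prefix="_", value_tag="_VALUE"):
    """Simplified model of rowTag, attributePrefix, and valueTag (flat children only)."""
    root = ET.fromstring(text)
    rows = []
    for elem in root.iter(row_tag):  # each rowTag element becomes one row
        # Attributes of the row element become prefixed columns.
        row = {attribute_prefix + k: v for k, v in elem.attrib.items()}
        for child in elem:
            text_value = (child.text or "").strip() or None
            if child.attrib:
                # Element with attributes: its text is stored under valueTag.
                value = {attribute_prefix + k: v for k, v in child.attrib.items()}
                value[value_tag] = text_value
            else:
                value = text_value
            row[child.tag] = value
        rows.append(row)
    return rows

doc = """<orders>
  <order id="1"><item>pen</item><price currency="USD">2.50</price></order>
  <order id="2"><item>ink</item><price currency="EUR">4.00</price></order>
</orders>"""
print(xml_rows(doc, "order"))
```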

Parquet

  • datetimeRebaseMode: How to rebase DATE and TIMESTAMP values between the Julian and Proleptic Gregorian calendars: EXCEPTION, CORRECTED, or LEGACY.
  • int96RebaseMode: How to rebase INT96 timestamps between the Julian and Proleptic Gregorian calendars: EXCEPTION, CORRECTED, or LEGACY.
  • mergeSchema: Merges schema across multiple Parquet files.
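As a rough illustration of mergeSchema, this sketch computes the union of per-file schemas, represented as column-to-type dictionaries, and keeps columns in first-seen order. It rejects any type mismatch as a conflict. That is a simplifying assumption, and the real reader's conflict rules may differ. The function name merge_schemas is hypothetical.

```python
def merge_schemas(schemas):
    """Simplified model of mergeSchema: union of columns across files, first-seen order."""
    merged = {}
    for schema in schemas:
        for column, dtype in schema.items():
            if column in merged and merged[column] != dtype:
                # Assumption: this sketch treats any type mismatch as a conflict.
                raise ValueError(f"conflicting types for {column}: {merged[column]} vs {dtype}")
            merged.setdefault(column, dtype)
    return merged

file_a = {"order_id": "INT", "amount": "DOUBLE"}
file_b = {"order_id": "INT", "region": "STRING"}
print(merge_schemas([file_a, file_b]))
# {'order_id': 'INT', 'amount': 'DOUBLE', 'region': 'STRING'}
```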

Avro

  • avroSchema: Avro schema in JSON string format. Use to enforce a specific schema during reads.
  • datetimeRebaseMode: How to rebase DATE and TIMESTAMP values between the Julian and Proleptic Gregorian calendars: EXCEPTION, CORRECTED, or LEGACY.
  • mergeSchema: Merges schema across multiple Avro files.
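The Parquet and Avro rebase modes exist because some older writers, such as Spark 2.x, stored dates before 1582-10-15 using the hybrid Julian/Gregorian calendar, while current Spark uses the proleptic Gregorian calendar. As a result, the same historical date can map to a different day. The sketch below converts a Julian-calendar date to its proleptic Gregorian equivalent through the Julian Day Number, which shows the size of that shift. It illustrates calendar arithmetic only and is not how the reader implements LEGACY rebasing.

```python
from datetime import date

def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the equivalent proleptic Gregorian date."""
    # Julian Day Number of a Julian-calendar date (standard integer algorithm).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Python date ordinals are proleptic Gregorian; ordinal 1 (0001-01-01) is JDN 1721426.
    return date.fromordinal(jdn - 1721425)

print(julian_to_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
```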

Ingested data format

The schema of the destination table depends on the entity_type and format you configure.

FILE entity type with BINARYFILE format

When entity_type is FILE and format is BINARYFILE, each ingested file becomes one row with the following columns:

file_id (String)
The Google Drive identifier of the file.

file_metadata (Struct)
Contains generic file metadata:
  • name (string): The name of the file, as it appears in Google Drive.
  • size_in_bytes (bigint): The size of the file.
  • created_timestamp (timestamp): The timestamp at which the file was created in Google Drive.
  • last_modified_timestamp (timestamp): The timestamp at which the file was last modified in Google Drive.
  • created_by_email (string): The email address of the user who created the file. May be null if not available.
  • last_modified_by_email (string): The email address of the last user who modified the file. May be null if not available.

_file_metadata (Struct)
Contains Google Drive-specific metadata for the file:
  • drive_id (string): The Google Drive identifier of the shared drive. Null for files in My Drive.
  • file_folder_path (string): The path of the file in Google Drive.
  • mime_type (string): The MIME type of the file.
  • web_url (string): A link to the file in Google Drive.

content (Struct)
Contains the file content.

_metadata (Struct)
Standard file metadata added by Databricks during ingestion, such as the source file path and modification time.
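For readers who process these rows in Python, the sketch below models the two documented metadata structs as dataclasses. Optional marks the fields that may be null. content and _metadata are omitted because their inner fields are not listed here. The helper is_in_my_drive assumes that a null drive_id always means the file is in My Drive. The field values in the example are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class FileMetadata:
    name: str
    size_in_bytes: int
    created_timestamp: datetime
    last_modified_timestamp: datetime
    created_by_email: Optional[str]        # may be null
    last_modified_by_email: Optional[str]  # may be null

@dataclass
class GDriveFileMetadata:
    drive_id: Optional[str]  # null for files in My Drive
    file_folder_path: str
    mime_type: str
    web_url: str

def is_in_my_drive(meta: GDriveFileMetadata) -> bool:
    """Assumption: a null drive_id means the file is in My Drive, not a shared drive."""
    return meta.drive_id is None

# Hypothetical placeholder values for illustration only.
example = GDriveFileMetadata(
    drive_id=None,
    file_folder_path="/Reports/q1.pdf",
    mime_type="application/pdf",
    web_url="https://drive.google.com/file/d/<file_id>/view",
)
print(is_in_my_drive(example))
```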

FILE entity type with a structured format

When entity_type is FILE and format is a structured format (CSV, JSON, XML, EXCEL, PARQUET, AVRO, or ORC), the destination table schema matches the schema of the source files. Columns are inferred from the file contents, subject to the schema_evolution_mode and schema_hints settings.
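To make the interaction between inferred types and schema_hints concrete, here is a simplified model that overrides an inferred column-to-type mapping with the types in a schema_hints string. It splits on commas, so it does not handle types that contain commas, such as DECIMAL(10,2). It does not model schema_evolution_mode. The function name apply_schema_hints is hypothetical.

```python
def apply_schema_hints(inferred: dict, hints: str) -> dict:
    """Override inferred column types with types from a schema_hints string (simplified)."""
    result = dict(inferred)
    for hint in hints.split(","):  # naive split: no support for types containing commas
        column, dtype = hint.strip().split(None, 1)
        result[column] = dtype.strip().upper()
    return result

inferred = {"order_id": "BIGINT", "amount": "STRING", "note": "STRING"}
print(apply_schema_hints(inferred, "order_id INT, amount DOUBLE"))
# {'order_id': 'INT', 'amount': 'DOUBLE', 'note': 'STRING'}
```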

FILE_METADATA entity type

When entity_type is FILE_METADATA, file content is not downloaded. The destination table contains only the metadata columns from the file_metadata and _file_metadata structs described above, plus file_id.