Google Drive connector limitations

This page lists limitations and considerations for ingestion from Google Drive using Databricks Lakeflow Connect.

General SaaS connector limitations

The limitations in this section apply to all SaaS connectors in Lakeflow Connect.

When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This differs from the behavior of Spark Declarative Pipelines on Lakeflow.
During source maintenance periods, Databricks might not be able to access your data.
If a source table name conflicts with an existing destination table name, the pipeline update fails.
Multi-destination pipeline support is API-only.
You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table.
Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
The source system assumes that the cursor columns are monotonically increasing.
The connector ingests raw data without transformations. To transform the data, use downstream pipelines built with Spark Declarative Pipelines on Lakeflow.
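
The same-name limitation above can be checked before you create a pipeline. The following is a minimal Python sketch, not part of Lakeflow Connect: the function name find_duplicate_table_names and the (schema, table) input shape are hypothetical, and the case-insensitive comparison is an assumption about how destination table names are resolved.

```python
from collections import defaultdict


def find_duplicate_table_names(tables):
    """Return {table_name: [schemas]} for names that appear more than once.

    `tables` is an iterable of (source_schema, table_name) pairs.
    Names are compared case-insensitively (an assumption, not documented behavior).
    """
    schemas_by_name = defaultdict(list)
    for schema, table in tables:
        schemas_by_name[table.lower()].append(schema)
    # Keep only names that occur in more than one place.
    return {
        name: schemas
        for name, schemas in schemas_by_name.items()
        if len(schemas) > 1
    }


# Hypothetical selection of tables for a single pipeline.
selection = [
    ("sales", "orders"),
    ("marketing", "orders"),
    ("sales", "customers"),
]
duplicates = find_duplicate_table_names(selection)
```

If the result is not empty, one option is to split the conflicting tables across separate pipelines.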

Connector-specific limitations

The limitations in this section are specific to the Google Drive connector.

During unstructured ingestion with the BINARYFILE format, each file's content is loaded into memory as a single record, so files larger than 100 MB can cause the update to fail (for example, with an out-of-memory error or by exceeding the 2 GB limit on binary columns in Delta). To prevent this, exclude large files by setting a row_filter on the length column in table_configuration. For example, "row_filter": "length <= 104857600" skips files larger than 100 MB (100 × 1024 × 1024 bytes). There is no file size limit for structured file formats.
Unstructured (BINARYFILE) ingestion supports only SCD_TYPE_1 storage mode. Structured ingestion (CSV, JSON, XML, EXCEL, and other formats) supports only APPEND_ONLY storage mode. SCD type 2 is not supported. When configuring storage mode, set storage_mode in table_configuration. Setting the scd_type field throws an error.
Individual file selection is not supported. The connector ingests all files in a configured folder or drive. To narrow which files are ingested, use file_filters with a path_filter glob pattern.
During unstructured (BINARYFILE) ingestion, file deletions are tracked only when ingesting from a shared drive. File deletions are not tracked when ingesting from a folder. File updates are tracked in both cases.
The supported formats are BINARYFILE, CSV, JSON, XML, EXCEL, PARQUET, AVRO, and ORC. Files in unsupported formats (for example, Google Forms and Google Sites) are skipped during ingestion.
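
The storage mode and file size rules above can be combined into a small helper that assembles a table_configuration fragment. This is a minimal Python sketch under stated assumptions: the helper names (required_storage_mode, build_table_configuration) are hypothetical, and only the storage_mode and row_filter keys come from this page. Where the fragment sits in a full pipeline specification is not shown here.

```python
import json

BYTES_PER_MB = 1024 * 1024
STRUCTURED_FORMATS = {"CSV", "JSON", "XML", "EXCEL", "PARQUET", "AVRO", "ORC"}


def required_storage_mode(file_format: str) -> str:
    """Map a file format to the only storage mode this page lists for it."""
    fmt = file_format.upper()
    if fmt == "BINARYFILE":
        return "SCD_TYPE_1"
    if fmt in STRUCTURED_FORMATS:
        return "APPEND_ONLY"
    raise ValueError(f"Unsupported format: {file_format}")


def build_table_configuration(file_format: str, max_file_mb=None) -> dict:
    """Build a table_configuration fragment (hypothetical helper).

    Sets storage_mode (never scd_type) and, for BINARYFILE only, a
    row_filter on the length column to skip files above max_file_mb.
    """
    config = {"storage_mode": required_storage_mode(file_format)}
    # Structured formats have no file size limit, so no filter is added.
    if file_format.upper() == "BINARYFILE" and max_file_mb is not None:
        config["row_filter"] = f"length <= {max_file_mb * BYTES_PER_MB}"
    return config


binary_config = build_table_configuration("BINARYFILE", max_file_mb=100)
csv_config = build_table_configuration("CSV")
print(json.dumps(binary_config, indent=2))
```

Deriving the byte threshold from a megabyte value avoids hand-typing 104857600, and routing every format through one function keeps the storage mode consistent with the rules listed above.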