Lakehouse Federation performance recommendations
This article provides guidance for improving the performance of Lakehouse Federation queries.
Set the JDBC fetchSize parameter
The JDBC fetch size determines the number of rows to fetch per round trip. By default, most JDBC connectors fetch data atomically, which can cause the amount of data retrieved to exceed the available memory.

To avoid out-of-memory errors, set the `fetchSize` parameter. When `fetchSize` is set to a non-zero value, the connector reads data in batches, and each batch contains at most `fetchSize` rows. Databricks recommends specifying a large `fetchSize` value (for example, `100,000`), because overall query execution time can be prolonged if the batches contain too few rows.

This parameter allows the worker nodes to read the data in batches, but not in parallel.
Compute requirements:
- You must use compute on Databricks Runtime 16.1 or above. SQL warehouses must be Pro or Serverless and must use 2024.50.
```sql
SELECT * FROM catalog.schema.jdbcTable WITH ('fetchSize' 100000)
```
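To see why a small `fetchSize` prolongs execution, note that the number of network round trips grows inversely with the batch size. A minimal sketch of that arithmetic (plain Python with illustrative row counts, not connector code):

```python
import math

def round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of network round trips needed to stream total_rows
    when the connector reads at most fetch_size rows per batch."""
    return math.ceil(total_rows / fetch_size)

# Illustrative: streaming 10 million rows from the remote database.
rows = 10_000_000
print(round_trips(rows, 1_000))    # 10000 round trips with a small batch
print(round_trips(rows, 100_000))  # 100 round trips with the recommended size
```

Each round trip adds network latency, so raising `fetchSize` from 1,000 to 100,000 cuts the trip count by two orders of magnitude in this example.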
Set the partition_size_in_mb parameter (Snowflake)
Set the `partition_size_in_mb` parameter for large query result sets that need to be split into multiple DataFrame partitions. This option specifies the recommended uncompressed size for each DataFrame partition. To reduce the number of partitions, specify a larger value. The default is `100` (MB).

`partition_size_in_mb` is used as a recommended size; the actual size of partitions might be smaller or larger.
Compute requirements:
- You must use compute on Databricks Runtime 16.1 or above. SQL warehouses must be Pro or Serverless and must use 2024.50.
```sql
SELECT * FROM catalog.schema.snowflakeTable WITH ('partition_size_in_mb' 1000)
```
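Because `partition_size_in_mb` is only a recommended target, the resulting partition count can be estimated as the uncompressed result size divided by that target. A rough sketch of the estimate (plain Python with illustrative sizes; the connector's actual partitioning may differ):

```python
import math

def estimated_partitions(uncompressed_mb: int, partition_size_in_mb: int = 100) -> int:
    """Rough estimate of DataFrame partitions: the connector targets
    partition_size_in_mb of uncompressed data per partition, so the
    count is about total size / target size. Actual counts vary because
    the value is treated as a recommendation, not a hard limit."""
    return math.ceil(uncompressed_mb / partition_size_in_mb)

result_mb = 50_000  # a 50 GB uncompressed result set (illustrative)
print(estimated_partitions(result_mb))         # 500 partitions at the default 100 MB
print(estimated_partitions(result_mb, 1_000))  # 50 partitions at 1000 MB
```

This illustrates the trade-off: a larger `partition_size_in_mb` yields fewer, bigger partitions, which reduces scheduling overhead but lowers parallelism.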