Reduce files scanned and accelerate performance with predictive IO

Predictive IO is a new suite of capabilities that optimizes the scan and filter portion of a query. By reducing the amount of data that must be scanned, Predictive IO enables the Photon engine to read less data and return results faster.


Predictive IO is supported on serverless and pro SQL warehouses, as well as on Photon-accelerated clusters running Databricks Runtime 11.2 and above.

Predictive IO improves scanning performance by applying deep learning techniques to:

  • Determine the most efficient access pattern to read the data and scan only the data that is actually needed.

  • Eliminate the decoding of columns and rows that are not required to generate query results.

  • Calculate the probability that a row matches the search criteria of a selective query. As queries run, Predictive IO uses these probabilities to anticipate where the next matching row will occur and reads only that data from cloud storage.
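
The idea behind the last item can be illustrated with a toy Python sketch. It is not Databricks' implementation, whose internals are not described here. It only shows, under simplifying assumptions, how a match probability estimated from rows already scanned gives a rough expectation of how far away the next matching row is. The function names, the sample data, and the assumption that rows match independently are all hypothetical.

```python
def estimate_match_probability(sampled_rows, predicate):
    """Estimate P(row matches) from rows that have already been scanned.

    Add-one (Laplace) smoothing keeps the estimate strictly between 0 and 1.
    """
    matches = sum(1 for row in sampled_rows if predicate(row))
    return (matches + 1) / (len(sampled_rows) + 2)


def expected_rows_until_next_match(p):
    """Mean of a geometric distribution with success probability p.

    Assumes (for illustration only) that each row matches independently.
    """
    return 1.0 / p


def is_match(amount):
    """Hypothetical selective filter: keep amounts divisible by 50."""
    return amount % 50 == 0


# Hypothetical column of 1,000 values; sample the first 200 as "already scanned".
rows = list(range(1, 1001))
p = estimate_match_probability(rows[:200], is_match)
gap = expected_rows_until_next_match(p)

print(f"estimated match probability: {p:.4f}")
print(f"expected rows until next match: {gap:.1f}")
```

The lower the estimated probability, the larger the expected gap between matches, which is why selective queries leave the most room to skip data that is unlikely to contain a match.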