Databricks on AWS


Updated May 17, 2022



Optimizations

Databricks provides optimizations for Delta Lake that accelerate data lake operations, supporting a variety of workloads ranging from large-scale ETL processing to ad-hoc, interactive queries. Many of these optimizations take place automatically; you get their benefits simply by using Databricks for your data lakes.

  • Optimize performance with file management
    • Compaction (bin-packing)
    • Data skipping
    • Z-Ordering (multi-dimensional clustering)
    • Tune file size
    • Notebooks
    • Improve interactive query performance
    • Frequently asked questions (FAQ)
  • Auto Optimize
    • How Auto Optimize works
    • Enable Auto Optimize
    • When to opt in and opt out
    • Example workflow: Streaming ingest with concurrent deletes or updates
    • Frequently asked questions (FAQ)
  • Optimize performance with caching
    • Delta and Apache Spark caching
    • Delta cache consistency
    • Use Delta caching
    • Cache a subset of the data
    • Monitor the Delta cache
    • Configure the Delta cache
  • Dynamic file pruning
  • Isolation levels
    • Set the isolation level
  • Bloom filter indexes
    • How Bloom filter indexes work
    • Configuration
    • Create a Bloom filter index
    • Drop a Bloom filter index
    • Display the list of Bloom filter indexes
    • Notebook
  • Low Shuffle Merge
    • Optimized performance
    • Optimized data layout
    • Availability
  • Optimize join performance
    • Range join optimization
    • Skew join optimization
  • Optimized data transformation
    • Higher-order functions
    • Transform complex data types

Additional resources

  • Cost-based optimizer


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
