File events FAQ
Find answers to frequently asked questions about file events for external locations.
What are file events?
File events let Databricks detect new or changed files via cloud notifications instead of repeatedly scanning your storage. File events reduce ingestion latency and cloud storage listing costs and are enabled by default on new external locations.

How do file events work?
When you enable file events with type Automatic, Databricks configures your S3 bucket to send file change notifications to a managed SNS topic (prefixed csms-*). An SQS queue subscribes to that topic, and the Databricks file events service reads file path metadata from the SQS queue to discover new and changed files. As a safety net, the service also performs periodic full directory listings to ensure no files are missed.
The notification infrastructure never transmits file contents.
Which Databricks features use file events?
The following features use file events when you enable them on an external location:
- Auto Loader: Detects new files for incremental ingestion without expensive directory listings. In Databricks Runtime 18.1 and above, Auto Loader automatically uses file events when available (useManagedFileEvents = if_available).
- File arrival triggers: Automatically start your job when new files arrive, providing better resource utilization and cost efficiency because your cluster runs only when there are new files to process. File arrival triggers are significantly more scalable with file events enabled. See Trigger jobs when new files arrive.
- Table update triggers: Automatically start your job based on updates in a table. Table update triggers are significantly more scalable with file events enabled. See Trigger jobs when source tables are updated.
How can I enable file events in my pipelines and jobs?
First, enable file events for your external location. See Set up file events for an external location.
If you use file events with file arrival or table update triggers, you don't need to take additional action. The job automatically benefits from file events.
Also, if you use Auto Loader with Databricks Runtime 18.1 or above, you don't need to take additional action. The pipeline automatically benefits from file events. If you use an earlier runtime version, enable file events on the pipeline:
spark.readStream.option("cloudFiles.useManagedFileEvents", "true")...
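For earlier runtimes, the option above might be set on a complete Auto Loader stream roughly as in the following minimal sketch. The helper names `auto_loader_options` and `read_with_file_events` and the JSON source format are assumptions for illustration; only `cloudFiles.useManagedFileEvents` comes from this page. Schema handling (for example, a schema or schema location) is omitted and would normally be required.

```python
def auto_loader_options(source_format: str, use_file_events: bool = True) -> dict:
    """Build Auto Loader reader options (hypothetical helper, not a Databricks API)."""
    options = {"cloudFiles.format": source_format}
    if use_file_events:
        # Explicit opt-in; per this page, only needed on runtimes earlier than 18.1.
        options["cloudFiles.useManagedFileEvents"] = "true"
    return options


def read_with_file_events(spark, path: str, source_format: str = "json"):
    """Return a streaming DataFrame over `path`.

    `spark` is assumed to be an existing SparkSession on Databricks, and `path`
    is assumed to sit under an external location that has file events enabled.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(source_format))
        .load(path)
    )
```

Keeping the options in a plain dictionary makes it easy to drop the explicit setting once the pipeline moves to a runtime where file events are used automatically.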
What if I'm not using Auto Loader or triggers today?
You can turn off file events at any time, and Databricks cleans up the notification resources for you. However, Databricks recommends keeping file events enabled.
Can I opt out of file events?
Yes. Databricks enables file events by default for new external locations because they reduce costs and improve performance for ingestion workloads, but you can create a location without them.
To create an external location without file events:
Catalog Explorer:
- In Catalog Explorer, begin creating a new external location.
- If the storage credential does not have file events permissions, you see a validation warning. Click Force create to continue.
- After creation, verify that file events are turned off by selecting the location and unchecking the file events setting.
API:
Set enable_file_events to false in the create external location request.
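As a rough illustration of the API option, the sketch below builds (but does not send) a create request with `enable_file_events` set to false. The endpoint path and the `name`, `url`, and `credential_name` fields are assumptions based on the Unity Catalog REST API, and `build_create_request` is a hypothetical helper; confirm the exact request shape in the API reference before using it.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, name: str, url: str,
                         credential_name: str) -> urllib.request.Request:
    """Build a create-external-location request with file events turned off."""
    payload = {
        "name": name,
        "url": url,
        "credential_name": credential_name,
        "enable_file_events": False,  # the setting described on this page
    }
    return urllib.request.Request(
        # Assumed endpoint path; verify against the API reference.
        f"{host}/api/2.1/unity-catalog/external-locations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes the payload easy to inspect first.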
To disable file events on an existing external location, see Set up file events for an external location.
Does Databricks create resources in my AWS account?
Yes. When you enable file events with type Automatic, Databricks creates one SNS topic and one SQS queue (prefixed csms-*) per external location in your AWS account and configures S3 bucket notifications to publish to the SNS topic. You can alternatively bring your own queue using Provided mode. See When should I use Provided mode instead of Automatic?
Does Databricks manage the resources it creates?
Databricks manages subscription renewals and message consumption. The file events service reads from the SQS queue and periodically performs full directory listings to verify no files are missed.
When you turn off file events on an external location, Databricks removes the S3 bucket notification configuration and deletes the associated SNS topic and SQS queue. When you delete an external location with file events enabled, Databricks cleans up the associated notification resources.
To find Databricks-managed file event resources in your AWS account, search for SNS topics and SQS queues with the csms-* prefix.
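The prefix search can also be scripted. The following minimal sketch filters a list of SNS topic ARNs or SQS queue URLs down to those whose resource name starts with `csms-`. The helper names are hypothetical, and the sketch assumes the resource name is the last segment of the ARN or URL.

```python
MANAGED_PREFIX = "csms-"


def resource_name(identifier: str) -> str:
    """Return the trailing name of an SNS/SQS ARN or an SQS queue URL."""
    # ARNs separate segments with ":", queue URLs with "/"; normalize, then take the last one.
    return identifier.replace("/", ":").rsplit(":", 1)[-1]


def managed_resources(identifiers):
    """Keep only identifiers whose resource name carries the csms- prefix."""
    return [i for i in identifiers if resource_name(i).startswith(MANAGED_PREFIX)]
```

The input list could come from the AWS SDK, for example the `TopicArn` values returned by boto3's `sns.list_topics()` and the queue URLs returned by `sqs.list_queues(QueueNamePrefix="csms-")`. Both calls return results in pages, so a complete inventory needs to follow the pagination tokens.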
How does Databricks get the permissions to create cloud resources and read and delete messages from the queue?
Databricks uses the permissions granted in the storage credential associated with the external location on which file events are enabled.
How many SNS topics and SQS queues are created?
One SNS topic and one SQS queue are created per external location. If you have multiple external locations on the same bucket, each gets its own pair of resources.
What data flows through SNS and SQS?
File event notifications contain standard S3 event notification fields, including the S3 object key (file path), event type (for example, s3:ObjectCreated:Put), timestamp, bucket name, and region. For the full schema, see Amazon S3 notification content structure.
File contents are never transmitted through the SNS topic or SQS queue.
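To make the metadata concrete, the sketch below extracts the fields listed above from an SQS message body. It illustrates the standard S3 notification structure and is not the Databricks file events service's implementation. The function name `extract_file_events` is hypothetical, and the code assumes the default SNS delivery format, in which the S3 event is a JSON string inside the envelope's `Message` field.

```python
import json
from urllib.parse import unquote_plus


def extract_file_events(sqs_body: str) -> list:
    """Return the path metadata carried by one SQS message body."""
    envelope = json.loads(sqs_body)
    # With raw message delivery, the body is the S3 event itself rather than an SNS envelope.
    s3_event = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    events = []
    for record in s3_event.get("Records", []):
        events.append({
            "event": record["eventName"],
            "time": record["eventTime"],
            "region": record["awsRegion"],
            "bucket": record["s3"]["bucket"]["name"],
            # Object keys are URL-encoded in S3 notifications.
            "key": unquote_plus(record["s3"]["object"]["key"]),
        })
    return events
```

Every field returned here describes the object. Nothing in the message carries the object's bytes.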
What does this cost?
Enabling file events creates an SNS topic and SQS queue in your AWS account. These resources incur standard AWS messaging charges based on your file activity (creates, updates, deletes), not on the total amount of data stored.
For most workloads, the incremental cost is a small fraction of what you already pay for S3 storage on the same location. You can estimate your costs using the standard AWS SNS pricing and AWS SQS pricing pages, based on the number of file change events your buckets generate.
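A back-of-the-envelope estimate could look like the following sketch. The cost model is an assumption: one SNS publish per file event plus a configurable number of SQS requests per event (for example, a receive and a delete), ignoring free tiers, batching, and data transfer. Prices are inputs that you take from the AWS pricing pages; none are built in.

```python
def estimate_monthly_cost(events_per_month: int,
                          sns_price_per_million: float,
                          sqs_price_per_million: float,
                          sqs_requests_per_event: float = 2.0) -> float:
    """Estimate monthly messaging cost under the simplified model described above."""
    millions = events_per_month / 1_000_000
    sns_cost = millions * sns_price_per_million
    # Assumed: each event triggers `sqs_requests_per_event` billable SQS requests.
    sqs_cost = millions * sqs_requests_per_event * sqs_price_per_million
    return sns_cost + sqs_cost
```

For example, with 10 million file events per month and placeholder prices of 0.50 and 0.25 per million requests (illustrative values, not actual AWS prices), `estimate_monthly_cost(10_000_000, 0.50, 0.25)` returns 10.0.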
Can file paths contain PII or sensitive data?
That depends on how your organization names files. The S3 object key (path) is included in every event notification. If your file naming conventions embed PII or sensitive identifiers in paths, those values flow through the SNS topic and SQS queue. However, these are resources in your own AWS account, and Databricks already has read access to those same file paths through the storage credential. External locations with file events do not give Databricks more access to your data than external locations without file events.
What encryption and security controls are in place?
- Encryption at rest: Managed SNS topics and SQS queues use AWS-managed encryption at rest.
- Encryption in transit: All communication between S3, SNS, SQS, and the Databricks service uses TLS.
- Scope: Managed resources are scoped to the csms-* namespace and accessible only through the IAM role in the storage credential.
- Audit: Databricks-managed resources use the csms-* naming prefix. You can use AWS CloudTrail to monitor all API calls to these resources.
Considerations for regulated environments
Organizations with strict cloud security policies should consider the following:
- The file events IAM policy is scoped to resources prefixed with csms-*. It does not grant access to any existing SNS topics, SQS queues, or other resources outside this namespace.
- If your policies prohibit third-party services from creating AWS resources, use Provided mode and supply your own SQS queue ARN.
- Notifications contain only S3 event metadata (such as object keys, event types, timestamps, bucket and region identifiers). No file contents flow through the infrastructure.
- Use AWS CloudTrail to monitor all API calls to csms-* resources.
- Managed resources use AWS-managed encryption at rest. All communication uses TLS in transit.
When should I use Provided mode instead of Automatic?
Automatic is recommended for most customers. Provided mode, where you create and manage the queue yourself, is available for organizations whose policies prohibit third-party resource creation. The setup for Provided mode is complex and self-service; Databricks does not provide support for provisioning those resources.
If your organization needs to restrict resource creation, consider using Provided mode. For setup instructions, see Set up file events for an external location.