How does the lakehouse improve data discovery and collaboration?

Databricks designed Unity Catalog to help organizations reduce time to insights by empowering a broader set of data users to discover and analyze data at scale. Data stewards can securely grant access to data assets for diverse teams of end users in Unity Catalog. These users can then use a variety of languages and tools, including SQL and Python, to create derivative datasets, models, and dashboards that can be shared across teams.

Manage permissions at scale

Unity Catalog gives administrators a single place to assign permissions on catalogs, databases (also called schemas), tables, and views to groups of users. Privileges and metastores are shared across workspaces, so administrators can set secure permissions once against groups synced from identity providers and be confident that end users have access only to the data they are entitled to in any Databricks workspace they enter.
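To make the "grant once against a group" idea concrete, here is a minimal sketch that builds the SQL statements an administrator might run to give a group read access to one table. The catalog, schema, table, and group names are hypothetical, and the helper functions are illustrative only (they format strings and are not part of any Databricks API). The privilege names follow Unity Catalog's GRANT syntax as commonly documented; verify them against the reference for your release.

```python
from __future__ import annotations


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_read_access(catalog: str, schema: str, table: str, group: str) -> list[str]:
    """Build the GRANT statements a group needs to query a single table.

    Reading a table requires access to its parent catalog and schema
    as well as SELECT on the table itself.
    """
    principal = quote(group)
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {principal}",
        f"GRANT SELECT ON TABLE {tbl} TO {principal}",
    ]


# Hypothetical names, used only for illustration.
for statement in grant_read_access("main", "sales", "orders", "data-analysts"):
    print(statement)
```

In a notebook, each generated statement could be passed to `spark.sql(...)`, assuming the caller holds sufficient privileges on the objects involved. Because the grant targets a group rather than individual users, membership changes in the identity provider take effect without editing the grants.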

Unity Catalog also allows administrators to define storage credentials, a secure way to store and share permissions on cloud storage infrastructure. You can grant privileges on these securables to power users within the organization so they can define external locations against cloud object storage locations. Data engineers can then self-serve new workloads without being given elevated permissions in cloud account consoles.
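The delegation described above can be sketched as two statements: one run by an administrator, one run later by the power user. This is an illustration under stated assumptions: the credential, location, group, and bucket names are hypothetical, the helper only formats strings, and the SQL follows the CREATE EXTERNAL LOCATION and GRANT forms as commonly documented for Unity Catalog.

```python
from __future__ import annotations


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def delegate_external_location(
    credential: str, group: str, location: str, url: str
) -> dict[str, str]:
    """Return the admin-side grant and the user-side location definition."""
    if "'" in url:
        # Keep the sketch simple: refuse URLs that would break the string literal.
        raise ValueError("url must not contain a single quote")
    return {
        # Run by an administrator: let the group build on the credential.
        "admin": (
            f"GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL "
            f"{quote(credential)} TO {quote(group)}"
        ),
        # Run by a member of the group: define a location using that credential.
        "power_user": (
            f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote(location)} "
            f"URL '{url}' WITH (STORAGE CREDENTIAL {quote(credential)})"
        ),
    }


# Hypothetical names, used only for illustration.
plan = delegate_external_location(
    credential="lake_cred",
    group="data-engineers",
    location="landing_zone",
    url="s3://example-bucket/landing",
)
for role, statement in plan.items():
    print(f"{role}: {statement}")
```

The point of the split is that the cloud-side permissions live in the storage credential, so the second statement needs no access to the cloud account console.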

Discover data on Databricks

Users can browse available data objects in Unity Catalog using Data Explorer. Data Explorer applies the privileges configured by Unity Catalog administrators, so users see only the catalogs, databases, tables, and views that they have permission to query. Once users find a dataset of interest, they can review field names and types, read comments on tables and individual fields, and preview a sample of the data. Users can also review the full history of a table to understand when and how its data has changed, and the lineage feature lets them trace how a dataset is derived from upstream jobs and used in downstream jobs.
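Data Explorer is a graphical interface, but much of the same information can also be requested in SQL. The sketch below maps each discovery step to a roughly equivalent statement; it is an approximation, not a description of how Data Explorer works internally. The table name is hypothetical, the helper only formats strings, and DESCRIBE HISTORY assumes the table is a Delta table.

```python
from __future__ import annotations


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def discovery_queries(
    catalog: str, schema: str, table: str, sample_rows: int = 10
) -> dict[str, str]:
    """Map each discovery step to a SQL statement for one table."""
    if sample_rows < 1:
        raise ValueError("sample_rows must be at least 1")
    name = ".".join(quote(part) for part in (catalog, schema, table))
    return {
        # Field names, types, and comments, plus table-level details.
        "schema_and_comments": f"DESCRIBE TABLE EXTENDED {name}",
        # A small preview of the data.
        "sample": f"SELECT * FROM {name} LIMIT {sample_rows}",
        # When and how the table changed (Delta tables).
        "history": f"DESCRIBE HISTORY {name}",
        # Which principals hold which privileges on the table.
        "grants": f"SHOW GRANTS ON TABLE {name}",
    }


# Hypothetical table name, used only for illustration.
for step, statement in discovery_queries("main", "sales", "orders", 5).items():
    print(f"{step}: {statement}")
```

As in Data Explorer, each statement succeeds only if the caller has the privileges needed on the table, so the results reflect what that user is allowed to see.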

Storage credentials and external locations are also displayed in Data Explorer, so each user can see exactly which privileges they have to read and write data across the available locations and resources.

Accelerate time to production with the lakehouse

Databricks supports workloads in SQL, Python, Scala, and R, so users with diverse skill sets and technical backgrounds can apply what they already know to derive analytic insights. You can define production jobs in any language that Databricks supports, and a single notebook can combine several languages. As a result, queries that SQL analysts write for last-mile ETL can be promoted into production data engineering code with little rework. Queries and workloads defined by personas across the organization use the same datasets, so there’s no need to reconcile field names or confirm that dashboards are up to date before sharing code and results with other teams. You can securely share code, notebooks, queries, and dashboards, all powered by the same scalable cloud infrastructure and defined against the same curated data sources.