This article describes tools and techniques for exploratory data analysis (EDA) on Databricks.
Exploratory data analysis (EDA) includes methods for exploring data sets to summarize their main characteristics and identify problems with the data. Using statistical methods and visualizations, you can assess whether a data set is ready for analysis and decide which data preparation techniques to apply. EDA can also influence which algorithms you choose for training ML models.
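As a minimal sketch of what "summarize main characteristics and identify problems" can look like in practice, the following example uses pandas on a small, made-up data set. The column names and values are hypothetical and chosen only for illustration; this is one common approach, not a prescribed Databricks workflow.

```python
import pandas as pd

# Hypothetical toy data set; column names and values are invented for illustration.
df = pd.DataFrame(
    {
        "age": [34, 45, None, 29, 45],
        "income": [52000.0, 61000.0, 58000.0, None, 61000.0],
        "segment": ["a", "b", "b", "a", "b"],
    }
)

# Summarize the main characteristics of the numeric columns
# (count, mean, standard deviation, min, quartiles, max).
summary = df.describe()

# Identify problems: missing values per column and fully duplicated rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

print(summary)
print(missing_per_column)
print(f"Duplicated rows: {duplicate_rows}")
```

Findings like these (missing values in `age` and `income`, one repeated row) are the kind of signal that can guide data preparation choices such as imputation or deduplication.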
Databricks has built-in analysis and visualization tools for working with data.
The Databricks Runtime and Databricks Runtime ML provide pre-built environments that have popular data exploration libraries already installed. You can see the list of the built-in libraries in the release notes.
In addition, the following articles show examples of visualization tools in Databricks: