You can use a query profile to visualize the details of a query execution. The query profile helps you troubleshoot performance bottlenecks during the query’s execution. For example:
- You can visualize each query task and its related metrics, such as the time spent, number of rows processed, rows processed, and memory consumption.
- You can identify the slowest part of a query execution at a glance and assess the impacts of modifications to the query.
- You can discover and fix common mistakes in SQL statements, such as exploding joins or full table scans.
To view a query profile, you must either be the owner of the query or you must have the Can Manage permission on the SQL endpoint that executed the query.
To view the profile for a query:
View query history.
Click the name of a query.
Click View Execution Details. An overview of basic query metrics appears.
If View Execution Details is disabled, no profile is available for this query. A query profile is not available for queries that run from the query cache. To circumvent the query cache, make a trivial change to the query, such as changing or removing the
Click View full execution details at the bottom of the page to open the profile.
Click one of the tabs at the top of the page to view details about each of the query’s tasks. If a task has sub-tasks, you can click > to expand it.
- Duration: The sum of execution time spent by all tasks for each operation.
- Rows: The number and size of the rows affected by each of the query’s tasks.
- Memory: The peak memory each of the query’s tasks consumed.
Some non-Photon operations are executed as a group and share common metrics. In this case, all subtasks have the same value as the parent task for a given metric.
Each task’s operation is shown. By default, tasks and metrics for some operations are hidden. These operations are unlikely to be the cause of performance bottlenecks. To see information for all operations, and to see additional metrics, disable the Simple toggle at the top of the page to enable the advanced view. The most common operations are:
- Scan: Data was read from a datasource and output as rows.
- Join: Rows from multiple relations were combined (interleaved) into a single set of rows.
- Union: Rows from multiple relations that use the same schema were concatenated into a single set of rows.
- Shuffle: Data was redistributed or repartitioned. Shuffle operations are expensive with regard to resources because they move data between executors on the cluster.
- Hash / Sort: Rows were grouped by a key and evaluated using an aggregate function such as
MAXwithin each group.
- Filter: Input is filtered according to a criteria, such as by a
WHEREclause, and a subset of rows is returned.
- (Reused) Exchange: A Shuffle or Broadcast Exchange is used to redistribute the data among the cluster nodes based on the desired partitioning.
- Collect Limit: The number of rows returned was truncated by using a
- Take Ordered And Project: The top N rows of the query result were returned.
To view the query profile in the Apache Spark UI, click Open next to Spark UI.
For more details about the information available in the query profile, see View details about the query profile.
The query profile lists the query’s top-level tasks in reverse order, with the last task listed first. On the left, three columns show the task sequence, the name of the operation, and a graph of the selected metric for that task. Follow these steps to familiarize yourself with the different parts of the query profile.
- Click Duration to see the duration of each subtask.
- Click Rows to see the number and size of rows returned by the query.
- Click Memory to see the memory consumed by each query task. If the task has subtasks, you can click > to see details about each subtask.
- On the right, click Overview to see the query’s SQL statement, status, start and end times, duration, the user who ran the query, and the endpoint where the query was executed.
- Click Execution Details for more detailed metrics about the overall query.
- Click a task to view details about the task, such as the task’s description and metrics about the task’s duration, memory consumed, number and size of rows returned, and lineage.
- To close subtask details, click X.
- Click the name of the SQL endpoint to go to that endpoint’s properties.
- To open the query plan in the Apache Spark web UI, click Open next to Spark UI.