Query profile

You can use a query profile to visualize the details of a query execution. The query profile helps you troubleshoot performance bottlenecks during the query’s execution. For example:

  • You can visualize each query task and its related metrics, such as the time spent, number of rows processed, rows processed, and memory consumption.
  • You can identify the slowest part of a query execution at a glance and assess the impacts of modifications to the query.
  • You can discover and fix common mistakes in SQL statements, such as exploding joins or full table scans.

Requirements

To view a query profile, you must either be the owner of the query or you must have the Can Manage permission on the SQL endpoint that executed the query.

View a query profile

To view the profile for a query:

  1. View query history.

  2. Click the name of a query.

  3. Click View Execution Details. An overview of basic query metrics appears.

    Note

    If View Execution Details is disabled, no profile is available for this query. A query profile is not available for queries that run from the query cache. To circumvent the query cache, make a trivial change to the query, such as changing or removing the LIMIT.

  4. Click View full execution details at the bottom of the page to open the profile.

    Query execution summary simple view
  5. Click one of the tabs at the top of the page to view details about each of the query’s tasks. If a task has sub-tasks, you can click > to expand it.

    • Duration: The sum of execution time spent by all tasks for each operation.
    • Rows: The number and size of the rows affected by each of the query’s tasks.
    • Memory: The peak memory each of the query’s tasks consumed.

    Note

    Some non-Photon operations are executed as a group and share common metrics. In this case, all subtasks have the same value as the parent task for a given metric.

  6. Each task’s operation is shown. By default, tasks and metrics for some operations are hidden. These operations are unlikely to be the cause of performance bottlenecks. To see information for all operations, and to see additional metrics, disable the Simple toggle at the top of the page to enable the advanced view. The most common operations are:

    • Scan: Data was read from a datasource and output as rows.
    • Join: Rows from multiple relations were combined (interleaved) into a single set of rows.
    • Union: Rows from multiple relations that use the same schema were concatenated into a single set of rows.
    • Shuffle: Data was redistributed or repartitioned. Shuffle operations are expensive with regard to resources because they move data between executors on the cluster.
    • Hash / Sort: Rows were grouped by a key and evaluated using an aggregate function such as SUM, COUNT, or MAX within each group.
    • Filter: Input is filtered according to a criteria, such as by a WHERE clause, and a subset of rows is returned.
    • (Reused) Exchange: A Shuffle or Broadcast Exchange is used to redistribute the data among the cluster nodes based on the desired partitioning.
    • Collect Limit: The number of rows returned was truncated by using a LIMIT statement.
    • Take Ordered And Project: The top N rows of the query result were returned.

Tip

To view the query profile in the Apache Spark UI, click Open next to Spark UI.

For more details about the information available in the query profile, see View details about the query profile.

View details about the query profile

The query profile lists the query’s top-level tasks in reverse order, with the last task listed first. On the left, three columns show the task sequence, the name of the operation, and a graph of the selected metric for that task. Follow these steps to familiarize yourself with the different parts of the query profile.

  1. Click Duration to see the duration of each subtask.
  2. Click Rows to see the number and size of rows returned by the query.
  3. Click Memory to see the memory consumed by each query task. If the task has subtasks, you can click > to see details about each subtask.
  4. On the right, click Overview to see the query’s SQL statement, status, start and end times, duration, the user who ran the query, and the endpoint where the query was executed.
  5. Click Execution Details for more detailed metrics about the overall query.
  6. Click a task to view details about the task, such as the task’s description and metrics about the task’s duration, memory consumed, number and size of rows returned, and lineage.
  7. To close subtask details, click X.
  8. Click the name of the SQL endpoint to go to that endpoint’s properties.
  9. To open the query plan in the Apache Spark web UI, click Open next to Spark UI.

Share a query profile

To share a query profile with another user:

  1. View query history.
  2. Click the name of the query.
  3. To share the query, you have two choices:
    • If the other user has the Can Manage permission on the query, you can share the URL for the query profile with them. Either copy the URL from your browser or click Share and then click Copy URL.
    • Otherwise, if the other user does not have the Can Manage permission or is not a member of the workspace, you can download the query profile as a JSON object. Click Share and then click Download query details (JSON). The JSON file is downloaded to your local system.

To import the JSON for a query profile:

  1. View query history.

  2. Click the name of any query for which you have the Can Manage permission and that did not run from the query cache.

    Note

    If you don’t have a query that meets this criteria, you can executed a simple query like select 1;.

  3. Click View Execution Details.

  4. Click View full execution details at the bottom of the page to open the profile.

  5. Click the vertical ellipsis Vertical Ellipsis at the top-right of the page and select Import.

  6. In the file browser, select the JSON file that was shared with you and click Open. The JSON file is uploaded and the query profile you previously opened is replaced with the one you imported.

    When you import a query profile, it is dynamically loaded into your browser session and does not persist in your workspace. You need to re-import it each time you want to view it.

Next steps