SELECT

Composes a result set from one or more tables. The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.

Syntax

SELECT [ hints ] [ ALL | DISTINCT ] { named_expression | star_clause } [, ...]
  FROM from_item [, ...]
  [ LATERAL VIEW clause ]
  [ PIVOT clause ]
  [ WHERE clause ]
  [ GROUP BY clause ]
  [ HAVING clause]
  [ QUALIFY clause ]

from_item
{ table_name [ table_alias ] |
  JOIN clause |
  [ LATERAL ] table_valued_function [ table_alias ] |
  VALUES clause |
  [ LATERAL ] ( query ) [ table_alias ] }

named_expression
   expression [ column_alias ]

star_clause
   [ { table_name | view_name } . ] *

Parameters

  • hints

    Hints help the Databricks Runtime optimizer make better planning decisions. Databricks Runtime supports hints that influence selection of join strategies and repartitioning of the data.

  • ALL

    Select all matching rows from the relation. Enabled by default.

  • DISTINCT

    Select all matching rows from the relation after removing duplicates in results.

  • named_expression

    An expression with an optional assigned name.

    • expression

      A combination of one or more values, operators, and SQL functions that evaluates to a value.

    • column_alias

      An optional column identifier naming the expression result. If no column_alias is provided Databricks Runtime derives one.

  • star_clause

    A short hand to name all the referencable column in the FROM clause. The list of columns is ordered by the order of from_items and the order of columns within each from_item.

    • table_name

      If present limits the columns to be named to those in the specified referencable table.

    • view_name

      If specified limits the columns to be expanded to those in the specified referencable view.

  • from_item

    A source of input for the SELECT. One of the following:

    • table_name

      Identifies a table that may contain a temporal specification. See Query an older snapshot of a table (time travel) for details.

    • view_name

      Identifies a view.

    • JOIN

      Combines two or more relations using a join.

    • [ LATERAL ] [table_valued_function](sql-ref-syntax-qry-select-tvf.md)

      Invokes a table function. To refer to columns exposed by a preceding from_item in the same FROM clause you must specify LATERAL.

    • VALUES

      Defines an inline table.

    • [ LATERAL ] ( query )

      Computes a relation using a query. A query prefixed by LATERAL may reference columns exposed by a preceding from_item in the same FROM clause. Such a construct is called a correlated or dependent query. LATERAL is supported since Databricks Runtime 9.0.

  • table_alias

    Optionally specifies a label for the from_item. If the table_alias includes column_identifiers their number must match the number of columns in the from_item.

  • PIVOT

    Used for data perspective; you can get the aggregated values based on specific column value.

  • LATERAL VIEW

    Used in conjunction with generator functions such as EXPLODE, which generates a virtual table containing one or more rows. LATERAL VIEW applies the rows to each original output row.

  • WHERE

    Filters the result of the FROM clause based on the supplied predicates.

  • GROUP BY

    The expressions that are used to group the rows. This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions and aggregate values in each group. When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.

  • HAVING

    The predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to filter rows after the grouping is performed. If you specify HAVING without GROUP BY, it indicates a GROUP BY without grouping expressions (global aggregate).

  • QUALIFY

    The predicates that are used to filter the results of window functions. To use QUALIFY, at least one window function is required to be present in the SELECT list or the QUALIFY clause.

Select on Delta table

In addition to the standard SELECT options, Delta tables support the time travel options described in this section. For details, see Query an older snapshot of a table (time travel).

AS OF syntax

table_identifier TIMESTAMP AS OF timestamp_expression

table_identifier VERSION AS OF version

where

  • timestamp_expression can be any one of:

    • '2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp

    • cast('2018-10-18 13:36:32 CEST' as timestamp)

    • '2018-10-18', that is, a date string

    • In Databricks Runtime 6.6 and above:

      • current_timestamp() - interval 12 hours

      • date_sub(current_date(), 1)

      • Any other expression that is or can be cast to a timestamp

  • version is a long value that can be obtained from the output of DESCRIBE HISTORY table_spec.

Neither timestamp_expression nor version can be subqueries.

Example

> SELECT * FROM events TIMESTAMP AS OF '2018-10-18T22:15:12.013Z'

> SELECT * FROM delta.`/mnt/delta/events` VERSION AS OF 123

@ syntax

Use the @ syntax to specify the timestamp or version. The timestamp must be in yyyyMMddHHmmssSSS format. You can specify a version after @ by prepending a v to the version. For example, to query version 123 for the table events, specify events@v123.

Example

> SELECT * FROM events@20190101000000000

> SELECT * FROM events@v123