SELECT

Composes a result set from one or more tables. The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.

Syntax

SELECT [ hints ] [ ALL | DISTINCT ] { named_expression | star_clause } [, ...]
  FROM from_item [, ...]
  [ LATERAL VIEW clause ]
  [ PIVOT clause ]
  [ WHERE clause ]
  [ GROUP BY clause ]
  [ HAVING clause]
  [ QUALIFY clause ]

from_item
{ table_name [ TABLESAMPLE clause ] [ table_alias ] |
  JOIN clause |
  [ LATERAL ] table_valued_function [ table_alias ] |
  VALUES clause |
  [ LATERAL ] ( query ) [ TABLESAMPLE clause ] [ table_alias ] }

named_expression
   expression [ column_alias ]

star_clause
   [ { table_name | view_name } . ] * [ except_clause ]

except_clause
   EXCEPT ( { column_name | field_name } [, ...] )

Parameters

  • hints

    Hints help the Databricks Runtime optimizer make better planning decisions. Databricks Runtime supports hints that influence selection of join strategies and repartitioning of the data.

  • ALL

    Select all matching rows from the relation. Enabled by default.

  • DISTINCT

    Select all matching rows from the relation after removing duplicates in results.

  • named_expression

    An expression with an optional assigned name.

    • expression

      A combination of one or more values, operators, and SQL functions that evaluates to a value.

    • column_alias

      An optional column identifier naming the expression result. If no column_alias is provided Databricks Runtime derives one.

  • star_clause

    A shorthand to name all the referencable columns in the FROM clause. The list of columns is ordered by the order of from_items and the order of columns within each from_item.

    The _metadata column is not included this list. You must reference it explicitly.

    • table_name

      If present limits the columns to be named to those in the specified referencable table.

    • view_name

      If specified limits the columns to be expanded to those in the specified referencable view.

  • except_clause

    Since: Databricks Runtime 11.0

    Optionally prunes columns or fields from the referencable set of columns identified in the select_star clause.

    • column_name

      A column that is part of the set of columns that you can reference.

    • field_name

      A reference to a field in a column of the set of columns that you can reference. If you exclude all fields from a STRUCT, the result is an empty STRUCT.

    Each name must reference a column included in the set of columns that you can reference or their fields. Otherwise, Databricks Runtime raises a UNRESOLVED_COLUMN error. If names overlap or are not unique, Databricks Runtime raises an EXCEPT_OVERLAPPING_COLUMNS error.

  • from_item

    A source of input for the SELECT. One of the following:

    • table_name

      Identifies a table that may contain a temporal specification. See Query an older snapshot of a table (time travel) for details.

    • view_name

      Identifies a view.

    • JOIN

      Combines two or more relations using a join.

    • [LATERAL] table_valued_function

      Invokes a table function. To refer to columns exposed by a preceding from_item in the same FROM clause you must specify LATERAL.

    • VALUES

      Defines an inline table.

    • [LATERAL] ( query )

      Computes a relation using a query. A query prefixed by LATERAL may reference columns exposed by a preceding from_item in the same FROM clause. Such a construct is called a correlated or dependent query.

      LATERAL is supported since Databricks Runtime 9.0.

    • TABLESAMPLE

      Optionally reduce the size of the result set by only sampling a fraction of the rows.

    • table_alias

      Optionally specifies a label for the from_item. If the table_alias includes column_identifiers their number must match the number of columns in the from_item.

  • PIVOT

    Used for data perspective; you can get the aggregated values based on specific column value.

  • LATERAL VIEW

    Used in conjunction with generator functions such as EXPLODE, which generates a virtual table containing one or more rows. LATERAL VIEW applies the rows to each original output row.

  • WHERE

    Filters the result of the FROM clause based on the supplied predicates.

  • GROUP BY

    The expressions that are used to group the rows. This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions and aggregate values in each group. When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.

  • HAVING

    The predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to filter rows after the grouping is performed. If you specify HAVING without GROUP BY, it indicates a GROUP BY without grouping expressions (global aggregate).

  • QUALIFY

    The predicates that are used to filter the results of window functions. To use QUALIFY, at least one window function is required to be present in the SELECT list or the QUALIFY clause.

Select on Delta table

In addition to the standard SELECT options, Delta tables support the time travel options described in this section. For details, see Query an older snapshot of a table (time travel).

AS OF syntax

table_identifier TIMESTAMP AS OF timestamp_expression

table_identifier VERSION AS OF version
  • timestamp_expression can be any one of:

    • '2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp

    • cast('2018-10-18 13:36:32 CEST' as timestamp)

    • '2018-10-18', that is, a date string

    • current_timestamp() - interval 12 hours

    • date_sub(current_date(), 1)

    • Any other expression that is or can be cast to a timestamp

  • version is a long value that can be obtained from the output of DESCRIBE HISTORY table_spec.

Neither timestamp_expression nor version can be subqueries.

Example

> SELECT * FROM events TIMESTAMP AS OF '2018-10-18T22:15:12.013Z'

> SELECT * FROM delta.`/mnt/delta/events` VERSION AS OF 123

@ syntax

Use the @ syntax to specify the timestamp or version. The timestamp must be in yyyyMMddHHmmssSSS format. You can specify a version after @ by prepending a v to the version. For example, to query version 123 for the table events, specify events@v123.

Example

> SELECT * FROM events@20190101000000000

> SELECT * FROM events@v123

Examples

-- select all referencable columns from all tables
> SELECT * FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  1   2   3   4

-- select all referencable columns from one table
> SELECT t2.* FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  3   4

-- select all referencable columns from all tables except t2.c4
> SELECT * EXCEPT(c4) FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  1   2   3

-- select all referencable columns from a table, except a nested field.
> SELECT * EXCEPT(c2.b) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  1  { "a" : 2 }

-- Removing all fields results in an empty struct
> SELECT * EXCEPT(c2.b, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  1  { }

-- Overlapping names result in an error
> SELECT * EXCEPT(c2, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  Error: EXCEPT_OVERLAPPING_COLUMNS