ANALYZE TABLE
Applies to: Databricks SQL
Databricks Runtime
The ANALYZE TABLE
statement collects statistics about one specific table or all the tables in one specified schema,
that are to be used by the query optimizer to find a better query execution plan.
Syntax
ANALYZE TABLE table_name [ PARTITION clause ]
COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col1 [, ...] | FOR ALL COLUMNS ]
ANALYZE TABLES [ { FROM | IN } schema_name ] COMPUTE STATISTICS [ NOSCAN ]
Parameters
-
Identifies the table to be analyzed. The name must not include a temporal specification or path. If the table cannot be found Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.
-
Optionally limits the command to a subset of partitions.
This clause is not supported for Delta Lake tables.
[ NOSCAN | FOR COLUMNS col [, …] | FOR ALL COLUMNS ]
If no analyze option is specified,
ANALYZE TABLE
collects the table’s number of rows and size in bytes.NOSCAN
Collect only the table’s size in bytes ( which does not require scanning the entire table ).
FOR COLUMNS col [, …] | FOR ALL COLUMNS
Collect column statistics for each column specified, or alternatively for every column, as well as table statistics.
Column statistics are not supported in combination with the
PARTITION
clause.
{ FROM
|
IN } schema_nameSpecifies the name of the schema to be analyzed. Without a schema name,
ANALYZE TABLES
collects all tables in the current schema that the current user has permission to analyze.
Examples
> CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id);
> INSERT INTO students PARTITION (student_id = 111111) VALUES ('Mark');
> INSERT INTO students PARTITION (student_id = 222222) VALUES ('John');
> ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;
> DESC EXTENDED students;
col_name data_type comment
-------------------- -------------------- -------
name string null
student_id int null
... ... ...
Statistics 864 bytes
... ... ...
> ANALYZE TABLE students COMPUTE STATISTICS;
> DESC EXTENDED students;
col_name data_type comment
-------------------- -------------------- -------
name string null
student_id int null
... ... ...
Statistics 864 bytes, 2 rows
... ... ...
> ANALYZE TABLE students PARTITION (student_id = 111111) COMPUTE STATISTICS;
> DESC EXTENDED students PARTITION (student_id = 111111);
col_name data_type comment
-------------------- -------------------- -------
name string null
student_id int null
... ... ...
Partition Statistics 432 bytes, 1 rows
... ... ...
OutputFormat org.apache.hadoop...
> ANALYZE TABLE students COMPUTE STATISTICS FOR COLUMNS name;
> DESC EXTENDED students name;
info_name info_value
-------------- ----------
col_name name
data_type string
comment NULL
min NULL
max NULL
num_nulls 0
distinct_count 2
avg_col_len 4
max_col_len 4
histogram NULL
> ANALYZE TABLES IN school_schema COMPUTE STATISTICS NOSCAN;
> DESC EXTENDED teachers;
col_name data_type comment
-------------------- -------------------- -------
name string null
teacher_id int null
... ... ...
Statistics 1382 bytes
... ... ...
> DESC EXTENDED students;
col_name data_type comment
-------------------- -------------------- -------
name string null
student_id int null
... ... ...
Statistics 864 bytes
... ... ...
> ANALYZE TABLES COMPUTE STATISTICS;
> DESC EXTENDED teachers;
col_name data_type comment
-------------------- -------------------- -------
name string null
teacher_id int null
... ... ...
Statistics 1382 bytes, 2 rows
... ... ...
> DESC EXTENDED students;
col_name data_type comment
-------------------- -------------------- -------
name string null
student_id int null
... ... ...
Statistics 864 bytes, 2 rows
... ... ...