Databricks Runtime HLS includes a variety of tools for variant quality control.
This topic uses the terms “variant” and “variant data” to refer to single nucleotide variants and short indels.
You can calculate quality control statistics on your variant data using Spark SQL functions, which can be expressed in Python, R, Scala, or SQL.
||A struct with two elements: the expected heterozygous frequency according to Hardy-Weinberg equilibrium and the associated p-value.|
A struct containing the following summary statistics:
||A struct containing the min, max, mean, and sample standard deviation for genotype depth (DP in the VCF v4.2 specification) across all samples.|
||A struct containing the min, max, mean, and sample standard deviation for genotype quality (GQ in the VCF v4.2 specification) across all samples.|
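To make the quantities described above concrete, here is a minimal plain-Python sketch of what they mean. It is not the runtime's implementation: the function names `hardy_weinberg_sketch` and `summary_stats_sketch` are hypothetical, the inputs are assumed to be simple genotype counts and a list of per-sample values, and the p-value comes from a one-degree-of-freedom chi-square goodness-of-fit test, which may differ from the test the built-in function applies.

```python
import math
import statistics


def hardy_weinberg_sketch(n_hom_ref, n_het, n_hom_alt):
    """Return (expected heterozygous frequency, p-value) for a biallelic site.

    Assumption: the p-value is from a chi-square goodness-of-fit test with
    1 degree of freedom, used here only as an illustrative stand-in.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one called genotype is required")

    # Reference allele frequency estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p

    # Expected heterozygous frequency under Hardy-Weinberg equilibrium.
    expected_het_freq = 2 * p * q

    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expected count (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected_het_freq, p_value


def summary_stats_sketch(values):
    """Return min, max, mean, and sample standard deviation of a field
    such as DP or GQ across samples, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    if not present:
        return {"min": None, "max": None, "mean": None, "stdev": None}
    # The sample standard deviation is undefined for a single value.
    stdev = statistics.stdev(present) if len(present) > 1 else None
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "stdev": stdev,
    }
```

For example, `hardy_weinberg_sketch(25, 50, 25)` describes a site whose observed genotype counts match the Hardy-Weinberg expectation exactly, and `summary_stats_sketch([10, 20, 30, None])` summarizes depth over three called samples while skipping the missing one.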