Photon runtime
Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. It is written in C++ to take advantage of modern hardware and uses vectorized query processing to exploit data- and instruction-level parallelism in CPUs, improving performance on real-world data and applications, all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses.
Databricks clusters
Photon is available for clusters running Databricks Runtime 9.1 LTS and above.
To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you create the cluster using the clusters API, set runtime_engine to PHOTON.
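As a rough illustration of the API route, the sketch below builds a cluster specification with runtime_engine set to PHOTON and prints it as JSON. Only runtime_engine and its PHOTON value come from the text above. The other field names (cluster_name, spark_version, node_type_id, num_workers) are assumptions about the shape of a cluster-create request, and all values are placeholders. The sketch does not send any request.

```python
import json


def build_photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster specification dict with Photon enabled.

    Every field except runtime_engine is an assumed part of the request
    shape; check the clusters API reference before relying on it.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The setting named in the text above.
        "runtime_engine": "PHOTON",
    }


# Placeholder values for illustration only.
spec = build_photon_cluster_spec(
    cluster_name="photon-demo",
    spark_version="11.3.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```

The resulting JSON would be the body of whatever request creates the cluster. Choose a node type from the instance types that Photon supports.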
Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Databricks pricing page.
Photon advantages
Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
Accelerates queries that process a significant amount of data (100GB+) and include aggregations and joins.
Faster performance when data is accessed repeatedly from the disk cache.
More robust scan performance on tables with many columns and many small files.
Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).
Replaces sort-merge joins with hash-joins.
Photon coverage
Operators
Scan, Filter, Project
Hash Aggregate/Join/Shuffle
Nested-Loop Join
Null-Aware Anti Join
Union, Expand, ScalarSubquery
Delta/Parquet Write Sink
Sort
Window Function
Expressions
Comparison / Logic
Arithmetic / Math (most)
Conditional (IF, CASE, etc.)
String (common ones)
Casts
Aggregates (most common ones)
Date/Timestamp
Data types
Byte/Short/Int/Long
Boolean
String/Binary
Decimal
Float/Double
Date/Timestamp
Struct
Array
Map
The following table lists supported Databricks expressions and the minimum Databricks Runtime release that supports each one.
Name | Release |
---|---|
Abs | Databricks Runtime 8.3 |
Acos | Databricks Runtime 10.4 LTS |
Add | Databricks Runtime 8.3 |
AddMonths | Databricks Runtime 8.3 |
AesDecrypt | Databricks Runtime 10.4 LTS |
AesEncrypt | Databricks Runtime 10.4 LTS |
And | Databricks Runtime 8.3 |
ArrayContains | Databricks Runtime 8.3 |
ArrayDistinct | Databricks Runtime 10.0 |
ArrayExcept | Databricks Runtime 10.1 |
ArrayExists | Databricks Runtime 10.4 LTS |
ArrayFilter | Databricks Runtime 10.4 LTS |
ArrayForAll | Databricks Runtime 10.4 LTS |
ArrayIntersect | Databricks Runtime 10.1 |
ArrayJoin | Databricks Runtime 10.4 LTS |
ArraySize | Databricks Runtime 10.4 LTS |
ArrayTransform | Databricks Runtime 10.4 LTS |
ArrayUnion | Databricks Runtime 10.1 |
Atan | Databricks Runtime 9.1 LTS |
Atan2 | Databricks Runtime 9.1 LTS |
Average | Databricks Runtime 8.3 |
Base64 | Databricks Runtime 9.1 LTS |
Bin | Databricks Runtime 10.0 |
BitAndAgg | Databricks Runtime 8.3 |
BitLength | Databricks Runtime 11.3 LTS |
BitOrAgg | Databricks Runtime 8.3 |
BitwiseAnd | Databricks Runtime 8.3 |
BitwiseNot | Databricks Runtime 8.3 |
BitwiseOr | Databricks Runtime 8.3 |
BitwiseReverse | Databricks Runtime 8.3 |
BitwiseXor | Databricks Runtime 8.3 |
BitXorAgg | Databricks Runtime 8.3 |
BoundaryAsGeojson | Databricks Runtime 11.3 LTS |
BoundaryAsWkb | Databricks Runtime 11.3 LTS |
BoundaryAsWkt | Databricks Runtime 11.3 LTS |
Cast | Databricks Runtime 8.3 |
Cbrt | Databricks Runtime 8.4 |
CeilExpressionBuilder | Databricks Runtime 8.3 |
CenterAsGeojson | Databricks Runtime 11.3 LTS |
CenterAsWkb | Databricks Runtime 11.3 LTS |
CenterAsWkt | Databricks Runtime 11.3 LTS |
Chr | Databricks Runtime 10.1 |
Coalesce | Databricks Runtime 8.3 |
CollectList | Databricks Runtime 9.0 |
Concat | Databricks Runtime 8.3 |
ConcatWs | Databricks Runtime 8.3 |
Conv | Databricks Runtime 8.3 |
Cos | Databricks Runtime 10.4 LTS |
Count | Databricks Runtime 8.3 |
CreateArray | Databricks Runtime 8.3 |
CreateMap | Databricks Runtime 8.4 |
CreateNamedStruct | Databricks Runtime 8.3 |
CreateStruct | Databricks Runtime 8.3 |
CurrentCatalog | Databricks Runtime 8.3 |
CurrentDatabase | Databricks Runtime 8.3 |
CurrentDate | Databricks Runtime 8.3 |
CurrentTimestamp | Databricks Runtime 8.3 |
CurrentTimeZone | Databricks Runtime 8.3 |
CurrentUser | Databricks Runtime 8.3 |
DateAdd | Databricks Runtime 8.3 |
DateDiff | Databricks Runtime 8.3 |
DateFormatClass | Databricks Runtime 8.3 |
DateFromUnixDate | Databricks Runtime 8.3 |
DateSub | Databricks Runtime 8.3 |
DayOfMonth | Databricks Runtime 8.3 |
DayOfWeek | Databricks Runtime 8.3 |
DayOfYear | Databricks Runtime 8.3 |
Decode | Databricks Runtime 8.3 |
DenseRank | Databricks Runtime 10.4 LTS |
Divide | Databricks Runtime 8.3 |
ElementAt | Databricks Runtime 8.3 |
EqualNullSafe | Databricks Runtime 8.3 |
EqualTo | Databricks Runtime 8.3 |
Exp | Databricks Runtime 8.4 |
Explode | Databricks Runtime 8.4 |
Extract | Databricks Runtime 8.3 |
First | Databricks Runtime 8.3 |
FloorExpressionBuilder | Databricks Runtime 8.3 |
FromUnixTime | Databricks Runtime 8.3 |
FromUTCTimestamp* | Databricks Runtime 8.3 |
Get | Databricks Runtime 11.3 LTS |
GetJsonObject | Databricks Runtime 11.2 |
GreaterThan | Databricks Runtime 8.3 |
GreaterThanOrEqual | Databricks Runtime 8.3 |
Greatest | Databricks Runtime 8.3 |
GridDistance | Databricks Runtime 11.3 LTS |
H3ToString | Databricks Runtime 11.3 LTS |
Hex | Databricks Runtime 9.1 LTS |
Hour | Databricks Runtime 8.3 |
If | Databricks Runtime 8.3 |
In | Databricks Runtime 8.3 |
InitCap | Databricks Runtime 11.3 LTS |
InputFileBlockLength | Databricks Runtime 8.3 |
InputFileBlockStart | Databricks Runtime 8.3 |
InputFileName | Databricks Runtime 8.3 |
InSet | Databricks Runtime 8.3 |
IntegralDivide | Databricks Runtime 8.3 |
IsChildOf | Databricks Runtime 11.3 LTS |
IsNaN | Databricks Runtime 8.3 |
IsNotNull | Databricks Runtime 8.3 |
IsNull | Databricks Runtime 8.3 |
IsPentagon | Databricks Runtime 11.3 LTS |
IsValid | Databricks Runtime 11.3 LTS |
JsonToStructs | Databricks Runtime 11.2 |
Lag | Databricks Runtime 10.4 LTS |
Last | Databricks Runtime 10.4 LTS |
LastDay | Databricks Runtime 8.3 |
Lead | Databricks Runtime 10.4 LTS |
Least | Databricks Runtime 8.3 |
Length | Databricks Runtime 8.3 |
LengthOfJsonArray | Databricks Runtime 11.1 |
LessThan | Databricks Runtime 8.3 |
Levenshtein | Databricks Runtime 10.1 |
Like | Databricks Runtime 8.3 |
Log | Databricks Runtime 8.3 |
Log2 | Databricks Runtime 8.4 |
LongLatAsH3 | Databricks Runtime 11.3 LTS |
LongLatAsH3String | Databricks Runtime 11.3 LTS |
Lower | Databricks Runtime 8.3 |
LPadExpressionBuilder | Databricks Runtime 8.3 |
MakeDate | Databricks Runtime 8.3 |
MakeTimestamp | Databricks Runtime 8.3 |
Max | Databricks Runtime 8.3 |
MaxChild | Databricks Runtime 11.3 LTS |
Md5 | Databricks Runtime 10.4 LTS |
MicrosToTimestamp | Databricks Runtime 8.3 |
MillisToTimestamp | Databricks Runtime 8.3 |
Min | Databricks Runtime 8.3 |
MinChild | Databricks Runtime 11.3 LTS |
Minute | Databricks Runtime 8.3 |
MonotonicallyIncreasingID | Databricks Runtime 8.3 |
Month | Databricks Runtime 8.3 |
MonthsBetween | Databricks Runtime 8.3 |
Multiply | Databricks Runtime 8.3 |
Murmur3Hash | Databricks Runtime 8.3 |
NaNvl | Databricks Runtime 8.3 |
NextDay | Databricks Runtime 8.3 |
Not | Databricks Runtime 8.3 |
Now | Databricks Runtime 8.3 |
NthValue | Databricks Runtime 10.4 LTS |
NTile | Databricks Runtime 10.4 LTS |
NullIf | Databricks Runtime 8.3 |
Nvl | Databricks Runtime 8.3 |
Nvl2 | Databricks Runtime 8.3 |
OctetLength | Databricks Runtime 8.3 |
ParseToDate | Databricks Runtime 8.3 |
ParseToTimestamp | Databricks Runtime 8.3 |
Percentile | Databricks Runtime 10.4 LTS |
PercentRank | Databricks Runtime 10.4 LTS |
Pi | Databricks Runtime 8.3 |
Pmod | Databricks Runtime 8.3 |
PosExplode | Databricks Runtime 9.1 LTS |
Pow | Databricks Runtime 8.3 |
Quarter | Databricks Runtime 8.3 |
Rand | Databricks Runtime 8.3 |
Rank | Databricks Runtime 10.4 LTS |
RegExpExtract | Databricks Runtime 8.3 |
RegExpExtractAll | Databricks Runtime 11.1 |
RegExpReplace | Databricks Runtime 9.1 LTS |
RegrAvgX | Databricks Runtime 10.5 |
RegrAvgY | Databricks Runtime 10.5 |
Remainder | Databricks Runtime 8.3 |
Resolution | Databricks Runtime 11.3 LTS |
Reverse | Databricks Runtime 8.3 |
RLike | Databricks Runtime 8.3 |
Round | Databricks Runtime 8.3 |
RowNumber | Databricks Runtime 10.4 LTS |
RPadExpressionBuilder | Databricks Runtime 8.3 |
Second | Databricks Runtime 8.3 |
SecondsToTimestamp | Databricks Runtime 8.3 |
Sha1 | Databricks Runtime 10.4 LTS |
Sha2 | Databricks Runtime 10.4 LTS |
ShiftLeft | Databricks Runtime 8.3 |
ShiftRight | Databricks Runtime 8.3 |
ShiftRightUnsigned | Databricks Runtime 8.3 |
Sin | Databricks Runtime 10.4 LTS |
Size | Databricks Runtime 8.3 |
Slice | Databricks Runtime 8.3 |
SoundEx | Databricks Runtime 10.1 |
SparkVersion | Databricks Runtime 8.3 |
Sqrt | Databricks Runtime 8.4 |
StddevPop | Databricks Runtime 8.3 |
StddevSamp | Databricks Runtime 8.3 |
StringInstr | Databricks Runtime 8.3 |
StringLocate | Databricks Runtime 8.3 |
StringRepeat | Databricks Runtime 11.2 |
StringSpace | Databricks Runtime 8.3 |
StringSplit | Databricks Runtime 8.3 |
StringToH3 | Databricks Runtime 11.3 LTS |
StringTranslate | Databricks Runtime 10.4 LTS |
StringTrim | Databricks Runtime 8.3 |
StringTrimBoth | Databricks Runtime 8.3 |
StringTrimLeft | Databricks Runtime 8.3 |
StringTrimRight | Databricks Runtime 8.3 |
StructsToJson | Databricks Runtime 11.1 |
Substring | Databricks Runtime 8.3 |
Subtract | Databricks Runtime 8.3 |
Sum | Databricks Runtime 8.3 |
Tan | Databricks Runtime 9.1 LTS |
ToChildren | Databricks Runtime 11.3 LTS |
ToParent | Databricks Runtime 11.3 LTS |
ToRadians | Databricks Runtime 10.1 |
ToUnixTimestamp | Databricks Runtime 8.3 |
ToUTCTimestamp | Databricks Runtime 8.3 |
TruncDate | Databricks Runtime 8.3 |
TruncTimestamp | Databricks Runtime 8.3 |
TryElementAt | Databricks Runtime 10.0 |
TryValidate | Databricks Runtime 11.3 LTS |
UnaryMinus | Databricks Runtime 8.3 |
UnBase64 | Databricks Runtime 9.1 LTS |
Unhex | Databricks Runtime 9.1 LTS |
UnixDate | Databricks Runtime 8.3 |
UnixMicros | Databricks Runtime 8.3 |
UnixMillis | Databricks Runtime 8.3 |
UnixSeconds | Databricks Runtime 8.3 |
UnixTimestamp | Databricks Runtime 8.3 |
Upper | Databricks Runtime 8.3 |
Uuid | Databricks Runtime 8.3 |
Validate | Databricks Runtime 11.3 LTS |
VarianceSamp | Databricks Runtime 10.1 |
WeekDay | Databricks Runtime 8.3 |
WeekOfYear | Databricks Runtime 8.3 |
XxHash64 | Databricks Runtime 10.0 |
Year | Databricks Runtime 8.3 |
*from_utc_timestamp is not fully supported by Photon. See from_utc_timestamp for more information.
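One way to use the table programmatically is to compare a cluster's runtime version against the minimum release listed for an expression. The following is a minimal sketch, not an official API: the helper names (parse_version, is_listed_for) are hypothetical, and the dictionary holds only a handful of rows copied from the table above.

```python
import re

# A few rows copied from the expression table; extend as needed.
MIN_RUNTIME = {
    "Abs": "8.3",
    "Cbrt": "8.4",
    "CollectList": "9.0",
    "Atan": "9.1 LTS",
    "ArrayDistinct": "10.0",
    "Acos": "10.4 LTS",
    "RegrAvgX": "10.5",
    "GetJsonObject": "11.2",
    "InitCap": "11.3 LTS",
}


def parse_version(text):
    """Turn a string such as '10.4 LTS' into a comparable tuple (10, 4)."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized runtime version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_listed_for(expression, runtime):
    """Return True if the table lists `expression` at or below `runtime`.

    Expressions missing from MIN_RUNTIME are reported as not listed.
    """
    minimum = MIN_RUNTIME.get(expression)
    if minimum is None:
        return False
    return parse_version(runtime) >= parse_version(minimum)


print(is_listed_for("Abs", "9.1 LTS"))    # listed since 8.3
print(is_listed_for("Acos", "9.1 LTS"))   # listed from 10.4 LTS
```

Versions are compared as (major, minor) tuples rather than as strings, so 10.4 correctly sorts after 9.1.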
Limitations
Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, and CSV. Kafka and Kinesis support is in Public Preview.
Does not support UDFs.
Does not support RDD APIs.
Not expected to improve short-running queries (<2 seconds), for example, queries against small amounts of data.
Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance advantage for those features.
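Because unsupported features fall back to the regular runtime, it can help to see which parts of a query plan Photon handles. The sketch below rests on an assumption not stated on this page: that Photon operators appear in the physical plan text with names beginning with Photon. The plan string is a hand-written stand-in, not real output, and split_plan_operators is a hypothetical helper.

```python
import re


def split_plan_operators(plan_text, prefix="Photon"):
    """Split operator names in a plan listing into (photon, other) lists.

    Assumes one operator per line and that Photon operators carry the
    given name prefix; both are assumptions, not documented guarantees.
    """
    photon, other = [], []
    for line in plan_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and section headers such as "== Physical Plan ==".
        if not stripped or stripped.startswith("=="):
            continue
        # The first identifier on the line is taken as the operator name.
        match = re.search(r"[A-Za-z_]\w*", stripped)
        if match is None:
            continue
        name = match.group(0)
        (photon if name.startswith(prefix) else other).append(name)
    return photon, other


# Hand-written example text for illustration; real plans will differ.
sample_plan = """\
== Physical Plan ==
ColumnarToRow
+- PhotonResultStage
   +- PhotonGroupingAgg(keys=[region], functions=[sum(amount)])
      +- PhotonScan parquet sales[region, amount]
"""

photon_ops, other_ops = split_plan_operators(sample_plan)
print("Photon:", photon_ops)
print("Other:", other_ops)
```

On a real cluster, the plan text would come from the query's explain output. Treat the result as a rough indicator only.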