Photon runtime

Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. It is developed in C++ to take advantage of modern hardware, and uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications-—all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses.

Databricks clusters

Photon is available for clusters running Databricks Runtime 9.1 LTS and above.

To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you create the cluster using the clusters API, set runtime_engine to PHOTON.

Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Databricks pricing page.

Photon advantages

  • Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.

  • Accelerates queries that process a significant amount of data (100GB+) and include aggregations and joins.

  • Faster performance when data is accessed repeatedly from the disk cache.

  • More robust scan performance on tables with many columns and many small files.

  • Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).

  • Replaces sort-merge joins with hash-joins.

Photon coverage

Operators

  • Scan, Filter, Project

  • Hash Aggregate/Join/Shuffle

  • Nested-Loop Join

  • Null-Aware Anti Join

  • Union, Expand, ScalarSubquery

  • Delta/Parquet Write Sink

  • Sort

  • Window Function

Expressions

  • Comparison / Logic

  • Arithmetic / Math (most)

  • Conditional (IF, CASE, etc.)

  • String (common ones)

  • Casts

  • Aggregates(most common ones)

  • Date/Timestamp

Data types

  • Byte/Short/Int/Long

  • Boolean

  • String/Binary

  • Decimal

  • Float/Double

  • Date/Timestamp

  • Struct

  • Array

  • Map

The following table lists supported Databricks expressions, and the minimum Databricks Runtime release version that supports it.

Name

Release

Abs

Databricks Runtime 8.3

Acos

Databricks Runtime 10.4 LTS

Add

Databricks Runtime 8.3

AddMonths

Databricks Runtime 8.3

AesDecrypt

Databricks Runtime 10.4 LTS

AesEncrypt

Databricks Runtime 10.4 LTS

And

Databricks Runtime 8.3

ArrayContains

Databricks Runtime 8.3

ArrayDistinct

Databricks Runtime 10.0

ArrayExcept

Databricks Runtime 10.1

ArrayExists

Databricks Runtime 10.4 LTS

ArrayFilter

Databricks Runtime 10.4 LTS

ArrayForAll

Databricks Runtime 10.4 LTS

ArrayIntersect

Databricks Runtime 10.1

ArrayJoin

Databricks Runtime 10.4 LTS

ArraySize

Databricks Runtime 10.4 LTS

ArrayTransform

Databricks Runtime 10.4 LTS

ArrayUnion

Databricks Runtime 10.1

Atan

Databricks Runtime 9.1 LTS

Atan2

Databricks Runtime 9.1 LTS

Average

Databricks Runtime 8.3

Base64

Databricks Runtime 9.1 LTS

Bin

Databricks Runtime 10.0

BitAndAgg

Databricks Runtime 8.3

BitLength

Databricks Runtime 11.3 LTS

BitOrAgg

Databricks Runtime 8.3

BitwiseAnd

Databricks Runtime 8.3

BitwiseNot

Databricks Runtime 8.3

BitwiseOr

Databricks Runtime 8.3

BitwiseReverse

Databricks Runtime 8.3

BitwiseXor

Databricks Runtime 8.3

BitXorAgg

Databricks Runtime 8.3

BoundaryAsGeojson

Databricks Runtime 11.3 LTS

BoundaryAsWkb

Databricks Runtime 11.3 LTS

BoundaryAsWkt

Databricks Runtime 11.3 LTS

Cast

Databricks Runtime 8.3

Cbrt

Databricks Runtime 8.4

CeilExpressionBuilder

Databricks Runtime 8.3

CenterAsGeojson

Databricks Runtime 11.3 LTS

CenterAsWkb

Databricks Runtime 11.3 LTS

CenterAsWkt

Databricks Runtime 11.3 LTS

Chr

Databricks Runtime 10.1

Coalesce

Databricks Runtime 8.3

CollectList

Databricks Runtime 9.0

Concat

Databricks Runtime 8.3

ConcatWs

Databricks Runtime 8.3

Conv

Databricks Runtime 8.3

Cos

Databricks Runtime 10.4 LTS

Count

Databricks Runtime 8.3

CreateArray

Databricks Runtime 8.3

CreateMap

Databricks Runtime 8.4

CreateNamedStruct

Databricks Runtime 8.3

CreateStruct

Databricks Runtime 8.3

CurrentCatalog

Databricks Runtime 8.3

CurrentDatabase

Databricks Runtime 8.3

CurrentDate

Databricks Runtime 8.3

CurrentTimestamp

Databricks Runtime 8.3

CurrentTimeZone

Databricks Runtime 8.3

CurrentUser

Databricks Runtime 8.3

DateAdd

Databricks Runtime 8.3

DateDiff

Databricks Runtime 8.3

DateFormatClass

Databricks Runtime 8.3

DateFromUnixDate

Databricks Runtime 8.3

DateSub

Databricks Runtime 8.3

DayOfMonth

Databricks Runtime 8.3

DayOfWeek

Databricks Runtime 8.3

DayOfYear

Databricks Runtime 8.3

Decode

Databricks Runtime 8.3

DenseRank

Databricks Runtime 10.4 LTS

Divide

Databricks Runtime 8.3

ElementAt

Databricks Runtime 8.3

EqualNullSafe

Databricks Runtime 8.3

EqualTo

Databricks Runtime 8.3

Exp

Databricks Runtime 8.4

Explode

Databricks Runtime 8.4

Extract

Databricks Runtime 8.3

First

Databricks Runtime 8.3

FloorExpressionBuilder

Databricks Runtime 8.3

FromUnixTime

Databricks Runtime 8.3

FromUTCTimestamp*

Databricks Runtime 8.3

Get

Databricks Runtime 11.3 LTS

GetJsonObject

Databricks Runtime 11.2

GreaterThan

Databricks Runtime 8.3

GreaterThanOrEqual

Databricks Runtime 8.3

Greatest

Databricks Runtime 8.3

GridDistance

Databricks Runtime 11.3 LTS

H3ToString

Databricks Runtime 11.3 LTS

Hex

Databricks Runtime 9.1 LTS

Hour

Databricks Runtime 8.3

If

Databricks Runtime 8.3

In

Databricks Runtime 8.3

InitCap

Databricks Runtime 11.3 LTS

InputFileBlockLength

Databricks Runtime 8.3

InputFileBlockStart

Databricks Runtime 8.3

InputFileName

Databricks Runtime 8.3

InSet

Databricks Runtime 8.3

IntegralDivide

Databricks Runtime 8.3

IsChildOf

Databricks Runtime 11.3 LTS

IsNaN

Databricks Runtime 8.3

IsNotNull

Databricks Runtime 8.3

IsNull

Databricks Runtime 8.3

IsPentagon

Databricks Runtime 11.3 LTS

IsValid

Databricks Runtime 11.3 LTS

JsonToStructs

Databricks Runtime 11.2

Lag

Databricks Runtime 10.4 LTS

Last

Databricks Runtime 10.4 LTS

LastDay

Databricks Runtime 8.3

Lead

Databricks Runtime 10.4 LTS

Least

Databricks Runtime 8.3

Length

Databricks Runtime 8.3

LengthOfJsonArray

Databricks Runtime 11.1

LessThan

Databricks Runtime 8.3

Levenshtein

Databricks Runtime 10.1

Like

Databricks Runtime 8.3

Log

Databricks Runtime 8.3

Log2

Databricks Runtime 8.4

LongLatAsH3

Databricks Runtime 11.3 LTS

LongLatAsH3String

Databricks Runtime 11.3 LTS

Lower

Databricks Runtime 8.3

LPadExpressionBuilder

Databricks Runtime 8.3

MakeDate

Databricks Runtime 8.3

MakeTimestamp

Databricks Runtime 8.3

Max

Databricks Runtime 8.3

MaxChild

Databricks Runtime 11.3 LTS

Md5

Databricks Runtime 10.4 LTS

MicrosToTimestamp

Databricks Runtime 8.3

MillisToTimestamp

Databricks Runtime 8.3

Min

Databricks Runtime 8.3

MinChild

Databricks Runtime 11.3 LTS

Minute

Databricks Runtime 8.3

MonotonicallyIncreasingID

Databricks Runtime 8.3

Month

Databricks Runtime 8.3

MonthsBetween

Databricks Runtime 8.3

Multiply

Databricks Runtime 8.3

Murmur3Hash

Databricks Runtime 8.3

NaNvl

Databricks Runtime 8.3

NextDay

Databricks Runtime 8.3

Not

Databricks Runtime 8.3

Now

Databricks Runtime 8.3

NthValue

Databricks Runtime 10.4 LTS

NTile

Databricks Runtime 10.4 LTS

NullIf

Databricks Runtime 8.3

Nvl

Databricks Runtime 8.3

Nvl2

Databricks Runtime 8.3

OctetLength

Databricks Runtime 8.3

ParseToDate

Databricks Runtime 8.3

ParseToTimestamp

Databricks Runtime 8.3

Percentile

Databricks Runtime 10.4 LTS

PercentRank

Databricks Runtime 10.4 LTS

Pi

Databricks Runtime 8.3

Pmod

Databricks Runtime 8.3

PosExplode

Databricks Runtime 9.1 LTS

Pow

Databricks Runtime 8.3

Quarter

Databricks Runtime 8.3

Rand

Databricks Runtime 8.3

Rank

Databricks Runtime 10.4 LTS

RegExpExtract

Databricks Runtime 8.3

RegExpExtractAll

Databricks Runtime 11.1

RegExpReplace

Databricks Runtime 9.1 LTS

RegrAvgX

Databricks Runtime 10.5

RegrAvgY

Databricks Runtime 10.5

Remainder

Databricks Runtime 8.3

Resolution

Databricks Runtime 11.3 LTS

Reverse

Databricks Runtime 8.3

Reverse

Databricks Runtime 8.3

RLike

Databricks Runtime 8.3

Round

Databricks Runtime 8.3

RowNumber

Databricks Runtime 10.4 LTS

RPadExpressionBuilder

Databricks Runtime 8.3

Second

Databricks Runtime 8.3

SecondsToTimestamp

Databricks Runtime 8.3

Sha1

Databricks Runtime 10.4 LTS

Sha2

Databricks Runtime 10.4 LTS

ShiftLeft

Databricks Runtime 8.3

ShiftRight

Databricks Runtime 8.3

ShiftRightUnsigned

Databricks Runtime 8.3

Sin

Databricks Runtime 10.4 LTS

Size

Databricks Runtime 8.3

Slice

Databricks Runtime 8.3

SoundEx

Databricks Runtime 10.1

SparkVersion

Databricks Runtime 8.3

Sqrt

Databricks Runtime 8.4

StddevPop

Databricks Runtime 8.3

StddevSamp

Databricks Runtime 8.3

StringInstr

Databricks Runtime 8.3

StringLocate

Databricks Runtime 8.3

StringRepeat

Databricks Runtime 11.2

StringSpace

Databricks Runtime 8.3

StringSplit

Databricks Runtime 8.3

StringToH3

Databricks Runtime 11.3 LTS

StringTranslate

Databricks Runtime 10.4 LTS

StringTrim

Databricks Runtime 8.3

StringTrimBoth

Databricks Runtime 8.3

StringTrimLeft

Databricks Runtime 8.3

StringTrimRight

Databricks Runtime 8.3

StructsToJson

Databricks Runtime 11.1

Substring

Databricks Runtime 8.3

Subtract

Databricks Runtime 8.3

Sum

Databricks Runtime 8.3

Tan

Databricks Runtime 9.1 LTS

ToChildren

Databricks Runtime 11.3 LTS

ToParent

Databricks Runtime 11.3 LTS

ToRadians

Databricks Runtime 10.1

ToUnixTimestamp

Databricks Runtime 8.3

ToUTCTimestamp

Databricks Runtime 8.3

TruncDate

Databricks Runtime 8.3

TruncTimestamp

Databricks Runtime 8.3

TryElementAt

Databricks Runtime 10.0

TryValidate

Databricks Runtime 11.3 LTS

UnaryMinus

Databricks Runtime 8.3

UnBase64

Databricks Runtime 9.1 LTS

Unhex

Databricks Runtime 9.1 LTS

UnixDate

Databricks Runtime 8.3

UnixMicros

Databricks Runtime 8.3

UnixMillis

Databricks Runtime 8.3

UnixSeconds

Databricks Runtime 8.3

UnixTimestamp

Databricks Runtime 8.3

Upper

Databricks Runtime 8.3

Uuid

Databricks Runtime 8.3

Validate

Databricks Runtime 11.3 LTS

VarianceSamp

Databricks Runtime 10.1

WeekDay

Databricks Runtime 8.3

WeekOfYear

Databricks Runtime 8.3

XxHash64

Databricks Runtime 10.0

Year

Databricks Runtime 8.3

*from_utc_timestamp is not fully supported by Photon. See from_utc_timestamp for more information.

Limitations

  • Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, and CSV. Kafka and Kinesis support is in Public Preview

  • Does not support UDFs.

  • Does not support RDD APIs.

  • Not expected to improve short-running queries (<2 seconds), for example, queries against small amounts of data.

Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance advantage for those features.