Link Search Menu Expand Document

Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This tries to document what operations are supported and what data types each operation supports. Because Apache Spark is under active development too and this document was generated against version 3.0.1 of Spark. Most of this should still apply to other versions of Spark, but there may be slight changes.

General limitations

Decimal

The Decimal type in Spark supports a precision up to 38 digits (128-bits). The RAPIDS Accelerator stores values up to 64-bits and as such only supports a precision up to 18 digits. Note that decimals are disabled by default in the plugin, because it is supported by a small number of operations presently, which can result in a lot of data movement to and from the GPU, slowing down processing in some cases. Result Decimal precision and scale follow the same rule as CPU mode in Apache Spark:

 * In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2
 * respectively, then the following operations have the following precision / scale:
 *
 *   Operation    Result Precision                        Result Scale
 *   ------------------------------------------------------------------------
 *   e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
 *   e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
 *   e1 * e2      p1 + p2 + 1                             s1 + s2
 *   e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
 *   e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
 *   e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)

However Spark inserts PromotePrecision to CAST both sides to the same type. GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits. For example, Decimal(8,2) x Decimal(6,3) resulting in Decimal (15,5) runs on CPU, because due to PromotePrecision, GPU mode assumes the result is Decimal(19,6).

Timestamp

Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.

CalendarInterval

In Spark CalendarIntervals store three values, months, days, and microseconds. Support for this type is still very limited in the accelerator. In some cases only a a subset of the type is supported, like window ranges only support days currently.

Configuration

There are lots of different configuration values that can impact if an operation is supported or not. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and those are a bit harder to document. The work of updating this to cover that support is still ongoing.

In general though if you ever have any question about why an operation is not running on the GPU you may set spark.rapids.sql.explain to ALL and it will try to give all of the reasons why this particular operator or expression is on the CPU or GPU.

Key

Types

Type Name Type Description
BOOLEAN Holds true or false values.
BYTE Signed 8-bit integer value.
SHORT Signed 16-bit integer value.
INT Signed 32-bit integer value.
LONG Signed 64-bit integer value.
FLOAT 32-bit floating point value.
DOUBLE 64-bit floating point value.
DATE A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970.
TIMESTAMP A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone.
STRING A text string. Stored as UTF-8 encoded bytes.
DECIMAL A fixed point decimal value with configurable precision and scale.
NULL Only stores null values and is typically only used when no other type can be determined from the SQL.
BINARY An array of non-nullable bytes.
CALENDAR Represents a period of time. Stored as months, days and microseconds.
ARRAY A sequence of elements.
MAP A set of key value pairs, the keys cannot be null.
STRUCT A series of named fields.
UDT User defined types and java Objects. These are not standard SQL types.

Support

Value Description
S (Supported) Both Apache Spark and the RAPIDS Accelerator support this type.
S* (Supported with limitations) Typically this refers to general limitations with Timestamp or Decimal
  (Not Applicable) Neither Spark not the RAPIDS Accelerator support this type in this situation.
PS (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this.
PS* (Partial Support with limitations) Like regular Partial Support but with general limitations on Timestamp or Decimal types.
NS (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not.

SparkPlan or Executor Nodes

Apache Spark uses a Directed Acyclic Graph(DAG) of processing to build a query. The nodes in this graph are instances of SparkPlan and represent various high level operations like doing a filter or project. The operations that the RAPIDS Accelerator supports are described below.

Executor Description Notes BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CoalesceExec The backend for the dataframe coalesce method None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
CollectLimitExec Reduce to single partition and apply limit This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
ExpandExec The backend for the expand operator None S S S S S S S S S* S S* S NS NS NS NS NS NS
FileSourceScanExec Reading data from files, often from Hive tables None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
FilterExec The backend for most filter statements None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
GenerateExec The backend for operations that generate more output rows than input rows like explode None S S S S S S S S S* S S* NS NS NS PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
GlobalLimitExec Limiting of results across partitions None S S S S S S S S S* S S* S NS NS NS NS NS NS
LocalLimitExec Per-partition limiting of results None S S S S S S S S S* S S* S NS NS NS NS NS NS
ProjectExec The backend for most select, withColumn and dropColumn statements None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
RangeExec The backend for range operator None S
SortExec The backend for the sort operator None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested BINARY, CALENDAR, MAP, UDT) NS
TakeOrderedAndProjectExec Take the first limit elements as defined by the sortOrder, and do projection if needed. None S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
UnionExec The backend for the union operator None S S S S S S S S S* S S* S NS NS NS NS PS* (unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
CustomShuffleReaderExec A wrapper of shuffle query stage None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested BINARY, CALENDAR, MAP, UDT) NS
HashAggregateExec The backend for hash based aggregations None S S S S S S S S S* S S* S NS NS PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
Executor Description Notes BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
SortAggregateExec The backend for sort based aggregations None S S S S S S S S S* S S* S NS NS NS PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS
DataWritingCommandExec Writing data None S S S S S S S S S* S PS* (Only supported for Parquet) NS NS NS NS NS PS* (Only supported for Parquet; missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS
BatchScanExec The backend for most file input None S S S S S S S S S* S S* NS NS NS PS* (missing nested NULL, BINARY, CALENDAR, UDT) PS* (missing nested NULL, BINARY, CALENDAR, UDT) PS* (missing nested NULL, BINARY, CALENDAR, UDT) NS
BroadcastExchangeExec The backend for broadcast exchange of data None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
ShuffleExchangeExec The backend for most data being exchanged between processes None S S S S S S S S S* S S* S NS NS PS* (Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) PS* (Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) PS* (Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) NS
BroadcastHashJoinExec Implementation of join using broadcast data None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
BroadcastNestedLoopJoinExec Implementation of join using brute force None S S S S S S S S S* S S* S NS NS NS NS NS NS
CartesianProductExec Implementation of join using brute force None S S S S S S S S S* S S* S NS NS NS NS NS NS
ShuffledHashJoinExec Implementation of join using hashed shuffled data None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
SortMergeJoinExec Sort merge join, replacing with shuffled hash join None S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
AggregateInPandasExec The backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None S S S S S S S S S* S NS NS NS NS NS NS NS NS
ArrowEvalPythonExec The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled None S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
FlatMapGroupsInPandasExec The backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None S S S S S S S S S* S NS NS NS NS NS NS NS NS
MapInPandasExec The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
WindowInPandasExec The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame. This is disabled by default because it only supports row based frame for now S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS NS
Executor Description Notes BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
WindowExec Window-operator backend None S S S S S S S S S* S S* NS NS NS PS* (missing nested NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested NULL, BINARY, CALENDAR, MAP, UDT) NS
  • As was stated previously Decimal is only supported up to a precision of 18 and Timestamp is only supported in the UTC time zone. Decimals are off by default due to performance impact in some cases.

Expression and SQL Functions

Inside each node in the DAG there can be one or more trees of expressions that describe various types of processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. These expressions also can happen in different contexts. Because of how the accelerator works different contexts have different levels of support.

The most common expression context is project. In this context values from a single input row go through the expression and the result will also be use to produce something in the same row. Be aware that even in the case of aggregation and window operations most of the processing is still done in the project context either before or after the other processing happens.

Aggregation operations like count or sum can take place in either the aggregation, reduction, or window context. aggregation is when the operation was done while grouping the data by one or more keys. reduction is when there is no group by and there is a single result for an entire column. window is for window operations.

The final expression context is lambda which happens primarily for higher order functions in SQL. Accelerator support is described below.

Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Abs `abs` Absolute value None project input S S S S S S S*
result S S S S S S S*
lambda input NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS
Acos `acos` Inverse cosine None project input S
result S
lambda input NS
result NS
Acosh `acosh` Inverse hyperbolic cosine None project input S
result S
lambda input NS
result NS
Add `+` Addition None project lhs S S S S S S S* NS
rhs S S S S S S S* NS
result S S S S S S S* NS
lambda lhs NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Alias Gives a column a name None project input S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
result S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
And `and` Logical AND None project lhs S
rhs S
result S
lambda lhs NS
rhs NS
result NS
ArrayContains `array_contains` Returns a boolean if the array contains the passed in key None project array PS* (missing nested DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
key S S S S S S S S S* S NS NS NS NS NS NS NS NS
result S
lambda array NS
key NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Asin `asin` Inverse sine None project input S
result S
lambda input NS
result NS
Asinh `asinh` Inverse hyperbolic sine None project input S
result S
lambda input NS
result NS
AtLeastNNonNulls Checks if number of non null/Nan values is greater than a given value None project input S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
result S
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
Atan `atan` Inverse tangent None project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Atanh `atanh` Inverse hyperbolic tangent None project input S
result S
lambda input NS
result NS
AttributeReference References an input column None project result S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
BRound `bround` Round an expression to d decimal places using HALF_EVEN rounding mode None project value S S S S PS (result may round slightly differently) PS (result may round slightly differently) S*
scale S
result S S S S S S S*
lambda value NS NS NS NS NS NS NS
scale NS
result NS NS NS NS NS NS NS
BitwiseAnd `&` Returns the bitwise AND of the operands None project lhs S S S S
rhs S S S S
result S S S S
lambda lhs NS NS NS NS
rhs NS NS NS NS
result NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
BitwiseNot `~` Returns the bitwise NOT of the operands None project input S S S S
result S S S S
lambda input NS NS NS NS
result NS NS NS NS
BitwiseOr `\|` Returns the bitwise OR of the operands None project lhs S S S S
rhs S S S S
result S S S S
lambda lhs NS NS NS NS
rhs NS NS NS NS
result NS NS NS NS
BitwiseXor `^` Returns the bitwise XOR of the operands None project lhs S S S S
rhs S S S S
result S S S S
lambda lhs NS NS NS NS
rhs NS NS NS NS
result NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CaseWhen `when` CASE WHEN expression None project predicate S
value S S S S S S S S S* S S* S NS NS PS* (missing nested DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS NS
result S S S S S S S S S* S S* S NS NS PS* (missing nested DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS NS
lambda predicate NS
value NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Cbrt `cbrt` Cube root None project input S
result S
lambda input NS
result NS
Ceil `ceiling`, `ceil` Ceiling of a number None project input S S S*
result S S S*
lambda input NS NS NS
result NS NS NS
CheckOverflow CheckOverflow after arithmetic operations between DecimalType data None project input S*
result S*
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Coalesce `coalesce` Returns the first non-null argument if exists. Otherwise, null None project param S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Concat `concat` List/String concatenate None project input S NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
result S NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
lambda input NS NS NS
result NS NS NS
ConcatWs `concat_ws` Concatenates multiple input strings or array of strings into a single string using a given separator None project input S S
result S
lambda input NS NS
result NS
Contains Contains None project src S
search PS (Literal value only)
result S
lambda src NS
search NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Cos `cos` Cosine None project input S
result S
lambda input NS
result NS
Cosh `cosh` Hyperbolic cosine None project input S
result S
lambda input NS
result NS
Cot `cot` Cotangent None project input S
result S
lambda input NS
result NS
CreateArray `array` Returns an array with the given elements None project arg S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
result PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT)
lambda arg NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CreateNamedStruct `named_struct`, `struct` Creates a struct with the given field names and values None project name S
value S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
result PS* (missing nested BINARY, CALENDAR, UDT)
lambda name NS
value NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
CurrentRow$ Special boundary for a window frame, indicating stopping at the current row None project result S
DateAdd `date_add` Returns the date that is num_days after start_date None project startDate S
days S S S
result S
lambda startDate NS
days NS NS NS
result NS
DateAddInterval Adds interval to date None project start S
interval PS (month intervals are not supported; Literal value only)
result S
lambda start NS
interval NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
DateDiff `datediff` Returns the number of days from startDate to endDate None project lhs S
rhs S
result S
lambda lhs NS
rhs NS
result NS
DateFormatClass `date_format` Converts timestamp to a value of string in the format specified by the date format None project timestamp S*
strfmt PS (A limited number of formats are supported; Literal value only)
result S
lambda timestamp NS
strfmt NS
result NS
DateSub `date_sub` Returns the date that is num_days before start_date None project startDate S
days S S S
result S
lambda startDate NS
days NS NS NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
DayOfMonth `dayofmonth`, `day` Returns the day of the month from a date or timestamp None project input S
result S
lambda input NS
result NS
DayOfWeek `dayofweek` Returns the day of the week (1 = Sunday...7=Saturday) None project input S
result S
lambda input NS
result NS
DayOfYear `dayofyear` Returns the day of the year from a date or timestamp None project input S
result S
lambda input NS
result NS
Divide `/` Division None project lhs S S*
rhs S S*
result S PS* (Because of Spark's inner workings the full range of decimal precision (even for 64-bit values) is not supported.)
lambda lhs NS NS
rhs NS NS
result NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ElementAt `element_at` Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map. None project array/map PS* (missing nested BINARY, CALENDAR, UDT) PS* (If it's map, only string is supported.; missing nested BINARY, CALENDAR, UDT)
index/key NS NS NS PS (ints are only supported as array indexes, not as maps keys; Literal value only) NS NS NS NS NS PS (strings are only supported as map keys, not array indexes; Literal value only) NS NS NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda array/map NS NS
index/key NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
EndsWith Ends with None project src S
search PS (Literal value only)
result S
lambda src NS
search NS
result NS
EqualNullSafe `<=>` Check if the values are equal including nulls <=> None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
EqualTo `=`, `==` Check if the values are equal None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Exp `exp` Euler's number e raised to a power None project input S
result S
lambda input NS
result NS
Explode `explode`, `explode_outer` Given an input array produces a sequence of rows for each value in the array. None project input PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT) NS
result PS* (missing nested BINARY, CALENDAR, MAP, UDT)
Expm1 `expm1` Euler's number e raised to a power minus 1 None project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Floor `floor` Floor of a number None project input S S S*
result S S S*
lambda input NS NS NS
result NS NS NS
FromUnixTime `from_unixtime` Get the string from a unix timestamp None project sec S
format PS (Only a limited number of formats are supported; Literal value only)
result S
lambda sec NS
format NS
result NS
GetArrayItem Gets the field at `ordinal` in the Array None project array PS* (missing nested BINARY, CALENDAR, UDT)
ordinal PS (Literal value only)
result S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda array NS
ordinal NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
GetJsonObject `get_json_object` Extracts a json object from path None project json S
path PS (Literal value only)
result S
GetMapValue Gets Value from a Map based on a key None project map PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
key NS NS NS NS NS NS NS NS NS PS (Literal value only) NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS S NS NS NS NS NS NS NS NS
lambda map NS
key NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
GetStructField Gets the named field of the struct None project input PS* (missing nested BINARY, CALENDAR, UDT)
result S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda input NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
GetTimestamp Gets timestamps from strings using given pattern. None project timeExp S S* S
format PS (A limited number of formats are supported; Literal value only)
result S*
lambda timeExp NS NS NS
format NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
GreaterThan `>` > operator None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
GreaterThanOrEqual `>=` >= operator None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Greatest `greatest` Returns the greatest value of all parameters, skipping null values None project param S S S S S S S S S* S S* S NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Hour `hour` Returns the hour component of the string/timestamp None project input S*
result S
lambda input NS
result NS
If `if` IF expression None project predicate PS (literal values are not supported)
trueValue S S S S S S S S S* S S* S NS NS NS NS NS NS
falseValue S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda predicate NS
trueValue NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
falseValue NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
In `in` IN operator None project value S S S S S S S S S* S S* S NS NS NS NS NS NS
list PS (Literal value only) PS (Literal value only) PS (Literal value only) PS (Literal value only) PS (Literal value only) PS (Literal value only) PS (Literal value only) PS (Literal value only) PS* (Literal value only) PS (Literal value only) PS* (Literal value only) NS NS NS NS NS NS NS
result S
lambda value NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
list NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
InSet INSET operator None project input S S S S S S S S S* S S* S NS NS NS NS NS NS
result S
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
InitCap `initcap` Returns str with the first letter of each word in uppercase. All other letters are in lowercase This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 Spark also only sees the space character as a word deliminator, but this will capitalize any character after a non-alphabetic character. The behavior will be aligned to match Spark in the future per https://github.com/NVIDIA/spark-rapids/issues/2786. project input S
result S
lambda input NS
result NS
InputFileBlockLength `input_file_block_length` Returns the length of the block being read, or -1 if not available None project result S
lambda result NS
InputFileBlockStart `input_file_block_start` Returns the start offset of the block being read, or -1 if not available None project result S
lambda result NS
InputFileName `input_file_name` Returns the name of the file being read, or empty string if not available None project result S
lambda result NS
IntegralDivide `div` Division with a integer result None project lhs S S*
rhs S S*
result S
lambda lhs NS NS
rhs NS NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
IsNaN `isnan` Checks if a value is NaN None project input S S
result S
lambda input NS NS
result NS
IsNotNull `isnotnull` Checks if a value is not null None project input S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
result S
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
IsNull `isnull` Checks if a value is null None project input S S S S S S S S S* S S* S NS NS PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
result S
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
KnownFloatingPointNormalized Tag to prevent redundant normalization None project input S S
result S S
lambda input NS NS
result NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Lag `lag` Window function that returns N entries behind this one None window input S S S S S S S S S* NS S* NS NS NS PS* (missing nested STRING, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
offset S
default S S S S S S S S S* NS S* S NS NS PS* (missing nested STRING, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
result S S S S S S S S S* NS S* NS NS NS PS* (missing nested STRING, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
LastDay `last_day` Returns the last day of the month which the date belongs to None project input S
result S
lambda input NS
result NS
Lead `lead` Window function that returns N entries ahead of this one None window input S S S S S S S S S* NS S* NS NS NS PS* (missing nested STRING, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
offset S
default S S S S S S S S S* NS S* S NS NS PS* (missing nested STRING, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
result S S S S S S S S S* NS S* NS NS NS PS* (missing nested STRING, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS NS
Least `least` Returns the least value of all parameters, skipping null values None project param S S S S S S S S S* S S* S NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Length `length`, `character_length`, `char_length` String character length or binary byte length None project input S NS
result S
lambda input NS NS
result NS
LessThan `<` < operator None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
LessThanOrEqual `<=` <= operator None project lhs S S S S S S S S S* S S* S NS NS NS NS NS NS
rhs S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS NS
lambda lhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Like `like` Like None project src S
search PS (Literal value only)
result S
lambda src NS
search NS
result NS
Literal Holds a static value from the query None project result S S S S S S S S S* S S* S NS S PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT) NS
lambda result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Log `ln` Natural log None project input S
result S
lambda input NS
result NS
Log10 `log10` Log base 10 None project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Log1p `log1p` Natural log 1 + expr None project input S
result S
lambda input NS
result NS
Log2 `log2` Log base 2 None project input S
result S
lambda input NS
result NS
Logarithm `log` Log variable base None project value S
base S
result S
lambda value NS
base NS
result NS
Lower `lower`, `lcase` String lowercase operator This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
MakeDecimal Create a Decimal from an unscaled long value for some aggregation optimizations None project input S
result S*
Md5 `md5` MD5 hash operator None project input S
result S
lambda input NS
result NS
Minute `minute` Returns the minute component of the string/timestamp None project input S*
result S
lambda input NS
result NS
MonotonicallyIncreasingID `monotonically_increasing_id` Returns monotonically increasing 64-bit integers None project result S
lambda result NS
Month `month` Returns the month from a date or timestamp None project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Multiply `*` Multiplication None project lhs S S S S S S S*
rhs S S S S S S S*
result S S S S S S PS* (Because of Spark's inner workings the full range of decimal precision (even for 64-bit values) is not supported.)
lambda lhs NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS
Murmur3Hash `hash` Murmur3 hash operator None project input S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, UDT) NS
result S
lambda input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
NaNvl `nanvl` Evaluates to `left` iff left is not NaN, `right` otherwise None project lhs S S
rhs S S
result S S
lambda lhs NS NS
rhs NS NS
result NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Not `!`, `not` Boolean not operator None project input S
result S
lambda input NS
result NS
Or `or` Logical OR None project lhs S
rhs S
result S
lambda lhs NS
rhs NS
result NS
Pmod `pmod` Pmod None project lhs S S S S S S NS
rhs S S S S S S NS
result S S S S S S NS
lambda lhs NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
PosExplode `posexplode_outer`, `posexplode` Given an input array produces a sequence of rows for each value in the array. None project input S*
result PS* (missing nested BINARY, CALENDAR, MAP, UDT)
Pow `pow`, `power` lhs ^ rhs None project lhs S
rhs S
result S
lambda lhs NS
rhs NS
result NS
PromotePrecision PromotePrecision before arithmetic operations between DecimalType data None project input S*
result S*
lambda input NS
result NS
PythonUDF UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated. None aggregation param S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
result S S S S S S S S S* S NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, MAP) NS PS* (missing nested DECIMAL, NULL, BINARY, MAP)
reduction param S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
result S S S S S S S S S* S NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, MAP) NS PS* (missing nested DECIMAL, NULL, BINARY, MAP)
window param S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
result S S S S S S S S S* S NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, MAP) NS PS* (missing nested DECIMAL, NULL, BINARY, MAP)
project param S S S S S S S S S* S NS NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) NS
result S S S S S S S S S* S NS NS NS PS* (missing nested DECIMAL, NULL, BINARY, MAP) NS PS* (missing nested DECIMAL, NULL, BINARY, MAP)
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Quarter `quarter` Returns the quarter of the year for date, in the range 1 to 4 None project input S
result S
lambda input NS
result NS
Rand `random`, `rand` Generate a random column with i.i.d. uniformly distributed values in [0, 1) None project seed S S
result S
lambda seed NS NS
result NS
RegExpReplace `regexp_replace` RegExpReplace support for string literal input patterns None project str S
regex PS (very limited regex support; Literal value only)
rep PS (Literal value only)
result S
lambda str NS
regex NS
rep NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Remainder `%`, `mod` Remainder or modulo None project lhs S S S S S S NS
rhs S S S S S S NS
result S S S S S S NS
lambda lhs NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS
Rint `rint` Rounds up a double value to the nearest double equal to an integer None project input S
result S
lambda input NS
result NS
Round `round` Round an expression to d decimal places using HALF_UP rounding mode None project value S S S S PS (result may round slightly differently) PS (result may round slightly differently) S*
scale S
result S S S S S S S*
lambda value NS NS NS NS NS NS NS
scale NS
result NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
RowNumber `row_number` Window function that returns the index for the row within the aggregation window None window result S
ScalaUDF User Defined Function, support requires the UDF to implement a RAPIDS accelerated interface None project param S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
result S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Second `second` Returns the second component of the string/timestamp None project input S*
result S
lambda input NS
result NS
ShiftLeft `shiftleft` Bitwise shift left (<<) None project value S S
amount S
result S S
lambda value NS NS
amount NS
result NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ShiftRight `shiftright` Bitwise shift right (>>) None project value S S
amount S
result S S
lambda value NS NS
amount NS
result NS NS
ShiftRightUnsigned `shiftrightunsigned` Bitwise unsigned shift right (>>>) None project value S S
amount S
result S S
lambda value NS NS
amount NS
result NS NS
Signum `sign`, `signum` Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive None project input S
result S
lambda input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Sin `sin` Sine None project input S
result S
lambda input NS
result NS
Sinh `sinh` Hyperbolic sine None project input S
result S
lambda input NS
result NS
Size `size`, `cardinality` The size of an array or a map None project input PS* (missing nested BINARY, CALENDAR, UDT) PS* (missing nested BINARY, CALENDAR, UDT)
result S
lambda input NS NS
result NS
SortOrder Sort order None project input S S S S S S S S S* S S* S NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, STRUCT, UDT) NS
result S S S S S S S S S* S S* S NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, STRUCT, UDT) NS
SparkPartitionID `spark_partition_id` Returns the current partition id None project result S
lambda result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
SpecifiedWindowFrame Specification of the width of the group (or "frame") of input rows around which a window function is evaluated None project lower S S S S NS NS NS S
upper S S S S NS NS NS S
result S S S S NS NS NS S
Sqrt `sqrt` Square root None project input S
result S
lambda input NS
result NS
StartsWith Starts with None project src S
search PS (Literal value only)
result S
lambda src NS
search NS
result NS
StringLPad `lpad` Pad a string on the left None project str S
len PS (Literal value only)
pad PS (Literal value only)
result S
lambda str NS
len NS
pad NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringLocate `position`, `locate` Substring search operator None project substr PS (Literal value only)
str S
start PS (Literal value only)
result S
lambda substr NS
str NS
start NS
result NS
StringRPad `rpad` Pad a string on the right None project str S
len PS (Literal value only)
pad PS (Literal value only)
result S
lambda str NS
len NS
pad NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringReplace `replace` StringReplace operator None project src S
search PS (Literal value only)
replace PS (Literal value only)
result S
lambda src NS
search NS
replace NS
result NS
StringSplit `split` Splits `str` around occurrences that match `regex` None project str S
regexp PS (very limited subset of regex supported; Literal value only)
limit PS (Literal value only)
result S
lambda str NS
regexp NS
limit NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringTrim `trim` StringTrim operator None project src S
trimStr PS (Literal value only)
result S
lambda src NS
trimStr NS
result NS
StringTrimLeft `ltrim` StringTrimLeft operator None project src S
trimStr PS (Literal value only)
result S
lambda src NS
trimStr NS
result NS
StringTrimRight `rtrim` StringTrimRight operator None project src S
trimStr PS (Literal value only)
result S
lambda src NS
trimStr NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Substring `substr`, `substring` Substring operator None project str S NS
pos PS (Literal value only)
len PS (Literal value only)
result S NS
lambda str NS NS
pos NS
len NS
result NS NS
SubstringIndex `substring_index` substring_index operator None project str S
delim PS (only a single character is allowed; Literal value only)
count PS (Literal value only)
result S
lambda str NS
delim NS
count NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Subtract `-` Subtraction None project lhs S S S S S S S* NS
rhs S S S S S S S* NS
result S S S S S S S* NS
lambda lhs NS NS NS NS NS NS NS NS
rhs NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS
Tan `tan` Tangent None project input S
result S
lambda input NS
result NS
Tanh `tanh` Hyperbolic tangent None project input S
result S
lambda input NS
result NS
TimeAdd Adds interval to timestamp None project start S*
interval PS (month intervals are not supported; Literal value only)
result S*
lambda start NS
interval NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
TimeSub Subtracts interval from timestamp None project start S*
interval PS (months not supported; Literal value only)
result S*
lambda start NS
interval NS
result NS
ToDegrees `degrees` Converts radians to degrees None project input S
result S
lambda input NS
result NS
ToRadians `radians` Converts degrees to radians None project input S
result S
lambda input NS
result NS
ToUnixTimestamp `to_unix_timestamp` Returns the UNIX timestamp of the given time None project timeExp S S* S
format PS (A limited number of formats are supported; Literal value only)
result S
lambda timeExp NS NS NS
format NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
UnaryMinus `negative` Negate a numeric value None project input S S S S S S S* NS
result S S S S S S S* NS
lambda input NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS
UnaryPositive `positive` A numeric value with a + in front of it None project input S S S S S S S* NS
result S S S S S S S* NS
lambda input NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS
UnboundedFollowing$ Special boundary for a window frame, indicating all rows preceding the current row None project result S
UnboundedPreceding$ Special boundary for a window frame, indicating all rows preceding the current row None project result S
UnixTimestamp `unix_timestamp` Returns the UNIX timestamp of current or specified time None project timeExp S S* S
format PS (A limited number of formats are supported; Literal value only)
result S
lambda timeExp NS NS NS
format NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
UnscaledValue Convert a Decimal to an unscaled long value for some aggregation optimizations None project input S*
result S
Upper `upper`, `ucase` String uppercase operator This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 project input S
result S
lambda input NS
result NS
WeekDay `weekday` Returns the day of the week (0 = Monday...6=Sunday) None project input S
result S
lambda input NS
result NS
WindowExpression Calculates a return value for every input row of a table based on a group (or "window") of rows None window windowFunction S S S S S S S S S* S S* NS NS NS PS* (missing nested NULL, BINARY, CALENDAR, MAP, UDT) NS NS NS
windowSpec S S S S NS NS S* S
result S S S S S S S S S* S S* NS NS NS PS* (missing nested NULL, BINARY, CALENDAR, MAP, UDT) NS NS NS
WindowSpecDefinition Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window None project partition S S S S S S S S S* S S* NS NS NS NS NS NS NS
value S S S S S S S S S* S S* NS NS NS NS NS NS NS
result S S S S S S S S S* S S* NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Year `year` Returns the year from a date or timestamp None project input S
result S
lambda input NS
result NS
AggregateExpression Aggregate expression None aggregation aggFunc S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
filter S
result S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
reduction aggFunc S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
filter S
result S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
window aggFunc S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
filter S
result S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT) NS NS NS
Average `avg`, `mean` Average aggregate operator None aggregation input S S S S S S NS
result S NS
reduction input S S S S S S NS
result S NS
window input S S S S S S NS
result S NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CollectList `collect_list` Collect a list of elements, now only supported by windowing. None aggregation input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
reduction input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS
window input S S S S S S S S S* S S* NS NS NS NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
result PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, UDT)
Count `count` Count aggregate operator None aggregation input S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
result S
reduction input S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
result S
window input S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS
result S
First `first_value`, `first` first aggregate operator None aggregation input S S S S S S S S S* S NS S NS NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS NS
reduction input S S S S S S S S S* S NS S NS NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS NS
window input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Last `last`, `last_value` last aggregate operator None aggregation input S S S S S S S S S* S NS S NS NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS NS
reduction input S S S S S S S S S* S NS S NS NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS NS
window input NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
Max `max` Max aggregate operator None aggregation input S S S S S S S S S* S NS S NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS
reduction input S S S S S S S S S* S NS S NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS
window input S S S S S S S S S* S S* S NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS
Min `min` Min aggregate operator None aggregation input S S S S S S S S S* S NS S NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS
reduction input S S S S S S S S S* S NS S NS NS NS NS NS
result S S S S S S S S S* S NS S NS NS NS NS NS
window input S S S S S S S S S* S S* S NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
PivotFirst PivotFirst operator None aggregation pivotColumn S S S S S S S S S* S S* S NS NS NS NS NS NS
valueColumn S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS NS
reduction pivotColumn S S S S S S S S S* S S* S NS NS NS NS NS NS
valueColumn S S S S S S S S S* S S* S NS NS NS NS NS NS
result S S S S S S S S S* S S* S NS NS PS* (missing nested NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS NS NS
Sum `sum` Sum aggregate operator None aggregation input S S S S S S NS
result S S NS
reduction input S S S S S S NS
result S S NS
window input S S S S S S S*
result S S S*
NormalizeNaNAndZero Normalize NaN and zero None project input S S
result S S
lambda input NS NS
result NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ScalarSubquery Subquery that will return only one row and one column None project result S S S S S S S S S* S S* S
HiveGenericUDF Hive Generic UDF, support requires the UDF to implement a RAPIDS accelerated interface None project param S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
result S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
HiveSimpleUDF Hive UDF, support requires the UDF to implement a RAPIDS accelerated interface None project param S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
result S S S S S S S S S* S S* S S S PS* (missing nested UDT) PS* (missing nested UDT) PS* (missing nested UDT) NS
lambda param NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
result NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
  • as was state previously Decimal is only supported up to a precision of 18 and Timestamp is only supported in the UTC time zone. Decimals are off by default due to performance impact in some cases.

Casting

The above table does not show what is and is not supported for cast. This table shows the matrix of supported casts. Nested types like MAP, Struct, and Array can only be cast if the child types can be cast.

Some of the casts to/from string on the GPU are not 100% the same and are disabled by default. Please see the configs for more details on these specific cases.

Please note that even though casting from one type to another is supported by Spark it does not mean they all produce usable results. For example casting from a date to a boolean always produces a null. This is for Hive compatibility and the accelerator produces the same result.

AnsiCast

TO
BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
FROM BOOLEAN S S S S S S S S* S NS
BYTE S S S S S S S S* S S* S
SHORT S S S S S S S S* S S* S
INT S S S S S S S S* S S* S
LONG S S S S S S S S* S S* S
FLOAT S S S S S S S S* S S*
DOUBLE S S S S S S S S* S S*
DATE S S S S S S S S S* S NS
TIMESTAMP S S S S S S S S S* S NS
STRING S S S S S S S S S* S S* S NS
DECIMAL NS NS NS NS NS NS NS NS S S*
NULL S S S S S S S S S* S NS S NS NS NS NS NS NS
BINARY NS NS
CALENDAR NS NS
ARRAY NS PS (missing nested BOOLEAN, BYTE, SHORT, LONG, DATE, TIMESTAMP, STRING, DECIMAL, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT)
MAP NS NS
STRUCT PS (the struct's children must also support being cast to string) NS
UDT NS NS

Cast

TO
BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
FROM BOOLEAN S S S S S S S S* S NS
BYTE S S S S S S S S* S S* S
SHORT S S S S S S S S* S S* S
INT S S S S S S S S* S S* S
LONG S S S S S S S S* S S* S
FLOAT S S S S S S S S* S S*
DOUBLE S S S S S S S S* S S*
DATE S S S S S S S S S* S NS
TIMESTAMP S S S S S S S S S* S NS
STRING S S S S S S S S S* S S* S NS
DECIMAL NS NS NS NS NS NS NS NS S S*
NULL S S S S S S S S S* S NS S NS NS NS NS NS NS
BINARY NS NS
CALENDAR NS NS
ARRAY NS PS (missing nested BOOLEAN, BYTE, SHORT, LONG, DATE, TIMESTAMP, STRING, DECIMAL, NULL, BINARY, CALENDAR, MAP, STRUCT, UDT)
MAP NS NS
STRUCT PS (the struct's children must also support being cast to string) NS
UDT NS NS

Partitioning

When transferring data between different tasks the data is partitioned in specific ways depending on requirements in the plan. Be aware that the types included below are only for rows that impact where the data is partitioned. So for example if we are doing a join on the column a the data would be hash partitioned on a, but all of the other columns in the same data frame as a don’t show up in the table. They are controlled by the rules for ShuffleExchangeExec which uses the Partitioning.

</table> ## Input/Output For Input and Output it is not cleanly exposed what types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled in some cases for either reads or writes because of processing limitations, like rebasing dates or timestamps, or for a lack of type coercion support.
Partition Description Notes Param BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
HashPartitioning Hash based partitioning None hash_key S S S S S S S S S* S S* S NS NS NS NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, UDT) NS
RangePartitioning Range partitioning None order_key S S S S S S S S S* S S* S NS NS NS PS* (Only supported for a single partition; missing nested BINARY, CALENDAR, ARRAY, STRUCT, UDT) NS
RoundRobinPartitioning Round robin partitioning None
SinglePartition$ Single partitioning None
Format Direction BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CSV Read S S S S S S S S S* S NS NS
Write NS NS NS NS NS NS NS NS NS NS NS NS
ORC Read S S S S S S S S S* S NS NS NS NS NS NS
Write S S S S S S S S S* S NS NS NS NS NS NS
Parquet Read S S S S S S S S S* S S* NS PS* (missing nested BINARY, UDT) PS* (missing nested BINARY, UDT) PS* (missing nested BINARY, UDT) NS
Write S S S S S S S S S* S S* NS NS NS PS* (missing nested BINARY, ARRAY, MAP, UDT) NS