Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This tries to document what operations are supported and what data types each operation supports. Because Apache Spark is under active development too and this document was generated against version 3.1.1 of Spark. Most of this should still apply to other versions of Spark, but there may be slight changes.

General limitations

Decimal

The Decimal type in Spark supports a precision up to 38 digits (128-bits). The RAPIDS Accelerator supports 128-bit starting from version 21.12 and decimals are enabled by default. Please check Decimal Support for more details.

Decimal precision and scale follow the same rule as CPU mode in Apache Spark:

 * In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2
 * respectively, then the following operations have the following precision / scale:
 *
 *   Operation    Result Precision                        Result Scale
 *   ------------------------------------------------------------------------
 *   e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
 *   e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
 *   e1 * e2      p1 + p2 + 1                             s1 + s2
 *   e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
 *   e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
 *   e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)

However, Spark inserts PromotePrecision to CAST both sides to the same type. GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits. For example, Decimal(8,2) x Decimal(6,3) resulting in Decimal (15,5) runs on CPU, because due to PromotePrecision, GPU mode assumes the result is Decimal(19,6). There are even extreme cases where Spark can temporarily return a Decimal value larger than what can be stored in 128-bits and then uses the CheckOverflow operator to round it to a desired precision and scale. This means that even when the accelerator supports 128-bit decimal, we might not be able to support all operations that Spark can support.

Timestamp

Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.

CalendarInterval

In Spark CalendarIntervals store three values, months, days, and microseconds. Support for this type is still very limited in the accelerator. In some cases only a a subset of the type is supported, like window ranges only support days currently.

Configuration

There are lots of different configuration values that can impact if an operation is supported or not. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and those are a bit harder to document. The work of updating this to cover that support is still ongoing.

In general though if you ever have any question about why an operation is not running on the GPU you may set spark.rapids.sql.explain to ALL and it will try to give all of the reasons why this particular operator or expression is on the CPU or GPU.

Key

Types

Type Name Type Description
BOOLEAN Holds true or false values.
BYTE Signed 8-bit integer value.
SHORT Signed 16-bit integer value.
INT Signed 32-bit integer value.
LONG Signed 64-bit integer value.
FLOAT 32-bit floating point value.
DOUBLE 64-bit floating point value.
DATE A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970.
TIMESTAMP A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone.
STRING A text string. Stored as UTF-8 encoded bytes.
DECIMAL A fixed point decimal value with configurable precision and scale.
NULL Only stores null values and is typically only used when no other type can be determined from the SQL.
BINARY An array of non-nullable bytes.
CALENDAR Represents a period of time. Stored as months, days and microseconds.
ARRAY A sequence of elements.
MAP A set of key value pairs, the keys cannot be null.
STRUCT A series of named fields.
UDT User defined types and java Objects. These are not standard SQL types.

Support

Value Description
S (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully.
  (Not Applicable) Neither Spark not the RAPIDS Accelerator support this type in this situation.
PS (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this.
NS (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not.

SparkPlan or Executor Nodes

Apache Spark uses a Directed Acyclic Graph(DAG) of processing to build a query. The nodes in this graph are instances of SparkPlan and represent various high level operations like doing a filter or project. The operations that the RAPIDS Accelerator supports are described below.

Executor Description Notes Param(s) BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CoalesceExec The backend for the dataframe coalesce method None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
CollectLimitExec Reduce to single partition and apply limit This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
ExpandExec The backend for the expand operator None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
FileSourceScanExec Reading data from files, often from Hive tables None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
FilterExec The backend for most filter statements None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
GenerateExec The backend for operations that generate more output rows than input rows like explode None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
GlobalLimitExec Limiting of results across partitions None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
LocalLimitExec Per-partition limiting of results None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
ProjectExec The backend for most select, withColumn and dropColumn statements None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
RangeExec The backend for range operator None Input/Output S
SampleExec The backend for the sample operator None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
SortExec The backend for the sort operator None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
SubqueryBroadcastExec Plan to collect and transform the broadcast key values None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
TakeOrderedAndProjectExec Take the first limit elements as defined by the sortOrder, and do projection if needed None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
UnionExec The backend for the union operator None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
Executor Description Notes Param(s) BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CustomShuffleReaderExec A wrapper of shuffle query stage None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
HashAggregateExec The backend for hash based aggregations None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
not allowed for grouping expressions
NS PS
not allowed for grouping expressions if containing Struct as child;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions if containing Array, Map, or Binary as child;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
ObjectHashAggregateExec The backend for hash based aggregations supporting TypedImperativeAggregate functions None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
not allowed for grouping expressions and only allowed when aggregate buffers can be converted between CPU and GPU
NS PS
not allowed for grouping expressions;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions if containing Array, Map, or Binary as child;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
SortAggregateExec The backend for sort based aggregations None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
not allowed for grouping expressions and only allowed when aggregate buffers can be converted between CPU and GPU
NS PS
not allowed for grouping expressions;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
not allowed for grouping expressions if containing Array, Map, or Binary as child;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
InMemoryTableScanExec Implementation of InMemoryTableScanExec to use GPU accelerated caching None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
NS
DataWritingCommandExec Writing data None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S PS
128bit decimal only supported for Orc and Parquet
NS S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
NS
ExecutedCommandExec Eagerly executed commands None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
BatchScanExec The backend for most file input None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, CALENDAR, UDT
NS
BroadcastExchangeExec The backend for broadcast exchange of data None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
NS
ShuffleExchangeExec The backend for most data being exchanged between processes None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
BroadcastHashJoinExec Implementation of join using broadcast data None leftKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rightKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
condition S
Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
BroadcastNestedLoopJoinExec Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported None condition
(A non-inner join only is supported if the condition expression can be converted to a GPU AST expression)
S
Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Executor Description Notes Param(s) BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CartesianProductExec Implementation of join using brute force None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
ShuffledHashJoinExec Implementation of join using hashed shuffled data None leftKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rightKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
condition S
Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
SortMergeJoinExec Sort merge join, replacing with shuffled hash join None leftKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rightKeys S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
condition S
Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
AggregateInPandasExec The backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
ArrowEvalPythonExec The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
FlatMapCoGroupsInPandasExec The backend for CoGrouped Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. This is disabled by default because Performance is not ideal with many small groups Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
FlatMapGroupsInPandasExec The backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
MapInPandasExec The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
WindowInPandasExec The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame. This is disabled by default because it only supports row based frame for now Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS NS NS
Executor Description Notes Param(s) BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
WindowExec Window-operator backend None partitionSpec S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS
Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
HiveTableScanExec Scan Exec to read Hive delimited text tables None Input/Output S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS NS NS NS NS NS NS

Expression and SQL Functions

Inside each node in the DAG there can be one or more trees of expressions that describe various types of processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. These expressions also can happen in different contexts. Because of how the accelerator works different contexts have different levels of support.

The most common expression context is project. In this context values from a single input row go through the expression and the result will also be use to produce something in the same row. Be aware that even in the case of aggregation and window operations most of the processing is still done in the project context either before or after the other processing happens.

Aggregation operations like count or sum can take place in either the aggregation, reduction, or window context. aggregation is when the operation was done while grouping the data by one or more keys. reduction is when there is no group by and there is a single result for an entire column. window is for window operations.

The final expression context is AST or Abstract Syntax Tree. Before explaining AST we first need to explain in detail how project context operations work. Generally for a project context operation the plan Spark developed is read on the CPU and an appropriate set of GPU kernels are selected to do those operations. For example a >= b + 1. Would result in calling a GPU kernel to add 1 to b, followed by another kernel that is called to compare a to that result. The interpretation is happening on the CPU, and the GPU is used to do the processing. For AST the interpretation for some reason cannot happen on the CPU and instead must be done in the GPU kernel itself. An example of this is conditional joins. If you want to join on A.a >= B.b + 1 where A and B are separate tables or data frames, the + and >= operations cannot run as separate independent kernels because it is done on a combination of rows in both A and B. Instead part of the plan that Spark developed is turned into an abstract syntax tree and sent to the GPU where it can be interpreted. The number and types of operations supported in this are limited.

Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Abs `abs` Absolute value None project input S S S S S S S
result S S S S S S S
AST input NS NS S S S S NS
result NS NS S S S S NS
Acos `acos` Inverse cosine None project input S
result S
AST input S
result S
Acosh `acosh` Inverse hyperbolic cosine None project input S
result S
AST input S
result S
Add `+` Addition None project lhs S S S S S S S NS
rhs S S S S S S S NS
result S S S S S S S NS
AST lhs NS NS S S S S NS NS
rhs NS NS S S S S NS NS
result NS NS S S S S NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Alias Gives a column a name None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
AST input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
And `and` Logical AND None project lhs S
rhs S
result S
AST lhs S
rhs S
result S
ArrayContains `array_contains` Returns a boolean if the array contains the passed in key None project array PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
key S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
result S
ArrayExcept `array_except` Returns an array of the elements in array1 but not in array2, without duplicates This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ project array1 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
array2 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ArrayExists `exists` Return true if any element satisfies the predicate LambdaFunction None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S
result S
ArrayFilter `filter` Filter an input array using a given predicate None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
ArrayIntersect `array_intersect` Returns an array of the elements in the intersection of array1 and array2, without duplicates This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ project array1 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
array2 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
ArrayMax `array_max` Returns the maximum value in the array None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
ArrayMin `array_min` Returns the minimum value in the array None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
ArrayRemove `array_remove` Returns the array after removing all elements that equal to the input element (right) from the input array (left) None project array NS NS NS NS NS NS NS NS NS NS NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS NS NS
element S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ArrayRepeat `array_repeat` Returns the array containing the given input value (left) count (right) times None project left S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
right S S S S
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
ArrayTransform `transform` Transform elements in an array using the transform function. This is similar to a `map` in functional programming None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
ArrayUnion `array_union` Returns an array of the elements in the union of array1 and array2, without duplicates. This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ project array1 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
array2 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
ArraysOverlap `arrays_overlap` Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise. This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ project array1 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
array2 PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
result S
ArraysZip `arrays_zip` Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. None project children PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
Ascii `ascii` The numeric value of the first character of string data. This is disabled by default because it only supports strings starting with ASCII or Latin-1 characters after Spark 3.2.3, 3.3.1 and 3.4.0. Otherwise the results will not match the CPU. project input S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Asin `asin` Inverse sine None project input S
result S
AST input S
result S
Asinh `asinh` Inverse hyperbolic sine None project input S
result S
AST input S
result S
AtLeastNNonNulls Checks if number of non null/Nan values is greater than a given value None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S
Atan `atan` Inverse tangent None project input S
result S
AST input S
result S
Atanh `atanh` Inverse hyperbolic tangent None project input S
result S
AST input S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
AttributeReference References an input column None project result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
AST result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
BRound `bround` Round an expression to d decimal places using HALF_EVEN rounding mode None project value S S S S PS
result may round slightly differently
PS
result may round slightly differently
S
scale S
result S S S S S S S
BitLength `bit_length` The bit length of string data None project input S NS
result S
BitwiseAnd `&` Returns the bitwise AND of the operands None project lhs S S S S
rhs S S S S
result S S S S
AST lhs NS NS S S
rhs NS NS S S
result NS NS S S
BitwiseNot `~` Returns the bitwise NOT of the operands None project input S S S S
result S S S S
AST input NS NS S S
result NS NS S S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
BitwiseOr `\|` Returns the bitwise OR of the operands None project lhs S S S S
rhs S S S S
result S S S S
AST lhs NS NS S S
rhs NS NS S S
result NS NS S S
BitwiseXor `^` Returns the bitwise XOR of the operands None project lhs S S S S
rhs S S S S
result S S S S
AST lhs NS NS S S
rhs NS NS S S
result NS NS S S
BoundReference Reference to a bound variable None project result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
AST result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
CaseWhen `when` CASE WHEN expression None project predicate S
value S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Cbrt `cbrt` Cube root None project input S
result S
AST input S
result S
Ceil `ceil`, `ceiling` Ceiling of a number None project input S S S
result S S S
CheckOverflow CheckOverflow after arithmetic operations between DecimalType data None project input S
result S
Coalesce `coalesce` Returns the first non-null argument if exists. Otherwise, null None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Concat `concat` List/String concatenate None project input S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
ConcatWs `concat_ws` Concatenates multiple input strings or array of strings into a single string using a given separator None project input S S
result S
Contains Contains None project src S
search PS
Literal value only
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Conv `conv` Convert string representing a number from one base to another This is disabled by default because GPU implementation is incomplete. We currently only support from/to_base values of 10 and 16. We fall back on CPU if the signed conversion is signalled via a negative to_base. GPU implementation does not check for an 64-bit signed/unsigned int overflow when performing the conversion to return `FFFFFFFFFFFFFFFF` or `18446744073709551615` or to throw an error in the ANSI mode. It is safe to enable if the overflow is not possible or detected externally. For instance decimal strings not longer than 18 characters / hexadecimal strings not longer than 15 characters disregarding the sign cannot cause an overflow. project num S
from_base PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
to_base PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
PS
only values 10 and 16 are supported;
Literal value only
result S
Cos `cos` Cosine None project input S
result S
AST input S
result S
Cosh `cosh` Hyperbolic cosine None project input S
result S
AST input S
result S
Cot `cot` Cotangent None project input S
result S
AST input S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CreateArray `array` Returns an array with the given elements None project arg S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, MAP, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, MAP, UDT
CreateMap `map` Create a map None project key S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
value S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
CreateNamedStruct `named_struct`, `struct` Creates a struct with the given field names and values None project name S
value S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
CurrentRow$ Special boundary for a window frame, indicating stopping at the current row None project result S
DateAdd `date_add` Returns the date that is num_days after start_date None project startDate S
days S S S
result S
DateAddInterval Adds interval to date None project start S
interval PS
month intervals are not supported;
Literal value only
result S
DateDiff `datediff` Returns the number of days from startDate to endDate None project lhs S
rhs S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
DateFormatClass `date_format` Converts timestamp to a value of string in the format specified by the date format None project timestamp PS
UTC is only supported TZ for TIMESTAMP
strfmt PS
A limited number of formats are supported;
Literal value only
result S
DateSub `date_sub` Returns the date that is num_days before start_date None project startDate S
days S S S
result S
DayOfMonth `day`, `dayofmonth` Returns the day of the month from a date or timestamp None project input S
result S
DayOfWeek `dayofweek` Returns the day of the week (1 = Sunday...7=Saturday) None project input S
result S
DayOfYear `dayofyear` Returns the day of the year from a date or timestamp None project input S
result S
DenseRank `dense_rank` Window function that returns the dense rank value within the aggregation window None window ordering S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S
Divide `/` Division None project lhs S S
rhs S S
result S S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
DynamicPruningExpression Dynamic pruning expression marker None project input S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
ElementAt `element_at` Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map. None project array/map PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
If it's map, only primitive key types are supported.;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
index/key PS
Unsupported as array index.
PS
Unsupported as array index.
PS
Unsupported as array index.
S PS
Unsupported as array index.
PS
Unsupported as array index.
PS
Unsupported as array index.
PS
Unsupported as array index.
PS
Unsupported as array index.;
UTC is only supported TZ for TIMESTAMP
PS
Unsupported as array index.
PS
Unsupported as array index.
NS NS NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
EndsWith Ends with None project src S
search PS
Literal value only
result S
EqualNullSafe `<=>` Check if the values are equal including nulls <=> None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
EqualTo `==`, `=` Check if the values are equal None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
AST lhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
rhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Exp `exp` Euler's number e raised to a power None project input S
result S
AST input S
result S
Explode `explode_outer`, `explode` Given an input array produces a sequence of rows for each value in the array None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
Expm1 `expm1` Euler's number e raised to a power minus 1 None project input S
result S
AST input S
result S
Flatten `flatten` Creates a single array from an array of arrays None project input PS
UTC is only supported TZ for child TIMESTAMP
result PS
UTC is only supported TZ for child TIMESTAMP
Floor `floor` Floor of a number None project input S S S
result S S S
FormatNumber `format_number` Formats the number x like '#,###,###.##', rounded to d decimal places. None project x S S S S S S S
d PS
Literal value only
NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
FromUTCTimestamp `from_utc_timestamp` Render the input UTC timestamp in the input timezone None project timestamp PS
UTC is only supported TZ for TIMESTAMP
timezone PS
Only non-DST(Daylight Savings Time) timezones are supported
result PS
UTC is only supported TZ for TIMESTAMP
FromUnixTime `from_unixtime` Get the string from a unix timestamp None project sec S
format PS
Only a limited number of formats are supported;
Literal value only
result S
GetArrayItem Gets the field at `ordinal` in the Array None project array PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
ordinal S S S S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
GetArrayStructFields Extracts the `ordinal`-th fields of all array elements for the data with the type of array of struct None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
GetJsonObject `get_json_object` Extracts a json object from path This is disabled by default because Experimental feature that could be unstable or have performance issues. project json S
path PS
Literal value only
result S
GetMapValue Gets Value from a Map based on a key None project map PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
key S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS NS NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
GetStructField Gets the named field of the struct None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
GetTimestamp Gets timestamps from strings using given pattern. None project timeExp S PS
UTC is only supported TZ for TIMESTAMP
S
format PS
A limited number of formats are supported;
Literal value only
result PS
UTC is only supported TZ for TIMESTAMP
GreaterThan `>` > operator None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
AST lhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
rhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
result S
GreaterThanOrEqual `>=` >= operator None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
AST lhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
rhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Greatest `greatest` Returns the greatest value of all parameters, skipping null values None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
Hour `hour` Returns the hour component of the string/timestamp None project input PS
UTC is only supported TZ for TIMESTAMP
result S
Hypot `hypot` Pythagorean addition (Hypotenuse) of real numbers None project lhs S
rhs S
result S
If `if` IF expression None project predicate S
trueValue S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
falseValue S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
In `in` IN operator None project value S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
list PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
UTC is only supported TZ for TIMESTAMP;
Literal value only
PS
Literal value only
PS
Literal value only
NS NS NS NS NS NS
result S
InSet INSET operator None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
InitCap `initcap` Returns str with the first letter of each word in uppercase. All other letters are in lowercase This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. project input S
result S
InputFileBlockLength `input_file_block_length` Returns the length of the block being read, or -1 if not available None project result S
InputFileBlockStart `input_file_block_start` Returns the start offset of the block being read, or -1 if not available None project result S
InputFileName `input_file_name` Returns the name of the file being read, or empty string if not available None project result S
IntegralDivide `div` Division with a integer result None project lhs S S
rhs S S
result S
IsNaN `isnan` Checks if a value is NaN None project input S S
result S
IsNotNull `isnotnull` Checks if a value is not null None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S
IsNull `isnull` Checks if a value is null None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S
JsonToStructs `from_json` Returns a struct value with the given `jsonStr` and `schema` This is disabled by default because it is currently in beta and undergoes continuous enhancements. Please consult the [compatibility documentation](../compatibility.md#json-supporting-types) to determine whether you can enable this configuration for your use case project jsonStr S
result NS PS
MAP only supports keys and values that are of STRING type;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, MAP, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, MAP, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
JsonTuple `json_tuple` Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string. This is disabled by default because Experimental feature that could be unstable or have performance issues. project json S
field PS
Literal value only
result S
KnownFloatingPointNormalized Tag to prevent redundant normalization None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
KnownNotNull Tag an expression as known to not be null None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
NS
Lag `lag` Window function that returns N entries behind this one None window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
offset S
default S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
LambdaFunction Holds a higher order SQL function None project function S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
arguments S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
LastDay `last_day` Returns the last day of the month which the date belongs to None project input S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Lead `lead` Window function that returns N entries ahead of this one None window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
offset S
default S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
Least `least` Returns the least value of all parameters, skipping null values None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
Length `char_length`, `character_length`, `length` String character length or binary byte length None project input S NS
result S
LessThan `<` < operator None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
AST lhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
rhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
result S
LessThanOrEqual `<=` <= operator None project lhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
rhs S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S
AST lhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
rhs S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Like `like` Like None project src S
search PS
Literal value only
result S
Literal Holds a static value from the query None project result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
AST result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS NS NS NS NS
Log `ln` Natural log None project input S
result S
Log10 `log10` Log base 10 None project input S
result S
Log1p `log1p` Natural log 1 + expr None project input S
result S
Log2 `log2` Log base 2 None project input S
result S
Logarithm `log` Log variable base None project value S
base S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Lower `lcase`, `lower` String lowercase operator This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. project input S
result S
MakeDecimal Create a Decimal from an unscaled long value for some aggregation optimizations None project input S
result PS
max DECIMAL precision of 18
MapConcat `map_concat` Returns the union of all the given maps None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
MapEntries `map_entries` Returns an unordered array of all entries in the given map None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
MapFilter `map_filter` Filters entries in a map using the function None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
MapKeys `map_keys` Returns an unordered array containing the keys of the map None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
MapValues `map_values` Returns an unordered array containing the values of the map None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Md5 `md5` MD5 hash operator None project input S
result S
MicrosToTimestamp `timestamp_micros` Converts the number of microseconds from unix epoch to a timestamp None project input S S S S
result PS
UTC is only supported TZ for TIMESTAMP
MillisToTimestamp `timestamp_millis` Converts the number of milliseconds from unix epoch to a timestamp None project input S S S S
result PS
UTC is only supported TZ for TIMESTAMP
Minute `minute` Returns the minute component of the string/timestamp None project input PS
UTC is only supported TZ for TIMESTAMP
result S
MonotonicallyIncreasingID `monotonically_increasing_id` Returns monotonically increasing 64-bit integers None project result S
Month `month` Returns the month from a date or timestamp None project input S
result S
Multiply `*` Multiplication None project lhs S S S S S S S
rhs S S S S S S S
result S S S S S S S
AST lhs NS NS S S S S NS
rhs NS NS S S S S NS
result NS NS S S S S NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Murmur3Hash `hash` Murmur3 hash operator None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
Arrays of structs are not supported;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result S
NaNvl `nanvl` Evaluates to `left` iff left is not NaN, `right` otherwise None project lhs S S
rhs S S
result S S
NamedLambdaVariable A parameter to a higher order SQL function None project result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
Not `!`, `not` Boolean not operator None project input S
result S
AST input S
result S
NthValue `nth_value` nth window operator None window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
offset S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
OctetLength `octet_length` The byte length of string data None project input S NS
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Or `or` Logical OR None project lhs S
rhs S
result S
AST lhs S
rhs S
result S
ParseUrl `parse_url` Extracts a part from a URL None project url S
partToExtract PS
only support partToExtract = PROTOCOL | HOST | QUERY | PATH;
Literal value only
key S
result S
PercentRank `percent_rank` Window function that returns the percent rank value within the aggregation window None window ordering S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S
Pmod `pmod` Pmod None project lhs S S S S S S PS
decimals with precision 38 are not supported;
max DECIMAL precision of 18
rhs S S S S S S NS
result S S S S S S NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
PosExplode `posexplode_outer`, `posexplode` Given an input array produces a sequence of rows for each value in the array None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
Pow `pow`, `power` lhs ^ rhs None project lhs S
rhs S
result S
AST lhs S
rhs S
result S
PreciseTimestampConversion Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing None project input S PS
UTC is only supported TZ for TIMESTAMP
result S PS
UTC is only supported TZ for TIMESTAMP
PromotePrecision PromotePrecision before arithmetic operations between DecimalType data None project input S
result S
PythonUDF UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated None aggregation param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
reduction param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
window param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Quarter `quarter` Returns the quarter of the year for date, in the range 1 to 4 None project input S
result S
RLike `rlike` Regular expression version of Like None project str S
regexp PS
Literal value only
result S
RaiseError `raise_error` Throw an exception None project input S
result S
Rand `rand`, `random` Generate a random column with i.i.d. uniformly distributed values in [0, 1) None project seed S S
result S
Rank `rank` Window function that returns the rank value within the aggregation window None window ordering S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S
RegExpExtract `regexp_extract` Extract a specific group identified by a regular expression None project str S
regexp PS
Literal value only
idx S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
RegExpExtractAll `regexp_extract_all` Extract all strings matching a regular expression corresponding to the regex group index None project str S
regexp PS
Literal value only
idx PS
Literal value only
result S
RegExpReplace `regexp_replace` String replace using a regular expression pattern None project regex PS
Literal value only
result S
pos PS
only a value of 1 is supported
str S
rep PS
Literal value only
Remainder `%`, `mod` Remainder or modulo None project lhs S S S S S S S
rhs S S S S S S S
result S S S S S S S
ReplicateRows Given an input row replicates the row N times None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
Reverse `reverse` Returns a reversed string or an array with reverse order of elements None project input S PS
UTC is only supported TZ for child TIMESTAMP
result S PS
UTC is only supported TZ for child TIMESTAMP
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Rint `rint` Rounds up a double value to the nearest double equal to an integer None project input S
result S
AST input S
result S
Round `round` Round an expression to d decimal places using HALF_UP rounding mode None project value S S S S PS
result may round slightly differently
PS
result may round slightly differently
S
scale S
result S S S S S S S
RowNumber `row_number` Window function that returns the index for the row within the aggregation window None window result S
ScalaUDF User Defined Function, the UDF can choose to implement a RAPIDS accelerated interface to get better performance. None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
Second `second` Returns the second component of the string/timestamp None project input PS
UTC is only supported TZ for TIMESTAMP
result S
SecondsToTimestamp `timestamp_seconds` Converts the number of seconds from unix epoch to a timestamp None project input S S S S S S S
result PS
UTC is only supported TZ for TIMESTAMP
Sequence `sequence` Sequence None project start S S S S NS NS
stop S S S S NS NS
step S S S S NS
result PS
unsupported child types DATE, TIMESTAMP
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ShiftLeft `shiftleft` Bitwise shift left (<<) None project value S S
amount S
result S S
ShiftRight `shiftright` Bitwise shift right (>>) None project value S S
amount S
result S S
ShiftRightUnsigned `shiftrightunsigned` Bitwise unsigned shift right (>>>) None project value S S
amount S
result S S
Signum `sign`, `signum` Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive None project input S
result S
Sin `sin` Sine None project input S
result S
AST input S
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Sinh `sinh` Hyperbolic sine None project input S
result S
AST input S
result S
Size `cardinality`, `size` The size of an array or a map None project input PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
result S
SortArray `sort_array` Returns a sorted array with the input array and the ascending / descending order None project array PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
ascendingOrder S
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
SortOrder Sort order None project input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
STRUCT is not supported as a child type for ARRAY;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
STRUCT is not supported as a child type for ARRAY;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
SparkPartitionID `spark_partition_id` Returns the current partition id None project result S
SpecifiedWindowFrame Specification of the width of the group (or "frame") of input rows around which a window function is evaluated None project lower S S S S S S S S
upper S S S S S S S S
result S S S S NS NS NS S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Sqrt `sqrt` Square root None project input S
result S
AST input S
result S
Stack `stack` Separates expr1, ..., exprk into n rows. None project n PS
Literal value only
expr S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
StartsWith Starts with None project src S
search PS
Literal value only
result S
StringInstr `instr` Instr string operator None project str S
substr PS
Literal value only
result S
StringLPad `lpad` Pad a string on the left None project str S
len PS
Literal value only
pad PS
Literal value only
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringLocate `locate`, `position` Substring search operator None project substr PS
Literal value only
str S
start PS
Literal value only
result S
StringRPad `rpad` Pad a string on the right None project str S
len PS
Literal value only
pad PS
Literal value only
result S
StringRepeat `repeat` StringRepeat operator that repeats the given strings with numbers of times given by repeatTimes None project input S
repeatTimes S
result S
StringReplace `replace` StringReplace operator None project src S
search PS
Literal value only
replace PS
Literal value only
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringSplit `split` Splits `str` around occurrences that match `regex` None project str S
regexp PS
very limited subset of regex supported;
Literal value only
limit PS
Literal value only
result S
StringToMap `str_to_map` Creates a map after splitting the input string into pairs of key-value strings None project str S
pairDelim S
keyValueDelim S
result S
StringTranslate `translate` StringTranslate operator This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0, translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094) project input S
from PS
Literal value only
to PS
Literal value only
result S
StringTrim `trim` StringTrim operator None project src S
trimStr PS
Literal value only
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StringTrimLeft `ltrim` StringTrimLeft operator None project src S
trimStr PS
Literal value only
result S
StringTrimRight `rtrim` StringTrimRight operator None project src S
trimStr PS
Literal value only
result S
StructsToJson `to_json` Converts structs to JSON text format This is disabled by default because it is currently in beta and undergoes continuous enhancements. Please consult the [compatibility documentation](../compatibility.md#json-supporting-types) to determine whether you can enable this configuration for your use case project struct S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
result S
Substring `substr`, `substring` Substring operator None project str S NS
pos S
len S
result S NS
SubstringIndex `substring_index` substring_index operator None project str S
delim PS
only a single character is allowed;
Literal value only
count PS
Literal value only
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Subtract `-` Subtraction None project lhs S S S S S S S NS
rhs S S S S S S S NS
result S S S S S S S NS
AST lhs NS NS S S S S NS NS
rhs NS NS S S S S NS NS
result NS NS S S S S NS NS
Tan `tan` Tangent None project input S
result S
AST input S
result S
Tanh `tanh` Hyperbolic tangent None project input S
result S
AST input S
result S
TimeAdd Adds interval to timestamp None project start PS
UTC is only supported TZ for TIMESTAMP
interval PS
month intervals are not supported;
Literal value only
result PS
UTC is only supported TZ for TIMESTAMP
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ToDegrees `degrees` Converts radians to degrees None project input S
result S
ToRadians `radians` Converts degrees to radians None project input S
result S
ToUTCTimestamp `to_utc_timestamp` Render the input timestamp in UTC None project timestamp PS
UTC is only supported TZ for TIMESTAMP
timezone PS
Only non-DST(Daylight Savings Time) timezones are supported
result PS
UTC is only supported TZ for TIMESTAMP
ToUnixTimestamp `to_unix_timestamp` Returns the UNIX timestamp of the given time None project timeExp S PS
UTC is only supported TZ for TIMESTAMP
S
format PS
A limited number of formats are supported;
Literal value only
result S
TransformKeys `transform_keys` Transform keys in a map using a transform function None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
TransformValues `transform_values` Transform values in a map using a transform function None project argument PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
function S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
UnaryMinus `negative` Negate a numeric value None project input S S S S S S S NS
result S S S S S S S NS
AST input NS NS S S S S NS NS
result NS NS S S S S NS NS
UnaryPositive `positive` A numeric value with a + in front of it None project input S S S S S S S NS
result S S S S S S S NS
AST input S S S S S S NS NS
result S S S S S S NS NS
UnboundedFollowing$ Special boundary for a window frame, indicating all rows preceding the current row None project result S
UnboundedPreceding$ Special boundary for a window frame, indicating all rows preceding the current row None project result S
UnixTimestamp `unix_timestamp` Returns the UNIX timestamp of current or specified time None project timeExp S PS
UTC is only supported TZ for TIMESTAMP
S
format PS
A limited number of formats are supported;
Literal value only
result S
UnscaledValue Convert a Decimal to an unscaled long value for some aggregation optimizations None project input PS
max DECIMAL precision of 18
result S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Upper `ucase`, `upper` String uppercase operator This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. project input S
result S
WeekDay `weekday` Returns the day of the week (0 = Monday...6=Sunday) None project input S
result S
WindowExpression Calculates a return value for every input row of a table based on a group (or "window") of rows None window windowFunction S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
windowSpec S S S S NS NS PS
max DECIMAL precision of 18
S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
WindowSpecDefinition Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window None project partition S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS
value S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS
XxHash64 `xxhash64` xxhash64 hash operator None project input S S S S S NS NS S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S
Year `year` Returns the year from a date or timestamp None project input S
result S
AggregateExpression Aggregate expression None aggregation aggFunc S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
filter S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
reduction aggFunc S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
filter S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
window aggFunc S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
filter S
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
ApproximatePercentile `approx_percentile`, `percentile_approx` Approximate percentile This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark aggregation input S S S S S S NS NS S
percentage S S
accuracy S
result S S S S S S NS NS S PS
unsupported child types DATE, TIMESTAMP
reduction input S S S S S S NS NS S
percentage S S
accuracy S
result S S S S S S NS NS S PS
unsupported child types DATE, TIMESTAMP
Average `avg`, `mean` Average aggregate operator None aggregation input S S S S S S S
result S S
reduction input S S S S S S S
result S S
window input S S S S S S S
result S S
CollectList `collect_list` Collect a list of non-unique elements, not supported in reduction None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
CollectSet `collect_set` Collect a set of unique elements, not supported in reduction None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
result PS
window operations are disabled by default due to extreme memory usage;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
Count `count` Count aggregate operator None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
result S
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
result S
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
PS
UTC is only supported TZ for child TIMESTAMP
S
result S
First `first_value`, `first` first aggregate operator None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Last `last_value`, `last` last aggregate operator None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
Max `max` Max aggregate operator None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
Min `min` Min aggregate operator None aggregation input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
reduction input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
window input S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Percentile `percentile` Aggregation computing exact percentile None aggregation input S S S S S S
percentage PS
Literal value only
S
frequency S S
result S S
reduction input S S S S S S
percentage PS
Literal value only
S
frequency S S
result S S
PivotFirst PivotFirst operator None aggregation pivotColumn S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
valueColumn S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS NS NS
reduction pivotColumn S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
valueColumn S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS NS NS
StddevPop `stddev_pop` Aggregation computing population standard deviation None reduction input NS
result NS
aggregation input S
result S
window input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
StddevSamp `std`, `stddev_samp`, `stddev` Aggregation computing sample standard deviation None aggregation input S
result S
reduction input S
result S
window input S
result S
Sum `sum` Sum aggregate operator None aggregation input S S S S S S S
result S S S
reduction input S S S S S S S
result S S S
window input S S S S S S S
result S S S
VariancePop `var_pop` Aggregation computing population variance None reduction input NS
result NS
aggregation input S
result S
window input NS
result NS
Expression SQL Functions(s) Description Notes Context Param/Output BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
VarianceSamp `var_samp`, `variance` Aggregation computing sample variance None reduction input NS
result NS
aggregation input S
result S
window input NS
result NS
NormalizeNaNAndZero Normalize NaN and zero None project input S S
result S S
ScalarSubquery Subquery that will return only one row and one column None project result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
NS
HiveGenericUDF Hive Generic UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
HiveSimpleUDF Hive UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance None project param S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
result S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS

Casting

The above table does not show what is and is not supported for cast. This table shows the matrix of supported casts. Nested types like MAP, Struct, and Array can only be cast if the child types can be cast.

Some of the casts to/from string on the GPU are not 100% the same and are disabled by default. Please see the configs for more details on these specific cases.

Please note that even though casting from one type to another is supported by Spark it does not mean they all produce usable results. For example casting from a date to a boolean always produces a null. This is for Hive compatibility and the accelerator produces the same result.

AnsiCast

TO
BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
FROM BOOLEAN S S S S S S S S S
BYTE S S S S S S S S S
SHORT S S S S S S S S S
INT S S S S S S S S S
LONG S S S S S S S S S
FLOAT S S S S S S S PS
Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.
S
DOUBLE S S S S S S S PS
Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.
S
DATE S PS
UTC is only supported TZ for TIMESTAMP
S
TIMESTAMP S PS
UTC is only supported TZ for TIMESTAMP
S
STRING S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS
DECIMAL NS S S S S S S S S
NULL S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
BINARY S S
CALENDAR NS NS
ARRAY PS
The array's child type must also support being cast to the desired child type;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, MAP, UDT
MAP PS
the map's key and value must also support being cast to the desired child types;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
STRUCT PS
the struct's children must also support being cast to the desired child type(s);
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, MAP, UDT
UDT NS

Cast

TO
BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
FROM BOOLEAN S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S
BYTE S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S
SHORT S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S
INT S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S
LONG S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S
FLOAT S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
PS
Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.
S
DOUBLE S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
PS
Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.
S
DATE S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS
TIMESTAMP S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S NS
STRING S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS
DECIMAL NS S S S S S S NS S S
NULL S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS NS NS NS NS
BINARY S S
CALENDAR NS NS
ARRAY PS
the array's child type must also support being cast to string
PS
The array's child type must also support being cast to the desired child type(s);
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
MAP PS
the map's key and value must also support being cast to string
PS
the map's key and value must also support being cast to the desired child types;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
STRUCT PS
the struct's children must also support being cast to string
PS
the struct's children must also support being cast to the desired child type(s);
UTC is only supported TZ for child TIMESTAMP;
unsupported child types CALENDAR, UDT
UDT NS NS

Partitioning

When transferring data between different tasks the data is partitioned in specific ways depending on requirements in the plan. Be aware that the types included below are only for rows that impact where the data is partitioned. So for example if we are doing a join on the column a the data would be hash partitioned on a, but all of the other columns in the same data frame as a don’t show up in the table. They are controlled by the rules for ShuffleExchangeExec which uses the Partitioning.

Partition Description Notes Param BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
HashPartitioning Hash based partitioning None hash_key S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
Arrays of structs are not supported;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
RangePartitioning Range partitioning None order_key S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S NS NS PS
STRUCT is not supported as a child type for ARRAY;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
RoundRobinPartitioning Round robin partitioning None
SinglePartition$ Single partitioning None

Input/Output

For Input and Output it is not cleanly exposed what types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled in some cases for either reads or writes because of processing limitations, like rebasing dates or timestamps, or for a lack of type coercion support.

Format Direction BOOLEAN BYTE SHORT INT LONG FLOAT DOUBLE DATE TIMESTAMP STRING DECIMAL NULL BINARY CALENDAR ARRAY MAP STRUCT UDT
Avro Read S S S S S S S NS NS S NS NS NS NS NS NS
Write NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
CSV Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS
Write NS NS NS NS NS NS NS NS NS NS NS NS
Delta Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
Write S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
HiveText Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS NS NS NS NS NS NS
Write S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS NS NS NS NS NS NS
Iceberg Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
Write NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
JSON Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, MAP, UDT
NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, MAP, UDT
NS
Write NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS
ORC Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, UDT
NS
Write S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, MAP, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, MAP, UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, MAP, UDT
NS
Parquet Read S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
Write S S S S S S S S PS
UTC is only supported TZ for TIMESTAMP
S S S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS

Apache Iceberg Support

Support for Apache Iceberg has additional limitations. See the Apache Iceberg Support document.