UDF to Catalyst Expressions
To speedup the processing of user defined functions (UDFs), the RAPIDS Accelerator for Apache Spark introduces a UDF compiler extension to translate UDFs to Catalyst expressions.
To enable this operation on the GPU, set spark.rapids.sql.udfCompiler.enabled to true.
Be aware Spark may produce different results for a compiled UDF vs. the non-compiled. For example: a UDF of x/y where y happens to be 0, the compiled catalyst expressions will return NULL while the original UDF would fail the entire job with a java.lang.ArithmeticException: / by zero
When translating UDFs to Catalyst expressions, the supported UDF functions are limited:
| Operand type | Operation |
|---|---|
| Arithmetic Unary | +x |
| -x | |
| Arithmetic Binary | lhs + rhs |
| lhs - rhs | |
| lhs * rhs | |
| lhs / rhs | |
| lhs % rhs | |
| Logical | lhs && rhs |
| lhs || rhs | |
| !x | |
| Equality and Relational | lhs == rhs |
| lhs < rhs | |
| lhs <= rhs | |
| lhs > rhs | |
| lhs >= rhs | |
| Bitwise | lhs & rhs |
| lhs | rhs | |
| lhs ^ rhs | |
| ~x | |
| lhs « rhs | |
| lhs » rhs | |
| lhs »> rhs | |
| Conditional | if |
| case | |
| Math | abs(x) |
| cos(x) | |
| acos(x) | |
| asin(x) | |
| tan(x) | |
| atan(x) | |
| tanh(x) | |
| cosh(x) | |
| ceil(x) | |
| floor(x) | |
| exp(x) | |
| log(x) | |
| log10(x) | |
| sqrt(x) | |
| x.isNaN | |
| Type Cast | * |
| String | lhs + rhs |
| lhs.equalsIgnoreCase(String rhs) | |
| x.toUpperCase() | |
| x.trim() | |
| x.substring(int begin) | |
| x.substring(int begin, int end) | |
| x.replace(char oldChar, char newChar) | |
| x.replace(CharSequence target, CharSequence replacement) | |
| x.startsWith(String prefix) | |
| lhs.equals(Object rhs) | |
| x.toLowerCase() | |
| x.length() | |
| x.endsWith(String suffix) | |
| lhs.concat(String rhs) | |
| x.isEmpty() | |
| String.valueOf(boolean b) | |
| String.valueOf(char c) | |
| String.valueOf(double d) | |
| String.valueOf(float f) | |
| String.valueOf(int i) | |
| String.valueOf(long l) | |
| x.contains(CharSequence s) | |
| x.indexOf(String str) | |
| x.indexOf(String str, int fromIndex) | |
| x.replaceAll(String regex, String replacement) | |
| x.split(String regex) | |
| x.split(String regex, int limit) | |
| x.getBytes() | |
| x.getBytes(String charsetName) | |
| Date and Time | LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getYear |
| LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getMonthValue | |
| LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getDayOfMonth | |
| LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getHour | |
| LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getMinute | |
| LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getSecond | |
| Empty array creation | Array.empty[Boolean] |
| Array.empty[Byte] | |
| Array.empty[Short] | |
| Array.empty[Int] | |
| Array.empty[Long] | |
| Array.empty[Float] | |
| Array.empty[Double] | |
| Array.empty[String] | |
| Arraybuffer | new ArrayBuffer() |
| x.distinct | |
| x.toArray | |
| lhs += rhs | |
| lhs :+ rhs | |
| Method call | Only if the method being called 1. Consists of operations supported by the UDF compiler, and 2. is one of the folllowing: a final method, a method in a final class, or a method in a final object |
| Captured variables | Only primitive type variables captured from a method |
| Throwing exception | Only if the exception thrown is a SparkException. The exception is then convered to a RuntimeException at runtime |
All other expressions, including but not limited to try and catch, are unsupported and UDFs with such expressions cannot be compiled.