Configuration
The following configurations can be supplied as Spark properties.
Property name | Default | Meaning |
---|---|---|
spark.rapids.ml.uvm.enabled | false | if set to true, enables CUDA unified virtual memory (aka managed memory) during estimator.fit() operations to allow processing of larger datasets than would fit in GPU memory |
spark.rapids.ml.gpuMemRatioForData | None | If set to a float value between 0 and 1, Spark Rapids ML will reserve a portion of free GPU memory on each GPU and incrementally append PySpark data batches into this reserved space. This setting is recommended for large datasets, as it prevents duplicating the entire dataset in GPU memory and reduces the risk of out-of-memory errors. |
spark.rapids.ml.cpu.fallback.enabled | false | if set to true and spark-rapids-ml estimator.fit() is invoked with unsupported parameters or parameter values, the pyspark.ml cpu based estimator.fit() and model.transform() will be run; if set to false, an exception is raised in this case (default). |
Since the algorithms rely heavily on Pandas UDFs, we also require spark.sql.execution.arrow.pyspark.enabled=true
to ensure efficient data transfer between the JVM and Python processes.