Performance Tuning
Stage-level scheduling
Starting from spark-rapids-ml 23.10.0, stage-level scheduling is automatically enabled. Therefore, if you are using a Spark standalone cluster at version 3.4.0 or higher, we strongly recommend setting "spark.task.resource.gpu.amount" to a fractional value. This allows multiple tasks to run in parallel on each executor during the ETL phase, which improves performance. A good rule of thumb is "spark.task.resource.gpu.amount" = 1/"spark.executor.cores". For example,
spark-submit \
--master spark://<master-ip>:7077 \
--conf spark.executor.cores=12 \
--conf spark.task.cpus=1 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=0.08 \
...
The above spark-submit command requests 1 GPU and 12 CPU cores per executor, with 1 CPU per task, so up to 12 tasks run concurrently on each executor during the ETL phase (0.08 ≈ 1/12). Stage-level scheduling is then used internally by the library to automatically run the ML training stages with the required 1 GPU per task.
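For reference, the stage-level scheduling that the library carries out internally relies on PySpark's resource profile API. The snippet below is a minimal sketch of that mechanism, assuming a Spark 3.4.0+ standalone cluster configured as above and an existing DataFrame df; it is for illustration only, since spark-rapids-ml builds and applies an equivalent profile for you:

from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests

# Request a full GPU per task for the training stage, overriding the
# fractional "spark.task.resource.gpu.amount" used during the ETL phase.
task_reqs = TaskResourceRequests().cpus(1).resource("gpu", 1.0)
profile = ResourceProfileBuilder().require(task_reqs).build

# Attaching the profile makes Spark schedule these tasks with 1 GPU each,
# so only one training task runs per executor at a time (df is a
# hypothetical pre-existing DataFrame, used here for illustration).
rdd = df.rdd.withResources(profile)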
However, if you are using a spark-rapids-ml version earlier than 23.10.0 or a Spark standalone cluster version below 3.4.0, you need to ensure that at most 1 task runs at any time per executor. To do this, either set "spark.task.cpus" equal to "spark.executor.cores" (sketched after the example below), or set "spark.task.resource.gpu.amount"=1. For example,
spark-submit \
--master spark://<master-ip>:7077 \
--conf spark.executor.cores=12 \
--conf spark.task.cpus=1 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=1 \
...
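For completeness, here is a minimal sketch of the first option, setting "spark.task.cpus" equal to "spark.executor.cores", expressed as a programmatic SparkSession builder rather than spark-submit. The master URL is the same placeholder as above, and the 12-core, 1-GPU executor shape is an assumption carried over from the earlier examples:

from pyspark.sql import SparkSession

# Each task claims all 12 executor cores, so at most one task can run
# per executor at any time. <master-ip> is a placeholder.
spark = (
    SparkSession.builder
    .master("spark://<master-ip>:7077")
    .config("spark.executor.cores", "12")
    .config("spark.task.cpus", "12")  # equal to spark.executor.cores
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .getOrCreate()
)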