# RAPIDS Accelerator for Apache Spark Nvtx Range Glossary
The following is the list of Nvtx ranges used throughout the plugin. To add your own Nvtx range, create an NvtxId entry in NvtxRangeWithDoc.scala and create an NvtxRangeWithDoc at the code location you want to cover, passing in the newly created NvtxId (see the sketch below).
See nvtx_profiling.md for more info.
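
For illustration, here is a minimal sketch of those two steps. The exact shape of the NvtxId declaration, the NvtxRangeWithDoc constructor arguments (including any NvtxColor), and the withResource helper used here are assumptions; mirror the existing entries and call sites in NvtxRangeWithDoc.scala.

```scala
// Step 1: register the new range in NvtxRangeWithDoc.scala. The declaration
// style below is an assumption -- copy the pattern of the existing NvtxId
// entries in that file. The description string is what appears in the table
// in this glossary.
val MY_NEW_RANGE: NvtxId = NvtxId(
  name = "my new range",
  description = "Description of the work covered by this range")

// Step 2: at the code location you want to cover, open an NvtxRangeWithDoc
// and pass in the new NvtxId. Whether the constructor also takes an
// ai.rapids.cudf.NvtxColor is an assumption -- check the existing call sites.
withResource(new NvtxRangeWithDoc(MY_NEW_RANGE, NvtxColor.GREEN)) { _ =>
  // work covered by the "my new range" Nvtx range
}
```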
## Nvtx Ranges
| Name | Description |
|---|---|
| getMapSizesByExecId | Call to internal Spark API for retrieving size and location of shuffle map output blocks |
| gpuKudoSerialize | Performing Kudo serialization on the GPU |
| agg reduce | Reduction aggregation for operations without grouping keys |
| read python batch | Reading Python batch |
| join asymmetric fetch | Asymmetric join data fetch |
| agg post-process | Post-processing step for aggregation, including casting and struct decomposition |
| project tier | Executing tiered projection operation |
| first stream batch | Processing first stream batch |
| agg repartition | Repartitioning data for aggregation |
| DoubleBatchedWindow_PRE | Pre-processing for double-batched window operation |
| GpuGenerate project split | Splitting projection in generate operation |
| parquet parse filter footer | Parsing and filtering Parquet footer by range |
| Compile ASTs | Compiling abstract syntax trees for expression evaluation |
| parquet filter blocks | Filtering Parquet row group blocks based on predicates |
| PageableH2D | Copying from pageable host memory to device |
| TOP N | Computing top N rows |
| zstd post process | Post-processing ZSTD compressed data |
| split input batch | Splitting input batch for sorting |
| update tracking mask | Updating tracking mask for join operation |
| hash partition slice | Slicing partitioned table into individual partitions |
| probe right | Probing the right side of a join input iterator to get the data size for preparing the join |
| DeserializeBatch | Deserializing broadcast batch |
| Serialize Row Only Batch | Serializing row-only batch |
| alloc output bufs | Allocating output buffers for compression |
| existence join scatter map | Creating scatter map for existence join |
| Transport copy buffer | Copying buffer for Rapids shuffle transport |
| hash partition | Partitioning data based on hash values |
| fetch join stream | I/O time for the stream-side data of the following join |
| Sample Exec | Sampling rows from data |
| shuffled join stream | GpuShuffledHashJoinExec preparing build batches for the join |
| cartesian product deserialize | Deserializing batch from cartesian product operation |
| doHandleMeta | Processing metadata handling |
| sort lower boundaries | Computing lower boundaries for sort operation |
| Release GPU | Releasing the GPU semaphore |
| Acquire GPU | Time waiting for the GPU semaphore to be acquired |
| RapidsCachingReader.read | Reading shuffle data from cache or remote transport |
| computeAggregate | Computing aggregation on input batch |
| concat pending | Concatenating pending batches |
| RepartitionAggregateIterator.next | Fetching next batch from repartition aggregate iterator |
| copy compressed buffers | Copying compressed buffer data |
| GpuCoalesceBatches: collect | Combining small batches on the GPU after kernel processing |
| Round robin partition | Partitioning data using round-robin strategy |
| Sub-join part | Hash partitioning for sub-join operation |
| write python batch | Writing Python batch |
| get map | Getting join map |
| GpuPartitioner | GPU partitioner operation |
| Sort next output batch | Fetching next sorted output batch |
| existence join batch | Processing batch for existence join |
| TOP N Offset | Computing top N rows with offset |
| sort_order | Computing sort order for data |
| broadcast collect | Collecting data for broadcast |
| join asymmetric probe fetch | Asymmetric join probe side data fetch |
| gather | Gathering sorted data based on indices |
| sort op | General sort operation |
| Avro decode | Decoding Avro data to columnar format |
| interleaveBits | Interleaving bits for Z-order |
| sort to unsafe row | Converting sorted data to unsafe row format |
| window | Computing window function results |
| cartesian product serialize | Serializing batch for cartesian product operation |
| Buffer file split | Buffering file split for reading |
| broadcast join stream | Time to materialize a broadcast batch on the host |
| GpuJsonToStructs | Converting JSON to structs |
| Buffer Callback | Processing buffer callback |
| concatenateBatches | Concatenating multiple batches into one |
| avro buffer file split | Splitting Avro file into buffer chunks for reading |
| parquet buffer file split | Splitting Parquet file into buffer chunks for reading |
| gpuKudoSliceBuffers | Slicing Kudo-serialized buffers on the host into partitions |
| merge sort | Merge sorting multiple sorted batches |
| Parquet readBatch | Reading Parquet batch |
| GPU file format write | Writing batch of data to file format on GPU |
| get final batch | Getting final batch from join operation |
| ORC decode | Decoding ORC data to columnar format |
| Round robin partition slice | Slicing data for round-robin partitioning |
| BATCH RECEIVED | Processing received shuffle batch |
| parquet read filtered footer | Reading filtered Parquet footer with selected row groups |
| sort | Sorting columnar data |
| ColumnarToRow: batch | Converting columnar batch to row format |
| Avro readBatch | Reading Avro batch |
| Shuffle Concat CPU | Concatenating shuffle data on CPU |
| ColumnarToRow: fetch | Fetching data during columnar to row conversion |
| PartitionD2H | Copying partition data from device to host |
| HIVE decode | Decoding Hive table data |
| device spill | Spilling data from device memory to host |
| parquet read footer | Reading and parsing complete Parquet footer |
| CommitShuffle | Producing a single file (shuffle_[map_id]_0) in the commit phase after all temporary shuffle writes are done |
| agg pre-process | Pre-processing step for aggregation before calling cuDF aggregate, including casting and struct creation |
| ProjectExec | Executing projection operation on columnar batch |
| Columnar batch serialize row only | Serializing row-only batch (no GPU data) |
| disk spill | Spilling data from host memory to disk |
| Async Shuffle Buffer | Asynchronous shuffle buffering operation |
| CSV decode | Decoding CSV data |
| get batch | Getting join batch |
| Spark Task | Spark task execution range for stage and task tracking |
| reduction merge m2 | Merging M2 values during variance/stddev reduction |
| HILBERT INDEX | Computing Hilbert index |
| gpuAcquireC2C | Acquiring GPU for coalesce-to-coalesce operation |
| ParallelDeserializerIterator.next | Calling next on the multi-threaded shuffle reader iterator |
| Bring back to host | Copying GPU data back to host memory |
| limit and offset | Applying limit and offset to data |
| Parquet decode | Decoding Parquet data to columnar format |
| generate estimate repetition | Estimating repetition count for generate operation |
| Read Header | Reading serialized batch header |
| random expr | Generating random values in expression evaluation |
| queueFetched | Multi-threaded shuffle manager using RapidsShuffleBlockFetcherIterator to queue the next set of fetched results |
| generate get row byte count | Computing byte count for generated rows |
| single build batch concat | Concatenating batches for single build batch |
| sort copy boundaries | Copying boundary data for sort operation |
| WaitingForWrites | Rapids Shuffle Manager (multi-threaded) waiting for any queued writes to finish before finalizing the map output writer |
| Project AST | Applying AST-based projection to batch |
| ORC readBatches | Reading ORC batches |
| Shuffle Transfer Request | Handling shuffle data transfer request |
| consumeWindow | Consuming transfer window |
| orc buffer file split | Splitting ORC file into buffer chunks for reading |
| parquet clip schema | Clipping Parquet schema to required columns |
| AbstractGpuCoalesceIterator | Default range for a code path in AbstractGpuCoalesceIterator for an op that is not explicitly documented with its own range |
| Client.fetch | Fetching data from shuffle server |
| Columnar batch serialize | Serializing columnar batch for shuffle or storage |
| Read Batch | Reading serialized batch data |
| pinnedH2D | Copying from pinned host memory to device |
| sliceInternalOnGpu | Slicing partition data on GPU |
| file format readBatch | Reading batch of data from file format (Parquet/ORC/Avro/CSV/JSON) |
| batch decompress | Decompressing batch data |
| broadcast | Broadcasting data to executors |
| HostColumnarToGpu concat | Concatenating batches in HostColumnarToGpu |
| ThreadedWriter.write | Rapids Shuffle Manager (multi-threaded) writing |
| ThreadedReader.read | Rapids Shuffle Manager (multi-threaded) reading |
| RapidsShuffleIterator.next | Fetching next batch from Rapids shuffle iterator |
| post-process | Post-processing aggregation results |
| json convert datetime | Converting JSON datetime types to Spark datetime types |
| Buffer file split text | Buffering text file split |
| agg groupby | Group-by aggregation using cuDF groupBy operation |
| gpuKudoCopyToHost | Copying GPU Kudo-serialized outputs back to the host |
| RapidsShuffleIterator.gotBatch | Processing batch received from Rapids shuffle |
| RapidsShuffleIterator prep | Preparing shuffle iterator with cached and remote blocks |
| JSON decode | Decoding JSON data |
| Handle Meta Request | Handling metadata request on shuffle server |
| filter batch | Filtering rows from a columnar batch |
| shuffle fetch first batch | Fetching first batch in shuffle coalesce operation |
| windowExec | Executing window operation on batch |
| build join table | Building hash table for join operation |
| join first stream batch | Fetching and processing first batch from stream side of join |
| GpuGenerateIterator | Iterating through generated data |
| finalize agg | Finalizing aggregation results |
| DoubleBatchedWindow_POST | Post-processing for double-batched window operation |
| broadcast build | Building broadcast data structure |
| getBroadcastBatch | Getting broadcast batch |
| hash join build | I/O time for the build-side data of the following join |
| row to columnar | Converting row-based data to columnar format |
| SerializeBatch | Serializing broadcast batch |
| build batch: collect | Performing a join where the build side fits in a single GPU batch |
| probe left | Probing the left side of a join input iterator to get the data size for preparing the join |
| spill map | Spilling join map |
| Async Shuffle Read | Asynchronous shuffle read operation |
| Client.handleMeta | Handling metadata from shuffle server |
| Serialize Batch | Serializing columnar batch |
| json convert table | Converting JSON table to desired schema type |
| RapidsCachingWriter.close | Closing Rapids caching writer and finalizing shuffle output |
| GpuRange | Generating range of values on GPU |
| Calculate part | Calculating hash partition assignments |
| RunningWindow | Computing running window aggregation |
| GpuGenerateExec | Executing generate operation on GPU |
| Join gather | Gathering join results |
| waitForCPU | Waiting for CPU batch in hybrid execution |
| parquet get blocks with filter | Retrieving Parquet blocks after applying filters |
| dynamic sort heuristic | Applying dynamic sort heuristic for aggregation |
| shuffle concat load batch | Concatenating and loading batch in shuffle operation |
| parquet read footer bytes | Reading raw footer bytes from Parquet file |
| spill batch | Spilling join batch |
| Fast Sample Exec | Fast sampling rows from data |
| partition for join | Hash partitioning data for join operation |
| BatchWait | Rapids Shuffle Manager (multi-threaded) reader blocked waiting for batches to finish decoding |
| lz4 post process | Post-processing LZ4 compressed data |
| GpuCoalesceBatches concat | Concatenating batches in GpuCoalesceBatches |
| full hash join gather map | Gathering full hash join results |
| readNConcat | Reading and concatenating N batches |
| RapidsCachingWriter.write | Rapids Shuffle Manager (UCX) writing |
| hash join gather map | Gathering hash join results using gather map |
| calc gather size | Calculating gather operation size |
| sliceInternalOnCpu | Slicing partition data on CPU |
| broadcast manifest batch | Creating broadcast manifest batch |
| batch compress | Compressing batch data |
| ExpandExec projections | Projecting expand operation on batch |
| HostDeserializeBatch | Deserializing batch on host |
| RapidsCachingReader read local | Reading shuffle blocks from local cache |