Processing Steps Submodule
This module contains the classes that represent the individual processing steps, along with the respective base classes that can be used to implement custom processing steps (see PipelineStepBase) and access modifier wrapper steps (see GroupToApplyToSelectedStepBase).
The individual processing steps are the building blocks of the pipeline, which is defined by a sequence of
processing steps (in addition to the input callable/iterable, see the inputs sub-module).
- class accvlab.dali_pipeline_framework.processing_steps.PipelineStepBase[source]
Bases: ABC
Base class for pipeline processing steps.
Pipeline processing steps are the building blocks of the pipeline and represent individual operations applied to input data in sequence to produce outputs.
Provides the common interface and common functionality shared by all processing steps:
- Checking the input data format for compatibility and setting the output data format (blueprint) (see check_input_data_format_and_set_output_data_format()).
- Applying the step via __call__(). This:
  - Invokes _process() to perform the actual processing.
  - Validates the resulting data format against a reference blueprint to ensure that the resulting format is "as advertised", i.e. as obtained by independent calls to check_input_data_format_and_set_output_data_format(). Note that this check is performed at DALI graph construction time and therefore does not affect runtime during training.
- Support for operating on sub-trees of input data (through specialized wrapper steps, see GroupToApplyToSelectedStepBase).
Consistent & Independent Data Processing
Many of the included processing steps can be configured to operate on more than one field in the input SampleDataGroup object. For some steps (e.g. those which apply random transformations), the question arises whether these steps should apply consistent processing across all fields they process (e.g. the same augmentation transformation for all images), or whether the processing should happen independently for different fields (e.g. different transformations for different images). The answer to this question depends on the use-case.
By default, the processing steps are designed to apply consistent processing. For example, AffineTransformer applies the same spatial transform to all processed images, as well as to corresponding fields such as point sets defined on the image or projection matrices. This ensures that:
- Consistent randomization is possible if needed (e.g., between an image, a corresponding segmentation mask, projection matrix, and points defined on the image).
- No correspondences between multiple fields need to be explicitly maintained. For example, if multiple images and projection matrices are present, there is no need to know which projection matrix corresponds to which image, as the same transformation is applied to all of them. This is useful when processing multiple fields which are related to one another.
To ensure that independent processing (e.g. different randomizations) can be applied to different parts of the data (e.g., different randomizations for data from different cameras), sub-classes of GroupToApplyToSelectedStepBase can be used to select one or more parts (sub-trees) of the input data to process independently of each other. The selection of sub-trees also makes it possible to establish field correspondences (e.g., process the image and projection matrix from one camera consistently) in a natural way, i.e. by grouping all related fields in one sub-tree (e.g. one sub-tree per camera).
The available wrappers include DataGroupInPathAppliedStep, DataGroupsWithNameAppliedStep, DataGroupArrayInPathElementsAppliedStep, and DataGroupArrayWithNameElementsAppliedStep. Please see the documentation of these classes for more details. If necessary, new wrappers can be added by subclassing GroupToApplyToSelectedStepBase.
Having both options (consistent or different randomizations for different parts of the data) available, as well as the ability to group related data (e.g. all images and projection matrices for one camera), allows for a flexible pipeline design which can be tailored to the specific use-case by configuration, as sketched below.
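A minimal sketch of both configurations (field and path names such as "image", "intrinsics", and "cameras" are hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import (
    AffineTransformer,
    DataGroupArrayInPathElementsAppliedStep,
)

# Consistent processing: the step sees the full tree and applies the same
# random transform to every matched image / projection matrix.
step = AffineTransformer(
    output_hw=[512, 960],
    resizing_mode=AffineTransformer.ResizingMode.PAD,
    resizing_anchor=AffineTransformer.ResizingAnchor.CENTER,
    image_field_names="image",
    projection_matrix_field_names="intrinsics",
)

# Independent processing: the wrapper forwards one camera sub-tree at a time,
# so each camera gets its own randomization, while the image and intrinsics
# within one camera remain consistent.
per_camera_step = DataGroupArrayInPathElementsAppliedStep(
    step, path_to_array_to_apply_to="cameras"
)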
- abstract _check_and_adjust_data_format_input_to_output(data_empty)[source]
Check the input data format for compatibility and return the output data format (blueprint).
If the input data format is incompatible, raise an exception describing the problem.
Please see check_input_data_format_and_set_output_data_format() for a description of typical checks and format changes that need to be performed here.
This method may or may not modify data_empty directly, but in any case has to return an object representing the modified format (i.e., either the modified data_empty or a new object).
Note
Override this method in each (non-abstract) derived class to define the actual functionality.
This method is called by check_input_data_format_and_set_output_data_format() and should not be called directly.
- Parameters:
data_empty (SampleDataGroup) – Input data format (blueprint).
- Returns:
SampleDataGroup – Resulting data format (blueprint).
- abstract _process(data)[source]
Apply the processing step to the input, or to a selected sub-tree when wrapped accordingly.
Individual processing steps need to override this method and implement the actual functionality.
The method may mutate the input data; callers must not rely on the input remaining unchanged or corresponding to the output after the call.
Note
Override this method in each (non-abstract) derived class to define the actual functionality.
This method is called by __call__() and should not be called directly.
- Parameters:
data (SampleDataGroup) – Data to be processed by the step.
- Returns:
SampleDataGroup – Resulting processed data.
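To make the two override points concrete, here is a minimal sketch of a custom step that casts an image field to FLOAT. The SampleDataGroup accessors used below (has_field, set_field_type, get_field, set_field) are hypothetical placeholders for the real API, as is the no-argument base constructor; the two overridden methods reflect the documented interface, and fn.cast is a standard DALI operator.

from nvidia.dali import fn, types

from accvlab.dali_pipeline_framework.processing_steps import PipelineStepBase


class ImageToFloatCaster(PipelineStepBase):
    """Sketch of a custom step: cast a named image field to FLOAT."""

    def __init__(self, image_name):
        super().__init__()  # assuming a no-argument base constructor
        self._image_name = image_name

    def _check_and_adjust_data_format_input_to_output(self, data_empty):
        # Verify the expected field is present, then advertise the new dtype
        # in the returned blueprint (accessors are hypothetical).
        if not data_empty.has_field(self._image_name):
            raise ValueError(f"Missing image field: {self._image_name}")
        data_empty.set_field_type(self._image_name, types.DALIDataType.FLOAT)
        return data_empty

    def _process(self, data):
        # Build the actual DALI graph node for the cast.
        image = data.get_field(self._image_name)
        data.set_field(self._image_name, fn.cast(image, dtype=types.DALIDataType.FLOAT))
        return data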
- __call__(data)[source]
Apply the processing step and validate its output format.
Important
To define the actual functionality of a processing step, override _process(), not this method.
- Parameters:
data (SampleDataGroup) – Input data to process.
- Returns:
SampleDataGroup – Processed output data.
- check_input_data_format_and_set_output_data_format(data_empty)[source]
Check the input data format for compatibility and return the output data format (blueprint).
Compatibility typically means that expected data fields are present and types are compatible, and that the output data fields can be added (i.e., are not already present). Typical changes to the data format include additions/removals of fields or changes to data types (e.g., an image may change from types.DALIDataType.UINT8 to types.DALIDataType.FLOAT in a normalization step).
This method does not modify data_empty in place; it returns a new SampleDataGroup describing the modified format.
If the input data format is incompatible, an exception is raised.
Important
To define the actual functionality of the check, override _check_and_adjust_data_format_input_to_output(), not this method.
- Parameters:
data_empty (SampleDataGroup) – Input data format (blueprint).
- Returns:
SampleDataGroup – Resulting data format (blueprint).
- class accvlab.dali_pipeline_framework.processing_steps.GroupToApplyToSelectedStepBase(processing_step_to_apply)[source]
Bases: PipelineStepBase
Base class for wrappers that apply a contained processing step to selected parts (sub-trees) of the input.
The wrapper forwards only the selected parts (sub-tree(s)) to the contained step, which then operates as if the sub-tree were the full input. If multiple sub-trees are selected (e.g. each sub-tree corresponding to data of one step in time out of a sequence), the contained step is called multiple times, executing independently for each sub-tree. If joint processing is required, design the contained step to consume the full tree (or a larger sub-tree) instead of using a wrapper.
- Parameters:
processing_step_to_apply – Processing step to apply to the selected sub-trees.
Important
Ensure that the constructor of this class is called by any derived class.
- abstract _check_and_get_paths_to_apply_to(data)[source]
Check input and return paths to all sub-trees to process.
Requirements on the input include that at least one sub-path is found and that the paths match the expected type (e.g., array data group fields when iterating over elements). See SampleDataGroup for what constitutes an array and how to check whether a field is an array.
If the requirements are not satisfied, an error shall be raised.
Note
Override this method in each (non-abstract) derived class to define the actual selection of sub-trees to process. Note that this is the only method that needs to be overridden; it is used by the other methods of this class, which perform the actual processing.
- class accvlab.dali_pipeline_framework.processing_steps.DataGroupInPathAppliedStep(processing_step_to_apply, path_to_apply_to)[source]
Bases: GroupToApplyToSelectedStepBase
Apply a contained processing step to the sub-tree rooted at a given path.
- Parameters:
processing_step_to_apply – The contained processing step
path_to_apply_to – Path to the root of the sub-tree to apply processing_step_to_apply to
- class accvlab.dali_pipeline_framework.processing_steps.DataGroupsWithNameAppliedStep(processing_step_to_apply, names_of_groups_to_apply_to, check_minimum_one_name_match=True)[source]
Bases: GroupToApplyToSelectedStepBase
Apply a contained processing step to all sub-trees whose root is a data group field with a given name.
The name is defined at construction; all matching data group fields are located and the contained step is applied to each corresponding sub-tree.
- Parameters:
processing_step_to_apply – Contained processing step to apply.
names_of_groups_to_apply_to – Name or list of names of data group fields to select as sub-tree roots.
check_minimum_one_name_match (default: True) – If True, at least one field must be found for each provided name; otherwise an error is raised when checking the input.
- class accvlab.dali_pipeline_framework.processing_steps.DataGroupArrayInPathElementsAppliedStep(processing_step_to_apply, path_to_array_to_apply_to)[source]
Bases: DataGroupInPathAppliedStep
Apply a contained processing step independently to each element of an array data group field at a given path.
The path of the array data group field is defined at construction. Each element of that array is processed independently by the contained step.
- Parameters:
processing_step_to_apply – Contained processing step to apply.
path_to_array_to_apply_to – Path to the array data group field whose children should be processed.
- class accvlab.dali_pipeline_framework.processing_steps.DataGroupArrayWithNameElementsAppliedStep(processing_step_to_apply, name_of_arrays_to_apply_to, check_minimum_one_name_match=True)[source]
Bases: DataGroupsWithNameAppliedStep
Apply a contained processing step independently to each element of all array data group fields with a given name.
The name is defined at construction. All fields with that name must be array data group fields (see SampleDataGroup). Each element of each found array is processed independently by the contained step.
- Parameters:
processing_step_to_apply (PipelineStepBase) – Contained processing step to apply.
name_of_arrays_to_apply_to (Union[str, int]) – Name of the array data group fields whose elements should be processed.
check_minimum_one_name_match (default: True) – If True, at least one array must be found; otherwise an error is raised when checking the input.
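A sketch of the four wrapper variants around a simple contained step (in practice you would pick one variant; field and path names are hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import (
    DataGroupArrayInPathElementsAppliedStep,
    DataGroupArrayWithNameElementsAppliedStep,
    DataGroupInPathAppliedStep,
    DataGroupsWithNameAppliedStep,
    ImageRange01Normalizer,
)

contained = ImageRange01Normalizer(image_name="image")

# Sub-tree rooted at one fixed path:
by_path = DataGroupInPathAppliedStep(contained, path_to_apply_to="camera_front")

# Every data group field named "camera", wherever it occurs:
by_name = DataGroupsWithNameAppliedStep(contained, names_of_groups_to_apply_to="camera")

# Each element of the array data group field at a fixed path, independently:
array_by_path = DataGroupArrayInPathElementsAppliedStep(contained, path_to_array_to_apply_to="cameras")

# Each element of every array data group field named "cameras", independently:
array_by_name = DataGroupArrayWithNameElementsAppliedStep(contained, name_of_arrays_to_apply_to="cameras")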
- class accvlab.dali_pipeline_framework.processing_steps.ImageDecoder(image_name, use_device_mixed, hw_decoder_load=0.65, as_bgr=False)[source]
Bases: PipelineStepBase
Decode images.
- Behavior:
Finds all images by name, decodes them (to RGB or BGR), and replaces the encoded image data with the decoded version in place.
Image search happens at DALI graph construction time; only the actual decoding operator is part of the DALI graph. This means that the runtime performance is not affected by the search for images.
- Parameters:
image_name (str) – Name of the image data field(s) to decode.
use_device_mixed (bool) – If True, decoding is partially performed on the GPU and the resulting images are located in GPU memory. If False, only the CPU is used.
hw_decoder_load (float, default: 0.65) – If use_device_mixed is True, this parameter sets the fraction of the workload to be performed by the decoding hardware (as opposed to software CUDA kernels).
as_bgr (bool, default: False) – Whether to output BGR images (instead of RGB images).
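A construction sketch (the field name is hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import ImageDecoder

decoder = ImageDecoder(
    image_name="image",     # decode every field named "image"
    use_device_mixed=True,  # hybrid CPU/GPU decoding; outputs land in GPU memory
    hw_decoder_load=0.65,   # fraction of work for the hardware decoder
    as_bgr=False,           # output RGB
)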
- class accvlab.dali_pipeline_framework.processing_steps.ImageToTileSizePadder(image_name, tile_size_to_pad_to)[source]
Bases: PipelineStepBase
Pad images so that height and width are multiples of a given tile size.
The image is padded with zeros and the image size field is updated to the padded size.
- class accvlab.dali_pipeline_framework.processing_steps.ImageRange01Normalizer(image_name)[source]
Bases: PipelineStepBase
Convert RGB or BGR images from UINT8 to FLOAT and scale to [0.0, 1.0].
Each matching image is cast to types.DALIDataType.FLOAT and divided by 255.0 per channel.
- class accvlab.dali_pipeline_framework.processing_steps.ImageMeanStdDevNormalizer(image_name, mean, std_dev, output_type=<DALIDataType.FLOAT: 9>)[source]
Bases: PipelineStepBase
Normalize RGB or BGR images by mean and standard deviation, using pre-defined mean and standard deviation values.
Normalization subtracts the mean and divides by the standard deviation per channel over the spatial axes. Scalars broadcast to all channels; for 3-vectors, each element corresponds to a channel. No distinction between RGB and BGR is made, so the mean and standard deviation values must be provided in the channel order corresponding to the image format.
Note
The mean and standard deviation values need to be provided on construction. They are not computed from the images at runtime.
- Parameters:
image_name (Union[str, int]) – Name of the image data fields to normalize.
mean (Union[Sequence[float], float]) – Mean value used as the basis for the normalization. Can be a single value (applied to all color channels) or a vector containing the values for all channels.
std_dev (Union[Sequence[float], float]) – Standard deviation used as the basis for the normalization. Can be a single value (applied to all color channels) or a vector containing the values for all channels.
output_type (DALIDataType, default: DALIDataType.FLOAT) – Data type of the output image. Default is types.DALIDataType.FLOAT (i.e. 32-bit float).
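A construction sketch; the mean and standard-deviation values below are the common ImageNet statistics in RGB order and 0-255 range, used purely as an illustration:

from accvlab.dali_pipeline_framework.processing_steps import ImageMeanStdDevNormalizer

normalizer = ImageMeanStdDevNormalizer(
    image_name="image",
    mean=[123.675, 116.28, 103.53],   # per-channel means, RGB order
    std_dev=[58.395, 57.12, 57.375],  # per-channel standard deviations
)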
- class accvlab.dali_pipeline_framework.processing_steps.PhotoMetricDistorter(image_name, min_max_brightness, min_max_hue, min_max_contrast, min_max_saturation, prob_brightness_aug=0.5, prob_hue_aug=0.5, prob_contrast_aug=0.5, prob_saturation_aug=0.5, prob_swap_channels=0.5, is_bgr=False, enforce_process_on_gpu=True)[source]
Bases: PipelineStepBase
Apply photometric augmentations to images (brightness, contrast, saturation, hue, channel swap).
The same random decisions and parametrizations for each augmentation are shared across all matched images to keep them consistent (e.g., across multi-view inputs).
- Parameters:
image_name (Union[str, int]) – Name of the image data fields to augment.
min_max_brightness (Sequence[float]) – Minimum and maximum biases to apply to the brightness. Note that as the image may be in different ranges ([0; 1] for float images, [0; 255] for uint8 images), the values provided here are expected to be in the corresponding range.
min_max_hue (Sequence[float]) – Minimum and maximum change in hue (degrees).
min_max_contrast (Sequence[float]) – Minimum and maximum contrast factor (multiplicative).
min_max_saturation (Sequence[float]) – Minimum and maximum saturation factor (multiplicative in HSV space).
prob_brightness_aug (float, default: 0.5) – Probability to apply brightness augmentation.
prob_hue_aug (float, default: 0.5) – Probability to apply hue change augmentation.
prob_contrast_aug (float, default: 0.5) – Probability to apply contrast augmentation.
prob_saturation_aug (float, default: 0.5) – Probability to apply saturation augmentation.
prob_swap_channels (float, default: 0.5) – Probability to randomly permute the color channels.
is_bgr (bool, default: False) – Whether the image is in BGR format (RGB otherwise).
enforce_process_on_gpu (bool, default: True) – Whether to enforce the augmentation to happen on the GPU, even if the input image is stored on the CPU.
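A construction sketch with illustrative ranges, assuming uint8 input images (so the brightness bias is given in the [0; 255] range):

from accvlab.dali_pipeline_framework.processing_steps import PhotoMetricDistorter

distorter = PhotoMetricDistorter(
    image_name="image",
    min_max_brightness=[-32.0, 32.0],  # bias in the uint8 value range
    min_max_hue=[-18.0, 18.0],         # degrees
    min_max_contrast=[0.5, 1.5],       # multiplicative factor
    min_max_saturation=[0.5, 1.5],     # multiplicative factor in HSV space
)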
- class accvlab.dali_pipeline_framework.processing_steps.AffineTransformer(output_hw, resizing_mode, resizing_anchor=None, image_field_names=None, image_hw_field_names=None, projection_matrix_field_names=None, point_field_names=None, transformation_steps=None, transform_image_on_gpu=True)[source]
Bases: PipelineStepBase
Apply affine augmentations (translation, scaling, rotation, shearing) to images, and update the associated geometry (points, projection matrices) consistently.
This step can process one or multiple images, as well as point sets and projection matrices. It expects image data fields and sibling image-size fields in the input (see SampleDataGroup). Optionally, names of point-set and projection-matrix fields can be provided. Multiple instances may be present; all matching occurrences are processed. If multiple images are found, each must have a sibling size field, and the sizes must match.
The same transformation is applied to all matched images. If different images require different transformations, create multiple instances of this step and apply them to different sub-trees (see GroupToApplyToSelectedStepBase).
Projection geometry represented as intrinsics and extrinsics should be handled by passing only the intrinsics matrix to this step; extrinsics are unaffected by an image-plane affine transform. Note that apart from true projection matrices, any matrix that transforms points from a different coordinate system into the image coordinate system can be handled.
The affine transform conceptually moves image content within a fixed viewport. For example, a translation to the right shifts the content rightward and exposes a border on the left. Scaling does not change the viewport size (pixel resolution), so upscaling reveals only the center region, while downscaling fills only part of the viewport.
After augmentation, a resize to the requested output resolution is applied if needed. When aspect ratios differ, the adjustment is controlled by AffineTransformer.ResizingMode and AffineTransformer.ResizingAnchor. Note that this resizing is independent of the affine transformation (where scaling leaves the viewport unchanged), and can be used to change the resolution and aspect ratio of the image.
The overall transform is built as a chain of steps (see AffineTransformer.TransformationStep and its subclasses). AffineTransformer.Selection allows probabilistic branching. Some steps depend on alignments and cannot follow incompatible steps (e.g., rotation or shearing); these constraints are validated at construction and consider incompatible steps anywhere in the chain before the step (including potentially applied probabilistic branches). All steps that require a reference point (e.g., rotation, scaling) use the viewport center.
The composed augmentation and resize are combined into a single image resampling step, which is advantageous both for the quality of the final image and for runtime.
- Parameters:
output_hw (Sequence[int]) – Output resolution [height, width]. The input image is resized to this size.
resizing_mode (ResizingMode) – How to resolve aspect-ratio differences. See AffineTransformer.ResizingMode.
resizing_anchor (Optional[ResizingAnchor], default: None) – Anchor to use when resizing_mode is not STRETCH. See AffineTransformer.ResizingAnchor. Must be None when resizing_mode is STRETCH and set otherwise.
image_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of image fields to transform (see SampleDataGroup). Set to None to not process images (e.g., to process only projection matrices or point sets). Cannot be set if image_hw_field_names is set.
image_hw_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of the fields containing the image size [height, width]. All listed fields must have identical values; if not, call this step separately per image (e.g., by name or by selecting a sub-tree, see GroupToApplyToSelectedStepBase). Cannot be set if image_field_names is set. One of image_field_names or image_hw_field_names must be provided (single source of truth for the image size).
projection_matrix_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of fields with projection matrices that map to pixel coordinates. These matrices are updated to project correctly into the output image. Set to None to skip. If the projection geometry is represented by extrinsics and intrinsics, only pass the intrinsics here; extrinsics are unaffected by an image-plane affine transform. Apart from true projection matrices, any matrix that transforms points from a different coordinate system into the image coordinate system can be handled.
point_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of fields containing 2D point sets (e.g., landmarks). Points are transformed to remain consistent with the output images. Points are expected as rows; a row may contain multiple points, in which case consecutive pairs are treated as individual points and stored in the same format (e.g. [x1, y1, x2, y2]).
transformation_steps (Optional[Sequence[TransformationStep]], default: None) – Sequence of steps to perform. If None, only resizing to the output resolution and handling of a changed aspect ratio is performed (no augmentation).
transform_image_on_gpu (bool, default: True) – Whether to transform images on the GPU. Must be True if images are already on the GPU.
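A resize-only sketch (no augmentation steps); the field names are hypothetical:

from accvlab.dali_pipeline_framework.processing_steps import AffineTransformer

resizer = AffineTransformer(
    output_hw=[512, 960],
    resizing_mode=AffineTransformer.ResizingMode.PAD,
    resizing_anchor=AffineTransformer.ResizingAnchor.CENTER,
    image_field_names="image",
    projection_matrix_field_names="intrinsics",
    point_field_names="landmarks",
    # transformation_steps=None: resizing & aspect-ratio handling only
)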
- class TransformationStep(prob)[source]
Bases: ABC
Step used to build up the overall affine transformation to apply. Each step is processed in sequence and applied with a given probability.
Probabilistic branching is possible by using AffineTransformer.Selection (also see the documentation for that step).
- Parameters:
prob (float) – Probability with which this step is applied.
- class Translation(prob, min_xy, max_xy=None)[source]
Bases: TransformationStep
Perform a randomized translation (in a given range).
- class ShiftInsideOriginalImage(prob, shift_x, shift_y)[source]
Bases: TransformationStep
Perform a random translation. The shift is selected so that the viewport is filled with the image.
This is only possible if the image is larger than (i.e. previously scaled up) or equal in size to the viewport. If this is not the case, this step does nothing.
The shift is computed and performed independently for the x- and y-directions. This means that if the image is larger than the viewport in one dimension and smaller in the other (e.g. due to non-uniform scaling), this step is performed only in the dimension where the image is larger than the viewport.
Also, if the image is larger than the viewport, this step will bring back the image to cover the whole viewport if it was previously moved out of it.
This step cannot be performed if a rotation and/or shearing was potentially performed before.
- class ShiftToAlignWithOriginalImageBorder(prob, border)[source]
Bases: TransformationStep
Translate the image so that it is aligned with a border of the viewport.
The border to align to can be selected on construction.
This step cannot be performed if a rotation and/or shearing was potentially performed before.
- Parameters:
prob (float) – Probability to perform the step.
border (ShiftToAlignWithOriginalImageBorder) – Border of the viewport to align the image to.
- class Rotation(prob, min_rot, max_rot=None)[source]
Bases: TransformationStep
Perform a rotation.
- class UniformScaling(prob, min_scaling, max_scaling=None)[source]
Bases: TransformationStep
Perform uniform scaling (i.e. an identical scaling factor in the x- and y-dimensions).
- class NonUniformScaling(prob, min_scaling_xy, max_scaling_xy=None)[source]
Bases: TransformationStep
Perform non-uniform scaling (i.e. the scaling factors in the x- and y-dimensions are independent).
- Parameters:
prob (float) – Probability to perform the step.
min_scaling_xy (Sequence[float]) – Minimum scaling factors for the x- and y-dimensions. If max_scaling_xy is not set, these factors are always applied instead of selecting random factors from the range.
max_scaling_xy (Optional[Sequence[float]], default: None) – Maximum scaling factors for the x- and y-dimensions.
- class Shearing(prob, min_shearing_xy, max_shearing_xy=None)[source]
Bases: TransformationStep
Perform shearing.
- Parameters:
prob (float) – Probability to perform the step.
min_shearing_xy (Sequence[float]) – Minimum shearing parameters for the x- and y-dimensions. If max_shearing_xy is not set, these parameters are always applied instead of selecting random parameters from the range.
max_shearing_xy (Optional[Sequence[float]], default: None) – Maximum shearing parameters.
- class Selection(prob, option_probs, options)[source]
Bases: TransformationStep
Probabilistically choose one sequence of steps out of multiple alternatives and perform the steps in that sequence.
- Parameters:
prob (float) – Probability to perform this step.
option_probs (Sequence[float]) – Probabilities of the individual options. Has to sum up to 1, as one option is always taken.
options (Sequence[Union[List[TransformationStep], Tuple[TransformationStep, ...], TransformationStep]]) – The individual options. Each option is a sequence of transformation steps or a single step.
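A sketch of an augmenting configuration built from the steps above; the numeric ranges are purely illustrative, and the rotation range is given in whatever unit Rotation expects:

from accvlab.dali_pipeline_framework.processing_steps import AffineTransformer

T = AffineTransformer  # shorthand for the nested step classes

augmenter = T(
    output_hw=[512, 960],
    resizing_mode=T.ResizingMode.CROP,
    resizing_anchor=T.ResizingAnchor.CENTER,
    image_field_names="image",
    transformation_steps=[
        # With probability 0.8, pick mild (70%) or strong (30%) scaling;
        # the option probabilities must sum to 1.
        T.Selection(
            prob=0.8,
            option_probs=[0.7, 0.3],
            options=[
                T.UniformScaling(prob=1.0, min_scaling=0.9, max_scaling=1.1),
                T.UniformScaling(prob=1.0, min_scaling=1.1, max_scaling=1.4),
            ],
        ),
        T.Translation(prob=0.5, min_xy=[-20.0, -10.0], max_xy=[20.0, 10.0]),
        T.Rotation(prob=0.3, min_rot=-10.0, max_rot=10.0),
    ],
)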
- class ResizingMode(value)[source]
Bases: Enum
Resizing mode types.
The mode defines how the input viewport is adjusted to the output viewport when the output image shape does not have the same aspect ratio as the input image shape.
Note that as the image may be outside the input viewport due to affine transformations, it may e.g. happen that there is still image data in the padded region of the output viewport. In this case, the image will appear in the padded region and will not be replaced by the fill value.
- STRETCH = 0
Viewport is stretched (i.e. the image is non-uniformly scaled).
- PAD = 1
Viewport is extended to preserve the aspect ratio (i.e. if there are no other transformations, the output image will be padded).
- CROP = 2
Viewport is cropped (i.e. if there are no other transformations, parts of the input image will be cropped away).
- class ResizingAnchor(value)[source]
Bases: Enum
Resizing mode anchor.
The anchor defines which reference point in the output image is aligned to the corresponding point in the input image when adjusting the aspect ratio to match the output image using the PAD or CROP resizing mode.
Important
Note that the anchor is only relevant when changing the aspect ratio of the image. The actual transformations such as scaling, rotation, etc. are not affected by the anchor, and always use the center of the image as reference point.
- CENTER = 0
The center of the output image corresponds to the center of the input image
- TOP_OR_LEFT = 1
The top left corner of the output image corresponds to the top left corner of the input image. Depending on which direction is padded / cropped, this corresponds to either keeping the top or the left border aligned.
- BOTTOM_OR_RIGHT = 2
The bottom right corner of the output image corresponds to the bottom right corner of the input image. Depending on which direction is padded / cropped, this corresponds to either keeping the bottom or the right border aligned.
- class accvlab.dali_pipeline_framework.processing_steps.CoordinateCropper(points_fields_name, minimum_point, maximum_point)[source]
Bases: PipelineStepBase
Crop points to a given axis-aligned box.
- Parameters:
points_fields_name (str) – Name of the data field containing the points to crop. If multiple fields with that name are present, each is processed independently.
minimum_point (Sequence[float]) – Lower corner (minimum per dimension) of the crop box.
maximum_point (Sequence[float]) – Upper corner (maximum per dimension) of the crop box.
- class accvlab.dali_pipeline_framework.processing_steps.PaddingToUniform(field_names=None, fill_value=0.0)[source]
Bases: PipelineStepBase
Processing step for padding all data fields in the processed data to have the same shape across the batch.
Padding can be performed either for all data fields, or only for fields with given names.
Note
To pad all fields in a given part (sub-tree) of the input data structure, use the access modifier wrapper steps (see GroupToApplyToSelectedStepBase and its subclasses).
- Parameters:
field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of the fields to apply padding to. Can be either a single name or a list of names. All fields with those names are processed. If set to None, padding is performed for all data fields. Fields can be either data fields or data field arrays.
fill_value (Union[int, float], default: 0.0) – Value to insert into the padded region.
- class accvlab.dali_pipeline_framework.processing_steps.AxesLayoutSetter(names_fields_to_set, layout_to_set)[source]
Bases: PipelineStepBase
Set the DALI axes layout string (e.g., “HWC”, “CHW”) for the selected fields.
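A short sketch of both steps; the field names are hypothetical, and it is assumed here that AxesLayoutSetter accepts a list of field names:

from accvlab.dali_pipeline_framework.processing_steps import (
    AxesLayoutSetter,
    PaddingToUniform,
)

# Pad per-object fields to the same shape across the batch, marking the
# padded region with -1.0.
padder = PaddingToUniform(field_names=["bboxes", "categories"], fill_value=-1.0)

# Declare the axes layout of the image fields for downstream DALI operators.
layout_setter = AxesLayoutSetter(names_fields_to_set=["image"], layout_to_set="HWC")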
- class accvlab.dali_pipeline_framework.processing_steps.BoundingBoxToHeatmapConverter(annotation_field_name, bboxes_in_name, heatmap_out_name, heatmap_hw, image_field_name=None, image_hw_field_name=None, categories_in_name=None, num_categories=None, min_object_size=None, per_category_min_object_sizes=None, use_per_category_heatmap=True, is_valid_opt_in_name=None, center_opt_in_name=None, is_active_opt_out_name=None, center_opt_out_name=None, center_offset_opt_out_name=None, height_width_bboxes_heatmap_opt_out_name=None, bboxes_heatmap_opt_out_name=None, min_fraction_area_clipping=0.25, min_radius=0.5, max_radius=10.0, radius_scaling_factor=0.8, radius_to_sigma_factor=0.3333333333333333)[source]
Bases: PipelineStepBase
Convert 2D object bounding box annotations into Gaussian heatmaps.
This step can process data from one or multiple cameras. It expects sibling fields in the input SampleDataGroup: an image-size field and an annotation field containing bounding boxes (and optionally categories & bounding box centers). Multiple occurrences are supported; each is processed independently (see the constructor for details).
Note
The input bounding boxes (and centers, if provided) are clipped to the image size, and the corresponding output fields refer to the clipped bounding boxes (scaled to the heatmap resolution).
The following fields can be added inside each processed annotation. Note that apart from the heatmap, all fields are optional and can be omitted if not needed:
heatmap: Heatmap at the specified resolution. If per-category mode is enabled, the shape is [num_categories, H, W]; otherwise [H, W]. The data type is FLOAT.
is_active: Boolean mask containing per-object flags indicating whether the object contributes to the heatmap (after clipping and threshold checks). Inactive objects were not drawn. Note that inactive objects are still contained in the other output fields.
center: Integer pixel center per object in heatmap coordinates (full-pixel location of the peak).
center_offset: Sub-pixel offset from the integer center to the true center in heatmap coordinates.
height_width_bboxes_heatmap: Per-object [height, width] in heatmap coordinates (after clipping and scaling from image to heatmap).
bboxes_heatmap: Per-object bounding box in heatmap coordinates (after clipping and scaling).
To define the size of the individual Gaussians in the heatmap, the radius of the bounding boxes is used (with additional factors for the radius and the sigma-to-radius conversion of the Gaussians). The radius of the bounding boxes is defined as the distance between the center and the nearest edge of the bounding box. If the center is outside the box, the radius is 0 (and the minimum radius as defined on construction is enforced).
- Parameters:
annotation_field_name (Union[str, int]) – Name of the field containing annotations. Bounding-box related fields are read from here and outputs are added here.
bboxes_in_name (Union[str, int]) – Name of the field containing the bounding boxes.
heatmap_out_name (Union[str, int]) – Name of the output field to write the heatmap to.
heatmap_hw (Tuple[int, int]) – Heatmap size (height, width).
image_field_name (Union[str, int, None], default: None) – Name of the field containing the image from which to extract the size. This field is expected to be a sibling of the annotation field. Only one of image_field_name or image_hw_field_name should be set (single source of truth).
image_hw_field_name (Union[str, int, None], default: None) – Name of the field containing the image height and width. This field is expected to be a sibling of the annotation field. Only one of image_field_name or image_hw_field_name should be set (single source of truth).
categories_in_name (Union[str, int, None], default: None) – Name of the field containing per-object categories. Required if any of the following holds: use_per_category_heatmap is True, per_category_min_object_sizes is not None, or num_categories is not None. Otherwise set to None.
num_categories (Optional[int], default: None) – Number of distinct categories. Objects with category >= num_categories are marked inactive. Set to None when categories are not used.
min_object_size (Optional[Sequence[float]], default: None) – Category-independent minimum object size [height, width] to be included. Must be None when per_category_min_object_sizes is not None.
per_category_min_object_sizes (Optional[Sequence[Sequence[float]]], default: None) – Per-category minimum size [height, width]. Must be None when min_object_size is not None.
use_per_category_heatmap (bool, default: True) – If True, draw a separate heatmap slice per category; otherwise draw a single heatmap.
is_valid_opt_in_name (Union[str, int, None], default: None) – Optional field with per-object validity. Applied in addition to the internal checks to determine whether an object is active. If absent, all objects are treated as valid (internal checks can still mark objects as inactive).
center_opt_in_name (Union[str, int, None], default: None) – Name of the field containing the centers of the bounding boxes. The center defined this way is not necessarily the center of the 2D bounding box and could e.g. be the projection of the center of the 3D bounding box onto the image plane. Optional field. If not present, the centers are assumed to be the centers of the 2D bounding boxes.
is_active_opt_out_name (Union[str, int, None], default: None) – Output field name for the per-object active flag. Optional field; not added if not provided.
center_opt_out_name (Union[str, int, None], default: None) – Output field name for the integer center locations in the heatmap. The sub-pixel offset is written to center_offset_opt_out_name. Optional field; not added if not provided.
center_offset_opt_out_name (Union[str, int, None], default: None) – Output field name for the sub-pixel center offsets in heatmap coordinates. Optional field; not added if not provided.
height_width_bboxes_heatmap_opt_out_name (Union[str, int, None], default: None) – Output field for the per-object [height, width] in the heatmap. Optional field; not added if not provided.
bboxes_heatmap_opt_out_name (Union[str, int, None], default: None) – Output field for the per-object bounding boxes in the heatmap. Optional field; not added if not provided.
min_fraction_area_clipping (float, default: 0.25) – Minimum remaining area fraction after clipping for an object to be considered active. For example, with 0.25, boxes that lose more than 75% of their area due to clipping are set inactive.
min_radius (float, default: 0.5) – Minimum radius used when drawing the Gaussians. The enforced lower bound is 0.5.
max_radius (float, default: 10.0) – Maximum radius used when drawing the Gaussians. Larger radii are clipped to this value.
radius_scaling_factor (float, default: 0.8) – Scaling factor applied to the bbox-derived radius.
radius_to_sigma_factor (float, default: 0.3333333333333333) – Factor to convert the radius to the Gaussian sigma.
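A construction sketch with hypothetical field names, producing a per-category heatmap plus the optional center/offset outputs:

from accvlab.dali_pipeline_framework.processing_steps import BoundingBoxToHeatmapConverter

heatmap_converter = BoundingBoxToHeatmapConverter(
    annotation_field_name="annotations",
    bboxes_in_name="bboxes",
    heatmap_out_name="heatmap",
    heatmap_hw=(128, 240),
    image_hw_field_name="image_hw",  # sibling of the annotation field
    categories_in_name="categories",
    num_categories=10,
    use_per_category_heatmap=True,   # heatmap shape: [10, 128, 240]
    is_active_opt_out_name="is_active",
    center_opt_out_name="center",
    center_offset_opt_out_name="center_offset",
)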
- class accvlab.dali_pipeline_framework.processing_steps.AnnotationElementConditionEval(annotation_field_name, condition, remove_data_fields_used_in_condition)[source]
Bases: PipelineStepBase
Evaluate a declarative condition per annotation element and store the boolean result.
This step looks for data group fields (see the documentation of SampleDataGroup) corresponding to annotations and applies the defined condition to data fields inside the annotation. The results are stored as a new data field inside the annotation. Both the data fields used in the condition and the resulting data field are referenced by name in the condition string.
The used fields are expected to be 1D sequences (one value per object). The condition is evaluated per element, producing a boolean sequence with one result per object.
Note
While 1D sequences are expected, the data may be formatted as 2D tensors. In this case, one dimension needs to have a size of 1.
The condition must start with a variable name, followed by an assignment operator, followed by an expression.
The expression can contain variables (will be mapped to the data fields of the annotation), literals, and operators.
- The supported operators are:
Logical operators: or, and, not
Comparison operators: ==, !=, >, >=, <, <=
Parentheses: ( and )
Unary minus: -; e.g. -_b1 < -10.5 is valid.
Assignment operator: =
- The syntax is similar to Python. However, note that:
Only the operators defined above are supported.
Direct comparisons of more than two values are not supported (e.g. a < b < c is not supported).
Only numeric literals are supported.
True and False are not supported (not needed; use negation instead of comparison to False).
The result of the condition is stored in a new data field, which is added to the annotation. The name of the result data field is also defined inside the condition string.
Example
- The condition can be described in a syntax similar to Python, e.g.:
is_valid = (num_lidar_points >= 1 or num_radar_points >= 1) and visibility_levels > 0 and category > 0
In this case:
The data fields num_lidar_points, num_radar_points, visibility_levels, and category are expected to be children of the annotation data group field.
The result of the condition is stored in a new data field inside the annotation data group field, named is_valid.
Important
In order to use data fields inside the condition, their names must follow the rules of Python variable names (e.g. no spaces, no special characters, do not start with a digit).
See also
Specific complex conditions can be checked with VisibleBboxSelector and PointsInRangeCheck, and the results of these checks can be combined with this step. ConditionalElementRemover can be used to remove elements from the data based on this condition. BoundingBoxToHeatmapConverter has both input and output fields containing boolean masks denoting the active objects.
- Parameters:
annotation_field_name (Union[str, int]) – Name of the annotation data group field. Note that there can be more than one annotation field (e.g. one for the objects visible in each camera). In this case, all of these annotations are processed (independently of each other).
condition (str) – Condition to be applied. Please see the description above for more details.
remove_data_fields_used_in_condition (bool) – Whether to remove the data fields used in the condition after evaluating it. This is a convenience feature and can be set to True if the data fields are not used after evaluating the condition. Note, however, that if some of the data fields are still needed, it has to be set to False, as the fields are not available after this step otherwise.
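A construction sketch reusing the example condition from above (field names are hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import AnnotationElementConditionEval

condition_eval = AnnotationElementConditionEval(
    annotation_field_name="annotations",
    condition=(
        "is_valid = (num_lidar_points >= 1 or num_radar_points >= 1)"
        " and visibility_levels > 0 and category > 0"
    ),
    # Keep the input fields, e.g. if they are still needed downstream.
    remove_data_fields_used_in_condition=False,
)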
- class accvlab.dali_pipeline_framework.processing_steps.BEVBBoxesTransformer3D(data_field_names_points, data_field_names_velocities, data_field_names_sizes, data_field_names_orientation, data_field_names_proj_matrices_and_extrinsics, data_field_names_ego_to_world, data_field_names_world_to_ego, rotation_range, rotation_axis, scaling_range, translation_max_abs)[source]
Bases: PipelineStepBase
Augment BEV bounding boxes (and related geometry) with rotation, scaling, and translation.
The augmentation is applied in world coordinates. Related sensor geometry (e.g., extrinsics) is defined in ego coordinates and is updated accordingly using provided ego<->world transforms.
- The individual augmentation steps are applied in the following order:
Rotation
Scaling
Translation
- Parameters:
data_field_names_points (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the points representing the bounding box centers (in [x, y, z] format). Optional; updated if provided.
data_field_names_velocities (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the velocities of the objects (bounding boxes) (in [vx, vy, vz] format). Optional; updated if provided.
data_field_names_sizes (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the sizes of the bounding boxes (in [x, y, z] format). Optional; updated if provided.
data_field_names_orientation (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the orientations of the bounding boxes (in radians). Optional; updated if provided.
data_field_names_proj_matrices_and_extrinsics (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing projection matrices and/or extrinsics. Note that camera intrinsics do not need to be adjusted and must not be included in this list. Optional; updated if provided.
data_field_names_ego_to_world (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing matrices representing a transformation (e.g. for points) from ego to world coordinates. Optional; updated if provided.
data_field_names_world_to_ego (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing matrices representing a transformation (e.g. for points) from world to ego coordinates. Optional; updated if provided.
rotation_range (Optional[Tuple[float, float]]) – Rotation range for the randomized rotation in the augmentation transformation. Optional; if not provided, no rotation is applied.
rotation_axis (Optional[int]) – Axis of rotation (0 indicating x, 1 indicating y, and 2 indicating z). Must be provided if rotation_range is provided.
scaling_range (Optional[Tuple[float, float]]) – Scaling range for the augmentation transformation. Optional; if not provided, no scaling is applied.
translation_max_abs (Optional[Tuple[float, float]]) – Maximum absolute translation in all dimensions. Optional; if not provided, no translation is applied.
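A construction sketch with hypothetical field names and illustrative ranges:

from accvlab.dali_pipeline_framework.processing_steps import BEVBBoxesTransformer3D

bev_augmenter = BEVBBoxesTransformer3D(
    data_field_names_points="bbox_centers",
    data_field_names_velocities="velocities",
    data_field_names_sizes="bbox_sizes",
    data_field_names_orientation="yaw",
    data_field_names_proj_matrices_and_extrinsics=["lidar2img", "extrinsics"],
    data_field_names_ego_to_world="ego2world",
    data_field_names_world_to_ego="world2ego",
    rotation_range=(-0.3925, 0.3925),  # radians, around the z-axis
    rotation_axis=2,
    scaling_range=(0.95, 1.05),
    translation_max_abs=(0.5, 0.5),
)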
- class accvlab.dali_pipeline_framework.processing_steps.VisibleBboxSelector(bboxes_field_name, resulting_mask_field_path, image_field_name=None, image_hw_field_name=None, image_hw=None, check_for_bbox_occlusion=True, check_for_minimum_size=True, depths_field_name=None, minimum_bbox_size=None)[source]
Bases: PipelineStepBase
Select visible 2D bounding boxes.
A box is considered visible if it is not completely overlapped by nearer boxes (occlusion test) and/or if it meets a minimum size threshold. The result is written as a boolean mask to the configured output path. Both checks are optional and can be enabled or disabled independently.
A mask is added which indicates which boxes are visible. The original bounding boxes are not modified.
See also
AnnotationElementConditionEval can be used to combine the results of this step with other conditions. ConditionalElementRemover can be used to remove elements from the data based on this condition or a combination of this condition with other conditions.
Note that the step expects exactly one data field in the input SampleDataGroup instance to contain the bounding boxes (as well as only one field containing the depths). If multiple sets of bounding boxes are present in the data, this processing step has to be applied to parts (sub-trees) of the input data individually, so that each part contains only one set of bounding boxes; access modifier wrapper steps need to be used for this (see GroupToApplyToSelectedStepBase and its subclasses).
- Parameters:
bboxes_field_name (Union[str, int]) – Name of the data field in the input SampleDataGroup instance containing the bounding boxes. Each row is expected to contain a bounding box in the format [min_x, min_y, max_x, max_y]. The input data must contain exactly one field with this name.
resulting_mask_field_path (Union[str, int, Tuple[Tuple[str, int], ...]]) – Path of the data field to store the result as. The path is relative to the root element. Note that if this step is wrapped by a sub-tree selection step, the root of the selected sub-tree acts as the root.
image_field_name (Union[str, int, None], default: None) – Name of the field containing the image from which to extract the size. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).
image_hw_field_name (Union[str, int, None], default: None) – Name of the field containing the image size for which the bounding boxes are defined. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).
image_hw (Optional[Sequence[int]], default: None) – Image size [height, width] of the image for which the bounding boxes are defined. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).
check_for_bbox_occlusion (bool, default: True) – Whether to consider boxes invisible if they are completely overlapped by nearer boxes.
check_for_minimum_size (bool, default: True) – Whether to consider boxes invisible if they are below a minimum size.
depths_field_name (Union[str, int, None], default: None) – Name of the data field containing the bounding box depths. Needs to be set if check_for_bbox_occlusion is set to True. The input data must contain exactly one field with this name.
minimum_bbox_size (Optional[float], default: None) – Minimum size of a bounding box to be considered visible. Needs to be set if check_for_minimum_size is set to True.
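A construction sketch enabling both checks (field names and the size threshold are hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import VisibleBboxSelector

selector = VisibleBboxSelector(
    bboxes_field_name="bboxes",
    resulting_mask_field_path="is_visible",
    image_hw_field_name="image_hw",
    check_for_bbox_occlusion=True,
    depths_field_name="depths",  # required for the occlusion check
    check_for_minimum_size=True,
    minimum_bbox_size=8.0,       # illustrative threshold
)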
- class accvlab.dali_pipeline_framework.processing_steps.PointsInRangeCheck(points_fields_name, is_inside_field_name, minimum_point, maximum_point)[source]
Bases: PipelineStepBase
Check whether points lie within a given axis-aligned box and add a boolean mask.
See also
AnnotationElementConditionEval can be used to combine the results of this step with other conditions. ConditionalElementRemover can be used to remove elements from the data based on this condition or a combination of this condition with other conditions.
- Parameters:
points_fields_name (str) – Name of the data field containing the points to check. If multiple fields with that name are present, each is processed independently.
is_inside_field_name (str) – Name of the sibling data field to store the boolean mask in. Must not already exist.
minimum_point (Sequence[float]) – Lower corner (minimum per dimension) of the region.
maximum_point (Sequence[float]) – Upper corner (maximum per dimension) of the region.
- class accvlab.dali_pipeline_framework.processing_steps.ConditionalElementRemover(annotation_field_name, mask_field_name, field_names_to_process, field_dims_to_process, fields_to_process_num_dims, remove_mask_field)[source]
Bases: PipelineStepBase
Remove elements from arrays (e.g., per-object data) based on a boolean mask.
Arrays are stored as (multi-dimensional) tensors; for each array, a dimension index indicates the element axis (the axis along which the elements to be removed/retained are enumerated). Elements with mask value False are removed along the configured dimension for each target field.
See also
Multiple classes are available which evaluate conditions of some kind and store the results as boolean masks that can be used by this class, e.g. AnnotationElementConditionEval, VisibleBboxSelector, and PointsInRangeCheck.
- Parameters:
annotation_field_name (Union[str, int]) – Name of the annotation data group field to process. Each annotation field is processed independently.
mask_field_name (Union[str, int]) – Name of the boolean mask indicating which elements to keep (True) or remove (False). Must be a child of each annotation field.
field_names_to_process (Sequence[Union[str, int]]) – Names of the fields to process. The fields must be present in each annotation field.
field_dims_to_process (Sequence[int]) – For each field name, the dimension index along which elements are to be removed.
fields_to_process_num_dims (Sequence[int]) – For each field name, the number of dimensions of the tensor.
remove_mask_field (bool) – Whether to remove the mask field after applying this step.
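A sketch combining PointsInRangeCheck with this step: the mask produced by the check is stored as a sibling of the points field (i.e. inside the annotation) and then drives the removal. Field names, ranges, and the per-field dimension indices are hypothetical:

from accvlab.dali_pipeline_framework.processing_steps import (
    ConditionalElementRemover,
    PointsInRangeCheck,
)

# 1) Flag objects whose centers lie inside the detection range; the mask is
#    stored as a sibling of "bbox_centers" inside each annotation.
in_range_check = PointsInRangeCheck(
    points_fields_name="bbox_centers",
    is_inside_field_name="in_range",
    minimum_point=[-51.2, -51.2, -5.0],
    maximum_point=[51.2, 51.2, 3.0],
)

# 2) Remove out-of-range objects from the per-object fields.
remover = ConditionalElementRemover(
    annotation_field_name="annotations",
    mask_field_name="in_range",
    field_names_to_process=["bbox_centers", "bbox_sizes", "yaw"],
    field_dims_to_process=[0, 0, 0],       # element axis of each field
    fields_to_process_num_dims=[2, 2, 1],  # tensor rank of each field
    remove_mask_field=True,                # drop the mask once applied
)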
- class accvlab.dali_pipeline_framework.processing_steps.UnneededFieldRemover(unneeded_field_names)[source]
Bases: PipelineStepBase
Processing step for removing unneeded fields from the data.
This step does not add any operations to the DALI graph, i.e. it is fully performed at DALI graph construction time and has no overhead at runtime. This means that this step can be used inside the pipeline multiple times to ensure a clean data structure without any performance penalty (apart from the overhead at graph construction time).
Note
For pipelines that use data that is not needed in the final output (e.g. intermediate results, image size on the CPU, etc.), it is advisable to perform this step at least once, directly before outputting the data, in order to avoid unneeded copies & clutter in the final output.
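A short sketch (field names hypothetical):

from accvlab.dali_pipeline_framework.processing_steps import UnneededFieldRemover

# Typically placed directly before the pipeline outputs, so intermediate
# fields do not end up in the final output.
cleanup = UnneededFieldRemover(unneeded_field_names=["image_hw", "in_range"])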