Processing Steps Submodule

This module contains the classes which represent individual processing steps, as well as the respective base classes that can be used to implement custom processing steps (see PipelineStepBase) and access modifier wrapper steps (see GroupToApplyToSelectedStepBase).

The individual processing steps are the building blocks of the pipeline, which is defined by a sequence of processing steps (in addition to the input callable/iterable, see the inputs sub-module).

class accvlab.dali_pipeline_framework.processing_steps.PipelineStepBase[source]

Bases: ABC

Base class for pipeline processing steps.

Pipeline processing steps are the building blocks of the pipeline and represent individual operations applied to input data in sequence to produce outputs.

Provides the common interface and common functionality shared by all processing steps:

Consistent & Independent Data Processing

Many of the included processing steps can be configured to operate on more than one field in the input SampleDataGroup object. For some steps (e.g. those which apply random transformations), the question arises whether they should apply consistent processing across all fields they process (e.g. the same augmentation transformation for all images), or whether the processing should happen independently for different fields (e.g. different transformations for different images). The answer depends on the use-case.

By default, the processing steps are designed to apply consistent processing. For example, AffineTransformer applies the same spatial transform to all processed images, as well as corresponding fields such as point sets defined on the image or projection matrices. This ensures that:

  • Consistent randomization is possible if needed (e.g., between an image, a corresponding segmentation mask, projection matrix, and points defined on the image).

  • No correspondences between multiple fields need to be explicitly maintained. For example, if multiple images and projection matrices are present, there is no need to know which projection matrix corresponds to which image, as the same transformation is applied to all of them. This is useful when processing multiple fields which are related to one another.

To ensure that independent processing (e.g. different randomizations) can be applied to different parts of the data (e.g., different randomizations for data from different cameras), sub-classes of GroupToApplyToSelectedStepBase can be used to select one or more parts (sub-trees) of the input data to process independently of each other. The selection of sub-trees also makes it possible to establish field correspondences (e.g., to process the image and projection matrix from one camera consistently) in a natural way, i.e. by grouping all related fields in one sub-tree (e.g. one sub-tree per camera).

The available wrappers include DataGroupInPathAppliedStep, DataGroupsWithNameAppliedStep, DataGroupArrayInPathElementsAppliedStep, and DataGroupArrayWithNameElementsAppliedStep. Please see the documentation of these classes for more details. If necessary, new wrappers can be added by subclassing GroupToApplyToSelectedStepBase.

Having both options available (consistent or independent randomization for different parts of the data), together with the ability to group related data (e.g. all images and projection matrices for one camera), allows for a flexible pipeline design that can be tailored to the specific use-case by configuration.
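
Example

A minimal sketch of independent per-camera augmentation. The wrapped step and the group names "camera_front" and "camera_rear" are hypothetical placeholders, assuming each camera's data is grouped in a sub-tree of that name:

    from accvlab.dali_pipeline_framework.processing_steps import (
        DataGroupsWithNameAppliedStep,
        PhotoMetricDistorter,
    )

    # Applied via the wrapper, the distorter draws its random parameters
    # independently for each selected camera sub-tree.
    per_camera_augmentation = DataGroupsWithNameAppliedStep(
        processing_step_to_apply=PhotoMetricDistorter(
            image_name="image",
            min_max_brightness=[-32.0, 32.0],
            min_max_hue=[-18.0, 18.0],
            min_max_contrast=[0.5, 1.5],
            min_max_saturation=[0.5, 1.5],
        ),
        names_of_groups_to_apply_to=["camera_front", "camera_rear"],
    )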

abstract _check_and_adjust_data_format_input_to_output(data_empty)[source]

Check the input data format for compatibility and return the output data format (blueprint).

If the input data format is incompatible, raise an exception describing the problem.

Please see check_input_data_format_and_set_output_data_format() for a description of typical checks and format changes that need to be performed here.

This method may or may not modify data_empty directly, but in any case has to return an object representing the modified format (i.e., either the modified data_empty or a new object).

Note

  • Override this method in each (non-abstract) derived class to define the actual format check and adjustment.

  • This method is called by check_input_data_format_and_set_output_data_format() and should not be called directly.

Parameters:

data_empty (SampleDataGroup) – Input data format (blueprint)

Returns:

SampleDataGroup – Resulting data format (blueprint)

abstract _process(data)[source]

Apply the processing step to the input, or to a selected sub-tree when wrapped accordingly.

Individual processing steps need to override this method and implement the actual functionality.

The method may mutate the input data; callers must not rely on the input remaining unchanged or corresponding to the output after the call.

Note

  • Override this method in each (non-abstract) derived class to define the actual functionality.

  • This method is called by __call__() and should not be called directly.

Parameters:

data (SampleDataGroup) – Data to be processed by the step.

Returns:

SampleDataGroup – Resulting processed data.

__call__(data)[source]

Apply the processing step and validate its output format.

Important

To define the actual functionality of a processing step, override _process(), not this method.

Parameters:

data (SampleDataGroup) – Input data to process.

Returns:

SampleDataGroup – Processed output data.

check_input_data_format_and_set_output_data_format(data_empty)[source]

Check the input data format for compatibility and return the output data format (blueprint).

Compatibility typically means that expected data fields are present and types are compatible, and that the output data fields can be added (are not already present). Typical changes to the data format include additions/removals of fields or changes to data types (e.g., an image may change from types.DALIDataType.UINT8 to types.DALIDataType.FLOAT in a normalization step).

This method does not modify data_empty in place; it returns a new SampleDataGroup describing the modified format.

If the input data format is incompatible, an exception is raised.

Important

To define the actual functionality of the check, override _check_and_adjust_data_format_input_to_output(), not this method.

Parameters:

data_empty (SampleDataGroup) – Input data format (blueprint).

Returns:

SampleDataGroup – Resulting data format (blueprint).
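
Example

A minimal sketch of a custom processing step. It passes the data through unchanged and leaves the format untouched; the comments indicate where a real step would add its logic:

    from accvlab.dali_pipeline_framework.processing_steps import PipelineStepBase


    class IdentityStep(PipelineStepBase):
        """Pass-through step serving as a template for custom steps."""

        def _check_and_adjust_data_format_input_to_output(self, data_empty):
            # A real step would verify that the expected fields are present
            # (raising on incompatibility) and add/remove/retype fields in
            # the returned blueprint.
            return data_empty

        def _process(self, data):
            # A real step would apply its operations to the fields here.
            return data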

class accvlab.dali_pipeline_framework.processing_steps.GroupToApplyToSelectedStepBase(processing_step_to_apply)[source]

Bases: PipelineStepBase

Base class for wrappers that apply a contained processing step to selected parts (sub-trees) of the input.

The wrapper forwards only the selected parts (sub-trees) to the contained step, which then operates as if the sub-tree were the full input. If multiple sub-trees are selected (e.g. one sub-tree per time step of a sequence), the contained step is called multiple times, executing independently for each sub-tree. If joint processing is required, design the contained step to consume the full tree (or a larger sub-tree) instead of using a wrapper.

Parameters:

processing_step_to_apply – Processing step to apply to the selected sub-trees.

Important

Ensure that the constructor of this class is called by any derived class.

abstract _check_and_get_paths_to_apply_to(data)[source]

Check input and return paths to all sub-trees to process.

Requirements on the input include that at least one sub-path is found and that paths match the expected type (e.g., array data group fields when iterating over elements). See SampleDataGroup for what constitutes an array and how to check whether a field is an array.

If the requirements are not satisfied, an error shall be raised.

Note

Override this method in each (non-abstract) derived class to define the actual selection of sub-trees to process. This is the only method which needs to be overridden; it is used by the other methods of this class, which perform the actual processing.

class accvlab.dali_pipeline_framework.processing_steps.DataGroupInPathAppliedStep(processing_step_to_apply, path_to_apply_to)[source]

Bases: GroupToApplyToSelectedStepBase

Apply a contained processing step to the sub-tree rooted at a given path.

Parameters:
  • processing_step_to_apply – The contained processing step

  • path_to_apply_to – Path to the root of the sub-tree to apply processing_step_to_apply to

class accvlab.dali_pipeline_framework.processing_steps.DataGroupsWithNameAppliedStep(processing_step_to_apply, names_of_groups_to_apply_to, check_minimum_one_name_match=True)[source]

Bases: GroupToApplyToSelectedStepBase

Apply a contained processing step to all sub-trees whose root is a data group field with a given name.

The name is defined at construction; all matching data group fields are located and the contained step is applied to each corresponding sub-tree.

Parameters:
  • processing_step_to_apply – Contained processing step to apply.

  • names_of_groups_to_apply_to – Name or list of names of data group fields to select as sub-tree roots.

  • check_minimum_one_name_match (default: True) – If True, at least one field must be found for each provided name; an error is raised during the input check otherwise.

class accvlab.dali_pipeline_framework.processing_steps.DataGroupArrayInPathElementsAppliedStep(processing_step_to_apply, path_to_array_to_apply_to)[source]

Bases: DataGroupInPathAppliedStep

Apply a contained processing step independently to each element of an array data group field at a path.

The path of the array data group field is defined at construction. Each element of that array is processed independently by the contained step.

Parameters:
  • processing_step_to_apply – Contained processing step to apply.

  • path_to_array_to_apply_to – Path to the array data group field whose children should be processed.

class accvlab.dali_pipeline_framework.processing_steps.DataGroupArrayWithNameElementsAppliedStep(processing_step_to_apply, name_of_arrays_to_apply_to, check_minimum_one_name_match=True)[source]

Bases: DataGroupsWithNameAppliedStep

Apply a contained processing step independently to each element of all array data group fields with a given name.

The name is defined at construction. All fields with that name must be array data group fields (see SampleDataGroup). Each element of each found array is processed independently by the contained step.

Parameters:
  • processing_step_to_apply (PipelineStepBase) – Contained processing step to apply.

  • name_of_arrays_to_apply_to (Union[str, int]) – Name of the array data group fields whose elements should be processed.

  • check_minimum_one_name_match (default: True) – If True, at least one matching array must be found; an error is raised during the input check otherwise.

class accvlab.dali_pipeline_framework.processing_steps.ImageDecoder(image_name, use_device_mixed, hw_decoder_load=0.65, as_bgr=False)[source]

Bases: PipelineStepBase

Decode images.

Behavior:
  • Finds all images by name, decodes them (to RGB or BGR), and replaces the encoded image data with the decoded version in place.

  • Image search happens at DALI graph construction time; only the actual decoding operator is part of the DALI graph. This means that the runtime performance is not affected by the search for images.

Parameters:
  • image_name (str) – Name of the image data field(s) to decode

  • use_device_mixed (bool) – If True, decoding will be partially performed on the GPU and the resulting images will be located in GPU memory. If False, only the CPU is used.

  • hw_decoder_load (float, default: 0.65) – If use_device_mixed is True, this parameter sets the fraction of the workload to be performed by dedicated decoding hardware (as opposed to software CUDA kernels).

  • as_bgr (bool, default: False) – Whether to output BGR images (instead of RGB images).
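
Example

A minimal construction sketch; the field name "image" is a hypothetical placeholder:

    from accvlab.dali_pipeline_framework.processing_steps import ImageDecoder

    # Mixed CPU/GPU decoding; the decoded images are located in GPU memory.
    decode = ImageDecoder(image_name="image", use_device_mixed=True)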

class accvlab.dali_pipeline_framework.processing_steps.ImageToTileSizePadder(image_name, tile_size_to_pad_to)[source]

Bases: PipelineStepBase

Pad images so height and width are multiples of a given tile size.

The image is padded with zeros and the image size field is updated to the padded size.

Parameters:
  • image_name (Union[str, int]) – Name of the image data fields to pad.

  • tile_size_to_pad_to (Union[int, Sequence[int]]) – Tile size to be used. This means that the size of the padded image will be a multiple of the tile size (in each dimension).
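
Example

A construction sketch with a hypothetical field name, assuming a single integer is applied as the tile size in both dimensions:

    from accvlab.dali_pipeline_framework.processing_steps import ImageToTileSizePadder

    # Pad so that image height and width become multiples of 32.
    pad_to_tiles = ImageToTileSizePadder(image_name="image", tile_size_to_pad_to=32)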

class accvlab.dali_pipeline_framework.processing_steps.ImageRange01Normalizer(image_name)[source]

Bases: PipelineStepBase

Convert RGB or BGR image from UINT8 to FLOAT and scale to [0.0, 1.0].

Each matching image is cast to types.DALIDataType.FLOAT and divided by 255.0 per channel.

Parameters:

image_name (Union[str, int]) – Name of the image data field(s) to normalize.

class accvlab.dali_pipeline_framework.processing_steps.ImageMeanStdDevNormalizer(image_name, mean, std_dev, output_type=<DALIDataType.FLOAT: 9>)[source]

Bases: PipelineStepBase

Normalize RGB or BGR images by mean and standard deviation, using pre-defined mean & standard deviation values.

Normalization subtracts the mean and divides by the standard deviation per channel over the spatial axes. Scalars broadcast to all channels; for 3-vectors, each element corresponds to a channel; no distinction between RGB and BGR is made. This means that the mean and standard deviation values must be provided for the channels in the order corresponding to the image format.

Note

The mean and standard deviation values need to be provided on construction. They are not computed from the images at runtime.

Parameters:
  • image_name (Union[str, int]) – Name of the image data fields to normalize.

  • mean (Union[Sequence[float], float]) – Mean value used as basis for the normalization. Can be a single value (applied to all color channels) or a vector, containing the values for all channels.

  • std_dev (Union[Sequence[float], float]) – Standard deviation used as basis for the normalization. Can be a single value (applied to all color channels) or a vector, containing the values for all channels.

  • output_type (DALIDataType, default: <DALIDataType.FLOAT: 9>) – Data type for the output image. Default value is types.DALIDataType.FLOAT (i.e. 32-bit float).
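
Example

A construction sketch; the field name is hypothetical, and the per-channel values shown are the common ImageNet statistics for RGB images in the uint8 [0, 255] range (they must match your data and channel order):

    from accvlab.dali_pipeline_framework.processing_steps import ImageMeanStdDevNormalizer

    normalize = ImageMeanStdDevNormalizer(
        image_name="image",
        mean=[123.675, 116.28, 103.53],
        std_dev=[58.395, 57.12, 57.375],
    )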

class accvlab.dali_pipeline_framework.processing_steps.PhotoMetricDistorter(image_name, min_max_brightness, min_max_hue, min_max_contrast, min_max_saturation, prob_brightness_aug=0.5, prob_hue_aug=0.5, prob_contrast_aug=0.5, prob_saturation_aug=0.5, prob_swap_channels=0.5, is_bgr=False, enforce_process_on_gpu=True)[source]

Bases: PipelineStepBase

Apply photometric augmentations to images (brightness, contrast, saturation, hue, channel swap).

For each augmentation, the same random decision & parametrization is shared across all matched images to maintain consistency (e.g., across multi-view inputs).

Parameters:
  • image_name (Union[str, int]) – Name of the image data fields to augment.

  • min_max_brightness (Sequence[float]) – Minimum and maximum biases to apply to the brightness. Note that as the image may be in different ranges ([0; 1] for float images, [0; 255] for uint8 images), the values provided here are expected to be in the corresponding range.

  • min_max_hue (Sequence[float]) – Minimum and maximum change in hue (degrees).

  • min_max_contrast (Sequence[float]) – Minimum and maximum contrast factor (multiplicative).

  • min_max_saturation (Sequence[float]) – Minimum and maximum saturation factor (multiplicative in HSV space).

  • prob_brightness_aug (float, default: 0.5) – Probability to apply brightness augmentation. Default value is 0.5.

  • prob_hue_aug (float, default: 0.5) – Probability to apply hue change augmentation. Default value is 0.5.

  • prob_contrast_aug (float, default: 0.5) – Probability to apply contrast augmentation. Default value is 0.5.

  • prob_saturation_aug (float, default: 0.5) – Probability to apply saturation augmentation. Default value is 0.5.

  • prob_swap_channels (float, default: 0.5) – Probability to randomly permute color channels.

  • is_bgr (bool, default: False) – Whether the image is in BGR format (RGB otherwise).

  • enforce_process_on_gpu (bool, default: True) – Whether to enforce the augmentation to happen on the GPU, even if the input image is stored on the CPU. Default value is True.

class accvlab.dali_pipeline_framework.processing_steps.AffineTransformer(output_hw, resizing_mode, resizing_anchor=None, image_field_names=None, image_hw_field_names=None, projection_matrix_field_names=None, point_field_names=None, transformation_steps=None, transform_image_on_gpu=True)[source]

Bases: PipelineStepBase

Apply affine augmentations (translation, scaling, rotation, shearing) to images, and update associated geometry (points, projection matrices) consistently.

This step can process one or multiple images, as well as point sets and projection matrices. It expects image data fields and sibling image-size fields in the input (see SampleDataGroup). Optionally, names of point-set and projection-matrix fields can be provided. Multiple instances may be present; all matching occurrences are processed. If multiple images are found, each must have a sibling size field, and the sizes must match.

The same transformation is applied to all matched images. If different images require different transformations, create multiple instances of this step and apply them to different sub-trees (see GroupToApplyToSelectedStepBase).

Projection geometry represented as intrinsics and extrinsics should be handled by passing only the intrinsics matrix to this step; extrinsics are unaffected by an image-plane affine transform. Note that apart from true projection matrices, any matrices can be handled which transform points from a different coordinate system into the image coordinate system.

The affine transform conceptually moves image content within a fixed viewport. For example, a translation to the right shifts the content rightward and exposes a border on the left. Scaling does not change the viewport size (pixel resolution), so upscaling reveals only the center region, while downscaling fills only part of the viewport.

After augmentation, a resize to the requested output resolution is applied if needed. When aspect ratios differ, the adjustment is controlled by AffineTransformer.ResizingMode and AffineTransformer.ResizingAnchor. Note that this resizing is independent of the affine transformation (where scaling leaves the viewport unchanged), and can be used to change the resolution and aspect ratio of the image.

The overall transform is built as a chain of steps (see AffineTransformer.TransformationStep and its subclasses). AffineTransformer.Selection allows probabilistic branching. Some steps depend on axis alignment and therefore cannot follow incompatible steps (e.g., rotation or shearing). These constraints are validated at construction and take into account incompatible steps anywhere earlier in the chain, including inside probabilistic branches that may be applied.

All steps that require a reference point (e.g., rotation, scaling) use the viewport center.

The composed augmentation and resize are combined into a single image resampling step, which is advantageous both for the quality of the final image and for runtime.

Parameters:
  • output_hw (Sequence[int]) – Output resolution [height, width]. The input image is resized to this size.

  • resizing_mode (ResizingMode) – How to resolve aspect-ratio differences. See AffineTransformer.ResizingMode.

  • resizing_anchor (Optional[ResizingAnchor], default: None) – Anchor to use when resizing_mode is not STRETCH. See AffineTransformer.ResizingAnchor. Must be None when resizing_mode is STRETCH and set otherwise.

  • image_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of image fields to transform (see SampleDataGroup). Set to None to not process images (e.g., only projection matrices or point sets). Cannot be set if image_hw_field_names is set.

  • image_hw_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of the fields containing image size [height, width]. All listed fields must have identical values. If not, call this step separately per image (e.g., by name or by selecting a sub-tree, see GroupToApplyToSelectedStepBase). Cannot be set if image_field_names is set. One of image_field_names or image_hw_field_names must be provided (single source of truth for image size).

  • projection_matrix_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of fields with projection matrices that map to pixel coordinates. These matrices are updated to project correctly in the output image. Set to None to skip. If projection geometry is represented by extrinsics and intrinsics, only pass the intrinsics here; extrinsics are unaffected by an image-plane affine transform. Note that apart from true projection matrices, any matrices can be handled which transform points from a different coordinate system into the image coordinate system.

  • point_field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of fields containing 2D point sets (e.g., landmarks). Points are transformed to remain consistent with the output images. Points are expected as rows; a row may contain multiple points, in which case consecutive pairs are treated as individual points and stored in the same format (e.g. [x1, y1, x2, y2]).

  • transformation_steps (Optional[Sequence[TransformationStep]], default: None) – Sequence of steps to perform. If None, only resizing to the output resolution & handling of changed aspect ratio is performed (no augmentation).

  • transform_image_on_gpu (bool, default: True) – Whether to transform images on the GPU. Must be True if images are already on GPU. Default: True.
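
Example

A construction sketch with hypothetical field names, combining the resize to the output resolution with two randomized transformation steps:

    from accvlab.dali_pipeline_framework.processing_steps import AffineTransformer

    transformer = AffineTransformer(
        output_hw=[512, 960],
        resizing_mode=AffineTransformer.ResizingMode.PAD,
        resizing_anchor=AffineTransformer.ResizingAnchor.CENTER,
        image_field_names="image",
        projection_matrix_field_names="intrinsics",
        transformation_steps=[
            AffineTransformer.UniformScaling(prob=0.5, min_scaling=0.9, max_scaling=1.1),
            AffineTransformer.Translation(prob=0.5, min_xy=[-20.0, -20.0], max_xy=[20.0, 20.0]),
        ],
    )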

class TransformationStep(prob)[source]

Bases: ABC

Step used to build up the overall affine transformation to apply. Each step is processed in sequence and with a given probability.

Probabilistic branching is possible by using AffineTransformer.Selection (also see the documentation of that step).

Parameters:

prob (float) – Probability with which this step is applied

class Translation(prob, min_xy, max_xy=None)[source]

Bases: TransformationStep

Perform a randomized translation (in a given range).

Parameters:
  • prob (float) – Probability to apply step.

  • min_xy (Sequence[float]) – Minimum shift in x and y. If max_xy is not set, a shift of exactly min_xy is performed instead of selecting at random from a range.

  • max_xy (Optional[Sequence[float]], default: None) – Maximum shift in x and y.

class ShiftInsideOriginalImage(prob, shift_x, shift_y)[source]

Bases: TransformationStep

Perform a random translation. The shift is selected so that the viewport is filled with the image.

This is only possible if the image is larger than (i.e. previously scaled up) or equal in size to the viewport. If this is not the case, this step does nothing.

The shift is computed and performed independently for the x- and y-directions. This means that if the image is larger than the viewport in one dimension and smaller in the other (e.g. due to non-uniform scaling), this step is only performed in the dimension where the image is larger than the viewport.

Also, if the image is larger than the viewport, this step will bring the image back to cover the whole viewport if it was previously moved out of it.

This step cannot be performed if a rotation and/or shearing was potentially performed before.

Parameters:
  • prob (float) – Probability to apply step.

  • shift_x (bool) – Whether to apply in x-direction.

  • shift_y (bool) – Whether to apply in y-direction.

class ShiftToAlignWithOriginalImageBorder(prob, border)[source]

Bases: TransformationStep

Translate the image so that it is aligned to a border of the viewport.

The border to align to can be selected on construction.

This step cannot be performed if a rotation and/or shearing was potentially performed before.

Parameters:
  • prob (float) – Probability to apply step.

  • border (Border) – Border of the viewport to align the image to (see ShiftToAlignWithOriginalImageBorder.Border).

class Border(value)[source]

Bases: Enum

Enumeration for viewport borders to align to

TOP = 0
LEFT = 1
BOTTOM = 2
RIGHT = 3

class Rotation(prob, min_rot, max_rot=None)[source]

Bases: TransformationStep

Perform a rotation.

Parameters:
  • prob (float) – Probability to perform step.

  • min_rot (float) – Minimum rotation to perform. If max_rot is not set, this rotation is performed instead of selecting a rotation value randomly from the range.

  • max_rot (Optional[float], default: None) – Maximum rotation to perform.

class UniformScaling(prob, min_scaling, max_scaling=None)[source]

Bases: TransformationStep

Perform uniform scaling (i.e. identical scaling factor in both x- and y-dimensions).

Parameters:
  • prob (float) – Probability to perform step.

  • min_scaling (float) – Minimum scaling factor. If max_scaling is not set, this factor is always applied instead of selecting a random factor from the range.

  • max_scaling (Optional[float], default: None) – Maximum scaling factor.

class NonUniformScaling(prob, min_scaling_xy, max_scaling_xy=None)[source]

Bases: TransformationStep

Perform non-uniform scaling (i.e. scaling factors in x- and y-dimensions are independent).

Parameters:
  • prob (float) – Probability to perform step.

  • min_scaling_xy (Sequence[float]) – Minimum scaling factors for x- and y-dimensions. If max_scaling_xy is not set, these factors are always applied instead of selecting random factors from the range.

  • max_scaling_xy (Optional[Sequence[float]], default: None) – Maximum scaling factors for x- and y-dimensions.

class Shearing(prob, min_shearing_xy, max_shearing_xy=None)[source]

Bases: TransformationStep

Perform shearing.

Parameters:
  • prob (float) – Probability to perform step.

  • min_shearing_xy (Sequence[float]) – Minimum shearing parameters for x- and y-dimensions. If max_shearing_xy is not set, these parameters are always applied instead of selecting random parameters from the range.

  • max_shearing_xy (Optional[Sequence[float]], default: None) – Maximum shearing parameters.

class Selection(prob, option_probs, options)[source]

Bases: TransformationStep

Probabilistically choose one sequence of steps out of multiple alternatives and perform the steps in this sequence.

Parameters:
  • prob (float) – Probability to apply step.

  • option_probs – Probabilities with which each of the alternatives in options is selected.

  • options – Alternative sequences of transformation steps; one sequence is selected and its steps are performed in order.
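
Example

A sketch of probabilistic branching, assuming (as described above) that option_probs gives the selection probability of each alternative in options:

    from accvlab.dali_pipeline_framework.processing_steps import AffineTransformer

    # With probability 0.7, scale up and re-shift the content into the
    # viewport; with probability 0.3, apply a fixed translation instead.
    branch = AffineTransformer.Selection(
        prob=1.0,
        option_probs=[0.7, 0.3],
        options=[
            [
                AffineTransformer.UniformScaling(prob=1.0, min_scaling=1.0, max_scaling=1.3),
                AffineTransformer.ShiftInsideOriginalImage(prob=1.0, shift_x=True, shift_y=True),
            ],
            [
                AffineTransformer.Translation(prob=1.0, min_xy=[10.0, 0.0]),
            ],
        ],
    )
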
class ResizingMode(value)[source]

Bases: Enum

Resizing mode types.

The mode defines how the input viewport is adjusted to the output viewport when the output image shape does not have the same aspect ratio as the input image shape.

Note that because the image may extend beyond the input viewport due to affine transformations, there may still be image data in the padded region of the output viewport. In this case, the image appears in the padded region and is not replaced by the fill value.

STRETCH = 0

Viewport is stretched (i.e. the image is non-uniformly scaled).

PAD = 1

Viewport is extended to preserve the aspect ratio (i.e. if there are no other transformations, the output image will be padded).

CROP = 2

Viewport is cropped (i.e. if there are no other transformations, parts of the input image will be cropped away).

class ResizingAnchor(value)[source]

Bases: Enum

Resizing mode anchor.

The anchor defines which reference point in the output image is aligned to the corresponding point in the input image when adjusting the aspect ratio to match the output image using the PAD or CROP resizing mode.

Important

Note that the anchor is only relevant when changing the aspect ratio of the image. The actual transformations such as scaling, rotation, etc. are not affected by the anchor, and always use the center of the image as reference point.

CENTER = 0

The center of the output image corresponds to the center of the input image

TOP_OR_LEFT = 1

The top left corner of the output image corresponds to the top left corner of the input image. Depending on which direction is padded / cropped, this corresponds to either keeping the top or the left border aligned.

BOTTOM_OR_RIGHT = 2

The bottom right corner of the output image corresponds to the bottom right corner of the input image. Depending on which direction is padded / cropped, this corresponds to either keeping the bottom or the right border aligned.

class accvlab.dali_pipeline_framework.processing_steps.CoordinateCropper(points_fields_name, minimum_point, maximum_point)[source]

Bases: PipelineStepBase

Crop points to a given axis-aligned box.

Parameters:
  • points_fields_name (str) – Name of the data field containing the points to crop. If multiple fields with that name are present, each is processed independently.

  • minimum_point (Sequence[float]) – Lower corner (min per dimension) of the crop box.

  • maximum_point (Sequence[float]) – Upper corner (max per dimension) of the crop box.

class accvlab.dali_pipeline_framework.processing_steps.PaddingToUniform(field_names=None, fill_value=0.0)[source]

Bases: PipelineStepBase

Processing step for padding all data fields in the processed data to have the same shape across the batch.

Padding can be performed either for all data fields, or only for fields with given names.

Note

To pad all fields in a given part (sub-tree) of the input data structure, use the access modifier wrapper steps (see GroupToApplyToSelectedStepBase and its subclasses).

Parameters:
  • field_names (Union[str, int, List[Union[str, int]], Tuple[Union[str, int], ...], None], default: None) – Names of the fields to apply padding to. Can be either a single name or a list of names. All fields with those names are processed. If set to None, padding is performed for all data fields. Default is None. Fields can be either data fields or data field arrays.

  • fill_value (Union[int, float], default: 0.0) – Value to insert into the padded region. Default is 0.0.

class accvlab.dali_pipeline_framework.processing_steps.AxesLayoutSetter(names_fields_to_set, layout_to_set)[source]

Bases: PipelineStepBase

Set the DALI axes layout string (e.g., “HWC”, “CHW”) for selected fields.

Parameters:
  • names_fields_to_set (Union[str, int, Sequence[Union[str, int]]]) – Name or list of names of fields for which the layout should be set. All matching fields are processed.

  • layout_to_set (str) – DALI axes layout string (e.g., “HWC”, “CHW”)

class accvlab.dali_pipeline_framework.processing_steps.BoundingBoxToHeatmapConverter(annotation_field_name, bboxes_in_name, heatmap_out_name, heatmap_hw, image_field_name=None, image_hw_field_name=None, categories_in_name=None, num_categories=None, min_object_size=None, per_category_min_object_sizes=None, use_per_category_heatmap=True, is_valid_opt_in_name=None, center_opt_in_name=None, is_active_opt_out_name=None, center_opt_out_name=None, center_offset_opt_out_name=None, height_width_bboxes_heatmap_opt_out_name=None, bboxes_heatmap_opt_out_name=None, min_fraction_area_clipping=0.25, min_radius=0.5, max_radius=10.0, radius_scaling_factor=0.8, radius_to_sigma_factor=0.3333333333333333)[source]

Bases: PipelineStepBase

Convert 2D object bounding box annotations into Gaussian heatmaps.

This step can process data from one or multiple cameras. It expects sibling fields in the input SampleDataGroup: an image-size field and an annotation field containing bounding boxes (and optionally categories & bounding box centers). Multiple occurrences are supported; each is processed independently (see the constructor for details).

Note

The input bounding boxes (and centers, if provided) are clipped to the image size, and the corresponding output fields correspond to the clipped bounding boxes (scaled to the heatmap resolution).

The following fields can be added inside each processed annotation. Note that apart from the heatmap, all fields are optional and can be omitted if not needed:

  • heatmap: Heatmap at the specified resolution. If per-category mode is enabled, the shape is [num_categories, H, W]; otherwise [H, W]. The data type is FLOAT.

  • is_active: Boolean mask containing per-object flags indicating whether the object contributes to the heatmap (after clipping and threshold checks). Inactive objects are not drawn, but are still contained in the other output fields.

  • center: Integer pixel center per object in heatmap coordinates (full-pixel location of the peak).

  • center_offset: Sub-pixel offset from the integer center to the true center in heatmap coordinates.

  • height_width_bboxes_heatmap: Per-object [height, width] in heatmap coordinates (after clipping and scaling from image to heatmap).

  • bboxes_heatmap: Per-object bounding box in heatmap coordinates (after clipping and scaling).

To define the size of the individual Gaussians in the heatmap, the radius of the bounding boxes is used (with additional factors for the radius and the sigma-to-radius conversion of the Gaussians). The radius of the bounding boxes is defined as the distance between the center and the nearest edge of the bounding box. If the center is outside the box, the radius is 0 (and the minimum radius as defined on construction is enforced).

Parameters:
  • annotation_field_name (Union[str, int]) – Name of the field containing annotations. Bounding-box related fields are read from here and outputs are added here.

  • bboxes_in_name (Union[str, int]) – Name of the field containing bounding boxes.

  • heatmap_out_name (Union[str, int]) – Name of the output field to write the heatmap to.

  • heatmap_hw (Tuple[int, int]) – Heatmap size (height, width).

  • image_field_name (Union[str, int, None], default: None) – Name of the field containing the image from which to extract the size. This field is expected to be a sibling field to the annotation field. Only one of image_field_name or image_hw_field_name should be set (single source of truth).

  • image_hw_field_name (Union[str, int, None], default: None) – Name of the field containing the image height and width. This field is expected to be a sibling field to the annotation field. Only one of image_field_name or image_hw_field_name should be set (single source of truth).

  • categories_in_name (Union[str, int, None], default: None) – Name of the field containing per-object categories. Required if any of the following holds: use_per_category_heatmap is True, per_category_min_object_sizes is not None, or num_categories is not None. Otherwise set to None.

  • num_categories (Optional[int], default: None) – Number of distinct categories. Objects with category >= num_categories are marked inactive. Set to None when categories are not used.

  • min_object_size (Optional[Sequence[float]], default: None) – Category-independent minimum object size [height, width] to be included. Must be None when per_category_min_object_sizes is not None.

  • per_category_min_object_sizes (Optional[Sequence[Sequence[float]]], default: None) – Per-category minimum size [height, width]. Must be None when min_object_size is not None.

  • use_per_category_heatmap (bool, default: True) – If True, draw a separate heatmap slice per category; otherwise draw a single heatmap.

  • is_valid_opt_in_name (Union[str, int, None], default: None) – Optional field with per-object validity. Will be applied in addition to the internal checks to determine if an object is active. If absent, all objects are treated as valid (internal checks can still mark objects as inactive).

  • center_opt_in_name (Union[str, int, None], default: None) – Name of the field containing the center of the bounding boxes. The center defined this way is not necessarily the center of the 2D bounding box and could e.g. be the projection of the center of the 3D bounding box onto the image plane. Optional field. If not present, the centers are assumed to be the centers of the 2D bounding boxes.

  • is_active_opt_out_name (Union[str, int, None], default: None) – Output field name for the per-object active flag. Optional field. The corresponding field will not be added if not provided.

  • center_opt_out_name (Union[str, int, None], default: None) – Output field name for integer center locations in the heatmap. The sub-pixel offset is written to center_offset_opt_out_name. Optional field. The corresponding field will not be added if not provided.

  • center_offset_opt_out_name (Union[str, int, None], default: None) – Output field name for sub-pixel center offsets in heatmap coordinates. Optional field. The corresponding field will not be added if not provided.

  • height_width_bboxes_heatmap_opt_out_name (Union[str, int, None], default: None) – Output field for per-object [height, width] in the heatmap. Optional field. The corresponding field will not be added if not provided.

  • bboxes_heatmap_opt_out_name (Union[str, int, None], default: None) – Output field for per-object bounding boxes in the heatmap. Optional field. The corresponding field will not be added if not provided.

  • min_fraction_area_clipping (float, default: 0.25) – Minimum remaining area fraction after clipping for an object to be considered active. For example, with 0.25, boxes that lose more than 75% of their area due to clipping are set inactive.

  • min_radius (float, default: 0.5) – Minimum radius used when drawing Gaussians. Enforced lower bound is 0.5.

  • max_radius (float, default: 10.0) – Maximum radius used when drawing Gaussians. Larger radii are clipped to this value.

  • radius_scaling_factor (float, default: 0.8) – Scaling factor applied to the bbox-derived radius.

  • radius_to_sigma_factor (float, default: 0.3333333333333333) – Factor to convert radius to Gaussian sigma.
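
Example

A construction sketch; all field names are hypothetical:

    from accvlab.dali_pipeline_framework.processing_steps import BoundingBoxToHeatmapConverter

    heatmap_converter = BoundingBoxToHeatmapConverter(
        annotation_field_name="annotations",
        bboxes_in_name="bboxes_2d",
        heatmap_out_name="heatmap",
        heatmap_hw=(128, 128),
        image_hw_field_name="image_hw",
        categories_in_name="categories",
        num_categories=10,
        use_per_category_heatmap=True,
        is_active_opt_out_name="is_active",
        center_opt_out_name="center",
        center_offset_opt_out_name="center_offset",
    )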

class accvlab.dali_pipeline_framework.processing_steps.AnnotationElementConditionEval(annotation_field_name, condition, remove_data_fields_used_in_condition)[source]

Bases: PipelineStepBase

Evaluate a declarative condition per annotation element and store the boolean result.

This step looks for data group fields (see the documentation of SampleDataGroup) corresponding to annotations and applies the defined condition to data fields inside the annotation. The results are stored as a new data field inside the annotation. Both the data fields used in the condition and the resulting data field are referenced by name in the condition string.

The used fields are expected to be 1D sequences (one value per object). The condition is evaluated per element, producing a boolean sequence with one result per object.

Note

While 1D sequences are expected, the data may be formatted as 2D tensors. In this case, one dimension needs to have a size of 1.

The condition must start with a variable name, followed by an assignment operator, followed by an expression.

The expression can contain variables (will be mapped to the data fields of the annotation), literals, and operators.

The supported operators are:
  • Logical operators: or, and, not

  • Comparison operators: ==, !=, >, >=, <, <=

  • Parentheses: ( and )

  • Unary minus: -; e.g. -_b1 < -10.5 is valid.

  • Assignment operator: =

The syntax is similar to Python. However, note that
  • Only the operators defined above are supported.

  • Direct comparisons of more than two values are not supported (e.g. a < b < c is not supported).

  • Only numeric literals are supported. True and False are not supported (not needed; use negation instead of comparison to False).

The result of the condition is stored in a new data field, which is added to the annotation. The name of the result data field is also defined inside the condition string.

Example

The condition can be described in a syntax similar to Python, e.g.:

is_valid = (num_lidar_points >= 1 or num_radar_points >= 1) and visibility_levels > 0 and category > 0

In this case:

  • The data fields num_lidar_points, num_radar_points, visibility_levels, and category are expected to be children of the annotation data group field.

  • The result of the condition is stored in a new data field inside the annotation data group field, named is_valid.

Important

In order to use data fields inside the condition, their names must follow the rules of Python variable names (e.g. no spaces, no special characters, do not start with a digit).

See also

  • Specific complex conditions can be checked with VisibleBboxSelector and PointsInRangeCheck; the results of these checks can be combined with this step.

  • ConditionalElementRemover can be used to remove elements from the data based on this condition.

  • BoundingBoxToHeatmapConverter has both input and output fields containing boolean masks denoting the active objects.

Parameters:
  • annotation_field_name (Union[str, int]) – Name of annotation data group field. Note that there can be more than one annotation field (e.g. one for objects visible in each camera). In this case, these annotations are all processed (independently of each other).

  • condition (str) – Condition to be applied. Please see the description above for more details.

  • remove_data_fields_used_in_condition (bool) – Whether to remove the data fields used in the condition after evaluating it. This is a convenience feature and can be set to True if the data fields are not needed after evaluating the condition. If any of the fields are still needed, it has to be set to False, as the fields are otherwise no longer available after this step.
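
Example

A construction sketch using the condition from the example above; the annotation field name is hypothetical:

    from accvlab.dali_pipeline_framework.processing_steps import AnnotationElementConditionEval

    validity_check = AnnotationElementConditionEval(
        annotation_field_name="annotations",
        condition=(
            "is_valid = (num_lidar_points >= 1 or num_radar_points >= 1)"
            " and visibility_levels > 0 and category > 0"
        ),
        remove_data_fields_used_in_condition=False,
    )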

class accvlab.dali_pipeline_framework.processing_steps.BEVBBoxesTransformer3D(data_field_names_points, data_field_names_velocities, data_field_names_sizes, data_field_names_orientation, data_field_names_proj_matrices_and_extrinsics, data_field_names_ego_to_world, data_field_names_world_to_ego, rotation_range, rotation_axis, scaling_range, translation_max_abs)[source]

Bases: PipelineStepBase

Augment BEV bounding boxes (and related geometry) with rotation, scaling, and translation.

The augmentation is applied in world coordinates. Related sensor geometry (e.g., extrinsics) is defined in ego coordinates and is updated accordingly using provided ego<->world transforms.

The individual augmentation steps are applied in the following order:
  1. Rotation

  2. Scaling

  3. Translation

Parameters:
  • data_field_names_points (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the points representing the bounding box center (in [x, y, z] format). Optional; will be updated if provided.

  • data_field_names_velocities (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the velocities of the objects (bounding boxes) (in [vx, vy, vz] format). Optional; will be updated if provided.

  • data_field_names_sizes (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the sizes of the bounding boxes (in [x, y, z] format). Optional; will be updated if provided.

  • data_field_names_orientation (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing the orientations of the bounding boxes (in radians). Optional; will be updated if provided.

  • data_field_names_proj_matrices_and_extrinsics (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing projection matrices and/or extrinsics. Note that camera intrinsics don’t need to be adjusted and must not be included in this list. Optional; will be updated if provided.

  • data_field_names_ego_to_world (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing matrices representing a transformation (e.g. for points) from ego to world coordinates. Optional; will be updated if provided.

  • data_field_names_world_to_ego (Union[str, int, Sequence[Union[str, int]], None]) – Name or names of data fields in the input SampleDataGroup instance containing matrices representing a transformation (e.g. for points) from world to ego coordinates. Optional; will be updated if provided.

  • rotation_range (Optional[Tuple[float, float]]) – Rotation range for the randomized rotation in the augmentation transformation. Optional; if not provided, no rotation is applied.

  • rotation_axis (Optional[int]) – Axis of rotation (0 indicating x, 1 indicating y, and 2 indicating z). Must be provided if rotation_range is provided.

  • scaling_range (Optional[Tuple[float, float]]) – Scaling range for the augmentation transformation. Optional; if not provided, no scaling is applied.

  • translation_max_abs (Optional[Tuple[float, float]]) – Maximum absolute translation range in all dimensions. Optional; if not provided, no translation is applied.
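
Example

A construction sketch with hypothetical field names, rotating around the z-axis:

    import math

    from accvlab.dali_pipeline_framework.processing_steps import BEVBBoxesTransformer3D

    bev_augmentation = BEVBBoxesTransformer3D(
        data_field_names_points="bbox_centers_3d",
        data_field_names_velocities="velocities",
        data_field_names_sizes="bbox_sizes_3d",
        data_field_names_orientation="yaw",
        data_field_names_proj_matrices_and_extrinsics=["lidar_to_image", "extrinsics"],
        data_field_names_ego_to_world="ego_to_world",
        data_field_names_world_to_ego="world_to_ego",
        rotation_range=(-math.pi / 8.0, math.pi / 8.0),
        rotation_axis=2,
        scaling_range=(0.95, 1.05),
        translation_max_abs=(0.5, 0.5),
    )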

class accvlab.dali_pipeline_framework.processing_steps.VisibleBboxSelector(bboxes_field_name, resulting_mask_field_path, image_field_name=None, image_hw_field_name=None, image_hw=None, check_for_bbox_occlusion=True, check_for_minimum_size=True, depths_field_name=None, minimum_bbox_size=None)[source]

Bases: PipelineStepBase

Select visible 2D bounding boxes.

A box is considered visible if it is not completely overlapped by nearer boxes (occlusion test) and/or if it meets a minimum size threshold. The result is written as a boolean mask to the configured output path. Both checks are optional and can be enabled or disabled independently.

A mask is added which indicates which boxes are visible. The original bounding boxes are not modified.

See also

  • AnnotationElementConditionEval can be used to combine the results of this step with other conditions.

  • ConditionalElementRemover can be used to remove elements from the data based on this condition or a combination of this condition with other conditions.

Note that the step expects exactly one data field in the input SampleDataGroup instance to contain the bounding boxes (as well as only one field containing the depths). If multiple sets of bounding boxes are present in the data, this processing step has to be applied individually to parts (sub-trees) of the input data so that each part contains only one set of bounding boxes, using access modifier wrapper steps (see GroupToApplyToSelectedStepBase and its subclasses).

Parameters:
  • bboxes_field_name (Union[str, int]) – Name of data field in the input SampleDataGroup instance containing the bounding boxes. Each row is expected to contain a bounding box in the format: [min_x, min_y, max_x, max_y]. The input data must contain exactly one field with this name.

  • resulting_mask_field_path (Union[str, int, Tuple[Tuple[str, int], ...]]) – Path of the data field in which to store the result. The path is relative to the root element. Note that if this step is wrapped by a sub-tree selection step, the root of the selected sub-tree acts as the root.

  • image_field_name (Union[str, int, None], default: None) – Name of field containing the image from which to extract the size. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).

  • image_hw_field_name (Union[str, int, None], default: None) – Name of field containing the image size for which the bounding boxes are defined. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).

  • image_hw (Optional[Sequence[int]], default: None) – Image size [height, width] for the image for which the bounding boxes are defined. Only one of image_field_name, image_hw_field_name, or image_hw should be set (single source of truth).

  • check_for_bbox_occlusion (bool, default: True) – Whether to consider boxes invisible if completely overlapped by nearer boxes.

  • check_for_minimum_size (bool, default: True) – Whether to consider boxes invisible if below a minimum size.

  • depths_field_name (Union[str, int, None], default: None) – Name of the data field containing the bounding box depth. Needs to be set if check_for_bbox_occlusion is set to True. The input data must contain exactly one field with this name.

  • minimum_bbox_size (Optional[float], default: None) – Minimum size of a bounding box to be visible. Needs to be set if check_for_minimum_size is set to True.
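
Example

A construction sketch; the field names are hypothetical:

    from accvlab.dali_pipeline_framework.processing_steps import VisibleBboxSelector

    visibility_selector = VisibleBboxSelector(
        bboxes_field_name="bboxes_2d",
        resulting_mask_field_path="is_visible",
        image_hw_field_name="image_hw",
        check_for_bbox_occlusion=True,
        check_for_minimum_size=True,
        depths_field_name="bbox_depths",
        minimum_bbox_size=8.0,
    )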

class accvlab.dali_pipeline_framework.processing_steps.PointsInRangeCheck(points_fields_name, is_inside_field_name, minimum_point, maximum_point)[source]

Bases: PipelineStepBase

Check whether points lie within a given axis-aligned box and add a boolean mask.

See also

  • AnnotationElementConditionEval can be used to combine the results of this step with other conditions.

  • ConditionalElementRemover can be used to remove elements from the data based on this condition or a combination of this condition with other conditions.

Parameters:
  • points_fields_name (str) – Name of the data field containing the points to check. If multiple fields with that name are present, each is processed independently.

  • is_inside_field_name (str) – Name of the sibling data field to store the boolean mask in. Must not already exist.

  • minimum_point (Sequence[float]) – Lower corner (min per dimension) of the region.

  • maximum_point (Sequence[float]) – Upper corner (max per dimension) of the region.

class accvlab.dali_pipeline_framework.processing_steps.ConditionalElementRemover(annotation_field_name, mask_field_name, field_names_to_process, field_dims_to_process, fields_to_process_num_dims, remove_mask_field)[source]

Bases: PipelineStepBase

Remove elements from arrays (e.g., per‑object data) based on a boolean mask.

Arrays are stored as (multi-dimensional) tensors; for each array a dimension index indicates the element axis (the axis along which the elements to be removed/retained are enumerated). Elements with mask value False are removed along the configured dimension for each target field.

See also

Multiple classes are available which evaluate conditions of some kind and store the results as boolean masks. These masks can be used by this step: AnnotationElementConditionEval, VisibleBboxSelector, and PointsInRangeCheck.

Parameters:
  • annotation_field_name (Union[str, int]) – Name of the annotation data group field to process. Each annotation field is processed independently.

  • mask_field_name (Union[str, int]) – Name of the boolean mask indicating which elements to keep (True) or remove (False). Must be a child of each annotation field.

  • field_names_to_process (Sequence[Union[str, int]]) – Names of fields to process. The fields must be present in each annotation field.

  • field_dims_to_process (Sequence[int]) – For each field name, the dimension index along which elements are to be removed.

  • fields_to_process_num_dims (Sequence[int]) – For each field name, the number of dimensions in the tensor.

  • remove_mask_field (bool) – Whether to remove the mask field after applying this step.
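
Example

A sketch of a typical combination with hypothetical field names, assuming the point and mask fields are children of the annotation group: PointsInRangeCheck first flags objects outside a region of interest, then this step removes them:

    from accvlab.dali_pipeline_framework.processing_steps import (
        ConditionalElementRemover,
        PointsInRangeCheck,
    )

    # Add a boolean "in_range" mask as a sibling of the 3D centers.
    range_check = PointsInRangeCheck(
        points_fields_name="bbox_centers_3d",
        is_inside_field_name="in_range",
        minimum_point=[-50.0, -50.0, -5.0],
        maximum_point=[50.0, 50.0, 3.0],
    )

    # Remove the out-of-range objects along element axis 0 of each field.
    remove_out_of_range = ConditionalElementRemover(
        annotation_field_name="annotations",
        mask_field_name="in_range",
        field_names_to_process=["bbox_centers_3d", "bboxes_2d"],
        field_dims_to_process=[0, 0],
        fields_to_process_num_dims=[2, 2],
        remove_mask_field=True,
    )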

class accvlab.dali_pipeline_framework.processing_steps.UnneededFieldRemover(unneeded_field_names)[source]

Bases: PipelineStepBase

Processing step for removing unneeded fields from the data.

This step does not add any operators to the DALI graph, i.e. it is performed entirely at DALI graph construction time and has no overhead at runtime. This means that this step can be used multiple times inside the pipeline to ensure a clean data structure without any performance penalty (apart from the overhead at graph construction time).

Note

For pipelines which use data which is not needed in the final output (e.g. intermediate results, image size on the CPU, etc.), it is advisable to perform this step at least once, directly before outputting the data, in order to avoid unneeded copies & clutter in the final output.

Parameters:

unneeded_field_names (Union[Tuple[Union[str, int], ...], List[Union[str, int]]]) – Names of the fields to be removed. All fields with those names are removed.
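
Example

A construction sketch with hypothetical field names, typically placed directly before outputting the data:

    from accvlab.dali_pipeline_framework.processing_steps import UnneededFieldRemover

    cleanup = UnneededFieldRemover(unneeded_field_names=["image_hw", "in_range"])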