Pipeline Submodule
This module contains the pipeline definition class and the classes used to structure and manage the data inside the pipeline and the pipeline output.
- class accvlab.dali_pipeline_framework.pipeline.PipelineDefinition(data_loading_callable_iterable, preprocess_functors=None, check_data_format=True, use_parallel_external_source=True, prefetch_queue_depth=2, print_sample_data_group_format=False)[source]
Bases: object
Definition for the data loading and pre-processing pipeline.
Configure with a data-loading functor and an ordered list of processing steps. Exposes utilities to retrieve the input data format (blueprint), infer the output data format by applying each step’s format-checking logic, and build a DALI pipeline that combines the data loading functor and the processing steps.
- Parameters:
data_loading_callable_iterable (Union[CallableBase, IterableBase]) – Callable or iterable performing the loading of the data.
preprocess_functors (Optional[Sequence[Optional[PipelineStepBase]]], default: None) – Functors for the individual processing steps, which are executed in sequence on the input data. May contain None elements, which are ignored. Optional; if not set, the loaded data is returned as is.
use_parallel_external_source (bool, default: True) – Whether to use the parallel external source.
prefetch_queue_depth (int, default: 2) – The depth of the prefetch queue. Only used if use_parallel_external_source is True.
print_sample_data_group_format (bool, default: False) – Whether to print the sample data group formats after each processing step during the setup of the pipeline (e.g. for debugging purposes).
- property input_data_structure: SampleDataGroup
Get the input data format (blueprint).
The input blueprint is provided by the data-loading functor passed at construction time.
- Returns:
SampleDataGroup blueprint object describing the input data format (no actual data).
- check_and_get_output_data_structure()[source]
Infer and return the output data format (blueprint).
Starting from the input blueprint provided by the loading functor, each processing step validates compatibility and transforms the blueprint (e.g., adding fields or changing types). Steps are applied in sequence to obtain the final output blueprint. If an incompatibility is detected, an exception is raised.
- Returns:
SampleDataGroup – SampleDataGroup blueprint object describing the output data format (no actual data).
- Raises:
ValueError – If the data loading functor is not compatible with the first processing step.
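The step-by-step format inference described for this method can be illustrated with a minimal sketch. It does not use the framework: blueprints are stood in for by plain dicts mapping field names to type names, and the step functions add_mask and to_float as well as infer_output_format are hypothetical names made up for this illustration.

```python
def add_mask(fmt):
    # Hypothetical step: requires an "image" field and adds a "mask" field.
    if "image" not in fmt:
        raise ValueError("add_mask requires an 'image' field")
    return {**fmt, "mask": "UINT8"}


def to_float(fmt):
    # Hypothetical step: changes the type of the "image" field.
    return {**fmt, "image": "FLOAT"}


def infer_output_format(input_format, steps):
    """Apply each step's format logic in sequence; None entries are skipped."""
    fmt = dict(input_format)
    for step in steps:
        if step is None:
            continue
        fmt = step(fmt)  # a step raises if the incoming format is incompatible
    return fmt


output_format = infer_output_format({"image": "UINT8"}, [add_mask, None, to_float])
```

No data is processed here; only the format description is passed from step to step, which is the idea behind inferring the output blueprint before the pipeline runs.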
- get_dali_pipeline(*args, **kwargs)[source]
Get the DALI pipeline as configured.
Note
This calls a function decorated with @pipeline_def, which DALI uses to create a pipeline object. The resulting pipeline object is returned. For more information on the possible arguments (i.e. *args and **kwargs in this function), see the documentation of the nvidia.dali.pipeline.experimental.pipeline_def() decorator.
- Parameters:
*args – Arguments for the DALI pipeline.
**kwargs – Keyword arguments for the DALI pipeline.
- Returns:
Pipeline – The DALI pipeline as configured.
- class accvlab.dali_pipeline_framework.pipeline.DALIStructuredOutputIterator(num_batches_in_epoch, pipeline, sample_data_structure_blueprint, contained_dataset=None, dali_generic_iterator_class=<class 'nvidia.dali.plugin.pytorch.DALIGenericIterator'>, convert_sample_data_group_to_dict=True, post_process_func=None)[source]
Bases: object
Structured access to DALI pipeline output (as a nested dict or SampleDataGroup).
Designed as a drop-in replacement for a torch.utils.data.DataLoader. Optionally applies a user-defined lightweight post-processing function (e.g., conversions to types not supported by DALI).
- Parameters:
num_batches_in_epoch (int) – Number of batches in an epoch. Note that this value is only returned when len(obj) is called. It is not used internally and is added here to ensure drop-in compatibility with torch.utils.data.DataLoader.
pipeline (Pipeline) – DALI pipeline object.
sample_data_structure_blueprint (SampleDataGroup) – Blueprint for the output data structure.
contained_dataset (Optional[Any], default: None) – Dataset object which will be exposed via dataset (mirrors PyTorch DataLoader behavior). Can be a PyTorch Dataset or any other compatible object. Note that this object is not used internally. Also see dataset.
dali_generic_iterator_class (Union[Type[DALIGenericIterator], Any], default: <class 'nvidia.dali.plugin.pytorch.DALIGenericIterator'>) – Class for the internal DALI generic iterator. Follows the PyTorch DALIGenericIterator interface but may emit tensors for other frameworks. Defaults to the PyTorch DALIGenericIterator.
convert_sample_data_group_to_dict (bool, default: True) – If True, convert the output SampleDataGroup to a nested dict. Ensures drop-in compatibility with DataLoader when no post-processing function is provided.
post_process_func (Optional[Callable[[Union[SampleDataGroup, dict]], Union[SampleDataGroup, dict]]], default: None) – Optional post-processing function for the output. This can be used, e.g., to convert data to types not supported by DALI or to perform other lightweight steps. The input is a SampleDataGroup object if convert_sample_data_group_to_dict == False and a dict otherwise. Note that this function is executed when the data is accessed, in the thread accessing the data (typically the thread performing the training). Therefore, this function should be kept lightweight to avoid performance penalties.
- class SimpleIterator(obj)[source]
Bases: Iterator
Iterator which can be used, e.g., as a drop-in replacement for a PyTorch DataLoader iterator.
Note that a single iterator should be used at any point in time. If multiple iterators are used, they share the state, i.e. getting a new iterator will reset all other iterators and calling next for one iterator will advance all iterators by one element.
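The shared-state behavior described above can be pictured with a small, framework-free sketch. SharedStateLoader and its private helper are hypothetical names for this illustration only; the sketch merely mimics the described semantics and is not the library's implementation.

```python
class SharedStateLoader:
    """Hypothetical stand-in whose iterators all share one position."""

    class SimpleIterator:
        def __init__(self, parent):
            self._parent = parent

        def __iter__(self):
            return self

        def __next__(self):
            # Delegate to the parent, so every iterator sees the same position.
            return self._parent._next_batch()

    def __init__(self, batches):
        self._batches = list(batches)
        self._position = 0

    def reset(self):
        self._position = 0

    def _next_batch(self):
        if self._position >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._position]
        self._position += 1
        return batch

    def __iter__(self):
        # Getting a new iterator restarts the shared iteration.
        self.reset()
        return SharedStateLoader.SimpleIterator(self)


loader = SharedStateLoader(["batch_0", "batch_1", "batch_2"])
first = iter(loader)
first_value = next(first)
second = iter(loader)        # resets the position seen by `first` as well
second_value = next(second)
shared_value = next(first)   # continues from where `second` left off
```

Because the position lives in the parent object, creating `second` rewinds `first` too, which is why only one iterator should be in use at a time.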
- reset()[source]
Reset the iterator.
Calls DALIStructuredOutputIterator.reset() for the parent object.
- __iter__()[source]
Get an iterator.
Note that a single iterator should be used at any point in time. If multiple iterators are used, they share the state, i.e. getting a new iterator will reset all other iterators and calling next for one iterator will advance all iterators by one element.
- Return type:
- reset()[source]
Reset the current iteration progress (start over from the beginning).
Note that this will reset iterators of the object as well.
- property sample_data_structure_blueprint: SampleDataGroup
Get the output data structure blueprint.
The blueprint is a SampleDataGroup representing the same nested data format as the output, without the actual data. See SampleDataGroup for details.
- property internal_iterator: DALIGenericIterator | Any
Get the actual DALI iterator used to access the output data internally.
Note that by default, this is a nvidia.dali.plugin.pytorch.DALIGenericIterator. However, this can be changed in the constructor, in which case the returned object will be of the type specified there.
- property dataset: Any
Get the dataset object.
This is the dataset object set in the constructor (if any). If not set, this will return the object for which it is called. This property is used for compatibility with torch.utils.data.DataLoader.
- __len__()[source]
Number of available batches.
Important
This value is set manually in the constructor and is only returned here. It is provided because it is part of the torch.utils.data.DataLoader interface. The value may not be the actual number of batches in the epoch, e.g. for non-epoch-based pipelines.
- class accvlab.dali_pipeline_framework.pipeline.SampleDataGroup[source]
Bases: object
Structured container for sample data. Can also be used as a blueprint to describe the data format.
Data is organized as a tree containing:
Data fields: Leaf nodes that hold the actual data.
Data group fields: Non-leaf nodes that group related items.
Example
An example of accessing the data field "bounding_boxes" inside the nested data group fields "camera" and "annotations":
>>> bounding_boxes = data["camera"]["annotations"]["bounding_boxes"]
Note that the data is accessed as in a nested dictionary. Here, the data group fields are analogous to dict objects and data fields correspond to the actual stored values at the leaves.
Capabilities (see individual method docs for details):
Enforce a predefined data format (field names, order, and types). Format changes need to be performed explicitly.
Inside the input callable/iterable and outside the DALI pipeline the following can be performed (both can be disabled):
Apply automatic type conversions (e.g., integers to floats) on assignment
Apply optional custom string-to-numeric mappings on assignment for selected fields (see add_data_field(), add_data_field_array(), and set_apply_mapping()).
Inside the pipeline: Apply automatic type checks on assignment.
Render the tree in a human-readable form via print(obj).
Flatten values to a sequence and reconstruct from a sequence (see get_data(), set_data(), and set_data_from_dali_generic_iterator_output()). This is useful when passing the data from the input callable/iterable to the pipeline, and when returning data from the pipeline, as nested data structures are not supported there. Also see DALIStructuredOutputIterator for an output iterator which re-assembles the data from the flattened output into a SampleDataGroup instance or nested dictionaries before returning it.
Compare formats of two instances (see type_matches()). This also ensures that the flattened data obtained from one instance can be used to fill the data of another instance.
Utilities that facilitate implementation of pipeline steps: find/remove all occurrences of fields with a given name, add/remove/change fields and types, etc. (e.g. see find_all_occurrences()). Note that the search is performed at DALI graph construction time, so there is no overhead during the pipeline execution.
Supports passing strings through the DALI pipeline and obtaining them as strings in the pipeline output. Note that strings are not supported inside the DALI pipeline. They can be accessed/assigned as strings in the input callable/iterable and outside the DALI pipeline, but appear as uint8 tensors inside the pipeline itself (alternative: use a mapping to numeric values as described above).
Usage modes:
Blueprint: describes the data format (fields and types) but contains no values. This allows inferring downstream formats without running data processing (e.g., to initialize a DALI iterator). When only passing of flattened data is possible, a blueprint can be filled from flattened values (see get_data(), set_data()).
Container: holds actual values. When accessing the data, behaves similarly to a nested dictionary. When assigning data, additional checks/conversions are potentially performed.
Important
Assigning a Field Value
Assignment means using the indexed assignment operator obj[name] = value or the method obj.set_item_in_path(path, value).
When assigning data fields, the following holds:
Mappings and conversions will be performed on assignment (inside the input callable/iterable and outside the DALI pipeline; if not disabled). Inside the DALI pipeline itself, no mapping or conversion is applied.
Inside the DALI pipeline, type checks are performed instead on assignment and an error is raised if the type is not correct.
Assigning strings is only supported in the input callable/iterable and outside the DALI pipeline. String fields are handled as uint8 tensors inside the DALI pipeline.
When assigning to data group fields, the following holds:
The assignment succeeds only if the new value’s format matches the previous format, i.e. if obj[name].type_matches(value) holds. Otherwise, a KeyError is raised. This is done to prevent changing the data format implicitly by assigning a different type.
If the type needs to be changed, this needs to be done explicitly first (e.g., using change_type_of_data_and_remove_data()).
Important
Getting a Field Value
Getting a field value means using the indexed access operator obj[name] or the method obj.get_item_in_path(path).
Accessing strings inside the DALI pipeline (except for the input callable/iterable) will return the underlying uint8 tensor instead. Using strings directly is only supported in the input callable/iterable and outside the DALI pipeline.
Important
Changing the Data Format
Changing the data format is always explicit. For example, adding a field and assigning values is a two-step process: create the field first, then assign data. When defining a blueprint, fields are created but left empty.
Important
Type Checking
Type checking is performed on assignment to ensure that the data type is correct (inside the DALI pipeline). This is useful when developing the pipeline/processing step, but adds some overhead. Type checking is enabled by default (see set_do_check_type()).
Note
Additional information:
When converting a SampleDataGroup to a string (e.g., using print(obj)), the data format as well as some details (e.g., for which fields a mapping is defined, which fields are empty, data types of the fields) are printed. The actual stored values are not printed. For a simpler output, see get_string_no_details().
When obtaining the length of a SampleDataGroup (e.g., using len(obj)), the number of direct children (data fields and data group fields) is returned.
- static create_data_field_array(type, num_fields, mapping=None)[source]
Create a SampleDataGroup containing multiple data fields of the same type.
The data fields have numerical (integer) names in the range [0; num_fields - 1]. This means that the returned SampleDataGroup behaves as an array of data fields.
- Parameters:
- Returns:
SampleDataGroup – Resulting array SampleDataGroup object
- static create_data_group_field_array(sample_data_group, num_fields)[source]
Create a SampleDataGroup containing multiple data group fields (themselves SampleDataGroup instances).
Note that the created data group fields will be initialized as blueprints, i.e. they will not contain any actual data even if sample_data_group does. This is done to cleanly separate this step (defining the data format) from actually filling the data.
- Parameters:
sample_data_group (SampleDataGroup) – Blueprint representing the element format. Any actual data present in sample_data_group will be ignored; the resulting elements will be empty of data.
num_fields (int) – Number of fields to create
- Returns:
SampleDataGroup – Resulting array SampleDataGroup object
- set_apply_mapping(apply)[source]
Set whether to apply string to numeric mapping (for data fields where such a mapping is defined).
This setting will be propagated to descendants (data group fields) of the data group field for which it is called.
Note
The mapping is applied in the input callable/iterable and outside the DALI pipeline. Inside the DALI pipeline itself, the mapping is not applied. If applying the mapping is set to True and an assignment is performed inside the pipeline, a warning will be issued and the assignment will be performed without mapping (provided the value is already in the correct format; otherwise, an error will be raised).
- Parameters:
apply (bool) – Whether to apply the mapping (for fields where a mapping is set).
- set_do_convert(convert)[source]
Set whether to convert data in the data fields to the types set up when creating those fields.
This setting will be propagated to descendants (data group fields) of the data group field for which it is called.
Note
The conversion is applied in the input callable/iterable and outside the DALI pipeline. Inside the DALI pipeline itself, the conversion is not applied. Instead, type checks are performed (regardless of this setting).
- Parameters:
convert (bool) – Whether to perform automatic type conversions (e.g., integers to floats) on assignment.
- set_do_check_type(check_type)[source]
Set whether to perform type checking on assignment.
This setting will be propagated to descendants (data group fields) of the data group field for which it is called.
Note
The type checking is useful when developing the pipeline/processing step, but adds some overhead. Therefore, it is advisable to disable it in production.
- Parameters:
check_type (bool) – Whether to perform type checking (in the DALI pipeline) on assignment.
- get_empty_like_self()[source]
Get an object with the same structure (same nested data group fields and data fields), but no values.
Obtain a blueprint either from another blueprint or from a populated object (ignoring values and initializing all data fields as empty). This can be regarded as a deep-copy of the original object, but with the actual data removed.
- Returns:
SampleDataGroup – Resulting blueprint SampleDataGroup object.
- get_copy()[source]
Get a copy.
Create a copy: equivalent to get_empty_like_self() followed by filling the data from the original object. Note that for the actual data, references to the original data are used, i.e. the data itself is not deep-copied. However, the data group fields making up the data format are deep-copied.
This means that modifying the data in place will modify the data in the original. However, assigning new data to fields, adding or deleting fields, changing their type etc. will not affect the original.
- Returns:
SampleDataGroup – Resulting copy
- type_matches(other)[source]
Check whether the data type defined by two SampleDataGroup objects is the same.
The following is not considered when checking for equality, as it is not part of the type described by the object:
The actual data stored in the data fields
Whether mapping and conversion should be performed
Whether mappings are available for the same fields and whether mappings themselves are the same
Important
Note that it is checked whether the fields appear in the same order in the two objects. This is the case if the objects are constructed from the same blueprint (or if they were constructed by adding the individual fields in the same order). This is important as it defines whether the flattened data, e.g. obtained by get_data() from one of the objects, can be used to fill the data into the other one, e.g. using set_data().
- Return type:
bool
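To make the role of field order concrete, here is a minimal sketch of an order-sensitive format comparison. It is an illustration under stated assumptions, not the library's code: formats are represented as nested plain dicts mapping names to type names, and formats_match is a hypothetical helper.

```python
def formats_match(a, b):
    """Compare two nested format descriptions, including the order of fields."""
    if list(a.keys()) != list(b.keys()):
        return False  # different names or different order
    for key in a:
        left, right = a[key], b[key]
        if isinstance(left, dict) != isinstance(right, dict):
            return False  # group vs. leaf mismatch
        if isinstance(left, dict):
            if not formats_match(left, right):
                return False
        elif left != right:
            return False  # leaf types differ
    return True


format_a = {"image": "UINT8", "camera": {"id": "INT32", "pose": "FLOAT"}}
format_b = {"image": "UINT8", "camera": {"id": "INT32", "pose": "FLOAT"}}
format_reordered = {"camera": {"id": "INT32", "pose": "FLOAT"}, "image": "UINT8"}

same = formats_match(format_a, format_b)
reordered = formats_match(format_a, format_reordered)
```

Plain dict equality would treat format_a and format_reordered as equal, but a flattened value sequence taken from one would land in the wrong fields of the other, which is why the order is part of the comparison.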
- set_item_in_path(path, value)[source]
Assign a field value at a (nested) path.
The path is a sequence of field names/keys. For example, if the path is path = ("name_1", "name_2", "name_3"), the following are equivalent:
obj.set_item_in_path(path, value_to_set)
obj["name_1"]["name_2"]["name_3"] = value_to_set
Important
See the class docstring for details on the assignment behavior.
- get_item_in_path(path)[source]
Get a field value at a nested path.
The path is a sequence of field names/keys. For example, if path = ("name_1", "name_2", "name_3"), the following are equivalent:
value = obj.get_item_in_path(path)
value = obj["name_1"]["name_2"]["name_3"]
Note
Accessing strings inside the DALI pipeline (except for the input callable/iterable) will return the underlying uint8 tensor instead. Using strings directly is only supported in the input callable/iterable and outside the DALI pipeline.
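The equivalence between path-based and chained indexed access can be sketched on a plain nested dict. The two helper functions below are hypothetical stand-ins that only mirror the method names; they perform none of the checks, conversions, or mappings that the class applies.

```python
from functools import reduce


def get_item_in_path(tree, path):
    """Follow each key of `path` from the root and return the value reached."""
    return reduce(lambda node, key: node[key], path, tree)


def set_item_in_path(tree, path, value):
    """Assign `value` at `path`; all parents along the path must already exist."""
    parent = get_item_in_path(tree, path[:-1])
    parent[path[-1]] = value


data = {"name_1": {"name_2": {"name_3": 1}}}
path = ("name_1", "name_2", "name_3")

set_item_in_path(data, path, 42)
value = get_item_in_path(data, path)
parent = get_item_in_path(data, path[:-1])
```

A path held in a variable is convenient inside processing steps, where the location of a field is often found programmatically rather than written out by hand.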
- get_parent_of_path(path)[source]
Get the parent of an element described in path.
- The following are equivalent:
obj.get_parent_of_path(path)
obj.get_item_in_path(path[:-1])
Note
As a parent node cannot be a data field (i.e. a leaf node), the returned value is always a SampleDataGroup instance.
- get_type_of_item_in_path(path)[source]
Get the type of the item at a nested path.
See also
SampleDataGroup.get_item_in_path() for a description of the path parameter.
SampleDataGroup.get_type_of_field() for a description of how type information is returned (which applies to this method as well).
- Returns:
Union[DALIDataType, type] – Data type of the field. For data group fields, SampleDataGroup. For data fields, the corresponding nvidia.dali.types.DALIDataType. If path is empty, returns self.
- path_exists_and_is_data_group_field(path)[source]
Check if a field with the given path exists and is a data group field.
- get_type_of_field(name)[source]
Get type of a field.
The type is expressed either as a nvidia.dali.types.DALIDataType (data fields) or as SampleDataGroup (data group fields).
- Parameters:
- Returns:
Union[DALIDataType, type] – Type of the field. For string fields, this returns nvidia.dali.types.DALIDataType.STRING. Note that this differs from flattened contexts (e.g., field_types_flat), where strings are represented as nvidia.dali.types.DALIDataType.UINT8. The reason is that the flattened data is used internally to pass data between SampleDataGroup objects where the object itself cannot be passed; consequently, the string data is passed as stored internally (i.e. as the underlying uint8 tensors). Here, the actual type as configured (e.g. by add_data_field()) is returned.
- get_string_no_details()[source]
Get a string representing the SampleDataGroup instance, omitting details.
Omits per-field details such as whether a value is set and whether a mapping is available.
- Return type:
- is_array(field=None)[source]
Check whether (self or child) object can be regarded as an array.
- This is the case if all of the following hold:
The fields have integer names.
Each element in the range [0; len(self) - 1] is present as a name.
The fields are ordered such that the name increases by 1 from each element to the next, i.e. self.contained_top_level_field_names == (0, 1, 2, 3, ...).
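The three conditions above collapse into a single comparison against a range. The following is a minimal sketch on a plain tuple of names; names_form_array is a hypothetical helper, and how the library treats an object without any fields is not covered by the description, so the sketch makes no claim about that case.

```python
def names_form_array(field_names):
    """Check that the names are exactly 0, 1, ..., n - 1 in this order."""
    names = tuple(field_names)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    all_ints = all(isinstance(n, int) and not isinstance(n, bool) for n in names)
    return all_ints and names == tuple(range(len(names)))


in_order = names_form_array((0, 1, 2, 3))
out_of_order = names_form_array((0, 2, 1, 3))
with_gap = names_form_array((0, 1, 3))
string_names = names_form_array(("0", "1"))
```

Comparing against tuple(range(len(names))) checks integer-valued names, completeness of the range, and ordering at once.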
- is_data_field_array(field=None)[source]
Check whether (self or child) object is an array whose elements are all data fields (no data group fields).
See the documentation of is_array() for the conditions under which a data group field is regarded as an array.
- is_data_group_field_array(field=None)[source]
Check whether (self or child) object is an array whose elements are all data group fields (no data fields).
See the documentation of is_array() for the conditions under which a data group field is regarded as an array.
- property contained_top_level_field_names: Tuple[str | int]
Get the names of the contained top-level fields.
The order of the fields corresponds to the order in which they were added.
- Returns:
Names of contained fields.
- property field_top_level_types: Tuple[DALIDataType | type]
Types of the top-level fields.
The order of the fields corresponds to the order in which they were added (and to the order of the elements returned by contained_top_level_field_names).
The types are nvidia.dali.types.DALIDataType instances for data fields and SampleDataGroup blueprints for data group fields.
- property field_names_flat: Tuple[str]
Names of contained data fields flattened (all leaf nodes, not only direct children).
Each element corresponds to a data field (leaf node). The original nesting is reflected in the names (concatenated with “.” between parent and child). Numerical names are converted to strings to ensure that they can be used as names in other places (e.g. the DALI generic iterator). For example, the numeric name 5 would become "[5]", and a data field at the path object["name_0"][1]["name_2"] in the original object would get the name "name_0.[1].name_2" in the flattened tuple of names.
The order of the elements corresponds to the order used in get_data(), so that the names obtained here correspond to the values obtained there.
No names are added for data group fields themselves. If they contain descendants which are data fields, their name will appear in the name of the descendants (before “.”). However, if a data group field does not contain any data field descendants, it will not contribute a name to the output.
Note
The names themselves reflect the hierarchy of the data, so that the names are unique, even if there are multiple fields with the same name in the structure.
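The naming scheme can be reproduced on a plain nested dict, as in the following sketch. flat_names is a hypothetical helper that only follows the rules stated above (dot-separated nesting, integer names wrapped in brackets, no entry for groups without data fields); it is not the library's implementation.

```python
def flat_names(tree, prefix=""):
    """Return the flattened leaf names of a nested dict in depth-first order."""
    names = []
    for key, value in tree.items():
        part = f"[{key}]" if isinstance(key, int) else str(key)
        full = f"{prefix}.{part}" if prefix else part
        if isinstance(value, dict):
            # Groups contribute no name of their own, only a prefix.
            names.extend(flat_names(value, full))
        else:
            names.append(full)
    return tuple(names)


tree = {
    "name_0": {0: {"name_2": None}, 1: {"name_2": None}},
    "empty_group": {},
    "tag": None,
}
names = flat_names(tree)
```

Both leaves are called "name_2", yet their flattened names differ because the parent path is part of each name.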
- property field_types_flat: Tuple[DALIDataType]
Types of contained data fields flattened (all leaf nodes, not only direct children).
Each element corresponds to a leaf node.
The order of the elements corresponds to the order used in get_data(), so that the types obtained here correspond to the values obtained there.
No types are added for data group fields themselves. If they contain descendants which are data fields, the types of these descendants will be added. However, if a data group field does not contain any data field descendants, it will not contribute a type to the output.
Note
As only the leaf nodes containing data are considered, no entries directly corresponding to data group fields will be added.
String fields are represented as nvidia.dali.types.DALIDataType.UINT8, matching their in-pipeline representation. Note that this is different from e.g. get_type_of_field(), but consistent with get_data() (see get_data() for details on the rationale).
- get_data(as_list_type=False)[source]
Get values of all data fields as a flattened sequence (all leaf nodes, not only direct children).
The order of the elements is that of a depth-first traversal, with the children of each node visited in the order in which they were added (consistent with, e.g., contained_top_level_field_names). The order is the same as in field_names_flat and field_types_flat, so that these can be used to obtain information about the individual elements of the returned sequence. Only data fields (leaf nodes that are not SampleDataGroup) contribute values. Data group fields are not included directly, but their data field descendants contribute values.
Note
- The tuple returned by this function can be used directly to
Pass parameters from an input callable/iterable to the DALI pipeline.
Return the final output of the DALI pipeline.
In these cases, the returned sequence can be used to restore the original data structure by filling it into a SampleDataGroup blueprint object with the same format as self (using set_data() or set_data_from_dali_generic_iterator_output()).
Important
For string data fields, the values are the underlying uint8 arrays/tensors (or DataNodes), not Python str objects (both inside and outside the DALI pipeline). This method is designed to exchange data between SampleDataGroup objects and directly returns the underlying data, with the encoded strings. The conversion to Python str objects is performed when the data is obtained, e.g. using the indexed access operator [] or get_item_in_path().
- set_data(data)[source]
Set values of all descendant data fields from a flattened sequence.
The sequence needs to contain the data in the same order as indicated by field_names_flat. If the flat data was obtained by get_data() from a SampleDataGroup object with the same data format as self, this will always be the case. The compatibility between the object from which the flattened data was obtained and this instance can be checked with type_matches().
Important
When setting data in this way, no conversions or mappings are applied (both inside and outside the DALI pipeline). This method is designed to exchange data between SampleDataGroup objects and expects the data as stored in the SampleDataGroup object (i.e., already converted and with mappings applied) as input.
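The flatten-and-refill round trip can be sketched with plain nested dicts standing in for the container and the blueprint. flatten_values and fill_from_flat are hypothetical helpers for this illustration; they assume, as described above, that both structures list their fields in the same order.

```python
def flatten_values(tree):
    """Collect leaf values in depth-first order (children in insertion order)."""
    values = []
    for value in tree.values():
        if isinstance(value, dict):
            values.extend(flatten_values(value))
        else:
            values.append(value)
    return tuple(values)


def fill_from_flat(blueprint, flat):
    """Fill the leaves of `blueprint` in place from a flat sequence of values."""
    remaining = iter(flat)

    def fill(node):
        for key, value in node.items():
            if isinstance(value, dict):
                fill(value)
            else:
                node[key] = next(remaining)

    fill(blueprint)


filled = {"image": "img", "camera": {"id": 3, "pose": [1.0, 2.0]}}
blueprint = {"image": None, "camera": {"id": None, "pose": None}}

flat = flatten_values(filled)
fill_from_flat(blueprint, flat)
```

Only the flat tuple crosses the boundary where nested structures are not supported; the receiving side restores the nesting from its own blueprint.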
- set_data_from_dali_generic_iterator_output(data, index)[source]
Set values from the output of a DALI generic iterator.
The DALI generic iterator refers to nvidia.dali.plugin.pytorch.DALIGenericIterator or any other iterator which follows the same interface (tensor types may be from a different framework).
The iterator (and therefore, the underlying DALI pipeline) must output the flattened data in the same format as this instance (as obtained using get_data()), with the names assigned in the iterator to the individual fields matching field_names_flat of this object. The compatibility between the object from which the flattened data was obtained and this instance can be checked with type_matches().
See also
get_like_self_filled_from_iterator_output()
Note
Values for string fields are uint8 arrays/tensors (not Python strings). For details, see get_data().
- add_data_field(name, type, mapping=None)[source]
Add a data field as a direct child.
Data field means that the field contains actual data, i.e. is not another data group field (SampleDataGroup instance).
Note
If a mapping is defined, it is applied both to strings and to (possibly nested, multi-dimensional) sequences of strings (lists/tuples/arrays). The mapping is a dictionary from original string values to numeric values. The special key None provides a default value for unmatched inputs.
The mapping is only applied when data is assigned inside the input callable/iterable or outside the DALI pipeline. The mapping is not performed for assignments inside the actual DALI pipeline (and setting data there is only supported directly using numerical values).
Note
As an alternative to using a mapping, strings can be directly assigned to data fields by setting the data type to nvidia.dali.types.DALIDataType.STRING. However:
String processing in this way is only supported inside the input callable/iterable and outside the DALI pipeline, and such strings appear as uint8 tensors inside the DALI pipeline.
Only single strings can be assigned, not sequences of strings (although outputting 1D sequences of strings is supported to enable output of batch-wise data).
Often, using a mapping is advantageous to meaningfully process the data in the pipeline and also needs to be performed for other reasons (e.g. to convert class labels from strings to integers to be used in the loss computation).
This way of handling strings is e.g. useful to pass sample tags or other high-level descriptors through the pipeline.
- Parameters:
type (DALIDataType) – Type of (the elements of) the field to add. If a mapping is used, this is the type after mapping is applied.
mapping (Optional[Dict[Optional[str], Union[int, float, number, bool]]], default: None) – Mapping from input string values to numerical values. The conversion from string to numeric happens at data assignment (if applying the mapping is not disabled). None can be added as a key to the mapping. In this case, the respective value is used if the input string(s) do not match any of the other keys. The mapping is applied both when a single string is assigned and for (n-dimensional) sequences of strings. Note that if a mapping is set, numeric values can still be assigned directly to the data field instead of strings.
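The mapping semantics described above (single strings, nested sequences of strings, a None key as default, numeric values passing through) can be sketched as follows. apply_mapping is a hypothetical helper, and raising KeyError for an unmatched string when no None key exists is an assumption of this sketch; the description does not state what the library does in that case.

```python
def apply_mapping(value, mapping):
    """Map strings (or nested sequences of strings) to numeric values."""
    if isinstance(value, str):
        if value in mapping:
            return mapping[value]
        if None in mapping:
            return mapping[None]  # default for unmatched strings
        # Assumption of this sketch: unmatched input without a default is an error.
        raise KeyError(value)
    if isinstance(value, (list, tuple)):
        return [apply_mapping(element, mapping) for element in value]
    return value  # numeric values can still be assigned directly


class_ids = {"car": 0, "pedestrian": 1, None: -1}

single = apply_mapping("pedestrian", class_ids)
nested = apply_mapping([["car", "bicycle"], ["pedestrian", "car"]], class_ids)
numeric = apply_mapping(1, class_ids)
```

A mapping of this kind is what turns class labels given as strings into integers that can be processed in the pipeline and used in a loss computation.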
- add_data_group_field(name, blueprint_sample_data_group)[source]
Add a data group field as a direct child.
Data group field means a child of the type SampleDataGroup, which itself can contain data fields and/or data group fields. Data group fields are used to group elements together logically.
blueprint_sample_data_group acts as a blueprint. A new empty instance with the same format is created and added as the child. Values can be assigned later directly (or via set_item_in_path()).
- Parameters:
name (str) – Name of the new field.
blueprint_sample_data_group (SampleDataGroup) – SampleDataGroup instance describing the field format to add.
- add_data_field_array(name, type, num_fields, mapping=None)[source]
Add a data field array.
Add a child data group field (type SampleDataGroup) that contains num_fields elements, each with the type and mapping defined here. Elements are added with integer names from 0 to num_fields - 1, so the child behaves like an array.
Note
If a blueprint of the array is already created as another, independent blueprint, you can use add_data_group_field() to add the blueprint to this object.
- Parameters:
name (str) – Name of the array data group field to add
type (DALIDataType) – Type of the fields to add to the array data group field
num_fields (int) – Number of fields to add to the array data group field
mapping (Optional[Dict[Optional[str], Union[int, float, number, bool]]], default: None) – Optional mapping for the fields (see add_data_field() for details on mappings).
- add_data_group_field_array(name, blueprint_sample_data_group, num_fields)[source]
Add a data group field array.
Add a child data group field (type SampleDataGroup) that contains num_fields elements, each matching the provided blueprint. Elements are added with integer names from 0 to num_fields - 1, so the child behaves like an array.
Note
If a blueprint of the array has already been created as a separate, independent blueprint, you can use add_data_group_field() to add that blueprint to this object.
- Parameters:
name (str) – Name of the array data group field to add.
blueprint_sample_data_group (SampleDataGroup) – SampleDataGroup describing the element format (each element is initialized from get_empty_like_self() of the blueprint).
num_fields (int) – Number of elements to add.
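The array convention (children named with the integers 0 to num_fields - 1, each an independent copy of the blueprint's format) can be illustrated with plain dictionaries. This is a stand-in sketch only: `make_group_array` and the dict-based blueprint are hypothetical and do not use SampleDataGroup.

```python
import copy


def make_group_array(blueprint: dict, num_fields: int) -> dict:
    """Hypothetical stand-in: a group whose children are named 0..num_fields-1."""
    # Each element gets its own copy, so assigning to one element
    # does not affect the others (or the blueprint).
    return {i: copy.deepcopy(blueprint) for i in range(num_fields)}


box_blueprint = {"center": None, "size": None}
boxes = make_group_array(box_blueprint, 3)
boxes[1]["center"] = (4.0, 2.0)

print(list(boxes.keys()))
print(boxes[0]["center"], boxes[1]["center"])
```

Because the child names are integers, the group can be indexed like an array even though it is an ordinary named container.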
- remove_all_occurrences(name_to_remove)[source]
Remove all fields with a given name.
All fields with a given name are removed in the tree of which self is the root, i.e. of this node and its descendants.
- find_all_occurrences(name_to_find)[source]
Find all occurrences of fields with a given name.
The search is performed in the tree where self is the root, i.e. of this node and its descendants.
- Parameters:
name_to_find (Union[str,int]) – Name of the field(s) to find.
- Returns:
Tuple[Tuple[Union[str,int]]] – Paths to the found fields. If none were found, an empty tuple is returned. The individual paths are themselves tuples. For example, the path ("name_1", "name_2", "name_3") would denote the element self["name_1"]["name_2"]["name_3"].
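The path format of the return value can be illustrated with a recursive search over nested dictionaries. This is a minimal sketch under assumptions, not the library's code: nested dicts stand in for SampleDataGroup, and the traversal order and the choice to keep searching below a matching field are assumptions.

```python
from typing import Tuple, Union

Name = Union[str, int]
Path = Tuple[Name, ...]


def find_all_occurrences(tree: dict, name_to_find: Name, _prefix: Path = ()) -> Tuple[Path, ...]:
    """Return the paths of all fields called name_to_find in a nested dict."""
    found = []
    for name, child in tree.items():
        path = _prefix + (name,)
        if name == name_to_find:
            found.append(path)
        if isinstance(child, dict):
            # Descend into group-like children (assumption: also below matches).
            found.extend(find_all_occurrences(child, name_to_find, path))
    return tuple(found)


tree = {
    "camera": {"image": "img", "annotations": {"label": 1}},
    "lidar": {"annotations": {"label": 2}},
    "label": 0,
}
paths = find_all_occurrences(tree, "label")
print(paths)
print(len(paths))  # corresponds to what get_num_occurrences() reports
```

Each returned path can be followed element by element, e.g. `tree["camera"]["annotations"]["label"]` for the first path.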
- get_num_occurrences(name_to_find)[source]
Get the number of occurrences of fields with a given name.
Returns the number of occurrences in the tree where self is the root, i.e. of this node and its descendants.
- change_type_of_data_and_remove_data(path, new_type, new_mapping=None)[source]
Change the type of a child field and remove its data.
The data is removed as it is incompatible with the new type. Note that removing the data means resetting the reference, not actively deleting the data.
Example
A typical use case would be:
1. Get the data of which the type should be changed, e.g.:
   data = obj["name"]
2. Change the data type as stored in the structure, e.g.:
   obj.change_type_of_data_and_remove_data("name", dali.types.DALIDataType.FLOAT)
3. Convert the actual data, e.g.:
   data = dali.fn.cast(data, dtype=dali.types.DALIDataType.FLOAT)
4. Write the data back, e.g.:
   obj["name"] = data
Note that instead of "name", a nested path can be used.
- Parameters:
path (Union[Tuple[Union[str,int]],str,int]) – Either a child name or a nested path (sequence of names).
new_type (Union[DALIDataType,SampleDataGroup]) – For data fields, a types.DALIDataType. For data group fields, a SampleDataGroup used as a blueprint describing the new format.
new_mapping (Optional[Dict[Optional[str],Union[int,float,number,bool]]], default: None) – New mapping for data fields (see add_data_field()). Must be None for data group fields.
- get_flat_index_first_discrepancy_to_other(other)[source]
Get the first flat index where two instances differ in field structure, name, or type.
Compares flattened field names and types (see field_names_flat, field_types_flat). The flattened names include full paths, making structural differences visible. Empty sample data group nodes (no data field descendants) are ignored.
- Parameters:
other (SampleDataGroup) – Other SampleDataGroup instance to compare to.
- Returns:
int – Index where the first difference is present, or -1 if there are no differences. Note that string fields are compared as nvidia.dali.types.DALIDataType.UINT8 in the flattened types, matching field_types_flat.
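The comparison can be pictured as flattening both structures into lists of (path, type) pairs and scanning for the first mismatch. The following is a sketch under assumptions and not the library's implementation: nested dicts with type names as leaf strings stand in for SampleDataGroup, and returning the length of the shorter list when one structure is a prefix of the other is an assumption.

```python
def flatten(tree: dict, prefix=()):
    """Return (path, type) pairs for all data fields; empty groups yield nothing."""
    flat = []
    for name, child in tree.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            flat.extend(flatten(child, path))
        else:
            flat.append((path, child))
    return flat


def first_discrepancy(a: dict, b: dict) -> int:
    """Index of the first differing flattened entry, or -1 if there is none."""
    flat_a, flat_b = flatten(a), flatten(b)
    for i, (entry_a, entry_b) in enumerate(zip(flat_a, flat_b)):
        if entry_a != entry_b:
            return i
    if len(flat_a) != len(flat_b):
        # Assumption: a missing trailing field counts as a difference.
        return min(len(flat_a), len(flat_b))
    return -1


a = {"image": "UINT8", "meta": {"id": "INT32", "score": "FLOAT"}, "empty": {}}
b = {"image": "UINT8", "meta": {"id": "INT64", "score": "FLOAT"}}
c = {"image": "UINT8", "meta": {"id": "INT32", "score": "FLOAT"}}

print(first_discrepancy(a, b))
print(first_discrepancy(a, c))
```

In the second comparison the empty group in `a` contributes no flattened entries, so the two structures are treated as equal.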
- ensure_uniform_size_in_batch(fill_value)[source]
For each data field, ensure uniform size in batch by padding with fill_value.
This is equivalent to calling dali.fn.pad(field_values) for all contained data fields (in this data group field, and its descendants).
Warning
This method needs to be called inside the DALI pipeline (except the input callable/iterable).
Scalar (i.e. 0D) tensors are not supported. If such tensors are present, an error will be raised.
- ensure_uniform_size_in_batch_for_all_strings()[source]
Ensure uniform size in batch for all string data fields.
This is useful before outputting from the DALI pipeline in a format that expects uniform size. A padding with 0-values is performed for all string data fields. This is done for all contained string data fields (in this data group field, and its descendants).
Note
When obtaining the data as strings, the padding is removed and only the actual data is returned.
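The padding and its later removal can be illustrated in plain Python. This is a sketch under assumptions and not the library's code: it assumes that strings are stored as sequences of byte values (consistent with string fields being reported as UINT8), and the UTF-8 encoding and both helper names are hypothetical.

```python
def encode_and_pad(strings):
    """Encode strings as byte values and pad each to the longest one with 0."""
    encoded = [list(s.encode("utf-8")) for s in strings]
    max_len = max(len(e) for e in encoded)
    return [e + [0] * (max_len - len(e)) for e in encoded]


def decode(padded_row):
    """Strip the trailing 0-padding and decode back to a Python string."""
    return bytes(padded_row).rstrip(b"\x00").decode("utf-8")


batch = ["car", "pedestrian"]
padded = encode_and_pad(batch)
restored = [decode(row) for row in padded]

print([len(row) for row in padded])
print(restored)
```

After padding, every sample in the batch has the same length, and decoding recovers the original strings without the padding.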
- is_data_field(name)[source]
Check whether a child field is a data field.
- Parameters:
- Returns:
bool – Whether the child field is a data field (contains values) as opposed to a data group field (field of type SampleDataGroup).
- is_data_group_field(name)[source]
Check whether a child field is a data group field.
- Parameters:
- Returns:
bool – Whether the child field is a data group field (field of type SampleDataGroup).
- to_dictionary()[source]
Get a nested dictionary with the same (nested) data structure and contained values.
This object and all descendant SampleDataGroup objects are converted to dict objects. Contained strings are returned as Python strings.
- Returns:
dict – Resulting dictionary.
- static get_numpy_type_for_dali_type(dali_type)[source]
Get the numpy dtype corresponding to a DALI data type.
- Return type:
type
Note
Only numeric and boolean DALI types are supported. A ValueError is raised for unsupported types.
- check_has_children(data_field_children=None, data_group_field_children=None, data_field_array_children=None, data_group_field_array_children=None, current_name=None)[source]
Check that required children are present; raise ValueError if not.
Convenience helper for validating presence and kinds of children.
- Parameters:
data_field_children (Union[str,int,Sequence[Union[str,int]],None], default: None) – Required child names which must be data fields.
data_group_field_children (Union[str,int,Sequence[Union[str,int]],None], default: None) – Required child names which must be data group fields.
data_field_array_children (Union[str,int,Sequence[Union[str,int]],None], default: None) – Required child names which must be arrays of data fields.
data_group_field_array_children (Union[str,int,Sequence[Union[str,int]],None], default: None) – Required child names which must be arrays of data group fields.
current_name (Optional[str], default: None) – Name of the current element. Optional, only used to provide clearer error messages.
- Raises:
ValueError – If a required child is not present or is not of the expected type.