Torch Dataloader

class nvtabular.loader.torch.DLDataLoader(dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[torch.utils.data.sampler.Sampler[int]] = None, batch_sampler: Optional[torch.utils.data.sampler.Sampler[Sequence[int]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[torch.utils.data.dataloader.T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)

Bases: Generic[torch.utils.data.dataloader.T_co]

This class extends the torch DataLoader. It is required to support the FastAI framework.

class nvtabular.loader.torch.TorchAsyncItr(dataset, cats=None, conts=None, labels=None, batch_size=1, shuffle=False, parts_per_chunk=1, devices=None)

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

This class creates batches of tensors, with the batch size specified by the user. The input data must be an NVTabular dataset. Spillover is handled so that every batch has the specified size, except possibly the final batch.

Parameters
  • dataset (NVTabular dataset) – the NVTabular dataset to read batches from

  • cats ([str]) – the list of categorical columns in the dataset

  • conts ([str]) – the list of continuous columns in the dataset

  • labels ([str]) – the list of label columns in the dataset

  • batch_size (int) – the size of each batch to supply to the model

  • shuffle (bool) – enable/disable shuffling of dataset

  • parts_per_chunk (int) – number of partitions from the NVTabular Dataset to concatenate into a “chunk”

  • devices ([int]) – list representing all available GPU IDs
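
The spillover behaviour described above can be illustrated with a minimal pure-Python sketch. This is not NVTabular's implementation: the function name rebatch and the toy chunks are hypothetical, and plain lists stand in for tensors. It only shows how leftover rows from one chunk can be carried into the next so that every batch except possibly the last has the requested size.

```python
from typing import Iterable, Iterator, List


def rebatch(chunks: Iterable[List[int]], batch_size: int) -> Iterator[List[int]]:
    """Yield fixed-size batches from variable-size chunks (hypothetical helper)."""
    spill: List[int] = []  # rows left over from the previous chunk
    for chunk in chunks:
        rows = spill + chunk  # prepend the spillover to the new chunk
        full = len(rows) - len(rows) % batch_size  # rows that fill whole batches
        for start in range(0, full, batch_size):
            yield rows[start:start + batch_size]
        spill = rows[full:]  # carry the remainder into the next chunk
    if spill:
        yield spill  # only the final batch may be smaller


# Toy chunks of uneven size standing in for concatenated partitions
chunks = [[0, 1, 2, 3, 4], [5, 6, 7], [8, 9]]
batches = list(rebatch(chunks, batch_size=4))
```

Here the fifth row of the first chunk is held back and emitted at the start of the second batch, and only the last batch is short.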

Torch Layers

class nvtabular.framework_utils.torch.layers.embeddings.ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.0)

Bases: torch.nn.modules.module.Module

Map multiple categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout rate.

Inputs:

x: An int64 Tensor with shape [batch_size, num_variables].

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].
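
The input and output shapes above can be reproduced with plain PyTorch. The sketch below is an illustration of the idea, not the layer's actual code: the column names user_id and item_id and their shapes are made up, dropout is omitted, and it assumes that column i of x indexes the i-th table in dictionary order.

```python
import torch

# Hypothetical columns: {column name: (cardinality, embedding_size)}
embedding_table_shapes = {"user_id": (100, 8), "item_id": (50, 4)}

# One embedding table per categorical column
tables = torch.nn.ModuleList(
    [torch.nn.Embedding(card, dim) for card, dim in embedding_table_shapes.values()]
)

# int64 input with shape [batch_size, num_variables]; column i feeds table i
x = torch.tensor([[3, 7], [42, 0], [99, 49]], dtype=torch.int64)

# Look up each column, then concatenate along the feature dimension
out = torch.cat([table(x[:, i]) for i, table in enumerate(tables)], dim=1)
out_shape = tuple(out.shape)  # (batch_size, 8 + 4)
```

In this sketch embedding_size_after_concat is the sum of the per-column embedding sizes, 12 for the shapes chosen here.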

class nvtabular.framework_utils.torch.layers.embeddings.MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode='sum')

Bases: torch.nn.modules.module.Module

Map multiple categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout rate.

Inputs:

x: A dictionary with multi-hot column names as keys and a tuple containing the column values and offsets as values.

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].
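
A (values, offsets) pair of this kind can be illustrated with torch.nn.EmbeddingBag. This is a sketch under assumptions: it assumes the values and offsets follow the EmbeddingBag convention (a flat tensor of ids plus the start index of each row), and the table size and ids are made up. It is not a copy of the layer's implementation.

```python
import torch

# Hypothetical multi-hot column: 10 categories, embedding size 4
bag = torch.nn.EmbeddingBag(10, 4, mode="sum")

# Three rows flattened into one values tensor: [1, 2], [3], [4, 5, 6]
values = torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.int64)
offsets = torch.tensor([0, 2, 3], dtype=torch.int64)  # start index of each row

pooled = bag(values, offsets)  # shape [batch_size, embedding_size]

# With mode="sum", the first row is the sum of the embeddings for ids 1 and 2
expected_first = bag.weight[1] + bag.weight[2]
```

Each row collapses to a single vector regardless of how many ids it holds, which is what lets variable-length multi-hot columns be concatenated with fixed-size embeddings.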

class nvtabular.framework_utils.torch.models.Model(embedding_table_shapes, num_continuous, emb_dropout, layer_hidden_dims, layer_dropout_rates, max_output=None, bag_mode='sum')

Bases: torch.nn.modules.module.Module

Generic base PyTorch model that supports categorical and continuous values.

Parameters
  • embedding_table_shapes (dict) – A dictionary representing the <column>: <max cardinality of column> for all categorical columns.

  • num_continuous (int) – Number of continuous columns in data.

  • emb_dropout (float, 0 - 1) – Sets the embedding dropout rate.

  • layer_hidden_dims (list) – Hidden layer dimensions.

  • layer_dropout_rates (list) – A list of dropout rates, expressed as floats between 0 and 1, one for each layer.

  • max_output (float) – Signifies the max output.