class QuantRNNBase

Bases: DynamicModule

property all_input_quantizers_disabled

Check whether all input quantizers are disabled.

default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
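Both default descriptors request 8-bit, signed, per-tensor (`axis=None`) fake quantization with a `max` calibrator and `narrow_range=False`. The sketch below shows what that configuration typically means numerically, assuming the common symmetric scheme in which `amax` is the largest absolute value seen during calibration. It is a plain-Python illustration, not ModelOpt's `TensorQuantizer` implementation. `max_calibrate` and `fake_quant` are hypothetical names.

```python
from typing import List, Sequence


def max_calibrate(values: Sequence[float]) -> float:
    """Max calibration: amax is the largest absolute value observed."""
    return max(abs(v) for v in values)


def fake_quant(
    values: Sequence[float],
    amax: float,
    num_bits: int = 8,
    unsigned: bool = False,
    narrow_range: bool = False,
) -> List[float]:
    """Round values onto an integer grid and map them straight back to floats."""
    if unsigned:
        max_bound = 2**num_bits - 1
        min_bound = 0  # this sketch ignores narrow_range for unsigned ranges
    else:
        max_bound = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
        # narrow_range drops the most negative code, giving [-127, 127].
        min_bound = -max_bound if narrow_range else -max_bound - 1
    if amax == 0:
        return [0.0 for _ in values]
    scale = max_bound / amax  # axis=None: one scale for the whole tensor
    out = []
    for v in values:
        q = max(min_bound, min(max_bound, round(v * scale)))  # clamp to int range
        out.append(q / scale)
    return out


x = [0.5, -2.0, 0.01, 1.0]
amax = max_calibrate(x)  # 2.0
xq = fake_quant(x, amax)
```

Each dequantized value lands within half a quantization step (`amax / 254` here) of the original, and the value that set `amax` is reproduced exactly.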
forward(input, *args, **kwargs)

Quantize the input and the weight before calling the original forward method.

property functionals_to_replace: Iterator[Tuple[module, str, Callable]]

Yield the package functions to replace on the fly, as (package, function name, replacement) tuples.
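The type hint suggests each item is a `(package, function_name, replacement)` triple that is swapped in temporarily. The following is a minimal sketch of that patch-and-restore pattern, assuming the replacement should be active only inside a `with` block. `toy_vf`, `quantized_rnn_tanh`, `functionals_to_replace` (as a free function) and `replace_functionals` are hypothetical stand-ins, not ModelOpt or PyTorch names.

```python
import contextlib
import types
from typing import Callable, Iterable, Iterator, Tuple

# A stand-in "package" with a function we want to swap out temporarily.
toy_vf = types.ModuleType("toy_vf")
toy_vf.rnn_tanh = lambda x: f"original({x})"


def quantized_rnn_tanh(x):
    return f"quantized({x})"


def functionals_to_replace() -> Iterator[Tuple[types.ModuleType, str, Callable]]:
    # Each triple: (package, attribute name, replacement callable).
    yield toy_vf, "rnn_tanh", quantized_rnn_tanh


@contextlib.contextmanager
def replace_functionals(triples: Iterable[Tuple[types.ModuleType, str, Callable]]):
    saved = []
    try:
        for package, name, new_fn in triples:
            saved.append((package, name, getattr(package, name)))
            setattr(package, name, new_fn)
        yield
    finally:
        # Restore in reverse order so repeated patches of one name unwind correctly.
        for package, name, old_fn in reversed(saved):
            setattr(package, name, old_fn)


with replace_functionals(functionals_to_replace()):
    inside = toy_vf.rnn_tanh(1)
outside = toy_vf.rnn_tanh(1)
```

Restoring in a `finally` clause keeps the original function in place even if the wrapped call raises.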


Context in which self.weight is quantized.
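A context like this is commonly implemented as a context manager that swaps the full-precision weight for its fake-quantized version and restores it on exit. The sketch assumes a hypothetical `ToyQuantLinear` with a plain-list weight, a rounding-to-grid quantizer and a hypothetical `quantized_weight` method. It only illustrates the swap-and-restore pattern together with the "quantize input and weight, then call the original forward" flow described for `forward`. It is not ModelOpt's code.

```python
import contextlib
from typing import List, Sequence


class ToyQuantLinear:
    """Hypothetical module computing y = sum(w_i * x_i) with fake-quantized operands."""

    def __init__(self, weight: Sequence[float], step: float = 0.25):
        self.weight: List[float] = list(weight)
        self.step = step  # hypothetical grid spacing shared by input and weight

    def quantize(self, values: Sequence[float]) -> List[float]:
        # Stand-in for a TensorQuantizer: snap each value to the nearest grid point.
        return [round(v / self.step) * self.step for v in values]

    @contextlib.contextmanager
    def quantized_weight(self):
        # Swap in the fake-quantized weight; restore the original even on error.
        original = self.weight
        self.weight = self.quantize(original)
        try:
            yield
        finally:
            self.weight = original

    def original_forward(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weight, x))

    def forward(self, x: Sequence[float]) -> float:
        # Quantize the input, then call the original forward with the quantized weight.
        xq = self.quantize(x)
        with self.quantized_weight():
            return self.original_forward(xq)


m = ToyQuantLinear([0.3, -0.6])
y = m.forward([1.0, 2.0])  # uses weight [0.25, -0.5], so y == -0.75
```

After the call, `m.weight` is back to `[0.3, -0.6]`, so the stored parameters stay in full precision.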

weight_quantizer: TensorQuantizer | SequentialQuantizer

class QuantRNNFullBase

Bases: QuantRNNBase

class RNNLayerForward

Bases: object

__init__(cell, reverse=False, variable_len=False)

Initialize the layer forward for different cells, directions, and inputs.
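A layer forward of this kind runs one cell over the time dimension, in reverse order for the backward direction of a bidirectional RNN. The following is a minimal sketch for fixed-length input. It assumes a scalar cell with the signature `cell(x_t, h) -> h` and time-major input. `make_layer_forward` and `toy_cell` are hypothetical, and variable-length (packed) input is not handled.

```python
from typing import Callable, List, Sequence, Tuple


def make_layer_forward(cell: Callable[[float, float], float], reverse: bool = False):
    """Return a function that runs `cell` over a sequence in one direction."""

    def layer_forward(inputs: Sequence[float], hidden: float) -> Tuple[List[float], float]:
        steps = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
        outputs = [0.0] * len(inputs)
        for t in steps:
            hidden = cell(inputs[t], hidden)
            outputs[t] = hidden  # keep outputs aligned with input time steps
        return outputs, hidden

    return layer_forward


def toy_cell(x: float, h: float) -> float:
    # Hypothetical accumulator cell: h_t = h_{t-1} + x_t.
    return h + x


fwd = make_layer_forward(toy_cell)
bwd = make_layer_forward(toy_cell, reverse=True)
print(fwd([1.0, 2.0, 3.0], 0.0))  # ([1.0, 3.0, 6.0], 6.0)
print(bwd([1.0, 2.0, 3.0], 0.0))  # ([6.0, 5.0, 3.0], 6.0)
```

Writing the reverse pass's outputs back at index `t` mirrors how bidirectional RNN outputs are laid out: both directions' outputs for step `t` sit at the same position.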

class VFRNNForward

Bases: object

It is less efficient than the original vf calls.

__init__(mode, bidirectional, num_layers, has_proj, has_bias, input_quantizers, proj_input_quantizers=None, batch_first=False)

Pre-construct necessary parameters for vf calls to reduce overhead.

Refer to the torch RNN modules for parameter information.

  • mode (str) –

  • bidirectional (bool) –

  • num_layers (int) –

  • has_proj (bool) –

  • has_bias (bool) –

  • input_quantizers (List[TensorQuantizer]) –

  • proj_input_quantizers (List[TensorQuantizer] | None) –

  • batch_first (bool | None) –
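One thing that can be pre-constructed from `num_layers`, `bidirectional`, `has_bias` and `has_proj` is how a flat weight list splits per layer and direction. The sketch below assumes PyTorch's flat-weight ordering: for each layer, for each direction, `weight_ih`, `weight_hh`, then `bias_ih` and `bias_hh` if biases exist, then `weight_hr` if a projection exists. `group_flat_weights` is a hypothetical helper, and strings stand in for tensors.

```python
from typing import List, Sequence


def group_flat_weights(
    flat_weights: Sequence[str],
    num_layers: int,
    bidirectional: bool,
    has_bias: bool,
    has_proj: bool,
) -> List[List[Sequence[str]]]:
    """Split flat weights into groups[layer][direction]."""
    num_directions = 2 if bidirectional else 1
    per_group = 2 + (2 if has_bias else 0) + (1 if has_proj else 0)
    expected = num_layers * num_directions * per_group
    if len(flat_weights) != expected:
        raise ValueError(f"expected {expected} weights, got {len(flat_weights)}")
    groups = []
    idx = 0
    for _ in range(num_layers):
        layer = []
        for _ in range(num_directions):
            layer.append(flat_weights[idx : idx + per_group])
            idx += per_group
        groups.append(layer)
    return groups


# Names mimic PyTorch parameter names for a 2-layer bidirectional LSTM with bias.
names = []
for layer in range(2):
    for suffix in ("", "_reverse"):
        names += [f"{p}_l{layer}{suffix}" for p in ("weight_ih", "weight_hh", "bias_ih", "bias_hh")]
groups = group_flat_weights(names, num_layers=2, bidirectional=True, has_bias=True, has_proj=False)
print(groups[1][1])
# ['weight_ih_l1_reverse', 'weight_hh_l1_reverse', 'bias_ih_l1_reverse', 'bias_hh_l1_reverse']
```

Computing this grouping once at construction time avoids re-slicing the weight list on every call.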

forward(layer_forwards, input, flat_weights, hidden, dropout=0, training=True, batch_sizes=None)

This is the core implementation of the vf RNN calls.

  • layer_forwards (Tuple[Callable]) –

  • input (Tensor) –

  • flat_weights (List[Tensor]) –

  • hidden (Tensor | Tuple[Tensor]) –

  • dropout (float | None) –

  • training (bool | None) –

  • batch_sizes (Tensor | None) –
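A multi-layer RNN call typically feeds each layer's output into the next layer and runs both directions when bidirectional. It concatenates the two directions' outputs per time step and applies dropout between layers, never after the last one, and only in training mode. The following is a minimal sketch of that control flow. It assumes each layer forward maps `(inputs, hidden) -> (outputs, hidden)` over a time-major list of feature vectors. `stacked_rnn_forward` and `make_toy_layer` are hypothetical, and weights, quantizers and `batch_sizes` are omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
LayerForward = Callable[[Sequence[Vector], Vector], Tuple[List[Vector], Vector]]


def stacked_rnn_forward(
    layer_forwards: Sequence[Sequence[LayerForward]],  # indexed [layer][direction]
    inputs: Sequence[Vector],  # time-major: one feature vector per step
    hidden: Sequence[Vector],  # initial state per (layer, direction), flattened
    dropout: float = 0.0,
    training: bool = True,
) -> Tuple[List[Vector], List[Vector]]:
    final_hidden: List[Vector] = []
    layer_input: List[Vector] = list(inputs)
    state_idx = 0
    for layer_idx, directions in enumerate(layer_forwards):
        dir_outputs = []
        for layer_forward in directions:
            out, h = layer_forward(layer_input, hidden[state_idx])
            dir_outputs.append(out)
            final_hidden.append(h)
            state_idx += 1
        # Concatenate the directions' outputs feature-wise at every time step.
        layer_output = [[v for d in dir_outputs for v in d[t]] for t in range(len(layer_input))]
        is_last = layer_idx == len(layer_forwards) - 1
        if training and dropout > 0 and not is_last:
            # Inverted dropout between layers, never after the last layer.
            keep = 1.0 - dropout
            layer_output = [
                [v / keep if random.random() < keep else 0.0 for v in vec]
                for vec in layer_output
            ]
        layer_input = layer_output
    return layer_input, final_hidden


def make_toy_layer(reverse: bool = False) -> LayerForward:
    # Hypothetical layer with a size-1 hidden state: h_t = h_{t-1} + sum(x_t).
    def layer(seq: Sequence[Vector], h: Vector) -> Tuple[List[Vector], Vector]:
        out: List[Vector] = [[] for _ in seq]
        order = reversed(range(len(seq))) if reverse else range(len(seq))
        for t in order:
            h = [h[0] + sum(seq[t])]
            out[t] = h
        return out, h

    return layer


layers = [[make_toy_layer(), make_toy_layer(reverse=True)] for _ in range(2)]
output, h_n = stacked_rnn_forward(layers, [[1.0], [2.0], [3.0]], [[0.0]] * 4)
print(output)  # [[7.0, 24.0], [15.0, 17.0], [24.0, 9.0]]
```

Final hidden states come back in (layer, direction) order, which matches the layout of `h_n` in torch RNN modules.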

get_quantized_rnn_layer_forward(cell, reverse=False)

Note that batch_sizes is present only to keep the signature consistent with the variable-length forward.
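The point of the note is that the returned layer forward can be called the same way whether or not the input is packed. The fixed-length version accepts `batch_sizes` and simply ignores it. The following is a minimal sketch, assuming the input is fake-quantized before the time loop. `make_quantized_layer_forward` and `round_quantizer` are hypothetical names, and the quantizer is a rounding stand-in rather than a `TensorQuantizer`.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def round_quantizer(values: Sequence[float], step: float = 0.5) -> List[float]:
    # Hypothetical stand-in for an input TensorQuantizer.
    return [round(v / step) * step for v in values]


def make_quantized_layer_forward(
    cell: Callable[[float, float], float],
    input_quantizer: Callable[[Sequence[float]], List[float]],
    reverse: bool = False,
):
    def layer_forward(
        inputs: Sequence[float], hidden: float, batch_sizes: Optional[Sequence[int]] = None
    ) -> Tuple[List[float], float]:
        # batch_sizes is unused here; it exists only so this function has the
        # same signature as a variable-length (packed sequence) layer forward.
        del batch_sizes
        q_inputs = input_quantizer(inputs)
        steps = reversed(range(len(q_inputs))) if reverse else range(len(q_inputs))
        outputs = [0.0] * len(q_inputs)
        for t in steps:
            hidden = cell(q_inputs[t], hidden)
            outputs[t] = hidden
        return outputs, hidden

    return layer_forward


layer = make_quantized_layer_forward(lambda x, h: h + x, round_quantizer)
print(layer([0.2, 0.7, 1.4], 0.0))  # ([0.0, 0.5, 2.0], 2.0)
```

A caller that loops over layers can then pass `batch_sizes` unconditionally, without branching on which kind of layer forward it holds.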


lstm_cell_with_proj(input, hidden, *weights, proj_input_quantizer=None)

This implementation is less optimized for CUDA than _VF.lstm_cell, so it is used only when a projection exists.
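For reference, an LSTM cell with projection (PyTorch's `proj_size > 0`) computes the four gates in PyTorch's `i, f, g, o` order and updates the cell state. It then projects the hidden state down with `w_hr`. The pure-Python sketch below follows those equations. Applying the projection input quantizer to the hidden state just before the projection is an assumption. `lstm_cell_with_proj_sketch`, `matvec` and `sigmoid` are hypothetical helpers.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_with_proj_sketch(
    x: Vector,
    hidden: Tuple[Vector, Vector],  # (h, c); h has proj_size entries
    w_ih: Matrix,  # (4 * hidden_size) x input_size
    w_hh: Matrix,  # (4 * hidden_size) x proj_size
    b_ih: Vector,
    b_hh: Vector,
    w_hr: Matrix,  # proj_size x hidden_size
    proj_input_quantizer: Optional[Callable[[Vector], Vector]] = None,
) -> Tuple[Vector, Vector]:
    h, c = hidden
    gates = [a + b + d + e for a, b, d, e in zip(matvec(w_ih, x), b_ih, matvec(w_hh, h), b_hh)]
    n = len(c)  # hidden_size
    i_g = [sigmoid(z) for z in gates[0:n]]
    f_g = [sigmoid(z) for z in gates[n : 2 * n]]
    g_g = [math.tanh(z) for z in gates[2 * n : 3 * n]]
    o_g = [sigmoid(z) for z in gates[3 * n : 4 * n]]
    c_new = [f * cc + i * g for f, cc, i, g in zip(f_g, c, i_g, g_g)]
    h_full = [o * math.tanh(cn) for o, cn in zip(o_g, c_new)]
    if proj_input_quantizer is not None:
        h_full = proj_input_quantizer(h_full)  # assumption: quantize the projection input
    h_new = matvec(w_hr, h_full)  # project hidden_size -> proj_size
    return h_new, c_new


# hidden_size=2, proj_size=1, input_size=1, all gate weights and biases zero.
zeros8 = [[0.0] for _ in range(8)]
h_new, c_new = lstm_cell_with_proj_sketch(
    [1.0], ([0.0], [1.0, -2.0]), zeros8, zeros8, [0.0] * 8, [0.0] * 8, [[1.0, 1.0]]
)
print(c_new)  # [0.5, -1.0]
```

With zero gate pre-activations every sigmoid gate is 0.5 and `g` is 0, so the cell state is simply halved. This makes the example easy to check by hand.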

quantized_cell_forward(cell, input, hidden, weights, input_quantizer, proj_input_quantizer=None)

