quant_rnn
Quantized RNN.
Classes

- QuantRNNBase: Base class for quantized RNN modules.
- QuantRNNFullBase: Quantized RNN with input quantizer.
- RNNLayerForward: A single layer of RNN modules.
- VFRNNForward: Reimplement the _VF RNN calls in Python to enable input quantizers.
Functions

- get_quantized_rnn_layer_forward: Construct the forward call for different RNN cells.
- get_quantized_rnn_layer_variable_len_forward: Construct the forward call for packed sequences.
- get_quantized_rnn_layer_variable_len_reverse_forward: Construct the forward call for packed sequences in the reversed direction.
- lstm_cell_with_proj: An LSTM cell that supports projection, since _VF.lstm_cell currently doesn't accept projected inputs.
- quantized_cell_forward: Call the input quantizer before calling the cell.
- class QuantRNNBase
Bases: DynamicModule
Base class for quantized RNN modules.
- property all_input_quantizers_disabled
Check if all input quantizers are disabled.
- default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- forward(input, *args, **kwargs)
Quantize the input and the weight before calling the original forward method.
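For intuition, a minimal self-contained sketch of this behavior (hypothetical code, not the module's implementation; `_fake_quant` and `SketchQuantRNN` stand in for the real TensorQuantizer machinery):

```python
import torch
import torch.nn as nn

def _fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Hypothetical stand-in for a TensorQuantizer with a "max" calibrator:
    # symmetric fake quantization against the tensor's absolute maximum.
    maxq = 2 ** (num_bits - 1) - 1
    scale = maxq / x.abs().max().clamp(min=1e-12)
    return (x * scale).round().clamp(-maxq, maxq) / scale

class SketchQuantRNN(nn.RNN):
    # Illustrative only: fake-quantize the input and the weights, then
    # delegate to the original nn.RNN forward.
    def forward(self, input, hx=None):
        saved = {}
        for name, param in self.named_parameters():
            if "weight" in name:
                saved[name] = param.data
                param.data = _fake_quant(param.data)
        try:
            return super().forward(_fake_quant(input), hx)
        finally:
            for name, param in self.named_parameters():
                if name in saved:
                    param.data = saved[name]
```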
- property functionals_to_replace: Iterator[Tuple[module, str, Callable]]
Replace functions of packages on the fly.
- quantize_weight()
Context in which self.weight is quantized.
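Illustrative sketch of such a context, assuming scoped weight swapping (`quantize_weight_sketch` is hypothetical; the real context manager goes through the module's weight_quantizer rather than this inline math):

```python
import contextlib
import torch

@contextlib.contextmanager
def quantize_weight_sketch(module: torch.nn.Module, num_bits: int = 8):
    # Fake-quantized weights are swapped in on entry; the originals are
    # restored on exit, so quantization only applies inside the context.
    maxq = 2 ** (num_bits - 1) - 1
    saved = {n: p.data for n, p in module.named_parameters() if "weight" in n}
    for n, p in module.named_parameters():
        if n in saved:
            scale = maxq / p.data.abs().max().clamp(min=1e-12)
            p.data = (p.data * scale).round().clamp(-maxq, maxq) / scale
    try:
        yield module
    finally:
        for n, p in module.named_parameters():
            if n in saved:
                p.data = saved[n]
```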
- weight_quantizer: TensorQuantizer | SequentialQuantizer
- class QuantRNNFullBase
Bases: QuantRNNBase
Quantized RNN with input quantizer.
- class RNNLayerForward
Bases: object
A single layer of RNN modules.
- __init__(cell, reverse=False, variable_len=False)
Init the layer forward for different cells, directions, and inputs.
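A plausible reading of this dispatch, assuming the three module-level get_* constructors documented below are in scope (the actual class may organize this differently):

```python
def select_layer_forward(cell, reverse=False, variable_len=False):
    # Hypothetical dispatch mirroring RNNLayerForward's flags; the three
    # constructors are the module-level functions documented below.
    if variable_len and reverse:
        return get_quantized_rnn_layer_variable_len_reverse_forward(cell)
    if variable_len:
        return get_quantized_rnn_layer_variable_len_forward(cell)
    return get_quantized_rnn_layer_forward(cell, reverse=reverse)
```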
- class VFRNNForward
Bases: object
Reimplement the _VF RNN calls in Python to enable input quantizers.
It is less efficient than the original _VF calls.
- __init__(mode, bidirectional, num_layers, has_proj, has_bias, input_quantizers, proj_input_quantizers=None, batch_first=False)
Pre-construct the necessary parameters for the _VF calls to reduce overhead.
Refer to the torch RNN modules for parameter information.
- Parameters:
mode (str) –
bidirectional (bool) –
num_layers (int) –
has_proj (bool) –
has_bias (bool) –
input_quantizers (List[TensorQuantizer]) –
proj_input_quantizers (List[TensorQuantizer] | None) –
batch_first (bool | None) –
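A hypothetical construction for a two-layer bidirectional LSTM; the mode string and the one-quantizer-per-layer-per-direction layout are assumptions based on torch's RNN conventions, and `make_quantizer()` stands in for however the input quantizers are actually created:

```python
# Hypothetical usage sketch, not verified against the real constructor.
num_layers, bidirectional = 2, True
num_directions = 2 if bidirectional else 1
vf_forward = VFRNNForward(
    mode="LSTM",                 # assumed: torch-style mode string
    bidirectional=bidirectional,
    num_layers=num_layers,
    has_proj=False,
    has_bias=True,
    input_quantizers=[make_quantizer()  # hypothetical factory
                      for _ in range(num_layers * num_directions)],
    batch_first=False,
)
```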
- forward(layer_forwards, input, flat_weights, hidden, dropout=0, training=True, batch_sizes=None)
This is the core implementation of the _VF RNN calls.
- Parameters:
layer_forwards (Tuple[Callable]) –
input (Tensor) –
flat_weights (List[Tensor]) –
hidden (Tensor | Tuple[Tensor]) –
dropout (float | None) –
training (bool | None) –
batch_sizes (Tensor | None) –
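For intuition, a simplified single-direction sketch of such a loop (the real method also handles bidirectional concatenation, packed sequences, and the flat weight layout):

```python
import torch.nn.functional as F

def vf_rnn_forward_sketch(layer_forwards, input, per_layer_weights, hiddens,
                          dropout=0.0, training=True):
    # Illustrative multi-layer loop: each layer_forward consumes the previous
    # layer's output; dropout is applied between layers, never after the last.
    output, final_hiddens = input, []
    for i, (layer_forward, weights, hidden) in enumerate(
            zip(layer_forwards, per_layer_weights, hiddens)):
        output, h_n = layer_forward(output, hidden, weights)
        final_hiddens.append(h_n)
        if dropout and training and i < len(layer_forwards) - 1:
            output = F.dropout(output, p=dropout, training=True)
    return output, final_hiddens
```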
- get_quantized_rnn_layer_forward(cell, reverse=False)
Construct the forward call for different RNN cells.
Note that batch_sizes is here to keep a consistent signature with the variable-length forward.
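A minimal sketch of what the constructed forward can look like (an assumption-level illustration, not the module's exact code):

```python
import torch

def rnn_layer_forward_sketch(cell, reverse=False):
    # Step through time, calling the (quantized) cell once per step.
    # batch_sizes is accepted but unused so the signature matches the
    # variable-length variants.
    def forward(input, hidden, weights, batch_sizes=None):
        steps = range(input.size(0))
        if reverse:
            steps = reversed(steps)
        outputs = []
        for t in steps:
            hidden = cell(input[t], hidden, weights)
            outputs.append(hidden[0] if isinstance(hidden, tuple) else hidden)
        if reverse:
            outputs.reverse()
        return torch.stack(outputs), hidden
    return forward
```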
- get_quantized_rnn_layer_variable_len_forward(cell)
Construct the forward call for packed sequence.
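Illustrative sketch for the packed-sequence case, using a non-tuple hidden state (GRU-style) for brevity; the reversed variant below additionally has to grow the active batch as it walks backward in time:

```python
import torch

def variable_len_forward_sketch(cell):
    # batch_sizes[t] is how many sequences are still active at step t; rows
    # of hidden beyond that count belong to sequences that just finished,
    # so their hidden state is set aside as final.
    def forward(input, hidden, weights, batch_sizes):
        outputs, finished, offset = [], [], 0
        for bs in batch_sizes.tolist():
            step_input = input[offset:offset + bs]
            offset += bs
            if hidden.size(0) > bs:
                finished.append(hidden[bs:])
                hidden = hidden[:bs]
            hidden = cell(step_input, hidden, weights)
            outputs.append(hidden)
        finished.append(hidden)
        # Reassemble final hiddens in original (longest-first) batch order.
        h_n = torch.cat(list(reversed(finished)), dim=0)
        return torch.cat(outputs, dim=0), h_n
    return forward
```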
- get_quantized_rnn_layer_variable_len_reverse_forward(cell)
Construct the forward call for packed sequence in the reversed direction.
- lstm_cell_with_proj(input, hidden, *weights, proj_input_quantizer=None)
Currently the _VF.lstm_cell doesn't accept projected inputs, i.e., h_n and c_n must have the same shape.
This implementation is not optimized for CUDA compared to _VF.lstm_cell, so we only use it when a projection exists.
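The standard LSTM-with-projection math (as in torch.nn.LSTM with proj_size > 0) makes the shape mismatch concrete; weight names follow torch's convention, and the proj_input_quantizer hook from the signature above is omitted:

```python
import torch
import torch.nn.functional as F

def lstm_cell_with_proj_sketch(input, hidden, w_ih, w_hh, b_ih, b_hh, w_hr):
    # Standard LSTM cell followed by a projection w_hr (proj_size x hidden_size):
    # the cell state c keeps hidden_size while h is projected to proj_size,
    # so h and c legitimately differ in shape.
    hx, cx = hidden
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    cy = f * cx + i * torch.tanh(g)
    hy = F.linear(o * torch.tanh(cy), w_hr)  # hidden_size -> proj_size
    return hy, cy
```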
- quantized_cell_forward(cell, input, hidden, weights, input_quantizer, proj_input_quantizer=None)
Call input quantizer before calling cell.
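Sketch of the documented behavior (hypothetical helper; the weights-unpacking convention is assumed from lstm_cell_with_proj above):

```python
def quantized_cell_forward_sketch(cell, input, hidden, weights,
                                  input_quantizer, proj_input_quantizer=None):
    # Quantize the per-step input first, and pass the projection-input
    # quantizer through when the cell needs it (cf. lstm_cell_with_proj).
    quant_input = input_quantizer(input)
    if proj_input_quantizer is not None:
        return cell(quant_input, hidden, *weights,
                    proj_input_quantizer=proj_input_quantizer)
    return cell(quant_input, hidden, *weights)
```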