quant_rnn
Quantized RNN.
Classes

- QuantRNNBase: Base class for quantized RNN modules.
- QuantRNNFullBase: Quantized RNN with input quantizer.
- RNNLayerForward: A single layer of RNN modules.
- VFRNNForward: Reimplement the _VF RNN calls in Python to enable input quantizers.
Functions

- get_quantized_rnn_layer_forward: Construct the forward call for different RNN cells.
- get_quantized_rnn_layer_variable_len_forward: Construct the forward call for packed sequences.
- get_quantized_rnn_layer_variable_len_reverse_forward: Construct the forward call for packed sequences in the reversed direction.
- lstm_cell_with_proj: An LSTM cell that supports projection, since _VF.lstm_cell currently doesn't accept projected inputs.
- quantized_cell_forward: Call the input quantizer before calling the cell.
- class QuantRNNBase
Bases: DynamicModule
Base class for quantized RNN modules.
- property all_input_quantizers_disabled
Check if all input quantizers are disabled.
- default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- forward(input, *args, **kwargs)
Quantize the input and the weight before calling the original forward method.
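For intuition, a minimal self-contained sketch of this behavior (hypothetical code, not the module's implementation; `_fake_quant` and `SketchQuantRNN` stand in for the real TensorQuantizer machinery):

```python
import torch
import torch.nn as nn

def _fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Hypothetical stand-in for a TensorQuantizer with a "max" calibrator:
    # symmetric fake quantization against the tensor's absolute maximum.
    maxq = 2 ** (num_bits - 1) - 1
    scale = maxq / x.abs().max().clamp(min=1e-12)
    return (x * scale).round().clamp(-maxq, maxq) / scale

class SketchQuantRNN(nn.RNN):
    # Illustrative only: fake-quantize the input and the weights, then
    # delegate to the original nn.RNN forward.
    def forward(self, input, hx=None):
        saved = {}
        for name, param in self.named_parameters():
            if "weight" in name:
                saved[name] = param.data
                param.data = _fake_quant(param.data)
        try:
            return super().forward(_fake_quant(input), hx)
        finally:
            for name, param in self.named_parameters():
                if name in saved:
                    param.data = saved[name]
```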
- property functionals_to_replace: Iterator[Tuple[module, str, Callable]]
Replace functions of packages on the fly.
- quantize_weight()
Context in which self.weight is quantized.
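Illustrative sketch of such a context, assuming scoped weight swapping (`quantize_weight_sketch` is hypothetical; the real context manager goes through the module's weight_quantizer rather than this inline math):

```python
import contextlib
import torch

@contextlib.contextmanager
def quantize_weight_sketch(module: torch.nn.Module, num_bits: int = 8):
    # Fake-quantized weights are swapped in on entry; the originals are
    # restored on exit, so quantization only applies inside the context.
    maxq = 2 ** (num_bits - 1) - 1
    saved = {n: p.data for n, p in module.named_parameters() if "weight" in n}
    for n, p in module.named_parameters():
        if n in saved:
            scale = maxq / p.data.abs().max().clamp(min=1e-12)
            p.data = (p.data * scale).round().clamp(-maxq, maxq) / scale
    try:
        yield module
    finally:
        for n, p in module.named_parameters():
            if n in saved:
                p.data = saved[n]
```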
- weight_quantizer: TensorQuantizer | SequentialQuantizer
- class QuantRNNFullBase
Bases: QuantRNNBase
Quantized RNN with input quantizer.
- class RNNLayerForward
Bases: object
A single layer of RNN modules.
- __init__(cell, reverse=False, variable_len=False)
Init the layer forward for different cells, directions, and inputs.
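A plausible reading of this dispatch, assuming the three module-level get_* constructors documented below are in scope (the actual class may organize this differently):

```python
def select_layer_forward(cell, reverse=False, variable_len=False):
    # Hypothetical dispatch mirroring RNNLayerForward's flags; the three
    # constructors are the module-level functions documented below.
    if variable_len and reverse:
        return get_quantized_rnn_layer_variable_len_reverse_forward(cell)
    if variable_len:
        return get_quantized_rnn_layer_variable_len_forward(cell)
    return get_quantized_rnn_layer_forward(cell, reverse=reverse)
```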
- class VFRNNForward
Bases: object
Reimplement the _VF RNN calls in Python to enable input quantizers.
It is less efficient than the original _VF calls.
- __init__(mode, bidirectional, num_layers, has_proj, has_bias, input_quantizers, proj_input_quantizers=None, batch_first=False)
Pre-construct the necessary parameters for the _VF calls to reduce overhead.
Refer to the torch RNN modules for parameter information.
- Parameters:
mode (str) –
bidirectional (bool) –
num_layers (int) –
has_proj (bool) –
has_bias (bool) –
input_quantizers (List[TensorQuantizer]) –
proj_input_quantizers (List[TensorQuantizer] | None) –
batch_first (bool | None) –
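A hypothetical construction for a two-layer bidirectional LSTM; the mode string and the one-quantizer-per-layer-per-direction layout are assumptions based on torch's RNN conventions, and `make_quantizer()` stands in for however the input quantizers are actually created:

```python
# Hypothetical usage sketch, not verified against the real constructor.
num_layers, bidirectional = 2, True
num_directions = 2 if bidirectional else 1
vf_forward = VFRNNForward(
    mode="LSTM",                 # assumed: torch-style mode string
    bidirectional=bidirectional,
    num_layers=num_layers,
    has_proj=False,
    has_bias=True,
    input_quantizers=[make_quantizer()  # hypothetical factory
                      for _ in range(num_layers * num_directions)],
    batch_first=False,
)
```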
- forward(layer_forwards, input, flat_weights, hidden, dropout=0, training=True, batch_sizes=None)
This is the core implementation of the _VF RNN calls.
- Parameters:
layer_forwards (Tuple[Callable]) –
input (Tensor) –
flat_weights (List[Tensor]) –
hidden (Tensor | Tuple[Tensor]) –
dropout (float | None) –
training (bool | None) –
batch_sizes (Tensor | None) –
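For intuition, a simplified single-direction sketch of such a loop (the real method also handles bidirectional concatenation, packed sequences, and the flat weight layout):

```python
import torch.nn.functional as F

def vf_rnn_forward_sketch(layer_forwards, input, per_layer_weights, hiddens,
                          dropout=0.0, training=True):
    # Illustrative multi-layer loop: each layer_forward consumes the previous
    # layer's output; dropout is applied between layers, never after the last.
    output, final_hiddens = input, []
    for i, (layer_forward, weights, hidden) in enumerate(
            zip(layer_forwards, per_layer_weights, hiddens)):
        output, h_n = layer_forward(output, hidden, weights)
        final_hiddens.append(h_n)
        if dropout and training and i < len(layer_forwards) - 1:
            output = F.dropout(output, p=dropout, training=True)
    return output, final_hiddens
```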
- get_quantized_rnn_layer_forward(cell, reverse=False)
Construct the forward call for different RNN cells.
Note that batch_sizes is here to keep a consistent signature with the variable-length forward.
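A minimal sketch of what the constructed forward can look like (an assumption-level illustration, not the module's exact code):

```python
import torch

def rnn_layer_forward_sketch(cell, reverse=False):
    # Step through time, calling the (quantized) cell once per step.
    # batch_sizes is accepted but unused so the signature matches the
    # variable-length variants.
    def forward(input, hidden, weights, batch_sizes=None):
        steps = range(input.size(0))
        if reverse:
            steps = reversed(steps)
        outputs = []
        for t in steps:
            hidden = cell(input[t], hidden, weights)
            outputs.append(hidden[0] if isinstance(hidden, tuple) else hidden)
        if reverse:
            outputs.reverse()
        return torch.stack(outputs), hidden
    return forward
```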
- get_quantized_rnn_layer_variable_len_forward(cell)
Construct the forward call for packed sequence.
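Illustrative sketch for the packed-sequence case, using a non-tuple hidden state (GRU-style) for brevity; the reversed variant below additionally has to grow the active batch as it walks backward in time:

```python
import torch

def variable_len_forward_sketch(cell):
    # batch_sizes[t] is how many sequences are still active at step t; rows
    # of hidden beyond that count belong to sequences that just finished,
    # so their hidden state is set aside as final.
    def forward(input, hidden, weights, batch_sizes):
        outputs, finished, offset = [], [], 0
        for bs in batch_sizes.tolist():
            step_input = input[offset:offset + bs]
            offset += bs
            if hidden.size(0) > bs:
                finished.append(hidden[bs:])
                hidden = hidden[:bs]
            hidden = cell(step_input, hidden, weights)
            outputs.append(hidden)
        finished.append(hidden)
        # Reassemble final hiddens in original (longest-first) batch order.
        h_n = torch.cat(list(reversed(finished)), dim=0)
        return torch.cat(outputs, dim=0), h_n
    return forward
```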
- get_quantized_rnn_layer_variable_len_reverse_forward(cell)
Construct the forward call for packed sequence in the reversed direction.
- lstm_cell_with_proj(input, hidden, *weights, proj_input_quantizer=None)
Currently the _VF.lstm_cell doesn't accept projected inputs, i.e., h_n and c_n must have the same shape.
This implementation is not optimized for CUDA compared to _VF.lstm_cell, so we only use it when a projection exists.
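The standard LSTM-with-projection math (as in torch.nn.LSTM with proj_size > 0) makes the shape mismatch concrete; weight names follow torch's convention, and the proj_input_quantizer hook from the signature above is omitted:

```python
import torch
import torch.nn.functional as F

def lstm_cell_with_proj_sketch(input, hidden, w_ih, w_hh, b_ih, b_hh, w_hr):
    # Standard LSTM cell followed by a projection w_hr (proj_size x hidden_size):
    # the cell state c keeps hidden_size while h is projected to proj_size,
    # so h and c legitimately differ in shape.
    hx, cx = hidden
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    cy = f * cx + i * torch.tanh(g)
    hy = F.linear(o * torch.tanh(cy), w_hr)  # hidden_size -> proj_size
    return hy, cy
```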
- quantized_cell_forward(cell, input, hidden, weights, input_quantizer, proj_input_quantizer=None)
Call input quantizer before calling cell.
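Sketch of the documented behavior (hypothetical helper; the weights-unpacking convention is assumed from lstm_cell_with_proj above):

```python
def quantized_cell_forward_sketch(cell, input, hidden, weights,
                                  input_quantizer, proj_input_quantizer=None):
    # Quantize the per-step input first, and pass the projection-input
    # quantizer through when the cell needs it (cf. lstm_cell_with_proj).
    quant_input = input_quantizer(input)
    if proj_input_quantizer is not None:
        return cell(quant_input, hidden, *weights,
                    proj_input_quantizer=proj_input_quantizer)
    return cell(quant_input, hidden, *weights)
```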