int4_tensor
Implements INT4 quantization for efficient tensor storage and computation.
Classes
- INT4QTensor – Implements INT4 quantization on tensors for more efficient storage or computation.
- class INT4QTensor
Bases:
BaseQuantizedTensor
Implements INT4 quantization on tensors for more efficient storage or computation.
- quantized_data
The quantized data stored as a packed uint8 tensor.
- Type:
torch.Tensor
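Packed uint8 storage means each byte holds two 4-bit values. The idea can be sketched in plain Python; the nibble ordering and two's-complement encoding below are assumptions for illustration, not necessarily the exact layout used by `INT4QTensor`:

```python
def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two signed INT4 values (-8..7) into one uint8 byte.

    Here the low nibble holds `lo` and the high nibble holds `hi`,
    each stored as a 4-bit two's-complement value (an assumed layout).
    """
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return (lo & 0xF) | ((hi & 0xF) << 4)


def unpack_int4_pair(byte: int) -> tuple:
    """Recover the two signed INT4 values from a packed uint8 byte."""
    def to_signed(nibble: int) -> int:
        # Nibbles 8..15 represent negative values in two's complement.
        return nibble - 16 if nibble >= 8 else nibble
    return to_signed(byte & 0xF), to_signed(byte >> 4)
```

Packing halves the storage relative to int8: a tensor of N INT4 elements occupies ceil(N / 2) bytes.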
- dequantize(dtype=None, **kwarg)
Dequantize an INT4 packed tensor to a target dtype.
- Parameters:
dtype (torch.dtype) – The target dtype for the dequantized tensor.
- classmethod quantize(input, block_size)
Converts a tensor to a quantized format based on INT4 (AWQ) quantization.
- Parameters:
input (torch.Tensor) – The input tensor to be quantized.
block_size (int) – The size of each block for quantization.
- Returns:
Contains quantized data, input quantization config, and scale quantization config.
- Return type:
tuple
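The blockwise scheme behind `quantize`/`dequantize` can be sketched in plain Python. This is an illustrative absmax-per-block scheme with hypothetical function names, not the class's exact AWQ implementation:

```python
def blockwise_int4_quantize(values, block_size):
    """Quantize a flat list of floats to signed INT4 codes (-8..7).

    Each block of `block_size` consecutive values shares one float
    scale, chosen so the block's maximum magnitude maps onto the
    INT4 range (absmax scaling, assumed here for illustration).
    """
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Fall back to scale 1.0 for an all-zero block.
        scale = max(abs(v) for v in block) / 7 or 1.0
        scales.append(scale)
        quantized.extend(max(-8, min(7, round(v / scale))) for v in block)
    return quantized, scales


def blockwise_int4_dequantize(quantized, scales, block_size):
    """Invert the mapping: multiply each code by its block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]
```

Smaller `block_size` values give each scale fewer elements to cover, reducing quantization error at the cost of storing more scales.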