nf4_tensor

Implements NF4 quantization for efficient tensor storage and computation.

Classes

NF4QTensor

Implements the NF4 quantization on tensors for more efficient storage or computation.

class NF4QTensor

Bases: BaseQuantizedTensor

Implements the NF4 quantization on tensors for more efficient storage or computation.

quantized_data

The quantized data stored as a packed uint8 tensor.

Type:

torch.Tensor
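Since each NF4 value occupies 4 bits, two values fit into a single uint8 byte. The packing can be sketched as follows (a minimal illustration; the helper names `pack_nf4` and `unpack_nf4` are hypothetical, not part of the API):

```python
import numpy as np

def pack_nf4(indices: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit values (0..15) into bytes."""
    high = indices[0::2].astype(np.uint8) << 4  # even positions -> high nibble
    low = indices[1::2].astype(np.uint8)        # odd positions -> low nibble
    return high | low

def unpack_nf4(packed: np.ndarray) -> np.ndarray:
    """Recover the interleaved 4-bit values from packed bytes."""
    return np.stack([packed >> 4, packed & 0x0F], axis=-1).reshape(-1)

idx = np.array([1, 15, 0, 7], dtype=np.uint8)
packed = pack_nf4(idx)         # two bytes: 0x1F, 0x07
restored = unpack_nf4(packed)  # [1, 15, 0, 7]
```

This halves the storage of the index data relative to keeping one index per byte.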

dequantize(dtype=None, **kwarg)

Dequantize an NF4 packed tensor to a target dtype.

Parameters:

dtype (dtype) – The target dtype for the dequantized tensor.
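As an illustration of what dequantization involves (a sketch, not the library's implementation): each 4-bit index selects one of the 16 fixed NF4 levels from the QLoRA paper, which is then rescaled by its block's scale.

```python
import numpy as np

# The 16 standard NF4 levels from the QLoRA paper.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def dequantize_block(indices: np.ndarray, scale: float) -> np.ndarray:
    """Map 4-bit indices to NF4 levels, then rescale to the original range."""
    return NF4_LEVELS[indices] * scale

vals = dequantize_block(np.array([0, 7, 15]), scale=2.5)
# → [-2.5, 0.0, 2.5]
```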

classmethod double_quantization(scales, scale_block_size, num_scale_bits)

Perform double quantization on the scales.

Unlike the quantize method, which quantizes the input data, this function quantizes the float scales into int8 to further reduce the memory usage of the scales.

Parameters:
  • scales (Tensor) – The per-block quantization scales to be quantized.

  • scale_block_size (int) – The block size used when quantizing the scales.

  • num_scale_bits (int) – The number of bits used to represent each quantized scale.
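The idea can be sketched as symmetric absmax quantization applied to the scales themselves (an illustrative helper, not the library's implementation; this sketch fixes the bit width at 8, whereas num_scale_bits generalizes it):

```python
import numpy as np

def double_quantize_scales(scales, scale_block_size):
    """Quantize float per-block scales to int8, blockwise symmetric absmax.

    Returns the int8 quantized scales and one float "scale of scales"
    per scale block, which is all that must be kept in float.
    """
    scales = np.asarray(scales, dtype=np.float64).reshape(-1, scale_block_size)
    meta_scales = np.abs(scales).max(axis=1, keepdims=True) / 127.0
    q = np.round(scales / meta_scales).astype(np.int8)
    return q, meta_scales.ravel()

scales = [0.5, 1.0, 2.0, 4.0]
q, meta = double_quantize_scales(scales, scale_block_size=2)
```

Memory drops from one float per scale to one int8 per scale plus one float per scale block.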

classmethod quantize(input, block_size, scale_block_size)

Convert a tensor to a quantized format based on NF4 double quantization.

Parameters:
  • input (torch.Tensor) – The input tensor to be quantized.

  • block_size (int) – The size of each block for quantization.

  • scale_block_size (int) – The block size for scaling during quantization.

Returns:

Contains quantized data, input quantization config, and scale quantization config.

Return type:

tuple
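A self-contained round-trip sketch of blockwise NF4 quantization under the assumptions above (it omits the double quantization of scales and is not the library's implementation):

```python
import numpy as np

# The 16 standard NF4 levels from the QLoRA paper.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_quantize(x, block_size):
    """Normalize each block by its absmax, then snap to the nearest NF4 level."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True)  # per-block absmax scale
    normed = x / scales                            # values now in [-1, 1]
    idx = np.abs(normed[..., None] - NF4_LEVELS).argmin(axis=-1)
    return idx.astype(np.uint8), scales.ravel()

def nf4_dequantize(idx, scales):
    """Invert nf4_quantize: look up levels and rescale per block."""
    return (NF4_LEVELS[idx] * scales[:, None]).ravel()

x = np.random.default_rng(0).standard_normal(64)
idx, scales = nf4_quantize(x, block_size=16)
x_hat = nf4_dequantize(idx, scales)
```

The reconstruction error of each element is bounded by half the largest gap between adjacent NF4 levels, times that block's scale.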