dequantize

nvtripy.dequantize(input: Tensor, scale: Tensor | Number | Sequence[Number] | Sequence[Sequence[Number]], dtype: dtype, dim: int | Any = None) → Tensor

Dequantizes the input tensor.

If dim is not given, this function will perform “per-tensor” or “block-wise” dequantization.

  • For “per-tensor” dequantization, the scale must be a scalar tensor or a single Python number.

  • For “block-wise” dequantization, the input's data type must be nvtripy.int4 and the input must have exactly 2 dimensions, e.g. [D0, D1]. The scale must also be a 2-D tensor or a 2-D Python sequence. The first dimension of scale must evenly divide D0, the dimension along which “blocking” is performed. The second dimension of scale must equal D1.

If dim is given, this function will perform “per-channel” dequantization. The scale must be a 1-D tensor or a 1-D Python sequence whose length equals input.shape[dim].
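The per-tensor and per-channel rules above can be modeled in plain Python. This is a minimal sketch, not nvtripy's implementation: dequantize_reference is a hypothetical helper, it only handles a 2-D nested-list input, and it assumes that dequantization multiplies each quantized element by its scale, which is consistent with the example outputs documented on this page.

```python
from typing import List, Optional, Sequence, Union

Scale = Union[int, float, Sequence[float]]


def dequantize_reference(
    values: Sequence[Sequence[int]],
    scale: Scale,
    dim: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical pure-Python model of per-tensor / per-channel dequantization.

    Assumption: output element = quantized element * scale.
    """
    if dim is None:
        # Per-tensor: one scalar scale is shared by every element.
        if not isinstance(scale, (int, float)):
            raise ValueError("per-tensor dequantization needs a scalar scale")
        return [[v * scale for v in row] for row in values]

    if dim not in (0, 1):
        raise ValueError("this sketch only supports 2-D inputs (dim 0 or 1)")

    # Per-channel: the scale needs one entry per index along `dim`.
    shape = (len(values), len(values[0]))
    if len(scale) != shape[dim]:
        raise ValueError("scale length must equal input.shape[dim]")

    return [
        [v * (scale[i] if dim == 0 else scale[j]) for j, v in enumerate(row)]
        for i, row in enumerate(values)
    ]


per_tensor = dequantize_reference([[1, 2, 3]], 0.99872)
per_channel = dequantize_reference([[1, 2, 3], [4, 5, 6]], [0.99872, 0.96125], dim=0)
```

With dim=0 each row gets its own scale, so the second row is multiplied by 0.96125; with dim=1 the scale would instead need three entries, one per column.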

Parameters:
  • input (Tensor) – [dtype=T1] The input tensor with a valid quantized data type.

  • scale (Tensor | Number | Sequence[Number] | Sequence[Sequence[Number]]) – [dtype=T2] The scale tensor. Must be a constant tensor.

  • dtype (dtype) – [dtype=T2] The data type after dequantization. Must be nvtripy.float32 or nvtripy.float16.

  • dim (int | Any) – The dimension for per-channel dequantization.

Returns:

[dtype=T2] The dequantized tensor.

Return type:

Tensor

TYPE CONSTRAINTS:
Example: Per-tensor dequantization
import nvtripy as tp

input = tp.Tensor([1, 2, 3], dtype=tp.int8)
scale = 0.99872
output = tp.dequantize(input, scale, tp.float32)
Local Variables
>>> input
tensor([1, 2, 3], dtype=int8, loc=gpu:0, shape=(3,))

>>> output
tensor([0.9987, 1.9974, 2.9962], dtype=float32, loc=gpu:0, shape=(3,))
Example: Per-channel dequantization
import nvtripy as tp

input = tp.Tensor([[1, 2, 3], [4, 5, 6]], dtype=tp.int8)
scale = [0.99872, 0.96125]
output = tp.dequantize(input, scale, tp.float32, dim=0)
Local Variables
>>> input
tensor(
    [[1, 2, 3],
     [4, 5, 6]], 
    dtype=int8, loc=gpu:0, shape=(2, 3))

>>> output
tensor(
    [[0.9987, 1.9974, 2.9962],
     [3.8450, 4.8063, 5.7675]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))
Example: Block-wise dequantization
import nvtripy as tp

input = tp.Tensor([[0, 1], [2, 3]], dtype=tp.float32)
scale = [[1.0, 1.0]]
quant = tp.quantize(input, scale, tp.int4)
output = tp.dequantize(quant, scale, tp.float32)
Local Variables
>>> output
tensor(
    [[0.0000, 1.0000],
     [2.0000, 3.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 2))
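Because every scale in the block-wise example is 1.0, its output does not show how scales map to blocks. The following plain-Python sketch uses non-uniform scales instead. It is an illustration under stated assumptions, not nvtripy's implementation: blockwise_dequantize_reference is a hypothetical helper, and it assumes that a scale of shape [S0, D1] splits D0 into S0 blocks of D0 / S0 consecutive rows, with each element multiplied by the scale for its block and column.

```python
from typing import List, Sequence


def blockwise_dequantize_reference(
    values: Sequence[Sequence[int]],
    scale: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Hypothetical pure-Python model of block-wise dequantization.

    Assumption: rows are grouped into consecutive blocks along D0, and
    output[i][j] = values[i][j] * scale[i // block_size][j].
    """
    d0, d1 = len(values), len(values[0])
    num_blocks = len(scale)

    # The first dimension of scale must evenly divide D0.
    if d0 % num_blocks != 0:
        raise ValueError("scale.shape[0] must evenly divide D0")
    # The second dimension of scale must equal D1.
    if any(len(scale_row) != d1 for scale_row in scale):
        raise ValueError("scale.shape[1] must equal D1")

    block_size = d0 // num_blocks
    return [
        [v * scale[i // block_size][j] for j, v in enumerate(row)]
        for i, row in enumerate(values)
    ]


# Values chosen to fit the int4 range [-8, 7]; D0 = 4, D1 = 2.
quant = [[0, 1], [2, 3], [4, 5], [6, 7]]
# Two blocks of two rows each, one scale per block and column.
scale = [[0.5, 2.0], [1.0, 0.25]]
result = blockwise_dequantize_reference(quant, scale)
# result == [[0.0, 2.0], [1.0, 6.0], [4.0, 1.25], [6.0, 1.75]]
```

Rows 0 and 1 share the first row of scale, and rows 2 and 3 share the second, so the block size here is 2.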

See also

quantize()