dequantize

nvtripy.dequantize(input: Tensor, scale: Tensor | Number | Sequence[Number] | Sequence[Sequence[Number]], dtype: dtype, dim: int | None = None) → Tensor

Dequantizes the input tensor.

If dim is not given, this function performs either “per-tensor” or “block-wise” dequantization, depending on the form of scale.

For “per-tensor” dequantization, the scale must be a scalar tensor or a single Python number.
For “block-wise” dequantization, the input tensor must have dtype nvtripy.int4 and exactly 2 dimensions, e.g. [D0, D1]. The scale must also be a 2-D tensor or a 2-D Python sequence. D0 must be divisible by the first dimension of scale; this is the dimension along which blocking is performed. The second dimension of scale must equal D1.
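When dim is omitted, dequantization amounts to multiplying each quantized value by its scale, as the example outputs on this page show (e.g. 3 × 0.99872 = 2.99616). The pure-Python sketch below illustrates that arithmetic for both modes. It is not nvtripy's implementation: the helper names are hypothetical, and the block-wise indexing (input row i uses scale row i // (D0 // scale.shape[0])) is an assumption modeled on TensorRT-style block quantization.

```python
from typing import List, Sequence


def dequantize_per_tensor_ref(values: Sequence[int], scale: float) -> List[float]:
    """Hypothetical helper: scale every quantized value by the same factor."""
    return [float(v) * scale for v in values]


def dequantize_blockwise_ref(
    values: Sequence[Sequence[int]], scale: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Hypothetical helper for 2-D block-wise dequantization.

    Assumes each row of `scale` covers a contiguous block of
    D0 // len(scale) input rows (blocking along dimension 0).
    """
    d0, d1 = len(values), len(values[0])
    if d0 % len(scale) != 0:
        raise ValueError("D0 must be divisible by the first dimension of scale")
    if any(len(row) != d1 for row in scale):
        raise ValueError("the second dimension of scale must equal D1")
    block_size = d0 // len(scale)
    return [
        [float(values[i][j]) * scale[i // block_size][j] for j in range(d1)]
        for i in range(d0)
    ]


# Per-tensor: the same numbers as the per-tensor example (scale 0.99872).
print(dequantize_per_tensor_ref([1, 2, 3], 0.99872))

# Block-wise: a 4x2 input with a 2x2 scale, so rows 0-1 use scale row 0
# and rows 2-3 use scale row 1.
print(dequantize_blockwise_ref([[1, 2], [3, 4], [5, 6], [7, 7]], [[0.5, 1.0], [2.0, 0.25]]))
# [[0.5, 2.0], [1.5, 4.0], [10.0, 1.5], [14.0, 1.75]]
```

The explicit shape checks mirror the constraints stated above, so a scale whose first dimension does not divide D0 is rejected instead of silently misindexed.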

If dim is given, this function performs “per-channel” dequantization. The scale must be a 1-D tensor or a Python sequence whose length equals input.shape[dim].
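With dim given, each slice of the input along dim gets its own scale. The pure-Python sketch below shows that indexing for a 2-D input; the helper `dequantize_per_channel_ref` is hypothetical (not part of nvtripy), and it deliberately handles only dim 0 or 1, leaving out negative dims and higher ranks.

```python
from typing import List, Sequence


def dequantize_per_channel_ref(
    values: Sequence[Sequence[int]], scale: Sequence[float], dim: int
) -> List[List[float]]:
    """Hypothetical helper for per-channel dequantization of a 2-D input.

    Element [i][j] is multiplied by scale[i] when dim == 0 and by scale[j] when dim == 1.
    """
    if dim not in (0, 1):
        raise ValueError("this sketch only handles dim 0 or 1 of a 2-D input")
    rows, cols = len(values), len(values[0])
    if len(scale) != (rows, cols)[dim]:
        raise ValueError("len(scale) must equal input.shape[dim]")
    return [
        [float(values[i][j]) * scale[(i, j)[dim]] for j in range(cols)]
        for i in range(rows)
    ]


# The per-channel example: one scale per row (dim=0).
print(dequantize_per_channel_ref([[1, 2, 3], [4, 5, 6]], [0.99872, 0.96125], dim=0))

# One scale per column (dim=1) instead.
print(dequantize_per_channel_ref([[1, 2], [3, 4]], [0.5, 2.0], dim=1))
# [[0.5, 4.0], [1.5, 8.0]]
```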

Parameters:

input (Tensor) – [dtype=T1] The input tensor with a valid quantized data type.
scale (Tensor | Number | Sequence[Number] | Sequence[Sequence[Number]]) – [dtype=T2] The scale tensor. Must be a constant tensor.
dtype (dtype) – [dtype=T2] The data type after dequantization. Must be nvtripy.float32 or nvtripy.float16.
dim (int | None) – The dimension along which per-channel dequantization is performed. If None, per-tensor or block-wise dequantization is performed.

Returns:

[dtype=T2] The dequantized tensor.

Return type:

Tensor

DATA TYPE CONSTRAINTS:

T1: float8, int4, int8
T2: float16, bfloat16, float32

Example: Per-tensor dequantization

import nvtripy as tp

input = tp.cast(tp.Tensor([1, 2, 3]), dtype=tp.int8)
scale = 0.99872
output = tp.dequantize(input, scale, tp.float32)

Local Variables

>>> input
tensor([1, 2, 3], dtype=int8, loc=gpu:0, shape=(3,))

>>> output
tensor([0.99872, 1.99744, 2.99616], dtype=float32, loc=gpu:0, shape=(3,))

Example: Per-channel dequantization

import nvtripy as tp

input = tp.cast(tp.Tensor([[1, 2, 3], [4, 5, 6]]), dtype=tp.int8)
scale = [0.99872, 0.96125]
output = tp.dequantize(input, scale, tp.float32, dim=0)

Local Variables

>>> input
tensor(
    [[1, 2, 3],
     [4, 5, 6]], 
    dtype=int8, loc=gpu:0, shape=(2, 3))

>>> output
tensor(
    [[0.99872, 1.99744, 2.99616],
     [3.845, 4.80625, 5.7675]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))

Example: Block-wise dequantization

import nvtripy as tp

input = tp.Tensor([[0.0, 1.0], [2.0, 3.0]])
scale = [[1.0, 1.0]]
quant = tp.quantize(input, scale, tp.int4)
output = tp.dequantize(quant, scale, tp.float32)

Local Variables

>>> output
tensor(
    [[0, 1],
     [2, 3]], 
    dtype=float32, loc=gpu:0, shape=(2, 2))
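The block-wise example recovers its input exactly because every input value is an integer inside the int4 range and the scale is 1.0. The pure-Python sketch below reproduces that round trip under stated assumptions: quantization divides by the block scale, rounds with Python's round (halves go to even), and clamps to the signed int4 range [-8, 7]. The exact rounding behavior of tp.quantize is not specified on this page, and the helper names are hypothetical.

```python
from typing import List, Sequence

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def _block_scale(scale: Sequence[Sequence[float]], d0: int, i: int, j: int) -> float:
    # Assumption: each scale row covers d0 // len(scale) consecutive input rows.
    return scale[i // (d0 // len(scale))][j]


def quantize_int4_ref(
    x: Sequence[Sequence[float]], scale: Sequence[Sequence[float]]
) -> List[List[int]]:
    """Hypothetical helper: divide by the block scale, round, clamp to int4."""
    d0 = len(x)
    return [
        [
            max(INT4_MIN, min(INT4_MAX, round(v / _block_scale(scale, d0, i, j))))
            for j, v in enumerate(row)
        ]
        for i, row in enumerate(x)
    ]


def dequantize_ref(
    q: Sequence[Sequence[int]], scale: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Hypothetical helper: multiply each int4 value by its block scale."""
    d0 = len(q)
    return [
        [float(v) * _block_scale(scale, d0, i, j) for j, v in enumerate(row)]
        for i, row in enumerate(q)
    ]


# Same input and scale as the block-wise example.
x = [[0.0, 1.0], [2.0, 3.0]]
scale = [[1.0, 1.0]]
q = quantize_int4_ref(x, scale)
print(q)  # [[0, 1], [2, 3]]
print(dequantize_ref(q, scale))  # [[0.0, 1.0], [2.0, 3.0]]
```

Values outside the int4 range do not survive the round trip under these assumptions: with the same scale, 9.0 would be clamped to 7 and dequantize to 7.0.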