Conv
- class nvtripy.Conv(in_channels: int, out_channels: int, kernel_dims: Sequence[int], stride: Sequence[int] | None = None, padding: Sequence[Tuple[int, int]] | None = None, dilation: Sequence[int] | None = None, groups: int | None = None, bias: bool = True, dtype: dtype = float32)[source]
Applies a convolution on the input tensor.
With an input of shape \((N, C_{\text{in}}, D_0,\ldots,D_n)\) and output of shape \((N, C_{\text{out}}, D_{0_{\text{out}}},\ldots,D_{n_{\text{out}}})\) the output values are given by:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{Bias}_{C_{\text{out}_j}} + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the cross-correlation operator applied over the spatial dimensions of the input and kernel, \(N\) is the batch dimension, \(C\) is the channel dimension, and \(D_0,\ldots,D_n\) are the spatial dimensions.
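The formula can be read as an independent cross-correlation of each input channel with the matching kernel slice, summed over input channels and offset by the bias. Below is a minimal NumPy sketch of that computation, assuming two spatial dimensions, unit stride, no padding, no dilation, and groups = 1; the helper names are illustrative only and not part of nvtripy.

import numpy as np


def cross_correlate_2d(x, w):
    # Slide the kernel over one input channel and take elementwise products.
    H, W = x.shape
    kH, kW = w.shape
    out = np.empty((H - kH + 1, W - kW + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i : i + kH, j : j + kW] * w)
    return out


def conv_reference(x, weight, bias):
    # x: (N, C_in, H, W); weight: (C_out, C_in, kH, kW); bias: (C_out,)
    N, C_in, H, W = x.shape
    C_out, _, kH, kW = weight.shape
    out = np.zeros((N, C_out, H - kH + 1, W - kW + 1), dtype=x.dtype)
    for n in range(N):  # batch index N_i
        for j in range(C_out):  # output channel C_out_j
            acc = np.full(out.shape[2:], bias[j], dtype=x.dtype)
            for k in range(C_in):  # sum over input channels k
                acc += cross_correlate_2d(x[n, k], weight[j, k])
            out[n, j] = acc
    return out


x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
w = np.ones((1, 1, 2, 2), dtype=np.float32)
b = np.zeros((1,), dtype=np.float32)
print(conv_reference(x, w, b).shape)  # (1, 1, 3, 3)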
- Parameters:
in_channels (int) – The number of channels in the input tensor.
out_channels (int) – The number of channels produced by the convolution.
kernel_dims (Sequence[int]) – The spatial shape of the kernel.
padding (Sequence[Tuple[int, int]]) – A sequence of pairs of integers of length \(M\) indicating the zero padding to apply to the input along each spatial dimension before and after the dimension respectively, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\). Defaults to all 0.
stride (Sequence[int]) – A sequence of length \(M\) indicating the stride of convolution across each spatial dimension, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\). Defaults to all 1.
groups (int) – The number of groups in a grouped convolution, where the input and output channels are divided into that many groups. Each output group is connected only to its corresponding input group through the convolution kernel weights, and the outputs for each group are concatenated to produce the final result. This is in contrast to a standard convolution, which has full connectivity between all input and output channels. Grouped convolutions reduce computational cost by a factor of groups and can benefit model parallelism and memory usage. Note that in_channels and out_channels must both be divisible by groups. Defaults to 1 (standard convolution).
dilation (Sequence[int]) – A sequence of length \(M\) indicating the number of zeros to insert between kernel weights across each spatial dimension, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\). This is known as the à trous algorithm and further downsamples the output by increasing the receptive field of the kernel. For each dimension with value \(x\), \(x-1\) zeros are inserted between kernel weights.
bias (bool) – Whether to add a bias term to the output. The bias has a shape of \((\text{out_channels},)\).
dtype (dtype) – The data type to use for the convolution weights.
Example
input = tp.reshape(tp.arange(16, dtype=tp.float32), (1, 1, 4, 4))
conv = tp.Conv(
    in_channels=1, out_channels=1, kernel_dims=(2, 2), dtype=tp.float32
)

conv.weight = tp.iota(conv.weight.shape)
conv.bias = tp.iota(conv.bias.shape)

output = conv(input)
Local Variables
>>> input
tensor(
    [[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]],
    dtype=float32, loc=gpu:0, shape=(1, 1, 4, 4))
>>> conv
Conv(
    bias: Parameter = (shape=(1,), dtype=float32),
    weight: Parameter = (shape=(1, 1, 2, 2), dtype=float32),
)
>>> conv.state_dict()
{
    bias: tensor([0], dtype=float32, loc=gpu:0, shape=(1,)),
    weight: tensor(
        [[[[0, 0], [0, 0]]]],
        dtype=float32, loc=gpu:0, shape=(1, 1, 2, 2)),
}
>>> output
tensor(
    [[[[0, 0, 0], [0, 0, 0], [0, 0, 0]]]],
    dtype=float32, loc=gpu:0, shape=(1, 1, 3, 3))
Example: Using Padding and Stride
input = tp.reshape(tp.arange(16, dtype=tp.float32), (1, 1, 4, 4))
conv = tp.Conv(
    1,
    1,
    (3, 3),
    padding=((1, 1), (1, 1)),
    stride=(3, 1),
    bias=False,
    dtype=tp.float32,
)

conv.weight = tp.iota(conv.weight.shape)

output = conv(input)
Local Variables
>>> input
tensor(
    [[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]],
    dtype=float32, loc=gpu:0, shape=(1, 1, 4, 4))
>>> conv
Conv(
    weight: Parameter = (shape=(1, 1, 3, 3), dtype=float32),
)
>>> conv.state_dict()
{
    weight: tensor(
        [[[[0, 0, 0], [0, 0, 0], [0, 0, 0]]]],
        dtype=float32, loc=gpu:0, shape=(1, 1, 3, 3)),
}
>>> output
tensor(
    [[[[0, 0, 0, 0], [0, 0, 0, 0]]]],
    dtype=float32, loc=gpu:0, shape=(1, 1, 2, 4))
Example: Depthwise Convolution
input = tp.reshape(tp.arange(18, dtype=tp.float32), (1, 2, 3, 3))
conv = tp.Conv(2, 2, (3, 3), groups=2, bias=False, dtype=tp.float32)

conv.weight = tp.iota(conv.weight.shape)

output = conv(input)
Local Variables
>>> input
tensor(
    [[[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
      [[9, 10, 11], [12, 13, 14], [15, 16, 17]]]],
    dtype=float32, loc=gpu:0, shape=(1, 2, 3, 3))
>>> conv
Conv(
    weight: Parameter = (shape=(2, 1, 3, 3), dtype=float32),
)
>>> conv.state_dict()
{
    weight: tensor(
        [[[[0, 0, 0], [0, 0, 0], [0, 0, 0]]],
         [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]],
        dtype=float32, loc=gpu:0, shape=(2, 1, 3, 3)),
}
>>> output
tensor(
    [[[[0]], [[117]]]],
    dtype=float32, loc=gpu:0, shape=(1, 2, 1, 1))
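In this example, each output group depends only on its own input channel: group 0 uses a kernel of all zeros, so its output is \(0\), while group 1 uses a kernel of all ones, so its output is the sum of the second input channel, \(9 + 10 + \cdots + 17 = 117\), matching the values shown above.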
Example: Dilated Convolution (à trous algorithm)
input = tp.reshape(tp.arange(9, dtype=tp.float32), (1, 1, 3, 3))
conv = tp.Conv(1, 1, (2, 2), dilation=(2, 2), bias=False, dtype=tp.float32)

conv.weight = tp.iota(conv.weight.shape)

output = conv(input)
Local Variables
>>> input
tensor(
    [[[[0, 1, 2], [3, 4, 5], [6, 7, 8]]]],
    dtype=float32, loc=gpu:0, shape=(1, 1, 3, 3))
>>> conv
Conv(
    weight: Parameter = (shape=(1, 1, 2, 2), dtype=float32),
)
>>> conv.state_dict()
{
    weight: tensor(
        [[[[0, 0], [0, 0]]]],
        dtype=float32, loc=gpu:0, shape=(1, 1, 2, 2)),
}
>>> output
tensor([[[[0]]]], dtype=float32, loc=gpu:0, shape=(1, 1, 1, 1))
- padding: Sequence[Tuple[int, int]]
A sequence of pairs of integers of length \(M\) indicating the zero padding to apply to the input along each spatial dimension before and after the dimension respectively, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\).
- stride: Sequence[int]
A sequence of length \(M\) indicating the stride of convolution across each spatial dimension, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\).
- __call__(*args: Any, **kwargs: Any) → Any
Calls the module with the specified arguments.
- Parameters:
*args (Any) – Positional arguments to the module.
**kwargs (Any) – Keyword arguments to the module.
- Returns:
The outputs computed by the module.
- Return type:
Any
Example
class Module(tp.Module):
    def forward(self, x):
        return tp.relu(x)


module = Module()

input = tp.arange(-3, 3)
out = module(input)  # Note that we do not call `forward` directly.
Local Variables
>>> module
Module(
)
>>> module.state_dict()
{}
>>> input
tensor([-3, -2, -1, 0, 1, 2], dtype=float32, loc=gpu:0, shape=(6,))
>>> out
tensor([0, 0, 0, 0, 1, 2], dtype=float32, loc=gpu:0, shape=(6,))
- load_state_dict(state_dict: Dict[str, Tensor], strict: bool = True) → Tuple[Set[str], Set[str]]
Loads parameters from the provided state_dict into the current module. This will recurse over any nested child modules.
- Parameters:
state_dict (Dict[str, Tensor]) – A dictionary mapping names to parameters.
strict (bool) – Whether to raise an error if state_dict contains keys that are missing from or not expected by this module.
- Returns:
missing_keys: keys that are expected by this module but not provided in state_dict.
unexpected_keys: keys that are not expected by this module but provided in state_dict.
- Return type:
A tuple of two sets of strings.
Example
class MyModule(tp.Module):
    def __init__(self):
        super().__init__()
        self.param = tp.ones((2,), dtype=tp.float32)


module = MyModule()

print(f"Before: {module.param}")

module.load_state_dict({"param": tp.zeros((2,), dtype=tp.float32)})

print(f"After: {module.param}")
Output
Before: tensor([1, 1], dtype=float32, loc=gpu:0, shape=(2,))
After: tensor([0, 0], dtype=float32, loc=gpu:0, shape=(2,))
See also: state_dict()
- named_children() → Iterator[Tuple[str, Module]]
Returns an iterator over immediate children of this module, yielding tuples containing the name of the child module and the child module itself.
- Returns:
An iterator over tuples containing the name of the child module and the child module itself.
- Return type:
Iterator[Tuple[str, Module]]
Example
class StackedLinear(tp.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = tp.Linear(2, 2)
        self.linear2 = tp.Linear(2, 2)


stacked_linear = StackedLinear()

for name, module in stacked_linear.named_children():
    print(f"{name}: {type(module).__name__}")
Output
linear1: Linear
linear2: Linear
- named_parameters() → Iterator[Tuple[str, Tensor]]
- Returns:
An iterator over tuples containing the name of a parameter and the parameter itself.
- Return type:
Iterator[Tuple[str, Tensor]]
Example
class MyModule(tp.Module):
    def __init__(self):
        super().__init__()
        self.alpha = tp.Tensor(1)
        self.beta = tp.Tensor(2)


linear = MyModule()

for name, parameter in linear.named_parameters():
    print(f"{name}: {parameter}")
Output
alpha: tensor(1, dtype=int32, loc=cpu:0, shape=())
beta: tensor(2, dtype=int32, loc=cpu:0, shape=())
- state_dict() → Dict[str, Tensor]
Returns a dictionary mapping names to parameters in the module. This will recurse over any nested child modules.
- Returns:
A dictionary mapping names to parameters.
- Return type:
Dict[str, Tensor]
Example
class MyModule(tp.Module):
    def __init__(self):
        super().__init__()
        self.param = tp.ones((2,), dtype=tp.float32)
        self.linear1 = tp.Linear(2, 2)
        self.linear2 = tp.Linear(2, 2)


module = MyModule()

state_dict = module.state_dict()
Local Variables
>>> state_dict
{
    param: tensor([1, 1], dtype=float32, loc=gpu:0, shape=(2,)),
    linear1.weight: <nvtripy.frontend.module.parameter.DefaultParameter object at 0x79774be4d490>,
    linear1.bias: <nvtripy.frontend.module.parameter.DefaultParameter object at 0x79774be45730>,
    linear2.weight: <nvtripy.frontend.module.parameter.DefaultParameter object at 0x79774b9c9850>,
    linear2.bias: <nvtripy.frontend.module.parameter.DefaultParameter object at 0x79774b9c9be0>,
}
- groups: int
The number of groups in a grouped convolution, where the input and output channels are divided into that many groups. Each output group is connected only to its corresponding input group through the convolution kernel weights, and the outputs for each group are concatenated to produce the final result. This is in contrast to a standard convolution, which has full connectivity between all input and output channels. Grouped convolutions reduce computational cost by a factor of groups and can benefit model parallelism and memory usage. Note that in_channels and out_channels must both be divisible by groups.
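For concreteness, a small sketch of the divisibility requirement and the resulting kernel shape; the grouped_weight_shape helper below is illustrative only and not part of nvtripy.

def grouped_weight_shape(in_channels, out_channels, kernel_dims, groups):
    # Both channel counts must be divisible by the number of groups.
    assert in_channels % groups == 0 and out_channels % groups == 0
    # Each filter only sees in_channels // groups input channels.
    return (out_channels, in_channels // groups, *kernel_dims)


# Standard convolution: every filter sees all input channels.
print(grouped_weight_shape(2, 2, (3, 3), groups=1))  # (2, 2, 3, 3)
# Depthwise convolution (groups == in_channels), as in the example above.
print(grouped_weight_shape(2, 2, (3, 3), groups=2))  # (2, 1, 3, 3)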
- dilation: Sequence[int]
A sequence of length \(M\) indicating the number of zeros to insert between kernel weights across each spatial dimension, where \(M\) is the number of spatial dimensions, i.e. \(M = \text{rank(input)} - 2\). This is known as the à trous algorithm and further downsamples the output by increasing the receptive field of the kernel. For each dimension with value \(x\), \(x-1\) zeros are inserted between kernel weights.
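For example, with \(\text{kernel_dims}_k = 2\) and \(\text{dilation}_k = 2\), one zero is inserted between the two kernel weights, so the kernel spans \(\text{dilation}_k \times (\text{kernel_dims}_k - 1) + 1 = 3\) input elements along that dimension; this is why the dilated-convolution example above reduces a \(3 \times 3\) input to a \(1 \times 1\) output.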
- bias: Tensor | None
The bias term to add to the output. The bias has a shape of \((\text{out_channels},)\).
- weight: Tensor
The kernel of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, *\text{kernel_dims})\).
- forward(input: Tensor) → Tensor [source]
- Parameters:
input (Tensor) – The input tensor.
- Returns:
A tensor of the same data type as the input with a shape \((N, \text{out_channels}, D_{0_{\text{out}}},\ldots,D_{n_{\text{out}}})\) where \(D_{k_{\text{out}}} = \large \left\lfloor \frac{D_{k_{\text{in}}} + \text{padding}_{k_0} + \text{padding}_{k_1} - \text{dilation}_k \times (\text{kernel_dims}_k - 1) - 1}{\text{stride}_k} \right\rfloor + \normalsize 1\)
- Return type:
Tensor
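As a worked instance of the shape formula, take the padding-and-stride example above: \(D_{k_{\text{in}}} = 4\), \(\text{kernel_dims} = (3, 3)\), \(\text{dilation} = (1, 1)\), \(\text{padding} = ((1, 1), (1, 1))\), and \(\text{stride} = (3, 1)\):
\[D_{0_{\text{out}}} = \left\lfloor \frac{4 + 1 + 1 - 1 \times (3 - 1) - 1}{3} \right\rfloor + 1 = 2, \qquad D_{1_{\text{out}}} = \left\lfloor \frac{4 + 1 + 1 - 1 \times (3 - 1) - 1}{1} \right\rfloor + 1 = 4,\]
which matches the output shape \((1, 1, 2, 4)\) shown in that example.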