Speeding up 1D convolution in PyTorch
For my project I am using PyTorch as a linear algebra backend. For the performance-critical part of my code, I need to do 1D convolutions of 2 small (length between 2 and 9) vectors (1D tensors) a very large number of times. My code allows for batch processing of inputs: I can stack a couple of input vectors into matrices that can then be convolved all at the same time. Since torch.conv1d does not allow convolving along a single dimension for 2D inputs, I had to write my own convolution function, called convolve. This new function, however, consists of a double for-loop and is therefore very slow.
Question: how can I make the convolve function perform faster through better code-design and let it be able to deal with batched inputs (=2D tensors)?
Partial answer: somehow avoid the double for-loop
Below are three Jupyter notebook cells that recreate a minimal example. Note that you need line_profiler and the %%writefile magic command to make this work!
%%writefile SO_CONVOLVE_QUESTION.py
import torch


def conv1d(a, v):
    padding = v.shape[-1] - 1
    return torch.conv1d(
        input=a.view(1, 1, -1), weight=v.flip(0).view(1, 1, -1), padding=padding, stride=1
    ).squeeze()


def convolve(a, v):
    if a.ndim == 1:
        a = a.view(1, -1)
        v = v.view(1, -1)

    nrows, vcols = v.shape
    acols = a.shape[1]

    expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
    noutdim = vcols + acols - 1
    out = torch.zeros((nrows, noutdim))
    for i in range(acols):
        for j in range(vcols):
            out[:, i + j] += expanded[:, i, j]
    return out.squeeze()


x = torch.randn(5)
y = torch.randn(7)
I write the code to SO_CONVOLVE_QUESTION.py because that is necessary for line_profiler and to use it as the setup for timeit.timeit.
Now we can evaluate the output and performance of the code above on non-batch input (x, y) and batched input (x_batch, y_batch):
import timeit
from SO_CONVOLVE_QUESTION import *

# setup string for timeit below: import everything from the file written above
setup = 'from SO_CONVOLVE_QUESTION import *'

# Without batch processing
res1 = conv1d(x, y)
res = convolve(x, y)
print(torch.allclose(res1, res))  # True

# With batch processing, NB first dimension!
x_batch = torch.randn(5, 5)
y_batch = torch.randn(5, 7)

results = []
for i in range(5):
    results.append(conv1d(x_batch[i, :], y_batch[i, :]))
res1 = torch.stack(results)
res = convolve(x_batch, y_batch)
print(torch.allclose(res1, res))  # True

print(timeit.timeit('convolve(x, y)', setup=setup, number=10000))  # 4.83391789999996
print(timeit.timeit('conv1d(x, y)', setup=setup, number=10000))    # 0.2799923000000035
In the block above you can see that performing the convolution 5 times using the conv1d function produces the same result as convolve on batched inputs. We can also see that convolve (≈ 4.8 s) is much slower than conv1d (≈ 0.28 s). Below we assess the slow part of the convolve function WITHOUT batch processing using line_profiler:
%load_ext line_profiler
%lprun -f convolve convolve(x, y) # evaluated without batch-processing!
Output:
Timer unit: 1e-07 s

Total time: 0.0010383 s
File: C:\python_projects\pysumo\SO_CONVOLVE_QUESTION.py
Function: convolve at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           def convolve(a, v):
    10         1         68.0     68.0      0.7      if a.ndim == 1:
    11         1        271.0    271.0      2.6          a = a.view(1, -1)
    12         1         44.0     44.0      0.4          v = v.view(1, -1)
    13
    14         1         28.0     28.0      0.3      nrows, vcols = v.shape
    15         1         12.0     12.0      0.1      acols = a.shape[1]
    16
    17         1       4337.0   4337.0     41.8      expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
    18         1         12.0     12.0      0.1      noutdim = vcols + acols - 1
    19         1        127.0    127.0      1.2      out = torch.zeros((nrows, noutdim))
    20         6         32.0      5.3      0.3      for i in range(acols):
    21        40        209.0      5.2      2.0          for j in range(vcols):
    22        35       5194.0    148.4     50.0              out[:, i+j] += expanded[:, i, j]
    23         1         49.0     49.0      0.5      return out.squeeze()
Obviously the double for-loop and the line creating the expanded tensor are the slowest parts. Are these parts avoidable with better code design?
Solution 1:[1]
It turns out that there is a way to do it without for-loops, by grouping the inputs along a dimension:
out = torch.conv1d(x_batch.unsqueeze(0), y_batch.unsqueeze(1).flip(2), padding=y_batch.size(1)-1, groups=x_batch.size(0))
print(torch.allclose(out, res1)) # True
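To make the shape bookkeeping explicit (this expansion is not part of the original answer, just a hedged sketch reusing the question's x_batch and y_batch shapes): the batch dimension is treated as the channel dimension, and setting groups equal to the batch size makes each kernel act on exactly one input row, so every row pair is convolved independently in a single call.

import torch

x_batch = torch.randn(5, 5)   # same shapes as in the question
y_batch = torch.randn(5, 7)

# Reference: one conv1d call per row pair, stacked (the question's res1).
reference = torch.stack([
    torch.conv1d(
        x_batch[i].view(1, 1, -1),
        y_batch[i].flip(0).view(1, 1, -1),
        padding=y_batch.size(1) - 1,
    ).squeeze()
    for i in range(x_batch.size(0))
])

# Grouped version: input (1, 5, 5) = (batch=1, channels=5, length=5),
# weight (5, 1, 7) = (out_channels=5, in_channels/groups=1, kernel=7),
# groups=5, so each kernel sees exactly one input row.
grouped = torch.conv1d(
    x_batch.unsqueeze(0),
    y_batch.unsqueeze(1).flip(2),    # flip for a true convolution, as in the question's conv1d
    padding=y_batch.size(1) - 1,
    groups=x_batch.size(0),
).squeeze(0)                         # back to (5, 11)

print(torch.allclose(grouped, reference))  # expected: True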
Solution 2:[2]
PyTorch has a batch-capable functional API, torch.nn.functional, which provides a conv1d function (and conv2d as well, and much more). We will use conv1d.
Suppose you want to convolve 100 vectors given in v1 with one other vector given in v2. v1 must have shape (minibatch, in_channels, length), and you need only 1 channel here. v2 must have shape (out_channels, in_channels / groups, kW). You are using 1 channel and therefore 1 group, so v1 and v2 are given by:
import torch
from torch.nn import functional as F
num_vectors = 100
len_vectors = 9
v1 = torch.rand((num_vectors, 1, len_vectors))
v2 = torch.rand(1, 1, 6)
Now we can simply compute the necessary padding via
padding = torch.min(torch.tensor([v1.shape[-1], v2.shape[-1]])).item() - 1
and the convolution can be done using
conv_result = F.conv1d(v1, v2, padding=padding)
I did not time it, but it should be considerably faster than your initial double for-loop.
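One caveat not stated in the answer: F.conv1d computes a cross-correlation (it does not flip the kernel), so to reproduce the flipped-kernel convolution implemented by the question's conv1d and convolve you would flip v2 along its last dimension first. A minimal sketch under that assumption, reusing x, y and convolve from SO_CONVOLVE_QUESTION.py:

import torch
from torch.nn import functional as F
from SO_CONVOLVE_QUESTION import x, y, convolve

# Shapes: v1 is (minibatch, in_channels, length), v2 is (out_channels, in_channels/groups, kW).
v1 = x.view(1, 1, -1)
v2 = y.flip(0).view(1, 1, -1)   # flip turns F.conv1d's cross-correlation into a true convolution

# A "full" convolution needs padding of kernel length - 1; the min() formula above
# gives the same value whenever the kernel is the shorter of the two vectors.
padding = v2.shape[-1] - 1
conv_result = F.conv1d(v1, v2, padding=padding).squeeze()

print(torch.allclose(conv_result, convolve(x, y)))  # expected: True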
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Patrickens |
| Solution 2 | Tomer Geva |
