How can I accelerate computation on a frequently changing matrix?

I'm using a Genetic Algorithm to solve a TSP-like problem. In this problem, besides the route length of the TSP, each route also incurs a loss, so my final target is to minimize score = length*f1 + loss*f2.

The loss is computed from the sum of a matrix A of shape [T * N * W], but I multiply each of the N rows in A by a different value; each value is computed from the route and the distance matrix of the TSP problem.
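To make the loss concrete, here is a minimal sketch of the row-scaled sum described above; the variable names and random data are assumptions for illustration, not from the original question:

```python
import torch

# A has shape [T, N, W]; each of the N rows gets its own scale factor,
# which in the real problem is computed from the route and the distance matrix.
T, N, W = 7, 45, 46
A = torch.rand(T, N, W)
row_factors = torch.rand(N)                      # one route-dependent value per row (assumed)
loss = (A * row_factors[None, :, None]).sum()    # scale each row, then sum everything
```

Broadcasting the factors with `[None, :, None]` scales every row in place of an explicit loop.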

The procedure for computing the score is:

# each round has 100 lives
# T=7, N=45, W=46

1. generate a matrix M1 of size [7*4500*45] and move it to the GPU
   # M1 is a concatenation of 100 blocks of [7*45*45]; each block is a stack of 7 diagonal matrices
2. load M2 of size [7*45*46] to the GPU; M2 is immutable
for each round:
  1. alter M1 based on the route
  2. use torch.sum(torch.bmm(M1, M2)) to compute the score
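The steps above can be sketched roughly as follows (sizes from the question; the construction of M1 from per-life diagonal blocks is an assumption about the layout):

```python
import torch

T, N, W, LIVES = 7, 45, 46, 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# M1: concatenation of 100 blocks of shape [T, N, N], each a stack of T diagonal matrices.
diags = torch.rand(LIVES, T, N)                       # one diagonal vector per life and per T
M1 = torch.diag_embed(diags)                          # [LIVES, T, N, N]
M1 = M1.permute(1, 0, 2, 3).reshape(T, LIVES * N, N)  # [T, 4500, N]
M1 = M1.to(device)

# M2 is immutable, so it is moved to the GPU once and stays there.
M2 = torch.rand(T, N, W, device=device)

# One round: batched matrix multiply, then sum everything into a scalar score.
score = torch.sum(torch.bmm(M1, M2))
```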

But this is very slow! If I don't alter M1, the speed is 5000 it/s; if I alter M1 every round, it drops to 2 it/s. From what I've read, the overhead of moving data from the CPU to the GPU may be the problem.

Speed is crucial for me. Is there any way to accelerate this procedure? Can data already on the GPU be altered with little overhead?
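For context on the last question: yes, tensors resident on the GPU can be modified in place without a host transfer, provided the new values are themselves computed on the GPU. A minimal sketch of one possible approach (names and index layout are assumptions): since each block of M1 is diagonal, only the diagonal entries need updating each round, which can be done with advanced indexing entirely on the device.

```python
import torch

T, N, LIVES = 7, 45, 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

M1 = torch.zeros(T, LIVES * N, N, device=device)

# Precompute the diagonal-entry indices once, outside the loop.
rows = torch.arange(LIVES * N, device=device)  # 0..4499
cols = rows % N                                # diagonal column within each N*N block

def update_round(new_diags):
    """new_diags: [T, LIVES*N] route-dependent values for this round (assumed
    to be computed on the GPU). Writing through advanced indexing mutates M1
    in place on the device; no CPU-to-GPU copy of the full matrix occurs."""
    M1[:, rows, cols] = new_diags

update_round(torch.rand(T, LIVES * N, device=device))
```

The key point is that `new_diags` must already live on the GPU; rebuilding M1 on the CPU and calling `.to(device)` every round pays the transfer cost that the question describes.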



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
