Translate CSRA from PyTorch to TensorFlow
I'm trying to implement the CSRA module from this paper in TensorFlow instead of PyTorch, using MobileNetV3 as a feature extractor.
I'm only versed in TensorFlow, and I'm running into some trouble translating from PyTorch to TensorFlow.
The official source code implementing the module uses some functions that have no commonly used TensorFlow equivalent, and the TF docs for the closest candidates offer little help.
The source code is the following:
class CSRA(nn.Module):  # one basic block
    def __init__(self, input_dim, num_classes, T, lam):
        super(CSRA, self).__init__()
        self.T = T        # temperature
        self.lam = lam    # lambda
        self.head = nn.Conv2d(input_dim, num_classes, 1, bias=False)
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        # x: (B, d, H, W)
        # normalize classifier
        # score: (B, C, H*W)
        score = self.head(x) / torch.norm(self.head.weight, dim=1, keepdim=True).transpose(0, 1)
        score = score.flatten(2)
        base_logit = torch.mean(score, dim=2)

        if self.T == 99:  # max-pooling
            att_logit = torch.max(score, dim=2)[0]
        else:
            score_soft = self.softmax(score * self.T)
            att_logit = torch.sum(score * score_soft, dim=2)

        return base_logit + self.lam * att_logit
I don't need to define CSRA as a class; I'm trying to stitch the module together as a Keras functional model. This is the code I've managed to put together:
inputs = tf.keras.Input(shape=(None, None, 3))
head = tf.keras.layers.Conv2D(2, kernel_size=1, padding='same', use_bias=False, input_shape=(None, None, None, 960))
features = base_model(inputs, training=False)
print(head.get_weights())
score = head(features) / tf.transpose((tf.linalg.normalize(head.get_weights(), axis=3)), perm=(0,1))
shape = score.get_shape().as_list()
score = tf.reshape(score, [-1, shape[1] * shape[2], shape[3]])
# scores = tf.reshape(scores[1:2])
avg_scores = tf.keras.backend.mean(score, axis=1)
max_scores_act = tf.keras.activations.softmax(score, axis=1)
max_scores = tf.math.reduce_sum(max_scores_act * score, axis=1)
outputs = (avg_scores + max_scores*0.2)
model = tf.keras.Model(inputs, outputs)
The major problem I'm having is the step that obtains score in TensorFlow. The immediate issue seems to be that head.get_weights() returns an empty list. I understand this is because head hasn't been built yet, so its weights don't exist, but then how do I define score?
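For what it's worth, the empty list comes from Keras building layers lazily: a layer has no weights until it is built or first called on an input. A minimal sketch (the 960-channel depth matches the MobileNetV3 feature map from the code above; the 7x7 spatial size is just illustrative):

```python
import tensorflow as tf

# A freshly constructed Keras layer has no weights yet: Keras creates
# them lazily, on the first call (or on an explicit build()).
head = tf.keras.layers.Conv2D(2, kernel_size=1, use_bias=False)
print(head.get_weights())  # [] -- the layer is not built yet

# Building the layer (here against a MobileNetV3-like 960-channel
# feature map) creates the kernel, and get_weights() then returns it.
head.build(input_shape=(None, 7, 7, 960))
print(head.get_weights()[0].shape)  # (1, 1, 960, 2)
```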
Besides this, I have many doubts about the translation. Since PyTorch works with channels-first and TensorFlow with channels-last, I had to change the dims affected in each step, but I don't know whether I'm doing it right, or whether every PyTorch operation is even necessary in TensorFlow.
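On the layout question: since TF features are (B, H, W, C), the flattened score becomes (B, H*W, C), so every PyTorch reduction over dim=2 maps to axis=1 in TF. A quick sanity check with dummy shapes (batch 4, a 7x7 feature map, 2 classes, T = 1):

```python
import tensorflow as tf

# PyTorch flattens (B, C, H, W) -> (B, C, H*W) and reduces over dim=2;
# with channels-last the flattened score is (B, H*W, C), so the same
# reductions run over axis=1 instead.
score = tf.random.normal((4, 7 * 7, 2))                # (B, H*W, C)
base_logit = tf.reduce_mean(score, axis=1)             # (B, C)
score_soft = tf.nn.softmax(score * 1.0, axis=1)        # T = 1
att_logit = tf.reduce_sum(score * score_soft, axis=1)  # (B, C)
print(base_logit.shape, att_logit.shape)  # (4, 2) (4, 2)
```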
Solution 1:
After quite some digging in the PyTorch docs, I mostly figured it out.
What was throwing me off is that instead of using pre-made layers, the code implements the attention head from the ground up.
Regarding access to the classifier weights, there is a fundamental difference between PyTorch's weight attribute and TensorFlow's get_weights() method: a PyTorch layer creates its weight tensor at construction, so self.head.weight is available immediately, while a Keras layer only creates its weights once it is built (i.e., the first time it is called on an input), so get_weights() on an unbuilt layer returns an empty list. In any case, this step is only needed for weight normalization, which is conveniently implemented in TensorFlow (e.g., tf.nn.l2_normalize).
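For completeness, here is a minimal, function-level sketch of one possible translation (not the official implementation): it assumes channels-last features and a bias-free 1x1-conv kernel of shape (1, 1, d, C), normalizes each class's classifier vector by its L2 norm over the input-channel axis, and keeps the paper's T == 99 max-pooling branch. It can be wrapped in a Keras layer afterwards:

```python
import tensorflow as tf

def csra(features, kernel, T=1.0, lam=0.2):
    """CSRA head sketch. features: (B, H, W, d) channels-last feature map;
    kernel: (1, 1, d, C) weights of a bias-free 1x1 conv classifier."""
    score = tf.nn.conv2d(features, kernel, strides=1, padding="VALID")
    # Normalize by the L2 norm of each class's classifier vector: the norm
    # over the kernel's input-channel axis (axis=2) has shape (1, 1, 1, C)
    # and broadcasts against score, which is (B, H, W, C).
    score = score / tf.norm(kernel, axis=2, keepdims=True)
    num_classes = score.shape[-1]
    score = tf.reshape(score, (tf.shape(score)[0], -1, num_classes))  # (B, H*W, C)
    base_logit = tf.reduce_mean(score, axis=1)
    if T == 99:  # max-pooling variant from the paper
        att_logit = tf.reduce_max(score, axis=1)
    else:
        score_soft = tf.nn.softmax(score * T, axis=1)
        att_logit = tf.reduce_sum(score * score_soft, axis=1)
    return base_logit + lam * att_logit

# Dummy check: 8-channel features, 3 classes -> (B, 3) logits.
logits = csra(tf.random.normal((2, 7, 7, 8)), tf.random.normal((1, 1, 8, 3)))
print(logits.shape)  # (2, 3)
```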
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ghylander |
