Why does passing a tensor through a neural network fail to produce an output?
I'm new to deep learning and am trying to reproduce a neural-renderer program.
FCN() is a neural renderer network that renders a 10-dimensional stroke parameter vector into a stroke on a 128x128 canvas. renderer.pkl holds the network parameters trained by the author. The input has shape batchsize x 10; here I assume batchsize == 1. Finally, five strokes are rendered onto the canvas at a time.
The strokes are generated randomly; print(action) shows a non-zero tensor of shape [5, 10].
But I don't understand why canvas1 stays in the same state as canvas0 (all zeros), which suggests the renderer isn't doing anything at all.
I would appreciate any help.
import cv2
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image  # Image.fromarray is used below

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class FCN(nn.Module):
    def __init__(self):
        super(FCN, self).__init__()
        self.fc1 = nn.Linear(10, 512)
        self.fc2 = nn.Linear(512, 1024)
        self.fc3 = nn.Linear(1024, 2048)
        self.fc4 = nn.Linear(2048, 4096)
        self.conv1 = nn.Conv2d(16, 32, 3, 1, 1)
        self.conv2 = nn.Conv2d(32, 32, 3, 1, 1)
        self.conv3 = nn.Conv2d(8, 16, 3, 1, 1)
        self.conv4 = nn.Conv2d(16, 16, 3, 1, 1)
        self.conv5 = nn.Conv2d(4, 8, 3, 1, 1)
        self.conv6 = nn.Conv2d(8, 4, 3, 1, 1)
        self.pixel_shuffle = nn.PixelShuffle(2)

    def forward(self, x):                        # b x 10
        x = F.relu(self.fc1(x))                  # b x 512
        x = F.relu(self.fc2(x))                  # b x 1024
        x = F.relu(self.fc3(x))                  # b x 2048
        x = F.relu(self.fc4(x))                  # b x 4096
        x = x.view(-1, 16, 16, 16)               # reshape to b x 16 x 16x16
        x = F.relu(self.conv1(x))                # b x 32 x 16x16
        x = self.pixel_shuffle(self.conv2(x))    # b x 32 x 16x16 -> b x 8 x 32x32
        # PixelShuffle(r): (*, C*r^2, H, W) -> (*, C, H*r, W*r)
        x = F.relu(self.conv3(x))                # b x 16 x 32x32
        x = self.pixel_shuffle(self.conv4(x))    # b x 16 x 32x32 -> b x 4 x 64x64
        x = F.relu(self.conv5(x))                # b x 8 x 64x64
        x = self.pixel_shuffle(self.conv6(x))    # b x 4 x 64x64 -> b x 1 x 128x128
        x = torch.sigmoid(x)
        return 1 - x.view(-1, 128, 128)

Decoder = FCN()
Decoder.load_state_dict(torch.load('renderer.pkl'))

def decode(x, canvas):                           # b x 10
    x = x.view(-1, 10)
    stroke = 1 - Decoder(x[:, :10])
    stroke = stroke.view(-1, 128, 128, 1)        # bsz x 128 x 128 x 1
    stroke = stroke.permute(0, 3, 1, 2)          # b x 1 x 128 x 128
    stroke = stroke.view(-1, 5, 1, 128, 128)
    for i in range(5):
        canvas = canvas * (1 - stroke[:, i])
    return canvas

width = 128
canvas0 = torch.zeros([1, width, width], dtype=torch.uint8).to(device)
action = []
for i in range(5):                               # 5 x 10
    f = np.random.uniform(0, 1, 10)
    action.append(f)
action = torch.tensor(action).float()
action = action.unsqueeze(0)                     # 1 x 5 x 10
canvas1 = decode(action, canvas0)                # 1 x 1 x 128 x 128
canvas1 = torch.squeeze(canvas1, dim=0)
canvas1 = torch.squeeze(canvas1, dim=0)          # 128 x 128
canvas1 = canvas1.detach().numpy()
Image.fromarray(np.uint8(canvas1))
print(canvas1)
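As a side note (not part of the original question), the shape bookkeeping in forward can be checked in isolation: nn.PixelShuffle(r) rearranges a tensor of shape (N, C*r^2, H, W) into (N, C, H*r, W*r), which is how three shuffle steps take the 16x16 feature map up to the 128x128 canvas. A minimal sketch:

```python
import torch
import torch.nn as nn

# PixelShuffle(2) maps (N, C*4, H, W) -> (N, C, 2H, 2W).
ps = nn.PixelShuffle(2)

x = torch.randn(1, 32, 16, 16)   # shape of conv2's output in FCN
y = ps(x)
print(y.shape)                   # torch.Size([1, 8, 32, 32])
```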
Solution 1:[1]
I suspect the issue is caused by the np.uint8 line. Most neural networks are parameterized so that their outputs fall in the range [0, 1], and yours is no exception: the sigmoid activation's range is (0, 1). Casting any float in that range to an integer truncates it to 0. Try multiplying by 255 first, so the outputs span a non-trivial range after the integer cast.
...
canvas1 = decode(action, canvas0) # 1 x 1 x 128 x 128
canvas1 = torch.squeeze(canvas1, dim=0)
canvas1 = torch.squeeze(canvas1, dim=0) # 128 x 128
canvas1 = canvas1.detach().numpy() * 255
Image.fromarray(np.uint8(canvas1))
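To make the truncation concrete, here is a standalone NumPy demonstration (an editorial addition, not part of the original answer): every float in (0, 1) becomes 0 under a bare uint8 cast, while scaling by 255 first preserves the pixel intensities.

```python
import numpy as np

# Sigmoid outputs lie in (0, 1); a direct uint8 cast truncates them all to 0.
pixels = np.array([0.1, 0.5, 0.99])

print(np.uint8(pixels))          # [0 0 0] -- an all-black image
print(np.uint8(pixels * 255))    # [ 25 127 252] -- intensities survive
```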
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | DerekG |
