Given groups=1, weight of size [10, 1, 5, 5], expected input[2, 3, 28, 28] to have 1 channels, but got 3 channels instead

I am trying to train a CNN on MNIST but test it on my own handwritten digits. I wrote the following code, but I get the error in the title of this question:

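import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder

batch_size = 64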
train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)
test_dataset = ImageFolder('my_digit_images/', transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        #print(self.conv1.weight.shape)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv3 = nn.Conv2d(20, 20, kernel_size=3)
        #print(self.conv2.weight.shape)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.conv1(x))
        #print(x.shape)
        x = F.relu(self.mp(self.conv2(x)))
        x = F.relu(self.mp(self.conv3(x)))
        
        #print("2.", x.shape)
        # x = F.relu(self.mp(self.conv3(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        #print("3.", x.shape)
        x = self.fc(x)
        # Normalize over the class dimension; omitting dim is deprecated.
        return F.log_softmax(x, dim=1)
model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Tensors from the DataLoader can be used directly; wrapping them in Variable is deprecated.
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
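    # Disable gradient tracking during evaluation (replaces the deprecated volatile flag).
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # Sum the per-sample losses so the total can be averaged over the dataset below.
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)  # index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()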

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Solution 1:^[1]

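MNIST contains grayscale, single-channel images, while yours are probably being loaded as 3-channel RGB. Either convert your images to single-channel or drop the extra channels in preprocessing, for a batch of shape (N, C, H, W) like this:

img = img[:, 0:1, :, :]

You can also do this per image with a custom transform added after transforms.ToTensor(); there the tensor has shape (C, H, W), so the slice is img[0:1, :, :].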
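
A minimal sketch of such a per-image transform, assuming the input is the (C, H, W) tensor that transforms.ToTensor() produces. The class name FirstChannel is made up for illustration. Note that it keeps only the first (red) channel, which is fine for images that are already gray but is not a true grayscale conversion.

```python
import torch


class FirstChannel:
    """Hypothetical transform: keep only channel 0 of a (C, H, W) tensor."""

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        # Slicing with 0:1 (rather than indexing with 0) keeps the channel axis,
        # so a (3, H, W) input becomes (1, H, W).
        return img[0:1, :, :]


# Stand-in for one RGB image after transforms.ToTensor().
rgb = torch.rand(3, 28, 28)
gray = FirstChannel()(rgb)
print(gray.shape)  # torch.Size([1, 28, 28])
```

In the test pipeline it would go right after ToTensor(), for example transforms.Compose([transforms.ToTensor(), FirstChannel()]).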

Solution 2:^[2]

Training and test images should follow the same distribution. MNIST images are grayscale (one channel), and assuming you did not change the number of channels during training, the model expects single-channel input at test time as well.
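
To see where the numbers in the error message come from: a Conv2d weight has shape [out_channels, in_channels / groups, kernel_h, kernel_w], so [10, 1, 5, 5] means the first layer accepts 1 input channel, while the input [2, 3, 28, 28] is a batch of 2 images with 3 channels each. A small sketch of that check in plain Python (the helper name check_conv_input is made up for illustration and is not part of PyTorch):

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Return the channel count the layer expects and whether the input matches.

    weight_shape: [out_channels, in_channels // groups, kernel_h, kernel_w]
    input_shape:  [batch, channels, height, width]
    """
    expected_channels = weight_shape[1] * groups
    actual_channels = input_shape[1]
    return expected_channels, actual_channels == expected_channels


# Shapes taken from the error message in the question.
expected, ok = check_conv_input([10, 1, 5, 5], [2, 3, 28, 28])
print(expected, ok)  # 1 False

# After converting the images to grayscale the batch has one channel.
expected, ok = check_conv_input([10, 1, 5, 5], [2, 1, 28, 28])
print(expected, ok)  # 1 True
```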

The following code shows how to do this with a transform pipeline. In order, it:

1. Converts the image to a single channel (grayscale).
2. Resizes the image to the MNIST size of 28x28.
3. Converts the image to a tensor.
4. Normalizes the tensor with the same mean and std used during training. This assumes the training transform applies the same transforms.Normalize values; the training code in the question uses only ToTensor(), so either add the same Normalize step there or leave it out here.


test_dataset = ImageFolder(
    'my_digit_images/',
    transform=transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.Resize((28, 28)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]),
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Alexey Birukov
Solution 2	Pathi_rao
