About the memory usage of MobileNet

I'm building MobileNetV1 with PyTorch and I run out of memory every time I train the model. (The process prints "Killed!" and crashes.)
This is my code:

Config file: (yaml)

n_gpu: 0

arch: 
    type: MobileNet
    args: 
        in_channels: 3
        num_classes: 26
    
data_loader: 
    type: BallDataLoader
    args:
        data_dir: data/balls/
        batch_size: 64
        shuffle: true
        validation_split: 0.2
        num_workers: 0
        resize: 
        - 224
        - 224
    
optimizer:
    type: Adam
    args:
        lr: 1.0e-2
        weight_decay: 0
        amsgrad: true
    
loss: nll_loss
metrics: 
    - accuracy
    - top_k_acc

lr_scheduler: 
    type: StepLR
    args: 
        step_size: 50
        gamma: 0.1
    

trainer: 
    epochs: 50
    save_dir: saved/
    save_period: 2
    verbosity: 2
    monitor: min val_loss
    early_stop: 10
    tensorboard: true

modules.py:

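import torch.nn as nn
import torch.nn.functional as F


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=None):
        super().__init__()
        # Default to "same"-style padding for odd kernel sizes.
        if padding is None:
            padding = kernel_size // 2
        # groups=in_channels makes this a depthwise convolution (one filter per input channel).
        self.depth_wise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, groups=in_channels)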
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.point_wise_conv = nn.Conv2d(in_channels, out_channels, (1,1), 1, 0)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.in_channels = in_channels
        self.out_channels = out_channels

    def forward(self, x):
        x = self.depth_wise_conv(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.point_wise_conv(x)
        x = self.bn2(x)
        x = F.relu(x)
        return x

model.py:

class MobileNet(ImageNet):
    def __init__(self, in_channels = 3, num_classes = 1000):
        super().__init__()
        
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace = True),
            DepthwiseSeparableConv(32, 64),
            DepthwiseSeparableConv(64, 128, stride = 2),
            DepthwiseSeparableConv(128, 128),
            DepthwiseSeparableConv(128, 256),
            DepthwiseSeparableConv(256, 256),
            DepthwiseSeparableConv(256, 512, stride = 2),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 1024, stride = 1),
            DepthwiseSeparableConv(1024, 1024, stride= 2),
            nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        x = F.log_softmax(x, dim = 1)
        return x

I then found a model at https://github.com/jmjeon94/MobileNet-Pytorch, and it worked. After hours I still can't work out why this happens, as the two models are nearly identical. Since the MobileNet architecture is fairly light, I assumed it shouldn't need much memory to run. Could this be caused by the Python interpreter, or is there actually something wrong with my code?



Solution 1:[1]

I think it's because of your batch size. Try a smaller batch size such as 32, 16, 8, 4, or 2.
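To see why batch size matters here, a rough back-of-the-envelope sketch may help. It assumes float32 activations (4 bytes per element) and looks only at the output of the first convolution in the question (32 channels at 224x224, since that layer uses stride 1 and padding 1). The helper name `activation_bytes` is made up for illustration. Training keeps many such tensors alive for the backward pass, so the real footprint is several times larger.

```python
def activation_bytes(batch_size, channels, height, width, bytes_per_elem=4):
    """Bytes needed to hold one activation tensor of shape (N, C, H, W)."""
    return batch_size * channels * height * width * bytes_per_elem


# Output of the first conv in the question: 32 channels at 224x224.
# bytes_per_elem=4 assumes float32, PyTorch's default dtype.
for batch_size in (64, 32, 16, 8):
    mib = activation_bytes(batch_size, 32, 224, 224) / 2**20
    print(f"batch_size={batch_size:>2}: {mib:.0f} MiB for one activation tensor")
```

The cost scales linearly with batch size, so halving the batch halves the memory held by each stored activation.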

Solution 2:[2]

I deleted the line nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ) and retyped it, and the code ran. I still don't know why, but it seems that the interpreter or the text editor caused the error. Thank you for your attention, and special thanks to @Anmol Narang for your effort; I really appreciate it.
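One thing worth checking on that line is its stride, because stride controls how large every later feature map is. The sketch below traces the spatial size through the stack using the standard convolution output formula. `question_strides` is read off the model code in the question. `reference_strides` is an assumption added for comparison: the commonly used MobileNetV1 layout, with stride 2 in the first convolution and in the 128-to-256 block. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output height/width of a square convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(first_stride, block_strides, size=224):
    """Spatial size after the first conv and after each depthwise-separable block."""
    sizes = [conv_out(size, stride=first_stride)]
    for stride in block_strides:
        sizes.append(conv_out(sizes[-1], stride=stride))
    return sizes


# Strides of the 13 DepthwiseSeparableConv blocks as written in the question.
question_strides = [1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2]
# Assumed reference layout for comparison (not taken from the question).
reference_strides = [1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]

print("question :", trace(1, question_strides))
print("reference:", trace(2, reference_strides))
```

Under these assumptions the stack in the question keeps 224x224 maps after the first convolution and ends at 28x28, while the reference layout starts at 112x112 and ends at 7x7. That would make each stored activation considerably larger in the question's model.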

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Anmol Narang
Solution 2 Do Nam