How to do binary classification on the output of an LSTM and a Linear layer
I am trying to build a wake-word model for my AI assistant project. I take audio clips, convert them to MFCCs, and feed them to an LSTM. I use the LSTM's h_n output, which has shape (4,32,32), i.e. (directions∗num_layers, batch, hidden_size). I then pass it to my Linear layer, which returns (4,32,1).
This is a binary classification problem with two classes: 0 means do not wake up and 1 means wake the AI.
But I don't understand the output of the Linear layer. I would expect an output like (32,1), i.e. (batch size, prediction). How should I process the (4,32,1) output from the Linear layer? I think I am missing something basic here.
Could you please explain it to me? My model code is below.
    import torch
    import torch.nn as nn

    class LSTMWakeWord(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, dropout, bidirectional, num_of_classes, device='cpu'):
            super(LSTMWakeWord, self).__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.device = device
            self.bidirectional = bidirectional
            self.directions = 2 if bidirectional else 1
            self.lstm = nn.LSTM(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                dropout=dropout,
                                bidirectional=bidirectional,
                                batch_first=True)
            self.layernorm = nn.LayerNorm(input_size)
            self.classifier = nn.Linear(hidden_size, num_of_classes)

        def _init_hidden(self, batch_size):
            n, d, hs = self.num_layers, self.directions, self.hidden_size
            return (torch.zeros(n * d, batch_size, hs).to(self.device),
                    torch.zeros(n * d, batch_size, hs).to(self.device))

        def forward(self, x):
            # LayerNorm rescales the features, so the very large values (e+xxx) are gone
            x = self.layernorm(x)
            # x shape -> batch, seq_len(time), feature(n_mfcc), since batch_first=True
            hidden = self._init_hidden(x.size()[0])
            out, (hn, cn) = self.lstm(x, hidden)
            print("hn " + str(hn.shape))  # directions∗num_layers, batch, hidden_size
            # print("out " + str(out.shape))  # batch, seq_len, direction(2 or 1)*hidden_size
            out = self.classifier(hn)
            print("out2 " + str(out.shape))
            return out
Solution 1:[1]
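hn has shape (directions∗num_layers, batch, hidden_size), and nn.Linear only transforms the last dimension, so the leading dimension of 4 passes straight through to the output. Select a single final hidden state before classifying. You can try this:
    hn = hn[-1, :, :]
    out = self.classifier(hn)
This gives out the shape (32,1). Note that with bidirectional=True, hn[-1] is the final state of the last layer's backward direction only; hn[-2] holds the last layer's forward direction.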
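To make the shapes concrete, here is a minimal sketch rather than a drop-in fix. It uses the sizes from the question (2 layers × 2 directions = 4, batch 32, hidden size 32); `n_mfcc`, `seq_len`, the standalone `nn.Linear` heads and the random labels are assumptions made only for illustration. It also shows one possible variant for a bidirectional LSTM, concatenating the forward and backward final states of the last layer, and pairs the single output logit with `nn.BCEWithLogitsLoss`, which is a common choice for binary targets but not something the original answer prescribes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# batch, hidden_size and num_layers come from the question;
# seq_len and n_mfcc are made-up values for this sketch.
batch, seq_len, n_mfcc, hidden_size, num_layers = 32, 50, 40, 32, 2

lstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden_size,
               num_layers=num_layers, bidirectional=True, batch_first=True)

x = torch.randn(batch, seq_len, n_mfcc)
out, (hn, cn) = lstm(x)  # initial states default to zeros when omitted
print(hn.shape)          # (directions * num_layers, batch, hidden_size)

# nn.Linear acts on the last dimension only, so the leading 4 is kept.
naive = nn.Linear(hidden_size, 1)(hn)
print(naive.shape)       # (4, batch, 1): the shape seen in the question

# Option A (the answer above): keep only the last entry of hn.
logits_a = nn.Linear(hidden_size, 1)(hn[-1])
print(logits_a.shape)    # (batch, 1)

# Option B (assumed variant for bidirectional=True): hn[-2] is the last
# layer's forward state, hn[-1] its backward state; concatenate both.
final = torch.cat([hn[-2], hn[-1]], dim=1)       # (batch, 2 * hidden_size)
logits_b = nn.Linear(2 * hidden_size, 1)(final)  # (batch, 1)

# One logit per clip; BCEWithLogitsLoss applies the sigmoid internally.
labels = torch.randint(0, 2, (batch, 1)).float()  # random stand-in labels
loss = nn.BCEWithLogitsLoss()(logits_b, labels)
probs = torch.sigmoid(logits_b)                   # wake-word probability
```

With Option B the classifier's input size must be `2 * hidden_size`. At inference time, a threshold on `probs` (0.5 is a typical starting point) decides between class 0 and class 1.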
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
