What algorithm should I use to train on the following sequential dataset?
I have a dataset that contains opcode sequences extracted from malware files. I read a paper in which the author used an RNN such as an LSTM, but specified a preprocessing step that builds a bag of words and uses Word2Vec to convert everything into a vectorized format. This preprocessing step is where I am stuck. Any help would be appreciated.
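import gensim

# sequence_text must be an iterable of token lists, e.g. [["push", "mov"], ...], not raw strings
model = gensim.models.Word2Vec()
model.build_vocab(sequence_text, progress_per=1000)
model.train(sequence_text, total_examples=model.corpus_count, epochs=model.epochs)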
I will also put a screenshot of the CSV file.

Ultimate goal: I need to classify whether a given sequence belongs to the malware class or not.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
