Feedforward Neural Net Language model - computational complexity (word2vec)

I was reading this paper on word2vec and came across the following description of a feedforward NNLM:

It consists of input, projection, hidden and output layers. At the input layer, N previous words are encoded using 1-of-V coding, where V is size of the vocabulary. The input layer is then projected to a projection layer P that has dimensionality N × D, using a shared projection matrix. As only N inputs are active at any given time, composition of the projection layer is a relatively cheap operation.

The following expression is given for the computational complexity per training example:

Q = N×D + N×D×H + H×V.

The last two terms make sense to me: N×D×H is roughly the number of parameters in a dense layer from the N×D-dimensional projection layer to the H hidden neurons, and analogously H×V for the output layer. The first term, however, I expected to be V×D, since the mapping from a one-hot encoded word to a D-dimensional vector is done via a V×D-dimensional matrix. I came to that conclusion after reading this referenced paper and this SO post, where the workings of the projection layer are explained in more detail.
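To make the shapes concrete, here is a minimal numpy sketch of the forward pass as I understand it (the sizes, indices, and variable names are made up for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical sizes for illustration only (not from the paper).
V, N, D, H = 10_000, 4, 100, 50

C   = np.random.randn(V, D)       # shared projection matrix: V*D parameters
W_h = np.random.randn(N * D, H)   # projection -> hidden: N*D*H parameters
W_o = np.random.randn(H, V)       # hidden -> output: H*V parameters

context = np.array([12, 7, 421, 9])   # indices of the N previous words

# Projection: each one-hot word selects one row of C, so only N*D values
# are copied per example, even though C holds V*D parameters.
p = C[context].reshape(N * D)     # cost ~ N*D

h = np.tanh(p @ W_h)              # cost ~ N*D*H
y = h @ W_o                       # cost ~ H*V (logits, before softmax)
```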

Perhaps I have misunderstood what is meant by "training complexity".



Solution 1:[1]

If I understand this excerpt correctly, it is N×D because N previous words are taken into account: you don't process just one word, nor all V words in the vocabulary. The shared projection matrix does have V×D parameters, but because each input word is one-hot, projecting it just reads one D-dimensional row of that matrix. Per training example only the N rows for the N context words are touched, so the computational cost of this layer is N×D. The formula counts operations per training example, not parameters.
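A quick sketch of why the one-hot multiplication collapses to a row lookup (the sizes and names here are illustrative assumptions, not from the paper):

```python
import numpy as np

V, D = 1_000, 64
C = np.random.randn(V, D)        # V*D parameters in total

word_idx = 42
one_hot = np.zeros(V)
one_hot[word_idx] = 1.0

# The full matrix product costs ~V*D multiplies, but its result is
# identical to simply reading row `word_idx` (~D copies).
assert np.allclose(one_hot @ C, C[word_idx])
```

So the V×D figure is the size of the lookup table, while N×D is what one training example actually costs.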

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: David Hutter