Polynomial Expansion without sklearn

I want to recreate this function from scratch (without using sklearn):

# M is a 1000x10 matrix.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2)
df = pd.DataFrame(poly.fit_transform(M))
print(df)

So basically, I want to multiply each column by every column (including itself), covering all possible combinations. I tried creating a matrix with a column of 1's and then multiplying and appending the new columns to it, but that feels inefficient.

Any help will be much appreciated!

Thanks!



Solution 1:[1]

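IIUC, since one doesn't want to use sklearn, the following function using numpy should do the work. Note that it returns every ordered pairwise product of columns (n * n columns, so both x_j * x_k and x_k * x_j appear), and it does not include the column of 1's or the original columns that PolynomialFeatures(2) also returns.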

import numpy as np

def polynomial_features(M):

    # Create a new matrix with the same number of rows as M
    # and one column per ordered pair of columns of M (n * n columns)
    new_M = np.zeros((M.shape[0], M.shape[1] * M.shape[1]))

    # Iterate over the rows of M
    for i in range(M.shape[0]):

        # Iterate over the columns of M (first factor)
        for j in range(M.shape[1]):

            # Iterate over the columns of M again (second factor)
            for k in range(M.shape[1]):

                # Set the value of the new matrix at row i, column j * n + k
                # to the value of M at row i, column j multiplied by M at row i, column k
                new_M[i, j * M.shape[1] + k] = M[i, j] * M[i, k]

    return new_M

Let's create a 1000 x 10 matrix (the shape OP mentions)

M = np.random.rand(1000, 10)

And now let's test the function polynomial_features created above

print(polynomial_features(M))

[Out]: 
[[0.08786517 0.16912289 0.29598568 ... 0.002051   0.03651199 0.0193943 ]
 [0.00225755 0.03328205 0.04603064 ... 0.16071782 0.02018234 0.12334937]
 [0.79684389 0.69021532 0.19384513 ... 0.16546981 0.2222551  0.1148623 ]
 ...
 [0.48763216 0.28216774 0.55510565 ... 0.04953669 0.53549051 0.29981426]
 [0.08601165 0.16686318 0.15856746 ... 0.02940633 0.00697805 0.01778009]
 [0.57611244 0.21359366 0.4506878  ... 0.54699887 0.05539072 0.35611483]]
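Since the OP was worried about efficiency, here is a minimal sketch of the same computation as the triple loop above, done with numpy broadcasting instead of Python loops. The name pairwise_products is hypothetical (not from the original answer), and the sketch assumes M is a 2-D numeric array. It should produce the same 100 columns in the same order.

```python
import numpy as np

def pairwise_products(M):
    # Hypothetical helper: vectorized equivalent of the triple loop.
    M = np.asarray(M, dtype=float)
    n_rows, n_cols = M.shape

    # (rows, cols, 1) * (rows, 1, cols) broadcasts to (rows, cols, cols),
    # where entry [i, j, k] equals M[i, j] * M[i, k]
    products = M[:, :, None] * M[:, None, :]

    # Flatten the last two axes so that column j * n_cols + k
    # holds M[:, j] * M[:, k], matching the loop's column layout
    return products.reshape(n_rows, n_cols * n_cols)

M = np.random.rand(1000, 10)
out = pairwise_products(M)
print(out.shape)  # (1000, 100)
```

The intermediate array has shape (rows, cols, cols), so memory use grows with the square of the number of columns; for a 1000 x 10 matrix that is not a concern.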

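Alternatively, @rickhg12hs's suggestion (a one-liner) also works. It keeps only one product per unordered pair of columns, so for 10 columns it returns 55 columns instead of 100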

new_M = np.apply_along_axis(lambda x: np.array([x[k1]*x[k2] for k2 in range(len(x)) for k1 in range(len(x)) if k1>=k2]), axis=1, arr=M)
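Neither version above includes the column of 1's or the original columns, which PolynomialFeatures(2) returns by default in addition to the degree-2 terms. The following is a minimal sketch that adds them, assuming the goal is to reproduce the default layout (bias, then the original columns, then the products x_j * x_k for j <= k). The name poly_features_degree2 is hypothetical, and the result has not been compared against sklearn here.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features_degree2(M):
    # Hypothetical helper: degree-2 expansion without sklearn.
    M = np.asarray(M, dtype=float)
    n_rows, n_cols = M.shape

    # Column of 1's (the degree-0 term)
    bias = np.ones((n_rows, 1))

    # Index pairs (j, k) with j <= k, in lexicographic order:
    # (0, 0), (0, 1), ..., (0, n-1), (1, 1), (1, 2), ...
    pairs = list(combinations_with_replacement(range(n_cols), 2))
    j_idx = [j for j, _ in pairs]
    k_idx = [k for _, k in pairs]

    # Fancy indexing selects both factors for every pair at once
    quadratic = M[:, j_idx] * M[:, k_idx]

    # Layout: [1, x_0 ... x_{n-1}, x_0*x_0, x_0*x_1, ..., x_{n-1}*x_{n-1}]
    return np.hstack([bias, M, quadratic])

X = np.array([[2.0, 3.0]])
print(poly_features_degree2(X))  # [[1. 2. 3. 4. 6. 9.]]
```

For a 1000 x 10 input this gives 1 + 10 + 55 = 66 columns.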

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1