Get PCA matrix with feature names in PySpark

Thanks for reading.

I need some help here on something which should be simple but doesn't seem to be.

I'm running PCA in PySpark and I'd like to find out the names of the features it is selecting as part of the dimension reduction.

I can get as far as the dense matrix, but I'm not sure how to turn it into something interpretable. Ideally I'd like a Spark DataFrame with the feature names included.

I'm having the same problem with the feature selectors in Spark. It seems there's been no thought given to interpretability once these techniques are applied.

Who knew you might want to know which features were being selected....

Anyway, thanks very much for your help!
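
For reference, here is a minimal sketch of one way the dense matrix could be turned into a DataFrame with feature names. One thing worth noting first: PCA does not pick a subset of features. Each principal component is a weighted combination of all input features, so what can be labelled is the loadings matrix (`model.pc`), whose rows follow the order of the columns passed to `VectorAssembler`. The helper names `loadings_rows` and `pca_loadings_df`, the column names `feature`, `PC1`, `PC2`, ..., and the intermediate column names `features` and `pca_features` are all made up for illustration. The sketch also assumes every input column is numeric.

```python
def loadings_rows(pc_rows, feature_names):
    """Pair each row of the loadings matrix with its feature name.

    pc_rows is a nested list of shape (n_features, k), for example
    model.pc.toArray().tolist() from a fitted PCAModel.
    """
    if len(pc_rows) != len(feature_names):
        raise ValueError("expected one row of loadings per feature name")
    return [
        (name, *(float(value) for value in row))
        for name, row in zip(feature_names, pc_rows)
    ]


def pca_loadings_df(spark, df, feature_names, k):
    """Fit PCA on the named columns and return the loadings as a Spark DataFrame.

    Hypothetical helper: the output has one row per input feature and one
    column per principal component.
    """
    # Imported here so loadings_rows can be used without Spark installed.
    from pyspark.ml.feature import PCA, VectorAssembler

    # The order of inputCols fixes the row order of the loadings matrix.
    assembler = VectorAssembler(inputCols=feature_names, outputCol="features")
    assembled = assembler.transform(df)

    model = PCA(k=k, inputCol="features", outputCol="pca_features").fit(assembled)

    # model.pc is a DenseMatrix with n_features rows and k columns.
    rows = loadings_rows(model.pc.toArray().tolist(), feature_names)
    columns = ["feature"] + [f"PC{i + 1}" for i in range(k)]
    return spark.createDataFrame(rows, columns)
```

Calling `pca_loadings_df(spark, df, ["a", "b", "c"], k=2).show()` would then list each input column next to its weight in each component. Features with large absolute weights contribute most to that component, and `model.explainedVariance` gives the share of variance each component captures.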



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source