The meaning of the independent variables in a prcomp formula
I'm reading "Applied Predictive Modeling" by Kuhn and Johnson. The code for chapter 3.3, "Data Transformations for Multiple Predictors", contains this snippet:
pr <- prcomp(~ AvgIntenCh1 + EntropyIntenCh1,
             data = segTrainTrans,
             scale. = TRUE)
Full code example here.
In the documentation for prcomp, I couldn't find much on how the first argument (the formula ~ AvgIntenCh1 + EntropyIntenCh1) is interpreted. It just says:
formula: a formula with no response variable, referring only to
numeric variables.
How does the prcomp call use that formula, and what does it mean?
Solution 1:[1]
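I think it's just an alternative way of specifying which variables to run the PCA on. The formula has no left-hand side because PCA has no response variable; the terms on the right-hand side simply name the columns of data to include. It seems to be equivalent to passing those columns as x instead of a formula: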
prcomp(iris[,-5])
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#>
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
prcomp(~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#>
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
Created on 2022-04-05 by the reprex package (v2.0.1)
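The equivalence can also be checked programmatically rather than by comparing printed output. The sketch below is an illustration on the built-in iris data, not the book's code: the object names (pca_formula, pca_matrix, x_manual, pca_manual) are made up for this example. The second half reproduces by hand roughly what the formula method appears to do, namely build a model frame from data and turn it into a numeric matrix without an intercept column before running the usual PCA.

```r
# Names below (pca_formula, pca_matrix, x_manual, pca_manual) are
# hypothetical and exist only for this illustration.
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Same PCA specified two ways: by formula and by passing the columns as x.
pca_formula <- prcomp(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                      data = iris, scale. = TRUE)
pca_matrix  <- prcomp(iris[, vars], scale. = TRUE)

# Compare standard deviations and loadings of the two fits.
all.equal(pca_formula$sdev, pca_matrix$sdev)
all.equal(pca_formula$rotation, pca_matrix$rotation)

# Rough manual equivalent of the formula route for two variables:
# model frame -> drop the intercept -> numeric design matrix -> PCA.
mf <- model.frame(~ Sepal.Length + Petal.Length, data = iris)
mt <- attr(mf, "terms")
attr(mt, "intercept") <- 0L          # PCA input should not contain an intercept column
x_manual <- model.matrix(mt, mf)     # 150 x 2 numeric matrix
pca_manual <- prcomp(x_manual, scale. = TRUE)

pca_two <- prcomp(~ Sepal.Length + Petal.Length, data = iris, scale. = TRUE)
all.equal(pca_manual$sdev, pca_two$sdev)
```

One practical reason to prefer the formula form is that it also accepts the subset and na.action arguments listed in ?prcomp, so rows can be filtered or missing values handled in the same call, for example prcomp(~ Sepal.Length + Petal.Length, data = iris, subset = Species == "setosa").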
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dan Adams |
