'How to simulate PCA Data?
I am trying to simulate PCA Data as follows:
q <- 5 # no. of PCs
p <- 20 # no. of variables
n <- 2000 # no. of individuals
eps <- 0.05 # error standard deviation
# Eigenvalues
Sig <- seq(3, 1, length.out = q)^2
Lambda <- diag(Sig)
# Matrix of Principal Components
H <- rmvnorm(n = n, mean = rep(0, q), sigma = Lambda)
# Add gaussian noise
E <- matrix(rnorm(n*p, sd = sqrt(eps)), ncol = p)
# Data matrix
Y <- H %*% t(Amat) + E
# Perform PCA
summary(m1 <- prcomp(Y, scale = T)) # and so on...
However, I have no idea how to create the matrix of Loadings Amat in a meaningful way.
Thanks for any help I receive from you and I appreciate it!
Solution 1:[1]
This is not using the same structure as the OP, but it simulates a PCA with 4 different groups (which could be species) which each have 3 "traits" (each of the trait have different means and sd based on some biological data found in the literature for example).
set.seed(123) # setting this so the random results will be repeatable
library(MASS)
# Simulating 3 traits for 4 different species
n = 200 # number of "individuals"
# Generate the groups
Amat1 = MASS::mvrnorm(n, mu = c(11.2,11.8,9.91), Sigma = diag(c(1.31,1.01,1.02)))
Amat2 = MASS::mvrnorm(n, mu = c(7.16,8.54,6.82), Sigma = diag(c(0.445,0.546,0.350)))
Amat3 = MASS::mvrnorm(n, mu = c(15.6,14.6,13.5), Sigma = diag(c(1.43,0.885,0.990)))
Amat4 = MASS::mvrnorm(n, mu = c(8.65,14.1,8.24), Sigma = diag(c(0.535,0.844,0.426)))
# Combine the data
Amat = rbind(Amat1,Amat2,Amat3,Amat4)
# Make group data
Amat.gr = cbind(Amat, gl(4,k=n,labels = c(1,2,3,4)))
# Calculate the covariance matrix for each group
by(Amat.gr[,1:3],INDICES = Amat.gr[,4],FUN = cov) # calculate covariance matrix for all groups
# Plot the result
summary(m1 <- prcomp(Amat, scale= T))
# biplot(m1, xlabs=rep(".", nrow(Amat)), cex = 2)
plot(vegan::scores(m1), asp = 1, pch = 19, col = gl(4,k=n,labels = c(1,2,3,4)))
plot(Amat[,1],Amat[,2], pch = 19, col = gl(4,k=n,labels = c(1,2,3,4)))
The plot on the left shows the PCA and on the right the raw data.
I added a toy example with data to show what is the algorithm to compute a PCA in R from Legendre and Legendre 2012.
# Generate vectors (example from Legendre and Legendre 2012)
v1 = c(2,3,5,7,9)
v2 = c(1,4,0,6,2)
# If you want to play with sample size
# n = 100
# v1 = rnorm(n = n, mean = mean(v1), sd = sd(v1))
# v2 = rnorm(n = n, mean = mean(v2), sd = sd(v2))
# Get the y matrix
y = cbind(v1,v2)
# Centered y matrix
yc = apply(y, 2, FUN = function(x) x-mean(x))
# Dispersion matrix
s = 1/(nrow(y)-1)*t(yc) %*% yc
# Compute the single value decomposition to get the eigenvectors and
ev = svd(s)$v
# get the principal components
f = yc %*% ev
# This gives the identity matrix
round(t(svd(s)$v) %*% svd(s)$v,2)
# these are the eigen values
svd(s)$d
-svd(yc)$v #p. 104
plot(f, pch = 19); abline(h=0,v=0, lty = 3)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

