'How to simulate PCA Data?

I am trying to simulate PCA Data as follows:

q <-   5        # no. of PCs
p <-   20       # no. of variables 
n <-   2000     # no. of individuals 
eps <- 0.05     # error standard deviation 

# Eigenvalues  
Sig <- seq(3, 1, length.out = q)^2  
Lambda <- diag(Sig)

# Matrix of Principal Components 
H <- rmvnorm(n = n, mean = rep(0, q), sigma = Lambda)  

# Add gaussian noise 
E <- matrix(rnorm(n*p, sd = sqrt(eps)), ncol = p) 

# Data matrix 
Y <- H  %*% t(Amat) + E 

# Perform PCA
summary(m1 <- prcomp(Y, scale = T)) # and so on...

However, I have no idea how to create the matrix of Loadings Amat in a meaningful way.

Thanks for any help I receive from you and I appreciate it!



Solution 1:[1]

This is not using the same structure as the OP, but it simulates a PCA with 4 different groups (which could be species) which each have 3 "traits" (each of the trait have different means and sd based on some biological data found in the literature for example).

set.seed(123) # setting this so the random results will be repeatable 
library(MASS)
# Simulating 3 traits for 4 different species 
n = 200 # number of "individuals"

# Generate the groups 
Amat1 = MASS::mvrnorm(n, mu = c(11.2,11.8,9.91), Sigma = diag(c(1.31,1.01,1.02)))
Amat2 = MASS::mvrnorm(n, mu = c(7.16,8.54,6.82), Sigma = diag(c(0.445,0.546,0.350)))
Amat3 = MASS::mvrnorm(n, mu = c(15.6,14.6,13.5), Sigma = diag(c(1.43,0.885,0.990)))
Amat4 = MASS::mvrnorm(n, mu = c(8.65,14.1,8.24), Sigma = diag(c(0.535,0.844,0.426)))

# Combine the data 
Amat = rbind(Amat1,Amat2,Amat3,Amat4)

# Make group data 
Amat.gr = cbind(Amat, gl(4,k=n,labels = c(1,2,3,4)))

# Calculate the covariance matrix for each group 
by(Amat.gr[,1:3],INDICES = Amat.gr[,4],FUN = cov) # calculate covariance matrix for all groups 

# Plot the result 
summary(m1 <- prcomp(Amat, scale= T))
# biplot(m1, xlabs=rep(".", nrow(Amat)), cex = 2)
plot(vegan::scores(m1), asp = 1, pch = 19, col = gl(4,k=n,labels = c(1,2,3,4)))
plot(Amat[,1],Amat[,2], pch = 19, col = gl(4,k=n,labels = c(1,2,3,4)))

The plot on the left shows the PCA and on the right the raw data.

enter image description here

I added a toy example with data to show what is the algorithm to compute a PCA in R from Legendre and Legendre 2012.

# Generate vectors (example from Legendre and Legendre 2012)
v1 = c(2,3,5,7,9) 
v2 = c(1,4,0,6,2)
# If you want to play with sample size
# n = 100
# v1 = rnorm(n = n, mean = mean(v1), sd = sd(v1))
# v2 = rnorm(n = n, mean = mean(v2), sd = sd(v2))

# Get the y matrix
y = cbind(v1,v2)

# Centered y matrix
yc = apply(y, 2, FUN = function(x) x-mean(x))

# Dispersion matrix
s = 1/(nrow(y)-1)*t(yc) %*% yc

# Compute the single value decomposition to get the eigenvectors and
ev = svd(s)$v

# get the principal components
f = yc %*% ev

# This gives the identity matrix 
round(t(svd(s)$v) %*% svd(s)$v,2)

# these are the eigen values
svd(s)$d
-svd(yc)$v #p. 104

plot(f, pch = 19); abline(h=0,v=0, lty = 3)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1