How to mean center and z-score data using R?

I was reading a paper in which the authors calculated a gene signature score (the signature is composed of many genes) for each sample using the following steps: 1) mean center, 2) average over genes, 3) z-score. The description is a little confusing to me, and I would like a concrete example.

Here is a dataframe. How can I calculate the signature score for these two samples (a score for each column) separately based on the method described above?

        Sample1   Sample2
Gene1 0.3019117 0.3649211
Gene2 0.2861431 0.3072168
Gene3 0.3794475 0.6505417
Gene4 0.2794465 0.3906110
Gene5 0.3334156 0.5845917
Gene6 0.3513268 0.6560779

data

df <- structure(list(
  Sample1 = c(0.301911734515308, 0.286143128965312, 0.379447523688471,
              0.279446490938859, 0.333415615105398, 0.351326812590339),
  Sample2 = c(0.36492108146509, 0.307216787356549, 0.650541715557005,
              0.390610992781682, 0.584591653411763, 0.656077880562312)),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", "Gene6"),
  class = "data.frame")


Solution 1:[1]

The description is not entirely clear. I assume:

  1. "mean center" is just the mean of each sample
  2. "average over genes" is the row-wise mean of each gene
  3. "z-score" is the z-score of the values within each sample

If that is right, then the code below does it:

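library(dplyr)

# assumes df is the data frame from the question
# per-sample (column) mean, and z-score of each value within its sample
df <- df %>% 
  mutate(across(everything(),
                list(mean = mean,
                     # scale() returns a one-column matrix; flatten it to a vector
                     z_score = ~ as.numeric(scale(.x))),
                .names = "{.fn}_{.col}"))
# gene-level (row-wise) mean across the two original sample columns
df <- df %>% 
  rowwise() %>% 
  mutate(gene_mean = mean(c_across(1:2))) %>% 
  ungroup()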
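
Another possible reading of the three steps, offered only as an assumption since the paper's exact method is not given here: 1) center each gene by subtracting its mean across samples, 2) average the centered values over the signature genes to get one raw score per sample, 3) z-score those raw scores across samples. A minimal base R sketch of that reading, using the rounded values from the question (the names expr, centered, raw_score and signature_score are only illustrative):

```r
# Example data from the question (rounded values), genes in rows, samples in columns
expr <- data.frame(
  Sample1 = c(0.3019117, 0.2861431, 0.3794475, 0.2794465, 0.3334156, 0.3513268),
  Sample2 = c(0.3649211, 0.3072168, 0.6505417, 0.3906110, 0.5845917, 0.6560779),
  row.names = paste0("Gene", 1:6)
)
m <- as.matrix(expr)

# 1) Mean center (assumed: per gene, across samples): subtract each row's mean
centered <- sweep(m, 1, rowMeans(m), "-")

# 2) Average over genes: one raw score per sample
raw_score <- colMeans(centered)

# 3) Z-score the raw scores across samples
signature_score <- (raw_score - mean(raw_score)) / sd(raw_score)
signature_score
```

With only two samples this reading is not very informative: the two z-scores are always equal in magnitude and opposite in sign, so it only becomes meaningful on a larger cohort.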

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 alejandro_hagan