R: how to calculate the fraction of empty rows in all columns of a large matrix?
I'm working with single-nucleus RNA sequencing data, and I made a matrix of a subset of genes across all features that shows the counts per gene for each feature. I want to calculate the proportion of genes that are expressed in each feature, but I can't get my code to return the right result. This is separate from the number of counts per feature, which I already calculated with colSums.
Genes that aren't expressed have a "." value, so I want to count how many of those there are in each column, express that as a fraction of the total number of genes, and use (1 - fraction) to get the proportion of genes expressed in each feature.
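A minimal base R sketch of the (1 - fraction) computation described above. It uses a small hypothetical character matrix `m` (made-up gene and feature names) in which "." marks a gene with no counts, and compares the result with the direct `colMeans(m != ".")` route:

```r
# Hypothetical toy matrix: 4 genes (rows) x 3 features (columns);
# "." marks a gene with no counts in that feature. Filled column by column.
m <- matrix(c(".", "2", ".", "1",
              ".", ".", ".", ".",
              "5", "3", "1", "."),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("feat", 1:3)))

# Fraction of genes that are "." in each column
frac_empty <- colSums(m == ".") / nrow(m)

# Proportion of genes expressed in each column
frac_expressed <- 1 - frac_empty
print(frac_expressed)
# feat1 0.50, feat2 0.00, feat3 0.75

# Same values computed directly as the mean of the non-"." indicator
print(all.equal(frac_expressed, colMeans(m != ".")))
# TRUE
```

Both routes count the same thing; averaging a logical matrix column-wise simply skips the separate subtraction step.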
What should I use for this counting?
Code that I have tried that doesn't quite work (where counts.fc2 is my matrix):
```r
marker <- c('.')
Matrix::colSums(counts.fc2@assays$RNA@counts[markers, ])
```

And

```r
na.counts <- counts.fc2[grep(".", counts.fc2), ]
```
Any advice would be appreciated!
Edit: as requested, this is an example of what the matrix looks like (column headers are feature IDs, e.g. CATACTTAGAGTACCG-1):
```
ppn       .          0.8982865  .  .  .
ocn       .          .          .  .  .
CheB53a   3.2424953  .          .  .  .
CG5762    .          .          .  .  0.8982865
srp       .          2.698674   .  .  .  .
fraction  0.2        0.4        0  0  0.2
```
Solution 1:[1]
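```r
set.seed(1)
# 20 x 15 character matrix of random letters and "." entries
m <- replicate(15, sample(c(letters, "."), 20, replace = TRUE))
# proportion of entries in each column that are not "."
colMeans(m != ".")
#> [1] 0.95 1.00 0.95 1.00 0.95 0.95 1.00 1.00 1.00 0.90 1.00 1.00 0.95 0.95 0.95
```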
You can test which elements of your matrix are not `.` using the comparison operator `!=`. `m != "."` returns a logical matrix that is `FALSE` for the elements that are `.` and `TRUE` otherwise. `colMeans` then returns the proportion of each column that is `TRUE`, because `TRUE` and `FALSE` count as 1 and 0 when averaged.
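If `counts.fc2@assays$RNA@counts` is a sparse `dgCMatrix` from the Matrix package (an assumption, though it is the usual storage for Seurat counts and matches the printout in the question), the "." is how Matrix prints a zero entry, not a stored character value. In that case, comparing against `0` instead of `"."` gives the proportion of expressed genes. This sketch rebuilds the example from the question as a sparse matrix; the column names `feature1` to `feature5` are hypothetical, and the `srp` row, which shows an extra `.` in the post, is assumed to have five columns like the others.

```r
library(Matrix)

# Example from the question; 0 stands in for the printed ".".
# Assumption: the "srp" row has five columns like the other rows.
dense <- rbind(
  ppn     = c(0,         0.8982865, 0, 0, 0),
  ocn     = c(0,         0,         0, 0, 0),
  CheB53a = c(3.2424953, 0,         0, 0, 0),
  CG5762  = c(0,         0,         0, 0, 0.8982865),
  srp     = c(0,         2.698674,  0, 0, 0)
)
colnames(dense) <- paste0("feature", 1:5)  # hypothetical feature names

# Sparse dgCMatrix; print() shows the zero entries as "."
counts <- Matrix(dense, sparse = TRUE)
print(counts)

# counts != 0 is a sparse logical matrix; its column sums are the number of
# expressed genes per feature, divided by the number of genes in the subset
frac_expressed <- Matrix::colSums(counts != 0) / nrow(counts)
print(frac_expressed)
# 0.2 0.4 0.0 0.0 0.2 (one value per feature)
```

On the real object this would be something like `sub <- counts.fc2@assays$RNA@counts[markers, ]` followed by `Matrix::colSums(sub != 0) / nrow(sub)`, where `markers` is assumed to be a character vector of the gene names of interest. The comparison `!= 0` keeps the result sparse, so a large count matrix is not converted to a dense one.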
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | LMc |