Heatmap creation using ggplot for large genomic dataset
Dear StackOverflow community,
I have a very large dataset; an extract of it looks like this:
AC010327.1 AC010368.1 AC010525.2
TGYR 0 0 0.984
BHT 0.1 0 0
THY_RHE 0 0.0002 0
FJU_WJNKO 0 0 0
PAED_DISE 0.342 0 0
DID PID 0 0.3821 0
Each column is a gene, and there are about 30,000 columns. There are 9 rows in total, each a code for a disease type. The values are the outcome of a statistical test, between 0 and 1, run for that disease against that gene.
I would like to present this mass of data in an easy-to-view form, and I thought a heatmap would be most suitable.
Using:
x <- as.matrix(data)        # heatmap() expects a numeric matrix, not a data frame
heatmap(x, scale = 'none')  # scale is a named argument, so '=' rather than '-'
Gets me a pretty ugly block of data.
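For reference, base R's heatmap() can give a cleaner picture with a few arguments. A minimal sketch, assuming the values sit in a numeric matrix with diseases as rows and genes as columns; the 9 x 200 random matrix below is hypothetical toy data standing in for the real 9 x 30,000 table. Skipping column clustering (Colv = NA) avoids computing distances between all 30,000 genes, and blank column labels avoid thousands of overlapping gene names.

```r
# Hypothetical toy matrix standing in for the real data:
# rows are disease codes, columns are genes, values lie in [0, 1].
set.seed(1)
m <- matrix(runif(9 * 200), nrow = 9,
            dimnames = list(paste0("disease_", 1:9), paste0("gene_", 1:200)))

# heatmap() invisibly returns the row/column orderings it used.
hm <- heatmap(m,
              scale  = "none",             # values are already on a 0-1 scale
              Colv   = NA,                 # no column dendrogram or reordering
              labCol = rep("", ncol(m)),   # hide gene labels, which would overlap
              col    = hcl.colors(50))     # viridis-like palette (R >= 3.6.0)
```

Rows are still clustered here because there are only 9 of them; pass Rowv = NA as well to keep the diseases in their original order.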
I have been trying ggplot2 with geom_tile but keep getting error messages. I am slightly unsure what the aes() mapping should be, as I haven't given names to my rows or columns.
I can provide more information if needed, but I would be grateful for some guidance.
Many thanks
Update 13/2/18
Using the solution below, is there a way of weighting the plot in favour of results greater than 0?
Solution 1:[1]
We can convert the data frame from wide format to long format and then use geom_tile().
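library(tidyverse)
# Move the row names (disease codes) into a column, then reshape to one row per disease-gene pair
dat2 <- dat %>%
  rownames_to_column(var = "Disease") %>%
  gather(Gene, Value, -Disease)
# One tile per disease-gene pair, coloured by the test value
ggplot(dat2, aes(x = Gene, y = Disease, fill = Value)) +
  geom_tile() +
  scale_fill_viridis_c()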
DATA
dat <- read.table(text = " 'AC010327.1' 'AC010368.1' 'AC010525.2'
TGYR 0 0 0.984
BHT 0.1 0 0
THY_RHE 0 0.0002 0
FJU_WJNKO 0 0 0
PAED_DISE 0.342 0 0
'DID PID' 0 0.3821 0",
header = TRUE, stringsAsFactors = FALSE)
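With 30,000 genes the x-axis labels in this plot overlap completely, and most tiles are 0. A minimal sketch, assuming "weighting in preference to results greater than 0" means making the non-zero results stand out: keep only genes with at least one value above 0, turn exact zeros into NA so they are drawn in grey, and hide the gene labels. The filtering rule, the grey colour, and the extra ALL_ZERO_GENE column are choices of this sketch (the column is hypothetical, added only to show the filter working), not part of the original answer. It also swaps gather() for pivot_longer(), its newer tidyr replacement.

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)

# The question's extract, plus one hypothetical gene that is 0 for every disease.
dat <- data.frame(
  AC010327.1    = c(0, 0.1, 0, 0, 0.342, 0),
  AC010368.1    = c(0, 0, 0.0002, 0, 0, 0.3821),
  AC010525.2    = c(0.984, 0, 0, 0, 0, 0),
  ALL_ZERO_GENE = 0,   # hypothetical: no result above 0
  row.names = c("TGYR", "BHT", "THY_RHE", "FJU_WJNKO", "PAED_DISE", "DID PID")
)

dat_long <- dat %>%
  rownames_to_column(var = "Disease") %>%
  pivot_longer(-Disease, names_to = "Gene", values_to = "Value") %>%
  group_by(Gene) %>%
  filter(any(Value > 0)) %>%        # drop genes that are 0 for every disease
  ungroup() %>%
  mutate(Value = na_if(Value, 0))   # exact zeros become NA ...

p <- ggplot(dat_long, aes(x = Gene, y = Disease, fill = Value)) +
  geom_tile() +
  scale_fill_viridis_c(na.value = "grey90") +  # ... and are drawn in light grey
  theme(axis.text.x  = element_blank(),        # gene labels would overlap
        axis.ticks.x = element_blank())
print(p)
```

Because NA values are left out when the colour scale is trained, the whole viridis gradient is spent on the non-zero results instead of being dominated by zeros.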
Solution 2:[2]
When you are comparing a value across two categorical variables, as in your case, geom_tile is usually the better choice for a small to medium-sized dataset.
But when your dataset is so large that it can't be read in a geom_tile plot, it's better to use d3heatmap, which produces an interactive heatmap.
Here is an example you can try yourself, using a dataset structured like yours (row names plus numeric columns).
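library(d3heatmap)
url <- "http://datasets.flowingdata.com/ppg2008.csv"
# row.names = 1 uses the first column (player names) as row labels
nba_players <- read.csv(url, row.names = 1)
# scale = "column" centres and scales each column so statistics on different scales are comparable
d3heatmap(nba_players, scale = "column")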
The result opens in a web browser and can be explored interactively. An example result can be seen on this site: Output
Check this site for more information.
Notes
The dataset should be numeric: d3heatmap won't accept negative values or character data.
To avoid this problem, you can convert each row or column to percentage shares.
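A minimal base-R sketch of the percentage-share idea, using prop.table() on a made-up non-negative matrix (the names d1..d3 and g1..g3 are hypothetical). A row that is entirely 0, like FJU_WJNKO in the question's extract, has no total to divide by and comes out as NaN, so this sketch sets it to 0 %.

```r
# Made-up non-negative matrix: rows are diseases, columns are genes.
m <- matrix(c(0,   0.1, 0,
              0.5, 0,   0.5,
              0,   0,   0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("g1", "g2", "g3")))

# Percentage share of each value within its row (margin = 1);
# use margin = 2 for shares within each column instead.
row_share <- prop.table(m, margin = 1) * 100

# An all-zero row divides 0 by 0 and gives NaN; treat it as 0 %.
row_share[is.nan(row_share)] <- 0

row_share
```

Each non-zero row now sums to 100, so the resulting matrix can be passed to any of the heatmap functions above.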
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | www |
| Solution 2 | marc_s |

