Plotting Levenshtein distance scores in R
I'm trying to plot the Levenshtein distance scores between two lists of sequences (amino acid sequences) using something other than a heatmap. This is the code I used to generate a heatmap as an example:
library(utils)
library(pheatmap)
# Pairwise Levenshtein distances: rows are LV sequences, columns are CD4 sequences
dist_scores <- adist(LV,  # first list of sequences
                     CD4, # second list of sequences
                     counts = TRUE)
colors <- c("tomato", "khaki1", "darkseagreen2", "mediumseagreen", "gray30")
breaks <- c(0, 1, 2, 3, 4, 5)
pheatmap(dist_scores, breaks = breaks, color = colors, cluster_rows = TRUE, cluster_cols = TRUE)
and here is the heatmap from the example:
https://i.stack.imgur.com/4ay55.png
I want a more intuitive way of showing the data. I'm thinking of plotting the data as nodes (representing the different sequences) and edges (representing the distances, where the length of an edge increases as the score increases), and also color-coding the nodes by whether they come from "LV" or "CD4". Is there a way to do this in R?
My coding skills are subpar at best so I would be really grateful for any help. Thanks :)
Solution 1:[1]
IIRC, from my experience doing bioinformatics decades ago, there are already good graphical representations for showing similarity between DNA sequences. One option I recall is the sequence dot plot, and R has at least two packages for that: seqinr (https://cran.r-project.org/web/packages/seqinr/index.html) and dotplot (https://github.com/evolvedmicrobe/dotplot). Another option, a web tool rather than an R package, is YASS (https://bioinfo.cristal.univ-lille.fr/yass/index.php).
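To make the dot-plot idea concrete, here is a minimal base-R sketch of how such a plot is built. It is not the seqinr or dotplot implementation: the two peptide strings and the helper name dot_matrix are made up for illustration, and the packages above add options such as sliding windows that this sketch leaves out.

```r
# Hypothetical example peptides (not taken from the question's data)
seq_a <- "CASSLGQGNTEAFF"
seq_b <- "CASSLGTGNEQFF"

# Hypothetical helper: logical matrix that is TRUE where
# residue i of `a` equals residue j of `b`
dot_matrix <- function(a, b) {
  a_chars <- strsplit(a, "", fixed = TRUE)[[1]]
  b_chars <- strsplit(b, "", fixed = TRUE)[[1]]
  m <- outer(a_chars, b_chars, "==")
  dimnames(m) <- list(a_chars, b_chars)
  m
}

m <- dot_matrix(seq_a, seq_b)

# Draw the matrix; shared stretches appear as diagonal runs of dark cells
image(x = seq_len(nrow(m)), y = seq_len(ncol(m)), z = m * 1,
      col = c("white", "black"),
      xlab = "seq_a position", ylab = "seq_b position")
```

A dot plot compares one pair of sequences at a time, so for two whole lists it would have to be repeated for each pair of interest.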
For some alternative metrics and representations, see: https://ieeexplore.ieee.org/document/9097943 ("Levenshtein Distance, Sequence Comparison and Biological Database Search"), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4880953/ ("Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images"), and https://pdf.sciencedirectassets.com/271876/1-s2.0-S0888613X07X01403/1-s2.0-S0888613X07000382/main.pdf ("Distance measures for biological sequences: Some recent approaches")
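For the layout described in the question (points that sit farther apart as the score grows, colored by "LV" or "CD4"), one possibility in base R is classical multidimensional scaling with cmdscale(). The following is only a sketch under stated assumptions: the sequences are invented stand-ins for the real LV and CD4 lists, and the result is a scatter of points rather than a true node-and-edge graph, so a graph package would still be needed to draw explicit edges.

```r
# Hypothetical stand-ins for the question's LV and CD4 sequence lists
LV  <- c("CASSLGQGNTEAFF", "CASSLGTGNEQFF", "CASSPGQGNTEAFF")
CD4 <- c("CASSLGQGNTEVFF", "CASRLGTGNEQFF")

all_seqs <- c(LV, CD4)
group <- rep(c("LV", "CD4"), times = c(length(LV), length(CD4)))

# Square, symmetric matrix of Levenshtein distances among all sequences
d <- adist(all_seqs)

# Classical MDS: 2-D coordinates whose spacing approximates the edit distances
xy <- cmdscale(as.dist(d), k = 2)

# One point per sequence, colored by the list it came from
plot(xy, col = ifelse(group == "LV", "tomato", "mediumseagreen"), pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
text(xy, labels = seq_along(all_seqs), pos = 3)
legend("topright", legend = c("LV", "CD4"),
       col = c("tomato", "mediumseagreen"), pch = 19)
```

The 2-D distances only approximate the Levenshtein scores, so the plot is better read for overall grouping than for exact values.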
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jmcastagnetto |
