'Make a column with values borrowed from another column and negative sign from another
I would like to make a copy of a column in a new column but replace all instances of '0' with '1E+308' but borrow the '-' or '+' sign from another column. My data frame looks as follows
Gene log2FoldChange pvalue
MAN_08191 2.237973215 6.78E-307
MAN_10213 -4.701066739 7.95E-306
MAN_02441 -5.084348578 0
MAN_02858 -4.200656711 5.33E-304
MAN_03363 -2.580385454 0
MAN_01699 3.003051841 4.72E-303
MAN_02586 5.200458256 0
I would like to get the following new column "Rank"
Gene log2FoldChange pvalue Rank
MAN_08191 2.237973215 6.78E-307 6.78E-307
MAN_10213 -4.701066739 7.95E-306 -7.95E-306
MAN_02441 -5.084348578 0 -1E+308
MAN_02858 -4.200656711 5.33E-304 -5.33E-304
MAN_03363 -2.580385454 0 -1E+308
MAN_01699 3.003051841 4.72E-303 4.72E-303
MAN_02586 5.200458256 0 1E+308
Solution 1:[1]
You can use sign()
library(dplyr)
dt %>%
mutate(rank = sign(log2FoldChange)*(if_else(pvalue==0,1E+308,pvalue)))
Output:
Gene log2FoldChange pvalue rank
1 MAN_08191 2.237973 6.78e-307 6.78e-307
2 MAN_10213 -4.701067 7.95e-306 -7.95e-306
3 MAN_02441 -5.084349 0.00e+00 -1.00e+308
4 MAN_02858 -4.200657 5.33e-304 -5.33e-304
5 MAN_03363 -2.580385 0.00e+00 -1.00e+308
6 MAN_01699 3.003052 4.72e-303 4.72e-303
7 MAN_02586 5.200458 0.00e+00 1.00e+308
Solution 2:[2]
If awk is an option
$ awk 'NR == 1 { $( NF+1 ) = "Rank" } NR > 1 {split($2,a,"[0-9]"); ( $NF==0 ? $(NF+1)=a[1]"1E+308" : $( NF+1 ) = $NF) }1' input_file
Gene log2FoldChange pvalue Rank
MAN_08191 2.237973215 6.78E-307 6.78E-307
MAN_10213 -4.701066739 7.95E-306 7.95E-306
MAN_02441 -5.084348578 0 -1E+308
MAN_02858 -4.200656711 5.33E-304 5.33E-304
MAN_03363 -2.580385454 0 -1E+308
MAN_01699 3.003051841 4.72E-303 4.72E-303
MAN_02586 5.200458256 0 1E+308
Solution 3:[3]
awk '
NR==1 {$(NF+1) = "rank"}
NR!=1 {
$(NF+1)=($2 ~ /-/ ? "-" : "")($NF == 0 ? "1E+308" : $NF)
}1' input_file|column -s ' ' -t
Gene log2FoldChange pvalue rank
MAN_08191 2.237973215 6.78E-307 6.78E-307
MAN_10213 -4.701066739 7.95E-306 -7.95E-306
MAN_02441 -5.084348578 0 -1E+308
MAN_02858 -4.200656711 5.33E-304 -5.33E-304
MAN_03363 -2.580385454 0 -1E+308
MAN_01699 3.003051841 4.72E-303 4.72E-303
MAN_02586 5.200458256 0 1E+308
Solution 4:[4]
INPUT
Gene log2FoldChange pvalue
MAN_08191 2.237973215 6.78E-307
MAN_10213 -4.701066739 7.95E-306
MAN_02441 -5.084348578 0
MAN_02858 -4.200656711 5.33E-304
MAN_03363 -2.580385454 0
MAN_01699 3.003051841 4.72E-303
MAN_02586 5.200458256 0
CODE
{m,g}awk 'sub("$","\t"( !_==NR ? "Rank" \
: !+$+NF ? "1E+308" \
: /^[^ \t]+[ \t]+[-]/ ? "-" $NF : $NF))'
OUTPUT
Gene log2FoldChange pvalue Rank
MAN_08191 2.237973215 6.78E-307 6.78E-307
MAN_10213 -4.701066739 7.95E-306 -7.95E-306
MAN_02441 -5.084348578 0 1E+308
MAN_02858 -4.200656711 5.33E-304 -5.33E-304
MAN_03363 -2.580385454 0 1E+308
MAN_01699 3.003051841 4.72E-303 4.72E-303
MAN_02586 5.200458256 0 1E+308
if you want it nicely formatted, then it's
CODE
gawk -e '$(_<_)=sprintf("%-15s %+20s %15s %15s", $(_^=_<_), $++_, $++_,
( (+$_) * (($--_)~"^[-]") ? "-" : "") \
(--_==NR ? "Rank" : !+$NF ? "1E+308" : $NF ) )'
OUTPUT
Gene log2FoldChange pvalue Rank
MAN_08191 2.237973215 6.78E-307 6.78E-307
MAN_10213 -4.701066739 7.95E-306 -7.95E-306
MAN_02441 -5.084348578 0 1E+308
MAN_02858 -4.200656711 5.33E-304 -5.33E-304
MAN_03363 -2.580385454 0 1E+308
MAN_01699 3.003051841 4.72E-303 4.72E-303
MAN_02586 5.200458256 0 1E+308
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | HatLess |
| Solution 3 | |
| Solution 4 | RARE Kpop Manifesto |
