Issue with unique occurrences in R vector
I need help, please: I have a list "reves" of vectors, one of which contains names:
reves$personnes
[1] "rebelle, professeur, gypsie"
[2] ""
[3] "corinne, roxane, pdl "
[4] "fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane"
[5] "didier, bernadette, franck, elizabeth, roxane, autres"
[6] "autres"
[7] "elizabeth, sebastien_houssiere"
[8] "elizabeth, corinne"
[9] "genevieve, barbara, camille, famille"
[10] "gypsie, inconnue"
In the end, I would like to calculate the percentage of occurrences of each name. So first, I split each line on "," and append the names to a new vector:
# Creating vector of characters
new_vec <- c()
for (i in c(1:nrow(reves))){
x <- reves$personnes[i]
y <- strsplit(x, split=",")[[1]]
new_vec <- c(new_vec, y[1:length(y)])
}
It seems to work, since new_vec is chr [1:32]:
> new_vec
[1] "rebelle" " professeur" " gypsie"
[4] NA "corinne" " roxane"
[7] " pdl " "fabrice" " melissa"
[10] " bernadette" " franck" " corinne"
[13] " elizabeth" " tom" " roxane"
[16] "didier" " bernadette" " franck"
[19] " elizabeth" " roxane" " autres"
[22] "autres" "elizabeth" " sebastien_houssiere"
[25] "elizabeth" " corinne" "genevieve"
[28] " barbara" " camille" " famille"
[31] "gypsie" " inconnue"
Using new_vec, I planned to call table(new_vec) to count how often each name appears. However, identical names are not treated as the same value, as unique() shows:
unique(new_vec)
[1] "rebelle" " professeur" " gypsie"
[4] NA "corinne" " roxane"
[7] " pdl " "fabrice" " melissa"
[10] " bernadette" " franck" " corinne"
[13] " elizabeth" " tom" "didier"
[16] " autres" "autres" "elizabeth"
[19] " sebastien_houssiere" "genevieve" " barbara"
[22] " camille" " famille" "gypsie"
[25] " inconnue"
Here we clearly see that, for example, "corinne" is listed twice, once with a count of 2 and once with a count of 1:
> table(new_vec)
new_vec
autres barbara bernadette camille
1 1 2 1
corinne elizabeth famille franck
2 2 1 2
gypsie inconnue melissa pdl
1 1 1 1
professeur roxane sebastien_houssiere tom
1 3 1 1
autres corinne didier elizabeth
1 1 1 2
fabrice genevieve gypsie rebelle
1 1 1 1
Please, how could I get this new_vec with the correct numbers of occurrences so that I can perform my calculations?
Thanks for your help :)
Solution 1:[1]
You do not need a loop, so your code can be simplified as follows. First provide reproducible data:
# Output of dput(personnes), assigned back to personnes so the example runs on its own
personnes <- c("rebelle, professeur, gypsie", "", "corinne, roxane, pdl ",
"fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane",
"didier, bernadette, franck, elizabeth, roxane, autres", "autres",
"elizabeth, sebastien_houssiere", "elizabeth, corinne", "genevieve, barbara, camille, famille",
"gypsie, inconnue")
new_vec <- unlist(strsplit(personnes, ", "))
new_vec <- trimws(new_vec) # Remove space at the end of "pdl "
sort(unique(new_vec))
# [1] "autres" "barbara" "bernadette" "camille" "corinne" "didier" "elizabeth"
# [8] "fabrice" "famille" "franck" "genevieve" "gypsie" "inconnue" "melissa"
# [15] "pdl" "professeur" "rebelle" "roxane" "sebastien_houssiere" "tom"
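To get from the cleaned vector to the counts and percentages asked for in the question, table() and prop.table() can be applied to new_vec. This is a minimal sketch, assuming "percentage" means each name's share of all name mentions (the question does not say whether it should instead be the share of rows). The names counts and percentages are introduced here for illustration only:

```r
# Same data as above, repeated so this block runs on its own
personnes <- c("rebelle, professeur, gypsie", "", "corinne, roxane, pdl ",
  "fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane",
  "didier, bernadette, franck, elizabeth, roxane, autres", "autres",
  "elizabeth, sebastien_houssiere", "elizabeth, corinne",
  "genevieve, barbara, camille, famille", "gypsie, inconnue")

# Split on ", " and strip any remaining leading/trailing whitespace
new_vec <- trimws(unlist(strsplit(personnes, ", ")))

# Count each name, most frequent first
counts <- sort(table(new_vec), decreasing = TRUE)

# Share of all name mentions, in percent (assumed definition of "percentage")
percentages <- 100 * prop.table(counts)

length(new_vec)        # 31
counts[["elizabeth"]]  # 4
counts[["corinne"]]    # 3
round(percentages, 1)
```

Two details explain the difference from the loop in the question. Splitting on "," alone leaves a leading space on every name after the first, so " corinne" and "corinne" are counted as different values; trimws() removes that. Also, strsplit() returns character(0) for the empty string, and y[1:length(y)] then evaluates y[1:0], which yields the NA seen in the original new_vec. With unlist() the empty entry is simply dropped, which is why the vector has 31 elements instead of 32.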
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dcarlson |
