Issue with unique occurrences in R vector

I need help, please: I have a list "reves" of vectors, and one of them is composed of names:

reves$personnes
 [1] "rebelle, professeur, gypsie"                                          
 [2] ""                                                                     
 [3] "corinne, roxane, pdl "                                                
 [4] "fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane"
 [5] "didier, bernadette, franck, elizabeth, roxane, autres"                
 [6] "autres"                                                               
 [7] "elizabeth, sebastien_houssiere"                                       
 [8] "elizabeth, corinne"                                                   
 [9] "genevieve, barbara, camille, famille"                                 
[10] "gypsie, inconnue"

In the end, I would like to calculate the percentage of times each name appears. So first, I split each line on "," and add the names to a new vector:

# Creating vector of characters
new_vec <- c()
for (i in c(1:nrow(reves))){
x <- reves$personnes[i]
y <- strsplit(x, split=",")[[1]]
new_vec <- c(new_vec, y[1:length(y)])
}

It seems to work, since new_vec is chr [1:32]:

> new_vec 
 [1] "rebelle"              " professeur"          " gypsie"             
 [4] NA                     "corinne"              " roxane"             
 [7] " pdl "                "fabrice"              " melissa"            
[10] " bernadette"          " franck"              " corinne"            
[13] " elizabeth"           " tom"                 " roxane"             
[16] "didier"               " bernadette"          " franck"             
[19] " elizabeth"           " roxane"              " autres"             
[22] "autres"               "elizabeth"            " sebastien_houssiere"
[25] "elizabeth"            " corinne"             "genevieve"           
[28] " barbara"             " camille"             " famille"            
[31] "gypsie"               " inconnue" 

Using new_vec, I planned to use table(new_vec) to get the number of appearances of each name. However, identical names are not grouped together, as you can see:

unique(new_vec)
 [1] "rebelle"              " professeur"          " gypsie"             
 [4] NA                     "corinne"              " roxane"             
 [7] " pdl "                "fabrice"              " melissa"            
[10] " bernadette"          " franck"              " corinne"            
[13] " elizabeth"           " tom"                 "didier"              
[16] " autres"              "autres"               "elizabeth"           
[19] " sebastien_houssiere" "genevieve"            " barbara"            
[22] " camille"             " famille"             "gypsie"              
[25] " inconnue"

And here we clearly see that, for example, "corinne" appears with a count of 2 in the first column and with a count of 1 in the second column:

> table(new_vec)
new_vec
              autres              barbara           bernadette              camille 
                   1                    1                    2                    1 
             corinne            elizabeth              famille               franck 
                   2                    2                    1                    2 
              gypsie             inconnue              melissa                 pdl  
                   1                    1                    1                    1 
          professeur               roxane  sebastien_houssiere                  tom 
                   1                    3                    1                    1 
              autres              corinne               didier            elizabeth 
                   1                    1                    1                    2 
             fabrice            genevieve               gypsie              rebelle 
                   1                    1                    1                    1 

Please, how could I get this new_vec with the correct numbers of occurrences so that I can perform my calculations?

Thanks for your help :)



Solution 1:[1]

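The duplicates arise because splitting on "," alone leaves a leading space on every name after the first, so " corinne" and "corinne" are counted as different strings. You do not need a loop, so your code can be simplified as follows. First provide reproducible data: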

# Output of dput(personnes), assigned so the example can be run as-is
personnes <- c("rebelle, professeur, gypsie", "", "corinne, roxane, pdl ", 
"fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane", 
"didier, bernadette, franck, elizabeth, roxane, autres", "autres", 
"elizabeth, sebastien_houssiere", "elizabeth, corinne", "genevieve, barbara, camille, famille", 
"gypsie, inconnue")
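
As a side note on why the original loop miscounts, here is a minimal sketch using a made-up two-element vector `x` (hypothetical, not the real data). It shows the two effects visible in the question: the space kept after each comma, and the NA produced by the empty entry.

```r
# Made-up example input: one normal entry and one empty entry
x <- c("elizabeth, corinne", "")

# Splitting on "," alone keeps the space that followed the comma
parts <- strsplit(x[1], split = ",")[[1]]
parts[2] == "corinne"          # FALSE: " corinne" has a leading space
trimws(parts[2]) == "corinne"  # TRUE once the whitespace is removed

# An empty string splits into a zero-length character vector
empty <- strsplit(x[2], split = ",")[[1]]
length(empty)                  # 0

# 1:length(empty) is c(1, 0), so indexing with it asks for element 1,
# which does not exist, and R returns NA
empty[1:length(empty)]         # NA
```

The simplified code avoids both effects: unlist() contributes nothing for zero-length pieces, and trimws() removes the stray spaces.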

new_vec <- unlist(strsplit(personnes, ", "))
new_vec <- trimws(new_vec)  # Remove space at the end of "pdl "
sort(unique(new_vec))
# [1] "autres"              "barbara"             "bernadette"          "camille"             "corinne"             "didier"              "elizabeth"          
# [8] "fabrice"             "famille"             "franck"              "genevieve"           "gypsie"              "inconnue"            "melissa"            
# [15] "pdl"                 "professeur"          "rebelle"             "roxane"              "sebastien_houssiere" "tom"       
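
To get from the cleaned vector to the percentages asked for in the question, one possible sketch is to combine table() with prop.table(). It assumes that "percentage" means each name's share of all name mentions; the variable names `counts` and `pct` are only illustrative.

```r
# Same data as in the question
personnes <- c("rebelle, professeur, gypsie", "", "corinne, roxane, pdl ",
  "fabrice, melissa, bernadette, franck, corinne, elizabeth, tom, roxane",
  "didier, bernadette, franck, elizabeth, roxane, autres", "autres",
  "elizabeth, sebastien_houssiere", "elizabeth, corinne",
  "genevieve, barbara, camille, famille", "gypsie, inconnue")

# Split on commas, flatten, and strip surrounding whitespace
new_vec <- trimws(unlist(strsplit(personnes, ",")))

# Count each name, most frequent first
counts <- sort(table(new_vec), decreasing = TRUE)

# Assumption: percentage = share of all name mentions (31 here)
pct <- round(100 * prop.table(counts), 1)

counts["elizabeth"]  # 4 occurrences
pct["elizabeth"]     # 12.9 (4 out of 31 mentions)
```

If the percentage should instead be the share of entries in which a name appears, dividing `counts` by `length(personnes)` may be closer to what is wanted, provided no name is repeated within a single entry.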

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dcarlson