How do I find the most common words in a character vector in R?

I am analysing some fMRI data. In particular, I am looking at what sorts of cognitive functions are associated with coordinates from an fMRI scan (conducted while subjects were performing a task). My data can be obtained with the following function:

library(httr)
scrape_and_sort = function(neurosynth_link){
  result = content(GET(neurosynth_link), "parsed")$data
  names  = c("Name", "z_score", "post_prob", "func_con", "meta_analytic")
  df = do.call(rbind, lapply(result, function(x) setNames(as.data.frame(x), names)))
  df$z_score = as.numeric(df$z_score)
  df = df[order(-df$z_score), ]
  df = df[df$z_score >= 3, ]  # keep z >= 3; -which() would drop every row when no z_score is below 3
  df = na.omit(df)
  return(df)
}
RO4 = scrape_and_sort('https://neurosynth.org/api/locations/-58_-22_6_6/compare')

Now, I want to know which keywords come up most often, and ideally construct a list of the most common words. I tried the following:

sort(table(RO4$Name),decreasing=TRUE)

But this clearly won't work. The problem is that the names (for example, "auditory cortex") are strings containing multiple words, so results such as 'auditory' and 'auditory cortex' come out as two separate entries, whereas I want them counted as two instances of 'auditory'.

But I am not sure how to search inside each string and record individual words like that. Any ideas?



Solution 1:[1]

I'm not sure I understand the problem. Can't you proceed like this?

x <- c("auditory cortex", "auditory", "auditory", "hello friend")
unlist(strsplit(x, " "))
# "auditory" "cortex"   "auditory" "auditory" "hello"    "friend"
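
To get from the split words to actual counts, one possible follow-up is to tabulate the result. This is only a sketch and not part of the original answer: the lower-casing, the whitespace pattern, and the names `words` and `word_counts` are assumptions made for illustration. With the question's data, `x` would presumably be `RO4$Name`.

```r
# Example data from the answer; with the question's data this would be RO4$Name (assumption).
x <- c("auditory cortex", "auditory", "auditory", "hello friend")

# Split on runs of whitespace; lower-case first so "Auditory" and "auditory" are counted together.
words <- unlist(strsplit(tolower(x), "\\s+"))

# Drop empty strings, which can appear when a name starts with whitespace.
words <- words[nzchar(words)]

# Count each distinct word and sort from most to least frequent.
word_counts <- sort(table(words), decreasing = TRUE)

# Show the most common words (at most ten).
print(head(word_counts, 10))
```

Here "auditory" is counted three times, once for "auditory cortex" and once for each standalone "auditory", which matches the counting the question asks for. `names(word_counts)` then gives the words in order of frequency if a plain list is wanted.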

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stéphane Laurent