'How to get tally of unique words using only SQL?

How do you get a list of all unique words and their frequencies ("tally") in one text column of one table of an SQL database? An answer for any SQL dialect in which this is possible would be appreciated.

In case it helps, here's a one-liner that does it in Ruby using Sequel:

Hash[DB[:table].all.map{|r| r[:text]}.join("\n").gsub(/[\(\),]+/,'').downcase.strip.split(/\s+/).tally.sort_by{|_k,v| v}.reverse]

To give an example, for table table with 4 rows with each holding one line of Dr. Dre's Still D.R.E.'s refrain in a text field, the output would be:

{"the"=>4,
 "still"=>3,
 "them"=>3,
 "for"=>2,
 "d-r-e"=>1,
 "it's"=>1,
 "streets"=>1,
 "love"=>1,
 "got"=>1,
 "i"=>1,
 "and"=>1,
 "beat"=>1,
 "perfect"=>1,
 "to"=>1,
 "time"=>1,
 "my"=>1,
 "taking"=>1,
 "girl"=>1,
 "low-lows"=>1,
 "in"=>1,
 "corners"=>1,
 "hitting"=>1,
 "world"=>1,
 "across"=>1,
 "all"=>1,
 "gangstas"=>1,
 "representing"=>1,
 "i'm"=>1}

This works, of course, but is naturally not as fast or elegant as it would be to do it in pure SQL - which I have no clue if that's even in the realm of possibilites...

sql


Solution 1:[1]

I guess it would depend on how the SQL database would look like. You would have to first turn your 4 row "database" into a data of single column, each row representing one word. To do that you could use something like String_split, where every space would be a delimiter.

STRING_SPLIT('I'm representing for them gangstas all across the world', ' ')

https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/ This would turn it into a table where every word is a row.

Once you've set up your data table, then it's easy.

Your_table:

[Word]
I'm
representing
for 
them
...
world

Then you can just write:

SELECT Word, count(*) 
FROM your_table
GROUP BY Word;

Your output would be:

Word   |    Count
I'm          1
representing 1

Solution 2:[2]

I had a play using XML in sql server. Just an idea :)

with cteXML as
(
select * 
,cast('<wd>' + replace(songline,' ' ,'</wd><wd>') + '</wd>' as xml) as XMLsongline
from @tSongs
),cteBase as
(
select p.value('.','nvarchar(max)') as singleword
from cteXML as x
cross apply x.XMLsongline.nodes('/wd') t(p)
)
select b.singleword,count(b.singleword)
from cteBase as b
group by b.singleword

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 RocketJumpTrist
Solution 2 Nathan_Sav