'How to remove stopwords through SQL

I have a table with millions of messages. I want to remove a list of stopwords from each of those message - in SQL

Example input:

id  message
-------------------------------
1   we are on top of the world
2   too bad the apple is rotten
3   there is no I in team
4   it matters where you go to

Stopwords to remove:

in, on, of, to, too

Desired output:

id  message
-------------------------------
1   we are top the world
2   bad the apple is rotten
3   there is no I team
4   it matters where you go

The problem here it, I guess, that either stopword could be at the beginning, in the middle or at the end of the message. So a query like this would be satisfactory:

UPDATE table SET message = REPLACE(message, ' in ', '');
UPDATE table SET message = REPLACE(message, ' on ', '');
UPDATE table SET message = REPLACE(message, ' of ', '');
etc...

Is there a better solution?



Solution 1:[1]

If you create a table called stopwords with a field stopword in it holding a list of all your stopwords you can do it like this:

CREATE TABLE [dbo].[stopwords](
    [stopword] char(100) NOT NULL
) 

insert into stopwords values ('in');
insert into stopwords values ('on');
insert into stopwords values ('of');
insert into stopwords values ('to');
insert into stopwords values ('too');

-- DEBUG: select message ,stopword, replace(message,CONCAT(' ', stopword , ' '), ' ')
update table 
set message = trim(replace(CONCAT(' ',message, ' '),CONCAT(' ',stopword,' '),' ')) 
from stopwords
where CONCAT(' ', message , ' ')  like CONCAT('% ' ,stopword , ' %')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1