'How to make an accent insensitive `MATCH() AGAINST()` sentence?
I have a table called products which has a varchar column called name encoded in latin1_swedish_ci.
I'm trying to make a search query in order to filter products by name using the MATCH() AGAINST() sentence that look like this:
SELECT *
FROM products
WHERE MATCH(name) AGAINST('*search_string*' IN BOOLEAN MODE)
It works really good varchars without accents, however, if I have a product called Colágeno and the search input is Colageno, the query would not catch the product.
So I tried what is showed in this SO answer without success.
SELECT * FROM products WHERE MATCH(CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)
I got the error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)' at line 3
I only want to catch special characters like á, é, í, ó, ú.
Thanks in advance!
Solution 1:[1]
Solved!
After some discussion in the comments, I realized that the error was not in the MATCH() AGAINST() statement, since it does not distinguish diacritics by default.
So the problem had to do with how the diacritics were stored in MySQL, in my case, they were stored like this COLÁGENO -> COLÃ<0x81>GENO. Therefore, it was necessary to find out how to save the tildes correctly without corrupting the table.
Encodings
I tried making encoding changes by executing in phpmyadmin the instruction:
ALTER TABLE products CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
However, there were no changes in performance.
I then tried changing the encoding in the exported table file by changing DEFAULT CHARSET=latin1 to DEFAULT CHARSET=utf8mb4, but no change in results either.
Modify the accents manually
My other attempt was, once the encoding change was made, to manually modify the cells that had characters like Ã<0x81> by their corresponding character with tilde Á. But sadly this seemed to corrupt queries to the table (I was still able to access the other tables normally).
So I thought about what masterguru said in the comments about encoding changes altering the way scripts connect to the table, and apparently when I manually modified a character to put the tilde, the scripts kept accessing the table. table with the previous encoding.
Solution
The scripts were in PHP so I had to find the solution in that language.
I found this answer in English SO where it said how to save tildes in the database correctly. To do this, you had to write...
mysqli_set_charset($connection, "utf8");
...this after the connection to the database. Finally, I had to change the rare characters in my database to their corresponding tilde character for the MATCH AGAINST to work, and voila!
Many thanks to masterguru, Triby and aeportugal for the help provided in the comments!
Original post: https://es.stackoverflow.com/questions/511745/como-hacer-que-la-b%c3%basqueda-match-against-ignore-los-tildes-o-acentos
Solution 2:[2]
As the author of the answer pointed out, you can use CONVERT(BINARY(name) USING utf8) instead of CONVERT(BINARY(name) USING utf8).
MySQL already converts varchar values to binary ones when converting them to fulltext.
You need to take that into account when creating the index, otherwise you'll end up searching in binary data and not in varchar data.
Another solution is to use a certain collation for your varchar column, let's say latin1_general_ci. This way, the search will always convert the data to binary data (if the varchar column is not already a binary one) and look it up in the index as bytes.
You can read more about it here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AlexSp3 |
| Solution 2 | PJMan0300 |
