'Trying to bind a complex file name in a DBI insert place holder, but special characters get mangled
Guys I don't know if this is a bind problem, character set problem or something I haven't considered. Here's the challenge - the file system was written by some cd ripping software but the problem is more generic, I need to be able to insert any legal filename from a Linux OS into a database, this set of titles is just a test case.
I tied to make this as clear as I could:
find(\&process, $_);< populates @fnames with File::Find:name
my $count = scalar @fnames;
print "File count $count\n";
So I use File:Find to fill the array with the file name strings, some are really problematic..
$stm = $dbh_sms->prepare(qq{
INSERT INTO $table (path) VALUES (?)
});
foreach $fname(@fnames){
print "File:$fname\n";
$stm->execute("$fname");
}
Now here's what I see printed, compared to what comes back out of MariaDb, just a few examples showing the problem:
File:/test1/music/THE DOOBIE BROTHERS - BEST OF THE DOOBIES/02 - THE DOOBIE BROTHERS - LONG TRAIN RUNNIN´.mp3
A bunch of these titles have the back ticks, they are problem #1 - here's how they come back out of a select against the table after I populate it:
| /test1/music/THE DOOBIE BROTHERS - BEST OF THE DOOBIES/02 - THE DOOBIE BROTHERS - LONG TRAIN RUNNIN´.mp3
This character is also a problem:
File:/test1/music/Blue Öyster Cult - Workshop Of The Telescopes/01 - Blue Öyster Cult - Don't Fear The Reaper.mp3
From the database:
/test1/music/Blue Ãyster Cult - Workshop Of The Telescopes/01 - Blue Ãyster Cult - Don't Fear The Reaper.mp3
And one more, but far from the last of the problematic strings:
File:/test1/music/Better Than Ezra - Deluxe/03 - Better Than Ezra - Southern Gürl.mp3
Comes out as:
/test1/music/Better Than Ezra - Deluxe/03 - Better Than Ezra - Southern Gürl.mp3
I though this was a character set problem, so I added this to the connect string:
{ RaiseError => 1, mysql_enable_utf8mb4 => 1 }
I've also tried:
$dbh_sms->do('SET NAMES utf8mb4');
Is that a hex C2 being added to the string? (Edit: it's an octal 303, hex c3) I've also tried changing the column to varbinary and still see the same results. Anyone have a clue why this is happening? I'm at a loss....
TIA
Edit - I dumped the table with OD to find out what is actually getting inserted since I was of the belief that with a placeholder, a bind variable would be written without interpolation, basically a binary transfer. To save my eyes, I just did a handful of records concentrating on what I thought was a 'back tick' from the above example, which is an octal 264, AKA "Acute accent - spacing acute".
It is in the table, but preceded by 303 202 and 302, which are a couple of those 'A' characters with the icing on top and a "low quote" character in between. Which contradicts my prior understanding about the utility of place holders and bind variables.
So I am more confused now than before.
Solution 1:[1]
I found the problem, it was in the perl character encoding:
$fname = decode("UTF-8", $fname);
was all I needed.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
