'Trying to bind a complex file name in a DBI insert place holder, but special characters get mangled

Guys I don't know if this is a bind problem, character set problem or something I haven't considered. Here's the challenge - the file system was written by some cd ripping software but the problem is more generic, I need to be able to insert any legal filename from a Linux OS into a database, this set of titles is just a test case.

I tied to make this as clear as I could:

    find(\&process, $_);< populates @fnames with File::Find:name
    my $count = scalar @fnames;
    print "File count $count\n";

So I use File:Find to fill the array with the file name strings, some are really problematic..

    $stm = $dbh_sms->prepare(qq{
    INSERT INTO $table (path) VALUES (?)
    });
    foreach $fname(@fnames){
    print "File:$fname\n";
    $stm->execute("$fname");
    }

Now here's what I see printed, compared to what comes back out of MariaDb, just a few examples showing the problem:

File:/test1/music/THE DOOBIE BROTHERS - BEST OF THE DOOBIES/02 - THE DOOBIE BROTHERS - LONG TRAIN RUNNIN´.mp3

A bunch of these titles have the back ticks, they are problem #1 - here's how they come back out of a select against the table after I populate it:

| /test1/music/THE DOOBIE BROTHERS - BEST OF THE DOOBIES/02 - THE DOOBIE BROTHERS - LONG TRAIN RUNNIN´.mp3

This character is also a problem:

File:/test1/music/Blue Öyster Cult - Workshop Of The Telescopes/01 - Blue Öyster Cult - Don't Fear The Reaper.mp3

From the database:

/test1/music/Blue Ãyster Cult - Workshop Of The Telescopes/01 - Blue Ãyster Cult - Don't Fear The Reaper.mp3

And one more, but far from the last of the problematic strings:

File:/test1/music/Better Than Ezra - Deluxe/03 - Better Than Ezra - Southern Gürl.mp3

Comes out as:

/test1/music/Better Than Ezra - Deluxe/03 - Better Than Ezra - Southern Gürl.mp3

I though this was a character set problem, so I added this to the connect string:

    { RaiseError => 1, mysql_enable_utf8mb4 => 1 }

I've also tried:

    $dbh_sms->do('SET NAMES utf8mb4');

Is that a hex C2 being added to the string? (Edit: it's an octal 303, hex c3) I've also tried changing the column to varbinary and still see the same results. Anyone have a clue why this is happening? I'm at a loss....

TIA

Edit - I dumped the table with OD to find out what is actually getting inserted since I was of the belief that with a placeholder, a bind variable would be written without interpolation, basically a binary transfer. To save my eyes, I just did a handful of records concentrating on what I thought was a 'back tick' from the above example, which is an octal 264, AKA "Acute accent - spacing acute".

It is in the table, but preceded by 303 202 and 302, which are a couple of those 'A' characters with the icing on top and a "low quote" character in between. Which contradicts my prior understanding about the utility of place holders and bind variables.

So I am more confused now than before.



Solution 1:[1]

I found the problem, it was in the perl character encoding:

    $fname = decode("UTF-8", $fname);

was all I needed.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1