PHP: remove special characters like � while importing data from CSV to database [duplicate]

I created a PHP script that uploads a large amount of data from a CSV file. While importing, I'd like to replace special characters such as � with a plain letter, for example c. Below is my code:

        $sql ="INSERT INTO bill_of_materials(allotment_code, category_name, activity, quantity, end_unit_quantity, unit, description,
        unit_cost, regular_labor_cost, end_unit_labor_cost, type, batch) VALUES";

        while (($line = fgets($handle)) !== false) {

          $sql .= "('".implode("', '", explode(";", sanitize($line)))."'),";
          $counter++;
        }

            $sql = substr($sql, 0, strlen($sql) - 1);
            if (mysqli_query($new_conn, $sql) === TRUE) {

                echo 1;

                //database file name
                $new_database_file = $new_database.'.sql';

                if(file_exists('backup/'.$new_database_file)) {

                    unlink('backup/'.$new_database_file);

                    // backup main database

                    $command = "C:/xampp/mysql/bin/mysqldump --host=$host --user=$user --password=$pass $database_name > backup/$new_database_file";
                    system($command);

                } else {
                    // backup main database

                    $command = "C:/xampp/mysql/bin/mysqldump --host=$host --user=$user --password=$pass $database_name > backup/$new_database_file";
                    system($command);
                }
            } else {
                echo $sql;
            }

In addition, one value in my CSV is W2-A1 2/F Front Fa�ade - B, and I'd like the output to be W2-A1 2/F Front Facade - B. How can I do this?



Solution 1:[1]

First of all, make sure the database connection uses the correct client charset and collation (with mysqli, mysqli_set_charset() sets the connection charset). If those are correct, you can use iconv and preg_replace to sanitize the stray characters like so:

function sanitize($line){
   // attempt to translate similar characters (assumes $line is valid UTF-8)
   $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $line);
   // drop anything outside printable ASCII; keeps spaces, "-", "/" and the ";" delimiter
   $clean = preg_replace('/[^\x20-\x7E]/', '', $clean);
   return $clean;
}
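iconv's //TRANSLIT output depends on the platform and locale, and it expects valid UTF-8 input, which a CSV exported from Excel often is not. As a minimal sketch, assuming the file is either UTF-8 or Windows-1252, the line can be re-encoded first and then mapped with an explicit table. The function name to_ascii_line and the contents of the map are hypothetical and not part of the original answer; the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumption: input that is not valid UTF-8 was saved as Windows-1252.
function to_ascii_line(string $line): string
{
    // Re-encode legacy single-byte input so every character is valid UTF-8.
    if (!mb_check_encoding($line, 'UTF-8')) {
        $line = mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
    }

    // Explicit replacements; extend with whatever characters your data contains.
    $map = [
        "\u{E7}" => 'c', // c with cedilla
        "\u{C7}" => 'C',
        "\u{E9}" => 'e', // e with acute
        "\u{E8}" => 'e', // e with grave
        "\u{F1}" => 'n', // n with tilde
    ];
    $line = strtr($line, $map);

    // Drop anything still outside printable ASCII (this also strips the newline).
    return preg_replace('/[^\x20-\x7E]/', '', $line);
}

echo to_ascii_line("W2-A1 2/F Front Fa\xE7ade - B"), PHP_EOL;
// prints: W2-A1 2/F Front Facade - B
```

Because the result is decided by the map rather than by the locale, the same input gives the same output on every machine. The trade-off is that unmapped characters are silently dropped.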

If that doesn't help (e.g. the byte stream is truly corrupted, for example a CSV saved from an old Excel source file), you can map the corrupted byte sequences to the intended characters. First find the corrupted sequence, e.g. by dumping the bytes with bin2hex($line) or ord($line[$position]). Example:

function sanitize($line){
    $map = [
        // corrupted byte sequence -> intended character
        // (these keys are the UTF-8 bytes of ISO-8859-2 text misread as ISO-8859-1)
        "\xC3\xA8" => 'č',
        "\xC3\x88" => 'Č',
        "\xC3\xB9" => 'ů',
        "\xC3\x99" => 'Ů',
        "\xC3\xAC" => 'ě',
        "\xC3\x8C" => 'Ě',
        "\xC3\xB8" => 'ř',
        "\xC3\x98" => 'Ř',
        "\x53\xC2\x8D" => 'Š',
        "\xC2\xA9" => 'Š',
    ];
    return str_replace(array_keys($map), $map, $line);
}
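To build such a map you first need to see which bytes actually arrive in the file. The following sketch lists every run of non-ASCII bytes in a line as hex, keyed by byte offset. The function name dump_non_ascii is hypothetical and only meant for inspection while debugging.

```php
<?php
// Hypothetical helper: returns each run of non-ASCII bytes in $line as a
// hex string, keyed by the byte offset where the run starts.
function dump_non_ascii(string $line): array
{
    $runs = [];
    // No /u modifier: the pattern is matched byte by byte.
    if (preg_match_all('/[\x80-\xFF]+/', $line, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$bytes, $offset]) {
            $runs[$offset] = bin2hex($bytes);
        }
    }
    return $runs;
}

print_r(dump_non_ascii("Fa\xE7ade")); // offset 2 => "e7"
```

A single byte e7 at that position is ç in ISO-8859-1 or Windows-1252, whereas c3a7 would be the same letter already encoded as UTF-8. Knowing which one you have tells you whether to convert the encoding or to extend the replacement map.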

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1