Converting CSV to array causing "Tokenization is skipped for long lines for performance reasons."

After uploading a CSV, I am trying to convert the first column (a list of domains) to an array and add the array items to a txt file.

So the CSV shows:

test.com
test2.net
test3.org

And the txt file should show them like this:

test.com, test2.net, test3.org

Everything works except the first row (test.com), which doesn't seem to convert to text properly. All of the other rows are fine. When I open the txt file, the content is there and looks correct, but my IDE shows a message right before the first item: "Tokenization is skipped for long lines for performance reasons."

[Screenshot of my txt file in VS Code]

Before adding new domains, I need to check the domains already in the txt file to prevent duplicates, but test.com is never recognized as already being there.
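A minimal sketch of that duplicate check, assuming the txt file holds a single line in the `test.com, test2.net, test3.org` format and PHP 7.4+ (for arrow functions). `add_domains_to_file()` is a hypothetical helper, not part of the original code. It also strips a leading UTF-8 byte order mark (BOM) from incoming values, since an invisible prefix like that is enough to make a strict string comparison fail.

```php
<?php
// Hypothetical helper (not from the original post): merge new domains into a
// txt file that stores them as one line like "test.com, test2.net, test3.org",
// skipping any domain that is already present.
function add_domains_to_file(string $txtPath, array $newDomains): array
{
    $existing = [];
    if (is_file($txtPath)) {
        $contents = trim(file_get_contents($txtPath));
        if ($contents !== '') {
            // Split on commas and trim the surrounding spaces from each entry.
            $existing = array_map('trim', explode(',', $contents));
        }
    }

    // Normalize incoming values: drop a leading UTF-8 BOM, then trim whitespace.
    $incoming = array_map(
        fn($d) => trim(preg_replace('/^\xEF\xBB\xBF/', '', $d)),
        $newDomains
    );

    // Drop empty entries, then keep only the first occurrence of each domain.
    $merged = array_filter(array_merge($existing, $incoming), fn($d) => $d !== '');
    $merged = array_values(array_unique($merged));

    file_put_contents($txtPath, implode(', ', $merged));
    return $merged;
}

// Usage: the file already contains test.com, so only test3.org is added.
$tmp = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($tmp, 'test.com, test2.net');
add_domains_to_file($tmp, ["\xEF\xBB\xBFtest.com", 'test3.org']);
echo file_get_contents($tmp), PHP_EOL; // test.com, test2.net, test3.org
unlink($tmp);
```

Merging first and deduplicating with array_unique() keeps the file's existing order and appends only genuinely new domains.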

So I ran a test comparing the CSV array to a hand-written array with the same values, and sure enough, the two test.com strings are not equal, while the other domains are.

// Convert a "CSV UTF-8 (Comma delimited)" file to an array of its first-column values
function csv_to_array( $filepath ) {
    // Get the rows
    $rows   = array_map('str_getcsv', file( $filepath ));
    
    // Store the items here
    $csv = [];

    // Grab the items
    foreach($rows as $row) {
        $csv[] = $row[0];
    }

    // Return the array
    return $csv;
}

Running my test:

$csv_url = 'test.csv';
$csv_domains = csv_to_array( $csv_url );
print_r($csv_domains);
$csv_domains_string = implode(', ', $csv_domains);
print_r('<br>'.$csv_domains_string);

echo '<br><hr><br>';

$compare_domains = ['test.com', 'test2.net', 'test3.org'];
print_r($compare_domains);
$compare_domains_string = implode(', ', $compare_domains);
print_r('<br>'.$compare_domains_string);

echo '<br><hr><br>';

if ($csv_domains[0] === $compare_domains[0]) {
    echo '<br>true: ';
    echo '$csv_domains[0] ('.$csv_domains[0].') = $compare_domains[0] ('.$compare_domains[0].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[0] ('.$csv_domains[0].') != $compare_domains[0] ('.$compare_domains[0].')';
}

echo '<br><hr><br>';

if ($csv_domains[1] === $compare_domains[1]) {
    echo '<br>true: ';
    echo '$csv_domains[1] ('.$csv_domains[1].') = $compare_domains[1] ('.$compare_domains[1].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[1] ('.$csv_domains[1].') != $compare_domains[1] ('.$compare_domains[1].')';
}

echo '<br><hr><br>';

if ($csv_domains[2] === $compare_domains[2]) {
    echo '<br>true: ';
    echo '$csv_domains[2] ('.$csv_domains[2].') = $compare_domains[2] ('.$compare_domains[2].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[2] ('.$csv_domains[2].') != $compare_domains[2] ('.$compare_domains[2].')';
}

Result: [screenshot of results]

So how do I fix this?

EDIT: var_dump shows that the two strings have different lengths:

var_dump($csv_domains[0]); // string(11) "test.com"
var_dump($compare_domains[0]); // string(8) "test.com"
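var_dump() outputs the raw bytes, but the browser renders the BOM invisibly, so both values look like "test.com". The length difference, 11 - 8 = 3 bytes, matches the three-byte UTF-8 byte order mark (EF BB BF). A minimal sketch for making such hidden bytes visible with bin2hex(); `$csv_first` is a hypothetical stand-in for `$csv_domains[0]`:

```php
<?php
// Simulates the first value returned by csv_to_array() when the file starts
// with a UTF-8 BOM: the three BOM bytes end up glued to the first cell.
$csv_first     = "\xEF\xBB\xBF" . 'test.com';
$compare_first = 'test.com';

// strlen() counts bytes, so the invisible BOM shows up as 3 extra bytes.
echo strlen($csv_first), ' vs ', strlen($compare_first), PHP_EOL; // 11 vs 8

// bin2hex() makes the hidden bytes visible: "efbbbf" is the BOM.
echo bin2hex($csv_first), PHP_EOL;     // efbbbf746573742e636f6d
echo bin2hex($compare_first), PHP_EOL; // 746573742e636f6d

// A targeted check for a leading BOM.
var_dump(strncmp($csv_first, "\xEF\xBB\xBF", 3) === 0); // bool(true)
```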


Solution 1:[1]

Thanks to @ChrisHaas, I fixed it by changing the foreach loop to remove the UTF-8 byte order mark (BOM, the bytes EF BB BF) from the first cell:

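$count = 0;
foreach($rows as $row) {
    if ($count === 0) {
        // The first cell carries the UTF-8 BOM written at the start of the file
        // (e.g. by Excel's "CSV UTF-8" export); strip it.
        $new_row = str_replace("\xEF\xBB\xBF", '', $row[0]);
    } else {
        $new_row = $row[0];
    }
    $csv[] = $new_row;
    $count++;
}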
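An alternative, sketched here under my own assumptions rather than taken from the answer, is to skip the BOM once at the stream level instead of checking cells: read the first three bytes, rewind if they are not a BOM, then parse the rest with fgetcsv(). `csv_first_column()` is a hypothetical name, and unlike the original csv_to_array() it also skips blank lines, which fgetcsv() returns as `[null]`. The extra fgetcsv() arguments just spell out an unlimited line length and the standard comma/quote/backslash settings.

```php
<?php
// Hypothetical helper: return the first column of a CSV file, skipping a
// leading UTF-8 BOM (EF BB BF) if the file starts with one.
function csv_first_column(string $filepath): array
{
    $handle = fopen($filepath, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filepath");
    }

    // Consume the BOM only if it is actually there; otherwise start over.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }

    $column = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line; skip those.
        if ($row === [null]) {
            continue;
        }
        $column[] = $row[0];
    }
    fclose($handle);

    return $column;
}

// Usage with a temporary file that mimics a BOM-prefixed CSV export.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "\xEF\xBB\xBFtest.com\r\ntest2.net\r\ntest3.org\r\n");
echo implode(', ', csv_first_column($tmp)), PHP_EOL; // test.com, test2.net, test3.org
unlink($tmp);
```

Handling the BOM at the file level means no cell, first or otherwise, can carry it into later comparisons.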

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Michael