Converting CSV to array causes "Tokenization is skipped for long lines for performance reasons."
After uploading a CSV, I am trying to convert the first column (a list of domains) to an array and add the array items to a txt file.
So the CSV shows:
test.com
test2.net
test3.org
And the txt file should show them like this:
test.com, test2.net, test3.org
Everything works, except the first row (test.com) doesn't seem to be converting as text properly. All of the other rows work fine. When I look at the txt file, the content is there and doesn't seem to show an issue, but in my IDE it shows an error before the first item: Tokenization is skipped for long lines for performance reasons.
Screenshot of my txt file in VS Code:
I need to check each domain in the txt file before adding new domains to prevent duplicates, and it's not recognizing test.com as being there already.
So I ran a test to compare the array to a manually written array with the same values, and sure enough test.com and test.com do not equal each other. The other domains do.
// Converting a CSV UTF-8 (Comma delimited) to array
function csv_to_array( $filepath ) {
    // Get the rows
    $rows = array_map('str_getcsv', file( $filepath ));
    // Store the items here
    $csv = [];
    // Grab the items
    foreach($rows as $row) {
        $csv[] = $row[0];
    }
    // Return the array
    return $csv;
}
Running my test:
$csv_url = 'test.csv';
$csv_domains = csv_to_array( $csv_url );
print_r($csv_domains);
$csv_domains_string = implode(', ', $csv_domains);
print_r('<br>'.$csv_domains_string);
echo '<br><hr><br>';
$compare_domains = ['test.com', 'test2.net', 'test3.org'];
print_r($compare_domains);
$compare_domains_string = implode(', ', $compare_domains);
print_r('<br>'.$compare_domains_string);
echo '<br><hr><br>';
if ($csv_domains[0] === $compare_domains[0]) {
    echo '<br>true: ';
    echo '$csv_domains[0] ('.$csv_domains[0].') = $compare_domains[0] ('.$compare_domains[0].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[0] ('.$csv_domains[0].') != $compare_domains[0] ('.$compare_domains[0].')';
}
echo '<br><hr><br>';
if ($csv_domains[1] === $compare_domains[1]) {
    echo '<br>true: ';
    echo '$csv_domains[1] ('.$csv_domains[1].') = $compare_domains[1] ('.$compare_domains[1].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[1] ('.$csv_domains[1].') != $compare_domains[1] ('.$compare_domains[1].')';
}
echo '<br><hr><br>';
if ($csv_domains[2] === $compare_domains[2]) {
    echo '<br>true: ';
    echo '$csv_domains[2] ('.$csv_domains[2].') = $compare_domains[2] ('.$compare_domains[2].')';
} else {
    echo '<br>false: ';
    echo '$csv_domains[2] ('.$csv_domains[2].') != $compare_domains[2] ('.$compare_domains[2].')';
}
Result:
So how do I fix this?
EDIT: var_dump returns two different values:
var_dump($csv_domains[0]); // string(11) "test.com"
var_dump($compare_domains[0]); // string(8) "test.com"
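The three-byte difference between the two lengths can be inspected directly. The following is a minimal sketch, assuming the extra bytes are a UTF-8 byte order mark (EF BB BF) at the start of the first value; the variable names are illustrative and not part of the code above.

```php
<?php
// Assumed value: "test.com" prefixed with the three UTF-8 BOM bytes,
// which is what a file saved as "CSV UTF-8" typically starts with.
$from_csv = "\xEF\xBB\xBF" . 'test.com';
$typed    = 'test.com';

echo strlen($from_csv), "\n";   // 11
echo strlen($typed), "\n";      // 8

// bin2hex() shows every byte, including ones an editor does not render.
echo bin2hex($from_csv), "\n";  // efbbbf746573742e636f6d
echo bin2hex($typed), "\n";     // 746573742e636f6d
```

Running bin2hex() on the real $csv_domains[0] would confirm whether the same efbbbf prefix is present.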
Solution 1:[1]
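Thanks to @ChrisHaas, I was able to fix it by changing the foreach loop to remove the UTF-8 byte order mark (BOM) from the first cell. The BOM is the three bytes EF BB BF at the start of the file, which accounts for the length difference in the var_dump output (11 versus 8):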
$count = 0;
foreach($rows as $row) {
    if ($count == 0) {
        // Only the first cell of the file can start with the BOM
        $new_row = str_replace("\xEF\xBB\xBF",'', $row[0]);
    } else {
        $new_row = $row[0];
    }
    $csv[] = $new_row;
    $count++;
}
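An alternative is to strip the BOM from the first line before it is parsed, rather than from the parsed cell. The sketch below is one possible way to do that, not the accepted fix itself: csv_first_column is a hypothetical helper name, and the temporary file only stands in for an uploaded "CSV UTF-8" file.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// column of a CSV file, ignoring a leading UTF-8 BOM if one is present.
function csv_first_column(string $filepath): array {
    // Read lines without trailing newlines and skip empty lines.
    $lines = file($filepath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }

    // Strip the BOM only if it sits at the very start of the file.
    $bom = "\xEF\xBB\xBF";
    if (isset($lines[0]) && strncmp($lines[0], $bom, 3) === 0) {
        $lines[0] = substr($lines[0], 3);
    }

    $column = [];
    foreach ($lines as $line) {
        // Delimiter, enclosure and escape are passed explicitly.
        $row = str_getcsv($line, ',', '"', '\\');
        if (isset($row[0]) && $row[0] !== '') {
            $column[] = trim($row[0]);
        }
    }
    return $column;
}

// Demo with a temporary file that starts with a BOM.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "\xEF\xBB\xBF" . "test.com\ntest2.net\ntest3.org\n");
$domains = csv_first_column($tmp);
unlink($tmp);

var_dump($domains[0] === 'test.com'); // bool(true)
```

Removing the BOM before calling str_getcsv() should also help when the first field is quoted, since the opening quote is then the first character of the line again.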
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Michael |


