Pentaho PDI - data integration CSV S3 enclosure bug

In Pentaho PDI, I'm using the S3 CSV Input step to read CSV files. The files are delimited by commas, and some fields are enclosed in quotation marks and contain commas inside the quotes, e.g. "test, two, three". Pentaho correctly ignores these commas in the Preview, but at run time it doesn't seem to honor the enclosure and splits fields that shouldn't be split.

Anyone familiar with this?

I'm trying to think of a work-around.

The "false" commas all have a space after them, but that doesn't help much. The reverse would be useful, since I could then specify comma-space as the delimiter.



Solution 1:[1]

Sounds like that is a bug in the step. I think your best option is to download the file, replace all occurrences of comma-space with something else (like |) and then try again.
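
If you want to script that replacement, here is a minimal sketch in Python. It assumes the file has already been downloaded and read into a string, and that | does not otherwise appear in the data. The function names are hypothetical and not part of Pentaho. The first function does the literal comma-space replacement described above; the second is a quote-aware variant (my own addition, not something PDI provides) that parses the text with Python's csv module and only touches comma-space inside field values.

```python
import csv
import io


def replace_comma_space(text: str, placeholder: str = "|") -> str:
    """Literal approach: replace every ', ' in the raw text with the placeholder."""
    return text.replace(", ", placeholder)


def replace_inside_fields(text: str, placeholder: str = "|") -> str:
    """Quote-aware variant: parse the CSV, then replace ', ' only inside field values.

    The csv writer re-quotes a field only if it still needs it, so fields that
    no longer contain a comma are written without an enclosure.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace(", ", placeholder) for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = 'id,label\n1,"test, two, three"\n'
    print(replace_comma_space(sample))
    print(replace_inside_fields(sample))
```

The literal version keeps the quotation marks in place, so the enclosure setting still applies to the step. The quote-aware version drops enclosures that are no longer needed, which avoids relying on the step's enclosure handling at all.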

You could also try reporting the bug and see if any action is taken - http://jira.pentaho.com/browse/PDI

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 rhowell