Skip blank lines while reading .csv file using opencsv (java)

Good day everyone! My goal is to make the CSV reader skip blank lines while parsing a file: basically do nothing for them and only give me the rows with at least one value. At the moment I have two methods: the first reads all rows as a List of String arrays and returns it, the second converts that result into a List of Lists of Strings. Both are below:

private List<String[]> readCSVFile(File filename) throws IOException {
    // try-with-resources closes the reader (and the underlying file) when done
    try (CSVReader reader = new CSVReader(new FileReader(filename))) {
        return reader.readAll();
    }
}

public List<List<String>> readFile(File filename) throws IOException {
    List<String[]> allRows = readCSVFile(filename);
    List<List<String>> allRowsAsLists = new ArrayList<List<String>>();
    for (String[] rowItemsArray : allRows) {
        List<String> rowItems = new ArrayList<String>();
        rowItems.addAll(Arrays.asList(rowItemsArray));
        allRowsAsLists.add(rowItems);
    }
    return allRowsAsLists;
}

My first thought was to check the length of each array in the second method and simply skip the array if its length is 0, which would look something like this:

for (String[] rowItemsArray : allRows) {
    if (rowItemsArray.length == 0) continue; // the added check
    List<String> rowItems = new ArrayList<String>();
    rowItems.addAll(Arrays.asList(rowItemsArray));
    allRowsAsLists.add(rowItems);
}

Unfortunately that didn't work, because even when a row is blank the reader still returns an array of elements, which are in fact empty Strings. Hard-coding a check for each individual String is not an option, as there are 100+ columns and their number varies. Please suggest the best way to achieve this. Thanks.

Sorted it out this way:

    public List<List<String>> readFile(File filename) throws IOException {
        // includeHeaders and trimWhitespacesInFieldValues are defined in code not shown here
        List<String[]> allRows = readCSVFile(filename, includeHeaders, trimWhitespacesInFieldValues);
        List<List<String>> allRowsAsLists = new ArrayList<List<String>>();
        for (String[] rowItemsArray : allRows) {
            if (allValuesInRowAreEmpty(rowItemsArray)) continue; // skip blank rows
            List<String> rowItems = new ArrayList<String>();
            rowItems.addAll(Arrays.asList(rowItemsArray));
            allRowsAsLists.add(rowItems);
        }
        return allRowsAsLists;
    }

    // Returns true when every value in the row is null or an empty String.
    private boolean allValuesInRowAreEmpty(String[] row) {
        for (String s : row) {
            if (s != null && s.length() != 0) {
                return false; // found a value, no need to look further
            }
        }
        return true;
    }
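
The same filtering can also be written with the Java 8 Streams API. The following is only a minimal sketch, assuming the rows have already been read into a List<String[]>; the class name BlankRowFilter and its method names are made up for illustration and are not part of opencsv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of opencsv: works on rows that were already read.
class BlankRowFilter {

    // True when every value in the row is null or an empty String.
    // A zero-length row also counts as blank.
    static boolean isBlankRow(String[] row) {
        return Arrays.stream(row).allMatch(s -> s == null || s.isEmpty());
    }

    // Drops blank rows and converts the remaining arrays into mutable lists.
    static List<List<String>> removeBlankRows(List<String[]> allRows) {
        return allRows.stream()
                .filter(row -> !isBlankRow(row))
                .<List<String>>map(row -> new ArrayList<>(Arrays.asList(row)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "", "c"},
                new String[] {"", "", ""},   // blank row: only empty Strings
                new String[] {"", "x", ""});
        System.out.println(removeBlankRows(rows)); // prints [[a, , c], [, x, ]]
    }
}
```

Because the check looks at the values rather than the array length, it does not matter how many columns the file has.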


Solution 1:[1]

opencsv 5.0 has an API option to read CSV lines directly into a bean.

For people who prefer the "CsvToBean" feature, the following solution uses the (sadly deprecated) #withFilter(..) method on CsvToBeanBuilder to skip blank lines in the InputStream:

InputStream inputStream; // provided
List<MyBean> data = new CsvToBeanBuilder<MyBean>(new BufferedReader(new InputStreamReader(inputStream)))
    .withType(MyBean.class)
    .withFilter(new CsvToBeanFilter() {
        /*
         * This filter ignores empty lines from the input
         */
        @Override
        public boolean allowLine(String[] strings) {
            for (String one : strings) {
                if (one != null && one.length() > 0) {
                    return true;
                }
            }
            return false;
        }
    }).build().parse();

Update: With opencsv release 5.1 (dated 2/2/2020), CsvToBeanFilter was undeprecated as per feature request #120.

Solution 2:[2]

You could concatenate all string values of a row after trimming them. If the resulting string is empty, there are no values in any cell. In that case ignore the line.
Something like this:

private boolean onlyEmptyCells(ArrayList<String> check) {
    StringBuilder sb = new StringBuilder();
    for (String s : check) {
        sb.append(s.trim());
    }
    return sb.toString().isEmpty(); //<- ignore 'check' if this returns true
}
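
A variant of the same idea, sketched under the assumption that cells may also be null: instead of building one long string, it stops at the first cell that still holds text after trimming. The class name TrimmedCellCheck is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper; the class name is hypothetical.
class TrimmedCellCheck {

    // True when every cell is null or contains only whitespace.
    static boolean onlyEmptyCells(List<String> cells) {
        for (String cell : cells) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false; // a real value was found, stop early
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(onlyEmptyCells(Arrays.asList("", "  ", null))); // prints true
        System.out.println(onlyEmptyCells(Arrays.asList("", " x ", "")));  // prints false
    }
}
```

Taking a List<String> rather than an ArrayList<String> lets the method accept any list implementation.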

Solution 3:[3]

Here is an updated solution with lambdas based on @Martin's solution:

InputStream inputStream; // provided
List<MyBean> data = new CsvToBeanBuilder<MyBean>(new BufferedReader(new InputStreamReader(inputStream)))
    .withType(MyBean.class)
    // This filter ignores empty lines from the input
    .withFilter(stringValues -> Arrays.stream(stringValues)
        .anyMatch(value -> value != null && value.length() > 0))
    .build()
    .parse();

Solution 4:[4]

If you do not parse into a bean, you can use the Java Streams API to filter out invalid CSV rows. My approach looks like this (where "is" is a java.io.InputStream instance with the CSV data and YourBean map(String[] row) is your mapping method that maps a CSV row to your Java object):

CSVParser csvp = new CSVParserBuilder()
    .withSeparator(';')
    .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
    .build();
CSVReader csvr = new CSVReaderBuilder(new InputStreamReader(is))
    .withCSVParser(csvp)
    .build();
List<YourBean> result = StreamSupport.stream(csvr.spliterator(), false)
    .filter(Objects::nonNull)
    .filter(row -> row.length > 0)
    .map(row -> map(row))
    .collect(Collectors.toList());
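
One thing to watch for with this configuration: assuming empty fields really do arrive as null, a row made up of nothing but separators would be an array of nulls, which still has a length greater than zero. The sketch below uses plain arrays in place of a CSVReader so that it runs without opencsv; the class name and method name are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical demo class; the rows stand in for what a reader might return.
class NullRowFilterDemo {

    // Keeps only rows that contain at least one non-null field.
    // Null rows and zero-length rows are dropped as well.
    static List<String[]> dropAllNullRows(List<String[]> rows) {
        return rows.stream()
                .filter(Objects::nonNull)
                .filter(row -> Arrays.stream(row).anyMatch(Objects::nonNull))
                .collect(Collectors.toList());
    }
}
```

In the pipeline above, the anyMatch filter could take the place of row.length > 0, since an empty array never matches.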

Solution 5:[5]

The JavaDoc for CsvToBeanFilter states "Here's an example showing how to use CsvToBean that removes empty lines. Since the parser returns an array with a single empty string for a blank line that is what it is checking." and lists an example of how to do this:

private class EmptyLineFilter implements CsvToBeanFilter {

    private final MappingStrategy strategy;

    public EmptyLineFilter(MappingStrategy strategy) {
        this.strategy = strategy;
    }

    public boolean allowLine(String[] line) {
        boolean blankLine = line.length == 1 && line[0].isEmpty();
        return !blankLine;
    }

}

public List<Feature> parseCsv(InputStreamReader streamReader) {
    HeaderColumnNameTranslateMappingStrategy<Feature> strategy = new HeaderColumnNameTranslateMappingStrategy();
    Map<String, String> columnMap = new HashMap();
    columnMap.put("FEATURE_NAME", "name");
    columnMap.put("STATE", "state");
    strategy.setColumnMapping(columnMap);
    strategy.setType(Feature.class);
    CSVReader reader = new CSVReader(streamReader);
    CsvToBeanFilter filter = new EmptyLineFilter(strategy);
    return new CsvToBean().parse(strategy, reader, filter);
}
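
Note that this check only matches a completely empty line. A row that contains only separators comes back as several empty strings, as described in the question, and would pass the filter. The plain-Java sketch below (no opencsv needed; the class and method names are illustrative) compares the two checks on sample arrays.

```java
import java.util.Arrays;

// Illustrative comparison; class and method names are hypothetical.
class BlankLineChecks {

    // The check from the JavaDoc example: a single empty string.
    static boolean isSingleEmptyField(String[] line) {
        return line.length == 1 && line[0].isEmpty();
    }

    // A broader check: every field is null or empty.
    static boolean allFieldsEmpty(String[] line) {
        return Arrays.stream(line).allMatch(f -> f == null || f.isEmpty());
    }

    public static void main(String[] args) {
        String[] emptyLine = {""};              // a truly empty line
        String[] separatorsOnly = {"", "", ""}; // e.g. a line containing only ",,"
        System.out.println(isSingleEmptyField(emptyLine));      // prints true
        System.out.println(isSingleEmptyField(separatorsOnly)); // prints false
        System.out.println(allFieldsEmpty(separatorsOnly));     // prints true
    }
}
```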

Solution 6:[6]

You can use a filter with a lambda, like below:

CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(new StringReader(CSV_HEADER + "\n" + lines))
    .withType(clazz)
    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
    .withSeparator(delimiter)
    .withSkipLines(skipLines)
    .withIgnoreLeadingWhiteSpace(true).withFilter(strings -> {
      for (String r : strings) {
        if (r != null && r.length() > 0) {
          return true;
        }
      }
      return false;
    }).build();

The lambda filter on its own:

.withFilter(strings -> {
      for (String r : strings) {
        if (r != null && r.length() > 0) {
          return true;
        }
      }
      return false;
    })

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 chris
Solution 3 Tomerikoo
Solution 4 Jiri Patera
Solution 5 theINtoy
Solution 6 ikarayel