Enum.filter not scalable?
I decode a CSV file (using https://hexdocs.pm/csv/) into a stream, and I filter this stream with Enum.filter. My problem is that the processing time does not grow linearly with the size of the CSV file:
```
% wc -l long.csv
10000 long.csv
% time mix run testcvs.exs long.csv
mix run testcvs.exs long.csv  3.08s user 0.50s system 242% cpu 1.479 total
% wc -l verylong.csv
100000 verylong.csv
% time mix run testcvs.exs verylong.csv
mix run testcvs.exs verylong.csv  98.08s user 3.24s system 117% cpu 1:25.93 total
```
With ten times the input it should take roughly ten times longer, but it actually takes about 57 times longer. Definitely not scalable. Does this mean that Enum.filter does not process the stream lazily, but instead loads everything into memory? Is there a more scalable way to filter a stream?
The code:
```elixir
Enum.at(System.argv(), 0)
|> File.stream!([:read], :line)
|> CSV.decode(separator: ?;)
|> Enum.filter(fn {:ok, line} -> Enum.at(line, 11) == "" end)
```
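For reference: `Enum.filter/2` is eager — it traverses the whole enumerable and returns a list, so every decoded row is materialized in memory, while `Stream.filter/2` builds a lazy composition that only does work when a consuming step runs. A minimal sketch of the lazy variant of the pipeline above, keeping the question's filenames, the `csv` hex package, and column index 11 (the final `Enum` step still forces the stream, but rows then flow through one at a time):

```elixir
# Lazy variant: Stream.filter returns a composed stream; nothing is
# read or decoded until the terminal Enum.to_list/1 consumes it.
System.argv()
|> Enum.at(0)
|> File.stream!([:read], :line)
|> CSV.decode(separator: ?;)
|> Stream.filter(fn {:ok, row} -> Enum.at(row, 11) == "" end)
|> Enum.to_list()
```

Note that this sketch, like the original, pattern-matches only on `{:ok, row}` and will raise a `FunctionClauseError` if the decoder emits an `{:error, reason}` tuple.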
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow