How large should my list of objects be to warrant the use of Java 8's parallelStream?

I have a list of objects from the database and I want to filter this list using the filter() method of the Stream class. New objects will be added to the database continuously, so the list could become very large, possibly thousands of objects. I want to use a parallelStream to speed up the filtering, but I was wondering roughly how large the list should be for a parallelStream to be beneficial. I've read this thread about it: Should I always use a parallel stream when possible? In that thread they agree that the dataset should be really large to get any benefit from a parallel stream. But how large is large? Say I have 200 records stored in my database and I retrieve them all for filtering: is using a parallel stream justified in this case? If not, how large should the dataset be? 1,000? 2,000 perhaps? I'd love to know. Thank you.



Solution 1:[1]

According to this, and depending on the operation, you would need at least 10_000, but that is not a count of elements. The figure applies to N * Q, where N is the number of elements and Q is the cost per element.
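To make the N * Q rule of thumb concrete, here is a minimal sketch. The class name `ParallelHeuristic`, the method names, and the idea of passing in a hand-estimated cost per element are all assumptions made for illustration; the "cost" unit is deliberately vague (think "roughly how many simple operations one element needs"), so treat the result as a hint about when parallelism is worth testing, not as a decision.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelHeuristic {
    // Rule-of-thumb threshold quoted in the answer; not a hard limit.
    static final long THRESHOLD = 10_000;

    // n = number of elements, q = estimated cost per element (hypothetical unit).
    static boolean worthTryingParallel(long n, long q) {
        return n * q >= THRESHOLD;
    }

    // Filters with a parallel stream only when N * Q reaches the threshold.
    static <T> List<T> filter(List<T> items, Predicate<T> keep, long costPerElement) {
        Stream<T> stream = worthTryingParallel(items.size(), costPerElement)
                ? items.parallelStream()
                : items.stream();
        return stream.filter(keep).collect(Collectors.toList());
    }
}
```

With 200 records and a trivial predicate (Q around 1), N * Q is 200 and the sketch stays sequential. The same 200 records with an expensive predicate (Q around 50) would reach the threshold.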

But this is only a general rule of thumb to measure against. Without measuring, it is close to impossible to say (read: guess) where the break-even point is; proper tests will prove you right or wrong.
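As a starting point for such a measurement, here is a crude timing sketch. The class `CrudeTiming`, its helper methods, the sample sizes, and the `x % 3 == 0` predicate are all made up for illustration. Hand-rolled timing like this is easily distorted by JIT compilation and garbage collection, so a proper harness such as JMH is the better tool for numbers you intend to rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class CrudeTiming {
    // Runs the work a few times to warm up, then returns the best observed time in nanoseconds.
    static long timeNanos(Supplier<?> work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            work.get();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    // Builds a list containing 0 .. n-1.
    static List<Integer> numbers(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        for (int n : new int[] {200, 2_000, 200_000}) {
            List<Integer> data = numbers(n);
            long seq = timeNanos(
                    () -> data.stream().filter(x -> x % 3 == 0).collect(Collectors.toList()), 20, 20);
            long par = timeNanos(
                    () -> data.parallelStream().filter(x -> x % 3 == 0).collect(Collectors.toList()), 20, 20);
            System.out.printf("n=%d sequential=%d ns parallel=%d ns%n", n, seq, par);
        }
    }
}
```

Swap in your own list and predicate; the numbers depend on the machine, the JVM, and the cost of the filter, which is exactly why they have to be measured.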

For simple operations, you will almost never actually need parallel processing to get a speed-up.

Another thing to mention is that this depends heavily on the source, specifically on how easy it is to split. Anything array-based or index-based is easy (and fast) to split, but a Queue or the lines of a File are not, so you will probably lose more time splitting than computing, unless, of course, there are enough elements to make up for it. And "enough" is something you have to measure.
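The difference in splittability can be observed directly through `Spliterator.trySplit()`, which is what a parallel stream uses to divide its source. The sketch below (the class `SplitDemo` and its helpers are hypothetical names) splits an `ArrayList` and a `LinkedList` once each and reports the sizes of the two parts. An index-based list can cut itself at the midpoint, while a linked list has no way to jump to the middle and has to walk its nodes to hand over a chunk.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.Spliterator;

class SplitDemo {
    // Returns {size of the split-off part, size of the remainder} after a single trySplit().
    static long[] splitOnce(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // may be null if the source refuses to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    // Adds 0 .. n-1 to the given collection and returns it.
    static <C extends Collection<Integer>> C filled(C target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    public static void main(String[] args) {
        long[] array = splitOnce(filled(new ArrayList<Integer>(), 10_000));
        long[] linked = splitOnce(filled(new LinkedList<Integer>(), 10_000));
        System.out.println("ArrayList  split: " + array[0] + " / " + array[1]);
        System.out.println("LinkedList split: " + linked[0] + " / " + linked[1]);
    }
}
```

Uneven or expensive splits mean uneven work per thread and extra overhead before any filtering starts, which is the cost the answer is warning about.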

Solution 2:[2]

From 'Modern Java in Action': "Although it may seem odd at first, often the fastest way to filter a collection...is to convert it to a stream, process it in parallel, and then convert it back to a list"
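For reference, the round trip the quote describes (list to parallel stream and back to a list) looks roughly like this. The `Order` class, its fields, and the `largeOrders` method are hypothetical stand-ins for the database records in the question. Whether this is actually faster for a given list is still something to measure.

```java
import java.util.List;
import java.util.stream.Collectors;

class RoundTrip {
    // Hypothetical stand-in for a row loaded from the database.
    static final class Order {
        final int id;
        final double total;

        Order(int id, double total) {
            this.id = id;
            this.total = total;
        }
    }

    // List -> parallel stream -> filter -> back to a List.
    static List<Order> largeOrders(List<Order> orders, double minTotal) {
        return orders.parallelStream()
                .filter(o -> o.total >= minTotal)
                .collect(Collectors.toList());
    }
}
```

Because the source is a List and the result is gathered with `Collectors.toList()`, the filtered elements keep the order they had in the original list even though the filtering runs on several threads.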

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Eugene
Solution 2