'Bloom, Cuckoo filters are too big for numbers
I'm trying to use bloom/cuckoo filter to check if i already stored a tuple of 2 numbers f.e. (7,25), (47,1576), ... etc
In [12]: from cuckoo.filter import ScalableCuckooFilter
In [13]: c = ScalableCuckooFilter(initial_capacity=1000000, error_rate=0.0001)
In [14]: c.filters[0].buckets.buffer_info()
Out[14]: (140042688086032, 8500000, 'big', 0, 8500000, 0, 0, 0)
In [15]: 8500000/(1024*1024)
Out[15]: 8.106 MB
to store the numbers i would need 2bytes per number, so 1mln * 4bytes = 3.8 Mb
So the filter is bigger than the data !
Is there a way or structure that is suitable for numbers with lower memory footprint ?
Solution 1:[1]
There is a lower bound on the size of approximate membership queries like cuckoo filters and Bloom filters. It is -log2(epsilon), which for you is 13.3 bits. You can get close to this by using bumped ribbon filters.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jbapple |
