'Partition a list of strings into n number of partitions which are deterministically reproducible, given n can change
I have this requirement wherein I need to assign every string in a list, a partition in the range [1,n] randomly. These partitions should be reproducible and should not change on varying n.
I am able to achieve the first requirement pretty easily using hashlib
in python. The issue is the second requirement.
For eg:
import hashlib
string = "johndoe"
digest = hashlib.md5(string.encode()).hexdigest()
print("Partition id:", int(digest,16)%8)
This gives me 7 for n = 8
. But if n = 9
, the partition id changes to 4
How can I retain and reproduce the partitions for a given string on varying n?
The number of partitions n is a number in [8,50] The size of the list is < 200,000
If reproducing is not possible, is a solution with minimum movement of strings between the partitions on the cards?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|