'Why does HashMap need a cryptographically secure hashing function?
I'm reading a Rust book about HashMap hashing functions, and I can't understand these two sentences.
By default, HashMap uses a cryptographically secure hashing function that can provide resistance to Denial of Service (DoS) attacks. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it.
I know what a cryptographically secure hash function is, but don't I understand the rationale behind it. From my understanding a good hash function for HashMap should only have three properties:
- deterministic (the same object has same hash value)
- be VERY fast,
- has a uniform distribution of bits in hash value (meaning it will reduce collision)
Other properties, in cryptographically secure hash function, are not really relevant 99% (maybe even 99.99%) of the time for hash tables.
So my question is: What does "resistance to DoS attack and better security " even mean in the context of HashMap?
Solution 1:[1]
Let's say we use HashMap to store some user data in a web-application. Suppose that users can choose (part of) the key in some way – maybe the key is a username or a filename of an uploaded file or anything like that.
If we are not using a cryptographically secure hash function, this means that the attacker could possible craft multiple inputs that all map to the same output. Of course, a hash map has to deal with collisions, because they occur naturally.
But when unnaturally many collisions occur, the hash map implementation might do strange things. For example, looking up some keys could have a runtime of O(n). Or the hash map might think that it has to grow because of all the collisions; but growing won't solve the problem, so the hash map grows until all memory is used. In either case, it's bad. Hash maps just assume that statistically, collisions rarely occur.
Of course, this is not a "stealing user data" attack -- at least not directly. But if one part of a system is weak, this makes it easier for attackers to find other weaknesses.
A cryptographically secure hash function prevents this attack, since the attacker cannot possibly craft multiple keys that map to the same value (at least not without trying out all keys).
is not really relevant 99% (maybe even 99.99%) of the time for hash tables.
Yes, probably. But this is difficult to balance. I guess we all would agree that if 20% of users would have security problems in their application due to an unsecure hash function (while 80% don't care), it's still a good idea to use the "secure by default" approach. What about 5%/95%? What about 1%/99%? Hard to tell where the threshold is, right?
There has been a ton of discussion about this already. Because yes, most people only notice the slowness of the hash map. Maybe the situation I described above is incredibly rare and it isn't worth slowing down all other users' code by default. But this has been decided, the default hash function won't change, and luckily you can choose your own hash function.
Solution 2:[2]
If a server application stores user input (such as post data in a web application) in a hash table, a malicious user may try to provide a large number of inputs that all have the same hash value, leading to a large number of hash collisions and thus slowing down operations on the map significantly, to the point that it can be used as a DoS attack (as described in this article for example).
If the hash is cryptographically secure, attackers will have a much harder time trying to find inputs with the same hash value.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Lukas Kalbertodt |
| Solution 2 | sepp2k |
