'Hash Function for a 7 digits int
I'm new to hash tables and functions, so I apologize in advance if I got anything wrong.
I'm trying to create a hash table in C++ for a list of about 100k entries comprised of a 7 digit number. The thing is, I got stuck while trying to figure out what hash function to use.
When using %100000 I got ~65k unique keys, while there are ~90k unique entries. Which means that about 1/3 of the data will have collisions.
Is this a good hash function to use? Or is there a better function to use in that case in order to have less collisions? What size should my table be?
Thanks again!
Edit- The entries are numbers between 1 and 2 mil. Is it possible to use the number itself as the key? Or should keys for hash table always start at 0?
Solution 1:[1]
The standard library comes with two types internally using a hash table: std::unordered_map and std::unordered_set.
As your key type is an integral type you get a hash table pretty conveniently by std::unordered_map<YourIdType, YourDataType. You can easily access the data via theMap[someId], but be aware that if the key is not found a new object is created! If that's not desired you'd rather use theMap.find(someId), which returns an iterator.
The drawback, though, is that you store the id twice then (internally as a std::pair<YourIdType, YourDataType>). You can avoid that by using a std::unordered_set. To be able to do so, though, you need to specialise std::hash and std::equal_to for your type:
namespace std // you are not allowed to add to – with exception of specialisations
{
template<>
struct hash<YourDataType>
{
size_t operator()(YourDataType const& object) const
{
return hash<YourIdType>()(object.id);
}
};
// analogously equal_to with appropriate comparisons, possibly by
// comparing the object's addresses
Alternatively you can provide a custom hasher type (with C++20 that can even be a lambda packed into decltype) to the set as second template parameter and just implement operator== for your object type, or provide a custom equality comparer type if you need it to compare differently than the operator, maybe like:
// C++20 required:
using YourMapType = std::set
<
YourDataType,
decltype
(
[](YourDataType const& object)
{ return std::hash<YourIdType>()(object.id); }
),
decltype
(
[](YourDataType const& o1, YourDataType const& o2)
{ return &o1 == &o2; } // TODO: comparisons as you need!
)
>;
// alternatively create custom types with appropriate operator() implementations
Drawback here is – apart from additional complexity for the specialisations – that you cannot lookup objects by the id only, instead you need a complete object of your data type.
So which one is more appropriate/suitable? That depends on your concrete requirements...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
