How can I hash multiple unordered objects? (strings)

I'm looking for a way to get a hash value from a group of strings such that the same hash is returned no matter what order the strings are in.

One way I guess would be to sort them before hashing. But I wonder if there's something more elegant.



Solution 1:[1]

Let's say your strings include only lowercase English letters (a-z), for example:

  • lorem
  • ipsum
  • sit
  • amet

Here you could build a simple character histogram for each column (letter position). The characters serve as the indexes, and each array value is the number of times that character occurs in the column. For the first column:

vector<int> _1st_column_hist(26, 0);
_1st_column_hist['l' - 'a'] => 1
_1st_column_hist['i' - 'a'] => 1
_1st_column_hist['s' - 'a'] => 1
_1st_column_hist['a' - 'a'] => 1

//other values will be 0.

Do the same for the other columns (letter positions). You end up with a 2D vector. To clarify:

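// max_word_length is the length of the longest word in the group (5 for the words above).
vector< vector<int> > my_vector(max_word_length, vector<int>(26, 0));
my_vector[1][25] => the number of 'z' characters in the second position across all words in your group.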
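
A minimal sketch of how that 2D vector could be filled, assuming the input really is limited to a-z. The name build_histogram is hypothetical and not part of the original answer, and characters outside a-z are simply skipped here:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): builds the 2D histogram.
// hist[pos][c - 'a'] is the number of words whose character at index pos is c.
// Assumes lowercase a-z input; any other character is skipped.
std::vector<std::vector<int>> build_histogram(const std::vector<std::string>& words) {
    // The outer dimension must cover the longest word in the group.
    std::size_t max_len = 0;
    for (const std::string& w : words) {
        if (w.size() > max_len) max_len = w.size();
    }

    std::vector<std::vector<int>> hist(max_len, std::vector<int>(26, 0));
    for (const std::string& w : words) {
        for (std::size_t pos = 0; pos < w.size(); ++pos) {
            const char c = w[pos];
            if (c < 'a' || c > 'z') continue;
            ++hist[pos][c - 'a'];
        }
    }
    return hist;
}
```

For the four example words this yields a table with 5 rows of 26 counters. Because each word only increments counters, the result does not depend on the order in which the words are visited.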

OK, but where is the hash table? I must say that your question is more about histograms than hashing.

This 2D vector describes one group of strings. I assume you have multiple groups, so your vector needs another dimension. To check whether your lookup table already contains a new group of strings, you need to write group_of_strings_2_hist and compare_hist_with_old_ones functions and compare the new group's histogram with each element of the lookup table.
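
One possible shape for those two functions is sketched below. Only the function names come from the answer. The signatures, the Hist alias, the return convention (index of the match, or -1) and the linear scan are assumptions made for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed alias: one histogram is a (max word length) x 26 table of counts.
using Hist = std::vector<std::vector<int>>;

// Builds the per-position letter histogram for one group of strings.
// Assumes lowercase a-z input; any other character is skipped.
Hist group_of_strings_2_hist(const std::vector<std::string>& words) {
    std::size_t max_len = 0;
    for (const std::string& w : words) {
        if (w.size() > max_len) max_len = w.size();
    }
    Hist hist(max_len, std::vector<int>(26, 0));
    for (const std::string& w : words) {
        for (std::size_t pos = 0; pos < w.size(); ++pos) {
            const char c = w[pos];
            if (c < 'a' || c > 'z') continue;
            ++hist[pos][c - 'a'];
        }
    }
    return hist;
}

// Scans the lookup table and returns the index of the first stored histogram
// equal to `candidate`, or -1 if no stored histogram matches.
int compare_hist_with_old_ones(const std::vector<Hist>& table, const Hist& candidate) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i] == candidate) return static_cast<int>(i);
    }
    return -1;
}
```

Equal histograms do not guarantee equal groups: {"ab", "cd"} and {"ad", "cb"} produce the same counts, so a match may need to be confirmed against the stored strings if exact identity matters.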

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dharman