Python: Calculate score of each pair for WordPiece algorithm
I have a dictionary, splits, as follows:
{'A': ['A'],
'man': ['m', '##a', '##n'],
'’': ['’'],
's': ['s'],
'favorite': ['f', '##a', '##v', '##o', '##r', '##i', '##t', '##e'],
'donkey': ['d', '##o', '##n', '##k', '##e', '##y'],
'falls': ['f', '##a', '##l', '##l', '##s'],
'into': ['i', '##n', '##t', '##o'],
'a': ['a']}
I'm trying to implement the WordPiece algorithm and need to compute the score of each pair of adjacent tokens using the formula:
score = frequency of pair / (frequency of first element * frequency of second element)
frequency of pair: number of times the pair has occurred in the splits.
frequency of first element: number of times the first element has occurred.
frequency of second element: number of times the second element has occurred.
The output needs to be as follows:
The problem is I can't figure out how to separate individual pairs from the dictionary and then count their instances. Can anybody help me please?
Thanks.
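One possible approach, as a minimal sketch rather than a definitive implementation: walk over each token list, pair every token with its right-hand neighbour using `zip(tokens, tokens[1:])`, and count both the individual tokens and the pairs with `collections.Counter`. The function name `compute_pair_scores` is hypothetical. The sketch also assumes every word in `splits` counts exactly once, since the question gives no per-word corpus frequencies; if such frequencies exist, each count would need to be weighted by them.

```python
from collections import Counter

splits = {
    "A": ["A"],
    "man": ["m", "##a", "##n"],
    "’": ["’"],
    "s": ["s"],
    "favorite": ["f", "##a", "##v", "##o", "##r", "##i", "##t", "##e"],
    "donkey": ["d", "##o", "##n", "##k", "##e", "##y"],
    "falls": ["f", "##a", "##l", "##l", "##s"],
    "into": ["i", "##n", "##t", "##o"],
    "a": ["a"],
}


def compute_pair_scores(splits):
    """Return {(first, second): score} for every adjacent token pair.

    Assumption: each word contributes a count of 1 (no corpus frequencies).
    """
    token_freqs = Counter()
    pair_freqs = Counter()
    for tokens in splits.values():
        # Count every individual token in this word.
        token_freqs.update(tokens)
        # zip(tokens, tokens[1:]) yields each token with its right neighbour;
        # single-token words yield no pairs.
        pair_freqs.update(zip(tokens, tokens[1:]))
    return {
        (first, second): freq / (token_freqs[first] * token_freqs[second])
        for (first, second), freq in pair_freqs.items()
    }


scores = compute_pair_scores(splits)
for pair, score in scores.items():
    print(pair, score)
```

For example, the pair `('f', '##a')` occurs twice (in `favorite` and `falls`), `f` occurs twice and `##a` three times, so its score under these assumptions is 2 / (2 * 3), roughly 0.333.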
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

