How can I create a dictionary from an unordered list, where the list contains the keys which are then followed by multiple values?

I have multiple lists that are structured like the following one:

['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

I need to transform this list into a dictionary, using the words that end with ":" as keys. The lists vary, so new words ending with ":" are sometimes added. The corresponding values always come directly after the word with ":" in the list.

When I start iterating over the list, it gets frustrating very quickly because there are too many possibilities for me to handle at the moment. So I would like to ask whether anyone knows a fast way to transform such a list into a dictionary.

I tried multiple iterating processes like the one here to access the words with ':':

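import math

checkwords = []
for charnum_list in df_new.char_num:
    try:
        # Lists whose items are all numeric (e.g. NaN) pass silently and are skipped.
        for charnum in charnum_list:
            math.isnan(charnum)
    except TypeError:
        # math.isnan() raises TypeError for strings, so lists of strings end up here.
        for charnum in charnum_list:
            charnum_new = charnum.replace('HP:','HP')
            charnum_new = charnum_new.replace('<','').replace('>','').split(' ')
            for word in charnum_new:
                checkwords.append(word)
# Keep each distinct word that contains a colon.
diagnosis_dictionaries = list(set([word for word in checkwords if ':' in word]))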

output:

diagnosis_dictionaries:

['HPO:', 'ICD9CM:', 'SNOMEDCT:', 'UMLS:', 'ICD10CM:']

Then I tried to iterate again, comparing the lists that contain keys and values against the list of keys above, but at this point I am really desperate because none of my ideas worked out well.

It would be great if someone had a good idea or a better solution than mine.



Solution 1:[1]

If I interpret your question correctly then I think you're looking to do this:

lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

dct = dict()
k = None
for e in lst:
    if e[-1] == ':':
        k = e[:-1]
    else:
        if k is not None:
            dct.setdefault(k, []).append(e)
    
print(dct)

Output:

{'SNOMEDCT': ['263681008,', '771269000'], 'UMLS': ['C0443147,', 'C1867440', 'C0443147'], 'HPO': ['HP0000006', 'HP0000006']}

Note:

The "if k is not None" test is not needed for the sample data in the question. However, if the list is modified so that it starts with elements that do not end with a colon, those elements are ignored instead of causing an error. The element types are not checked; they are assumed to be strings. As the output shows, the values also keep any trailing comma.
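If the trailing commas or the repeated values are unwanted, the loop above could be wrapped in a small function. The following is only a sketch: the name list_to_dict and the unique parameter are hypothetical, and skipping empty strings is an assumption about what the data should look like. It uses str.endswith instead of e[-1], which also avoids an IndexError on empty strings.

```python
def list_to_dict(items, unique=False):
    """Group values under the most recent element that ends with ':'.

    `unique` is a hypothetical option: when True, a value is stored
    only once per key (first occurrence wins, order is preserved).
    """
    result = {}
    key = None
    for item in items:
        if item.endswith(":"):
            key = item[:-1]
            result.setdefault(key, [])
        elif key is not None:
            value = item.rstrip(",")  # drop the trailing comma, if any
            # Assumption: empty strings carry no information and are skipped.
            if value and not (unique and value in result[key]):
                result[key].append(value)
    return result


lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440',
       'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

print(list_to_dict(lst, unique=True))
# {'SNOMEDCT': ['263681008', '771269000'], 'UMLS': ['C0443147', 'C1867440'], 'HPO': ['HP0000006']}

# Several lists at once, skipping entries that are not lists (e.g. NaN).
# The second list is made-up illustration data.
rows = [lst, float("nan"), ["ICD10CM:", "example_code"]]
dicts = [list_to_dict(row) for row in rows if isinstance(row, list)]
```

Unlike the loop above, this variant also records a key that has no values, with an empty list.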

Solution 2:[2]

You can use itertools.groupby to create the dictionary (the := operator used below requires Python 3.8 or later). For example:

from itertools import groupby


lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']


out = {}
for k, g in groupby(lst, lambda i: i.endswith(":")):
    if k:
        out.setdefault(key := next(g).strip(":"), [])
    else:
        out[key].extend(map(lambda s: s.strip(","), g))

print(out)

Prints:

{
    "SNOMEDCT": ["263681008", "771269000"],
    "UMLS": ["C0443147", "C1867440", "C0443147"],
    "HPO": ["HP0000006", "HP0000006"],
}
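Two edge cases are worth keeping in mind with the groupby approach as written: a list that starts with a value raises a NameError because key has not been assigned yet, and when two keys are adjacent they land in the same group, so only the first one is read. A possible variant that tolerates both might look like the sketch below. The function name groupby_to_dict is hypothetical, and ignoring leading values is an assumption that mirrors Solution 1.

```python
from itertools import groupby


def groupby_to_dict(items):
    """Build the dictionary with groupby, tolerating leading values
    and runs of adjacent keys."""
    out = {}
    key = None
    for is_key, group in groupby(items, lambda s: s.endswith(":")):
        if is_key:
            # Register every key in the run; the last one becomes current.
            for s in group:
                key = s.rstrip(":")
                out.setdefault(key, [])
        elif key is not None:
            # Values that appear before any key are ignored (assumption).
            out[key].extend(s.rstrip(",") for s in group)
    return out


print(groupby_to_dict(['x', 'A:', 'B:', '1,', '2']))
# {'A': [], 'B': ['1', '2']}
```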

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Andrej Kesely