How do I convert this list to a dict?

The list I have:

[
    "Mathematics-2 (21SMT-125)",
    "Mid-Semester Test-1",
    "40",
    "23.5",
    "Mid-Semester Test-2",
    "40",
    "34",
    "Disruptive Technologies - 2 (21ECH-103)",
    "Experiment-1",
    "20",
    "19",
    "Experiment-2",
    "20",
    "17",
    "Experiment-3",
    "20",
    "18.5",
]

This list of strings is parsed from HTML using bs4.

Format to convert it into:

{
    "Subject": {
        "Mathematics-2 (21SMT-125)": {
            "Mid-Semester Test-1": [40,23.5],
            "Mid-Semester Test-2": [40,34]
        },
        "Disruptive Technologies - 2 (21ECH-103)": {
            "Experiment-1": [20,19],
            "Experiment-2": [20,17],
            "Experiment-3": [20,18.5]
        }
    }
}


Solution 1:[1]

The problem is that the list you provided is a flat list of items with no indicator of their hierarchical position in the desired structure.

One approach to consider: if the entries that represent a parent object (Mathematics, etc.) are the only entries that contain parentheses, you could iterate over the list and use either string matching or a regex to identify each parent and create a top-level object for it. Each non-parent entry that follows becomes a key under that parent, and the next two entries become its value, stored as a list.

This assumes that you'll always have two subsequent values at the child level. If the number of attributes isn't fixed but they're always numeric, you could use a regex to determine whether an entry is numeric or non-numeric and keep adding items to the value list until you hit another non-numeric entry, which would be treated as the next sibling in the hierarchy.
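
The approach described above can be sketched roughly as follows. This is only an illustration under the stated assumptions: subjects are the only entries containing parentheses, and scores are plain non-negative numbers. The names `build_nested`, `to_number` and `NUMERIC` are made up for this example.

```python
import re

# Assumption: a score is any entry that looks like a plain non-negative
# integer or decimal number; a subject is any entry containing parentheses.
NUMERIC = re.compile(r"\d+(\.\d+)?")


def to_number(text):
    """Return an int where possible, otherwise a float."""
    return int(text) if text.isdigit() else float(text)


def build_nested(items):
    subjects = {}
    current_subject = None  # dict for the subject being filled
    current_scores = None   # list for the test/experiment being filled
    for item in items:
        if NUMERIC.fullmatch(item):
            if current_scores is None:
                raise ValueError(f"score {item!r} appears before any test name")
            current_scores.append(to_number(item))
        elif "(" in item and ")" in item:
            # Parent entry: start (or reopen) a top-level subject.
            current_subject = subjects.setdefault(item, {})
            current_scores = None
        else:
            if current_subject is None:
                raise ValueError(f"test {item!r} appears before any subject")
            # Child entry: following numeric items are appended to this list.
            current_scores = current_subject.setdefault(item, [])
    return {"Subject": subjects}


data = [
    "Mathematics-2 (21SMT-125)",
    "Mid-Semester Test-1", "40", "23.5",
    "Disruptive Technologies - 2 (21ECH-103)",
    "Experiment-1", "20", "19",
]
print(build_nested(data))
```

Because scores are appended until the next non-numeric entry, this handles any number of values per test, not just two. It will misclassify a test name that happens to contain parentheses.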

Solution 2:[2]

I would review the approach and check whether the information can be extracted with bs4 in a smarter way: scrape in several steps, first reaching the subject, then the "Semester/Experiment" entry, and finally the grades.

If that's not possible and the data returned from bs4 cannot be changed, the only thing you can do is try to determine whether each string is the name of a subject, the name of a semester test or experiment, or a grade/score, and walk the list with a while loop. The subject name seems to end with a special code, which a regex can distinguish from the name of a test or experiment, and a grade/score can always be parsed as a number.
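
To illustrate the first suggestion, here is a rough sketch of scraping in three steps. The question does not show the actual page, so the markup in `SAMPLE_HTML` (a `div.subject` wrapper, an `h3` title, one table row per test) is purely hypothetical, and `parse_marks` is a made-up name. The real selectors would have to be adapted to the page being scraped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not shown in the question, so every
# tag and class name below is an assumption made for illustration only.
SAMPLE_HTML = """
<div class="subject">
  <h3>Mathematics-2 (21SMT-125)</h3>
  <table>
    <tr><td>Mid-Semester Test-1</td><td>40</td><td>23.5</td></tr>
    <tr><td>Mid-Semester Test-2</td><td>40</td><td>34</td></tr>
  </table>
</div>
<div class="subject">
  <h3>Disruptive Technologies - 2 (21ECH-103)</h3>
  <table>
    <tr><td>Experiment-1</td><td>20</td><td>19</td></tr>
  </table>
</div>
"""


def parse_marks(html):
    soup = BeautifulSoup(html, "html.parser")
    subjects = {}
    # Step 1: one container per subject.
    for block in soup.find_all("div", class_="subject"):
        name = block.find("h3").get_text(strip=True)
        tests = subjects.setdefault(name, {})
        # Step 2: one table row per test or experiment.
        for row in block.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if not cells:
                continue  # skip header or empty rows
            # Step 3: the remaining cells in the row are the scores.
            tests[cells[0]] = [float(c) for c in cells[1:]]
    return {"Subject": subjects}


print(parse_marks(SAMPLE_HTML))
```

Reading the hierarchy straight from the HTML avoids having to guess afterwards which strings are subjects, tests or scores.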

Solution 3:[3]

For data exactly like yours (where a string with a ( denotes a top-level entry, and there are always two numbers per entry), you could come up with a state machine sort of thing like this -- but like I commented, you really should improve your parsing code instead, since the HTML you're scraping your data off is likely already structured.

def is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


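def parse_inp(inp):
    flat_map = {}
    stack = []  # path of names leading to the current entry
    x = 0
    while x < len(inp):
        # A string containing "(" starts a new top-level subject.
        if "(" in inp[x]:
            stack.clear()
        # Two consecutive numbers are the values for the current path.
        if x + 1 < len(inp) and is_float(inp[x]) and is_float(inp[x + 1]):
            flat_map[tuple(stack)] = (float(inp[x]), float(inp[x + 1]))
            x += 2
            stack.pop()  # drop the entry name, keep the subject
            continue
        stack.append(inp[x])
        x += 1
    return flat_map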


def nest_flat_map(flat_map):
    root = {}
    for key_path, values_list in flat_map.items():
        dst = root
        for key in key_path[:-1]:
            dst = dst.setdefault(key, {})
        dst[key_path[-1]] = values_list
    return root

inp = [
    # ... data from original post
]
nested_map = nest_flat_map(parse_inp(inp))
print(nested_map)

This outputs the expected structure (formatted here for readability):

{
    "Mathematics-2 (21SMT-125)": {
        "Mid-Semester Test-1": (40.0, 23.5),
        "Mid-Semester Test-2": (40.0, 34.0),
    },
    "Disruptive Technologies - 2 (21ECH-103)": {
        "Experiment-1": (20.0, 19.0),
        "Experiment-2": (20.0, 17.0),
        "Experiment-3": (20.0, 18.5),
    },
}
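
The result above differs from the format in the question in two small ways: there is no top-level "Subject" key, and the scores are tuples of floats rather than lists. If the exact shape matters, a small post-processing step along these lines could be applied. This is a sketch that assumes exactly two levels of nesting; `to_requested_shape` and `tidy_number` are hypothetical helper names.

```python
# Sample of what nest_flat_map() returns for the data in the question.
nested_map = {
    "Mathematics-2 (21SMT-125)": {
        "Mid-Semester Test-1": (40.0, 23.5),
        "Mid-Semester Test-2": (40.0, 34.0),
    },
}


def tidy_number(value):
    """Turn 40.0 into 40 but leave 23.5 alone."""
    return int(value) if value.is_integer() else value


def to_requested_shape(nested):
    # Wrap everything under "Subject" and convert score tuples to lists.
    return {
        "Subject": {
            subject: {
                test: [tidy_number(v) for v in scores]
                for test, scores in tests.items()
            }
            for subject, tests in nested.items()
        }
    }


print(to_requested_shape(nested_map))
```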

Solution 4:[4]

You can use a fuzzy form of itertools.groupby to find the groups in this list of strings. This assumes that every class ends with the pattern "(classref-section)", and that it is followed by test or homework names each followed by one or more numeric scores.

source_data = [
    "Mathematics-2 (21SMT-125)",
    "Mid-Semester Test-1",
    "40",
    "23.5",
    "Mid-Semester Test-2",
    "40",
    "34",
    "Disruptive Technologies - 2 (21ECH-103)",
    "Experiment-1",
    "20",
    "19",
    "Experiment-2",
    "20",
    "17",
    "Experiment-3",
    "20",
    "18.5",
]

from collections import defaultdict
import itertools
import json
import re


class_id_pattern = re.compile(r"\([A-Z0-9]+-\d+\)")

def is_class_reference(s):
    return bool(class_id_pattern.match(s.rsplit(" ", 1)[-1]))

def group_by_class(s):
    if is_class_reference(s):
        group_by_class.current_class = s
    return group_by_class.current_class

group_by_class.current_class = ""


def convert_numeric(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return None

def is_score(s):
    return convert_numeric(s) is not None

def is_test(s):
    return not is_score(s)

def group_by_test(s):
    if is_test(s):
        group_by_test.current_test = s
    return group_by_test.current_test

group_by_test.current_test = ""


accum = defaultdict(lambda: defaultdict(list))

for class_name, class_name_and_tests in itertools.groupby(source_data, key=group_by_class):
    class_name, *tests = class_name_and_tests
    for test_name, test_name_and_scores in itertools.groupby(tests, key=group_by_test):
        test_name, *scores = test_name_and_scores
        accum[class_name][test_name].extend(convert_numeric(s) for s in scores)
print(json.dumps(accum, indent=4))

Prints:

{
    "Mathematics-2 (21SMT-125)": {
        "Mid-Semester Test-1": [
            40,
            23.5
        ],
        "Mid-Semester Test-2": [
            40,
            34
        ]
    },
    "Disruptive Technologies - 2 (21ECH-103)": {
        "Experiment-1": [
            20,
            19
        ],
        "Experiment-2": [
            20,
            17
        ],
        "Experiment-3": [
            20,
            18.5
        ]
    }
}

Read more about fuzzy groupby in my blog post: https://thingspython.wordpress.com/2020/11/11/fuzzy-groupby-unusual-restaurant-part-ii/
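
One thing to be aware of: `group_by_class` and `group_by_test` keep their state in function attributes, so that state carries over if the same functions are reused on another list. A possible variant, shown here only as a sketch, builds a fresh key function from a closure each time. `sticky_key` is a made-up name, not part of `itertools`.

```python
import itertools


def sticky_key(is_header):
    """Build a groupby key that remembers the most recent header item."""
    current = None

    def key(item):
        nonlocal current
        if is_header(item):
            current = item
        return current

    return key


def is_test_name(s):
    # Assumption: anything that is not a plain number is a test name.
    return not s.replace(".", "", 1).isdigit()


items = ["Experiment-1", "20", "19", "Experiment-2", "20", "17"]

groups = {}
for header, group in itertools.groupby(items, key=sticky_key(is_test_name)):
    _, *members = group  # the first element of each group is the header itself
    groups[header] = members

print(groups)  # {'Experiment-1': ['20', '19'], 'Experiment-2': ['20', '17']}
```

Each call to `sticky_key` returns an independent key function, so nested or repeated `groupby` calls do not share state.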

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 bindlegrunt
Solution 2 Robert Radzik
Solution 3 AKX
Solution 4