'How to parse a JSON/dict-like value from an Excel sheet and copy over to another column?

I have an Excel sheet, where data is entered in this format:

{"id":"03CD2B3C22558D83","imp":6667, floor:7777}

How can I in python read and extract a certain value of an attribute in Excel, and copy this on another column? For example, I want to obtain the value 6667.



Solution 1:[1]

This works, simple and no error checking:

enter image description here

MID(A1,FIND("imp",A1,1)+5,4)

Solution 2:[2]

You can use openpyxl to work with Excel files.
The openpyxl tutorial should help you get started on how to use it.

I'll assume this kind of value:

{"id":"03CD2B3C22558D83","imp":6667, floor:7777}

is stored on each cell, like this:

screenshot of Excel file with cells A1-A5 with JSON string values

Using openpyxl, you first read the value of each cell as a string.

from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# Get the value of cell A1 - A5
# In openpyxl, rows/columns start at index 1
for row_idx in range(1, 6):
    cell = ws.cell(row=row_idx, column=1)
    print(type(cell.value), cell.value)
$ python test.py
<class 'str'> {"id":"03CD2B3C22558D83","imp":6667, floor:7777}
<class 'str'> {"id":"1111111111111111","imp":1111, floor:1000}
<class 'str'> {"id":"2222222222222222","imp":2222, floor:2000}
<class 'str'> {"id":"3333333333333333","imp":3333, floor:3000}
<class 'str'> {"id":"4444444444444444","imp":4444, floor:4000

Then, once you can access each cell value, convert it to a regular Python dictionary.

Now, normally you can use json module's .loads to convert. But the problem is that the JSON string is malformed, it's missing double-quotes "" around floor. You can check on https://jsonlint.com/:

JSONLint error

It would have worked nicely if floor was wrapped in double-quotes:

>>> import json
>>> s = '{"id":"03CD2B3C22558D83","imp":6667, "floor":7777}'
>>> d = json.loads(s)
>>> d["imp"]
6667

If you have no control over how the data was written to the Excel file, check the workarounds for parsing a somewhat wrong JSON with Python. A quick workaround I usually use is the dirtyjson library:

>>> import dirtyjson
>>> s = '{"id":"03CD2B3C22558D83","imp":6667, floor:7777}'
>>> d = dirtyjson.loads(s)
>>> d
AttributedDict([('id', '03CD2B3C22558D83'), ('imp', 6667), ('floor', 7777)])
>>> d = dict(d)
>>> d
{'id': '03CD2B3C22558D83', 'imp': 6667, 'floor': 7777}
>>> d["imp"]
6667

So, assuming there are no typos in the example data, let's use dirtyjson to convert to a dict, then access the imp value:

import dirtyjson
from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# For each value of cell A1 - A5
#   Convert the value to a dictionary
for row_idx in range(1, 6):  # In openpyxl, rows/columns start at index 1
    cell = ws.cell(row=row_idx, column=1)
    data = dict(dirtyjson.loads(cell.value))
    print(data, data["imp"])
$ python test.py
{'id': '03CD2B3C22558D83', 'imp': 6667, 'floor': 7777} 6667
{'id': '1111111111111111', 'imp': 1111, 'floor': 1000} 1111
{'id': '2222222222222222', 'imp': 2222, 'floor': 2000} 2222
{'id': '3333333333333333', 'imp': 3333, 'floor': 3000} 3333
{'id': '4444444444444444', 'imp': 4444, 'floor': 4000} 4444

Finally, simply write the value to another column. Let's assume, you want to copy it over to the B column on the same row. You can pick any other column or cell, just remember that openpyxl processes cell rows and columns with 1-index notation, so for the B column, that would be column=2.

import dirtyjson
from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# For each value of cell A1 - A5
#   Convert the value to a dictionary
#   Write the "imp" value to the B column
for row_idx in range(1, 6):  # In openpyxl, rows/columns start at index 1
    cell = ws.cell(row=row_idx, column=1)
    data = dict(dirtyjson.loads(cell.value))
    ws.cell(
        row=row_idx,
        column=2,  # B
        value=data["imp"],
    )

# DO NOT FORGET TO SAVE AND CLOSE WHEN DONE!
# Also, keep the file closed when writing
wb.save(filename="test_output.xlsx")
wb.close()

sample output Excel file with values written to B column


Tested with

  • Python 3.9.10
  • openpyxl 3.0.9
  • dirtyjson 1.0.7
  • macOS 10.15.7 with Microsoft Excel for Mac 16.6.6

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Solar Mike
Solution 2