Python Feature Engineering - How to create a dictionary for the words in the text of each bounding box (words captured via EasyOCR)

I have a requirement that consists of three steps, of which I have been able to complete only the first.

  1. Given a resume, draw a bounding box around each text paragraph or phrase and extract the text from each box. I have completed this step with OpenCV and EasyOCR: I can output the results to a CSV file or a DataFrame, with the bounding box coordinates and the text in separate columns.

  2. Create a set of dictionaries from the texts of all bounding boxes. In each key-value pair, the key should be a word from the text of a bounding box, and the value should be the words similar to it, such as the results of a word2vec similarity lookup (for example, with the gensim library).

  3. Map these dictionaries back to the corresponding bounding boxes of the PDF, image, or resume document.
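
One possible shape for steps 2 and 3 is sketched below, only as an illustration under stated assumptions. It assumes each OCR row has already been reduced to a (bounding box, text) pair with the box flattened to (x_min, y_min, x_max, y_max); the sample rows, the helper names (tokenize, build_box_dictionaries, make_string_similarity), and the box format are all hypothetical. To keep the sketch runnable without a trained model, the similarity lookup is a plain string-similarity stand-in from the standard library's difflib, not word2vec. With gensim, a wrapper around a trained model's wv.most_similar(word, topn=...) could be passed in its place, with handling for words missing from the model's vocabulary.

```python
import re
from difflib import get_close_matches
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max).
BBox = Tuple[int, int, int, int]
SimilarFn = Callable[[str], List[str]]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (keeps + and # for 'c++', 'c#')."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def build_box_dictionaries(
    rows: Sequence[Tuple[BBox, str]], similar_fn: SimilarFn
) -> Dict[BBox, Dict[str, List[str]]]:
    """Step 2 and 3: one {word: similar words} dictionary per bounding box."""
    result: Dict[BBox, Dict[str, List[str]]] = {}
    for bbox, text in rows:
        result[bbox] = {word: similar_fn(word) for word in tokenize(text)}
    return result


def make_string_similarity(vocabulary: Sequence[str], topn: int = 3) -> SimilarFn:
    """Stand-in lookup based on spelling similarity (difflib), NOT word2vec."""
    vocab = sorted(set(vocabulary))

    def similar(word: str) -> List[str]:
        candidates = [w for w in vocab if w != word]
        return get_close_matches(word, candidates, n=topn, cutoff=0.6)

    return similar


# Hypothetical OCR output standing in for the CSV/DataFrame rows from step 1.
rows = [
    ((10, 10, 200, 40), "Python developer"),
    ((10, 60, 200, 90), "Developed Python pipelines"),
]

vocabulary = [word for _, text in rows for word in tokenize(text)]
box_dicts = build_box_dictionaries(rows, make_string_similarity(vocabulary))

for bbox, word_map in box_dicts.items():
    print(bbox, word_map)
```

Because the outer dictionary is keyed by the bounding box itself, the mapping asked for in step 3 falls out of the same structure: looking up a box returns the word dictionary built from its text. If the boxes live in a DataFrame, the inner dictionaries could equally be stored as an extra column next to the coordinates.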



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
