Python Feature Engineering - How to create a dictionary for the words in the text of each bounding box (words captured via EasyOCR)
I have a requirement that consists of three steps, of which I have been able to complete only the first.
First, given a resume, draw a bounding box around each paragraph or phrase and extract the text from each box. I have completed this step with OpenCV and EasyOCR: I can output the results to a CSV file or a DataFrame, with the bounding box coordinates and the text in separate columns.
Second, create a set of dictionaries from the text of all the bounding boxes. In each key-value pair, the key should be a word from the text of a bounding box, and the value should be words similar to it, for example obtained with word2vec (the gensim library's similarity method).
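A minimal sketch of this second step, under stated assumptions: the helper names `tokenize`, `build_box_dictionary`, and `toy_similar` are hypothetical, and the similarity lookup is a small hard-coded table standing in for a trained word2vec model so that the example runs without extra dependencies.

```python
import re
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_box_dictionary(
    text: str, similar_fn: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Map each distinct word in one bounding box's text to its similar words."""
    result: Dict[str, List[str]] = {}
    for word in tokenize(text):
        if word not in result:  # look up each distinct word only once
            result[word] = similar_fn(word)
    return result


# Hypothetical stand-in for a word2vec lookup (assumption, not real model output).
TOY_SIMILAR = {
    "python": ["programming", "scripting"],
    "developer": ["engineer", "programmer"],
}


def toy_similar(word: str) -> List[str]:
    """Return similar words from the toy table, or an empty list if unknown."""
    return TOY_SIMILAR.get(word, [])


# Example texts, assumed to come from the text column of the OCR output.
box_texts = ["Python Developer", "Skills: Python, SQL"]
box_dictionaries = [build_box_dictionary(t, toy_similar) for t in box_texts]
print(box_dictionaries)
```

With a trained gensim `Word2Vec` model (here assumed to be named `model`), the stand-in could probably be replaced by something like `lambda w: [s for s, _ in model.wv.most_similar(w, topn=3)] if w in model.wv else []`, which guards against words missing from the model's vocabulary.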
Third, map these dictionaries back to the corresponding bounding boxes of the PDF, image, or resume.
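One possible way to do the mapping is sketched below. It assumes the OCR rows have the shape that EasyOCR's `readtext` returns, a list of `(corner_points, text, confidence)` tuples, and that there is one word dictionary per row in the same order. The function names and the sample values are hypothetical and only illustrate the structure.

```python
from typing import Dict, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def bbox_to_rect(points: Sequence[Sequence[int]]) -> Rect:
    """Collapse four corner points into a hashable axis-aligned rectangle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def map_dictionaries_to_boxes(
    ocr_rows: List[tuple], box_dictionaries: List[Dict[str, List[str]]]
) -> Dict[Rect, dict]:
    """Attach each word dictionary to the bounding box its text came from."""
    if len(ocr_rows) != len(box_dictionaries):
        raise ValueError("expected one dictionary per OCR row")
    mapping: Dict[Rect, dict] = {}
    for (points, text, _confidence), words in zip(ocr_rows, box_dictionaries):
        mapping[bbox_to_rect(points)] = {"text": text, "similar_words": words}
    return mapping


# Sample rows shaped like OCR output (made-up coordinates and confidences).
ocr_rows = [
    ([[10, 10], [120, 10], [120, 30], [10, 30]], "Python Developer", 0.98),
    ([[10, 50], [200, 50], [200, 70], [10, 70]], "Skills: Python, SQL", 0.95),
]
# Sample dictionaries, one per row (made-up similar words).
sample_dictionaries = [
    {"python": ["programming"], "developer": ["engineer"]},
    {"skills": [], "python": ["programming"], "sql": []},
]

box_mapping = map_dictionaries_to_boxes(ocr_rows, sample_dictionaries)
for rect, info in box_mapping.items():
    print(rect, info["text"], info["similar_words"])
```

Keying the result by the rectangle tuple keeps the lookup simple when drawing or annotating boxes later. If two boxes can share identical coordinates, a list of records may be a safer structure than a dictionary.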
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
