Python handwritten text extraction
I need to extract text from an image file, but I'm not getting good results with the handwritten information. It is written on a printed form, which I then scanned with a proper scanner.
The handwritten information follows a pattern and, in most cases, sits in a blank space and is well sized.
What I've tried:
- different language models with Tesseract (eng_best, eng_fast, por_best, por_fast)
- different image preprocessing steps before reading (grayscale, blur, and many recipes from other people)
- color thresholding to isolate the blue ink (none of the ranges I tried worked)
- erasing the printed text with GIMP to isolate the handwriting (Tesseract still did not read it reliably)
- cv2.matchTemplate to find the position of a field and map it
I'm running out of ideas.
Solution 1:[1]
You can build a handwriting OCR pipeline with TensorFlow, OpenCV, and Keras. Check out this tutorial: https://www.pyimagesearch.com/2020/08/24/ocr-handwriting-recognition-with-opencv-keras-and-tensorflow/
The MNIST handwritten digit dataset provides base images for training and comparison: http://yann.lecun.com/exdb/mnist/
The tutorial has a thorough breakdown that may help you understand the approach.
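A classifier trained on MNIST-style images expects each character as a small, fixed-size crop. As a rough sketch of that preparation step (not the tutorial's code), the function below pads a single character crop to a square and scales it into a 20x20 box centered in a 28x28 image, which is the layout MNIST uses. The name `to_mnist_like` and its parameters are hypothetical, the crop is assumed to be white ink on a black background, and the resize is plain nearest-neighbour sampling to keep the example NumPy-only.

```python
import numpy as np

def to_mnist_like(glyph: np.ndarray, size: int = 28, margin: int = 4) -> np.ndarray:
    """Pad a grayscale character crop to a square, shrink or enlarge it, and
    centre it in a size x size float image scaled to [0, 1].

    Assumes white ink (255) on a black background (0), as in MNIST.
    """
    if glyph.ndim != 2 or glyph.size == 0:
        raise ValueError("expected a non-empty 2-D grayscale array")

    h, w = glyph.shape
    side = max(h, w)

    # Centre the crop on a black square canvas so the aspect ratio is preserved.
    canvas = np.zeros((side, side), dtype=np.float32)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = glyph

    # Nearest-neighbour resize to the inner box by sampling row/column indices.
    inner = size - 2 * margin
    idx = (np.arange(inner) * side) // inner
    small = canvas[np.ix_(idx, idx)]

    # Place the resized glyph inside the margin and scale to [0, 1].
    out = np.zeros((size, size), dtype=np.float32)
    out[margin:margin + inner, margin:margin + inner] = small
    return out / 255.0
```

In practice an interpolating resize such as `cv2.resize` would likely give smoother strokes; the index sampling here is only meant to show the shape of the step.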
Solution 2:[2]
EasyOCR is an alternative here. The input image is preprocessed and then fed to the reader, as shown below:
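import cv2
import numpy as np
import easyocr
# Load the model once and reuse it; gpu=False runs it on the CPU.
reader = easyocr.Reader(['en'], gpu=False)
image_file_name = 'handwritten.png'
image = cv2.imread(image_file_name)
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError(image_file_name)
# Sharpen the edges of the grayscale image.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
sharpen_kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)
# Binarize with Otsu's threshold; the inverted mode makes the ink white on black.
thresh = cv2.threshold(sharpen, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# detail=0 returns only the recognized strings, without boxes or confidences.
r_easy_ocr = reader.readtext(thresh, detail=0)
print(r_easy_ocr)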
Sample output:
['Prontuario=', '0000069450', 'Atendimento=', '824222', 'Nascimento: 12/12/1958', 'Convenio', 'SUS', 'AMBULATORIO', 'Data Atend,', '10/06/2019', '2.31.37', 'Sexo:', 'Masculino', 'Conselho', '41921', 'ANAMNESE', 'CONSULTA URGENCIA', 'QP: REFERE SENSACAO DE ALGO ARRANHANDO EM OD, INICIO ONTEM ,', 'T 1', 'REFERE TER OCULOS, NAQ TROUXE HQJE', 'P 6', 'HMP:', 'NEGA HAS,', 'Ao', 'J', 'NEGA DM,', 'EM USO DE;', 'NADA', 'ALERGIA MEDICAMENTOSA: NEGA', 'Ap L', 'CIRURGIAS OCULARES PREVIA: NEGA:', 'TRAUMA OCULAR PREVIA: NEGA.', '2ol50', '0lh', '1', 'HMF', 'NEGA HISTORIA DE GLAUCOMA OU CEGUEIRA', 'AV SC:', 'Bio', '3', 'OD; 20/50', '20/25P COM PH', 'OE; 20/50', '20/25P COM PH', 'BIO QD=', 'De', '3 . 1', 'PALPEBRAS E TARSOS SA', 'CA PROF, SEM RCA, SEM PKS, PFR', 'C TRANSP', 'SEM AREA CORANDO', 'Pio', '1', 'CE PERIPUPILAR AS 6H', 'BIO OE;', 'Oiag', '4', 'PALPEBRAS E TARSOS SA,', 'CA PROF', 'SEM RCA, SEM PKS, PFR,', 'C TRANSP, SEM AREA CORANDO', 'JTo', '1', 'TBD NORMAL AO', 'CD: RETIRO CE', '1', 'CURATIVO COM REGENCEL', 'REGENCEL E LUBRIFICANTE', 'ORIENTACOES GERAIS', 'RETORNO IMEDIATO SE PIORA', 'SINAIS DE ALARME', 'R1 VANESSA P', 'Jirg']
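Some of the short fragments in this output (for example 'Jirg' or 'JTo') look like uncertain reads of the handwriting. When `readtext` is called without `detail=0`, EasyOCR returns a bounding box, the text, and a confidence score for each detection, so weak reads can be filtered out or flagged for manual review. Here is a minimal sketch of that idea. The function name `keep_confident` and the 0.4 cut-off are hypothetical choices, and the sample detections in the demo are made up for illustration, not real results from this scan.

```python
from typing import List, Sequence, Tuple

# Shape of one EasyOCR detection with the default detail level:
# (four [x, y] corner points starting at top-left, text, confidence)
Detection = Tuple[Sequence[Sequence[float]], str, float]

def keep_confident(detections: Sequence[Detection],
                   min_conf: float = 0.4) -> List[Tuple[str, float]]:
    """Drop detections below min_conf and return (text, confidence) pairs
    ordered top-to-bottom, then left-to-right, by their top-left corner."""
    kept = [d for d in detections if d[2] >= min_conf]
    kept.sort(key=lambda d: (d[0][0][1], d[0][0][0]))  # sort by (y, x)
    return [(text, conf) for _, text, conf in kept]

if __name__ == "__main__":
    # Made-up detections in the same structure readtext() returns.
    sample = [
        ([[10, 50], [60, 50], [60, 70], [10, 70]], "Jirg", 0.12),
        ([[10, 10], [90, 10], [90, 30], [10, 30]], "ANAMNESE", 0.93),
    ]
    for text, conf in keep_confident(sample):
        print(f"{conf:.2f}  {text}")
```

With the reader from the answer above, this would be called as `keep_confident(reader.readtext(thresh))`. A suitable threshold depends on the scans and would need to be tuned against a few manually checked pages.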
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | StevenHarvey |
| Solution 2 | |
