Python: Can't read data from a low-quality PNG using Tesseract
In my task I have several PNG images with Cyrillic text, where each word has a float indicator that can be either negative or positive. I need to extract this data from the PNGs. The image quality is so poor that Tesseract cannot recognize the data. I have tried changing the resolution and stacking several copies of the same PNG on top of each other to get a better-quality binary image, but nothing has helped. Can anyone suggest new ideas for solving this problem? Below is my code and my approach to improving the PNG quality.
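On the stacking attempt: averaging several copies of the *same* PNG cannot add detail, because the mean of identical pixel values is that same value. Averaging only reduces noise when the copies differ independently (for example, separate scans or screenshots of the same document). The pure-Python sketch below represents images as hypothetical nested lists of 0-255 gray values and shows both cases; the pixel data is made up for illustration.

```python
def average_images(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows of ints 0-255)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [round(sum(img[y][x] for img in images) / n) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical 1x4 "image": dark text pixels (40) on a light background (220)
clean = [[40, 220, 40, 220]]

# Stacking identical copies returns exactly the same pixels: no new information
identical = average_images([clean, clean, clean])

# Copies with different (independent) noise average out toward the clean values
noisy = [
    [[60, 200, 30, 230]],
    [[30, 230, 50, 210]],
    [[30, 230, 40, 220]],
]
denoised = average_images(noisy)
```

In other words, stacking helps only if you can obtain genuinely different captures of the same text; otherwise effort is better spent on preprocessing a single image.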
import pytesseract as tess
import cv2
tess.pytesseract.tesseract_cmd = 'C:\\Users\\Admin\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
# read the image file
img = cv2.imread('main_pic.png')
scale_percent = 200 # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
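# resize image; INTER_CUBIC (or INTER_LINEAR) suits enlarging, INTER_AREA is meant for shrinking
resized = cv2.resize(img, dim, interpolation=cv2.INTER_CUBIC)
# stretch contrast on the V (brightness) channel in HSV space
hsv = cv2.cvtColor(resized, cv2.COLOR_BGR2HSV)
v = hsv[:, :, 2].astype('int16')  # widen so +/-25 cannot wrap around as uint8 would
dark = v < 190
v[dark] -= 25   # push values below 190 down
v[~dark] += 25  # push the remaining values up
hsv[:, :, 2] = v.clip(0, 255).astype('uint8')
# convert HSV back to BGR first, then to grayscale
first_img = cv2.cvtColor(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR), cv2.COLOR_BGR2GRAY)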
# preview the preprocessed image; press any key to continue
cv2.imshow('contrast', first_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
# run OCR with the Russian language data ('rus')
text = tess.image_to_string(first_img, lang='rus')
print(text)
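The script splits pixels at a hand-picked brightness of 190. One common alternative, not part of the original approach, is Otsu's method, which picks the threshold that maximizes the between-class variance of the gray-level histogram; OpenCV exposes it as `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The dependency-free sketch below shows what that computation does on a flat list of 0-255 values. The helper names `otsu_threshold` and `binarize`, the sample data, and the assumption that text is darker than the background are all illustrative.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) maximizing between-class variance.

    Pixels <= t form one class (assumed: dark text), pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels <= t
    sum_bg = 0     # sum of gray levels <= t
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # unnormalized between-class variance; the scale does not affect the argmax
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to black (0) and the rest to white (255)."""
    return [0 if p <= t else 255 for p in pixels]


# Made-up data: dark text around 30-40, light background around 200-215
pixels = [30] * 50 + [40] * 30 + [200] * 60 + [215] * 40
t = otsu_threshold(pixels)
binary = binarize(pixels, t)
```

Because the threshold adapts to each image's histogram, it may cope better than a fixed cut-off when brightness varies between PNGs; for uneven lighting within one image, a local (adaptive) threshold is the usual next step.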
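The output (Признак means "attribute" and Значение means "value"; only two numeric values, 9,81 and 7,72, were recognized, and many words are misspelled or split):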
Признак
Прекрасный
Зловещий
В озвышонмьвй
Б одрый
Светлый
М сли—стольный
Я рт
Радостный
Сильный
Т яхеяый
угрюмый
С! ршитспьшй
Нитра-павший
Нетнй
П очвпьиый
Суровый
Т оск яивый
Значение
9,81
7,72
ВГП-ня
Стрмпвлыщя
Чирпан)…
Наш
Печальный
Сирони
Тиц-ный
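Since each word carries a signed decimal value written with a comma (as in 9,81 and 7,72 above), it can help to validate the OCR text instead of saving it verbatim. The sketch below is one hypothetical way to do that: it accepts a comma or dot as the decimal separator, treats a hyphen, en dash, em dash, or Unicode minus as a minus sign, and returns None for anything that is not a number, so misreads can be flagged for manual review. Which dash characters Tesseract actually emits for these images is an assumption to check against real output.

```python
import re

# Optional sign (hyphen, en dash, em dash, Unicode minus, or plus), digits,
# then an optional decimal part after a comma or a dot.
# Assumption: OCR may render a minus sign as any of these dash characters.
_NUMBER_RE = re.compile(r"^([-\u2013\u2014\u2212+]?)\s*(\d+)(?:[.,](\d+))?$")


def parse_value(text):
    """Convert one OCR'd value such as '9,81' or '-3,5' to float; None if not a number."""
    match = _NUMBER_RE.match(text.strip())
    if match is None:
        return None
    sign, whole, frac = match.groups()
    number = float(whole + "." + frac) if frac else float(whole)
    return -number if sign and sign != "+" else number


# Lines taken from the output above
ocr_lines = ["9,81", "7,72", "ВГП-ня"]
parsed = {line: parse_value(line) for line in ocr_lines}
```

Lines that map to None (such as ВГП-ня here) are the ones worth re-checking by hand or re-running with different preprocessing.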
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
