pytesseract: processing a TIFF to hOCR output gives an input file error
I am trying to convert a TIFF image on my D: drive to .hocr format and write the output to the D: drive as well.
Below is my code:
from cgitb import html
from distutils.command.config import config
from PIL import Image
from pytesseract import pytesseract
pytesseract.pytesseract.tesseract_cmd =r'C:\Programs\Tesseract-OCR\tesseract.exe'
pytesseract.run_tesseract('D:\image.tif', 'output', extension="hocr", lang=None, config="hocr")
In my case, the script is on the D: drive, together with the input image and the output folder.
The error:
pytesseract.pytesseract.TesseractError: (1, 'Error, cannot read input file D:\image.tif: Invalid argument Error during processing.')
What went wrong? I am a beginner with this.
I have tested a sample JPG image with cv2.imshow('sample image', img), and it displays correctly.
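One possible way to avoid the hand-built `run_tesseract` call is pytesseract's `image_to_pdf_or_hocr` function, which returns the hOCR markup as bytes. The sketch below is not a confirmed diagnosis of the error above. It assumes the TIFF really exists at the given path and that Pillow can open it. The helper names `hocr_path_for` and `tif_to_hocr`, and the `D:/output` folder used in the usage note, are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def hocr_path_for(image_path, output_dir):
    """Return the .hocr path inside output_dir that matches the image's file name."""
    return Path(output_dir) / (Path(image_path).stem + ".hocr")


def tif_to_hocr(image_path, output_dir, tesseract_cmd=None):
    """OCR one image and write the resulting hOCR markup into output_dir."""
    # Imported lazily so the path helper above works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    image_path = Path(image_path)
    # Fail early with a clear message if the path itself is wrong.
    if not image_path.is_file():
        raise FileNotFoundError(f"input image not found: {image_path}")

    # Optional: point pytesseract at a specific tesseract executable.
    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # image_to_pdf_or_hocr returns bytes; extension="hocr" selects hOCR output.
    with Image.open(image_path) as img:
        hocr_bytes = pytesseract.image_to_pdf_or_hocr(img, extension="hocr")

    out_path = hocr_path_for(image_path, output_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(hocr_bytes)
    return out_path
```

With these assumptions, `tif_to_hocr(r"D:\image.tif", "D:/output", r"C:\Programs\Tesseract-OCR\tesseract.exe")` would be expected to write `D:/output/image.hocr`. The explicit `is_file()` check helps separate a wrong path from a file that Tesseract cannot decode.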
I also tried rewriting the code in a different way, but that gives an error too:
from email.mime import image
from statistics import mode
from tkinter import W
from unittest import result
import pytesseract
from PIL import Image
img = image.open("D:/Python_OCR/OCR/Ocr_extract/input/514.png")
print (img)
pytesseract.pytesseract.tesseract_cmd ="C:/Programs/Tesseract-OCR/tesseract.exe"
result = pytesseract.image_to_string(img)
with open("D:/input/image.txt",mode ="W") as file:
file.write(result)
The error:
file.write(result)
^
IndentationError: expected an indented block after 'with' statement on line 11
Any help is appreciated.
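The `IndentationError` is raised because the `file.write(result)` line is not indented under the `with` statement. Two more things in the second snippet would likely fail once that is fixed: `image.open` refers to the `email.mime.image` module rather than PIL's `Image`, and `mode="W"` is not a valid file mode (it needs to be lowercase `"w"`). A minimal corrected sketch follows. `save_text` and `ocr_to_text_file` are hypothetical helper names, and the OCR call assumes Tesseract is installed at the path you pass in.

```python
from pathlib import Path


def save_text(text, out_path):
    """Write OCR text to out_path, creating the parent folder if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # The body of a `with` block must be indented, and the mode is lowercase "w".
    with open(out_path, mode="w", encoding="utf-8") as file:
        file.write(text)
    return out_path


def ocr_to_text_file(image_path, out_path, tesseract_cmd=None):
    """Run OCR on an image and save the recognized text to out_path."""
    # Imported lazily so save_text can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image  # PIL's Image class, not email.mime.image

    # Optional: point pytesseract at a specific tesseract executable.
    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    with Image.open(image_path) as img:
        result = pytesseract.image_to_string(img)
    return save_text(result, out_path)
```

The unused imports from the original snippet (`email.mime`, `statistics`, `tkinter`, `unittest`) are left out here, since nothing in the code relies on them.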
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
