pytesseract: processing a TIFF to hOCR output gives an input file error

I am trying to extract text from a TIFF image on the D: drive into .hocr format, with the output also saved on the D: drive.

Below is my code:

```python
from pytesseract import pytesseract

# Location of the Tesseract executable on this machine
pytesseract.tesseract_cmd = r'C:\Programs\Tesseract-OCR\tesseract.exe'

# OCR the TIFF and write hOCR output using the base name "output"
pytesseract.run_tesseract(r'D:\image.tif', 'output', extension="hocr", lang=None, config="hocr")
```

In my case, the script, the input image, and the output folder are all on the D: drive.
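For comparison, here is a minimal sketch of the same TIFF-to-hOCR conversion through pytesseract's higher-level `image_to_pdf_or_hocr` function, which returns the hOCR markup as bytes instead of writing a file itself. It assumes Pillow and pytesseract are installed, reuses the Tesseract path from the question, and assumes a single-page TIFF. The helper names `hocr_output_path` and `tif_to_hocr` are hypothetical. Writing to an explicit output directory also makes it clear where the file ends up, because the `'output'` base name in the original call is relative to the current working directory.

```python
from pathlib import Path

# Tesseract location taken from the question; adjust for your machine.
TESSERACT_EXE = r"C:\Programs\Tesseract-OCR\tesseract.exe"


def hocr_output_path(image_path, output_dir):
    """Return <output_dir>/<image name without extension>.hocr."""
    return Path(output_dir) / (Path(image_path).stem + ".hocr")


def tif_to_hocr(image_path, output_dir):
    """OCR one image and save the hOCR markup in output_dir (hypothetical helper)."""
    # Imported inside the function so hocr_output_path works without these packages.
    import pytesseract
    from PIL import Image

    pytesseract.pytesseract.tesseract_cmd = TESSERACT_EXE
    with Image.open(image_path) as img:
        # extension="hocr" requests hOCR instead of the default PDF; the result is bytes.
        hocr_bytes = pytesseract.image_to_pdf_or_hocr(img, extension="hocr")

    out_path = hocr_output_path(image_path, output_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(hocr_bytes)
    return out_path
```

Usage would then be something like `tif_to_hocr(r"D:\image.tif", r"D:\output")`, which (under the assumptions above) writes `D:\output\image.hocr`.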

The error:

pytesseract.pytesseract.TesseractError: (1, 'Error, cannot read input file D:\image.tif: Invalid argument Error during processing.')
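One thing worth ruling out is the path string itself. In the first script `'D:\image.tif'` happens to survive because `\i` is not a Python escape sequence (recent Python versions warn about it), but a file name starting with, for example, `t` or `n` would silently change. A minimal sketch, using a made-up path `D:\test.tif` and a hypothetical helper `require_file`, that shows the difference and checks the file exists before it is handed to Tesseract:

```python
from pathlib import Path

# In an ordinary string literal a backslash starts an escape sequence,
# so "\t" below is a TAB character, not a backslash followed by "t".
escaped = "D:\test.tif"
# A raw string keeps every backslash exactly as typed.
raw = r"D:\test.tif"
print(len(escaped), len(raw))  # 10 11


def require_file(path_str):
    """Raise a clear error before calling Tesseract if the file is missing."""
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"Input image not found: {path.resolve()}")
    return path
```

Forward slashes (`"D:/image.tif"`) also work on Windows and avoid the escape problem entirely.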

What went wrong? I am a beginner with this program.

I have tested with a sample JPG image using cv2.imshow('sample image', img), and it displays correctly.
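Since a JPG opens fine, another quick check is whether the `.tif` file really starts with a TIFF header, for example in case it was renamed from another format. A minimal stdlib-only sketch (the helper name `looks_like_tiff` is hypothetical); passing this check only means the header is right, not that Tesseract can necessarily decode the rest of the file:

```python
# A TIFF file begins with a byte-order mark ("II" = little-endian,
# "MM" = big-endian) followed by the magic number 42 in that byte order.
# BigTIFF uses 43 instead of 42.
TIFF_SIGNATURES = (
    b"II*\x00",  # little-endian, 42
    b"MM\x00*",  # big-endian, 42
    b"II+\x00",  # little-endian BigTIFF, 43
    b"MM\x00+",  # big-endian BigTIFF, 43
)


def looks_like_tiff(path):
    """Return True if the first four bytes match a TIFF/BigTIFF header."""
    with open(path, "rb") as f:
        return f.read(4) in TIFF_SIGNATURES
```

For the file in the question this would be called as `looks_like_tiff(r"D:\image.tif")`.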

I have also tried writing the code a different way, but that version has an error too:

```python
from email.mime import image
from statistics import mode
from tkinter import W
from unittest import result
import pytesseract
from PIL import Image
img = image.open("D:/Python_OCR/OCR/Ocr_extract/input/514.png")
print (img)
pytesseract.pytesseract.tesseract_cmd ="C:/Programs/Tesseract-OCR/tesseract.exe"
result = pytesseract.image_to_string(img)
with open("D:/input/image.txt",mode ="W") as file:
file.write(result)
```

The error:

file.write(result)
    ^
IndentationError: expected an indented block after 'with' statement on line 11
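For reference, here is a sketch of the second script with the three problems that stand out fixed: `Image.open` comes from Pillow's `Image` (the lowercase `image` imported from `email.mime` is an unrelated module), the file mode is lowercase `"w"` (`open()` rejects `"W"` with a `ValueError`), and `file.write(...)` is indented so it forms the body of the `with` block. The unused imports are dropped. It assumes Pillow and pytesseract are installed; the `save_text` and `ocr_image_to_text_file` helper names, and creating the output folder when it is missing, are additions of this sketch rather than part of the original.

```python
from pathlib import Path

TESSERACT_EXE = "C:/Programs/Tesseract-OCR/tesseract.exe"  # path from the question


def save_text(text, text_path):
    """Write text to text_path, creating the parent folder if needed."""
    text_path = Path(text_path)
    text_path.parent.mkdir(parents=True, exist_ok=True)
    # Lowercase "w": open() raises ValueError for mode "W".
    with open(text_path, mode="w", encoding="utf-8") as file:
        file.write(text)  # indented, so it is the body of the with block
    return text_path


def ocr_image_to_text_file(image_path, text_path):
    """OCR image_path to plain text and save it (hypothetical helper)."""
    # Imported here so save_text can be used without these packages installed.
    import pytesseract
    from PIL import Image  # capital "I": Pillow's Image module

    pytesseract.pytesseract.tesseract_cmd = TESSERACT_EXE
    with Image.open(image_path) as img:
        result = pytesseract.image_to_string(img)
    return save_text(result, text_path)
```

With the paths from the question this would be `ocr_image_to_text_file("D:/Python_OCR/OCR/Ocr_extract/input/514.png", "D:/input/image.txt")`.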

Any help is appreciated.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
