How to integrate tesseract-ocr with Tika?

I need to integrate tesseract-ocr with Tika to convert scanned images stored as PDFs to text.

A TesseractOCRParser is already available in Tika.

However, I cannot find a documented way to invoke it.

When I try to build Tika with the tesseract-ocr path configured, I get the following error:

Results:

Failed tests:   
testNoConfig(org.apache.tika.parser.ocr.TesseractOCRConfigTest): 
Invalid default tesseractPath value expected:<[]> but was: 
<[/home/serendio/tesseract-ocr/]>

Tests run: 569, Failures: 1, Errors: 0, Skipped: 7

Can anyone help me out?

Or is there any other way to resolve this problem?

tesseract apache-tika

Solution 1:^[1]

I think this guide can help: https://wiki.apache.org/tika/TikaOCR. I followed it and was able to extract the content easily. I simply installed Tesseract and then Tika.

Using Tika 1.9, I was able to:
- extract the content by calling a local Tika server directly
- extract the content in a custom application (you can use the tika-example project) with no extra effort

No modification was needed. Everything worked out of the box.
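
As a rough illustration of the "local Tika server" route, here is a minimal sketch that sends a file to a running tika-server and prints the extracted text. It uses only the JDK, so nothing Tika-specific is compiled in. The class name `TikaServerClient` and the method `extractText` are made up for this example. It assumes the server is listening on its default port 9998, that it exposes the `/tika` resource for PUT requests, that it replies in UTF-8, and that Tesseract is installed where the server can find it. Treat it as a starting point rather than the one correct way to call the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Tika.
class TikaServerClient {

    // Assumption: tika-server runs locally on its default port.
    static final String DEFAULT_ENDPOINT = "http://localhost:9998/tika";

    /** PUTs the raw document bytes to the endpoint and returns the response body as text. */
    static String extractText(String endpoint, byte[] document) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Ask for plain text rather than XHTML.
        conn.setRequestProperty("Accept", "text/plain");
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(document);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Server returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Assumption: the response is encoded as UTF-8.
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TikaServerClient.java <file>");
            return;
        }
        byte[] document = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(extractText(DEFAULT_ENDPOINT, document));
    }
}
```

With the server started separately, running the class against a scanned image or PDF should print whatever text the server extracted. Whether OCR is actually applied depends on the server-side setup, not on this client.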

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
