Getting the error 'Too many connected components for a page image' when using the Kraken library in Python on an image

I am trying to read a newspaper with OCR using Tesseract. Before passing the image to Tesseract, I use Kraken to segment the individual lines and draw a line across the sentences so that Tesseract detects them properly. When I pass the image through kraken.pageseg.segment, I get an empty list and the output Too many connected components for a page image: 5903. Instead, it should have returned a list containing the coordinates of the bounding boxes around the sentences.

I looked through the Kraken source code and found this particular error message, but I am unable to understand it. [Source code for error][1]

[1]: https://github.com/mittagessen/kraken/blob/master/kraken/pageseg.py#:~:text=connected%20components%20for%20a-,page,-image%3A%20%7Bccs%7D%27)



Solution 1:[1]

Try downgrading the package to version 2.0.1:

    pip install kraken==2.0.1

I had the same problem with higher versions and downgrading just solved it.
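If you want to confirm which version actually ended up in your environment after the downgrade, here is a small sketch using only the standard library. The helper name `installed_version` is my own and not part of Kraken; it just reports the installed distribution version, or `None` if the package is missing.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent.

    `installed_version` is a hypothetical helper name used for illustration.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed kraken version, or None if kraken is not installed.
print(installed_version("kraken"))
```

Run it with the same interpreter you use for your OCR script, since a different virtual environment may hold a different version.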

Solution 2:[2]

I had the same problem and solved it after looking at the Kraken API quickstart guide.

Try changing your image binarization. If you were doing binarization with PIL (Pillow), use the kraken binarization method like this:

    from PIL import Image
    from kraken import binarization, pageseg

    im = Image.open('foo.png')
    bw_im = binarization.nlbin(im)
    seg_data = pageseg.segment(bw_im)

Reference: https://kraken.re/master/api.html
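As for what the message is counting: a connected component is a group of foreground pixels that touch each other. The sketch below is a toy illustration in plain Python, not Kraken's actual implementation (its connectivity rule and threshold may differ, and `count_components` is my own name). It shows why a speckled binarization produces far more components than a clean one, which is presumably what pushes a page past the limit.

```python
from collections import deque


def count_components(grid):
    """Count 4-connected groups of foreground (1) pixels in a 2D list.

    Illustrative helper only; this is not how Kraken labels components.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: flood-fill its whole group.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Two solid blobs, like two clean glyphs.
clean = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

# Isolated specks, like noise left over from a poor threshold.
noisy = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

print(count_components(clean))  # 2
print(count_components(noisy))  # 10
```

Both grids are the same size, but the noisy one has five times as many components. On a full newspaper scan, the same effect from a rough threshold can produce thousands of tiny components, so a cleaner binarization is worth trying first.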

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Just Guest
Solution 2 abear