OCR character segmentation: character count for Burmese characters
I am currently writing an OCR program that converts physical NRCs into digital form. I am stuck on how to determine the exact number of characters in each incoming input segment.
The language is Burmese. In the first photo there are two characters, but the first character has an opening at its bottom while the other does not. I tried a top-down projection profile that looks for black pixels, and I also thought plotting a histogram would solve the problem, but as I said above, because of the differences between characters it is not always right and some words get split. So I would like to know what other methods I can try.
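
For reference, the projection-profile idea described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual program: the names `column_profile` and `split_at_gaps` and the toy binary image are assumptions made up for the example. It counts ink pixels per column and cuts wherever a column is empty.

```python
def column_profile(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def split_at_gaps(profile):
    """Return (start, end) column ranges separated by empty columns."""
    runs, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                  # a run of ink columns begins
        elif count == 0 and start is not None:
            runs.append((start, x))    # an empty column closes the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Hypothetical toy image: 1 = black pixel, 0 = background.
image = [
    [1, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0, 1],
]

profile = column_profile(image)
print(profile)                 # [3, 2, 0, 2, 3, 0, 3]
print(split_at_gaps(profile))  # [(0, 2), (3, 5), (6, 7)]
```

The profile reports three pieces here, but it has no way of telling whether the first two pieces are two characters or two halves of one character, which is the problem I am running into.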


Solution 1:[1]
Correct segmentation of such aggregates is impossible without knowledge of the possible characters. You need to combine segmentation with recognition.
It is even possible that several valid segmentations exist; in that case you can't arbitrate without a lexicon, or can't arbitrate at all.
Think of W vs VV.
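
A minimal sketch of what combining segmentation with recognition can look like, under simplifying assumptions: the image is first over-segmented into primitive pieces, and a recognizer decides which groups of adjacent pieces form a known character. Here the recognizer is replaced by a hypothetical lookup table `GLYPHS`, and the names `segmentations` and `pick_with_lexicon` are invented for the illustration; a real system would score image crops with a trained classifier instead.

```python
from functools import lru_cache

# Hypothetical glyph table: each known character is described by the
# sequence of primitive pieces an over-segmenter would produce for it.
GLYPHS = {
    ("V",): "V",
    ("V", "V"): "W",
    ("I",): "I",
    ("N",): "N",
}


def segmentations(pieces, glyphs=GLYPHS):
    """Return every way to group `pieces` into known glyphs, as strings."""
    pieces = tuple(pieces)
    max_len = max(len(key) for key in glyphs)

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(pieces):
            return [[]]                      # one way to read nothing
        results = []
        last = min(start + max_len, len(pieces))
        for end in range(start + 1, last + 1):
            char = glyphs.get(pieces[start:end])
            if char is None:
                continue                     # this group is not a known glyph
            for rest in solve(end):
                results.append([char] + rest)
        return results

    return ["".join(reading) for reading in solve(0)]


def pick_with_lexicon(candidates, lexicon):
    """Keep only the readings that are words in the lexicon."""
    return [c for c in candidates if c in lexicon]


candidates = segmentations(["V", "V", "I", "N"])
print(candidates)                              # ['VVIN', 'WIN']
print(pick_with_lexicon(candidates, {"WIN"}))  # ['WIN']
```

Both readings are consistent with the pieces, so the character count (four or three) cannot be decided from segmentation alone; only the lexicon arbitrates, and if both readings were valid words nothing would.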
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yves Daoust |
