Does the encoding of a PDF affect text order?
I'm trying to parse PDFs of journal articles. Parsing plain text in a PDF normally works fine, but parsing PDFs of Korean articles causes some problems. Most of them are not a big deal, but incorrect text order is.
Here is a sample.
Original text:
열 노출에 의한 IN738LC의 기계적 특성 및 미세조직 변화
Output of the PDF parsing library:
열 노출에 의한의 기계적 특성 및 미세조직 변화IN738LC
The English text that sits between the Korean words comes out in the wrong position. As a test, I opened this PDF with PDF.js in the browser and got the same result, so I was about to give up.
However, the Chrome PDF viewer and macOS Preview show the sentence in the correct order.
I have no idea what the problem is. My vague guess is that it is an encoding problem.
So I would like to know why this happens.
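One possible explanation, offered as an assumption rather than a diagnosis of this particular file: a PDF content stream positions each text run by coordinates, and the order of runs in the stream does not have to match reading order. If the Latin run is written after the Korean runs (for example because it uses a different font), an extractor that follows stream order puts it last, while a viewer that draws by position looks correct. The sketch below uses hypothetical runs and made-up coordinates (`TextRun`, `x`, `y` are illustrative names, not any library's API) to show how sorting by position can restore the order.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    x: float  # hypothetical left edge of the run on the page
    y: float  # hypothetical baseline (PDF y grows upward)


# Assumed content-stream order: both Korean runs first, the Latin run last.
# The coordinates are invented for illustration only.
runs_in_stream_order = [
    TextRun("열 노출에 의한 ", x=50.0, y=700.0),
    TextRun("의 기계적 특성 및 미세조직 변화", x=160.0, y=700.0),
    TextRun("IN738LC", x=110.0, y=700.0),
]


def join_stream_order(runs):
    """Concatenate runs exactly as they appear in the stream."""
    return "".join(run.text for run in runs)


def join_reading_order(runs):
    """Sort top to bottom, then left to right, before concatenating."""
    ordered = sorted(runs, key=lambda run: (-run.y, run.x))
    return "".join(run.text for run in ordered)


print(join_stream_order(runs_in_stream_order))
print(join_reading_order(runs_in_stream_order))
```

If this is the cause, the encoding itself is not at fault; checking the coordinates that your parsing library reports for each run would confirm or rule it out.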
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
