Python PyPDF2 returns unwanted characters when extracting text from a PDF document in Google Colab

I'm trying to extract text from a PDF file using Python. I'm using the PyPDF2 package (version 1.27.12) and run the following code:

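import PyPDF2 as P2

# archivo is a pandas DataFrame: column 3 holds the PDF path and column 1
# the origin label; registros is the number of rows to process
Origen = []  # origin label of each extracted page
Info = []    # extracted text of each page

for Casos in range(registros):
    # Open the PDF in binary mode; "with" closes it once its pages are read
    with open(archivo.iloc[Casos, 3], 'rb') as PDFfile:
        pdfread = P2.PdfFileReader(PDFfile)

        # Extract the text of every page
        for i in range(pdfread.getNumPages()):
            pageinfo = pdfread.getPage(i)
            Origen.append(archivo.iloc[Casos, 1])
            Info.append(pageinfo.extractText())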

The code worked well before; however, it now produces the following output, which differs from the text in the PDF document:

['M idió efe ctiv am ente el impacto de X e n lam arca de la em presa. Intro dujo t ácticas inno v ado ras para']

The correct text is:

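Midió efectivamente el impacto de X en la marca de la empresa. Introdujo tácticas innovadoras para

(In English: "Effectively measured the impact of X on the company's brand. Introduced innovative tactics for")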
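Comparing the two strings with all whitespace removed shows that every character, including the accented ones, comes out of the PDF correctly; only the positions of the spaces differ. That suggests the problem lies in how the extractor infers spaces between text fragments rather than in character decoding. A minimal check, assuming the strings are copied exactly as shown (the helper name `strip_whitespace` is hypothetical, and the NFC normalization is only a precaution because PDF text can contain decomposed accents):

```python
import re
import unicodedata


def strip_whitespace(text: str) -> str:
    """Normalize accents to composed form (NFC), then drop all whitespace."""
    return re.sub(r"\s+", "", unicodedata.normalize("NFC", text))


extracted = ("M idió efe ctiv am ente el impacto de X e n lam arca de la em presa. "
             "Intro dujo t ácticas inno v ado ras para")
expected = ("Midió efectivamente el impacto de X en la marca de la empresa. "
            "Introdujo tácticas innovadoras para")

print(extracted == expected)                                      # False
print(strip_whitespace(extracted) == strip_whitespace(expected))  # True
```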

I'd like to know whether an update or a code modification is required. Any observation will be appreciated.
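Whether a different PyPDF2 release changes the spacing is not established here. One hedged code modification is to make the page loop independent of the method name, so it runs both on older PyPDF2 1.x releases, which expose `extractText()`, and on PyPDF2 2.x or later and pypdf, which provide `extract_text()` (in PyPDF2 3.0 the old camelCase names raise an error instead of working). A sketch, where `page_text` and `collect_pages` are hypothetical helper names and `pages` is assumed to be the reader's `.pages` sequence:

```python
from typing import Any, Iterable, List, Tuple


def page_text(page: Any) -> str:
    """Return a page's text using whichever method name the installed release offers."""
    # PyPDF2 2.x and later, and pypdf, provide extract_text()
    if hasattr(page, "extract_text"):
        return page.extract_text()
    # Older PyPDF2 1.x releases expose only extractText()
    return page.extractText()


def collect_pages(pages: Iterable[Any], origin: str) -> Tuple[List[str], List[str]]:
    """Pair every page's text with the origin label of the file it came from."""
    origins: List[str] = []
    texts: List[str] = []
    for page in pages:
        origins.append(origin)
        texts.append(page_text(page))
    return origins, texts
```

In the loop from the question, the inner page loop could become `o, t = collect_pages(pdfread.pages, archivo.iloc[Casos, 1])` followed by `Origen.extend(o)` and `Info.extend(t)`; on PyPDF2 3.0 or pypdf the reader itself would be created with `PdfReader` rather than `PdfFileReader`.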



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
