How to deskew a scanned text page with ImageMagick?

I have scanned documents that weren't scanned perfectly straight, so the text is not perfectly horizontal: each line slopes by perhaps 10°.

My understanding is that the -deskew option in ImageMagick should fix this, for example:

convert skewed_1500.jpeg -deskew 40% skewed_1500_not.jpg

but it doesn't have any noticeable effect on the output file.

I've attached the skewed and deskewed images for comparison.

First the original image: [skewed image]

Then the purportedly deskewed image: [deskewed image]



Solution 1:[1]

Deskew with OCRmyPDF

You can also straighten the pages by first having ImageMagick convert your JPG to PDF (convert input.jpg input.pdf) and then letting OCRmyPDF deskew the PDF:

ocrmypdf --deskew --tesseract-timeout=0 input.pdf output.pdf

Using your example page, I'd say the resulting text is straight:

straightened page, after running OCRmyPDF

As the OCRmyPDF documentation notes, --tesseract-timeout=0 disables optical character recognition, so only the deskewing is performed.

Of course you can also deskew the PDF and make it searchable in one go:

ocrmypdf --deskew -l fra input.pdf output.pdf

Make sure the French language pack for Tesseract (fra) is installed before running this.
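If you have many pages, the two steps above can be scripted. The following is a minimal sketch, not part of the original answer: it only wraps the commands already shown (convert, then ocrmypdf --deskew), the function names build_commands and run_pipeline are hypothetical, and it assumes both tools are on your PATH.

```python
import subprocess
from pathlib import Path


def build_commands(jpg, out_pdf, language=None):
    """Return the two command lines: JPG -> PDF, then deskew with OCRmyPDF."""
    jpg = Path(jpg)
    intermediate = jpg.with_suffix(".pdf")  # e.g. input.jpg -> input.pdf
    to_pdf = ["convert", str(jpg), str(intermediate)]
    deskew = ["ocrmypdf", "--deskew"]
    if language is None:
        # No language given: skip OCR and only straighten the page.
        deskew.append("--tesseract-timeout=0")
    else:
        # Language given: deskew and OCR in one pass
        # (requires the matching Tesseract language pack).
        deskew += ["-l", language]
    deskew += [str(intermediate), str(out_pdf)]
    return [to_pdf, deskew]


def run_pipeline(jpg, out_pdf, language=None):
    """Run both steps in order, raising on the first failure."""
    for command in build_commands(jpg, out_pdf, language):
        subprocess.run(command, check=True)
```

Calling run_pipeline("input.jpg", "output.pdf") would only deskew, while run_pipeline("input.jpg", "output.pdf", language="fra") would also make the PDF searchable.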

Crop the PDF

To get rid of the black parts on the sides and the white part on the bottom of the PDF, you can use pdfcrop (commonly part of TeX Live):

# Remove margins at left, top, right, and bottom
pdfcrop --margins '-60 0 -50 -430' output.pdf cropped_output.pdf

The cropped and deskewed PDF:

PDF cropped with pdfcrop

Solution 2:[2]

This doesn't use ImageMagick, but it does the same job of deskewing a scanned document or image.

The following Python code, which uses scikit-image and the deskew package, straightens the image:

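import numpy as np
from skimage import io
from skimage.transform import rotate
from skimage.color import rgb2gray
from deskew import determine_skew
from matplotlib import pyplot as plt

def deskew(path):
    """Load an image, estimate its skew angle, and return the straightened image."""
    image = io.imread(path)
    grayscale = rgb2gray(image)  # assumes an RGB input image
    angle = determine_skew(grayscale)
    if angle is None:  # no dominant angle found: return the image unchanged
        return image
    # rotate() returns floats in [0, 1], so scale back to 8-bit values
    rotated = rotate(image, angle, resize=True) * 255
    return rotated.astype(np.uint8)

def display_before_after(path):
    """Show the original and the deskewed image side by side."""
    plt.subplot(1, 2, 1)
    plt.imshow(io.imread(path))
    plt.subplot(1, 2, 2)
    plt.imshow(deskew(path))
    plt.show()

display_before_after('img_35h.jpg')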

Reference and Source: http://aishelf.org/deskew/
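The resize=True argument in the rotate call above enlarges the output canvas so the rotated page is not clipped, which is also why a straightened scan ends up with filled corners that you may want to crop. As a rough illustration only (scikit-image's exact output size may differ by a pixel because of rounding), the enlarged canvas is the bounding box of the rotated rectangle. The function name rotated_canvas_size and the page dimensions used below are hypothetical.

```python
import math


def rotated_canvas_size(width, height, angle_degrees):
    """Size of the smallest axis-aligned box holding a width x height image
    after rotating it by angle_degrees about its centre."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    new_width = width * cos_t + height * sin_t
    new_height = width * sin_t + height * cos_t
    return round(new_width), round(new_height)
```

For a hypothetical 1500 x 2000 pixel page, rotated_canvas_size(1500, 2000, 10) gives a canvas larger than the original in both dimensions, while an angle of 0 leaves the size unchanged.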

Solution 3:[3]

You have the right syntax for ImageMagick; just increase the threshold to 60%.

Using the skewed image from the question as input:

convert skewed_1500.jpeg -deskew 60% x.jpg
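To compare thresholds, it can help to ask ImageMagick which angle it detected at each one. The sketch below is an illustration rather than part of the original answer. It relies on the %[deskew:angle] property that ImageMagick sets after -deskew, assumes the convert binary is on your PATH (on ImageMagick 7 the command is magick), and uses the hypothetical helper names angle_command and detected_angles.

```python
import subprocess


def angle_command(image, threshold_percent):
    """Build a convert call that prints the detected skew angle
    instead of writing an output image."""
    return [
        "convert", str(image),
        "-deskew", f"{threshold_percent}%",
        "-format", "%[deskew:angle]",
        "info:",
    ]


def detected_angles(image, thresholds=(40, 60, 80)):
    """Map each threshold to the angle (in degrees) that ImageMagick reports."""
    angles = {}
    for threshold in thresholds:
        result = subprocess.run(
            angle_command(image, threshold),
            capture_output=True, text=True, check=True,
        )
        angles[threshold] = float(result.stdout.strip())
    return angles
```

Calling detected_angles("skewed_1500.jpeg") returns a dict keyed by threshold. If the angle reported at 40% is close to zero while the one at 60% is not, that would be consistent with the behaviour described in the question.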

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2: Afsan Abdulali Gujarati
[3] Solution 3: fmw42