Compare 2 One Bit Images for Similarity

I am trying to compare a source image against thousands of images in a set to get a similarity score of the most likely matches (0 - 1). Each image is small (64x64 or smaller). Each image is 1 bit, meaning that each pixel is either off (fully transparent) or on (fully white). I am trying to create a very fast similarity algorithm to compare these images. I have found many similarity algorithms via Google search but they all involve comparing large, full color images, which I don't need to do.

I realize I can just compare pixels that match / don't match, but this can be potentially slow, given that the compare set can be very large. The compare set images will all be the same exact size as the lookup image.

Is it possible to create hashes or other fast lookups for these kinds of images where a hash or binary search lookup could be performed and similarity score created with the most likely matches?



Solution 1:[1]

One way to do this would be a binary tree. Each image's pixels could be converted to a string of 1s and 0s, and that string could then be used to construct a binary tree.

While checking a new string, you simply follow the path it defines: if you reach a leaf node, the string was already present; if you fall off the tree, it is new.

For example, a tree constructed from 3 strings of length 4:

1010
0110
0001

So, if 0001 comes again, just follow the path, if you end up in a leaf (filled circle) then the string (image) is duplicate and has occurred again. If not, then you can add it also, while knowing it is new and unique.

It will take O(n) time for each comparison and insertion, where n is the length of the string, i.e. the number of pixels in the image (at most 64*64 in your case).
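The duplicate check above can be sketched in Python with a nested-dict trie over the flattened bit string (a sketch only; note that this approach detects exact duplicates, not near matches):

```python
def image_to_bits(pixels):
    """Flatten a 2D 0/1 image into a bit string, e.g. [[0,1],[1,0]] -> "0110"."""
    return "".join(str(bit) for row in pixels for bit in row)

def add_or_check(trie, bits):
    """Walk the trie along the bit string; return True if the string was
    already present (a duplicate image), inserting it otherwise.
    All strings are assumed to have the same length (same image size)."""
    node = trie
    seen = True
    for b in bits:
        if b not in node:
            node[b] = {}
            seen = False
        node = node[b]
    return seen

trie = {}
add_or_check(trie, "1010")  # -> False (new)
add_or_check(trie, "0110")  # -> False (new)
add_or_check(trie, "1010")  # -> True  (duplicate)
```

Each lookup or insertion touches one trie node per bit, which gives the O(n) cost mentioned above.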

Solution 2:[2]

You could implement a quadtree structure https://en.wikipedia.org/wiki/Quadtree

Segment your images recursively. At each level, store the number of 1 and/or 0 pixels (one can be computed from the other).

Ex: for this image:

0 1 1 0
0 1 0 1
0 0 0 0
0 0 1 0

You compute the following tree:

(5)
(2) - (2) - (0) - (1)
(0) - (1) - (0) - (1) - - - (1) - (0) - (0) - (1) - - - (0) - (0) - (0) - (0) - - - (0) - (0) - (1) - (0)

The higher levels of the tree are coarser versions of the image:

First level:

5/16

Second level:

2/4 2/4
0/4 1/4

Then, your similarity score could be computed from the difference between the 0/1 counts of corresponding nodes, across the levels of recursion, with a weight at each level. And you can get an approximation of it (to quickly dismiss very different images) by not going down the whole tree.
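A minimal Python sketch of this idea, assuming square images whose side is a power of two; the function names and the per-level weights (halving at each depth) are illustrative choices, not part of the original answer:

```python
def count_tree(img, x, y, size):
    """Recursively count 1-pixels per quadtree node. Returns a list of
    levels: levels[0] is the whole image's count, levels[1] the four
    quadrants (TL, TR, BL, BR), and so on down to single pixels."""
    if size == 1:
        return [[img[y][x]]]
    half = size // 2
    subs = [count_tree(img, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    levels = [[sum(s[0][0] for s in subs)]]
    for d in range(len(subs[0])):
        levels.append([c for s in subs for c in s[d]])
    return levels

def quadtree_similarity(img_a, img_b, size):
    """Weighted score in [0, 1]: compare the 1-pixel counts of
    corresponding quadtree nodes at each level, coarse levels first.
    Stopping the loop early gives the coarse approximation mentioned above."""
    ta = count_tree(img_a, 0, 0, size)
    tb = count_tree(img_b, 0, 0, size)
    weights = [1.0 / (2 ** d) for d in range(len(ta))]  # assumed weighting
    score, total = 0.0, 0.0
    for d, w in enumerate(weights):
        node_pixels = (size // (2 ** d)) ** 2  # pixels covered per node
        for ca, cb in zip(ta[d], tb[d]):
            score += w * (1 - abs(ca - cb) / node_pixels)
            total += w
    return score / total
```

Running `count_tree` on the 4x4 example image reproduces the counts shown in the tree above: 5 at the root and (2, 2, 0, 1) for the quadrants.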

Solution 3:[3]

If you find that comparing all images completely (using e.g. ChronoTrigger's answer) still takes too much time, consider these two strategies to reduce the number of necessary comparisons.

I will assume that the images are compared line-by-line. You start by comparing the first image completely, store its score as the maximum, then move on to the next, each time updating the maximum as necessary. While comparing each image line-by-line, you do the following after each line (or after each n lines):

  1. Check if the number of mismatched bits so far exceeds the number of mismatches in the image with the maximum score so far. If it does, this image can never reach the maximum score, and it can be discarded.
  2. If the average score per line so far is lower than the average score per line of the image with the maximum score, leave the comparison to be continued during the next run, and skip to the next image.

Repeat this until all images have been completely checked, or discarded.
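Strategy 1 above can be sketched in Python, treating each image as a list of per-row integers (as in Solution 4 below, a natural fit for 1-bit rows); the helper name and the popcount via `bin(...).count` are illustrative assumptions:

```python
def best_match(source_rows, candidates):
    """Return (index, mismatches) of the candidate with the fewest
    mismatched bits. A candidate is abandoned as soon as its running
    mismatch count exceeds the best count seen so far (strategy 1 only)."""
    best_index, best_mismatches = None, None
    for index, cand_rows in enumerate(candidates):
        mismatches = 0
        for src, cnd in zip(source_rows, cand_rows):
            mismatches += bin(src ^ cnd).count("1")  # mismatched bits in row
            if best_mismatches is not None and mismatches > best_mismatches:
                break  # can no longer beat the current best: discard
        else:
            # Compared completely; update the best candidate if improved.
            if best_mismatches is None or mismatches < best_mismatches:
                best_index, best_mismatches = index, mismatches
    return best_index, best_mismatches
```

Strategy 2 (postponing borderline candidates to a second pass) would add a queue of partially compared images on top of this loop.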

Trying this strategy on 100 random 32x32-pixel images based on an example image, each with a random number of bits altered, gave this promising result:

FIRST RUN (100 images):
images checked completely:             5    (maximum score: 1015, image #52)
postponed after 1 line:               59
discarded after 1 line:               35
discarded after 10 lines:              1

SECOND RUN (59 images):
discarded without additional checks:  31    (because of new maximum score)
discarded after 1 additional line:    12
discarded after 2 additional lines:    9
discarded after 3 additional lines:    1
discarded after 4 additional lines:    3
discarded after 5 additional lines:    1
discarded after 6 additional lines:    2

Total number of lines compared:      326 out of 3200 lines (~ 10.1875 out of 100 images)

Solution 4:[4]

If your image stores pixel data in a bitmap-like format, then every line of a 32-pixel-wide image is just a 32-bit integer value, and you can simply compare image lines:

for iy := 0 to ImageHeight - 1 do
  if CastToInt32(Image1.Scanline[iy]) <> CastToInt32(Image2.Scanline[iy]) then
    break; // inequality found
// 32 comparisons or less

For the case of approximate similarity, you can calculate the overall number of discrepancies by counting the set bits in the XOR-ed values of all lines:

NumberOf1Bits(Value1 xor Value2)
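A rough Python equivalent of this XOR/popcount idea, assuming each image is stored as a list of per-row integers (the function names are illustrative):

```python
def mismatched_bits(rows_a, rows_b):
    """Total number of differing pixels: popcount of the XOR-ed row values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(rows_a, rows_b))

def similarity_score(rows_a, rows_b, width=32):
    """Similarity in [0, 1]: fraction of matching pixels between two
    equally sized 1-bit images."""
    total_bits = width * len(rows_a)
    return 1 - mismatched_bits(rows_a, rows_b) / total_bits
```

On Python 3.10+, `(a ^ b).bit_count()` is a faster popcount than `bin(a ^ b).count("1")`; as the P.S. below notes, the popcount is the main cost of this approach.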

P.S. A straightforward implementation in Delphi takes 300 nanoseconds per image/image comparison (0.3 s for 1 million images) on a single thread of an i5 processor with a mismatch limit of 450. The time is significantly lower for a low mismatch limit (47 ns for a limit of 45). The main time eater is the NumberOf1Bits/popcount function.

Solution 5:[5]

I made an image hashing class for the Skia4Delphi library unit tests. It generates a hash that makes it possible to compare the similarity percentage between 2 images using only the hash. The focus was on accuracy and not performance, but the performance is not bad. To use it, you must have Skia4Delphi installed. Check the source: https://github.com/skia4delphi/skia4delphi/blob/main/Tests/Source/Skia.Tests.Foundation.ImageHash.pas

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 A.N.
Solution 3
Solution 4 Community
Solution 5 vfbb