Applying weights to KNN dimensions

When doing KNN searches in ES/OS, the usual recommendation seems to be to normalize the data in the knn vectors so that no single dimension overpowers the final scoring.

In my current example I have a 3-dimensional vector where all values are normalized to between 0 and 1:

[0.2, 0.3, 0.2]

From the perspective of Euclidean-distance-based scoring, this seems to give equal weight to all dimensions.

In my particular example I am using the l2 space type:

"method": {
  "name": "hnsw",
  "space_type": "l2",
  "engine": "nmslib"
}

However, if I want to give more weight to one of my dimensions (say by a factor of 2), would it be acceptable to single out that dimension and normalize it to the range 0-2 instead of the base range of 0-1?

Example:

[0.2, 0.3, 1.2] // Third dimension is now in the range 0-2
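To make the idea concrete, here is a minimal sketch of how the vectors could be prepared before indexing: min-max normalize every dimension to 0-1, then multiply each dimension by its own factor. The function name `scale_features`, the `weights` parameter, and the raw sample values are made up for illustration; nothing here is an OpenSearch API.

```python
def scale_features(rows, weights):
    """Min-max normalize each column to [0, 1], then multiply it by its weight."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
            w * ((v - lo) / (hi - lo) if hi > lo else 0.0)
            for v, lo, hi, w in zip(row, lows, highs, weights)
        ])
    return scaled


# Hypothetical raw feature rows (three documents, three dimensions).
raw = [
    [10.0, 3.0, 100.0],
    [20.0, 6.0, 400.0],
    [15.0, 9.0, 250.0],
]

# The third dimension is stretched to the range 0-2, the others stay in 0-1.
vectors = scale_features(raw, weights=[1.0, 1.0, 2.0])
```

The query vector would have to go through the same normalization bounds and the same factors, otherwise the stretched dimension is compared on two different scales.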

If xi and yi are the original 0-1 values, the distance term for this dimension would now be (2 * (xi - yi))^2 = 4 * (xi - yi)^2, which leads to bigger diffs compared to the other dimensions. As a result, the overall score would be more sensitive to differences in this particular dimension.
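A quick way to check this reasoning is to compare the squared L2 distance on the stretched vectors with a weighted squared L2 distance on the original ones. The sketch below assumes plain squared Euclidean distance; the helper names and sample vectors are invented for the check. It shows that stretching a dimension by a factor s is the same as weighting its squared difference by s^2, so a factor of 2 quadruples that dimension's contribution.

```python
import math


def squared_l2(a, b):
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_squared_l2(a, b, weights):
    """Squared L2 where each squared difference is multiplied by a weight."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


a = [0.2, 0.3, 0.6]
b = [0.5, 0.1, 0.2]
scale = [1.0, 1.0, 2.0]  # stretch only the third dimension

a_scaled = [s * x for s, x in zip(scale, a)]
b_scaled = [s * y for s, y in zip(scale, b)]

d_scaled = squared_l2(a_scaled, b_scaled)
d_weighted = weighted_squared_l2(a, b, [s ** 2 for s in scale])

# Both routes give the same distance (up to floating-point rounding).
same = math.isclose(d_scaled, d_weighted)
```

Read the other way round: if the goal is for a dimension to count w times as much in the squared distance, the stretch factor would be sqrt(w) rather than w.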

In OS the score is calculated as 1 / (1 + distance), where distance is the value returned by the distance function, so the higher that value, the lower the score will be.
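The effect on the score can be sketched with a few lines of Python. This assumes the distance function for the l2 space is the sum of squared differences, as in the term above; `knn_score` is a hypothetical helper that mirrors the formula, not a call into OpenSearch.

```python
def knn_score(query, doc):
    """Score as 1 / (1 + d), with d taken as the squared L2 distance."""
    d = sum((q - x) ** 2 for q, x in zip(query, doc))
    return 1.0 / (1.0 + d)


# Query and document differ only in the third dimension, by 0.5.
query = [0.2, 0.3, 0.2]
doc = [0.2, 0.3, 0.7]
score_plain = knn_score(query, doc)  # d is about 0.25, score about 0.8

# Same pair with the third dimension stretched by a factor of 2.
query_stretched = [0.2, 0.3, 0.4]
doc_stretched = [0.2, 0.3, 1.4]
score_stretched = knn_score(query_stretched, doc_stretched)  # d is about 1.0, score about 0.5
```

The same raw difference costs noticeably more score once the dimension is stretched, which is the intended sensitivity increase.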

Is there a method for deciding what the weighting range should be? I assume that setting the range too high would make the dimension too dominant.

knn euclidean-distance opensearch

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
