Algorithm for ordering data so that neighbor elements are as identical as possible
I have a (potentially large) list data of 3-tuples of small non-negative integers, like
data = [
    (1, 0, 5),
    (2, 4, 2),
    (3, 2, 1),
    (4, 3, 4),
    (3, 3, 1),
    (1, 2, 2),
    (4, 0, 3),
    (0, 3, 5),
    (1, 5, 1),
    (1, 5, 2),
]
I want to order the tuples within data so that neighboring tuples (data[i] and data[i+1]) are "as similar as possible".
Define the dissimilarity of two 3-tuples as the number of elements which are unequal between them. E.g.
- (0, 1, 2) vs. (0, 1, 2): Dissimilarity 0.
- (0, 1, 2) vs. (0, 1, 3): Dissimilarity 1.
- (0, 1, 2) vs. (0, 2, 1): Dissimilarity 2.
- (0, 1, 2) vs. (3, 4, 5): Dissimilarity 3.
- (0, 1, 2) vs. (2, 0, 1): Dissimilarity 3.
Question: What is a good algorithm for finding the ordering of data which minimizes the sum of dissimilarities between all neighboring 3-tuples?
Some code
Here's a function which computes the dissimilarity between two 3-tuples:
def dissimilar(t1, t2):
    return sum(int(a != b) for a, b in zip(t1, t2))
Here's a function which computes the summed total dissimilarity of data, i.e. the number which I seek to minimize:
def score(data):
    return sum(dissimilar(t1, t2) for t1, t2 in zip(data, data[1:]))
The problem can be solved by simply running score() over every permutation of data:
import itertools

n_min = 3*len(data)  # some large number
for perm in itertools.permutations(data):
    n = score(perm)
    if n < n_min:
        n_min = n
        data_sorted = list(perm)

print(data_sorted, n_min)
Though the above works, it's very slow as we explicitly check each and every permutation (resulting in O(N!) complexity). On my machine the above takes about 20 seconds when data has 10 elements.
For completeness, here's the result of running the above given the example data:
data_sorted = [
    (1, 0, 5),
    (4, 0, 3),
    (4, 3, 4),
    (0, 3, 5),
    (3, 3, 1),
    (3, 2, 1),
    (1, 5, 1),
    (1, 5, 2),
    (1, 2, 2),
    (2, 4, 2),
]
with n_min = 15. Note that several other orderings (10 in total) with a score of 15 exist. For my purposes these are all equivalent and I just want one of them.
Final remarks
In practice the size of data may be as large as say 10000.
The sought-after algorithm should beat O(N!), i.e. probably be polynomial in time (and space).
If no such algorithm exists, I would be interested in "near-solutions", i.e. a fast algorithm which gives an ordering of data with a small but not necessarily minimal total score. One such algorithm would be lexicographic sorting, i.e.
sorted(data) # score 18
though I hope to be able to do better than this.
Edit (comments on accepted solution)
I have tried all of the below heuristic solutions given as code (I have not tried e.g. Google OR-tools). For large len(data), I find that the solution of Andrej Kesely is both quick and the one that gives the best results.
The idea behind this method is quite simple. The sorted list of data elements (3-tuples) is built up one by one. Given some data element, the next element is chosen to be the most similar one out of the remaining (not yet part of the sorted) data.
Essentially this solves a localized version of the problem where we only "look one ahead", rather than optimizing globally over the entire data set. We can imagine a hierarchy of algorithms looking n ahead, each successively delivering better (or at least as good) results but at the cost of being much more expensive. The solution of Andrej Kesely then sits lowest in this hierarchy. The algorithm at the highest spot, looking len(data) ahead, solves the problem exactly.
Let's settle for "looking 1 ahead", i.e. the answer by Andrej Kesely. This leaves room for a) the choice of initial element, b) what to do when several elements are equally good candidates (same dissimilarity) for use as the next one. Choosing the first element in data as the initial element and the first occurrence of an element with minimal dissimilarity, both a) and b) are determined from the original order of elements within data. As Andrej Kesely points out, it then helps to (lex)sort data in advance.
In the end I went with this solution, but refined in a few ways:
- I try out the algorithm for 6 initial sortings of data: lex sort on columns (0, 1, 2), (2, 0, 1) and (1, 2, 0), each in ascending as well as descending order.
- For large len(data), the algorithm becomes too slow for me. I suspect it scales like O(n²). I thus process chunks of the data of size n_max independently, with the final result being the different sorted chunks concatenated. Transitioning from one chunk to the next we expect a dissimilarity of 3, but this is unimportant if we keep n_max large. I go with n_max = 1000.
As an implementation note, the performance can be improved by not using data.pop(idx) as this itself is O(n). Instead, either leave the original data as is and use another data structure for keeping track of which elements/indices have been used, or replace data[idx] with some marker value upon use.
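For reference, here is a minimal sketch of this greedy "look one ahead" heuristic as I understand it (my own reconstruction, not the accepted answer's exact code), using a used-marker array instead of data.pop(idx) as suggested above:

def greedy_order(data):
    """Greedy nearest-neighbor ordering: repeatedly append the remaining
    tuple with the smallest dissimilarity to the last chosen one."""
    data = sorted(data)            # lex-sorting first tends to help (see above)
    used = [False] * len(data)     # marker array instead of an O(n) data.pop(idx)
    out = [data[0]]
    used[0] = True
    for _ in range(len(data) - 1):
        last = out[-1]
        best_i, best_d = None, 4
        for i, t in enumerate(data):
            if used[i]:
                continue
            d = sum(a != b for a, b in zip(last, t))
            if d < best_d:         # first occurrence wins ties, as described above
                best_i, best_d = i, d
                if d == 0:
                    break
        used[best_i] = True
        out.append(data[best_i])
    return out

This is O(n²) overall, which matches the scaling observed above; the chunking and the six initial sortings described in the list can be layered on top unchanged.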
Solution 1:[1]
Unfortunately, this problem is NP-complete. You can show this by a reduction from the Hamiltonian path problem on planar 3-regular bipartite graphs, which is also NP-complete.
As an overview of the proof: we will create a 3-Tuple for each vertex of our graph, such that pairs of corresponding 3-Tuples have dissimilarity equal to 2 if the vertices are adjacent, and dissimilarity equal to 3 if the vertices are not adjacent. The members of each vertex's 3-Tuples will correspond uniquely to the vertex's incident edges.
Proof:
Suppose we're given as input an undirected planar cubic bipartite graph G = (V, E) on which we're trying to find a Hamiltonian path. We can find a 3-coloring of the edges in linear time. Suppose our three edge-colors are 0, 1, 2. For each edge e in E, give it a unique natural number label L(e) such that L(e) mod 3 is equal to e's color.
For example, take a cubic planar bipartite graph (figure omitted). We can color its edges with colors 0, 1 and 2, and then label the edges with minimal natural numbers L(e) that respect the colors mod 3 (figures omitted).
For each vertex v in V, create a 3-Tuple T = (t0, t1, t2), where t0, t1, t2 are the labels of edges incident to v with remainders equal to 0, 1, 2 respectively. Note that each edge label appears at the same index of each 3-Tuple it belongs to. In the example above, the top left vertex will get a 3-Tuple (0, 1, 29) and the left-most vertex will get a 3-Tuple (0, 16, 32).
Now, there is a Hamiltonian path in G if and only if there is an ordering of the 3-Tuples with dissimilarity sum 2 * (|V| - 1). Each of the |V| - 1 neighboring pairs contributes at least 2, with equality exactly when the corresponding vertices are adjacent in G, so an ordering achieves that sum if and only if it corresponds to a Hamiltonian path in G.
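To make the construction concrete, here is a small self-contained check (my own illustration; it uses K3,3, which is cubic and bipartite though not planar — enough to exercise the tuple gadget, since only 3-edge-colorability is needed for the construction itself):

from itertools import combinations

# Edge (u_i, v_j) gets color (i + j) % 3 -- a proper 3-edge-coloring of K_{3,3}.
# Give each edge a unique label L(e) with L(e) % 3 equal to the edge's color.
label = {}
next_label = {0: 0, 1: 1, 2: 2}
for i in range(3):
    for j in range(3):
        c = (i + j) % 3
        label[i, j] = next_label[c]
        next_label[c] += 3

# Each vertex's 3-tuple holds, at index c, the label of its incident edge of color c.
tuples = {}
for i in range(3):                       # left vertices u_i
    t = [None, None, None]
    for j in range(3):
        t[(i + j) % 3] = label[i, j]
    tuples["u%d" % i] = tuple(t)
for j in range(3):                       # right vertices v_j
    t = [None, None, None]
    for i in range(3):
        t[(i + j) % 3] = label[i, j]
    tuples["v%d" % j] = tuple(t)

def dissimilar(t1, t2):
    return sum(a != b for a, b in zip(t1, t2))

# Adjacent vertices share exactly one edge label at the same index -> dissimilarity 2;
# non-adjacent vertices (same side of the bipartition) share none -> dissimilarity 3.
for a, b in combinations(tuples, 2):
    adjacent = a[0] != b[0]              # every u-v pair is adjacent in K_{3,3}
    assert dissimilar(tuples[a], tuples[b]) == (2 if adjacent else 3)
print("construction behaves as claimed on K_{3,3}")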
References and addenda
The reduction used in the proof comes from an extremely specific version of the Hamiltonian path problem. The only properties of this class of graphs (i.e. the planar, cubic, bipartite class) that are used in the proof are:
- The graph must have chromatic index (edge coloring number) at most 3. By a result called König's line coloring theorem, the chromatic index of bipartite graphs equals the maximum vertex degree, so a cubic bipartite graph is sufficient.
- The 3-coloring of edges must be computable in polynomial time. In general, finding a 3-coloring of the edges of an arbitrary 3-edge-colorable graph is NP-complete. (In fact, by trying to color the vertices of the line graph, we can show it's NP-hard to find a 4-edge-coloring of a 3-edge-colorable cubic graph with this result by Khanna et al.). To fix this, I used a result by Cole et al on Edge-Coloring Bipartite Multigraphs in O(E log D) Time. This gives a way to construct the 3-coloring on edges in polynomial (linear, actually) time on our bipartite graph.
- The Hamiltonian path problem must be NP-hard for this class. There is a large patchwork of results on NP-completeness for Hamiltonian cycles or paths for graphs which are planar, and or cubic, and or k-connected, and or bipartite, etc. The first direct proof of the NP-completeness of Hamiltonian path on planar cubic bipartite graphs that I can cite is theorem 23 from Munaro's paper, On line graphs of subcubic triangle-free graphs. It's possible that an earlier reference showed this; the NP-completeness of the Hamiltonian cycle problem on that class was proven four decades earlier.
Solution 2:[2]
Here's a small improvement; not sure about the complexity, but it seems to be about O(4^n). It solves N=10 in less than a second. It's too slow for your large cases, but it might be useful for testing other solutions, by more quickly computing the expected results for test inputs.
The idea is that of the N triples, one of them has to be the center. So try each as center. Let half = N // 2. Then half of the other triples must be on the left of the center and N - 1 - half on the right of the center. Try all splits into left and right and solve them recursively independently.
My helper function takes not just the triples in data but also the context in which it belongs: the data must be enclosed between head triple and tail triple (both can be None), and the score must be computed accordingly.
import itertools

data = [
    (1, 0, 5),
    (2, 4, 2),
    (3, 2, 1),
    (4, 3, 4),
    (3, 3, 1),
    (1, 2, 2),
    (4, 0, 3),
    (0, 3, 5),
    (1, 5, 1),
    (1, 5, 2),
]
def dissimilar(t1, t2):
    # The sentinel None (used for a missing head/tail) contributes no dissimilarity.
    if t1 and t2:
        a, b, c = t1
        x, y, z = t2
        return (a != x) + (b != y) + (c != z)
    return 0

def score(data):
    return sum(dissimilar(t1, t2) for t1, t2 in zip(data, data[1:]))

def solve(data):
    def solve(head, data, tail):
        if len(data) <= 3:
            perm = min(itertools.permutations(data),
                       key=lambda perm: score([head, *perm, tail]))
            return list(perm), score([head, *perm, tail])
        half = len(data) // 2
        result = result_score = None
        # Try every triple as the center, split the rest into a left and a right
        # part, and solve the two parts recursively and independently.
        for center in list(data):
            data.remove(center)
            for left in itertools.combinations(data, half):
                left = set(left)
                right = data - left
                left, score_left = solve(head, left, center)
                right, score_right = solve(center, right, tail)
                if result_score is None or score_left + score_right < result_score:
                    result = [*left, center, *right]
                    result_score = score_left + score_right
            data.add(center)
        return result, result_score
    return solve(None, set(data), None)
result, result_score = solve(data)
print(result, result_score, score(result), sorted(result) == sorted(data))
Solution 3:[3]
Perhaps also useful: A lower bound. Any ordering is a chain of the triples. A chain is a spanning tree. So the score of a minimum spanning tree is a lower bound.
Andrej Kesely ran their solution on 100 random inputs and got a total score of 160037. I ran minimum spanning tree on 100 random inputs. Did it three times, the total scores were 153866, 154040 and 153949. So Andrej's 160037 looks fairly close (4.0% above my average lower bound). And much closer than their 178096 from lexsort (15.7% above).
Code (Try it online!):
import random

def get_data(n=1000):
    f = lambda n: random.randint(0, n)
    return [(f(n // 30), f(n // 20), f(n // 10)) for _ in range(n)]

def dissimilar(t1, t2):
    a, b, c = t1
    x, y, z = t2
    return (a != x) + (b != y) + (c != z)

def mst_score(data):
    # Prim's algorithm on the complete graph whose edge weights are the
    # dissimilarities (every weight is at most 3, hence the initial value).
    dist = dict.fromkeys(data, 3)
    dist[data[0]] = 0
    score = 0
    while dist:
        one = min(dist, key=dist.get)
        score += dist.pop(one)
        for other in dist:
            dist[other] = min(dist[other], dissimilar(one, other))
    return score

total = 0
for i in range(100):
    data = get_data()
    score = mst_score(data)
    total += score
    print(score, total)
Solution 4:[4]
A drop-in replacement of Andrej Kesely's method that's about 30x faster in the same test. That method builds the output list by always appending the remaining triple most similar to the last one in the output. Most of the time, there is a remaining triple that matches that last one in at least one spot. So instead of checking all remaining triples for similarity, I first check only the remaining triples that match in at least one spot. Only when that fails do I check all remaining triples. I use indexes and break similarity ties with them, so I get the same order as Andrej's.
I call a pair (spot, number) a "mark". For example the triple (17, 3, 8) has the marks (0, 17), (1, 3) and (2, 8). That allows me to then look up triples matching those numbers at those spots.
from collections import defaultdict

def heuristic(data, sort_data=False):
    data = sorted(data) if sort_data else data[:]
    out = [data.pop(0)]
    indexes = defaultdict(set)
    for i, triple in enumerate(data):
        for mark in enumerate(triple):
            indexes[mark].add(i)
    remain = set(range(len(data)))
    while remain:
        a, b, c = out[-1]
        best = 4, None
        for mark in enumerate(out[-1]):
            for i in indexes[mark]:
                x, y, z = data[i]
                candidate = (a != x) + (b != y) + (c != z), i
                if candidate < best:
                    best = candidate
        i = best[1]
        if i is None:
            i = min(remain)
        remain.remove(i)
        t = data[i]
        for pattern in enumerate(t):
            indexes[pattern].remove(i)
        out.append(t)
    return out
And it gets about 35x faster if I replace the for mark in enumerate(out[-1]) loop with this, which doesn't compare the spot that I already know matches:
for i in indexes[0, a]:
    _, y, z = data[i]
    candidate = (b != y) + (c != z), i
    if candidate < best:
        best = candidate
for i in indexes[1, b]:
    x, _, z = data[i]
    candidate = (a != x) + (c != z), i
    if candidate < best:
        best = candidate
for i in indexes[2, c]:
    x, y, _ = data[i]
    candidate = (a != x) + (b != y), i
    if candidate < best:
        best = candidate
Solution 5:[5]
You can do it in O(2^n · n²): for each pair (subset, last element), find the best way to order all the elements of the subset so that the given element comes last. You can compute this recursively (iterating over the previous element), taking care not to compute the same state more than once (memoization).
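A minimal sketch of this bitmask DP (my own illustration; it finds the exact optimum but is only practical for small n, say up to about 20):

from functools import lru_cache

def exact_order(data):
    """Exact minimum-score ordering via a Held-Karp style DP over (subset, last)."""
    n = len(data)
    dis = [[sum(a != b for a, b in zip(s, t)) for t in data] for s in data]

    @lru_cache(maxsize=None)
    def best(mask, last):
        # Minimum score of an ordering of the elements in `mask` that ends at `last`.
        if mask == 1 << last:
            return 0
        rest = mask & ~(1 << last)
        return min(best(rest, prev) + dis[prev][last]
                   for prev in range(n) if rest >> prev & 1)

    full = (1 << n) - 1
    # Pick the best final element, then walk the DP table backwards to recover the order.
    last = min(range(n), key=lambda i: best(full, i))
    order, mask = [last], full
    while mask != 1 << order[-1]:
        rest = mask & ~(1 << order[-1])
        prev = min((p for p in range(n) if rest >> p & 1),
                   key=lambda p: best(rest, p) + dis[p][order[-1]])
        order.append(prev)
        mask = rest
    return [data[i] for i in reversed(order)]

There are O(2^n · n) states with O(n) work each, giving the stated O(2^n · n²); on the 10-element example above it should recover a score-15 ordering.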
Solution 6:[6]
Your code for generating test data is this:
def f(n):
return random.randint(0, n)
n = 10_000
data = [(f(n//30), f(n//20), f(n//10)) for _ in range(n)]
I had the idea to simply find the best order of the columns (or fields) in each item to sort on, and see how that went.
I theorised that the standard deviation of numbers in each column might have a beneficial effect on overall similarity, i.e. if there is not much of a spread of numbers in the third column then maybe sort on the third column values first to keep them together, with similar considerations on which column to sort on to break ties when successive third-column values are the same, etc...
I found that a random ordering of 10_000 items gave very low similarity. Ordering by field/column, from the one with the least standard deviation to the most, gave much better similarity. Ordering by any column order looks like it will be better than random.
The code
# -*- coding: utf-8 -*-
"""
Created on Sat Apr 16 10:33:41 2022

@author: paddy3118
"""
import random
from typing import List, Tuple

def f(n):
    return random.randint(0, n)

def stddev_mean(lst: List[int]) -> Tuple[float, float]:
    "Standard deviation of numbers in lst, mean."
    sm = sum(lst)
    sm2 = sum(i**2 for i in lst)
    n = len(lst)
    return ((sm2 - sm**2 / n) / n)**0.5, sm / n

def similarity(d) -> int:
    "Compare successive 3-tuples for similarity - higher is better"
    return sum(sum((field0 == field1) for field0, field1 in zip(d0, d1))
               for d0, d1 in zip(d, d[1:]))

#%% Test data
data = [  # SIMILARITY in fields of successive records
    (0, 1, 2),
    (2, 4, 2),  # 1
    (3, 2, 1),  # 0
    (4, 3, 4),  # 0
    (3, 3, 1),  # 1
    (1, 2, 2),  # 0
    (4, 0, 3),  # 0
    (0, 3, 5),  # 0
    (1, 5, 1),  # 0
    (1, 5, 2),  # 2
]               # Sum = 4
assert similarity(data) == 4

#%% Randomised data
print("NOTE: Similarity figures are better when higher.\n")

n = 10_000
data = [(f(n//30), f(n//20), f(n//10)) for _ in range(n)]
print(f"Random generation of {n} items has similarity of = {similarity(data)}")

field_stddev = [(i, stddev_mean([datum[i] for datum in data])[0])
                for i in range(len(data[0]))]
print(f"{field_stddev = }")

#%% Order by increasing stddeviation field
index_by_inc_dev = [ind for ind, _ in
                    sorted(field_stddev, key=lambda x: x[1])]
print(f"\nField indices by INcreasing stddev of tuple field values = {index_by_inc_dev}")

sort_by_field_of_inc_dev = sorted(data,
                                  key=lambda d: [d[i]
                                                 for i in index_by_inc_dev])
print(f"{similarity(sort_by_field_of_inc_dev) = :,}")
assert set(sort_by_field_of_inc_dev) == set(data), "Whoops, data lost in sort??"

#%% Order by decreasing stddeviation field
print(f"\nField indices by DEcreasing stddev of tuple field values = {index_by_inc_dev[::-1]}")

sort_by_field_of_dec_dev = sorted(data,
                                  key=lambda d: [d[i]
                                                 for i in index_by_inc_dev[::-1]])
print(f"{similarity(sort_by_field_of_dec_dev) = :,}")
assert set(sort_by_field_of_dec_dev) == set(data), "Whoops, data lost in sort??"

#%%
print("\n Same data, random sorts, gives these similarity values:")
dcopy = data.copy()
for i in range(10):
    print(f"    Random similarity({i}) = {similarity(dcopy):,}")
    random.shuffle(dcopy)
Output
NOTE: Similarity figures are better when higher.
Random generation of 10000 items has similarity of = 55
field_stddev = [(0, 96.04927579341764), (1, 145.8145251033312), (2, 288.84656085582884)]
Field indices by INcreasing stddev of tuple field values = [0, 1, 2]
similarity(sort_by_field_of_inc_dev) = 9,949
Field indices by DEcreasing stddev of tuple field values = [2, 1, 0]
similarity(sort_by_field_of_dec_dev) = 9,135
Same data, random sorts, gives these similarity values:
Random similarity(0) = 55
Random similarity(1) = 61
Random similarity(2) = 54
Random similarity(3) = 60
Random similarity(4) = 68
Random similarity(5) = 55
Random similarity(6) = 56
Random similarity(7) = 58
Random similarity(8) = 56
Random similarity(9) = 63
Solution 7:[7]
This refines my previous answer and also shows that sorting by columns, where the column order is by increasing std-deviation of the numbers in each column, does give the better result.
An extra step takes that best sort and swaps adjacent rows if that increases overall similarity, until no more adjacent swaps improve the result.
Code
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 17 07:30:23 2022

@author: paddy
"""
import random
from itertools import permutations
from typing import List, Tuple

DType = List[Tuple[int, int, int]]

def randomised_data_gen(n: int) -> DType:
    def f(n):
        return random.randint(0, n)
    return [(f(n // 30), f(n // 20), f(n // 10)) for _ in range(n)]

def stddev_mean(lst: List[int]) -> Tuple[float, float]:
    "Standard deviation of numbers in lst, mean."
    sm = sum(lst)
    sm2 = sum(i**2 for i in lst)
    n = len(lst)
    return ((sm2 - sm**2 / n) / n)**0.5, sm / n

def similarity(d: DType) -> int:
    "Compare successive 3-tuples for similarity - higher is better"
    return sum(sum((col0 == col1) for col0, col1 in zip(d0, d1)) for d0, d1 in zip(d, d[1:]))

def sort_by_column(data: DType, column_index: Tuple[int]) -> DType:
    return sorted(data, key=lambda d: [d[i] for i in column_index])

def similarity_when_randomized(data: DType, n: int = 10) -> None:
    dcopy = data.copy()
    sims = ", ".join([f"{similarity(dcopy):_}", random.shuffle(dcopy)][0] for _ in range(n))
    print(f"  Same data random sorts gives similarity values: {sims}")

#%% Test data
def test1():
    data = [  # SIMILARITY in columns of successive records
        (0, 1, 2),
        (2, 4, 2),  # 1
        (3, 2, 1),  # 0
        (4, 3, 4),  # 0
        (3, 3, 1),  # 1
        (1, 2, 2),  # 0
        (4, 0, 3),  # 0
        (0, 3, 5),  # 0
        (1, 5, 1),  # 0
        (1, 5, 2),  # 2
    ]               # Sum = 4
    assert similarity(data) == 4

test1()

#%% Randomised data
print("NOTE: Similarity figures are better when higher.\n")

n = 10_000
data = randomised_data_gen(n)
print(f"Random generation of {n} items has similarity of = {similarity(data)}")
similarity_when_randomized(data, 10)

col_stddev = [(i, stddev_mean([datum[i] for datum in data])[0]) for i in range(len(data[0]))]
print(f"\n(Column_index, Standard_deviation) of each column of numbers: {col_stddev}")

#%% Order by increasing stddeviation column
index_by_inc_dev = [ind for ind, _ in sorted(col_stddev, key=lambda x: x[1])]

#%% Sort by columns
def score_all_column_orders(data: DType) -> Tuple[List[Tuple[int, Tuple[int, int, int]]], DType]:
    "Print and return similarity scores for sorting data by all column orders, best first AND the best sort of data"
    sim_by_column_sort = [(similarity(sort_by_column(data, column_order)), column_order)
                          for column_order in permutations(range(len(data[0])))]
    sim_by_column_sort.sort(reverse=True)  # Highest similarity first
    print("\nSimilarity when sorting by all column orders:")
    print("  SIMILARITY SORT_COLUMN_ORDER")
    for sim, col in sim_by_column_sort:
        col = list(col)
        if col == index_by_inc_dev[::-1]:
            comment = " (by DEcreasing std-dev)"
        elif col == index_by_inc_dev:
            comment = " (by INcreasing std-dev)"
        else:
            comment = ""
        print(f"  {sim:>10_} {col}{comment}")
    return sim_by_column_sort, sort_by_column(data, sim_by_column_sort[0][1])

_, best_by_col_sort = score_all_column_orders(data)
score = best_score = similarity(best_by_col_sort)

#%% Swap with next row
print("\nNeighbourly swaps:")

twizzled = best_by_col_sort.copy()
while True:
    swaps = 0
    for i in range(1, len(twizzled) - 2):
        i_window = [twizzled[x] for x in [i-1, i, i+1, i+2]]
        i_swap = [twizzled[x] for x in [i-1, i+1, i, i+2]]
        if similarity(i_swap) > similarity(i_window):
            twizzled[i], twizzled[i+1] = twizzled[i+1], twizzled[i]
            swaps += 1
    if swaps == 0:
        break
    print(f"  Neighbour {swaps = }  New similarity = {similarity(twizzled)}")
Sample output
NOTE: Similarity figures are better when higher.
Random generation of 10000 items has similarity of = 70
Same data random sorts gives similarity values: 70, 57, 65, 57, 51, 61, 64, 56, 63, 50
(Column_index, Standard_deviation) of each column of numbers: [(0, 96.5443987139596), (1, 145.326873357098), (2, 288.69842362782305)]
Similarity when sorting by all column orders:
SIMILARITY SORT_COLUMN_ORDER
9_978 [0, 1, 2] (by INcreasing std-dev)
9_829 [0, 2, 1]
9_802 [1, 0, 2]
9_638 [1, 2, 0]
9_160 [2, 0, 1]
9_142 [2, 1, 0] (by DEcreasing std-dev)
Neighbourly swaps:
Neighbour swaps = 12 New similarity = 9990
Solution 8:[8]
Adopt a stochastic combinatorial optimization algorithm, such as tabu search, a genetic algorithm, or simulated annealing. I have ordered the alternatives in decreasing efficiency, according to my experience.
All involve starting with one or more random solutions and then improving the result in successive iterations through random perturbations and a method that decides which ones to keep. An objective function (in your case the total dissimilarity) guides you toward improvements. For the numbers you mention, a few thousand iterations down the road you will have a near-optimal solution.
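As one illustration, here is a minimal simulated-annealing sketch over orderings using a random-swap move (my own sketch; the cooling schedule and iteration count are arbitrary placeholders, and the full rescore per move keeps it simple but slow — see Solution 9 for computing only the local change):

import math
import random

def dissimilar(t1, t2):
    return sum(a != b for a, b in zip(t1, t2))

def score(order):
    return sum(dissimilar(a, b) for a, b in zip(order, order[1:]))

def anneal(data, iters=200_000, t_start=3.0, t_end=0.01, seed=0):
    """Simulated annealing over orderings; the move is swapping two random positions."""
    rng = random.Random(seed)
    order = data[:]
    rng.shuffle(order)
    current = score(order)
    best, best_score = order[:], current
    for k in range(iters):
        t = t_start * (t_end / t_start) ** (k / iters)   # geometric cooling
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new = score(order)           # full rescore: simple but O(n) per move
        delta = new - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = new            # accept the move
            if current < best_score:
                best, best_score = order[:], current
        else:
            order[i], order[j] = order[j], order[i]      # reject: undo the swap
    return best, best_score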
Solution 9:[9]
A reasonably good approximate (simulated annealing style) solution:
Start with an arbitrary (or random) permutation of the items. Optional: choose N random permutations (say, 10 or 100) and start with the one that has the lowest total error, discarding the others.
Choose two distinct random indices and swap the items at those indices if it would decrease the total error (you could do this the straightforward way, by computing the total error of the "before" and "after" lists, but you could also do it just by examining items i-1, i, i+1 and j-1, j, j+1 with and without the swap; all of the other contributions to the error remain the same).
Repeat the previous step until some stopping condition you choose, which could be one or more of:
- You reach some predefined limit on iterations or execution time,
- The total error falls below some threshold,
- The percentage of "successful swaps" over the past N iterations falls below a threshold,
- You go N iterations without a successful swap.
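A minimal sketch of this swap-based improvement (my own illustration), evaluating only the edges around the two swapped positions rather than rescoring the whole list, with a fixed iteration limit as the stopping condition:

import random

def dissimilar(t1, t2):
    return sum(a != b for a, b in zip(t1, t2))

def local_cost(order, idx):
    """Error contributed by the edges touching position idx."""
    total = 0
    if idx > 0:
        total += dissimilar(order[idx - 1], order[idx])
    if idx < len(order) - 1:
        total += dissimilar(order[idx], order[idx + 1])
    return total

def improve_by_swaps(order, max_iters=100_000, seed=0):
    rng = random.Random(seed)
    order = order[:]
    n = len(order)
    for _ in range(max_iters):
        i, j = rng.sample(range(n), 2)
        before = local_cost(order, i) + local_cost(order, j)
        order[i], order[j] = order[j], order[i]
        after = local_cost(order, i) + local_cost(order, j)
        # The shared edge of adjacent positions is counted twice both times, and
        # its value is unchanged by the swap (dissimilarity is symmetric), so the
        # comparison is still valid in that case.
        if after >= before:
            order[i], order[j] = order[j], order[i]      # not an improvement: undo
    return order

Only improving swaps are kept, so this is a pure hill climb; accepting occasional worsening swaps (as in Solution 8) or adding the other stopping conditions from the list are straightforward extensions.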
Solution 10:[10]
Maybe this can help you. It's an approach that you would have to validate; I haven't had time to develop it, but I think it's feasible.
If the list of elements were already ordered, sorting it would be trivial and linear: it would be enough to compare each element with the next one.
If only some elements were misplaced, but close to their correct location, the sorting would be somewhat slower, yet still very fast: most of the time each element is compared only with the next one, and a few elements need some extra comparisons to be put in place.
Based on the above, my approach is to cheaply obtain a fairly well-ordered list. For this it should be enough (if I'm not mistaken) to sort based on a linear regression of the elements. I'll explain it in 2 dimensions (elements with 2 values instead of 3) because it's simpler, but extending it to 3 dimensions, as your question requires, isn't complex.
You make an initial run of all your elements and calculate the regression line. Then, you calculate the rotation needed for the regression line to become Y=0 and apply that rotation to your points. This allows you to run through the points in order, from left to right. And that order is what you use as the first sort order.
Obviously, two contiguous points may not be the closest to each other, but few points will be closer, so the final sorting will always be localized and require few moves.
That is, we use the regression line as an approximation to the final location that the elements should have. It's clear that two elements that are far apart on the regression line (running from one end to the other) are also far apart and this allows us to arrange them in a way that is close to the optimum.
This requires a couple of passes with calculations that, while not very complex, are more expensive than a single comparison between two elements. That is, for small inputs this approach costs more, but beyond some input size it beats making a very large number of pairwise comparisons.
To move it to 3 dimensions (and use the 3 components you need) you have to use a 3D regression line and get the rotation matrix also in 3 dimensions. The rest is similar.
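A rough sketch of this idea in 3D (my interpretation of the answer: instead of an explicit rotation, it sorts by the projection onto the first principal component, which plays the role of the regression line; numpy is assumed):

import numpy as np

def order_by_principal_axis(data):
    """Sort 3-tuples by their projection onto the direction of greatest variance.

    This approximates the 'rotate so the regression line is an axis, then
    sweep along it' idea; it only produces the first, approximate ordering.
    """
    pts = np.asarray(data, dtype=float)
    centered = pts - pts.mean(axis=0)
    # First right-singular vector = direction of greatest variance (PCA axis).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    return [data[i] for i in np.argsort(projection)]

This only gives the approximate initial ordering the answer describes; it would still need the localized refinement pass afterwards, and note that geometric closeness is not quite the same as the count-of-unequal-coordinates dissimilarity used in the question.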
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | |
| Solution 5 | Toby Speight |
| Solution 6 | Paddy3118 |
| Solution 7 | Paddy3118 |
| Solution 8 | Diomidis Spinellis |
| Solution 9 | hobbs |
| Solution 10 | Victor |