'How do I efficiently represent and compare my data-type (hexagons)?

Premise

I am working on a private project in python using numpy. My program works with hexagons, which edges have certain features. I currently represent these hexagons by numpy arrays with one entry for each edge. There is a need to compare hexagons for equality, which is special because some differentiable features are considered equal. Also hexagons should be equal if they are the same just in another rotation (numpy array is barrel-rolled by some amount). To solve the first problem, edges are represented by bytes, each bit marking an "equal" feature. Now two edges can simply be compared by bitwise_and and any value non zero prooves equality. To solve the second problem equality is checked by comparing all six rotations of a hex aginst the other hex. All arrays I describe here are of dtype=uint8, so 255 is the neutral element for bitwise_and.

Problem

Each arrangement of a hexagon has a value associated with it. What I mean by arrangement is how features of the hexagon are arranged, for example two adjacent edges have one feature, the other four an other. The specific features dont matter for this value. So the hex [4, 4, 2, 8, 8, 8] has the same value associated with it as hex [2, 4, 4, 1, 2, 2] (notice rotation by one and substitution of values).

Additionally edges can have "don't care" as a feature (represented by all ones in binary to make the equality check work as intended). In that case I need to find the associated values of all compatible hexagons, no matter the feature of that particular edge.

What I need is a clever representation of these "meta"-hexagons, which just care about arrangement not type of features. I then need a way for a given hexagon to find all meta-hexagons that could describe its arrangement.

Ideas

One idea is to represent the meta-hexes like every other hexagon, using numbers to differentiate each unique feature. A meta-hex like [1, 1, 2, 1, 2, 2] would need to match any hex [x, x, y, x, y, y].

How to find all meta-hexes for a given hexagon with "dont care"-edges?

A: One possibility would be to create all possible meta-hexagons for a given hexagon. The problem with this approach is that the number of possibilities can be quite large. For example, a common hexagon in my application is one with five adjacent "don't care"-edges (only one important feature). The number of features is only about 5 (it's actually more, but some features are mutually exclusive so 5 independent features is a good approximation) but even then 5^5=3125 (minus a couple because of equality under rotation) seems quite a lot. The advantage of this approach would be that I don't need any equality checks against the meta-hexes and could use a dictionary to access the values (for example using the numpy bytes as key).

B: Another possibility would be to save the meta-hexes in a numpy array, which allows fast comparisons against all meta-hexes at once. In that case one could leave "don't care"-edges as they were (all bits one) and would only need to transform the given features into meta-hex representation. So [2, 8, 8, 255, 255, 255] would become something like [1, 2, 2, 255, 255, 255] and comparison could be done with bitwise_and and a nonzero check again to make the "dont care" edges match anything. The problem in this case is that the hexagon wouldn't match meta-hexes like [2, 3, 3, 1, 1, 1] where the features are simply numbered differently, even though it should. So all possible numbering schemes and rotations would have to be created. Even with rules such that numbers are increased by one from one feature to the next, it would be a couple dozen possible representations.

Questions

  1. In general, is there a way to represent polygons so that two polygons with different rotations but otherwise equal, can be identified as such without having to compare all possible rotations?

  2. Which numpy functions should I look into to implement my idea in A. Replacing all "dont cares" with all possible features sounds like permutations to me?

  3. For my approach in B, is there a way to further reduce the amount of hexagons I have to create?

Any help is appreciated, I've thought about this for three days now, going back and forth in my head. So even remotely related links and reading material or just keywords to lookup and/or add to the tags of this question are happily received. Also I'm new here, so any tips regarding stack overflow are welcome!

If you've read this far, thank you for your time!



Solution 1:[1]

It seems like you are chasing performance here... I'll answer some of the questions keeping that in mind but be warned: premature optimization is always a bad idea. If it is optimization you are looking for it is almost always best to write any solution and then take it from there.

In general you can win in terms of time complexity if you are willing to deal with a bit of extra memory.

  1. For each hexagon create a copy of it but permuted such that the smallest "edge" is first. This has a little bit of startup cost and a bit of extra memory but then the comparison is easy as you only need to compare one array. (if the smallest element is repeated you can come up with some heuristic to create a unique order)

2.3. I would create "mask" arrays that each hexagon has. This mask array is then used by your comparison function to decide what comparison rules to check for. If you just want code neatness then this just means creating a hexagon object which contains all these extra arrays and then overloading the __eq__property.

If you want all this to be fast then you probably have no choice but to implement this "comparison" operation in C and then having your python code call it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DharmanBot