'Java Long to int conversion skipping gaps between longs
I'm working on a school project where I'm building a geographical map from nodes. These nodes are loaded from an XML format from OpenStreetMap.org.
There are a lot of these nodes, and therefor I have to optimize memory whenever possible. One such optimization would be storing their id's as int instead of long.
The nodes have their ID values far beyond what Integers support, therefor I cannot directly create objects from them utilizing int or Integer. However, I am going to use around 40.000.000 nodes in total (which is less than the total amount of Integers possible).
My question is therefor if there is an intelligent way to apply a mathematical function to the Long values, that will turn them into ints. I need to be able to use this function once on all nodes when I save them in memory, and then later on when something refers to them by their original long value from the XML, I'll need to (again) find their new Integer value by the function. I've attempted the following:
int myInt = (int) myLong-Integer.MIN_VALUE... in order to bring the long value back into the int spectrum, however the longs are still too large to support integer format after this.
I've also attempted keeping a Map of "Long -> Integer", that I can lookup the long in, in order to get the int value. However this is nonsensical since I would then STILL be storing the Longs in memory (which is exactly what we are trying to prevent).
EDIT: As comments below asked for, the range of long values are indeed more than 4 billion apart. I don't have the exact Maxvalue and Minvalue available right now but the difference between them are about twice as large as the amount of available integers. (Therefor no simple subtraction would work)
Solution 1:[1]
When the longs are not tied to specific ranges, like 400_000L upwards for customers, 500_000L upwards for invoices, etcetera, then you need to store a mapping from long to int.
In memory you could use:
Map<Long, Integer> longToIntIdMap = new HashMap<>();
Map<Integer, Long> intToLongIdMap = new HashMap<>();
One could search an ideal hash function to have the least of collisions mapping to the same int id. Normally one could in the case of a collision use a second hash function and so on.
One could use Long.hashCode possibly ensuring non-negative numbers.
int getId(long id) {
int intId = longToIntIdMap.computeIfAbsent(id,
(k, v) -> {
int hc = Long.hashCode(id); // First hashing.
int intId = hc;
while (intToLongIdMap.containsKey(intId)) {
++intId; // Second hashing.
if (intId == hc) {
throw new IllegalStateException("All ints used");
}
}
intToLongIdMap.put(intId, id);
// Implicit longToIntIdMap.put(id, intId);
return intId;
});
return intId;
}
I personally would take the effort to count the collisions for a couple of hash functions. There are even some solutions to do this systematically. I have no links though.
The above is expensive, especially as Long and Integer are wrappers. IDs are unique in XML, so one could count them first and use a fixed size long array and as hashCode % array.length as initial index. Maybe wrapped in a class IntLongHashMap. Also the reverse map can be created afterwards.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

