'How to get the frequency of each time stamp during read file and reduce the memory
I stuck in memory limit exceed because I use a naive method to finish the task which is creating a array with size 2^25. By using the index of the array to represent the timestamp, I can locate the timestamp and add its occurrence in O(1) time, but it waste lots of memory. So I want to know if there are any method that can reduce the memory. The time stamp will group by hour by time_stamp - (time_stamp % 3600), it will minus start_time to avoid index overflow.
How to reduce the memory used and read the timestamp&occurrence quickly? *the input timestamps can in any order
#define max_entry 2 << 25
#define start_time 1645491600
struct pair
{
long timestamp;
int occurrence;
};
struct pair *counter_array = (struct pair *)calloc(max_entry, sizeof(struct pair)); // declear pair array
//readfile
char *temp;
while (fgets(buffer, buffer_size, common_file) != NULL)
{
char *temp;
long time_stamp = strtol(buffer, &temp, 10);
counter_array[time_stamp - (time_stamp % 3600) - start_time].timestamp = time_stamp - (time_stamp % 3600);
counter_array[time_stamp - (time_stamp % 3600) - start_time].occurrence += 1;
}
I know using max_entry is not good, but I really can't figure out how to modify it.
Solution 1:[1]
- Since you're rounding
time_stamps tohours, just use
time_stamp_index = (time_stamp - start_time) / 3600;
- Modify your
structto store frequency & hours :
struct hour_freq {
int hours;
int freq;
};
- If
time_stampcan go up toINT_MAX(231 - 1) i.e.2147483647. Thenmax_indexwill be
max_index = (2147483647 - 1645491600) / 3600 = 139,442
well within memory limits of 50 MiB. With 45 MiB you can index time_stamps upto 22,879,155,600.
(( 22879155600 ? 1645491600 ) ÷ 3600 ) × 8 = 47,185,920 octets = 45 MiB
- To find frequency of a
time_stampathourgranularity, calculatetime_stamp_indexand lookup.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
