'Find the interval that contains the largest amount of intervals and enter their number
Given lots of intervals [ai, bi], find the interval that contains the most number of intervals. I easily know how to do this in O(n^2) but I must do it in O(nlogn). I think abount making two arrays, one with sorted starts and second with sorted ends, but I really don't know what to do next. I can only use structures like an array or a tuple.
Solution 1:[1]
Interval containment is a combination of two conditions: a includes b if a.start <= b.start
and a.end >= b.end
.
If you sort intervals by (end, size)
and process them in that order, then the end condition is easy -- current.end
>= x.end
if x is an interval you've already processed.
To find out how many intervals the current interval contains, you just need to know how many previously processed intervals have start points >= current.start. Every time you process an interval, put its start point into an order statistic tree or similar. Then you can find the number of contained start points in O(log N) time, which makes your algorithm O(N log N) all together. That's the same cost as the initial sort.
Note that I've hand-waved over handling of duplicate intervals above. After sorting the interval list, you should do a pass to remove duplicates and add counts for intervals that contain copies of themselves, if that counts as containment in your problem.
Solution 2:[2]
You can do it simply in O(n log n) using simply two sorts!
First, sort every interval according to its starting time. Then, sort them in reverse order of their end time.
You know a specific interval cannot contain any of the intervals placed before them neither in the first or the second array, because intervals located before them in the first array start before him, and end after him in the second array.
Now, you know that the sum of the positions of an interval in the two arrays gives a correct upper bound for the number of intervals it does not contain. It is an upper bound because an interval starting before and ending after another interval will count twice. What's more, this estimation will be exact for the interval you are searching for, because if another interval starts before and ends after it, then it will contain even more intervals. That's why the interval containing the most number of other intervals will just be the interval minimizing the sum of those two numbers.
## Warning : not tested !
intervals = [(1, 6), (5, 6), (2, 5), (8, 9), (1, 6)]
N = len(intervals)
# link intervals to their position in the array
for i, (start, end) in enumerate(intervals):
intervals[i] = (start, end, i)
# this will contain the position in ascending order of start time of every interval
positionStart = [None] * N
sortedStart = sorted(intervals, key = lambda inter: (inter[0], -inter[1], inter[2]))
for i, (start, end, pos) in sortedStart:
positionStart[pos] = i
# this will contain the position in descending order of end time of every interval
positionEnd = [None] * N
sortedEnd = sorted(intervals, key = lambda inter: (-inter[1], inter[0], inter[2]))
for i, (start, end, pos) in sortedEnd:
positionEnd[pos] = i
best = None # interval containing the most number of intervals
score = 0 # number of intervals it contains
for i in range(N):
# Upper bound of the number of interval it does not contain
result = positionStart[i] + positionEnd[i]
if N - result - 1 >= score:
best = (intervals[i][0], intervals[i][1])
score = N - result - 1
print(f"{best} is the interval containing the most number of intervals, "
+ "containing a total of {score} intervals")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matt Timmermans |
Solution 2 | PoustouFlan |