'Condense large list of overlapping networks in CIDR format to greatest common denominator in Python

I have a text file that stores a large list of networks (250k+) that we use internally in the following format:

10.4.5.0/30
10.4.5.0/24
10.4.7.0/24
10.4.0.0/16
10.3.5.0/24
10.3.0.0/16
172.15.51.0/24
172.0.0.0/8

I'd like to try to reduce the list to make the processing more efficient. Using Python, how can I efficiently condense the list into the largest subnet that contains all the IPs? Are there any libraries that make this easier?

For example, the list above could be reduced to:

10.4.0.0/16
10.3.0.0/16
172.0.0.0/8

As an extension, is this possible in IPv6?



Solution 1:[1]

You may want to try this github-IPy Use IPSet to merge your IPs. It works well with ipv6. However I'm not quite sure about the performance. Maybe golang is better for such job.

Solution 2:[2]

A slightly modified routing trie would definitely do it, but I think that may be overkill when the following algorithm should work:

  • turn all subnets into two integers: one for ip value IP and other for subnet length Subnet
  • for each pair, compare it to your list of root pairs
for s in subnets:
  markForRoots = False
  for r in roots:
    if ((s.IP ^ r.IP) >> s.Subnet) == 0: #if they share the same prefix
      if s.Subnet > r.Subnet: #if the current subnet is bigger
        # remove r from root
        markForRoots = True
  if markForRoots:
    # add s to roots

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Starry-OvO
Solution 2 Liam Kelly