Original address: https://www.ikaze.cn/article/65
The new requirements need to show different languages through ip. Because there are many ip addresses, the dictionary is not applicable. Here are some methods.
1. Through ip location database
Famous service providers include: ipip (paid) maxmind (paid), pure (free).
However, in this application scenario, we do not need specific location information. Similar schemes will waste unnecessary memory, so we give up.
2. Utilize ip continuity
The latter two methods have a premise: most of the ip address lists are continuous.
Here we have a list of domestic ip addresses (there are open source libraries, which are easy to find. In addition, the library I use has merged ip addresses into CIDR format).
We first convert the IP into a number that can be directly compared through binary, and then change the continuous IP into such a set (start_ip, end_ip), so that we can quickly find it by dichotomy.
import ipcalc class ChinaIp: def __init__(self): self.data = [] def load(self, cidr_file='data/china_ip_list.txt'): with open(cidr_file, 'r')as f: for s in f.readlines(): self.add(s.strip()) def add(self, cidr): n = ipcalc.Network(cidr) self.data.append((n.host_first().ip, n.host_last().ip)) def search(self, ip): l = 0 r = len(self.data) - 1 while l <= r: mid = (l + r) // 2 if self.data[mid][0] <= ip <= self.data[mid][1]: return True elif self.data[mid][0] > ip: r = mid - 1 elif self.data[mid][1] < ip: l = mid + 1 else: return False return False def __contains__(self, item): ip = ipcalc.IP(item).ip return self.search(ip) china_ip = ChinaIp() china_ip.load() print('223.70.163.83' in china_ip)
3. Use the characteristics of CIDR
CIDR ^ is an address in the form of ^ x.x.x/n ^ which represents a group of ip addresses with the same network address, where n represents the first n bits as the network address.
According to the characteristics of CIDR, we can draw the conclusion that the network address of the ip under the same CIDR is the same.
Therefore, we can take out the network addresses of all domestic cidr addresses and put them in the dictionary; For an ip, try the possible network address (i.e. n) to see if it is in the dictionary.
import ipcalc class ChinaIp(object): def __init__(self): self.data = {} def load(self, cidr_files='data/china_ip_list.txt'): with open(cidr_files, 'r')as f: cidr_list = f.readlines() for cidr in cidr_list: self.insert(cidr.strip()) def insert(self, cidr): network = ipcalc.Network(cidr) self.data[str(network.netmask())]=True def search(self, netmask: str) -> bool: for i in netmask.split('.'): node = node.children.get(i, None) if node is None: return False if node.is_end: return True return False def __contains__(self, ip): for i in range(1,33): if self.search(str(ipcalc.Network(f'{ip}/{i}').netmask())): return True return False china_ip = ChinaIp() china_ip.load() print('223.70.163.83' in china_ip)
This algorithm seems to have no problem, but in the actual test, the speed is much slower than the second one. The time-consuming place must cycle all n in the comparison, and the dichotomy can quickly eliminate the impossible part.
In this case, there are two optimization methods:
1. List of random n
class ChinaIp(object): ... def __contains__(self, ip): l = list(range(1, 33)) random.shuffle(l) for i in l: if self.search(str(ipcalc.Network(f'{ip}/{i}').netmask())): return True return False
This method reduces the test time by more than half.
2. Exclude n that will not appear
class ChinaIp(object): def __init__(self): ... self.mask_list = [] ... def insert(self, cidr): network = ipcalc.Network(cidr) self.data[str(network.netmask())] = True self.mask_set.add(network.mask) def __contains__(self, ip): for i in self.mask_set: netmask = str(ipcalc.Network(f'{ip}/{i}').netmask()) if netmask in self.data: return True return False
In this way, the optimized speed is the same as that of the second one, but in practical application, it is also necessary to judge which one to use according to the ip list.