# Determine whether the IP is in a large ip set (identify domestic IP)

## Original address: https://www.ikaze.cn/article/65

The new requirements need to show different languages through ip. Because there are many ip addresses, the dictionary is not applicable. Here are some methods.

## 1. Through ip location database

Famous service providers include: ipip (paid) maxmind (paid), pure (free).

However, in this application scenario, we do not need specific location information. Similar schemes will waste unnecessary memory, so we give up.

## 2. Utilize ip continuity

The latter two methods have a premise: most of the ip address lists are continuous.

Here we have a list of domestic ip addresses (there are open source libraries, which are easy to find. In addition, the library I use has merged ip addresses into CIDR format).

We first convert the IP into a number that can be directly compared through binary, and then change the continuous IP into such a set (start_ip, end_ip), so that we can quickly find it by dichotomy.

```import ipcalc

class ChinaIp:
def __init__(self):
self.data = []

with open(cidr_file, 'r')as f:
for s in f.readlines():

n = ipcalc.Network(cidr)
self.data.append((n.host_first().ip, n.host_last().ip))

def search(self, ip):

l = 0
r = len(self.data) - 1
while l <= r:
mid = (l + r) // 2
if self.data[mid][0] <= ip <= self.data[mid][1]:
return True
elif self.data[mid][0] > ip:
r = mid - 1
elif self.data[mid][1] < ip:
l = mid + 1
else:
return False
return False

def __contains__(self, item):
ip = ipcalc.IP(item).ip
return self.search(ip)

china_ip = ChinaIp()
print('223.70.163.83' in china_ip)```

## 3. Use the characteristics of CIDR

CIDR ＾ is an address in the form of ＾ x.x.x/n ＾ which represents a group of ip addresses with the same network address, where n represents the first n bits as the network address.

According to the characteristics of CIDR, we can draw the conclusion that the network address of the ip under the same CIDR is the same.

Therefore, we can take out the network addresses of all domestic cidr addresses and put them in the dictionary; For an ip, try the possible network address (i.e. n) to see if it is in the dictionary.

```import ipcalc

class ChinaIp(object):
def __init__(self):
self.data = {}

with open(cidr_files, 'r')as f:

for cidr in cidr_list:
self.insert(cidr.strip())

def insert(self, cidr):
network = ipcalc.Network(cidr)

def search(self, netmask: str) -> bool:

for i in netmask.split('.'):
node = node.children.get(i, None)
if node is None:
return False
if node.is_end:
return True
return False

def __contains__(self, ip):
for i in range(1,33):
return True
return False

china_ip = ChinaIp()
print('223.70.163.83' in china_ip)```

This algorithm seems to have no problem, but in the actual test, the speed is much slower than the second one. The time-consuming place must cycle all n in the comparison, and the dichotomy can quickly eliminate the impossible part.

In this case, there are two optimization methods:

### 1. List of random n

```class ChinaIp(object):

...

def __contains__(self, ip):
l = list(range(1, 33))
random.shuffle(l)
for i in l:
return True
return False```

This method reduces the test time by more than half.

### 2. Exclude n that will not appear

```class ChinaIp(object):
def __init__(self):
...
...

def insert(self, cidr):
network = ipcalc.Network(cidr)

def __contains__(self, ip):

for i in self.mask_set: