Python Network Programming

Network programming

Architecture of Software Development

  • Application class: QQ, WeChat, FTP clients, cloud drives and the like, that is, applications that must be installed before use.
  • Web class: Baidu, Zhihu, Weibo and other applications that are used directly through a browser.

C/S Architecture

C/S stands for Client and Server. This architecture is distinguished at the user level (it can also be viewed at the physical level).

The client here generally means a client application (an exe). The program must be installed before it can run on the user's computer, and it depends heavily on the user's operating system environment.

B/S Architecture

B/S stands for Browser and Server. This architecture is also distinguished at the user level.

The browser is in fact a client too, but one that needs no extra application installed: the user requests server-side resources (web resources) from the browser over HTTP, and the browser can then create, delete, modify and query them.

Network Foundation

1. How can a program find another program on the network

First, the other program must be running. Second, we must have the address of its machine. On the Internet, the address of a computer is written as a series of numbers, such as 78.5.6.29.

What is an IP address? "IP address" is short for Internet Protocol Address. It is the uniform address format provided by the IP protocol: it assigns a logical address to every network and every host on the Internet, hiding the differences between physical addresses. An IPv4 address is a 32-bit binary number, usually split into four 8-bit binary numbers (that is, four bytes). It is usually written in dotted-decimal form (a.b.c.d), where a, b, c and d are decimal integers between 0 and 255. Example: the dotted-decimal address 100.4.5.6 is really the 32-bit binary number 01100100.00000100.00000101.00000110.

What is a port? A port can be thought of as the outlet through which a device communicates with the outside world.

Viewing port usage on Windows:

netstat -aon|findstr "49157"

So the IP address pinpoints a specific computer, and the port pinpoints a specific program on it.
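To make the dotted-decimal notation concrete, here is a minimal sketch that converts an IPv4 address into its binary form and into a single 32-bit integer. The helper names dotted_to_binary and dotted_to_int are made up for this illustration; ipaddress is the standard-library module.

```python
import ipaddress

def dotted_to_binary(ip: str) -> str:
    """Return the four octets of an IPv4 address as dot-separated 8-bit binary strings."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {ip!r}")
    return ".".join(f"{o:08b}" for o in octets)

def dotted_to_int(ip: str) -> int:
    """Return the same address as one 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))

print(dotted_to_binary("100.4.5.6"))   # 01100100.00000100.00000101.00000110
print(dotted_to_int("100.4.5.6"))
```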

2. Understanding Socket

Socket is a software abstraction layer that sits between the application layer and the TCP/IP protocol suite. It is a set of interfaces. In design-pattern terms, Socket is a facade: it hides the complex TCP/IP protocol suite behind the Socket interface. For the user, a simple set of interfaces is all there is, and Socket organizes the data so that it conforms to the specified protocol.

Socket from your point of view: for you, socket is simply a module. We establish the connection and the communication between two processes by calling methods already implemented in that module. Some people also describe a socket as ip+port, because the IP identifies the location of a host on the Internet and the port identifies an application on that machine. So once we know the IP and the port, we can find an application and use the socket module to communicate with it.

3. The History of Socket

Sockets originated in the 1970s in the University of California, Berkeley version of Unix, known as BSD Unix. For that reason they are sometimes called "Berkeley sockets" or "BSD sockets". Sockets were originally designed for communication between multiple applications on the same host, which is also called interprocess communication, or IPC. There are two kinds (families) of sockets: file-based and network-based.

Socket Family Based on File Type

The name of the socket family: AF_UNIX

In Unix, everything is a file. File-based sockets call into the underlying file system to fetch data, so two socket processes running on the same machine can communicate indirectly by accessing the same file system.
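As a small illustration of local interprocess communication, the sketch below uses socket.socketpair(), which returns two already-connected sockets. According to the Python documentation its default family is AF_UNIX where the platform defines it, and AF_INET otherwise. The function name ipc_roundtrip is made up for this example, and both ends live in one process only to keep the sketch short.

```python
import socket

def ipc_roundtrip(payload: bytes) -> bytes:
    """Send payload through a local socket pair and return what the other end reads."""
    left, right = socket.socketpair()   # two connected sockets, no network involved
    try:
        left.sendall(payload)
        return right.recv(1024)         # a short payload is expected to arrive in one read
    finally:
        left.close()
        right.close()

print(ipc_roundtrip(b"hello over a local socket"))
```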

Socket Family Based on Network Type

The name of the socket family: AF_INET

(There is also AF_INET6, used for IPv6, as well as other address families, but those are either platform-specific, abandoned, rarely used, or not implemented at all. Of all the address families, AF_INET is the most widely used. Python supports many address families, but since we only care about network programming, we use AF_INET most of the time.)

4.TCP Protocol and UDP Protocol

  • TCP (Transmission Control Protocol): reliable and connection-oriented (like a phone call), with lower transfer efficiency; full-duplex communication (send buffer and receive buffer); byte-stream oriented. Applications that use TCP: web browsers, e-mail, file transfer programs.
  • UDP (User Datagram Protocol): unreliable and connectionless, with high transfer efficiency (little delay before sending); supports one-to-one, one-to-many and many-to-one communication; message-oriented; best-effort delivery with no congestion control. Applications that use UDP: the Domain Name System (DNS), video streaming, voice over IP (VoIP).

Initial use of sockets:

Socket Based on TCP Protocol

TCP is connection-based: the server must be started first, and then the client is started and connects to it.

Server end

import socket

sk = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)   # Buy a cell phone
# family=socket.AF_INET selects the network-based (IPv4) address family
# type=socket.SOCK_STREAM selects TCP; both values are the defaults
sk.bind(('127.0.0.1',9000))    # Insert the SIM card: bind to an IP and a port
sk.listen()                     # Power on: start listening for connections
while True:
    conn,addr = sk.accept()     # Wait for a call: block until a client connects
    # conn is the connection between the server and this client
    while True:
        msg_send = input('>>>')
        conn.send(msg_send.encode('utf-8'))    # Send a message to the client
        if msg_send.upper() == 'Q': break      # Entering q ends the conversation
        msg = conn.recv(1024).decode('utf-8')  # Receive the client's message
        if not msg or msg.upper() == 'Q': break  # Empty data means the client closed the connection
        print(msg)
    conn.close()         # Hang up
sk.close()           # Turn off the cell phone (not reached here, because the outer loop never ends)
# On either the server or the client, entering q disconnects both sides, like hanging up the phone

Client side

import socket

sk = socket.socket()  # Instantiate a socket object (defaults: AF_INET, SOCK_STREAM)

sk.connect(('127.0.0.1',9000))
while True:
    msg = sk.recv(1024).decode('utf-8')    # recv blocks until data arrives; decode turns bytes into a string
    if not msg or msg.upper() == 'Q': break  # Empty data means the server closed the connection
    print(msg)
    msg_send = input('>>>')    # input returns a string
    sk.send(msg_send.encode('utf-8')) # encode turns the string into bytes for sending
    if msg_send.upper() == 'Q': break
sk.close()

Socket Based on UDP Protocol

UDP is connectionless: once the server is started it can receive messages directly, with no need to establish a connection in advance.

Server end

import socket

sk = socket.socket(type=socket.SOCK_DGRAM)   #Create a socket for a server
sk.bind(('127.0.0.1',8001))   #Binding server socket
while True:
    msg,addr = sk.recvfrom(1024)
    print(msg.decode('utf-8'))
    send_msg = input('>>>')
    sk.sendto(send_msg.encode('utf-8'),addr)  # Dialogue (Receiving and Sending)
sk.close()   # Close the server socket

Client side

import socket

sk = socket.socket(type=socket.SOCK_DGRAM)
while True:
    send_msg = input('>>>')
    sk.sendto(send_msg.encode('utf-8'),('127.0.0.1',8001))
    msg,addr = sk.recvfrom(1024)
    print(msg.decode('utf-8'))
sk.close()

Example

QQ Chat
  • Server end
import socket
ip_port = ('127.0.0.1',9001)
udp_server_sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
udp_server_sock.bind(ip_port)

while True:
    qq_msg,addr = udp_server_sock.recvfrom(1024)
    print('A message from [%s:%s]: \033[1;44m%s\033[0m'%(addr[0],addr[1],qq_msg.decode('utf8')))
    back_msg = input('Reply to the message:').strip()

    udp_server_sock.sendto(back_msg.encode('utf8'),addr)
  • Client side
import socket

BUFSIZE = 1024
udp_client_socket = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)

qq_name_dic = {
    'Dad King':('127.0.0.1',9001),
    'Alex':('127.0.0.1',9001)
}

while True:
    qq_name = input('Please choose the chat object: ').strip()
    while True:
        msg = input('Enter a message (press Enter to send, q to end this chat): ').strip()
        if msg == 'q': break
        if not msg or not qq_name or qq_name not in qq_name_dic: continue
        udp_client_socket.sendto(msg.encode('utf8'),qq_name_dic[qq_name])

        back_msg,addr = udp_client_socket.recvfrom(BUFSIZE)
        print('A message from [%s:%s]: \033[1;44m%s\033[0m'%(addr[0],addr[1],back_msg.decode('utf8')))

# Closing the socket inside the outer loop would break the next chat,
# so it is closed only after the loop (reached only if an exit condition is added).
udp_client_socket.close()
Time server
  • Server end
from socket import *
from time import strftime

ip_port = ('127.0.0.1',9000)
BUFSIZE = 1024

udp_server = socket(AF_INET,SOCK_DGRAM)   # SOCK_DGRAM: the time server uses UDP
udp_server.setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
udp_server.bind(ip_port)

while True:
    msg,addr = udp_server.recvfrom(BUFSIZE)
    print('===>',msg)

    if not msg:
        time_fmt = '%Y-%m-%d %X'   # Default format when the client sends an empty message
    else:
        time_fmt = msg.decode('utf8')
    back_msg = strftime(time_fmt)
    udp_server.sendto(back_msg.encode('utf8'),addr)

udp_server.close()
  • Client side
from socket import *

ip_port = ('127.0.0.1',9000)
BUFSIZE = 1024

udp_client = socket(AF_INET,SOCK_DGRAM)   # SOCK_DGRAM: UDP, matching the server

while True:
    msg = input('Please enter the time format (example: %Y %m %d)>>: ').strip()
    udp_client.sendto(msg.encode('utf8'),ip_port)

    data = udp_client.recv(BUFSIZE)
    print(data.decode('utf8'))
FeiQ (Feiqiu) communication
from socket import *

udpsocket = socket(type = SOCK_DGRAM)
addr = ("192.168.0.168",2425)
msg = input('>>>')
udpsocket.sendto(("1:111:eva:eva:32:%s"%msg).encode('gbk'),addr)

When FeiQ is running it listens on port 2425, so we first create a UDP socket locally and send to that port.
Message format: 1:111:eva:eva:32:the content to be sent
1 is the version number, 111 is the packet number, the first eva is the user name, the second eva is the host name, 32 means "send a message", and the rest is the message content.

Detailed explanation of socket parameters

socket.socket(family=AF_INET,type=SOCK_STREAM,proto=0,fileno=None)

Description of parameters for creating socket objects:

  • family: The address family. It should be AF_INET (the default), AF_INET6, AF_UNIX, AF_CAN or AF_RDS. (The AF_UNIX domain actually communicates through local socket files.)
  • type: The socket type. It should be SOCK_STREAM (the default), SOCK_DGRAM, SOCK_RAW or one of the other SOCK_ constants. SOCK_STREAM is a TCP-based, reliable (that is, it makes sure the data reaches the other side correctly), connection-oriented socket, mostly used for data transfer. SOCK_DGRAM is a UDP-based, unreliable, message-oriented socket, mostly used for broadcasting information on the network.
  • proto: The protocol number. It is usually zero and can be omitted; with the address family AF_CAN, the protocol should be CAN_RAW or CAN_BCM.
  • fileno: If fileno is specified, the other arguments are ignored and the socket with the specified file descriptor is returned. Unlike socket.fromfd(), fileno returns the same socket rather than a duplicate. This may help close a detached socket using socket.close().
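The defaults listed above can be checked directly. The sketch below creates one socket with no arguments and one with type=SOCK_DGRAM, then prints the family, type and proto attributes of each; describe is a helper name invented for this example.

```python
import socket

def describe(sock):
    """Return the (family, type, proto) a socket was created with."""
    return sock.family, sock.type, sock.proto

# The with-statement closes both sockets when the block ends.
with socket.socket() as tcp_sock, socket.socket(type=socket.SOCK_DGRAM) as udp_sock:
    print(describe(tcp_sock))   # the defaults: AF_INET and SOCK_STREAM
    print(describe(udp_sock))   # same family, but SOCK_DGRAM
```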

Sticky packets

Now let's take a look at the phenomenon.

# server side
import socket

sk = socket.socket()
sk.bind(('127.0.0.1', 9000))
sk.listen()
conn,addr = sk.accept()
for i in range(3):
    conn.send(b'sbzz')
conn.close()
sk.close()

# Client
import socket

sk = socket.socket()
sk.connect(('127.0.0.1', 9000))
for i in range(3):
    print(sk.recv(1024))

sk.close()

The result of running the code above:

We would expect the client to receive sbzz three separate times, but by the second recv the second and third sbzz have stuck together. This phenomenon is called sticky packets.

Note: only TCP produces sticky packets; UDP never does.

Stream transport: TCP is like water flowing through a pipe, a stream with no boundaries.
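The missing boundaries can be observed inside a single process. The sketch below assumes that the connected stream sockets returned by socket.socketpair() behave like a TCP connection for this purpose, since both are byte streams. The function name send_three_then_read is made up for the illustration.

```python
import socket

def send_three_then_read(chunk: bytes = b"sbzz") -> list:
    """Send chunk three times, then collect whatever each recv() call returns."""
    sender, receiver = socket.socketpair()   # connected stream sockets, no network needed
    received = []
    try:
        for _ in range(3):
            sender.sendall(chunk)            # three separate send calls
        sender.close()                       # after this, recv() returns b"" once the data is drained
        while True:
            data = receiver.recv(1024)
            if not data:
                break
            received.append(data)
    finally:
        sender.close()                       # closing twice is harmless
        receiver.close()
    return received

chunks = send_three_then_read()
print(chunks)
print(b"".join(chunks))
```

The joined bytes are always the three messages back to back, but how they are split across recv calls is up to the operating system. On a typical run the list has fewer than three items, which is exactly the sticking described above.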

Causes of sticking

Data Transfer in TCP Protocol

The packet-splitting mechanism of TCP

When the data in the sender's buffer is longer than the MTU of the network card, TCP splits it into several packets and sends them separately.
MTU stands for Maximum Transmission Unit: the largest packet that can be transmitted over the network, measured in bytes. Most network devices have an MTU of 1500. If the local machine's MTU is larger than the gateway's, large packets are broken up for transmission, which produces many packet fragments, raises the packet loss rate and slows the network down.

Stream-Oriented Communication and the Nagle Algorithm

TCP (Transmission Control Protocol) is connection-oriented and stream-oriented, and provides a highly reliable service.
Both ends (client and server) must hold one socket of a connected pair. To send multiple packets to the other side more efficiently, the sender uses an optimization (the Nagle algorithm) that merges several small pieces of data sent at short intervals into one larger block and sends it as a single packet.
The receiver then has difficulty telling the pieces apart, so a sound unpacking mechanism must be provided. In other words, stream-oriented communication has no message boundaries.
On empty messages: TCP is stream-based, so an empty message can be neither sent nor received. Both the client and the server therefore need to handle empty input, or the program will hang. UDP is datagram-based: even if you enter nothing (just press Enter), the message can still be sent, because the UDP protocol wraps it in a header and sends it for you.
Reliable but sticky TCP: data sent over TCP is not lost. If a packet has not been received completely, the next receive continues from where the last one stopped, and the sending end clears its buffer only after it receives the ack. The data is reliable, but it sticks.
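The Nagle algorithm can be switched off per socket with the standard TCP_NODELAY option. The sketch below only sets and reads the option; make_tcp_socket and nagle_disabled are names invented for this example. Disabling Nagle removes the sender-side merging of small writes, but it should not be expected to remove sticky packets, because the receiver's buffer can still hand several messages to one recv call.

```python
import socket

def make_tcp_socket(disable_nagle: bool) -> socket.socket:
    """Create a TCP socket, optionally with the Nagle algorithm turned off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if disable_nagle:
        # TCP_NODELAY = 1 asks the kernel to send small writes without waiting to merge them
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

with make_tcp_socket(disable_nagle=True) as s:
    print(nagle_disabled(s))
```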

Why the characteristics of TCP cause sticky packets

The sender may send data 1 KB at a time, while the receiving application may fetch it 2 KB at a time. It may equally fetch 3 KB or 6 KB at once, or only a few bytes.
In other words, the data the application sees is one whole, a stream. How many bytes make up one message is invisible to the application. That is why TCP is called a stream-oriented protocol, and why sticky packets occur so easily.
UDP, by contrast, is a message-oriented protocol. Every UDP datagram is one message, and the application must fetch data one message at a time; it cannot fetch an arbitrary number of bytes. This is very different from TCP.
How is a message defined? The data the other side passes to a single write/send call can be regarded as one message. What needs to be understood is that when the other side sends a message, however the lower layers fragment it, the TCP layer puts the segments that make up the message back in order before presenting them in the kernel buffer.

User Mode and Kernel Mode in Socket Data Transmission

For example, a TCP-based socket client uploads a file to the server. The file content is sent as a sequence of bytes, and the receiver, looking at that byte stream, has no idea where the file starts or ends.

In addition, sticky packets on the sending side are caused by the TCP protocol itself. To improve transmission efficiency, the sender often waits to collect enough data before sending a TCP segment. If several consecutive sends each carry very little data, TCP's optimization algorithm usually merges them into one TCP segment and sends them in one go, so the receiver gets sticky-packet data.

UDP does not stick

UDP (User Datagram Protocol) is connectionless and message-oriented, and provides an efficient service.
Because UDP supports one-to-many communication, the skbuff (socket buffer) at the receiving end uses a chained structure to record every arriving UDP packet, and each UDP packet carries a header (source address, port and so on). This makes it easy for the receiver to tell the packets apart. In other words, message-oriented communication has message boundaries.
Unreliable but non-sticky UDP: UDP's recvfrom blocks, and one recvfrom(x) must match exactly one sendto(y). The call completes once it has received up to x bytes; if y > x, the excess data is lost. This means UDP never produces sticky packets, but it can lose data and is unreliable.
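The message boundaries of UDP can be seen on the loopback interface. In the sketch below, every sendto is matched by exactly one recvfrom, and even an empty datagram arrives as a message of its own. The function name udp_boundaries_demo is made up, port 0 asks the operating system for any free port, and the timeout is only a guard in case a datagram is dropped.

```python
import socket

def udp_boundaries_demo(messages):
    """Send each message as one datagram and return what each recvfrom() yields."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # do not block forever if a datagram is lost
        addr = receiver.getsockname()
        for msg in messages:
            sender.sendto(msg, addr)      # one datagram per call
        # One recvfrom per datagram; the data never merges across datagrams.
        return [receiver.recvfrom(1024)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()

print(udp_boundaries_demo([b"hello", b"", b"egg"]))
```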

Supplementary Notes:

When sending with UDP, the largest payload the sendto function can send is 65535 - IP header (20) - UDP header (8) = 65507 bytes. If the data passed to sendto is longer than that, the function returns an error (the packet is discarded and not sent).
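The limit can be probed with a short experiment on the loopback interface. This is a sketch under the assumption of IPv4 without IP options; MAX_UDP_PAYLOAD and try_send are names invented here. Some systems enforce a lower ceiling than 65507, so the sketch only relies on the oversized datagram being rejected.

```python
import socket

MAX_UDP_PAYLOAD = 65535 - 20 - 8   # maximum IP packet size minus IP header minus UDP header

def try_send(size: int) -> bool:
    """Try to send one UDP datagram of `size` bytes over loopback; True if sendto succeeds."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))            # any free port
        sender.sendto(b"x" * size, receiver.getsockname())
        return True
    except OSError:                                # raised when the datagram is too long
        return False
    finally:
        sender.close()
        receiver.close()

print(MAX_UDP_PAYLOAD)
print(try_send(MAX_UDP_PAYLOAD + 1))   # one byte over the limit
```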

When sending with TCP, there is no limit on the packet size (leaving the buffer size aside), because TCP is a stream protocol; that is, the length passed to the send function is unrestricted. In practice the data is not necessarily sent in one go: if it is long it is sent in segments, and if it is short it may wait and be sent together with the next piece of data.

Sticky packets occur in two situations.

Case 1: The sender's caching mechanism

The sender waits until its buffer fills before sending, which produces sticky packets (when data is sent at very short intervals and in very small pieces, the pieces are joined together).

Server end
#_*_coding:utf-8_*_
from socket import *
ip_port=('127.0.0.1',8080)

tcp_socket_server=socket(AF_INET,SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)


conn,addr=tcp_socket_server.accept()


data1=conn.recv(10)
data2=conn.recv(10)

print('----->',data1.decode('utf-8'))
print('----->',data2.decode('utf-8'))

conn.close()

Client side
#_*_coding:utf-8_*_
import socket
BUFSIZE=1024
ip_port=('127.0.0.1',8080)

s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
res=s.connect_ex(ip_port)


s.send('hello'.encode('utf-8'))
s.send('egg'.encode('utf-8'))

Case 2 Receiver's Caching Mechanism

The receiver does not fetch the packets from its buffer in time, so the data of several packets is received together (the client sends a piece of data and the server reads only a small part of it; the next time the server reads, it first gets the leftover data from the buffer, producing sticky packets).

Server end
#_*_coding:utf-8_*_
from socket import *
ip_port=('127.0.0.1',8080)

tcp_socket_server=socket(AF_INET,SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)


conn,addr=tcp_socket_server.accept()


data1=conn.recv(2) # Does not read the whole message in one go
data2=conn.recv(10) # The next recv first returns the leftover old data, then any new data

print('----->',data1.decode('utf-8'))
print('----->',data2.decode('utf-8'))

conn.close()

Client side
#_*_coding:utf-8_*_
import socket
BUFSIZE=1024
ip_port=('127.0.0.1',8080)

s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
res=s.connect_ex(ip_port)


s.send('hello egg'.encode('utf-8'))

Summary

Sticky packets occur only with the TCP protocol:

  1. On the surface, the problem comes from the buffering of the sender and the receiver and from TCP being stream-oriented.
  2. In essence, it is because the receiver does not know where the boundaries between messages lie, and so does not know how many bytes to fetch at a time.

Solving the sticky packet problem

Solution 1

The root of the problem is that the receiver does not know how long the byte stream from the sender will be. The solution is therefore for the sender to tell the receiver the total size of the byte stream before sending the data, after which the receiver reads in a loop until it has received all of it.

Server end
import socket
import struct

def my_send(conn,msg):
    msgb = msg.encode('utf-8')
    len_msg = len(msgb)
    pack_len = struct.pack('i', len_msg)   # Fixed 4-byte length prefix
    conn.sendall(pack_len)                 # sendall keeps sending until every byte is written
    conn.sendall(msgb)

sk = socket.socket()
sk.bind(('127.0.0.1',9002))
sk.listen()

conn,addr = sk.accept()
msg1 = 'Hello'
msg2 = 'Have you eaten yet?'
my_send(conn,msg1)
my_send(conn,msg2)
conn.close()
sk.close()

Client side
import time
import socket
import struct
sk = socket.socket()

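def my_recv(sk):
    pack_len = sk.recv(4)                      # Read the 4-byte length prefix
    len_msg = struct.unpack('i', pack_len)[0]
    # recv(n) may return fewer than n bytes. That is fine for short local messages,
    # but long messages need a loop that reads until len_msg bytes have arrived.
    msg = sk.recv(len_msg).decode('utf-8')
    return msg

sk.connect(('127.0.0.1',9002))
time.sleep(0.1)   # Wait so that both messages are already in the receive buffer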
msg = my_recv(sk)
print(msg)
msg = my_recv(sk)
print(msg)
sk.close()

Remaining problem:
Programs run far faster than the network transmits, so sending the length of the byte stream in a separate step before each piece of data amplifies the performance cost of network latency.

An Improved Solution

Just now, the problem is that we are sending

We can use a module that converts the length of the data into a fixed number of bytes. The receiver then only has to read that fixed-length prefix before each message to learn the size of what comes next, and stop receiving once that many bytes have arrived, so it receives exactly one complete message.

struct module

The module can convert a type, such as a number, to a fixed length of bytes.

>>> struct.pack('i',1111111111111)

struct.error: 'i' format requires -2147483648 <= number <= 2147483647 # This is the valid range.
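For comparison with the error above, here is a short sketch of struct producing a fixed-length prefix. The "!" prefix (network byte order, standard sizes) is a choice made for this illustration rather than something the text above uses, and pack_length and unpack_length are made-up helper names.

```python
import struct

def pack_length(n: int) -> bytes:
    """Encode n as a 4-byte signed integer in network byte order."""
    return struct.pack("!i", n)

def unpack_length(raw: bytes) -> int:
    """Decode the 4 bytes produced by pack_length."""
    return struct.unpack("!i", raw)[0]

print(pack_length(5), len(pack_length(5)))             # always 4 bytes, whatever the value
print(unpack_length(pack_length(2147483647)))          # the largest value 'i' can hold
print(struct.calcsize("!i"), struct.calcsize("!q"))    # 'q' is an 8-byte integer
print(len(struct.pack("!q", 1073741824000)))           # large enough for very big sizes
```

If a length may exceed the 'i' range, the 'q' format is one option, at the cost of an 8-byte prefix.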

import json,struct
# Suppose the client uploads a 1T file a.txt: 1073741824000 bytes

# To avoid sticky packets, define a custom header
header={'file_size':1073741824000,'file_name':'/a/b/c/d/e/a.txt','md5':'8f6fbf8347faa4924a76856701edb0f3'} # Size of the 1T data, file path and md5 value

# To transmit the header, serialize it and convert it to bytes
head_bytes=bytes(json.dumps(header),encoding='utf-8')

# To tell the receiver how long the header is, use struct to convert the header length to a fixed length: 4 bytes
head_len_bytes=struct.pack('i',len(head_bytes)) # These 4 bytes hold a single number: the length of the header

# The sender starts sending (conn stands for an already-connected socket)
conn.send(head_len_bytes) # First send the length of the header, 4 bytes
conn.send(head_bytes) # Then send the header bytes
conn.sendall(file_content) # Finally send the real content as bytes (file_content is a placeholder for the file's data)

# The receiver starts receiving (s stands for the socket at the other end)
head_len_bytes=s.recv(4) # First receive the 4 bytes that hold the header length
x=struct.unpack('i',head_len_bytes)[0] # Extract the header length

head_bytes=s.recv(x) # Receive the header bytes according to the header length x
header=json.loads(head_bytes.decode('utf-8')) # Decode and deserialize the header

# Finally, receive the real data according to the header, for example
real_data_len=header['file_size']
s.recv(real_data_len) # In practice, read in a loop of smaller chunks

More detailed usage

#_*_coding:utf-8_*_
#http://www.cnblogs.com/coser/archive/2011/12/17/2291160.html
__author__ = 'Linhaifeng'
import struct
import binascii
import ctypes

values1 = (1, 'abc'.encode('utf-8'), 2.7)
values2 = ('defg'.encode('utf-8'),101)
s1 = struct.Struct('I3sf')
s2 = struct.Struct('4sI')

print(s1.size,s2.size)
prebuffer=ctypes.create_string_buffer(s1.size+s2.size)
print('Before : ',binascii.hexlify(prebuffer))
# t=binascii.hexlify('asdfaf'.encode('utf-8'))
# print(t)


s1.pack_into(prebuffer,0,*values1)
s2.pack_into(prebuffer,s1.size,*values2)

print('After pack',binascii.hexlify(prebuffer))
print(s1.unpack_from(prebuffer,0))
print(s2.unpack_from(prebuffer,s1.size))

s3=struct.Struct('ii')
s3.pack_into(prebuffer,0,123,123)
print('After pack',binascii.hexlify(prebuffer))
print(s3.unpack_from(prebuffer,0))

Using struct to solve sticky packages

With the struct module, a length can be converted into a standard 4 bytes. This makes it possible to send the length of the data ahead of the data itself.

When sending:
  1. First send the length of the data, converted to 4 bytes with struct.
  2. Then send the data.

When receiving:
  1. First receive 4 bytes and use struct to convert them into a number, which is the length of the data to receive.
  2. Then receive the data according to that length.
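The two steps on each side can be sketched as a pair of helpers. This is a minimal illustration, not the only way to do it: the names HEADER, recv_exactly, send_msg and recv_msg are invented here, the length uses an unsigned 4-byte network-order format by choice, and socket.socketpair() stands in for a real client/server connection so that the example runs in one process. The loop in recv_exactly matters because a single recv call may return fewer bytes than requested.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order

def recv_exactly(sock, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have been read."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:                      # b"" means the peer closed the connection
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def send_msg(sock, payload: bytes) -> None:
    """Send the 4-byte length, then the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(sock) -> bytes:
    """Read the 4-byte length, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

def demo():
    a, b = socket.socketpair()
    try:
        send_msg(a, "Hello".encode("utf-8"))
        send_msg(a, "Have you eaten yet?".encode("utf-8"))
        # Both messages are already in the buffer, yet they come out one at a time.
        return [recv_msg(b).decode("utf-8"), recv_msg(b).decode("utf-8")]
    finally:
        a.close()
        b.close()

print(demo())
```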

Example of File Transfer

Server
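# receive files
import json
import socket
import struct

sk = socket.socket()
sk.bind(('127.0.0.1',9001))
sk.listen()

conn,addr = sk.accept()
# Read the 4-byte length first so the JSON header cannot stick to the file content
# (pieces this small are assumed to arrive in a single recv)
head_len = struct.unpack('i', conn.recv(4))[0]
file_dic = conn.recv(head_len).decode('utf-8')
dic = json.loads(file_dic)

with open(dic['filename'],mode='wb') as f:
    while dic['filesize']>0:
        # Never ask for more than what is left of the file
        file_content = conn.recv(min(1024, dic['filesize']))
        if not file_content: break   # The client closed the connection early
        dic['filesize'] -= len(file_content)
        f.write(file_content)
conn.close()
sk.close()
Client
import os
import json
import socket
import struct

sk = socket.socket()
sk.connect(('127.0.0.1',9001))
# Choose the file to send, then get its name and size and send them first
file_path = r'D:\ev Video saved on video\20190719_150518.mp4'
file_name = os.path.basename(file_path)
file_size = os.path.getsize(file_path)
dic = {'filename':file_name,'filesize':file_size}
str_dic = json.dumps(dic)
dic_b = str_dic.encode('utf-8')
sk.sendall(struct.pack('i', len(dic_b)))   # 4-byte length of the JSON header
sk.sendall(dic_b)                          # The JSON header itself
with open(file_path,mode = 'rb') as f:
    content = f.read()
    sk.sendall(content)   # sendall keeps sending until the whole file is written
sk.close()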

We can also make the header a dictionary that holds detailed information about the real data to be sent, serialize it with json, and then use struct to pack the length of the serialized data into four bytes (four are enough for our purposes).

When sending:
  1. First send the header length.
  2. Then encode the header content and send it.
  3. Finally send the real content.

When receiving:
  1. First receive the header length and unpack it with struct.
  2. Receive the header content according to that length, then decode and deserialize it.
  3. Read the details of the data from the deserialized header, then receive the real data content.
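The layout described by these steps can be sketched on plain bytes, which keeps the example runnable without a network. In a real transfer each piece would be read from the socket instead of sliced from a buffer. The names LEN, build_frame and parse_frame are invented for this illustration, the length prefix uses network byte order by choice, and the 'filename' and 'filesize' keys follow the file-transfer example above.

```python
import json
import struct

LEN = struct.Struct("!I")   # 4-byte unsigned header length, network byte order

def build_frame(header: dict, body: bytes) -> bytes:
    """Lay out: 4-byte header length, JSON header, then the real content."""
    head_bytes = json.dumps(header).encode("utf-8")
    return LEN.pack(len(head_bytes)) + head_bytes + body

def parse_frame(data: bytes):
    """Split a byte string into (header dict, body, bytes left over after the body)."""
    (head_len,) = LEN.unpack_from(data, 0)
    head_start = LEN.size
    body_start = head_start + head_len
    header = json.loads(data[head_start:body_start].decode("utf-8"))
    body_end = body_start + header["filesize"]   # the header says how long the body is
    return header, data[body_start:body_end], data[body_end:]

frame = build_frame({"filename": "a.txt", "filesize": 5}, b"hello")
# Extra bytes after the frame model data from a following message stuck to this one.
header, body, rest = parse_frame(frame + b"next message")
print(header, body, rest)
```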

Keywords: Python socket network JSON Unix

Added by imcomguy on Mon, 09 Sep 2019 14:21:10 +0300