Network programming
Architecture of Software Development
- Application class: QQ, WeChat, FTP clients, network-disk clients and so on. These are applications that must be installed before they can be used.
- WEB class: Baidu, Zhihu, Weibo and other applications that can be used directly through a browser.
C/S Architecture
C/S stands for Client and Server, that is, the client/server architecture. The division is made at the user level (it can also be made at the physical level).
The client here generally refers to the client application exe. The program needs to be installed before it can run on the user's computer. It relies heavily on the user's computer operating system environment.
B/S Architecture
B/S stands for Browser and Server, that is, the browser/server architecture. This division is also made at the user level.
The browser is in fact also a client, but this client does not require any application to be installed. The user simply requests server-side resources (web resources) through the browser over HTTP, and the browser can then add, delete, modify and query data.
Network Foundation
1. How can a program find another program on the network
First, the program must be started. Second, it must have the address of the machine. In the Internet, the address of a computer is represented by a series of numbers, such as 78.5.6.29.
What is an IP address? "IP address" is short for Internet Protocol address. It is a unified address format provided by the IP protocol: it assigns a logical address to every network and host on the Internet, which hides the differences between physical addresses.

An IP address is a 32-bit binary number, usually divided into four 8-bit binary numbers (that is, four bytes). It is usually written in dotted decimal notation (a.b.c.d), where a, b, c and d are decimal integers between 0 and 255. Example: the dotted decimal IP address 100.4.5.6 is actually the 32-bit binary number 01100100.00000100.00000101.00000110.

What is a port? A port can be regarded as the outlet through which a device communicates with the outside world.

To view which process occupies a port on Windows:

    netstat -aon|findstr "49157"

So the IP address identifies a specific computer, and the port identifies a specific program on that computer.
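The dotted decimal example above (100.4.5.6) can be checked with a few lines of Python. This is a minimal sketch; the helper names `dotted_to_binary` and `dotted_to_int` are made up for illustration and are not part of any library.

```python
import ipaddress


def dotted_to_binary(addr):
    """Render each of the four decimal parts as an 8-bit binary group."""
    return '.'.join(f'{int(part):08b}' for part in addr.split('.'))


def dotted_to_int(addr):
    """Combine the four bytes into the single 32-bit number they represent."""
    parts = [int(part) for part in addr.split('.')]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f'not a valid IPv4 address: {addr!r}')
    value = 0
    for p in parts:
        value = (value << 8) | p   # shift the previous bytes left, append the next
    return value


print(dotted_to_binary('100.4.5.6'))
# The standard library's ipaddress module gives the same integer
print(dotted_to_int('100.4.5.6') == int(ipaddress.IPv4Address('100.4.5.6')))
```

The binary string printed for 100.4.5.6 matches the one given in the text.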
2. Understanding Socket
A socket is the intermediate software abstraction layer between the application layer and the TCP/IP protocol suite. It is a set of interfaces. In design-pattern terms, Socket is a facade: it hides the complex TCP/IP protocol suite behind the Socket interface. For the user, a simple set of interfaces is all there is, and the socket organizes the data so that it conforms to the specified protocol.
From the programmer's point of view, socket is simply a module. We establish the connection and communication between two processes by calling the methods the module already implements. Some people also describe a socket as ip + port, because the ip identifies the location of a host on the Internet and the port identifies an application on that machine. So once we know the ip and port, we can find an application and use the socket module to communicate with it.
3. The History of Socket
Sockets originated in the 1970s in the University of California, Berkeley version of Unix, known as BSD Unix. Sockets are therefore sometimes called "Berkeley sockets" or "BSD sockets". Initially, sockets were designed for communication between multiple applications on the same host, which is also called interprocess communication, or IPC. There are two kinds (families) of sockets: file-based sockets and network-based sockets.
Socket Family Based on File Type
The name of the socket family: AF_UNIX
In Unix, everything is a file. File-based sockets use the underlying file system to exchange data: two socket processes running on the same machine can communicate indirectly by accessing the same file system.
Socket Family Based on Network Type
The name of the socket family: AF_INET
(There is also AF_INET6, which is used for IPv6, and there are other address families, but those are either platform-specific, abandoned, rarely used, or not implemented at all. Of all address families, AF_INET is the most widely used. Python supports many address families, but since we only care about network programming, we use AF_INET most of the time.)
4.TCP Protocol and UDP Protocol
- TCP (Transmission Control Protocol): a reliable, connection-oriented protocol (like a phone call) with lower transfer efficiency. It is full-duplex (each side has a send buffer and a receive buffer) and byte-stream oriented. Applications that use TCP: web browsers, e-mail, file transfer programs.
- UDP (User Datagram Protocol): an unreliable, connectionless service with high transmission efficiency (little delay before sending). It supports one-to-one, one-to-many and many-to-one communication, is message-oriented, delivers on a best-effort basis and has no congestion control. Applications that use UDP: the Domain Name System (DNS), video streaming, VoIP.
Initial use of sockets:
Socket Based on TCP Protocol
TCP is connection-based: the server must be started first, and then the client is started to connect to it.
Server side

    import socket

    # "Buy a cell phone": AF_INET is the network-based family, SOCK_STREAM selects TCP
    sk = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
    sk.bind(('127.0.0.1', 9000))   # "install the SIM card"
    sk.listen()                    # "switch the phone on"

    while True:
        conn, addr = sk.accept()   # wait for a call: block until a client connects
        # conn is the connection between the server and this client
        while True:
            msg_send = input('>>>')
            conn.send(msg_send.encode('utf-8'))      # send to the client
            if msg_send.upper() == 'Q':              # entering q ends the conversation
                break
            msg = conn.recv(1024).decode('utf-8')    # receive from the client
            if msg.upper() == 'Q':
                break
            print(msg)
        conn.close()               # "hang up"

    sk.close()                     # "turn the phone off" (only reached if the loop above is left)

    # Entering q on either the server or the client disconnects both sides
Client side

    import socket

    sk = socket.socket()                      # create a socket object
    sk.connect(('127.0.0.1', 9000))

    while True:
        msg = sk.recv(1024).decode('utf-8')   # blocks until data arrives; decode bytes to str
        if msg.upper() == 'Q':
            break
        print(msg)
        msg_send = input('>>>')               # input() returns a str
        sk.send(msg_send.encode('utf-8'))     # encode str to bytes before sending
        if msg_send.upper() == 'Q':
            break

    sk.close()
Socket Based on UDP Protocol
UDP is connectionless: once the service is started it can receive messages directly, without establishing a connection in advance.
Server side

    import socket

    sk = socket.socket(type=socket.SOCK_DGRAM)      # create the server's UDP socket
    sk.bind(('127.0.0.1', 8001))                    # bind the server socket to an address

    while True:
        msg, addr = sk.recvfrom(1024)               # receive a datagram and the sender's address
        print(msg.decode('utf-8'))
        send_msg = input('>>>')
        sk.sendto(send_msg.encode('utf-8'), addr)   # reply to that sender

    sk.close()   # close the server socket (not reached while the loop runs)
Client side

    import socket

    sk = socket.socket(type=socket.SOCK_DGRAM)

    while True:
        send_msg = input('>>>')
        sk.sendto(send_msg.encode('utf-8'), ('127.0.0.1', 8001))
        msg, addr = sk.recvfrom(1024)
        print(msg.decode('utf-8'))

    sk.close()
Example
QQ Chat
- Server side

    import socket

    ip_port = ('127.0.0.1', 9001)
    udp_server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_server_sock.bind(ip_port)

    while True:
        qq_msg, addr = udp_server_sock.recvfrom(1024)
        # \033[1;44m ... \033[0m highlights the message text in the terminal
        print('Message from [%s:%s]: \033[1;44m%s\033[0m' % (addr[0], addr[1], qq_msg.decode('utf8')))
        back_msg = input('Reply to the message: ').strip()
        udp_server_sock.sendto(back_msg.encode('utf8'), addr)
- Client side

    import socket

    BUFSIZE = 1024
    udp_client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # chat contacts and their addresses
    qq_name_dic = {
        'Dad King': ('127.0.0.1', 9001),
        'Alex': ('127.0.0.1', 9001),
    }

    while True:
        qq_name = input('Please choose the chat object: ').strip()
        while True:
            msg = input('Enter a message and press Enter to send it (q ends the chat): ').strip()
            if msg == 'q':
                break
            # skip empty input and unknown contacts
            if not msg or not qq_name or qq_name not in qq_name_dic:
                continue
            udp_client_socket.sendto(msg.encode('utf8'), qq_name_dic[qq_name])
            back_msg, addr = udp_client_socket.recvfrom(BUFSIZE)
            print('Message from [%s:%s]: \033[1;44m%s\033[0m' % (addr[0], addr[1], back_msg.decode('utf8')))

    udp_client_socket.close()
time server
- Server side

    from socket import *
    from time import strftime

    ip_port = ('127.0.0.1', 9000)
    BUFSIZE = 1024

    # despite the tcp_ prefix in the names, SOCK_DGRAM makes this a UDP socket
    tcp_server = socket(AF_INET, SOCK_DGRAM)
    tcp_server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    tcp_server.bind(ip_port)

    while True:
        msg, addr = tcp_server.recvfrom(BUFSIZE)
        print('===>', msg)
        if not msg:
            time_fmt = '%Y-%m-%d %X'        # default format when the request is empty
        else:
            time_fmt = msg.decode('utf8')   # otherwise use the format the client asked for
        back_msg = strftime(time_fmt)
        tcp_server.sendto(back_msg.encode('utf8'), addr)

    tcp_server.close()
- Client side

    from socket import *

    ip_port = ('127.0.0.1', 9000)
    BUFSIZE = 1024
    tcp_client = socket(AF_INET, SOCK_DGRAM)   # a UDP socket, despite the name

    while True:
        msg = input('Please enter the time format (example: %Y %m %d)>>: ').strip()
        tcp_client.sendto(msg.encode('utf8'), ip_port)
        data = tcp_client.recv(BUFSIZE)
        print(data.decode('utf8'))
FeiQ communication

    from socket import *

    updsocket = socket(type=SOCK_DGRAM)
    addr = ("192.168.0.168", 2425)
    msg = input('>>>')
    updsocket.sendto(("1:111:eva:eva:32:%s" % msg).encode('gbk'), addr)

When FeiQ is running it listens on port 2425, so we first create a UDP socket locally and send to that port.
The packet format is:

    1:111:eva:eva:32:The content to be sent

1 is the version number, 111 is the packet number, the first eva is the user name, the second eva is the host name, 32 is the command for sending a message, and the rest is the message content.
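The colon-separated layout described above can be built and taken apart with two small helpers. This is only a sketch based on the field order given in the text; the function names `build_packet` and `parse_packet` are hypothetical, and nothing here is taken from an official protocol specification.

```python
def build_packet(version, packet_no, user, host, command, text):
    """Join the six fields with colons and encode them as gbk, as in the example above."""
    return f'{version}:{packet_no}:{user}:{host}:{command}:{text}'.encode('gbk')


def parse_packet(data):
    """Split a packet back into its fields.

    Splitting at most 5 times keeps any colons inside the message text intact.
    """
    version, packet_no, user, host, command, text = data.decode('gbk').split(':', 5)
    return {
        'version': int(version),
        'packet_no': int(packet_no),
        'user': user,
        'host': host,
        'command': int(command),
        'text': text,
    }


packet = build_packet(1, 111, 'eva', 'eva', 32, 'see you at 10:30')
print(packet)
print(parse_packet(packet))
```

Limiting the split to the first five colons is a design choice of this sketch: the message itself may contain colons, as the time in the sample text does.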
Detailed explanation of socket parameters
socket.socket(family=AF_INET,type=SOCK_STREAM,proto=0,fileno=None)
Description of parameters for creating socket objects:
Parameter | Description |
---|---|
family | The address family: AF_INET (default), AF_INET6, AF_UNIX, AF_CAN or AF_RDS. (The AF_UNIX domain actually uses local socket files to communicate.) |
type | The socket type: SOCK_STREAM (default), SOCK_DGRAM, SOCK_RAW or one of the other SOCK_ constants. SOCK_STREAM is a TCP-based, reliable (that is, data is guaranteed to reach the other party correctly), connection-oriented socket, mostly used for data transmission. SOCK_DGRAM is a UDP-based, unreliable, message-oriented socket, mostly used for broadcasting information on the network. |
proto | The protocol number. It is usually zero and can be omitted. With the address family AF_CAN, the protocol should be one of CAN_RAW or CAN_BCM. |
fileno | If fileno is specified, the other parameters are ignored and the socket is created from the given file descriptor. Unlike socket.fromfd(), fileno returns the same socket rather than a duplicate. This may help to close a detached socket using socket.close(). |
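The defaults in the table can be inspected on a live socket object. The sketch below creates one socket with no arguments and one UDP socket and reads back their `family`, `type` and `proto` attributes; the helper name `describe` is made up for this example.

```python
import socket


def describe(sock):
    """Return the family name, type name and protocol number of a socket."""
    return sock.family.name, sock.type.name, sock.proto


# No arguments: the defaults from the table apply
with socket.socket() as s:
    default_info = describe(s)

# Explicit family and type for a UDP socket
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    udp_info = describe(s)

print(default_info)
print(udp_info)
```

Using the socket as a context manager closes it automatically at the end of the `with` block.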
Sticky packets
Let's first take a look at the phenomenon.

    # Server side
    import socket

    sk = socket.socket()
    sk.bind(('127.0.0.1', 9000))
    sk.listen()
    conn, addr = sk.accept()
    for i in range(3):
        conn.send(b'sbzz')     # three separate sends
    conn.close()
    sk.close()

    # Client side
    import socket

    sk = socket.socket()
    sk.connect(('127.0.0.1', 9000))
    for i in range(3):
        print(sk.recv(1024))   # three receives
    sk.close()

Result of running the code above:
Normally the client should receive sbzz three times, once per recv. Instead, by the second recv the second and third sbzz have arrived stuck together. This phenomenon is called a sticky packet.
Note: only TCP produces sticky packets; UDP never does.
Stream transport: TCP is like a pipeline of bytes, with no message boundaries.
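The lack of boundaries can be observed in a single script over the loopback interface. This is a sketch under a few assumptions: the function name `send_three_times` is made up, port 0 is used so the operating system picks a free port, and the number of chunks the client sees is deliberately not predicted because it depends on timing.

```python
import socket


def send_three_times(payload=b'sbzz'):
    """Send the payload three times over TCP and return the chunks the client received."""
    server = socket.socket()
    server.bind(('127.0.0.1', 0))      # port 0: let the OS choose a free port
    server.listen(1)

    client = socket.socket()
    client.connect(server.getsockname())
    conn, _ = server.accept()

    for _ in range(3):
        conn.send(payload)             # three separate send calls
    conn.close()                       # closing lets the client see end-of-stream

    chunks = []
    while True:
        data = client.recv(1024)
        if not data:                   # empty bytes: the peer closed the connection
            break
        chunks.append(data)

    client.close()
    server.close()
    return chunks


chunks = send_three_times()
print(len(chunks), chunks)
```

Whatever the chunk count turns out to be, joining the chunks always gives the same twelve bytes: the stream preserves content and order, not the boundaries of the individual send calls.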
Causes of sticking
Data Transfer in TCP Protocol
Unpacking mechanism of tcp protocol
When the data in the sender's buffer is longer than the MTU of the network card, TCP splits the data into several packets and sends them out separately.
MTU is short for Maximum Transmission Unit: the largest packet that can be transmitted over the network, measured in bytes. Most network devices have an MTU of 1500. If the local MTU is larger than the gateway's MTU, large packets are split up for transmission, which produces many packet fragments, increases the packet loss rate and reduces network speed.
Flow-Oriented Communication Characteristics and Nagle Algorithms
TCP (Transmission Control Protocol) is connection-oriented and stream-oriented, and provides a highly reliable service.
Both ends (client and server) need a socket of their own, forming a pair. To send multiple packets to the other side more efficiently, the sender uses an optimization (the Nagle algorithm) that merges data sent at short intervals and in small amounts into one larger block before packaging it.
As a result, the receiver cannot easily tell the messages apart, so a sound unpacking mechanism must be provided. In other words, stream-oriented communication has no message boundaries.
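For reference, the Nagle algorithm can be switched off per socket with the standard `TCP_NODELAY` option. The sketch below only reads and sets the option on an unconnected socket; the variable names are illustrative. Note that disabling Nagle does not by itself solve the boundary problem, because the receiver can still pick up several messages in one `recv` call.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# 0 means the Nagle algorithm is active (small writes may be merged)
nagle_default = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# A non-zero TCP_NODELAY asks the kernel to send small writes without merging them
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

sock.close()
print(nagle_default, nodelay_enabled)
```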
Empty messages: TCP is stream-based, so the messages sent and received must not be empty. Both client and server therefore need to handle empty input, otherwise the program gets stuck. UDP is datagram-based: even if you enter nothing (just press Enter), the message can still be sent, because the UDP protocol wraps it in a message header and sends it.
Reliable but sticky TCP: TCP data is not lost. If a packet is not received in full, the next receive continues from where the last one stopped, and the sending side clears its buffer only after it has received the ACK. The data is reliable, but it can stick together.
The cause of sticky phenomenon based on the characteristics of tcp protocol
The sender may send data 1 KB at a time, while the receiving application may pick it up 2 KB at a time. Of course it may also pick up 3 KB or 6 KB at once, or only a few bytes.
That is, what the application sees is one continuous whole, a stream. How many bytes make up one message is invisible to the application. This is why TCP is called a stream-oriented protocol, and also why sticky packets occur so easily.
UDP is a message-oriented protocol. Every UDP datagram is one message, and the application must extract data one message at a time; it cannot extract an arbitrary number of bytes. This is very different from TCP.
How is a message defined? The data of one write/send call by the other party can be considered one message. What needs to be understood is that when the other party sends a message, no matter how the lower layers fragment it, the TCP layer puts the segments that make up the message back in order before presenting them in the kernel buffer.
Explanation of User State and Kernel State in socket Data Transmission
For example, a TCP-based socket client uploads a file to the server. The file content is sent as a sequence of bytes, and the receiver, seeing only the byte stream, has no idea where the file starts and ends.
In addition, sticky packets on the sending side are caused by the TCP protocol itself. To improve transmission efficiency, the sender often collects enough data before sending a TCP segment. If several consecutive sends each carry very little data, TCP's optimization algorithm usually merges them into a single TCP segment and sends them together, so the receiver gets sticky data.
UDP does not stick
UDP (user datagram protocol) is connectionless, message-oriented and provides efficient services.
Since UDP supports one-to-many communication, the skbuff (socket buffer) at the receiving end uses a chained structure to record every arriving UDP packet, and each UDP packet carries a header (source address, port and so on). This makes it easy for the receiving end to tell the packets apart. That is, message-oriented communication has message boundaries.
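The boundaries can be seen by sending a few datagrams, including an empty one, to a UDP socket on the loopback interface. This is a sketch: the function name `udp_roundtrip` is made up, port 0 lets the operating system choose a free port, and delivery and ordering are only dependable here because everything stays on the local machine.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as its own datagram and return what recvfrom yields."""
    receiver = socket.socket(type=socket.SOCK_DGRAM)
    receiver.bind(('127.0.0.1', 0))    # port 0: let the OS choose a free port
    receiver.settimeout(2)             # do not block forever if a datagram is lost
    sender = socket.socket(type=socket.SOCK_DGRAM)

    for message in messages:
        sender.sendto(message, receiver.getsockname())

    # One recvfrom call returns exactly one datagram, never two merged together
    received = [receiver.recvfrom(1024)[0] for _ in messages]

    sender.close()
    receiver.close()
    return received


print(udp_roundtrip([b'hello', b'', b'egg']))
```

The empty datagram arrives as an empty bytes object of its own, which is the behaviour the text describes for empty UDP messages.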
Unreliable but non-sticky UDP: UDP's recvfrom blocks, and one recvfrom(x) corresponds to exactly one sendto(y). The call completes once that datagram has been received; if y > x, the extra data is lost. This means UDP never sticks packets together, but it can lose data and is unreliable.
Supplementary Notes:
When sending data with UDP, the maximum length that sendto can send is 65535 - IP header (20) - UDP header (8) = 65507 bytes. If the data is longer than that, sendto reports an error and the packet is discarded rather than sent.
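The limit can be probed directly. In this sketch the helper name `try_send` and the destination port 9 are arbitrary choices for illustration; nothing needs to be listening there, since UDP does not establish a connection. Some platforms set a lower default limit for datagrams, so only the oversized case is expected to behave the same everywhere.

```python
import socket

# 65535 bytes in total, minus the IP header (20) and the UDP header (8)
MAX_UDP_PAYLOAD = 65535 - 20 - 8


def try_send(size):
    """Return True if a datagram of the given size was accepted by sendto."""
    sock = socket.socket(type=socket.SOCK_DGRAM)
    try:
        sock.sendto(b'x' * size, ('127.0.0.1', 9))
        return True
    except OSError:                    # raised when the datagram is too long
        return False
    finally:
        sock.close()


print(MAX_UDP_PAYLOAD)
print(try_send(MAX_UDP_PAYLOAD + 1))   # one byte over the limit
```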
When sending with TCP, there is no limit on the packet size because TCP is a stream protocol (leaving buffer size aside). This means the length passed to send is not restricted. In practice, though, the data is not necessarily sent in one go: if it is long it is sent in segments, and if it is short it may wait to be sent together with the next data.
There are two situations in which sticky packets occur
The sender waits until the buffer is full before sending, which produces sticky packets (when data is sent at very short intervals and in very small amounts, it is merged and sent together).
Case 1 Sender's Caching Mechanism
Server side

    #_*_coding:utf-8_*_
    from socket import *

    ip_port = ('127.0.0.1', 8080)
    tcp_socket_server = socket(AF_INET, SOCK_STREAM)
    tcp_socket_server.bind(ip_port)
    tcp_socket_server.listen(5)

    conn, addr = tcp_socket_server.accept()
    data1 = conn.recv(10)
    data2 = conn.recv(10)
    print('----->', data1.decode('utf-8'))
    print('----->', data2.decode('utf-8'))
    conn.close()
Client side

    #_*_coding:utf-8_*_
    import socket

    BUFSIZE = 1024
    ip_port = ('127.0.0.1', 8080)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    res = s.connect_ex(ip_port)

    # two small sends in quick succession may reach the server as one chunk
    s.send('hello'.encode('utf-8'))
    s.send('egg'.encode('utf-8'))
Case 2 Receiver's Caching Mechanism
The receiver does not take packets out of the buffer in time, so several packets are received together (the client sends a piece of data and the server receives only a small part of it; on the next receive the server first takes the data left over in the buffer, which produces a sticky packet).
Server side

    #_*_coding:utf-8_*_
    from socket import *

    ip_port = ('127.0.0.1', 8080)
    tcp_socket_server = socket(AF_INET, SOCK_STREAM)
    tcp_socket_server.bind(ip_port)
    tcp_socket_server.listen(5)

    conn, addr = tcp_socket_server.accept()
    data1 = conn.recv(2)    # does not take everything in one go
    data2 = conn.recv(10)   # the next receive first takes the leftover data, then any new data
    print('----->', data1.decode('utf-8'))
    print('----->', data2.decode('utf-8'))
    conn.close()
Client side

    #_*_coding:utf-8_*_
    import socket

    BUFSIZE = 1024
    ip_port = ('127.0.0.1', 8080)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    res = s.connect_ex(ip_port)

    s.send('hello egg'.encode('utf-8'))
Summary
Sticky packets occur only with the TCP protocol:
- On the surface, the problem is mainly due to the buffering mechanisms of the sender and the receiver and the stream-oriented nature of TCP.
- In fact, it is mainly because the receiver does not know where the boundaries between messages are, or how many bytes to extract at a time.
Solving sticky packets
Solution 1
The root of the problem is that the receiver does not know the length of the byte stream the sender is about to transmit. The solution therefore centers on having the sender tell the receiver the total size of the byte stream before sending the data; the receiver then loops until it has received all of it.
Server side

    import socket
    import struct

    def my_send(conn, msg):
        msgb = msg.encode('utf-8')
        len_msg = len(msgb)
        pack_len = struct.pack('i', len_msg)   # the length as a fixed 4 bytes
        conn.send(pack_len)                    # first the length
        conn.send(msgb)                        # then the message itself

    sk = socket.socket()
    sk.bind(('127.0.0.1', 9002))
    sk.listen()
    conn, addr = sk.accept()

    msg1 = 'Hello'
    msg2 = 'Have you eaten yet?'
    my_send(conn, msg1)
    my_send(conn, msg2)

    conn.close()
    sk.close()
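Client side

    import time
    import socket
    import struct

    sk = socket.socket()

    def my_recv(sk):
        pack_len = sk.recv(4)                        # read the 4-byte length prefix
        len_msg = struct.unpack('i', pack_len)[0]    # turn it back into a number
        msg = sk.recv(len_msg).decode('utf-8')       # read up to len_msg bytes of message
        return msg

    sk.connect(('127.0.0.1', 9002))
    for i in range(100000):
        i * 2        # busy loop: a short delay so both messages are already buffered
    msg = my_recv(sk)
    print(msg)
    msg = my_recv(sk)
    print(msg)
    sk.close()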
Existing problems:
The program runs much faster than the network transmits, so sending the length of the byte stream before every piece of data amplifies the performance loss caused by network latency.
Solution Advancement
Just now, the problem is that we are sending
We can use a module that converts the length of the data into a fixed number of bytes. Then, before receiving each message, the receiver first reads that fixed-length prefix to learn the size of the message that follows. It stops receiving as soon as the received data reaches that size, so it receives exactly one complete message.
struct module
The module can convert a value of a given type, such as a number, into a fixed number of bytes.

    >>> struct.pack('i', 1111111111111)
    struct.error: 'i' format requires -2147483648 <= number <= 2147483647

The error message shows the range of values that the 'i' format accepts.
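The fixed sizes and the range limit can be verified with `struct.calcsize` and a small probe function. The helper name `fits` is made up for this sketch, and the 8-byte `'q'` format is shown only as one possible alternative for values that exceed the range of `'i'`; the original text uses `'i'` throughout.

```python
import struct


def fits(fmt, number):
    """Return True if the number can be packed with the given struct format."""
    try:
        struct.pack(fmt, number)
        return True
    except struct.error:
        return False


print(struct.calcsize('i'))            # size in bytes of a packed 'i'
print(len(struct.pack('i', 1024)))     # any value in range packs to the same length
print(fits('i', 2147483647))           # upper bound of the 'i' range
print(fits('i', 1111111111111))        # the value from the example above

big = 1073741824000                    # too large for 'i'
packed = struct.pack('q', big)         # 'q' is a signed 8-byte integer
print(len(packed), struct.unpack('q', packed)[0] == big)
```

Because the packed length never changes, a receiver always knows how many bytes to read for the prefix before it knows anything else about the message.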

    import json, struct

    # Suppose the client uploads the file a.txt of 1T: 1073741824000 bytes.
    # To avoid sticky packets, define a custom header.
    header = {
        'file_size': 1073741824000,
        'file_name': '/a/b/c/d/e/a.txt',
        'md5': '8f6fbf8347faa4924a76856701edb0f3',
    }   # 1T of data, the file path and the md5 value

    # The header must be serialized and converted to bytes before it can be transmitted
    head_bytes = bytes(json.dumps(header), encoding='utf-8')

    # So that the other side knows the header length, use struct to pack it into a fixed 4 bytes
    head_len_bytes = struct.pack('i', len(head_bytes))   # these 4 bytes hold one number: the header length

    # Client starts sending (conn stands for an already connected socket)
    conn.send(head_len_bytes)      # first the header length, 4 bytes
    conn.send(head_bytes)          # then the header itself, as bytes
    conn.sendall(file_content)     # then the real content, as bytes (file_content is a placeholder)

    # Server begins to receive (s stands for an already connected socket)
    head_len_bytes = s.recv(4)                         # first the 4 bytes holding the header length
    x = struct.unpack('i', head_len_bytes)[0]          # extract the header length
    head_bytes = s.recv(x)                             # receive the header bytes according to length x
    header = json.loads(head_bytes.decode('utf-8'))    # deserialize the header

    # Finally, receive the real data according to the header content
    real_data_len = header['file_size']
    real_data = s.recv(real_data_len)   # in practice, repeat recv until real_data_len bytes have arrived
More detailed usage

    #_*_coding:utf-8_*_
    #http://www.cnblogs.com/coser/archive/2011/12/17/2291160.html
    __author__ = 'Linhaifeng'
    import struct
    import binascii
    import ctypes

    values1 = (1, 'abc'.encode('utf-8'), 2.7)
    values2 = ('defg'.encode('utf-8'), 101)
    s1 = struct.Struct('I3sf')
    s2 = struct.Struct('4sI')
    print(s1.size, s2.size)

    prebuffer = ctypes.create_string_buffer(s1.size + s2.size)
    print('Before : ', binascii.hexlify(prebuffer))
    # t=binascii.hexlify('asdfaf'.encode('utf-8'))
    # print(t)

    s1.pack_into(prebuffer, 0, *values1)
    s2.pack_into(prebuffer, s1.size, *values2)
    print('After pack', binascii.hexlify(prebuffer))
    print(s1.unpack_from(prebuffer, 0))
    print(s2.unpack_from(prebuffer, s1.size))

    s3 = struct.Struct('ii')
    s3.pack_into(prebuffer, 0, 123, 123)
    print('After pack', binascii.hexlify(prebuffer))
    print(s3.unpack_from(prebuffer, 0))
Using struct to solve sticky packages
With the struct module, a length can be converted into a fixed 4 bytes. This feature can be used to send the data length in advance.
When sending | When receiving |
---|---|
First send the length of the data, packed into 4 bytes with struct | First receive 4 bytes and unpack them with struct to get the length of the data that follows |
Then send the data | Receive the data according to that length |
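The two columns of the table can be sketched as a pair of functions and tried out over `socket.socketpair()`, which returns two already connected sockets in one process. Everything named here (`recv_exact`, `send_msg`, `recv_msg`) is hypothetical. Two details go beyond the text and are assumptions of this sketch: the format `'!i'` fixes the byte order so both ends agree regardless of platform, and `recv_exact` loops because a single `recv` call may return fewer bytes than requested.

```python
import socket
import struct


def recv_exact(sock, n):
    """Keep calling recv until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock, text):
    """Send the 4-byte length first, then the data."""
    body = text.encode('utf-8')
    sock.sendall(struct.pack('!i', len(body)) + body)


def recv_msg(sock):
    """Receive the 4-byte length, then exactly that many bytes of data."""
    (length,) = struct.unpack('!i', recv_exact(sock, 4))
    return recv_exact(sock, length).decode('utf-8')


a, b = socket.socketpair()
send_msg(a, 'Hello')
send_msg(a, 'Have you eaten yet?')     # both messages are in the buffer before any recv
first = recv_msg(b)
second = recv_msg(b)
a.close()
b.close()
print(first)
print(second)
```

Both messages are already waiting in the buffer when the first `recv_msg` runs, yet each call returns exactly one message, because the length prefix tells the receiver where to stop.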
Example of File Transfer
Server

    # Receive a file
    import json
    import socket

    sk = socket.socket()
    sk.bind(('127.0.0.1', 9001))
    sk.listen()
    conn, addr = sk.accept()

    file_dic = conn.recv(1024).decode('utf-8')   # the json header sent by the client
    dic = json.loads(file_dic)

    with open(dic['filename'], mode='wb') as f:
        while dic['filesize'] > 0:               # keep reading until the whole file has arrived
            file_content = conn.recv(1024)
            dic['filesize'] -= len(file_content)
            f.write(file_content)

    conn.close()
    sk.close()
Client

    import os
    import json
    import socket

    sk = socket.socket()
    sk.connect(('127.0.0.1', 9001))

    # Path of the file to send; get its name and size and send them first
    file_path = r'D:\ev Video saved on video\20190719_150518.mp4'
    file_name = os.path.basename(file_path)
    file_size = os.path.getsize(file_path)
    dic = {'filename': file_name, 'filesize': file_size}
    str_dic = json.dumps(dic)
    dic_b = str_dic.encode('utf-8')
    sk.send(dic_b)                 # send the header

    with open(file_path, mode='rb') as f:
        content = f.read()
    sk.sendall(content)            # sendall keeps sending until the whole file has gone out
    sk.close()
We can also make the header a dictionary that contains detailed information about the real data to be sent, serialize it with json, and then use struct to pack the length of the serialized data into four bytes (four are enough for our purposes).
When sending | When receiving |
---|---|
First send the header length | First receive the header length and unpack it with struct |
Then encode the header content and send it | Receive the header content according to that length, then decode and deserialize it |
Finally send the real content | Take the details of the data from the deserialized header, then receive the real data content |
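The three steps in the table can be sketched end to end in one process using `socket.socketpair()`. The function names (`recv_exact`, `send_with_header`, `recv_with_header`) and the header keys are hypothetical, the `'!i'` format is an assumption of this sketch for a fixed byte order, and the body is kept tiny so that sending and receiving can happen one after the other without threads.

```python
import json
import socket
import struct


def recv_exact(sock, n):
    """Keep calling recv until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        buf.extend(chunk)
    return bytes(buf)


def send_with_header(sock, file_name, body):
    """Send the header length, then the json header, then the real content."""
    header = {'file_name': file_name, 'file_size': len(body)}
    head_bytes = json.dumps(header).encode('utf-8')
    sock.sendall(struct.pack('!i', len(head_bytes)))   # step 1: header length, 4 bytes
    sock.sendall(head_bytes)                           # step 2: the serialized header
    sock.sendall(body)                                 # step 3: the real content


def recv_with_header(sock):
    """Mirror image of send_with_header."""
    (head_len,) = struct.unpack('!i', recv_exact(sock, 4))
    header = json.loads(recv_exact(sock, head_len).decode('utf-8'))
    body = recv_exact(sock, header['file_size'])
    return header, body


left, right = socket.socketpair()
send_with_header(left, 'a.txt', b'hello egg')
header, body = recv_with_header(right)
left.close()
right.close()
print(header, body)
```

Because the header is a dictionary, further fields such as an md5 value can be added without changing the framing: the receiver only ever relies on the 4-byte length and the `file_size` key.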