PGSQL protocol parsing and C/C + + code implementation

PostgreSQL uses message based protocols to communicate between the front end and the back end (client and server). T port number 5432 is registered with IANA as the preferred TCP port number of servers supporting this protocol, but actually any non privileged port number can be used.

What is PostgreSQL?

PostgreSQL, also known as Postgres, is an open source relational database, which enjoys a high reputation for its reliability, flexibility and support for open technology standards. PostgreSQL supports non relational and relational data types. It is known as one of the most compatible, stable and mature relational databases available today, and can easily handle complex queries.

PostgreSQL communication process

PostgreSQL performs the communication process after three TCP handshakes. The PostgreSQL protocol has separate stages for starting and querying operations.

The communication device initiated by the client sends a TCP packet to the server. This initial packet does not contain any data except the underlying protocol header. The TCP header of this packet sets the SYN ID and contains the initial sequence number and maximum segment size that will be used in the communication process. The server replied to the packet with a similar set of SYN and ACK flags and the packet containing its initial serial number.

Finally, the client sends the last packet with only the ACK flag set to the server. After this process is completed, both parties should have the information required to start normal communication.

Start the first phase

In the startup phase, the front end opens the connection to the server and verifies itself to satisfy the server. If everything goes well, the server sends the status information to the front end and finally enters normal operation. In addition to the initial startup request message, this part of the protocol is server driven.

  • Start-up Message

To start a session, the PostgreSQL startup package is issued by the client. The client opens a connection to the server and sends a startup message. The message includes the user name and the name of the database to which the user wants to connect, and it also identifies the specific protocol version to use.


Type: message type
Length: length
Protocol major version: identifies the specific version used
Protocol minor version: identifies the specific version used
Parameter name: database login user name
Parameter value: the name of the connected database

In the startup phase, the client sends a StartupMessage to the server

Authentication request

After receiving it, the server will judge whether the user needs password authentication. If so, it will send an Authentication request to request password authentication information.


Type: Authentication request(R)
Length: length
Authentication type:5(MD5 password)

  • Password message

Type: Password messageĀ§
Length: length
Password:password value

  • Parameter Status

After the front end sends the password authentication successfully, the back end will start sending various Parameter Status, that is, a series of back-end parameters, such as.


ParameterStatus this message tells the front end the current configuration parameters, such as client_encoding or DataStyle. The front end can ignore this message or record it for future use. The front end does not need to respond to this message, but should continue to wait for ReadyForQuery message.

Query phase II

During normal operation, the front end sends commands such as query to the back end, and the back end sends back responses such as query results.

  • Simple Query

A simple query cycle is started by sending a query message from the front end to the back end. The message includes an SQL command (or commands) in a text string. Then, the back-end sends one or more response messages according to the content of the query command string, and finally sends a ReadyForQuery response message.


Parse: 'P' preprocessing statement. The message includes message length, command type and specific command statement, etc. In PostgreSQL, it is called extended query. The preprocessing query will input a statement for preprocessing. The statement contains some identifiers representing parameters, which will be filled and executed later.

Responses to SELECT queries typically include RowDescription, zero or more DataRow messages, and then CommandComplete. Copy to or call from the front end to a special protocol, and all other query types usually produce only one CommandComplete message.

CommandComplete: an SQL command ends normally.

Since a query string may contain multiple queries (separated by semicolons), there may be multiple such response sequences before the backend completes processing the query string. ReadyForQuery is issued when the entire string is processed and the back end is ready to accept a new query string.

If you receive a completely empty (nothing but white space) query string, the response is EmptyQueryResponse followed by ReadyForQuery.

If an error occurs, an ErrorResponse is issued, followed by ReadyForQuery. All further processing of the query string is aborted by ErrorResponse (even if there are more queries in it). Note that this can occur halfway through a sequence of messages generated by a single query.

In simple query mode, the format of the retrieved value is always text unless the given command is a FETCH from a cursor declared with the BINARY option. In this case, the retrieved value is in BINARY format. The format code given in the RowDescription message describes which format is being used.

Client command type for PostgreSQL

It mainly includes the following information:

    { 'p', "Authentication message" },
    { 'Q', "Simple query" },
    { 'P', "Parse" },
    { 'B', "Bind" },
    { 'E', "Execute" },
    { 'D', "Describe" },
    { 'C', "Close" },
    { 'H', "Flush" },
    { 'S', "Sync" },
    { 'F', "Function call" },
    { 'd', "Copy data" },
    { 'c', "Copy completion" },
    { 'f', "Copy failure" },
    { 'X', "Termination" },

Server command type for PostgreSQL

It mainly includes the following information:

    { 'R', "Authentication request" },
    { 'K', "Backend key data" },
    { 'S', "Parameter status" },
    { '1', "Parse completion" },
    { '2', "Bind completion" },
    { '3', "Close completion" },
    { 'C', "Command completion" },
    { 't', "Parameter description" },
    { 'T', "Row description" },
    { 'D', "Data row" },
    { 'I', "Empty query" },
    { 'n', "No data" },
    { 'E', "Error" },
    { 'N', "Notice" },
    { 's', "Portal suspended" },
    { 'Z', "Ready for query" },
    { 'A', "Notification" },
    { 'V', "Function call response" },
    { 'G', "CopyIn response" },
    { 'H', "CopyOut response" },
    { 'd', "Copy data" },
    { 'c', "Copy completion" },
    { 'v', "Negotiate protocol version" },

PGSQL protocol parsing and C/C + + code implementation

void dissect_pgsql_msg(u_char *pgsql_data,int offset,struct tcphdr *pTcpHdr,int payload_len)
{

    conn_data = (pgsql_conn_data_t *)malloc(sizeof(pgsql_conn_data_t));
	if ( conn_data == NULL)
	{
		printf("conn_data fail!!!\n");
		return ;
	}

	conn_data->ssl_requested = false;
	conn_data->auth_state = PGSQL_AUTH_STATE_NONE;

    type = pgsql_data[n + 0];

    if (type != '\0')
        n += 1;
    length = ntohl(pgsql_data + n);
    if (fe) 
	{
        if (type == '\0') 
		{
            unsigned int tag = ntohl(pgsql_data + n + 4);

            if (length == 16 && tag == 80877102)
                typestr = "Cancel request";
            else if (length == 8 && tag == 80877103)
                typestr = "SSL request";
            else if (length == 8 && tag == 80877104)
                typestr = "GSS encrypt request";
            else if (tag == 196608)
                typestr = "Startup message";
            else
                typestr = "Unknown";
        } 
		else if (type == 'p') 
		{
            switch (conn_data->auth_state) 
			{
                case PGSQL_AUTH_SASL_REQUESTED:
                    typestr = "SASLInitialResponse message";
                    break;
                case PGSQL_AUTH_SASL_CONTINUE:
                    typestr = "SASLResponse message";
                    break;
                case PGSQL_AUTH_GSSAPI_SSPI_DATA:
                    typestr = "GSSResponse message";
                    break;
                default:
                    typestr = "Password message";
                    break;
            }
        } 
		else
            typestr = val_to_str_const(type, fe_messages, "Unknown");
    }
    else 
	{
        typestr = val_to_str_const(type, be_messages, "Unknown");
    }
	

...
   
}

Analysis result:




What are the benefits of PostgreSQL?

Some of the main advantages of PostgreSQL:

1. Performance and scalability - including geospatial support and unlimited concurrency - and in-depth, extensive data analysis across multiple data types.
2. Concurrency is supported by using multi version concurrency control (MVCC), which enables write and read operations to occur simultaneously.
3. Due to its compatibility and support for a variety of programming languages, including Python, Java, JavaScript, C/C + + and Ruby, it has deep language support.
4. Business continuity, providing high availability services through asynchronous or synchronous replication across servers.
5. Achieve greater flexibility and cost-effective innovation through open source database management technology.

Welcome to WeChat official account [program ape encoding], PGSQL complete source code and message, please add my micro signal (c17865354792).

reference resources: https://www.postgresql.org/

Keywords: C C++ PostgreSQL pgsql

Added by podarum on Sun, 16 Jan 2022 17:47:02 +0200