Write an HTTP forward proxy

Concepts

First, let's go over the concepts behind HTTP proxies. Broadly speaking, there are two kinds: forward proxies and reverse proxies. Both sit between a client and a server, but they work differently.

The VPNs we commonly use behave like forward proxies: we choose the target server, and the proxy connects to it on our behalf to fetch the resources.

Nginx is a typical reverse proxy server and is often used for load balancing and caching. The client does not know the address of the backend server; it sends its request to the reverse proxy, which forwards it to a backend server and returns the response.

In summary: with a forward proxy, the client knows the address of the target server. With a reverse proxy, the client only needs to know the address of the proxy, not that of the actual server.

This article covers forward proxies. First, let's briefly review the HTTP protocol. You can also refer to the previous article.

HTTP protocol

HTTP is an application layer protocol built on top of a transport layer protocol (usually TCP). There is nothing magical about HTTP requests and responses: it is a client/server model in which the client sends data over a socket, and the server parses it, processes it, and returns a response.

Only the message format is introduced briefly here. For more details, refer to the HTTP/1.1 specification. Capturing packets with Wireshark is a convenient way to see real requests and responses in this format.

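Request

A request has the following format:

<method> <request-target> <HTTP-version>\r\n
<header-name>: <header-value>\r\n
... (more header lines)
\r\n
<request body>

Each line ends with \r\n, an empty line marks the end of the headers, and the request body may be empty.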
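To see how these parts map onto Go, here is a minimal sketch that feeds a hand-written request to the standard library's http.ReadRequest, which splits it into the request line fields and the headers. The sample request and the helper name parseRaw are illustrative only; the article itself parses requests by hand later on.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// rawRequest follows the format above: request line, header lines,
// an empty line, then an (empty) body.
const rawRequest = "GET /get HTTP/1.1\r\n" +
	"Host: httpbin.org\r\n" +
	"Accept: application/json\r\n" +
	"\r\n"

// parseRaw lets the standard library split a raw request into its parts.
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := parseRaw(rawRequest)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Request line: method, request target, protocol version
	fmt.Println(req.Method, req.RequestURI, req.Proto)
	// Header fields; the parser moves the Host header into req.Host
	fmt.Println(req.Host, req.Header.Get("Accept"))
}
```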

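Response

A response has the following format:

<HTTP-version> <status-code> <reason-phrase>\r\n
<header-name>: <header-value>\r\n
... (more header lines)
\r\n
<response body>

Each line also ends with \r\n, and the response body may be empty.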
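The same idea works for responses. This sketch, assuming a hand-written sample response and a hypothetical helper readRaw, uses http.ReadResponse to pull out the status line and headers; Content-Length tells the parser where the body ends.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// rawResponse: status line, headers, empty line, body.
const rawResponse = "HTTP/1.1 200 OK\r\n" +
	"Content-Type: text/plain\r\n" +
	"Content-Length: 5\r\n" +
	"\r\n" +
	"hello"

// readRaw parses a raw response and returns it along with its fully read body.
func readRaw(raw string) (*http.Response, string, error) {
	resp, err := http.ReadResponse(bufio.NewReader(strings.NewReader(raw)), nil)
	if err != nil {
		return nil, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp, string(body), err
}

func main() {
	resp, body, err := readRaw(rawResponse)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Status line: version, status code, and "code reason" as resp.Status
	fmt.Println(resp.Proto, resp.StatusCode, resp.Status)
	fmt.Println(resp.Header.Get("Content-Type"), body)
}
```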

The main difference in format between a request and a response is the first line: a request starts with a request line (method, target, version), while a response starts with a status line (version, status code, reason phrase). The headers also differ somewhat: some are used only in requests (such as Host), some only in responses (such as Server), but most common headers can appear in both.

To construct a request by hand, we can open a TCP connection and write data in the format above:

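package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"strings"
)

func main() {
	// Connect to the server
	conn, err := net.Dial("tcp", "httpbin.org:80")
	if err != nil {
		fmt.Println("Dial tcp err: ", err)
		return
	}
	defer conn.Close()

	// Build the request: request line, headers, then an empty line
	msg := strings.Builder{}
	msg.WriteString("GET /get HTTP/1.1\r\n")
	msg.WriteString("Host: httpbin.org\r\n")
	msg.WriteString("Accept: application/json\r\n")
	msg.WriteString("Connection: close\r\n")
	msg.WriteString("\r\n")

	// Send the request
	if _, err = conn.Write([]byte(msg.String())); err != nil {
		fmt.Println("Send msg err: ", err)
		return
	}

	// Connection: close makes the server close the socket after responding,
	// so reading until EOF yields the whole raw response
	io.Copy(os.Stdout, conn)
}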

Handling the response works the same way, so it is not repeated here. Readers can write their own code, or capture packets with Wireshark, to experiment.

Forward proxy implementation

As mentioned above, with a forward proxy the client first connects to the proxy server, and the proxy then requests the resources from the target server. As the proxy, how do we know which resources the client wants?

The client does not connect to a proxy arbitrarily; it follows a convention. For HTTP, a request sent to a proxy looks almost the same as one sent directly to the server, except that the request target is usually an absolute URI: GET http://httpbin.org/ HTTP/1.1 instead of GET / HTTP/1.1. For HTTPS, the client first sends a CONNECT request to the proxy (for example, CONNECT httpbin.org:443 HTTP/1.1), and only after receiving a 200 response does it send the actual encrypted data.
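The two request forms above both carry the target address. As a sketch under the assumption that the request line has already been split into method and target, a hypothetical helper targetAddr could pick the address from the authority form (CONNECT), the absolute form, or, failing both, the Host header. The result may still lack a port, which the caller adds.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// targetAddr works out which server a proxied request is aimed at.
// CONNECT uses authority form ("httpbin.org:443"); other methods sent to a
// proxy normally use absolute form ("http://httpbin.org/get"). For a plain
// origin-form target such as "/get", fall back to the Host header.
func targetAddr(method, target, hostHeader string) (string, error) {
	if method == "CONNECT" {
		return target, nil
	}
	if strings.HasPrefix(target, "http://") || strings.HasPrefix(target, "https://") {
		u, err := url.Parse(target)
		if err != nil {
			return "", err
		}
		return u.Host, nil
	}
	if hostHeader == "" {
		return "", fmt.Errorf("no host in request target %q or Host header", target)
	}
	return hostHeader, nil
}

func main() {
	fmt.Println(targetAddr("GET", "http://httpbin.org/get", "httpbin.org"))
	fmt.Println(targetAddr("CONNECT", "httpbin.org:443", "httpbin.org:443"))
}
```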

Let's start with HTTP. The proxy needs the address of the target server, which the client also puts in the Host header, so once the headers are parsed we can read it from that field.

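type Request struct {
	Method  string
	Path    string
	Version string
	Headers http.Header
	Body    []byte
	raw     []byte // original request bytes, forwarded to the server as-is
}

// Host returns the value of the Host header and whether it was present.
func (r Request) Host() (string, bool) {
	host := r.Headers.Get("Host")
	return host, host != ""
}

// Raw returns the original request bytes (used when forwarding the request).
func (r Request) Raw() []byte {
	return r.raw
}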

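func ParseRequest(conn io.Reader) (*Request, error) {
	br := bufio.NewReader(conn)
	// ... omit some code

	// Parse the request headers, one "Name: value" line at a time
	for {
		line, err := br.ReadBytes('\n')
		if err != nil {
			if err == io.EOF {
				break
			}
			// Any other read error means the request cannot be parsed
			return nil, err
		}
		req.raw = append(req.raw, line...)
		line = bytes.TrimSpace(line)
		// An empty line (a bare \r\n) marks the end of the headers
		if len(line) == 0 {
			break
		}
		colon := bytes.IndexByte(line, ':')
		if colon < 0 {
			// Without this check, line[:colon] would panic on a malformed line
			return nil, fmt.Errorf("malformed header line: %q", line)
		}
		// Bytes2Str converts []byte to string
		req.Headers.Add(byteconv.Bytes2Str(bytes.TrimSpace(line[:colon])), byteconv.Bytes2Str(bytes.TrimSpace(line[colon+1:])))
	}

	// ... omit some code
	return req, nil
}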

After obtaining the server address, the proxy establishes a TCP connection to it and forwards the request:

// Default to port 80 when the Host value carries no port
if !strings.Contains(host, ":") {
	host += ":80"
}

server, err := net.Dial("tcp", host)
if err != nil {
	conn.Close()
	log.Println("Dial server failed: ", err)
	return
}

// Forward the original request bytes unchanged
_, err = server.Write(request.Raw())
if err != nil {
	log.Println("Write server failed: ", err)
	conn.Close()
	server.Close()
	return
}
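The strings.Contains(host, ":") check treats an IPv6 literal such as [::1] as if it already had a port, because the address itself contains colons. A sketch of a more robust default-port helper (withDefaultPort is a hypothetical name) built on net.SplitHostPort and net.JoinHostPort:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// withDefaultPort appends port to host unless host already carries one.
// It handles IPv6 literals, e.g. "[::1]" versus "[::1]:8080".
func withDefaultPort(host, port string) string {
	if _, _, err := net.SplitHostPort(host); err == nil {
		return host // already host:port
	}
	// Drop brackets from a bare "[v6]" so JoinHostPort adds them back exactly once
	h := strings.TrimSuffix(strings.TrimPrefix(host, "["), "]")
	return net.JoinHostPort(h, port)
}

func main() {
	for _, h := range []string{"httpbin.org", "httpbin.org:8080", "[::1]", "[::1]:443"} {
		fmt.Println(withDefaultPort(h, "80"))
	}
}
```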

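Finally, relay the server's response back to the client. The simplest way is to call io.Copy in both directions:

tunnel(conn, server)

func tunnel(client net.Conn, server net.Conn) {
	// When either direction finishes, close both connections so the
	// other goroutine is unblocked and neither connection leaks
	closeBoth := func() { client.Close(); server.Close() }
	go func() { io.Copy(server, client); closeBoth() }()
	go func() { io.Copy(client, server); closeBoth() }()
}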

To filter or otherwise modify the response, the proxy has to parse it, which works much like parsing the request. If a response should be blocked, the proxy does not forward it and instead returns an error status to the client, such as 403 Forbidden.

HTTPS works almost the same way. Because HTTPS encrypts the data with SSL/TLS, the proxy cannot parse the actual request, but parsing the CONNECT request, which names the target host and port, is all we need.

After receiving a CONNECT request, the proxy should first reply with a 2xx status to indicate that the connection is established. It then relays the client's data to the server unchanged, and the server's data back to the client unchanged. The proxy does not need to understand this data; even if it wanted to, it could not decrypt it.

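// Handle CONNECT after dialing the server but before forwarding
// request.Raw(): the CONNECT request itself must not be sent to the server
if request.Method == "CONNECT" {
	conn.Write([]byte("HTTP/1.1 200 OK\r\n\r\n"))
	tunnel(conn, server)
	return
}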

Test

Install the Proxy SwitchyOmega browser extension (it is available for both Edge and Chrome).

Open the extension and configure a proxy profile that uses the HTTP protocol and points to the address and port your proxy server listens on.

Visit a website such as http://httpbin.org. If the page loads, the proxy server is working.

The complete code is available on GitHub; the original article was published on WeChat.


Added by ununium on Tue, 07 Dec 2021 08:06:39 +0200