Go DNS resolution process and tuning

Background

Using Zipkin, some colleagues found that DNS resolution occasionally took 40 ms (it was expected to stay within 1 ms) and suspected that the alpine image was the cause.

My first reaction was that the alpine image is unlikely to be the problem: it is used so widely that such a problem would have been fixed long ago. The following is an analysis of the issue.

DNS resolution process in Go

First, let's look at how Go performs DNS resolution by going straight to the code. The key function is goLookupIPCNAMEOrder:

// src/net/dnsclient_unix.go
func (r *Resolver) goLookupIPCNAMEOrder(ctx context.Context, network, name string, order hostLookupOrder) (addrs []IPAddr, cname dnsmessage.Name, err error) {
   // Validation code omitted

   // Re-read /etc/resolv.conf if needed; to avoid frequent reads, the file is checked at most once every 5 seconds
   resolvConf.tryUpdate("/etc/resolv.conf")

  // ...

  // Both IPv4 (A) and IPv6 (AAAA) are queried by default
   qtypes := []dnsmessage.Type{dnsmessage.TypeA, dnsmessage.TypeAAAA}
  // [key] Depending on the network: a network name ending in '4' queries only IPv4, one ending in '6' only IPv6
   switch ipVersion(network) {
   case '4':
      qtypes = []dnsmessage.Type{dnsmessage.TypeA}
   case '6':
      qtypes = []dnsmessage.Type{dnsmessage.TypeAAAA}
   }

   // ...

   // If single-request or single-request-reopen is set in /etc/resolv.conf, the queries are sent serially; otherwise they are sent in parallel
   if conf.singleRequest {
      queryFn = func(fqdn string, qtype dnsmessage.Type) {}
      responseFn = func(fqdn string, qtype dnsmessage.Type) result {
         dnsWaitGroup.Add(1)
         defer dnsWaitGroup.Done()
         p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
         return result{p, server, err}
      }
   } else {
      queryFn = func(fqdn string, qtype dnsmessage.Type) {
         dnsWaitGroup.Add(1)
         // Note the go keyword: without single-request, the query types are resolved concurrently
         go func(qtype dnsmessage.Type) {
            p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
            lane <- result{p, server, err}
            dnsWaitGroup.Done()
         }(qtype)
      }
      responseFn = func(fqdn string, qtype dnsmessage.Type) result {
         return <-lane
      }
   }

  // The following code is also important
   var lastErr error
  // len(nameList) = len(search domains) + 1
  // Iterate over the candidate names (not the nameservers). With the following resolv.conf, which has three search domains, nameList has length 4:
  // nameserver 169.254.20.10
  // nameserver 172.16.0.10
  // search meipian-test.svc.cluster.local svc.cluster.local cluster.local
   for _, fqdn := range conf.nameList(name) {
      // ...
      // Iterate over the query types, here A (IPv4) and AAAA (IPv6)
      for _, qtype := range qtypes {
        // ....
      }
   }
   // ...
   return addrs, cname, nil
}

From the above code, we can draw the following conclusions:

Go implements DNS resolution itself

DNS resolution has nothing to do with whether the image is alpine, because Go implements DNS resolution itself and does not rely on the system's C library resolver. The build tags of the file also indicate this:

//go:build aix || darwin || dragonfly || freebsd || linux || netbsd || openbsd || solaris
// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris

The built-in resolver reads the configuration file

A Go program reads and parses /etc/resolv.conf itself and implements the common standard options, including single-request and single-request-reopen.

// src/net/dnsconfig_unix.go
case s == "single-request" || s == "single-request-reopen":
  // Linux option:
  // http://man7.org/linux/man-pages/man5/resolv.conf.5.html
  // "By default, glibc performs IPv4 and IPv6 lookups in parallel [...]
  //  This option disables the behavior and makes glibc
  //  perform the IPv6 and IPv4 requests sequentially."
  conf.singleRequest = true

The single-request option takes effect

If the single-request option is set, the DNS queries are sent serially:

if conf.singleRequest {
        queryFn = func(fqdn string, qtype dnsmessage.Type) {}
        responseFn = func(fqdn string, qtype dnsmessage.Type) result {
            dnsWaitGroup.Add(1)
            defer dnsWaitGroup.Done()
            p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
            return result{p, server, err}
        }
    }

If the single-request option is not set, the DNS queries are sent in parallel (in practice it is a combination of parallel and serial, as explained later):

if conf.singleRequest {
        // ...
    } else {
        queryFn = func(fqdn string, qtype dnsmessage.Type) {
            dnsWaitGroup.Add(1)
            go func(qtype dnsmessage.Type) {
                p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
                lane <- result{p, server, err}
                dnsWaitGroup.Done()
            }(qtype)
        }
        responseFn = func(fqdn string, qtype dnsmessage.Type) result {
            return <-lane
        }
    }
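The two branches can be illustrated with a small standalone sketch. This is not the standard library code: `fakeQuery`, `lookup` and the `result` fields are hypothetical stand-ins, and `fakeQuery` only sleeps instead of sending a real DNS query.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result is a simplified, hypothetical version of the result type in the excerpt.
type result struct {
	qtype string
	addr  string
}

// fakeQuery stands in for tryOneName: it sleeps briefly and returns a canned answer.
func fakeQuery(fqdn, qtype string) result {
	time.Sleep(20 * time.Millisecond)
	return result{qtype: qtype, addr: fqdn + "/" + qtype}
}

// lookup issues one query per type for a single name.
// With singleRequest the queries run one after another; otherwise each query
// runs in its own goroutine and the results are collected from a channel.
func lookup(fqdn string, qtypes []string, singleRequest bool) []result {
	out := make([]result, 0, len(qtypes))
	if singleRequest {
		for _, qt := range qtypes {
			out = append(out, fakeQuery(fqdn, qt))
		}
		return out
	}
	lane := make(chan result, len(qtypes))
	var wg sync.WaitGroup
	for _, qt := range qtypes {
		wg.Add(1)
		go func(qt string) {
			defer wg.Done()
			lane <- fakeQuery(fqdn, qt)
		}(qt)
	}
	wg.Wait()
	close(lane)
	for r := range lane {
		out = append(out, r)
	}
	return out
}

func main() {
	qtypes := []string{"A", "AAAA"}
	for _, single := range []bool{true, false} {
		start := time.Now()
		rs := lookup("example.com.", qtypes, single)
		fmt.Printf("singleRequest=%v results=%d elapsed=%v\n",
			single, len(rs), time.Since(start).Round(10*time.Millisecond))
	}
}
```

On a typical machine the serial run should take roughly twice as long as the parallel one, since the two simulated queries no longer overlap.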

The resolution process depends on the configuration

The DNS resolution strategy and the number of queries depend heavily on the ndots, search domain and nameserver settings:

  1. By default, a DNS lookup queries both IPv4 and IPv6 addresses (whether or not the container supports IPv6)

  2. ndots and the name being resolved decide whether the search domains are tried first. Put simply, if the name you request contains fewer dots than the configured ndots, the search domains are appended and tried first. For example, with the following configuration:

    search meipian-test.svc.cluster.local svc.cluster.local cluster.local
    options ndots:3

    Suppose the name being resolved is www.baidu.com and ndots is 3. The name contains 2 dots, fewer than ndots, so the search domains are appended first. The resolution order is:

      www.baidu.com.meipian-test.svc.cluster.local.
      www.baidu.com.svc.cluster.local.
      www.baidu.com.cluster.local.
      www.baidu.com.

    If ndots in the configuration file is 2, the name already has enough dots, so it is tried as given first. The resolution order is:

      www.baidu.com.
      www.baidu.com.meipian-test.svc.cluster.local.
      www.baidu.com.svc.cluster.local.
      www.baidu.com.cluster.local.

  3. The search domains and nameservers determine the maximum number of DNS queries: (number of search domains + 1) × number of query types × number of nameservers, not counting retries. For example, with the following configuration:

    nameserver 169.254.20.10
    nameserver 172.16.0.10
    search meipian-test.svc.cluster.local svc.cluster.local cluster.local
    options ndots:3

    When we resolve the name www.baidu.com, the queries are as follows:

    Name to resolve                                 Query type   DNS server
    www.baidu.com.meipian-test.svc.cluster.local.   A            169.254.20.10
    www.baidu.com.meipian-test.svc.cluster.local.   A            172.16.0.10
    www.baidu.com.meipian-test.svc.cluster.local.   AAAA         169.254.20.10
    www.baidu.com.meipian-test.svc.cluster.local.   AAAA         172.16.0.10
    www.baidu.com.svc.cluster.local.                A            169.254.20.10
    www.baidu.com.svc.cluster.local.                A            172.16.0.10
    www.baidu.com.svc.cluster.local.                AAAA         169.254.20.10
    www.baidu.com.svc.cluster.local.                AAAA         172.16.0.10
    www.baidu.com.cluster.local.                    A            169.254.20.10
    www.baidu.com.cluster.local.                    A            172.16.0.10
    www.baidu.com.cluster.local.                    AAAA         169.254.20.10
    www.baidu.com.cluster.local.                    AAAA         172.16.0.10
    www.baidu.com.                                  A            169.254.20.10
    www.baidu.com.                                  A            172.16.0.10
    www.baidu.com.                                  AAAA         169.254.20.10
    www.baidu.com.                                  AAAA         172.16.0.10

    That is 16 queries in total, which looks alarming. Of course, that many requests are sent only in the worst case, for example when every query fails or times out.

    ⚠️ How are serial and parallel requests combined?

    Parallel means that the different query types (A and AAAA) for the same name are sent to the same DNS server in parallel; different names are still resolved serially.

    On a timeline, the A and AAAA queries for one name overlap, and the queries for the next name start only after both have returned.

This is the worst case. In practice, the lookup returns as soon as one name resolves successfully.
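The ndots ordering and the worst-case count described above can be sketched as follows. This is a simplified illustration, not the standard library code: `candidateNames` and `maxQueries` are hypothetical helpers that ignore name-length limits and retries.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the fully qualified names to try, in order.
// Simplified sketch: length limits and other corner cases are ignored.
func candidateNames(name string, search []string, ndots int) []string {
	// A name with a trailing dot is already fully qualified.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	searched := make([]string, 0, len(search))
	for _, suffix := range search {
		searched = append(searched, name+"."+suffix+".")
	}
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as given first, then the search domains.
		return append([]string{name + "."}, searched...)
	}
	// Too few dots: try the search domains first, then the name as given.
	return append(searched, name+".")
}

// maxQueries is the worst-case query count: every candidate name, every
// query type, every nameserver (retries are not included).
func maxQueries(names, qtypes, nameservers int) int {
	return names * qtypes * nameservers
}

func main() {
	search := []string{"meipian-test.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, ndots := range []int{3, 2} {
		fmt.Printf("ndots=%d\n", ndots)
		for _, n := range candidateNames("www.baidu.com", search, ndots) {
			fmt.Println("  " + n)
		}
	}
	// 4 candidate names, 2 query types (A and AAAA), 2 nameservers.
	fmt.Println("worst case:", maxQueries(len(search)+1, 2, 2))
}
```

With ndots set to 3 the bare name comes last; with ndots set to 2 it comes first, which is why lowering ndots can save several queries for external domain names.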

Default values of the built-in resolver's parameters

ndots:    1,
timeout:  5 * time.Second, // DNS query timeout is 5 seconds, which is rather long
attempts: 2, // each query is attempted at most twice
defaultNS   = []string{"127.0.0.1:53", "[::1]:53"} // default DNS servers
search: os.Hostname // default search domain, derived from the host name

Note: it is recommended to set the timeout option in resolv.conf explicitly, with a smaller value. DNS queries go over UDP by default, which is unreliable, so a lost packet means waiting the full 5 s.
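Besides the resolv.conf setting, a caller can also bound a single lookup with a context deadline. The sketch below is one possible approach, not something the original setup used: `lookupWithTimeout` is a hypothetical helper built on `net.DefaultResolver.LookupIPAddr`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupWithTimeout resolves host but gives up once timeout has elapsed,
// whatever timeout is configured in resolv.conf.
func lookupWithTimeout(host string, timeout time.Duration) ([]net.IPAddr, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return net.DefaultResolver.LookupIPAddr(ctx, host)
}

func main() {
	// An IP literal is returned as-is, so this call needs no network access.
	addrs, err := lookupWithTimeout("127.0.0.1", 500*time.Millisecond)
	fmt.Println(addrs, err)
}
```

A context deadline limits how long the caller waits, but it does not change how often or how quickly the resolver retries, so it complements the resolv.conf option rather than replacing it.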

DNS resolution strategy

It was said above that Go uses its built-in resolver, but that is not true in all cases.

Two resolvers

Go has two ways to resolve domain names: the built-in pure Go resolver and the cgo-based system resolver.

// src/net/cgo_stub.go
//go:build !cgo || netgo
// +build !cgo netgo
func init() { netGo = true }

// src/net/conf_netcgo.go
//go:build netcgo
// +build netcgo
func init() { netCgo = true }

By default the built-in resolver is used. To choose a resolver, either set the netgo or netcgo build tag at build time or set the GODEBUG environment variable at run time:

export GODEBUG=netdns=go    # force pure Go resolver
export GODEBUG=netdns=cgo   # force cgo resolver
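The choice can also be made in code for a single resolver through the `PreferGo` field of `net.Resolver`. The sketch below assumes that a per-resolver choice is what you want; `newGoResolver` and `resolveWithGo` are hypothetical helper names.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// newGoResolver returns a resolver that prefers Go's built-in DNS client
// for its own lookups, without changing the process-wide default.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

// resolveWithGo looks up host using the resolver above.
func resolveWithGo(host string) ([]string, error) {
	return newGoResolver().LookupHost(context.Background(), host)
}

func main() {
	// An IP literal is returned unchanged, so this call needs no network access.
	addrs, err := resolveWithGo("127.0.0.1")
	fmt.Println(addrs, err)
}
```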

Resolution strategy of the built-in resolver

When GOOS=linux, hostLookupFilesDNS is typically used, that is, the hosts file is consulted before DNS (Go 1.17.5).

const (
    // hostLookupCgo means defer to cgo.
    hostLookupCgo      hostLookupOrder = iota
    hostLookupFilesDNS                 // files first
    hostLookupDNSFiles                 // dns first
    hostLookupFiles                    // only files
    hostLookupDNS                      // only DNS
)

var lookupOrderName = map[hostLookupOrder]string{
    hostLookupCgo:      "cgo",
    hostLookupFilesDNS: "files,dns",
    hostLookupDNSFiles: "dns,files",
    hostLookupFiles:    "files",
    hostLookupDNS:      "dns",
}

The resolution strategy differs slightly between operating systems. For example, the android platform is forced to use cgo:

// src/net/conf.go

fallbackOrder := hostLookupCgo
// ...
if c.forceCgoLookupHost || c.resolv.unknownOpt || c.goos == "android" {
        return fallbackOrder
    }

Disable IPv6 resolution

Before Go 1.17 there was no way to disable IPv6 resolution. Since 1.17, Go provides a way to do so:

// By default, both IPv4 and IPv6 are resolved
qtypes := []dnsmessage.Type{dnsmessage.TypeA, dnsmessage.TypeAAAA}

// Depending on the network, you can only parse ipv4 or ipv6
switch ipVersion(network) {
case '4':
    qtypes = []dnsmessage.Type{dnsmessage.TypeA}
case '6':
    qtypes = []dnsmessage.Type{dnsmessage.TypeAAAA}
}

// ipVersion returns the provided network's IP version: '4', '6' or 0
// if network does not end in a '4' or '6' byte.
func ipVersion(network string) byte {
    if network == "" {
        return 0
    }
    n := network[len(network)-1]
    if n != '4' && n != '6' {
        n = 0
    }
    return n
}

Disabling IPv6 resolution is therefore easy: we only need to specify the network type when establishing a connection. Taking HTTP as an example, override the DialContext method of the Transport and replace the original network (tcp by default) with tcp4.

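// zeroDialer is a net.Dialer with default settings
var zeroDialer net.Dialer

client := &http.Client{
    Transport: &http.Transport{
        // ...
        DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
            // Ignore the requested network and force tcp4, so only A records are queried
            return zeroDialer.DialContext(ctx, "tcp4", addr)
        },
    },
}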
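The same network-suffix rule applies to direct lookups. As a minimal sketch, assuming you call the resolver yourself rather than going through an HTTP client, passing "ip4" to `LookupIP` restricts the result to IPv4; `lookupIPv4` is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// lookupIPv4 resolves host to IPv4 addresses only. The "ip4" network ends in
// '4', so the built-in resolver queries A records and skips AAAA.
func lookupIPv4(host string) ([]net.IP, error) {
	return net.DefaultResolver.LookupIP(context.Background(), "ip4", host)
}

func main() {
	// An IP literal is returned directly, so this call needs no network access.
	ips, err := lookupIPv4("127.0.0.1")
	fmt.Println(ips, err)
}
```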

Summary

  1. Go uses its built-in DNS resolver by default; it does not depend on the operating system's resolver and is unrelated to the base image
  2. Go's built-in resolver reads /etc/resolv.conf and implements the common standard options. Manual changes to the file take effect within about 5 seconds
  3. Since Go 1.17, IPv6 resolution can be disabled
  4. By default, the built-in resolver combines parallel and serial queries
    • Different query types for the same name are sent in parallel
    • Different names are resolved serially

Optimization suggestions

  1. Set ndots to an appropriate value

    When dnsPolicy in k8s is ClusterFirst, ndots defaults to 5.

    • If microservices call each other by service name, no change is needed (the name resolves once a search domain is appended)
    • If microservices call each other by domain name (or the name cannot be resolved after appending the search domains), set ndots to a value small enough that the original name is tried first and the search domains afterwards
  2. Set timeout to an appropriate value

    Go defaults to 5 s. Because UDP is unreliable, once a packet is lost the program waits until the timeout expires

  3. Disable IPv6 resolution and enable single-request

    For Go's built-in resolver, single-request and single-request-reopen mean the same thing: they decide whether the different query types (A and AAAA) are sent concurrently or serially. Parallel is the default. With IPv6 resolution disabled there is nothing to resolve concurrently, so enabling single-request is recommended
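Put together, a resolv.conf following these suggestions might look like the fragment below. The nameserver and search lines are taken from the earlier example; the ndots and timeout values are illustrative assumptions, not recommendations measured in this article, and should be adjusted to your own cluster.

```
# /etc/resolv.conf (illustrative values only)
nameserver 169.254.20.10
nameserver 172.16.0.10
search meipian-test.svc.cluster.local svc.cluster.local cluster.local
options ndots:2 timeout:1 attempts:2 single-request
```

Here timeout is expressed in seconds, and attempts is left at its default of 2.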

Optimization result

DNS resolution now sends only the A record queries that are actually needed, and the world is suddenly quiet.

Keywords: Go

Added by supratwinturbo on Wed, 12 Jan 2022 14:15:42 +0200