Source code analysis of Golang math/rand & pit avoidance Guide

Article catalog

1. Preface

Go version is go 1.17.

go version go1.17 darwin/amd64

This paper takes type rand struct as the starting point to see the implementation principle of Go pseudo-random number.

// A Rand is a source of random numbers.
type Rand struct {
	src Source
	s64 Source64 // non-nil if src is source64

	// readVal contains remainder of 63-bit integer used for bytes
	// generation during most recent Read call.
	// It is saved so next Read call can start where the previous
	// one finished.
	readVal int64
	// readPos indicates the number of low-order bytes of readVal
	// that are still valid.
	readPos int8
}

2. Analysis

If the seed of pseudo-random number is the same every time, the generated random sequence is also the same. Let's create a random number generator by giving different seeds.

r := rand.New(rand.NewSource(time.Now().UnixNano()))

r.Intn(62)	// Generate an integer in [0, n)

Let's take rand.Intn() as an example to see the implementation of random numbers.

// Intn returns, as an int, a non-negative pseudo-random number in the half-open interval [0,n).
// It panics if n <= 0.
func (r *Rand) Intn(n int) int {
	if n <= 0 {
		panic("invalid argument to Intn")
	}
	if n <= 1<<31-1 {
		return int(r.Int31n(int32(n)))
	}
	return int(r.Int63n(int64(n)))
}

I passed in less than 62, so I called r.Int31n().

// Int31n returns, as an int32, a non-negative pseudo-random number in the half-open interval [0,n).
// It panics if n <= 0.
func (r *Rand) Int31n(n int32) int32 {
	if n <= 0 {
		panic("invalid argument to Int31n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int31() & (n - 1)
	}
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

Because our n=62 is not a power of 2, we follow the following logic. The max operation needs to understand its function.

max := int32((1 << 31) - 1 - (1<<31)%uint32(n))

max is to remove the part in the int32 range [0, (1 < < 31) - 1] where the last module cannot cover [0, n), so as to ensure that the probability of occurrence of each integer in [0, n) is the same. See the following specific values.

var n int32 = 62
tail := (1<<31) % uint32(n)
max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
fmt.Println(tail)		// 2
fmt.Println(max)		// 2147483645
fmt.Println(max % n)	// 61

Let's look at the function r.Int31() that really produces random numbers.

// Int31 returns a non-negative pseudo-random 31-bit integer as an int32.
func (r *Rand) Int31() int32 { return int32(r.Int63() >> 32) }

It also calls r.Int63(), taking the upper 31 bits as the random value of int32.

// Int31 returns a non-negative pseudo-random 31-bit integer as an int32.
func (r *Rand) Int31() int32 { return int32(r.Int63() >> 32) }

It also calls r.src.Int63(). Let's first look at the definition of type Source interface.

// A Source represents a source of uniformly-distributed
// pseudo-random int64 values in the range [0, 1<<63).
type Source interface {
	Int63() int64
	Seed(seed int64)
}

When we initialize Rand, we create it through rand.New(rand.NewSource(seed)). Let's take a look at the implementation of rand.New().

// New returns a new Rand that uses random values from src
// to generate other random values.
func New(src Source) *Rand {
	s64, _ := src.(Source64)
	return &Rand{src: src, s64: s64}
}

It can be seen that Rand uses the Source passed in by rand.NewSource(). Take a look at the implementation of rand.NewSource().

// NewSource returns a new pseudo-random Source seeded with the given value.
// Unlike the default Source used by top-level functions, this source is not
// safe for concurrent use by multiple goroutines.
func NewSource(seed int64) Source {
	var rng rngSource
	rng.Seed(seed)
	return &rng
}

It can be seen that the actual type of Source is rngSource, which implements the interface Source. Its definition is as follows:

type rngSource struct {
	tap  int           // index into vec
	feed int           // index into vec
	vec  [rngLen]int64 // current feedback register
}

Let's look at the implementation of rngSource.Int63().

// Int63 returns a non-negative pseudo-random 63-bit integer as an int64.
func (rng *rngSource) Int63() int64 {
	return int64(rng.Uint64() & rngMask)
}

The definition of rngMask is as follows, which represents the maximum value of Int64 and serves as a mask.

rngMax   = 1 << 63
rngMask  = rngMax - 1

So far, we have found two core functions for generating random numbers. One is the function rngSource.Seed() that initializes the array rngSource.vec according to the seed, and the other is rngSource.Uint64() that takes random numbers from the array rngSource.vec.

3. Core function

Let's look at the real generating function of random numbers rngSource.Uint64().

// Uint64 returns a non-negative pseudo-random 64-bit integer as an uint64.
func (rng *rngSource) Uint64() uint64 {
	rng.tap--
	if rng.tap < 0 {
		rng.tap += rngLen
	}

	rng.feed--
	if rng.feed < 0 {
		rng.feed += rngLen
	}

	x := rng.vec[rng.feed] + rng.vec[rng.tap]
	rng.vec[rng.feed] = x
	return uint64(x)
}

In fact, we are calling other functions such as Intn(), Int31n(), Int63(), Int63n(), and finally the function rngSource.Uint64() is called It can be seen that each call is to use rng.feed and rng.tap to get the result of adding two values from rng.vec and return it. At the same time, the result is put back into rng.vec. The purpose of this is obvious. It is to enrich the random number, rather than limited to the values in the rng.vec array.

In addition, the initialization of rng.tap, rng.feed and rng.vec is completed in the function rngSource.Seed().

// Seed uses the provided seed value to initialize the generator to a deterministic state.
func (rng *rngSource) Seed(seed int64) {
	rng.tap = 0
	rng.feed = rngLen - rngTap

	seed = seed % int32max
	if seed < 0 {
		seed += int32max
	}
	if seed == 0 {
		seed = 89482311
	}

	x := int32(seed)
	for i := -20; i < rngLen; i++ {
		x = seedrand(x)
		if i >= 0 {
			var u int64
			u = int64(x) << 40
			x = seedrand(x)
			u ^= int64(x) << 20
			x = seedrand(x)
			u ^= int64(x)
			u ^= rngCooked[i]
			rng.vec[i] = u
		}
	}
}

Related constants are defined as follows.

const (
	rngLen   = 607
	rngTap   = 273
	rngMax   = 1 << 63
	rngMask  = rngMax - 1
	int32max = (1 << 31) - 1
)

The function seedrand() also needs attention, which is to complete the transformation of the seed. This also leads to the same seed. The final value set in rng.vec is the same, and the same value obtained through rngSource.Uint64().

// seed rng x[n+1] = 48271 * x[n] mod (2**31 - 1)
func seedrand(x int32) int32 {
	const (
		A = 48271
		Q = 44488
		R = 3399
	)

	hi := x / Q
	lo := x % Q
	x = A*lo - R*hi
	if x < 0 {
		x += int32max
	}
	return x
}

So far, we understand the generation process of random numbers of math/rand.

4. Pits to be avoided

Through the above analysis of math/rand, we should know the pits to avoid when using.

(1) For the same seed, the result of each run is the same. Because the random number is taken from the rng.vec array, which is generated according to the seed, the rng.vec array generated by the same seed is the same.

(2) The results of each run may be the same for different seeds, because there will be a modulo operation when generating rng.vec array according to seeds, and the results after modulo may be the same, resulting in the same rng.vec array.

(3) Rand initialized by rand.New is not concurrency safe. Because every time rng.feed is used, rng.tap takes a random value from rng.vec, it will put the random value back into rng.vec. If you want concurrency safety, you can use the global random number generator rand.globalRand.

(4) For different seeds, the collision probability of random sequence is higher than the product of single collision probability Birthday question.

For example, I want to randomly obtain an invitation code with a length of 6 from the alphanumeric set (62 characters), and the seed uses the user ID. if 100W invitation codes are generated, assuming that none of the first 100W are repeated, the probability of the next repetition is ((1/62)^6 * 100W) ≈ 1/5.6W, and the conflict rate has reached one in ten thousandth, which is much greater than the imagined (1 / 62) ^ 6.

reference

CSDN. Fully master Go math/rand CSDN. Record pits encountered using Golang math/rand random numbers

Added by madcat on Wed, 08 Dec 2021 10:28:58 +0200