Bayesian optimizer

This article is based on the open-source Bayesian Optimization project on GitHub. The project's description already documents its usage in detail in English; here the material is organized and simplified, and other literature is referenced to record the mathematical functions used in the project and to explain some of the terms in the paper.

Principle

The article in reference 1 describes the core idea and workflow of the project in detail, including the prior function and the acquisition function used in the process. Both the blog and the GitHub project mention the terms "exploration" and "exploitation", which can be understood as a trade-off between exploring uncertain regions and exploiting the current best strategy. To explore is to search in unknown areas; to exploit is to make use of areas that are already known. Exploration searches over the whole range to avoid falling into a local optimum; exploitation searches near the current optimum to find a better solution. Namely:

Exploration: simply put, try to select points far away from the known points as the reference for the next iteration, that is, explore unknown areas as much as possible, so the sampled points are spread as evenly as possible.
Exploitation: simply put, try to select points close to the known points as the reference for the next iteration, that is, mine the area around the known points as much as possible. The sampled points will cluster in a dense region, which makes it easy to get stuck in a local maximum (a concrete numerical illustration follows below).
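
As a rough illustration (plain NumPy with made-up mean and variance values, not part of the library), a UCB-style score mean + kappa * std makes the trade-off explicit: a small kappa favours exploitation (the point with the highest predicted mean), while a large kappa favours exploration (the point with the highest uncertainty).

import numpy as np

# Hypothetical Gaussian process predictions at five candidate points
mean = np.array([0.8, 0.5, 0.2, 0.6, 0.1])   # predicted objective value
std = np.array([0.05, 0.3, 0.9, 0.1, 1.2])   # predictive uncertainty

def ucb(mean, std, kappa):
    # Upper confidence bound: kappa weights the uncertainty term
    return mean + kappa * std

print(np.argmax(ucb(mean, std, kappa=0.1)))  # 0 -> exploitation: highest mean wins
print(np.argmax(ucb(mean, std, kappa=5.0)))  # 4 -> exploration: highest uncertainty wins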

def BayesianOptimization(target, x, Y):
    if already_initialized:
        # xold: the known point whose objective value is the largest so far
        xold = argmax of target over all known points
        if target(xold) > Y:
            return xold, target(xold)   # lucky: finished before the iteration even starts
    else:
        randomly initialize a few points
    while target(xnew) < Y:
        # PF (Gaussian process regression): estimate the mean and variance of the
        # objective at the unknown points (thousands of candidates within the
        # predefined range of the independent variables)
        # AC (EI, PI or UCB): pick the point the optimizer guesses is the maximum,
        # usually the maximum of the acquisition function -> xnew
        evaluate target(xnew) and add (xnew, target(xnew)) to the known points
    return xnew, target(xnew)

Here x is the value range of the independent variables and Y is an acceptable value of the black-box objective function. The two core components are the prior function (PF) and the acquisition function (AC); the acquisition function is also called the utility function. The prior function (Gaussian process regression) computes the mean and variance of the function value at each candidate point, and the acquisition function is built from that mean and variance to decide which point is sampled in the current iteration.

The specific Gaussian process regression equations and acquisition functions can be found in the second chapter of the reference linked here.

The articles mentioned above are cited here to summarize the main process; for the mathematical details of the equations, please refer to the original authors directly.

The idea of the algorithm is to first generate an initial set of candidate solutions, then use these points to find the next point that may be an extremum, add that point to the set, and repeat until the iteration ends. Finally, the extremum among the evaluated points is taken as the solution of the problem.

The key problem is how to determine the next search point from the points already searched. Bayesian optimization estimates the mean and variance (i.e. the fluctuation range) of the true objective function value from the function values at the searched points, as shown in the figure. The red curve in the figure is the estimated objective function value, that is, the mean of the objective function value at each point. There are three searched points, drawn as solid black dots. The region between the two dotted lines is the range over which the function value at each point may vary: it fluctuates in an interval centred on the mean (the red curve), with a width proportional to the standard deviation. At a searched point the red curve passes through the point and the variance is smallest; far from the searched points the variance is larger. This matches intuition: the estimate of the function value is less reliable far from the sampled points.

From the mean and variance, the acquisition function can be constructed. It estimates how likely each point is to be the extremum of the objective, i.e. how worthwhile each point is to search. The maximum of the acquisition function is taken as the next search point, as shown in the figure below, where the point marked by the rectangle is the maximum of the acquisition function and therefore the next search point.

The core of the algorithm consists of two parts: modelling the objective function, that is, calculating the mean and variance of the function value at each point, which is usually done by Gaussian process regression; and constructing the acquisition function, which determines which point to sample in the current iteration.
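
As a minimal sketch of these two parts (a toy 1-D example using scikit-learn's GaussianProcessRegressor and a UCB acquisition, not the library's internal implementation):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy 1-D objective and a few points assumed to have been sampled already
f = lambda x: -(x - 2.0) ** 2 + 4.0
x_known = np.array([[0.0], [1.0], [3.5]])
y_known = f(x_known).ravel()

# Part 1: model the objective -> mean and standard deviation at candidate points
gp = GaussianProcessRegressor().fit(x_known, y_known)
x_candidates = np.linspace(-1, 5, 200).reshape(-1, 1)
mu, sigma = gp.predict(x_candidates, return_std=True)

# Part 2: build an acquisition function (UCB here) and take its maximum as the next sample
ucb = mu + 2.5 * sigma
x_next = x_candidates[np.argmax(ucb)]
print("next point to sample:", x_next)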

 

With this summary in mind, it is worth looking again at the figure of steps 1-9 in the reference article.

Usage

Getting started

What follows is largely adapted from the tutorial on GitHub.

To use Bayesian optimization, only three parts are needed: the parameters to optimize, the objective function, and the optimizer.

Objective function: the example uses a function of two variables; in practice it can also be a machine learning training run.

from bayes_opt import BayesianOptimization

def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1

Optimization parameters: in machine learning these are usually the hyperparameters; here they are x and y. The value interval of each parameter must also be set.

The optimizer is set up by initializing the BayesianOptimization class directly:

pbounds = {'x': (2, 4), 'y': (-3, 3)}
optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds=pbounds,
    verbose=2, # verbose = 1 prints only when a maximum is observed, verbose = 0 is silent
    random_state=1,
)

Here f=black_box_function is relatively simple and just points to a function. For machine learning, the function defined here can wrap a complete training pipeline: calling it launches the run and finally returns the corresponding result.
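
For instance, a sketch of what such a machine learning objective could look like (assuming scikit-learn and pre-existing training arrays X_train and y_train, which are hypothetical and not defined here):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rf_cv_score(n_estimators, max_depth):
    # bayes_opt passes floats, so cast where the model expects integers
    model = RandomForestClassifier(
        n_estimators=int(n_estimators),
        max_depth=int(max_depth),
        random_state=0,
    )
    # X_train / y_train are assumed to exist; return the value to maximize
    return cross_val_score(model, X_train, y_train, cv=3).mean()

# This function would be passed as f=rf_cv_score, with pbounds keyed by the
# same parameter names, e.g. {"n_estimators": (10, 200), "max_depth": (2, 20)}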

Next, we can carry out iterative optimization

optimizer.maximize(
    init_points=2,
    n_iter=3,
)

This involves two parameters, init_points and n_iter.

n_iter: the number of Bayesian optimization iterations to perform.

init_points: the number of randomly sampled points evaluated first. As mentioned in the theory above, each Bayesian step builds on the points evaluated so far; the random initial points are the exploration part, broadening the chance of finding the global optimum instead of relying only on previously observed points.

Finally, print the best result found over the whole run (the last few iterations are not necessarily the optimal solution):

print(optimizer.max)

To print the results of every iteration:

for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))

You can reset the parameter bounds with the following command:

optimizer.set_bounds(new_bounds={"x": (-2, 3)})
optimizer.maximize(
    init_points=0,
    n_iter=5,
)

Another case is guiding the optimization: if we suspect that the maximum lies near certain parameter values, we can probe them explicitly:

optimizer.probe(
    params={"x": 0.5, "y": 0.7},
    lazy=True,
)

lazy=True means the evaluation is delayed: the specified parameter values are queued in the optimizer and will be evaluated the next time maximize is called.

Multiple probes can be queued in succession:

optimizer.probe(
    params=[-0.3, 0.1],
    lazy=True,
)

Then call maximize:

optimizer.maximize(init_points=0, n_iter=0)

Result:

|   iter    |  target   |     x     |     y     |
-------------------------------------------------
|  11       |  0.66     |  0.5      |  0.7      |
|  12       |  0.1      | -0.3      |  0.1      |
=================================================

Saving and loading optimization progress

Saving: an optimizer instance has already been used above, and its running state can be seen by setting the verbose initialization parameter. To also save the progress to a local file, use the following:

from bayes_opt.logger import JSONLogger
from bayes_opt.event import Events

logger = JSONLogger(path="./logs.json")
optimizer.subscribe(Events.OPTIMIZATION_STEP, logger)
optimizer.maximize(
    init_points=2,
    n_iter=3,
)

Note that only data from the current and future iterations will be recorded; observations made before the logger is subscribed are not saved.

Loading: the saved progress data can be loaded back and applied directly to a new optimizer.

from bayes_opt.util import load_logs
new_optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={"x": (-2, 2), "y": (-2, 2)},
    verbose=2,
    random_state=7,
)
print(len(new_optimizer.space))

load_logs(new_optimizer, logs=["./logs.json"]);
new_optimizer.maximize(
    init_points=0,
    n_iter=10,
)

Advanced

The introductory tutorial above already covers most scenarios. The advanced part mainly breaks down what the optimizer does in each step; for example, optimizer.maximize is actually built on the suggest, probe, and register methods.

Based on the theory in references 1 and 2 above, one iteration of the optimizer roughly goes through the following steps: define the objective, initialize the variables (or take the updated variable values), build the acquisition function, determine the next position of the variables, and compute the objective value. In the introductory tutorial a single call to maximize wraps all of these; in fact it can be split into the following steps:

from bayes_opt import BayesianOptimization, UtilityFunction

# Define the objective function
def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=None,  # f is None because we will evaluate the objective manually
    pbounds={'x': (-2, 2), 'y': (-3, 3)},  # variable ranges
    verbose=2,
    random_state=1,
)
# Choose the acquisition function and the exploration/exploitation balance parameters mentioned above
utility = UtilityFunction(kind="ucb", kappa=2.5, xi=0.0)
# Ask the optimizer for the next point to evaluate
next_point_to_probe = optimizer.suggest(utility)
print("Next point to probe is:", next_point_to_probe)
# Evaluate the objective function at that point
target = black_box_function(**next_point_to_probe)
print("Found the target value to be:", target)
# Register the observation with the optimizer
optimizer.register(
    params=next_point_to_probe,
    target=target,
)

The procedure above performs only a single step. To run it in a loop:

for _ in range(5):
    next_point = optimizer.suggest(utility)
    target = black_box_function(**next_point)
    optimizer.register(params=next_point, target=target)
    print(target, next_point)
print(optimizer.max)
