Contents of this chapter:
- How to use bayes_opt for hyperparameter optimization, with a worked case
- How to use HyperOpt for hyperparameter optimization, with a worked case
- How to use Optuna for hyperparameter optimization, with a worked case
HPO library | Pros and cons | Recommendation index |
---|---|---|
bayes_opt | ✅ Bayesian optimization based on Gaussian processes ✅ Suited to parameter spaces made up largely of continuous parameters ⛔ Avoid when the space contains many discrete parameters ⛔ Avoid when computing power / time is scarce | ⭐⭐ |
hyperopt | ✅ Bayesian optimization based on TPE ✅ Supports various efficiency-improving tools ✅ Clear, tidy progress bar and few odd warnings or errors ✅ Can be extended to deep learning ⛔ Gaussian-process-based Bayesian optimization is not supported ⛔ The code has many constraints, is relatively complex, and is less flexible | ⭐⭐⭐⭐ |
optuna | ✅ Implements Bayesian optimization based on a variety of algorithms (some may require combining with other libraries) ✅ The most concise and flexible code ✅ Can be extended to deep learning ⛔ Non-core features are poorly maintained, with odd warnings and errors | ⭐⭐⭐⭐ |
📖 None of the three libraries above supports parallelization or acceleration within a plain Python environment. Most optimization libraries only support parallelization or acceleration through a database backend (such as MongoDB or MySQL), but the libraries above can be deployed on distributed computing platforms.
Some points to note about implementing Bayesian hyperparameter optimization:
- Bayesian optimization requires you to define the objective function, the parameter space, and the optimizer yourself; it is usually not a single off-the-shelf library call;
- Under different Bayesian methods, the rules for defining the objective function, parameter space, and optimizer differ; each library has its own conventions.
Based on these two points, the introduction to each of the three HPO libraries focuses on ① the library's own rules and ② the full workflow of a case;
In the cases below, the estimator being tuned is a random forest.
1. GP optimization based on bayes_opt
📖 bayes_opt is usually used in the following situations:
- If and only if Bayesian optimization based on Gaussian processes must be used;
- When the algorithm's parameter space contains a large number of continuous parameters;
Because bayes_opt handles the parameter space in a rather primitive way, lacks corresponding refinement and monitoring features, and is demanding on computing power, it is often not the first choice for hyperparameter tuning.
📖 bayes_opt features
- Running time (smaller is better): bayes_opt < random grid search < grid search
- Model performance (larger is better): bayes_opt > random grid search > grid search
- The optimization process cannot be reproduced, but the optimization result can be reproduced.
- Relatively inefficient.
In fact, Bayesian optimization had already found the minimum loss by around iteration 170, but because there is no early-stopping mechanism, the model kept iterating for another 130 rounds before stopping. If bayes_opt had an early-stopping mechanism, the actual optimization would likely take even less time.
At the same time, because bayes_opt can only draw floating-point numbers from the parameter space, its search efficiency on random forests is low. Even if values such as [88.89, 88.23...] are drawn in 10 different iterations, they all round to the single candidate 88; bayes_opt cannot tell the difference, so it may spend many observations on effectively identical points. Other Bayesian optimizers are more efficient in this respect.
- It supports flexible modification of the acquisition function and of the Gaussian process itself; see the sketch below and https://github.com/fmfn/BayesianOptimization/blob/master/examples/advanced-tour.ipynb for details.
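As a rough illustration of the last point, the ask-and-tell loop below drives the optimizer manually with a custom acquisition function. The interface varies across bayes_opt versions; this sketch assumes the 1.x UtilityFunction API demonstrated in the advanced-tour notebook linked above, and the one-dimensional objective f is a toy function made up for illustration.

```python
# Sketch (assumes bayes_opt 1.x): drive the optimizer manually with a custom
# acquisition function instead of calling maximize() directly.
from bayes_opt import BayesianOptimization, UtilityFunction

def f(x):
    # toy objective; bayes_opt always maximizes
    return -(x - 2) ** 2

opt = BayesianOptimization(f=None, pbounds={'x': (-5, 5)}, random_state=7)
acq = UtilityFunction(kind='ei', kappa=2.5, xi=0.01)   # expected improvement instead of the default UCB

for _ in range(20):
    next_point = opt.suggest(acq)                      # ask: where should we evaluate next?
    target = f(**next_point)                           # evaluate the objective ourselves
    opt.register(params=next_point, target=target)     # tell: feed the observation back

print(opt.max)
```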
1.1 📖 bayes_opt rules for the objective function
- The input of the objective function must be specific hyperparameters, not the whole hyperparameter space, and certainly not elements other than the data, the algorithm, and other hyperparameters. Therefore, when defining the objective function, the hyperparameters are used as its input arguments.
Example: the arguments in parentheses must be hyperparameters of the estimator: `def bayesopt_objective(n_estimators, max_depth):`
- Hyperparameter inputs can only be floating-point numbers; integers and strings are not supported. Therefore, when a parameter of the algorithm takes a string, that parameter cannot be tuned with bayes_opt; when a parameter takes an integer, its type must be handled inside the objective function.
Example: the hyperparameters in parentheses can only be floats. The random forest parameter criterion takes the string 'gini', so criterion cannot appear in `def bayesopt_objective(n_estimators, max_depth):`
Example: an integer parameter must be cast inside the objective function. The number of trees can only be an integer, so int() is used to convert the sampled float:
`def bayesopt_objective(n_estimators): model = RFR(n_estimators=int(n_estimators))`
- bayes_opt only supports finding the maximum of f(x), not the minimum. Therefore, when the objective function is defined as a loss, its output must be negated (for example, with RMSE the objective function should return the negative RMSE, so that maximizing the negative RMSE minimizes the true RMSE). When the objective function is accuracy or AUC, its output can be kept as is. A minimal sketch of this rule follows.
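A minimal sketch of this rule, assuming a feature matrix X and target y already exist in the session (the full house-price version of this objective appears in 1.4):

```python
# Sketch: a bayes_opt objective that returns a NEGATIVE RMSE, because bayes_opt
# can only maximize. X and y are assumed to exist already.
import numpy as np
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, cross_validate

def bayesopt_objective(n_estimators, max_depth):
    model = RFR(n_estimators=int(n_estimators),   # cast sampled floats to integers
                max_depth=int(max_depth),
                random_state=7, n_jobs=-1)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    result = cross_validate(model, X, y, cv=cv,
                            scoring='neg_root_mean_squared_error', n_jobs=-1)
    # 'neg_root_mean_squared_error' is already negative, so returning its mean
    # lets bayes_opt maximize it, which minimizes the true RMSE.
    return np.mean(result['test_score'])
```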
1.2 📖 bayes_opt rules for the parameter space
- The parameter space must be defined as a dictionary, where each key is a parameter name and each value is the parameter's range;
- Only the lower and upper bounds of each parameter can be given; no step size or similar can be specified, and the interval is closed on both ends;
- All parameters are treated as continuous hyperparameters, so bayes_opt samples arbitrary floating-point numbers from the closed interval as candidates (which is why int() is needed in the objective function).
Because of these rules, the parameter space handed to bayes_opt is naturally larger/denser than in other Bayesian optimization libraries, so more iterations are needed. A minimal sketch of such a space follows.
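A minimal sketch of such a space; the dictionary name and the bounds are illustrative, and the full case in 1.4 defines its own version:

```python
# Sketch: bayes_opt parameter space. Keys are parameter names; values are
# (lower, upper) tuples of a two-sided closed interval. No step size can be
# specified, and every candidate is drawn as a float from the interval.
param_space = {
    'n_estimators': (80, 100),   # drawn as floats such as 88.37 -> int() in the objective
    'max_depth': (15, 25),
}
# opt = BayesianOptimization(bayesopt_objective, param_space, random_state=7)
```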
1.3 📖 Notes on the randomness of bayes_opt
- Randomness cannot be controlled. Even if a random seed is supplied, the optimization process differs on every run; that is, the optimization process cannot be reproduced.
- The optimal hyperparameter results can be reproduced.
Take the best parameter combination and the best score; feeding the best parameter combination back into cross-validation should reproduce the best score.
If the best score cannot be reproduced, there is a problem either with the random-seed settings in the cross-validation or with the iterative process of the optimization algorithm.
1.4 🗣 Case: bayes_opt parameter optimization (house price dataset, Python)
```python
# pip install bayesian-optimization
import numpy as np
import pandas as pd
from bayes_opt import BayesianOptimization
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, cross_validate
```
- 🗣 Define the random forest model and cross-validation, returning the (negative) test-set RMSE
```python
# Objective: random forest + cross-validation, returning the (negative) test-set RMSE
def bayesopt_objective(n_estimators, max_depth, max_features, min_impurity_decrease):
    model = RFR(n_estimators=int(n_estimators)
                , max_depth=int(max_depth)
                , max_features=int(max_features)
                , min_impurity_decrease=min_impurity_decrease
                , random_state=7
                , n_jobs=-1)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    validation_loss = cross_validate(model, X, y
                                     , scoring='neg_root_mean_squared_error'
                                     , cv=cv
                                     , n_jobs=-1
                                     , error_score='raise'  # raise if a fit fails; with np.nan the failed fold would simply score NaN and iteration would continue
                                     )
    return np.mean(validation_loss['test_score'])
```
- 🗣 Custom optimizer
```python
# Custom optimizer
def param_bayes_opt(init_points, n_iter):
    opt = BayesianOptimization(bayesopt_objective
                               , param_grid_simple
                               , random_state=7)
    # Run the optimization
    opt.maximize(init_points=init_points  # number of initial observation points
                 , n_iter=n_iter          # number of Bayesian optimization iterations after the initial points
                 )
    # Retrieve the optimization results
    params_best = opt.max['params']  # best parameter combination
    score_best = opt.max['target']   # best score
    # Print the results
    print("\n", "best params: ", params_best,
          "\n", "best cvscore: ", score_best)
    return params_best, score_best
```
- 🗣 Define a validation function for the best parameters
```python
# Validation function: refit with the best parameters found by bayes_opt and return the (negative) RMSE
def bayes_opt_validation(params_best):
    model = RFR(n_estimators=int(params_best['n_estimators'])
                , max_depth=int(params_best['max_depth'])
                , max_features=int(params_best['max_features'])
                , min_impurity_decrease=params_best['min_impurity_decrease']  # no int() here, to match the objective exactly
                , random_state=7
                , n_jobs=-1)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    validation_loss = cross_validate(model, X, y
                                     , scoring='neg_root_mean_squared_error'
                                     , cv=cv
                                     , n_jobs=-1)
    return np.mean(validation_loss['test_score'])
```
- 🗣 Run the optimization
```python
data = pd.read_csv(r'C:\Users\EDZ\test\ML-2 courseware\Lesson 9.Stochastic forest model\datasets\House Price\train_encode.csv', index_col=0)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

param_grid_simple = {'n_estimators': (80, 100)
                     , 'max_depth': (15, 25)
                     , 'max_features': (10, 20)
                     , 'min_impurity_decrease': (20, 24)
                     }

params_best, score_best = param_bayes_opt(20, 280)     # 20 initial points + 280 iterations
params_best                                            # best parameter combination
score_best                                             # best score (negative RMSE)
validation_score = bayes_opt_validation(params_best)   # validate the best parameter combination
validation_score
```
2. TPE optimization based on HyperOpt
📖 HyperOpt features
- The most versatile, general-purpose optimizer;
- Running time (smaller is better): hyperopt < bayes_opt < random grid search < grid search
- Model performance (larger is better): hyperopt > bayes_opt > random grid search > grid search;
- The code must be written exactly as required and is not very flexible; a small change can make it fail to run.
- Compared with Gaussian-process-based Bayesian optimization, TPE (based on a Gaussian mixture model) gets better results more efficiently in most cases;
- HyperOpt supports only a limited set of optimization algorithms. If you mainly need the TPE method, mastering HyperOpt is enough; for more, move on to the Optuna library.
2.1 📖 HyperOpt rules for the objective function
- The input to the objective function must be a parameter-space dictionary that follows hyperopt's conventions
- Hyperopt only supports finding the minimum of f(x), not the maximum
2.2 📖 HyperOpt rules for the parameter space
📖 HyperOpt defines the parameter space with the following dictionary forms
- hp.quniform("parameter name", lower bound, upper bound, step size) - uniformly distributed floats with a step (quantized)
- hp.uniform("parameter name", lower bound, upper bound) - uniformly distributed floats over the interval
- hp.randint("parameter name", upper bound) - integers in [0, upper bound), closed at the front and open at the back
- hp.choice("parameter name", ["string 1", "string 2", ...]) - for string types; the optimal parameter is returned as an index
- hp.choice("parameter name", [*range(lower bound, upper bound, step)]) - for integer types; the optimal parameter is returned as an index
- hp.choice("parameter name", [integer 1, integer 2, integer 3, ...]) - for integer types; the optimal parameter is returned as an index
- hp.choice("parameter name", ["string 1", integer 1, ...]) - for a mix of strings and integers; the optimal parameter is returned as an index
Unless otherwise specified, the hp methods define intervals that are closed at the lower bound and open at the upper bound.
Choosing among these dictionary forms when defining a HyperOpt parameter space:
- For parameters that must take integer values, the space is usually built with quniform.
- quniform yields evenly spaced floating-point numbers to stand in for integers;
- The int function is then needed in the objective function to enforce the type. For example, over the range [0, 5] with step 1, quniform draws the floats [0.0, 1.0, 2.0, 3.0, ...]; each value must be wrapped in int() before being passed to the model. hp.choice does not have this problem.
- hp.choice ultimately returns the index of the optimal parameter, which is easy to confuse with the actual value of a numeric parameter;
- hp.randint only supports counting from 0.
A short sketch of the int() cast and the hp.choice index follows.
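A short sketch of these two pitfalls with illustrative bounds; space_eval, provided by hyperopt, translates the index returned for an hp.choice parameter back into the actual value:

```python
# Sketch: hp.quniform needs int() in the objective; hp.choice returns an index.
# Bounds and parameter names here are illustrative.
from hyperopt import hp, fmin, tpe, Trials, space_eval

space = {
    'n_estimators': hp.quniform('n_estimators', 80, 100, 1),      # floats 80.0, 81.0, ... -> cast with int()
    'max_features': hp.choice('max_features', ['sqrt', 'log2']),  # fmin reports the INDEX, not the string
}

def objective(params):
    n_estimators = int(params['n_estimators'])   # cast the quniform float to an integer
    max_features = params['max_features']        # inside the objective, the actual value is passed
    # ... build and evaluate the model here; return a loss to MINIMIZE ...
    return 0.0                                   # placeholder loss for the sketch

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=10, trials=trials)
print(best)                      # e.g. {'max_features': 1, 'n_estimators': 93.0} -- index for hp.choice
print(space_eval(space, best))   # translates the index back, e.g. {'max_features': 'log2', ...}
```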
2.3 📖 Introduction to the HyperOpt optimizer
📖 Functions / objects involved when HyperOpt optimizes the objective function
- fmin: the basic optimization function
  - Inside fmin you can choose the surrogate-model algorithm (parameter algo). Generally there are two options, tpe.suggest and rand.suggest: the former is the TPE method and the latter is random search.
- partial: modifies the specific parameters of the algorithm
  - These include how many initial observations the model uses (parameter n_startup_jobs) and how many candidate samples are considered when evaluating the acquisition function (parameter n_EI_candidates).
- trials: records the whole iteration process
  - Usually you pass Trials(), imported from the hyperopt library
  - After optimization finishes, losses, parameters, and other intermediate information can be read from the saved trials;
- early_stop_fn: early stopping
  - Usually you pass no_progress_loss(), imported from hyperopt.early_stop
  - A number n can be passed to it, meaning the algorithm stops early when the loss has not improved for n consecutive evaluations.
  - Because of the high randomness of Bayesian methods, many iterations are needed to find the optimum when the sample size is small, so the value passed to no_progress_loss() is generally not set too high. In this course, because the data set is small, I set a relatively high value to avoid stopping the iterations too early.
A minimal sketch of assembling these pieces follows.
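A minimal, self-contained sketch of wiring these pieces together, using a toy space and objective (the numbers are illustrative; the house-price case in 2.4 plugs in the real objective):

```python
# Sketch: fmin with a customized TPE surrogate, a Trials recorder, and early stopping.
from functools import partial
from hyperopt import hp, fmin, tpe, Trials
from hyperopt.early_stop import no_progress_loss

space = {'x': hp.uniform('x', -5, 5)}          # toy parameter space

def objective(params):                         # toy objective to MINIMIZE
    return (params['x'] - 2) ** 2

algo = partial(tpe.suggest
               , n_startup_jobs=20             # number of initial (random) observations
               , n_EI_candidates=50)           # candidates scored by the acquisition function

trials = Trials()                              # records every evaluation
best = fmin(objective
            , space=space
            , algo=algo                        # customized TPE instead of plain tpe.suggest
            , max_evals=300
            , trials=trials
            , early_stop_fn=no_progress_loss(100))  # stop after 100 evaluations without improvement
print(best)
```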
2.4 🗣 Case: HyperOpt parameter optimization (house price dataset, Python)
```python
# pip install hyperopt
# pip install optuna
import optuna
print(optuna.__version__)
import hyperopt
print(hyperopt.__version__)

from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, cross_validate
from bayes_opt import BayesianOptimization
from hyperopt import hp, fmin, tpe, Trials, partial
from hyperopt.early_stop import no_progress_loss
# numpy (np), pandas (pd), and the data X, y are reused from section 1.4
```
- 🗣 Set parameter space
```python
# Set the parameter space
param_grid_simple = {'n_estimators': hp.quniform('n_estimators', 80, 100, 1)
                     , 'max_depth': hp.quniform('max_depth', 10, 25, 1)
                     , 'max_features': hp.quniform('max_features', 10, 20, 1)
                     , 'min_impurity_decrease': hp.quniform('min_impurity_decrease', 20, 25, 1)
                     }

# Compute the size of the parameter space
len([*range(80, 100, 1)]) * len([*range(10, 25, 1)]) * \
    len([*range(10, 20, 1)]) * len([*range(20, 25, 1)])
```
- 🗣 Set objective function
```python
# Set the objective function, with a random forest as the estimator
def hyperopt_objective(params):
    model = RFR(n_estimators=int(params['n_estimators'])
                , max_depth=int(params['max_depth'])
                , max_features=int(params['max_features'])
                , min_impurity_decrease=params['min_impurity_decrease']
                , random_state=7
                , n_jobs=4)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    validate_loss = cross_validate(model, X, y
                                   , cv=cv
                                   , scoring='neg_root_mean_squared_error'
                                   , n_jobs=-1
                                   , error_score='raise')
    # hyperopt minimizes, so return the positive RMSE
    return np.mean(abs(validate_loss['test_score']))
```
- 🗣 Set optimization process
```python
# Set the optimization process
def param_hyperopt(max_evals=100):
    # Record the iteration process
    trials = Trials()
    # Early stopping: stop when the loss has not improved for 100 consecutive evaluations (10-50 is more typical)
    early_stop_fn = no_progress_loss(100)
    # Customized surrogate model (optional):
    # algo = partial(tpe.suggest            # surrogate-model algorithm
    #                , n_startup_jobs=20    # number of initial observations
    #                , n_EI_candidates=50)  # number of candidates used to evaluate the acquisition function
    params_best = fmin(hyperopt_objective         # objective function
                       , space=param_grid_simple  # parameter space
                       , algo=tpe.suggest         # surrogate model; replace with the algo defined above to customize it
                       , max_evals=max_evals      # maximum number of evaluations
                       , trials=trials
                       , early_stop_fn=early_stop_fn  # early stopping
                       )
    print('best params:', params_best)
    return params_best, trials
```
- 🗣 Set validation function
```python
# Set the validation function (consistent with the objective function)
def hyperopt_validation(params):
    model = RFR(n_estimators=int(params['n_estimators'])
                , max_depth=int(params['max_depth'])
                , max_features=int(params['max_features'])
                , min_impurity_decrease=params['min_impurity_decrease']
                , random_state=7
                , n_jobs=4)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    validate_loss = cross_validate(model, X, y
                                   , cv=cv
                                   , scoring='neg_root_mean_squared_error'
                                   , n_jobs=-1)
    return np.mean(abs(validate_loss['test_score']))
```
- 🗣 Implement the actual optimization process
```python
# Run the actual optimization
# 1. Search roughly 1% of the space; returns the best parameter combination and the iteration record
params_best, trials = param_hyperopt(30)
# 2. Search roughly 3% of the space
params_best, trials = param_hyperopt(100)
# 3. Search roughly 10% of the space
params_best, trials = param_hyperopt(300)

# Validate the model with the best parameter combination and return the RMSE
hyperopt_validation(params_best)

# Print the record of the first search
trials.trials[0]
# Print the objective values of the first ten searches
trials.losses()[:10]
```
3. Multiple Bayesian optimization algorithms based on Optuna
📖 Optuna features
- Optuna's advantage is that it connects seamlessly to deep learning frameworks such as PyTorch and TensorFlow, and it can also be combined with the sklearn-based optimization library scikit-optimize, so Optuna can be used in a wide variety of optimization scenarios.
- Gaussian-process-based Bayesian optimization runs noticeably more slowly than TPE-based Bayesian optimization.
- Early stopping is not supported;
- Optuna may have a sampling bug: it repeatedly draws parameter combinations that have already been drawn and prints a warning, and those iterations may be wasted. Consider enlarging the range or density of the parameter space to get around this.
3.1 📖 Optuna's rules for the objective function and parameter space
- Instead of passing parameters or a parameter space into the objective function, the parameter space is defined directly inside the objective function;
- The Optuna optimizer generates a variable trial that refers to the candidate parameters. This variable cannot be accessed or inspected by the user; it lives inside the optimizer and is passed into the objective function, where the parameter space is constructed through the methods carried by trial.
- The objective function may return either a value to be maximized or a value to be minimized. A toy sketch of these rules follows.
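A toy sketch of these rules; the bounds and the choice of direction are illustrative, and the house-price case in 3.3 uses direction='minimize':

```python
# Sketch: the parameter space is defined INSIDE the objective via the trial object,
# and the study direction decides whether the return value is maximized or minimized.
import optuna

def toy_objective(trial):
    x = trial.suggest_float('x', -5.0, 5.0)               # a continuous parameter
    kind = trial.suggest_categorical('kind', ['a', 'b'])  # a categorical parameter
    score = -(x - 2) ** 2 + (1 if kind == 'a' else 0)     # toy score to maximize
    return score

study = optuna.create_study(direction='maximize')         # 'minimize' works the same way
study.optimize(toy_objective, n_trials=30)
print(study.best_trial.params, study.best_trial.value)
```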
3.2 📖 Introduction to the Optuna optimizer
- Adjust the algo argument (the sampler) to define which specific algorithm performs the Bayesian optimization;
- Setting the sampler to TPE iterates faster than the GP (Gaussian-process) sampler.
3.3 🗣 Case: Optuna parameter optimization (house price dataset, Python)
```python
# Prepare the data and libraries
# pip install optuna
# pip install scikit-optimize
import optuna
optuna.__version__

data = pd.read_csv(r'E:\jupyter_notebook\Courseware of machine learning phase II\Lesson 9.Stochastic forest model\datasets\House Price\train_encode.csv', index_col=0)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X.head()

from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, cross_validate
```
```python
# Define the objective function
def optuna_objective(trial):
    # Integer parameters: suggest_int('parameter name', lower bound, upper bound, step)
    n_estimators = trial.suggest_int('n_estimators', 80, 100, 1)
    max_depth = trial.suggest_int('max_depth', 10, 25, 1)
    max_features = trial.suggest_int('max_features', 10, 20, 1)
    # max_features = trial.suggest_categorical('max_features', ['log2', 'sqrt', 'auto'])  # categorical
    min_impurity_decrease = trial.suggest_int('min_impurity_decrease', 20, 25, 1)
    # min_impurity_decrease = trial.suggest_float('min_impurity_decrease', 20, 25, log=False)  # float

    model = RFR(n_estimators=n_estimators
                , max_depth=max_depth
                , max_features=max_features
                , min_impurity_decrease=min_impurity_decrease
                , random_state=7
                , n_jobs=12)
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    validate_loss = cross_validate(model, X, y
                                   , cv=cv
                                   , scoring='neg_root_mean_squared_error'
                                   , n_jobs=12
                                   , error_score='raise')
    return np.mean(abs(validate_loss['test_score']))
```
```python
# Define the optimization process
def optimizer_optuna(n_trials, algo):
    if algo == 'TPE':
        algo = optuna.samplers.TPESampler(n_startup_trials=10, n_ei_candidates=24)
    elif algo == 'GP':
        from optuna.integration import SkoptSampler
        import skopt
        algo = SkoptSampler(skopt_kwargs={'base_estimator': 'GP'
                                          , 'n_initial_points': 10
                                          , 'acq_func': 'EI'})
    study = optuna.create_study(sampler=algo            # sampling algorithm
                                , direction='minimize'  # whether the objective is maximized or minimized
                                )
    study.optimize(optuna_objective           # objective function
                   , n_trials=n_trials        # maximum number of trials (including the initial observations)
                   , show_progress_bar=True   # show the progress bar
                   )
    print('best params:', study.best_trial.params,
          '\n', 'best score:', study.best_trial.values)
    return study.best_trial.params, study.best_trial.values
```
```python
# Run the optimization
import warnings
warnings.filterwarnings('ignore', message='The objective has been evaluated at this point before.')

best_params, best_score = optimizer_optuna(10, 'GP')   # small run to test the code
optuna.logging.set_verbosity(optuna.logging.ERROR)     # stop printing every trial
best_params, best_score = optimizer_optuna(300, 'GP')
```