LightGBM + Optuna Hyperparameter Auto-Tuning Tutorial (with Code Framework)

Hello, I'm Brother Shuai. This original series is updated regularly. Search for "Python data science" on WeChat to read the machine learning series.

Recently, a hyperparameter tuning tool called Optuna has become very popular on Kaggle and shows up frequently in top solutions. I know many of you are frustrated by how long tuning takes. So, drawing on my own experience, I'll walk you through tuning an LGBM model with Optuna. It is a very practical combination that makes it easy to climb the leaderboard, and it works just as well in day-to-day projects.

I won't say much about LightGBM itself, since I've covered it in many earlier articles. Released by Microsoft, it is an efficiency-focused improvement on the XGBoost approach: it runs very fast without losing accuracy. It is widely recognized as a strong, widely used machine learning model and handles both classification and regression.

When it comes to hyperparameter tuning, the first tool that comes to mind may be GridSearch. I used GridSearch at first too. Brute force has its own appeal, but the drawback is obvious: it is far too slow, and the time cost is too high. Tuning tools based on Bayesian optimization are much more pleasant to use. There are many open-source options, such as Hyperopt. Today's star, however, is Optuna: lightweight, more powerful, and blazingly fast!
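To put a number on that cost, here is a quick count for a hypothetical grid with 5 candidate values for each of 6 hyperparameters, evaluated with 5-fold cross-validation. The grid values are illustrative, not recommendations.

```python
from math import prod

# Hypothetical grid: 5 candidate values for each of 6 hyperparameters
grid = {
    "num_leaves": [31, 63, 127, 255, 511],
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": [0.01, 0.03, 0.05, 0.1, 0.3],
    "min_data_in_leaf": [20, 100, 500, 1000, 5000],
    "lambda_l1": [0, 1, 5, 10, 50],
    "lambda_l2": [0, 1, 5, 10, 50],
}

n_combinations = prod(len(values) for values in grid.values())
n_folds = 5
n_fits = n_combinations * n_folds  # every combination is trained once per fold
print(n_combinations, n_fits)  # 15625 78125
```

Grid search has to fit every one of those models, and the total multiplies with each parameter you add. A Bayesian-style tool such as Optuna instead works within a fixed budget of trials and concentrates them where earlier trials scored well.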

Since the examples use LGBM, let's start with its main hyperparameters and then set up Optuna to search over them.

LightGBM Parameter overview

Generally, the hyperparameters of tree-based models fall into four categories:

Parameters affecting decision tree structure and learning
Parameters affecting training speed
Parameters to improve accuracy
Parameters to prevent overfitting

These categories overlap heavily, and improving one aspect (say, training speed) can hurt another (say, accuracy), so tuning entirely by hand is painful. A practical approach is to let an automatic tuning tool produce a rough result first. The key to automatic tuning is supplying a sensible search range: given an appropriate parameter grid, Optuna can find a combination that balances these categories.

The following sections describe these types of LGBM hyperparameters.

1. Hyperparameters that control tree structure

max_depth and num_leaves

In LGBM, the first parameters to tune for tree structure are max_depth and num_leaves (the number of leaf nodes). These two control the tree structure most directly. Because LGBM grows trees leaf-wise, leaving the depth unconstrained makes overfitting very easy. max_depth is typically set between 3 and 8.

The two parameters are related. Because each tree is a binary tree, num_leaves can be at most 2^(max_depth), so fixing max_depth also fixes the useful range of num_leaves.
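The depth/leaf relationship is simple arithmetic. Here is a small sketch. The helper names are hypothetical, not LightGBM API. It assumes max_depth is a positive limit, because LightGBM treats values <= 0 as "no limit".

```python
def max_leaves(max_depth):
    """Most leaves a binary tree of depth max_depth can have (max_depth >= 1)."""
    return 2 ** max_depth

def effective_num_leaves(num_leaves, max_depth):
    """Leaves actually reachable once depth is capped (hypothetical helper)."""
    return min(num_leaves, max_leaves(max_depth))

for depth in range(3, 9):
    print(depth, max_leaves(depth))  # 3 8, 4 16, ..., 8 256
print(effective_num_leaves(1000, 5))  # 32
```

For example, with max_depth=5 any num_leaves above 32 has no effect. Optuna builds the search space while the objective runs, so one option is to suggest max_depth first and then use 2 ** max_depth as the upper bound for num_leaves.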

min_data_in_leaf

Another important structural parameter is min_data_in_leaf, which also affects overfitting. It sets the minimum number of samples each leaf must contain. For example, with a value of 100, any split that would leave a child with fewer than 100 samples is rejected, and that branch stops growing. A good setting for min_data_in_leaf also depends on the number of training samples and on num_leaves. For large datasets it is generally set above 1000.
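To see why min_data_in_leaf interacts with the training-set size and num_leaves, here is a small sketch of the arithmetic. The helper names are hypothetical, not LightGBM API.

```python
def can_split(n_node, min_data_in_leaf):
    # A split creates two children, and each must keep >= min_data_in_leaf samples
    return n_node >= 2 * min_data_in_leaf

def max_feasible_leaves(n_train, min_data_in_leaf):
    # Every leaf holds at least min_data_in_leaf samples, so a tree over
    # n_train rows cannot have more than n_train // min_data_in_leaf leaves
    return n_train // min_data_in_leaf

print(can_split(150, 100))                 # False: no split keeps both sides >= 100
print(can_split(250, 100))                 # True: e.g. a 100/150 split is allowed
print(max_feasible_leaves(100_000, 1000))  # 100
```

With 100,000 training rows and min_data_in_leaf=1000, a tree can never have more than 100 leaves, so searching num_leaves far above that mostly wastes trials. With bagging enabled, each tree sees only the bagged subset, which lowers that ceiling further.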

2. Hyperparameters to improve accuracy

learning_rate and n_estimators

A common way to get higher accuracy is to use more trees with a lower learning rate. In other words, find the best combination of n_estimators and learning_rate in LGBM.

n_estimators controls the number of decision trees, and learning_rate is the step size of gradient descent. In practice LGBM overfits fairly easily, and learning_rate controls how fast the model learns. Typical values lie between 0.01 and 0.3. The usual practice is to allow a generous number of trees, such as 1000, set a lower learning_rate, and then use early stopping to find the optimal number of iterations.
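The early-stopping idea can be sketched in plain Python. This is a simplified model of a patience-based rule, not LightGBM's internal code, and the validation losses are made up for illustration.

```python
def early_stopping_round(val_losses, patience):
    """Return (best_iteration, best_loss) under a patience-based stopping rule.

    Iterations are counted from 1. Training stops as soon as `patience`
    consecutive iterations pass without beating the best loss so far.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_iter, best_loss = i, loss
        elif i - best_iter >= patience:
            break  # stop training; later iterations are never computed
    return best_iter, best_loss

# Made-up validation losses, one per boosting iteration
losses = [0.69, 0.60, 0.55, 0.54, 0.56, 0.57, 0.58, 0.50]
print(early_stopping_round(losses, patience=3))  # (4, 0.54)
print(early_stopping_round(losses, patience=5))  # (8, 0.5)
```

The two calls show the trade-off. A short patience can stop before a late improvement. That is why the usual pattern combines a generous tree budget (n_estimators) with a reasonable patience.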

max_bin

You can also increase max_bin (default 255) to improve accuracy. The more bins each feature is bucketed into, the more detail is retained. Fewer bins lose more information but generalize more easily. This is the same idea as binning in feature engineering, except that LightGBM does it internally with its histogram algorithm. Setting max_bin too high also risks overfitting.
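To see what max_bin trades off, here is a simplified equal-frequency binning in plain Python. It only approximates the idea. LightGBM's actual bin construction also handles sparse values and minimum bin sizes, and is more involved. The feature values are made up.

```python
from bisect import bisect_left

def quantile_bin(values, max_bin):
    """Map each value to a bin index using equal-frequency cut points (simplified)."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points at evenly spaced ranks; duplicate cut points merge bins
    cuts = sorted({ordered[(k * n) // max_bin] for k in range(1, max_bin)})
    # Bin index = number of cut points strictly below the value
    return [bisect_left(cuts, v) for v in values]

feature = list(range(100))  # a made-up feature with 100 distinct values
for max_bin in (4, 16, 255):
    n_bins = len(set(quantile_bin(feature, max_bin)))
    print(f"max_bin={max_bin}: {n_bins} bins, {n_bins - 1} candidate split points")
```

With 4 bins, the 100 values collapse into 4 buckets, so a tree can split this feature at only 3 points. With 255 allowed bins, each of the 100 distinct values keeps its own bin, so max_bin only matters when a feature has more distinct values than bins.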

3. More hyperparameters to control overfitting

lambda_l1 and lambda_l2

lambda_l1 and lambda_l2 correspond to L1 and L2 regularization. They are equivalent to XGBoost's reg_alpha and reg_lambda, respectively. The higher the value, the more strongly the leaf weights (the output values of the leaves) are penalized and shrunk. Their optimal values are hard to tune because their magnitude is not directly tied to overfitting, although they do affect it. A typical search range is (0, 100).
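To make the two penalties concrete, here is a sketch of how a regularized leaf value is computed in second-order gradient boosting. L1 soft-thresholds the summed gradient G, and L2 is added to the summed Hessian H, giving w = -sign(G)·max(|G| - lambda_l1, 0) / (H + lambda_l2). To my understanding this matches LightGBM's leaf output when extras such as max_delta_step and path smoothing are off. Treat it as an illustration rather than library code. The gradient numbers are made up.

```python
import math

def leaf_output(grad_sum, hess_sum, lambda_l1=0.0, lambda_l2=0.0):
    """Regularized value of one leaf in second-order gradient boosting.

    grad_sum / hess_sum are the sums of first- and second-order gradients
    of the samples that fall into the leaf.
    """
    # L1: soft-threshold the gradient sum (shrink |G| by lambda_l1, floor at 0)
    shrunk = math.copysign(max(0.0, abs(grad_sum) - lambda_l1), grad_sum)
    # L2: lambda_l2 is added to the Hessian sum, shrinking the weight further
    return -shrunk / (hess_sum + lambda_l2)

# Made-up gradient statistics for a single leaf
G, H = -10.0, 5.0
print(leaf_output(G, H))                  # 2.0 (no regularization)
print(leaf_output(G, H, lambda_l2=5.0))   # 1.0 (L2 halves the weight here)
print(leaf_output(G, H, lambda_l1=4.0))   # 1.2 (L1 shrinks |G| from 10 to 6)
print(leaf_output(G, H, lambda_l1=20.0))  # 0.0 (L1 exceeds |G|: weight is zeroed)
```

This also shows why the best values are hard to guess. The effect depends on the size of the gradient and Hessian sums in each leaf, not on the penalty alone.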

min_gain_to_split

This parameter sets the minimum gain a split must achieve, and it also reflects data quality: if the computed gain is too low, the node is not split. If you set a large depth but the tree still cannot split further, LGBM will warn that it cannot find a split with positive gain. That suggests the data has reached its information limit. The parameter plays the same role as gamma in XGBoost. A fairly conservative search range is (0, 20). In large parameter grids it acts as additional regularization.
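Here is a sketch of how such a threshold is applied. It uses the standard second-order split gain with only L2 regularization. Libraries differ in constant factors (XGBoost's textbook version has an extra 1/2 and subtracts gamma directly), so treat the numbers as illustrative. The helper names and gradient values are hypothetical.

```python
def leaf_score(g, h, lambda_l2=0.0):
    # Contribution of a leaf with gradient sum g and Hessian sum h
    return g * g / (h + lambda_l2)

def split_gain(g_left, h_left, g_right, h_right, lambda_l2=0.0):
    # Improvement from replacing the parent leaf by two children
    parent = leaf_score(g_left + g_right, h_left + h_right, lambda_l2)
    return (leaf_score(g_left, h_left, lambda_l2)
            + leaf_score(g_right, h_right, lambda_l2)
            - parent)

def accept_split(gain, min_gain_to_split):
    # The split is kept only if its gain exceeds the threshold
    return gain > min_gain_to_split

gain = split_gain(g_left=-6.0, h_left=3.0, g_right=4.0, h_right=2.0)
print(round(gain, 6))            # 19.2
print(accept_split(gain, 15.0))  # True
print(accept_split(gain, 20.0))  # False
```

The same candidate split survives a threshold of 15 but is rejected at 20. A larger min_gain_to_split therefore prunes weak splits and yields smaller, more conservative trees.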

bagging_fraction and feature_fraction

Both parameters take values in (0, 1]. The default of 1.0 means no sampling.

feature_fraction specifies the fraction of features randomly sampled when training each tree. It also helps prevent overfitting. Some features have very high gain, so every tree may split on the same features, and the trees become nearly identical. Sampling only a fraction of the features keeps those strong features from being available every time. This makes the trees more diverse and improves generalization.

bagging_fraction specifies the fraction of training samples used to train each tree. For it to take effect you must also set bagging_freq to a positive value, so that bagging is redone every bagging_freq iterations. The reasoning is the same as for feature_fraction: it makes the individual trees more diverse.
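Here is a minimal sketch of what the two fractions do. It assumes sampling without replacement and a fresh draw for every tree, which is the bagging_freq=1 case. The helper and the seed are illustrative. LightGBM performs this sampling internally.

```python
import random

def sample_for_tree(n_rows, n_features, bagging_fraction, feature_fraction, rng):
    """Draw the row and feature subsets one tree is trained on (simplified)."""
    n_rows_used = max(1, round(n_rows * bagging_fraction))
    n_feats_used = max(1, round(n_features * feature_fraction))
    rows = sorted(rng.sample(range(n_rows), n_rows_used))  # without replacement
    feats = sorted(rng.sample(range(n_features), n_feats_used))
    return rows, feats

rng = random.Random(2021)  # fixed seed so the run is reproducible
for tree in range(3):
    # A new draw per tree, i.e. the bagging_freq=1 case
    rows, feats = sample_for_tree(10, 5, bagging_fraction=0.7, feature_fraction=0.4, rng=rng)
    print(f"tree {tree}: rows={rows} features={feats}")
```

Each tree sees a different 7 of the 10 rows and 2 of the 5 features, so no single strong feature dominates every tree. In LightGBM, rows are redrawn every bagging_freq iterations. With bagging_freq=0, the default, no row bagging happens at all.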

Create a search grid in Optuna

Optimization in Optuna starts with an objective function, which contains:

A parameter grid in dictionary form
A model (optionally wrapped in k-fold cross-validation) that tries the sampled hyperparameter combination
The dataset used for training
Predictions generated by this model
A score for those predictions, computed with a user-defined metric and returned
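The loop Optuna runs around such an objective can be pictured with the toy version below. It is plain random search, not Optuna's default TPE sampler. A made-up quadratic stands in for the CV score, and all names are hypothetical. It still shows the contract: each trial proposes a parameter dictionary, the objective returns one number, and the study keeps the best.

```python
import random

def toy_objective(params):
    # Made-up stand-in for a CV log loss, smallest near learning_rate=0.05, max_depth=5
    return (params["learning_rate"] - 0.05) ** 2 + 0.01 * (params["max_depth"] - 5) ** 2

def random_search(objective, n_trials, seed=0):
    """Minimal trial loop (random search): sample, score, keep the best."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        # Sample one parameter combination, like trial.suggest_* does
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "max_depth": rng.randint(3, 12),
        }
        value = objective(params)
        if value < best_value:  # direction="minimize"
            best_params, best_value = params, value
    return best_params, best_value

best_params, best_value = random_search(toy_objective, n_trials=200)
print(best_params, round(best_value, 5))
```

Optuna's samplers improve on this loop by using the scores of finished trials to decide where to sample next.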

A common framework is given below. It uses 5-fold cross-validation (KFold) to make the score more stable. The last line returns the mean of the CV scores, which is the quantity to optimize. You can define the objective yourself: minimize logloss, maximize AUC, maximize KS, minimize the AUC gap between the training and test sets, and so on.

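import numpy as np
import lightgbm as lgbm
import optuna  # pip install optuna
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

def objective(trial, X, y):
    # Parameter grid: to be filled in (see below)
    param_grid = {}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)

    cv_scores = np.empty(5)
    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        # X is a pandas DataFrame and y a pandas Series; index rows by position
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        model = lgbm.LGBMClassifier(objective="binary", **param_grid)
        model.fit(
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric="binary_logloss",
            # Stop adding trees after 100 rounds without improvement
            callbacks=[lgbm.early_stopping(stopping_rounds=100)],
        )
        # Score the fold: log loss of the predicted class probabilities
        preds = model.predict_proba(X_test)
        cv_scores[idx] = log_loss(y_test, preds)

    # The mean CV score is the value Optuna optimizes
    return np.mean(cv_scores)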
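For reference, here is the per-fold quantity written out in plain Python. Binary log loss averages -[y·log(p) + (1-y)·log(1-p)] over the validation rows, where p is the predicted probability of class 1. This computes the same quantity as sklearn's log_loss for 0/1 labels. The clipping constant eps and the helper names are assumptions of this sketch.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all rows."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def cv_score(fold_losses):
    """Mean over folds: the single number the objective hands back to Optuna."""
    return sum(fold_losses) / len(fold_losses)

print(binary_log_loss([1, 0], [0.5, 0.5]))  # ~0.693 (log 2): an uninformative model
print(binary_log_loss([1, 0], [0.9, 0.1]))  # ~0.105: confident and correct
print(binary_log_loss([1, 0], [0.1, 0.9]))  # ~2.303: confident and wrong
```

Log loss punishes confident mistakes heavily. This makes it a natural target for direction="minimize".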

Next come the parameter settings. Optuna's commonly used sampling methods are suggest_categorical, suggest_int, and suggest_float. suggest_int and suggest_float take the form (name, low, high, step=step_size).

def objective(trial, X, y):
    # Parameter grid in dictionary form
    param_grid = {
        "n_estimators": trial.suggest_categorical("n_estimators", [10000]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 20, 3000, step=20),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 200, 10000, step=100),
        "max_bin": trial.suggest_int("max_bin", 200, 300),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        # Optuna expects (high - low) to be a multiple of step
        "bagging_fraction": trial.suggest_float(
            "bagging_fraction", 0.2, 0.9, step=0.1
        ),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float(
            "feature_fraction", 0.2, 0.9, step=0.1
        ),
    }
    # ... cross-validation, training, and scoring follow (see the complete framework below)

Run Optuna auto-tuning

The following is a complete objective function framework for reference:

import numpy as np
import lightgbm as lgbm
import optuna
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold
# In recent Optuna releases the integrations live in the separate
# optuna-integration package, which must be installed for this import
from optuna.integration import LightGBMPruningCallback

def objective(trial, X, y):
    # Parameter grid
    param_grid = {
        "n_estimators": trial.suggest_categorical("n_estimators", [10000]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 20, 3000, step=20),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 200, 10000, step=100),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        # Optuna expects (high - low) to be a multiple of step
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.2, 0.9, step=0.1),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.2, 0.9, step=0.1),
        "random_state": 2021,
    }
    # 5-fold cross validation
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)

    cv_scores = np.empty(5)
    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        # X is a pandas DataFrame and y a pandas Series; index rows by position
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        # LGBM modeling
        model = lgbm.LGBMClassifier(objective="binary", **param_grid)
        model.fit(
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric="binary_logloss",
            callbacks=[
                # Stop adding trees after 100 rounds without improvement
                lgbm.early_stopping(stopping_rounds=100),
                # Report binary_logloss to Optuna so bad trials are pruned early
                LightGBMPruningCallback(trial, "binary_logloss"),
            ],
        )
        # Model prediction: class probabilities for the validation fold
        preds = model.predict_proba(X_test)
        # Optimization target: minimize logloss
        cv_scores[idx] = log_loss(y_test, preds)

    return np.mean(cv_scores)

The objective above also adds LightGBMPruningCallback. This callback is very convenient. During training it reports the validation metric to Optuna and stops (prunes) unpromising hyperparameter sets early, instead of training them to completion. This significantly reduces search time.
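The median stopping rule illustrates what pruning means. It is the idea behind MedianPruner, which optuna.create_study uses by default: a trial is stopped at a step if its intermediate score is worse than the median that earlier trials reached at the same step. The sketch below is a simplified version for a minimized metric. The function name, the history format, and the numbers are hypothetical.

```python
from statistics import median

def should_prune(step, value, history, n_startup_trials=5):
    """Median stopping rule for a metric that is minimized (simplified sketch).

    history holds the intermediate values of earlier trials, one list per trial,
    indexed by step (e.g. boosting iteration).
    """
    reached = [trial[step] for trial in history if len(trial) > step]
    if len(reached) < n_startup_trials:
        return False  # too few earlier trials to judge against
    return value > median(reached)

# Made-up validation log loss of five earlier trials at steps 0, 1, 2
history = [
    [0.60, 0.50, 0.40],
    [0.65, 0.55, 0.45],
    [0.70, 0.60, 0.50],
    [0.62, 0.52, 0.42],
    [0.68, 0.58, 0.48],
]
print(should_prune(1, 0.70, history))  # True: worse than the step-1 median of 0.55
print(should_prune(1, 0.50, history))  # False: better than the median
```

In the framework above, the callback supplies the intermediate value at each boosting iteration. When the pruner decides to stop, the callback raises optuna.TrialPruned, so the trial ends instead of training all 10000 trees.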

With the objective function in place, let's start tuning!

study = optuna.create_study(direction="minimize", study_name="LGBM Classifier")
func = lambda trial: objective(trial, X, y)
study.optimize(func, n_trials=20)

direction can be "minimize" or "maximize" (for example, "maximize" when optimizing AUC). n_trials controls the number of trials. In principle, more trials give better results, but you also have to account for running time.

After the search finishes, read the best score and parameters from the study's best_value and best_params attributes.

print(f"\tBest value (binary_logloss): {study.best_value:.5f}")
print("\tBest params:")

for key, value in study.best_params.items():
    print(f"\t\t{key}: {value}")
-----------------------------------------------------
Best value (binary_logloss): 0.35738
	Best params:
		device: gpu
		lambda_l1: 7.71800699380605e-05
		lambda_l2: 4.17890272377219e-06
		bagging_fraction: 0.7000000000000001
		feature_fraction: 0.4
		bagging_freq: 5
		max_depth: 5
		num_leaves: 1007
		min_data_in_leaf: 45
		min_split_gain: 15.703519227860273
		learning_rate: 0.010784015325759629
		n_estimators: 10000

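Once you have this parameter combination, train the model with it, check the results, and then fine-tune by hand. This saves a lot of time. Note that the sample output above appears to come from a run with a somewhat different search space from the grid in this article (it includes device and min_split_gain, for instance), so treat the values as illustrative.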

Epilogue

This article presents a code framework for tuning LGBM with Optuna that is very convenient to use. Adapt the parameter ranges to your data. The optimization objective is also up to you and is not limited to the logloss used in the code above.

As for how powerful Optuna is, I'll compare it with similar tuning tools in a later article. Stay tuned.



Keywords: Python Machine Learning AI

Added by oprpg on Sat, 20 Nov 2021 01:20:28 +0200
