[machine learning] How to use random grid search to shorten grid search time?

Learning notes on random grid search (RandomizedSearchCV), including:

Basic principle of random grid search
sklearn application of random grid search (case: house price dataset, Python)
Application of continuous distributions in random grid search (case: house price dataset, Python)

Legend

🔣 Functions and parameters

🔑 Formula

🗣 Case

📌 Term definition

📖 Key point

1. Basic principle of random grid search

📖 Factors affecting the speed of enumeration grid search

1. Size of the parameter space: the larger the parameter space, the more models have to be fitted

2. Size of the data: the larger the dataset, the more computing power and time each model fit requires
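The two factors multiply. As a rough sketch (the grid values match the schematic case in these notes, and the 5-fold setting is an assumption for illustration), the number of model fits an enumeration grid search needs is the number of parameter combinations times the number of cross-validation folds:

```python
from math import prod

# Candidate values per hyperparameter (same ranges as the schematic case)
param_grid = {
    "n_estimators": list(range(50, 350, 50)),  # 6 values
    "max_depth": list(range(2, 7)),            # 5 values
}
n_folds = 5  # assumed number of cross-validation folds

# Size of the parameter space: product of the number of options per parameter
n_combinations = prod(len(values) for values in param_grid.values())

# Every combination is fitted once per fold
n_fits = n_combinations * n_folds

print(n_combinations, n_fits)  # 30 150
```

Adding one more parameter with 4 candidate values would multiply both numbers by 4, which is why the search time grows so quickly with the size of the space.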

🗣 Case: global parameter space VS partial parameter space (schematic diagram)

import pandas as pd
import matplotlib.pyplot as plt

n_e_list=range(50,350,50)
m_d_list=range(2,7)

# Cartesian product of n_e_list and m_d_list: every (n_estimators, max_depth) pair
comb=pd.DataFrame([(n_estimators, max_depth)
                   for n_estimators in n_e_list
                   for max_depth in m_d_list])

fig,[ax1,ax2]=plt.subplots(1,2,dpi=100)
# Left: grid search evaluates every point of the global space
ax1.scatter(comb.iloc[:,0],comb.iloc[:,1])
ax1.set_title('GridSearch')

# Right: random search evaluates only a sampled subset (highlighted in red)
ax2.scatter(comb.iloc[:,0],comb.iloc[:,1])
ax2.scatter([50,250,200,200,300,100,150,150],[4,2,6,3,2,3,2,5],c='red',s=50)
ax2.set_title('RandomSearch')
plt.show()

📌 Random grid search
 A method that randomly samples a subspace of the parameter space and searches only within that subspace.

Advantages over enumeration grid search:

- Faster to run
- Can cover a larger parameter space for the same amount of computation
- Its minimum loss is close to the minimum loss of enumeration grid search



📖 Sampling characteristics of random grid search

Random grid search works by iterating in a loop.

In each iteration, one set of parameters is randomly selected and used for modeling. Because this sampling is done without replacement, the same set of parameters is never drawn twice (in sklearn this holds when every parameter is given as a list; if any parameter is given as a distribution, sampling is done with replacement).

The number of iterations controls the overall size of the sampled parameter subspace. This is often described as "giving random grid search a fixed amount of computation: when that amount is used up, the search stops".

In practice, random grid search does not first sample the whole subspace and then search it; it draws one set of parameters per iteration as the search proceeds.
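The sampling behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not sklearn's implementation: `sample_parameter_sets` is a hypothetical helper, and it assumes a small discrete grid with unique values per parameter.

```python
import random
from math import prod

def sample_parameter_sets(param_grid, n_iter, seed=None):
    """Yield up to n_iter distinct parameter sets, one per iteration."""
    names = list(param_grid)
    options = {name: list(param_grid[name]) for name in names}
    grid_size = prod(len(options[name]) for name in names)
    budget = min(n_iter, grid_size)  # cannot draw more sets than exist
    rng = random.Random(seed)
    seen = set()
    while len(seen) < budget:
        combo = tuple(rng.choice(options[name]) for name in names)
        if combo in seen:
            continue  # already tried: redraw, so sampling is without replacement
        seen.add(combo)
        yield dict(zip(names, combo))

param_grid = {"n_estimators": range(50, 350, 50), "max_depth": range(2, 7)}
samples = list(sample_parameter_sets(param_grid, n_iter=8, seed=0))
for params in samples:
    print(params)  # one model would be fitted for each sampled set
```

Here `n_iter` plays the role of the fixed amount of computation: the generator stops after 8 distinct parameter sets, no matter how large the global space is.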

2. Implementation of random grid search

🔣 Random grid search in sklearn

from sklearn.model_selection import RandomizedSearchCV

RandomizedSearchCV(
    estimator, # Estimator to tune
    param_distributions, # Global parameter space
    *,
    n_iter=10, # Number of iterations (parameter sets sampled)
    scoring=None, # Evaluation metric
    n_jobs=None, # Number of jobs to run in parallel
    refit=True, # Whether to refit the best parameters on the full dataset
    cv=None, # Cross validation strategy
    verbose=0, # Verbosity of the work log
    pre_dispatch='2*n_jobs', # Number of jobs dispatched during parallel execution
    random_state=None, # Random number seed
    error_score=np.nan, # Score assigned when a fit fails. With 'raise' the error is raised and the search stops; with a numeric value a warning is shown and the search continues
    return_train_score=False, # Whether to also compute scores on the training folds
)

Name	Description
estimator	The object to tune: an estimator
param_distributions	The global parameter space: a dictionary or a list of dictionaries
n_iter	Number of iterations; the more iterations, the larger the sampled parameter subspace
scoring	Evaluation metric(s); several metrics can be computed at the same time
n_jobs	Number of jobs to run in parallel
refit	Whether to retrain on the complete dataset with the best parameters; with several metrics, names the metric used to pick them
cv	Number of folds (or splitter object) used for cross validation
verbose	Verbosity of the work log
pre_dispatch	Number of jobs dispatched during parallel execution
random_state	Random number seed
error_score	Score assigned when a fit fails. With 'raise' the error is raised and the search stops; with a numeric value a warning is shown and the search continues
return_train_score	Whether to also report training-fold scores from cross validation
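To see how these parameters fit together, here is a minimal sketch on synthetic data. The dataset, the tiny parameter space and the variable names `X_demo` and `y_demo` are assumptions chosen so that the example runs in a few seconds; they are not part of the house price case.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic data stands in for a real dataset (an assumption for this sketch)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# A small discrete space: 3 * 3 = 9 combinations
param_distributions = {
    "n_estimators": [10, 20, 30],
    "max_depth": [2, 4, 6],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,                              # sample 4 of the 9 combinations
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,                        # makes the sampled combinations reproducible
)
search.fit(X_demo, y_demo)

print(search.best_params_)                 # best of the sampled combinations
print(abs(search.best_score_) ** 0.5)      # square root of the mean cross-validated MSE
print(len(search.cv_results_["params"]))   # number of combinations actually tried
```

Because `refit=True` by default, `search.best_estimator_` is already retrained on all of `X_demo` and `y_demo` with the best sampled parameters.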

🗣 Case: applying random grid search to a random forest (house price dataset)

📖 With the same parameter space and model, random grid search runs faster than ordinary grid search.

Running time ≈ (n_iter / number of combinations in the global space) × running time of grid search
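As a quick worked example of this estimate (the helper name and the 60-minute figure are assumptions for illustration, and real timings also depend on which combinations happen to be drawn):

```python
def estimate_random_search_time(n_iter, n_combinations, grid_search_time):
    """Estimated running time of random search, given the full grid search time."""
    fraction_searched = min(n_iter / n_combinations, 1.0)
    return fraction_searched * grid_search_time

# Assumed numbers: 600 of 1200 combinations sampled, full grid search takes 60 minutes
estimated_minutes = estimate_random_search_time(600, 1200, 60.0)
print(estimated_minutes)  # 30.0
```

Sampling half of the global space is expected to take roughly half the time of the full grid search.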

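import numpy as np
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, RandomizedSearchCV

# X, y: features and target of the house price dataset, assumed to be loaded already

param_grid_simple = {'n_estimators': range(50,150,10)
                     , 'max_depth': range(10,25,2)
                     , "max_features": ["sqrt",16,32,64,1.0] # 1.0 = all features; replaces "auto", which newer sklearn versions no longer accept
                     , "min_impurity_decrease": np.arange(0,5,2)
                    }

#Calculate parameter space size
def count_space(param):
    no_option = 1
    for i in param: # use the argument, not the global dictionary
        no_option *= len(param[i])
    print(no_option)
    
count_space(param_grid_simple) # 10 * 8 * 5 * 3 = 1200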

# Training model
model = RFR(random_state=7,verbose=True,n_jobs=4)
cv = KFold(n_splits=5,shuffle=True,random_state=7)
search = RandomizedSearchCV(estimator=model
                            ,param_distributions=param_grid_simple
                            ,n_iter = 600 #The size of the subspace is about half of the global space
                            ,scoring = "neg_mean_squared_error"
                            ,verbose = True
                            ,cv = cv
                            ,random_state=1412
                            ,n_jobs=-1
                           )

search.fit(X,y)

search.best_estimator_ # View model parameter results
# RandomForestRegressor(max_depth=18, max_features=16, min_impurity_decrease=0,
#                       n_jobs=4, random_state=7, verbose=True)

abs(search.best_score_)**0.5 # View model RMSE score
# 29160.978459432965

# View the model effect of the optimal parameters
from sklearn.model_selection import cross_validate
ad_reg=RFR(max_depth=18
           , max_features=16
           , min_impurity_decrease=0
           , random_state=7
           , n_jobs=-1)

def RMSE(cvresult,key):
    return (abs(cvresult[key])**0.5).mean()

def rebuild_on_best_param(ad_reg):
    cv = KFold(n_splits=5,shuffle=True,random_state=7)
    result_post_adjusted = cross_validate(ad_reg,X,y
                                          ,cv=cv
                                          ,scoring="neg_mean_squared_error"
                                          ,return_train_score=True
                                          ,verbose=True
                                          ,n_jobs=-1)
    print("train RMSE:{:.3f}".format(RMSE(result_post_adjusted,"train_score")))
    print("test RMSE:{:.3f}".format(RMSE(result_post_adjusted,"test_score")))

rebuild_on_best_param(ad_reg)
# train RMSE:10760.565
# test RMSE:28265.808

3. Continuous parameter space

📖 A continuous distribution may yield better values
Grid search: can only evaluate the points formed by combining the listed parameter values;
Random search: accepts a distribution as input

As shown in the figure above, if the lowest point of the loss function lies between two sets of grid parameters, enumeration grid search cannot find that minimum. In random grid search, however, the parameter points are drawn at random from a continuous distribution, so it is more likely to find better values in the same parameter space.
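A toy one-dimensional example makes this concrete. The loss function, the grid and the choice of 30 random draws are all assumptions made up for this sketch; random search is not guaranteed to win on any single run.

```python
import random

def loss(x):
    """Toy loss whose minimum (x = 2.5) lies between two grid points."""
    return (x - 2.5) ** 2

# Enumeration grid search: only the listed points can be evaluated
grid_points = [0, 1, 2, 3, 4, 5]
grid_best = min(loss(x) for x in grid_points)

# Random search over a continuous uniform distribution on [0, 5]
rng = random.Random(0)
random_points = [rng.uniform(0, 5) for _ in range(30)]
random_best = min(loss(x) for x in random_points)

print(grid_best)    # 0.25, reached at x = 2 and x = 3
print(random_best)  # depends on the seed; falls below 0.25 when a draw lands in (2, 3)
```

The grid can never do better than 0.25 here, however many times it is run, while any random draw that lands between 2 and 3 beats it.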

📖 When the parameter space contains a distribution, the size of the global parameter space cannot be counted, because a continuous distribution contains infinitely many candidate values.
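A short sketch of what such a distribution looks like, using `scipy.stats.uniform` with the same bounds as the min_impurity_decrease case (the seed and the number of draws are arbitrary choices for illustration):

```python
from scipy.stats import uniform

# uniform(loc, scale) is continuous on [loc, loc + scale]: here [0, 50]
dist = uniform(loc=0, scale=50)

# Each draw is a real number, so there is no finite list of candidates to count
draws = dist.rvs(size=5, random_state=0)
print(draws)

# Lower and upper bound of the distribution
lower, upper = dist.ppf(0.0), dist.ppf(1.0)
print(lower, upper)
```

Note that the second argument is the width of the interval, not its upper bound: `uniform(10, 50)` would cover [10, 60].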

🗣 Case: searching min_impurity_decrease over a continuous distribution

📖 Effect of using a continuous distribution in random search
Compared with grid search over the same search space, it runs faster, and the RMSE from both the search and the rebuilt cross validation is slightly better;
Compared with small-space grid search, the running time is longer and the RMSE is slightly better;
Compared with large-space grid search, the running time is longer and the RMSE is slightly worse (the actual model effect is not necessarily worse).

Effect: continuous random grid > large-space random grid > random grid > grid search
Running time (longest first): grid search > continuous random grid > large-space random grid > random grid

When the global parameter space used in enumeration grid search is large enough and dense enough, the optimum found by enumeration grid search is the upper limit of what random grid search can find. Therefore, in theory, random grid search over that same space cannot get better results than enumeration grid search.
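For a discrete space this upper limit follows directly from the fact that random search only ever sees a subset of the grid. A small sketch, with a made-up loss table standing in for real cross validation scores:

```python
import random

# Toy loss table over a discrete grid (values are made up for illustration)
grid_losses = {(n, d): (n - 200) ** 2 / 1e4 + (d - 4) ** 2
               for n in range(50, 350, 50) for d in range(2, 7)}

# Enumeration grid search sees every point
grid_best = min(grid_losses.values())

# Random search sees only half of the points, drawn without replacement
rng = random.Random(0)
sampled_keys = rng.sample(sorted(grid_losses), k=len(grid_losses) // 2)
random_best = min(grid_losses[key] for key in sampled_keys)

# The minimum over a subset can never be lower than the minimum over the full set
print(grid_best, random_best, random_best >= grid_best)
```

Random search only matches the grid's optimum if the best point happens to be among the sampled ones; it can go beyond it only when it draws from a continuous distribution that contains points the grid does not.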
```
import scipy.stats

param_grid_simple={'n_estimators':range(50,150,10)
                   ,'max_depth':range(10,25,2)
                   ,'max_features':range(10,20,2)
                   ,'min_impurity_decrease':scipy.stats.uniform(0,50)} # continuous uniform distribution on [0, 50]

model=RFR(random_state=7)
cv=KFold(n_splits=5,shuffle=True,random_state=7)

search=RandomizedSearchCV(estimator=model
                          ,param_distributions=param_grid_simple
                          ,n_iter=600
                          ,scoring='neg_mean_squared_error'
                          ,cv=cv
                          ,random_state=7
                          ,n_jobs=4)

search.fit(X,y)

search.best_estimator_
# RandomForestRegressor(max_depth=18, max_features=16,
#                       min_impurity_decrease=34.80143424780533, random_state=7)

abs(search.best_score_)**0.5
# 29155.5402993104

rebuild_on_best_param(search.best_estimator_)
# train RMSE:10733.842
# test RMSE:28285.986
```

Keywords: Python Machine Learning

Added by Fed51 on Tue, 01 Mar 2022 14:05:02 +0200
