[Python] Using KFold in sklearn to implement cross-validation for a model

In the previous article, the data set was split in order at a fixed 3:7 ratio, which can lead to inaccurate results. This article therefore uses the KFold method in sklearn to perform cross-validation and make the results more reliable. Previous article -----> Python post-processing data format, run model (pycruise) - verify data validity ...
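A minimal sketch of what KFold cross-validation with sklearn looks like; the synthetic dataset and the logistic regression model are placeholders for illustration, not the data or model used in the article.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Shuffle before splitting so the folds are not taken in the original order
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores, scores.mean())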

Added by POGRAN on Sat, 12 Feb 2022 18:52:26 +0200

Three common hyperparameter tuning methods and their code

Hyperparameter optimization methods include grid search, random search, Bayesian optimization (BO), and other algorithms. Reference material: the three hyperparameter optimization methods explained in detail, with code implementations. Basic experimental code: import numpy as np import pandas as pd from lightgbm.sklearn import LGBMRegr ...
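A minimal sketch of the first two methods, grid search and random search, using scikit-learn only; Bayesian optimization requires an extra library (e.g. hyperopt or scikit-optimize) and is omitted. The random forest model and the parameter grids are illustrative assumptions, not the article's LightGBM setup.

from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=8, random_state=0)

# Grid search: exhaustively tries every combination in the grid
grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    {"n_estimators": [50, 100], "max_depth": [3, 5, None]},
                    cv=3).fit(X, y)

# Random search: samples a fixed number of candidates from distributions
rand = RandomizedSearchCV(RandomForestRegressor(random_state=0),
                          {"n_estimators": randint(50, 200), "max_depth": [3, 5, None]},
                          n_iter=10, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)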

Added by JOWP on Mon, 07 Feb 2022 20:20:17 +0200

Overview of feature engineering

Feature Engineering 1. Definitions 1.1 Why feature engineering is needed: the features in sample data may contain missing values, duplicate values, outliers, etc., so this noise in the features needs to be processed. Purpose of processing: to obtain a cleaner sample set so that the model can have better predictive abil ...
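A minimal sketch of the cleaning steps mentioned above (duplicates, missing values, outliers) with pandas; the tiny DataFrame and the age threshold are made up for illustration.

import pandas as pd

df = pd.DataFrame({"age": [21, 21, None, 35, 200],
                   "income": [3000, 3000, 4500, 5200, 4800]})

df = df.drop_duplicates()                          # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())   # fill missing values
df = df[df["age"] < 120]                           # drop an obviously abnormal value
print(df)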

Added by terandle on Thu, 03 Feb 2022 06:09:30 +0200

[Data preparation and feature engineering] Feature transformation

1. Feature digitization 1.1 The replace() function import pandas as pd df = pd.DataFrame({"gene_segA": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0], "gene_segB": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0], "hypertension": ["Y", 'N', 'N', 'N', 'N', 'N', 'Y', 'N', 'Y', 'N'], "Gallstones": ['Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'Y'] }) df df.replace({"N": 0, 'Y': ...
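A runnable completion of the truncated snippet above, assuming the replace() mapping simply converts the Y/N labels to 1/0.

import pandas as pd

df = pd.DataFrame({
    "gene_segA": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
    "gene_segB": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    "hypertension": ["Y", "N", "N", "N", "N", "N", "Y", "N", "Y", "N"],
    "Gallstones": ["Y", "N", "N", "N", "Y", "Y", "Y", "N", "N", "Y"],
})
df = df.replace({"N": 0, "Y": 1})   # digitize the categorical Y/N features
print(df)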

Added by croakingtoad on Wed, 02 Feb 2022 15:20:32 +0200

Machine learning powerhouse scikit-learn: a beginner-friendly introductory tutorial

Scikit-learn beginner-friendly introductory tutorial. Scikit-learn is a well-known Python machine learning library, widely used in data science fields such as statistical analysis and machine learning modeling. Powerful modeling: users can build all kinds of supervised and unsupervised learning models with scikit-learn. Various functions: a ...
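A tiny illustration of the point above: one supervised and one unsupervised model fit through the same scikit-learn API. The bundled iris dataset and the two models are placeholder choices, not taken from the tutorial.

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)              # supervised
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)    # unsupervised
print(clf.score(X, y), km.inertia_)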

Added by MattG on Sun, 30 Jan 2022 20:15:07 +0200

Clustering evaluation metrics in sklearn

Measuring the performance of a clustering algorithm is not as simple as counting errors or computing the precision and recall of a supervised classification algorithm; there are many evaluation metrics for clustering. This article is mainly based on the sklearn machine learning library, which provides a series of measu ...
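A minimal sketch of two of the metrics sklearn provides: one internal metric (silhouette, no ground truth needed) and one external metric (adjusted Rand index, which compares against true labels). The blob data and KMeans model are placeholders for illustration.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))           # internal: uses only the data and labels
print(adjusted_rand_score(y_true, labels))   # external: needs ground-truth labels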

Added by srini_r_r on Thu, 13 Jan 2022 07:26:02 +0200

Machine learning: random forests in sklearn

Contents: 1 Ensemble learning 2 Random forest classifier 2.1 The random forest classifier function and its parameters 2.2 Building a random forest 2.3 Comparing random forest and decision tree under cross-validation 2.4 Plotting the learning curve of n_estimators 3 Random forest regressor 3.1 The random forest regressor function and its ...
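A minimal sketch of the comparison in item 2.3 above: a random forest versus a single decision tree under cross-validation. The bundled wine dataset and the parameter values are illustrative assumptions, not the article's setup.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rf_scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=10)
dt_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(rf_scores.mean(), dt_scores.mean())   # the forest usually scores higher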

Added by xenooreo on Tue, 11 Jan 2022 14:22:59 +0200

sklearn-Section 6 (PCA)

1. Principal Component Analysis (PCA): ideas and principles 1.1 What is principal component analysis? PCA (Principal Component Analysis) is the most widely used data dimensionality reduction algorithm (an unsupervised machine learning method). Its main purpose is "dimensionality reduction", by disjunct ...
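A minimal sketch of dimensionality reduction with sklearn's PCA; reducing the 4-feature iris dataset to 2 principal components is an illustrative choice, not the article's example.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)                        # project onto 2 components
print(X_reduced.shape, pca.explained_variance_ratio_)  # variance kept by each component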

Added by mania on Sat, 01 Jan 2022 09:17:53 +0200

Explanation of feature extraction in Python scikit-learn

Feature extraction simply means converting a series of data into numerical features that can be used for machine learning. sklearn.feature_extraction is the feature extraction module of scikit-learn. This article summarizes the following: one-hot encoding, DictVectorizer usage, CountVectorizer usage, TfidfVectorizer usage, HashingVectorizer usage. 1. O ...
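A minimal sketch of two of the extractors listed above; the sample records and sentences are made up for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# DictVectorizer: one-hot encodes string fields, passes numeric fields through
records = [{"city": "Beijing", "temp": 30}, {"city": "Shanghai", "temp": 27}]
dv = DictVectorizer(sparse=False)
print(dv.fit_transform(records), dv.get_feature_names_out())

# TfidfVectorizer: turns raw text into TF-IDF weighted term features
docs = ["machine learning is fun", "feature extraction for machine learning"]
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray())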

Added by j4v1 on Wed, 29 Dec 2021 09:42:42 +0200

Ensemble learning: XGBoost

Rimeng Society AI: Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning in practice (updated irregularly). Ensemble learning: Bagging, random forest, Boosting, GBDT. Ensemble learning: XGBoost. Ensemble learning: lightGBM (I). Ensemble learning: lightGBM (II). 5.1 Principle of the XGBoost algorithm XGBoost (Extreme Gradient Boosting ...
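A minimal sketch of XGBoost through its scikit-learn interface, assuming the xgboost package is installed; the synthetic data and the parameter values are illustrative only.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees: n_estimators rounds of depth-limited trees
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))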

Added by Pikachu2000 on Tue, 28 Dec 2021 10:41:22 +0200