[Python] Using KFold in sklearn to implement cross-validation for a model
In the previous article, the dataset was split 3:7 in its original order, which can make the results inaccurate. This article therefore uses the KFold method in sklearn to implement cross-validation so that the results are more reliable (a minimal sketch follows this entry).
Previous article -----> Python post-processing of the data format to run the model (pycruise) - verifying data validity
...
Added by POGRAN on Sat, 12 Feb 2022 18:52:26 +0200
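The entry above replaces a fixed sequential split with KFold cross-validation. Below is a minimal sketch of that idea; the estimator, synthetic data, and scoring choice are placeholders, not the article's actual model or dataset.

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 5)  # placeholder features, not the article's data
y = np.random.rand(100)     # placeholder target

# shuffle=True avoids the bias of splitting the data in its original order
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="r2")
print(scores.mean(), scores.std())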
Three common hyperparameter tuning methods, with code
Hyperparameter optimization methods: grid search, random search, Bayesian optimization (BO), and other algorithms.
Reference material: Three hyperparameter optimization methods explained in detail, with code implementations
Basic code for the experiment (a sketch of grid and random search usage follows this entry)
import numpy as np
import pandas as pd
from lightgbm.sklearn import LGBMRegr ...
Added by JOWP on Mon, 07 Feb 2022 20:20:17 +0200
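A hedged sketch of two of the three tuning methods the entry above lists, grid search and random search, using LGBMRegressor as in the article's imports; the parameter grid and synthetic data are illustrative assumptions only.

import numpy as np
from lightgbm.sklearn import LGBMRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X = np.random.rand(200, 10)  # placeholder data
y = np.random.rand(200)

param_grid = {"n_estimators": [100, 300],
              "learning_rate": [0.05, 0.1],
              "num_leaves": [31, 63]}

# grid search tries every combination in the grid
grid = GridSearchCV(LGBMRegressor(), param_grid, cv=3, scoring="neg_mean_squared_error")
grid.fit(X, y)
print(grid.best_params_)

# random search samples a fixed number of combinations from the same space
rand = RandomizedSearchCV(LGBMRegressor(), param_grid, n_iter=5, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_)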
Overview of feature engineering
Feature engineering
1. Definitions
1.1 Why feature engineering is needed
The features in the sample data may contain missing values, duplicate values, outliers, and so on, so this noise in the features needs to be handled (a minimal cleaning sketch follows this entry)
Purpose of processing: obtain a cleaner sample set so that the model has better prediction abil ...
Added by terandle on Thu, 03 Feb 2022 06:09:30 +0200
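A minimal cleaning sketch for the noise types the overview above names (missing values, duplicates, outliers), using pandas; the column names and the 3-sigma rule are illustrative assumptions, not taken from the article.

import pandas as pd

df = pd.DataFrame({"age": [23, 25, None, 25, 200],
                   "income": [3000, 5200, 4100, 5200, 4800]})

df = df.drop_duplicates()                          # drop duplicate rows
df["age"] = df["age"].fillna(df["age"].median())   # fill missing values
# keep only values within 3 standard deviations of the mean (crude outlier rule)
df = df[(df["age"] - df["age"].mean()).abs() <= 3 * df["age"].std()]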
[Data preparation and feature engineering] Feature transformation
1. Feature digitization
1.1 The replace() function
import pandas as pd
df = pd.DataFrame({"gene_segA": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
                   "gene_segB": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
                   "hypertension": ["Y", 'N', 'N', 'N', 'N', 'N', 'Y', 'N', 'Y', 'N'],
                   "Gallstones": ['Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'Y']
                   })
df
df.replace({"N": 0, 'Y': ...
Added by croakingtoad on Wed, 02 Feb 2022 15:20:32 +0200
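A sketch completing the replace() idea shown in the excerpt above, mapping the Y/N labels to 1/0. The full mapping is an assumption based on how the excerpt's dictionary begins; the article's exact call may differ.

import pandas as pd

df = pd.DataFrame({"gene_segA": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
                   "gene_segB": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
                   "hypertension": ["Y", "N", "N", "N", "N", "N", "Y", "N", "Y", "N"],
                   "Gallstones": ["Y", "N", "N", "N", "Y", "Y", "Y", "N", "N", "Y"]})

df_num = df.replace({"N": 0, "Y": 1})  # every "Y" becomes 1, every "N" becomes 0
print(df_num)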
The machine learning powerhouse scikit-learn: a hand-holding introductory tutorial
A hand-holding introductory tutorial to scikit-learn
scikit-learn is a well-known Python machine learning library, widely used in data science work such as statistical analysis and machine learning modeling.
Unbeatable for modeling: users can implement all kinds of supervised and unsupervised learning models through scikit-learn (a minimal fit/predict sketch follows this entry). Rich functionality: a ...
Added by MattG on Sun, 30 Jan 2022 20:15:07 +0200
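A minimal sketch of the basic scikit-learn workflow the tutorial above introduces (split, fit, predict, score); the iris dataset and logistic regression are stand-ins chosen for brevity, not necessarily the tutorial's own examples.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)  # any estimator follows the same fit/predict API
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))         # accuracy on the held-out split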
Clustering evaluation metrics in sklearn
Measuring the performance of a clustering algorithm is not as simple as counting the number of errors or computing the precision and recall of a supervised classification algorithm. Clustering algorithms have many evaluation metrics. This article is mainly based on the sklearn machine learning library, which provides a series of measu ... (a short metric sketch follows this entry)
Added by srini_r_r on Thu, 13 Jan 2022 07:26:02 +0200
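A short sketch of two of the clustering metrics sklearn provides, in the spirit of the entry above: the silhouette score (no ground truth needed) and the adjusted Rand index (ground truth needed). The toy data and number of clusters are assumptions.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))          # internal metric: cohesion vs. separation
print(adjusted_rand_score(y_true, labels))  # external metric: agreement with true labels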
Machine learning with sklearn: random forests
Contents
1 Ensemble learning
2 Random forest classifier
2.1 The random forest classifier function and its parameters
2.2 Building a random forest
2.3 Comparing a random forest and a decision tree under cross-validation (sketched after this entry)
2.4 Plotting the learning curve over n_estimators
3 Random forest regressor
3.1 The random forest regressor function and its ...
Added by xenooreo on Tue, 11 Jan 2022 14:22:59 +0200
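A hedged sketch of items 2.3 and 2.4 in the outline above: comparing a random forest with a single decision tree under cross-validation, then scoring the forest for several n_estimators values. The wine dataset and parameter ranges are assumptions, not the article's.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

rf_score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10).mean()
dt_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10).mean()
print(rf_score, dt_score)

# coarse learning curve over n_estimators
for n in [10, 50, 100, 200]:
    score = cross_val_score(RandomForestClassifier(n_estimators=n, random_state=0), X, y, cv=5).mean()
    print(n, score)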
sklearn-Section 6 (PCA)
1. Principal component analysis (PCA): idea and principle
1.1 What is principal component analysis
PCA (Principal Component Analysis) is the most widely used data dimensionality reduction algorithm (an unsupervised machine learning method; a minimal sketch follows this entry).
Its main purpose is to "reduce dimensionality", by disjunct ...
Added by mania on Sat, 01 Jan 2022 09:17:53 +0200
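A minimal sketch of PCA as a dimensionality-reduction step, matching the entry above; the iris data and n_components=2 are illustrative choices only.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)             # keep the two directions of largest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component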
Explanation of feature extraction in Python scikit-learn
Feature extraction simply means converting raw data into numerical features that can be used for machine learning. sklearn.feature_extraction is the scikit-learn module for feature extraction.
This article summarizes the following topics (a short vectorizer sketch follows this entry):
One-hot encoding, DictVectorizer usage, CountVectorizer usage, TfidfVectorizer usage, HashingVectorizer usage
1.O ...
Added by j4v1 on Wed, 29 Dec 2021 09:42:42 +0200
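A short sketch of two of the vectorizers listed above, CountVectorizer and TfidfVectorizer; the toy corpus is an assumption for illustration only.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["machine learning is fun",
          "feature extraction turns text into numbers"]

counts = CountVectorizer().fit_transform(corpus)  # raw term counts
tfidf = TfidfVectorizer().fit_transform(corpus)   # counts reweighted by tf-idf

print(counts.shape, tfidf.shape)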
Ensemble learning: XGBoost
Rimeng Society
AI: Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning in practice (updated irregularly)
Ensemble learning: Bagging, random forests, Boosting, GBDT
Ensemble learning: XGBoost
Ensemble learning: lightGBM (I)
Ensemble learning: lightGBM (II)
5.1 The principle of the XGBoost algorithm (a minimal usage sketch follows this entry)
XGBoost (Extreme Gradient Boosting ...
Added by Pikachu2000 on Tue, 28 Dec 2021 10:41:22 +0200
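A minimal usage sketch of XGBoost through its scikit-learn wrapper, in the spirit of the entry above; the synthetic regression data and hyperparameter values are placeholder choices, not the article's.

from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=4)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out split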