In the previous article, the data set was split into 37 points in order, which led to inaccurate results. Therefore, this article uses the KFold method in sklearn to implement cross-validation, making the results more accurate.
Previous article: Python post-processing data format run model (pycruise) - verify data validity
Added by POGRAN on Sat, 12 Feb 2022 18:52:26 +0200
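The KFold split described above can be sketched as follows; this is a minimal illustration with toy data, not the article's own data set:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

# Each of the 5 folds serves once as the validation set while the
# remaining folds form the training set; shuffle avoids an in-order split.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
```

Averaging the model's score over all five folds gives a far more stable estimate than a single in-order split.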
Hyperparameter optimization methods: grid search, random search, Bayesian Optimization (BO), and other algorithms.
Reference material: three hyperparameter optimization methods explained in detail, with code implementations
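As a sketch of the first of these methods, grid search exhaustively evaluates every parameter combination with cross-validation; the estimator and parameter grid below are illustrative choices (RandomForestRegressor as a stand-in), not the article's:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Every combination in param_grid is scored with 3-fold cross-validation;
# the best-scoring combination is kept in best_params_.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Random search (`RandomizedSearchCV`) follows the same API but samples combinations instead of enumerating them, which scales better to large grids.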
Experimental basic code
import numpy as np
import pandas as pd
from lightgbm.sklearn import LGBMRegressor ...
1.1 Why feature engineering is needed
The features in the sample data may contain missing values, duplicate values, outliers, and so on, so this noise in the features needs to be processed.
Purpose of processing: to obtain a cleaner sample set, so that the model has better predictive abil ...
Added by terandle on Thu, 03 Feb 2022 06:09:30 +0200
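The three kinds of noise mentioned above (missing values, duplicates, outliers) can be handled in a few lines of pandas; this is a minimal sketch on made-up data, and the outlier bounds are an assumed illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [170, 165, np.nan, 165, 999],
                   "weight": [65, 58, 70, 58, 72]})

df = df.drop_duplicates()                 # remove duplicate rows
df = df.dropna()                          # remove rows with missing values
df = df[df["height"].between(100, 250)]   # crude outlier filter (assumed bounds)
print(df)
```

In practice you might impute missing values (e.g. `SimpleImputer`) instead of dropping rows, but the cleaning steps map one-to-one to the noise types listed above.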
Scikit-learn nanny-level introductory tutorial
Scikit-learn is a well-known Python machine learning library, widely used in data science fields such as statistical analysis and machine learning modeling.
Versatile modeling: users can implement various supervised and unsupervised learning models through scikit-learn. Various functions: a ...
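The supervised/unsupervised breadth mentioned above can be seen in a minimal sketch: the same estimator API (`fit`, then `score` or `predict`) covers both families. The specific models here are illustrative picks:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: fit a classifier on labelled data.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))

# Unsupervised: cluster the same data without using the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```

Every scikit-learn estimator follows this same fit/transform/predict convention, which is what makes the library approachable for beginners.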
Measuring the performance of a clustering algorithm is not simply a matter of counting errors or computing the precision and recall used for supervised classification algorithms. There are many evaluation metrics for clustering. This article is mainly based on the sklearn machine learning library, which provides a series of measu ...
Added by srini_r_r on Thu, 13 Jan 2022 07:26:02 +0200
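Two of sklearn's clustering metrics illustrate the distinction drawn above: one needs no ground truth at all, the other compares against known labels. A minimal sketch on synthetic blobs:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Silhouette score needs no ground truth: it measures how cohesive each
# cluster is relative to its separation from the others (range -1..1).
print("silhouette:", silhouette_score(X, labels))

# Adjusted Rand index compares predicted labels against known ground
# truth, corrected for chance agreement (1.0 = perfect match).
print("ARI:", adjusted_rand_score(y_true, labels))
```

This is exactly why clustering cannot be scored like classification: the cluster IDs themselves are arbitrary, so metrics must be either label-free or permutation-invariant.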
1 Ensemble learning
2 Random forest classifier
2.1 Random forest classifier function and its parameters
2.2 Construction of a random forest
2.3 Comparison of random forest and decision tree under cross-validation
2.4 Drawing the learning curve of n_estimators
3 Random forest regressor
3.1 Random forest classifier function and its ...
Added by xenooreo on Tue, 11 Jan 2022 14:22:59 +0200
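Section 2.3's comparison of a random forest against a single decision tree under cross-validation can be sketched as follows; the breast-cancer data set here is an illustrative choice, not necessarily the one used in the article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# A forest of 100 trees vs a single decision tree, each scored
# with 5-fold cross-validation.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
dt = DecisionTreeClassifier(random_state=0)

rf_score = cross_val_score(rf, X, y, cv=5).mean()
dt_score = cross_val_score(dt, X, y, cv=5).mean()
print(f"random forest: {rf_score:.3f}, decision tree: {dt_score:.3f}")
```

Looping this over a range of `n_estimators` values and plotting the mean score gives the learning curve described in section 2.4.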
1. Principal Component Analysis (PCA): Idea and Principle
1.1 What is principal component analysis
PCA (Principal Component Analysis) is the most widely used data dimensionality reduction algorithm (an unsupervised machine learning method).
Its main purpose is "dimensionality reduction", by disjunct ...
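The dimensionality reduction described above is a few lines with sklearn's PCA; this minimal sketch projects the 4-dimensional iris data onto its first two principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4-dimensional data onto the 2 directions of maximum variance.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print(X2.shape)                          # the reduced data
print(pca.explained_variance_ratio_)     # variance captured by each component
```

`explained_variance_ratio_` shows how much of the original variance each component retains, which is the usual way to decide how many components to keep.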
Feature extraction simply means converting a series of data into numerical features that can be used for machine learning. sklearn.feature_extraction is scikit-learn's feature extraction module.
This article summarizes the following contents:
One-hot encoding
DictVectorizer usage
CountVectorizer usage
TfidfVectorizer usage
HashingVectorizer usage