Datawhale Data Mining for Beginners, Task 5: Model Fusion
5, Model fusion
Competition: Data Mining for Beginners - Used Car Transaction Price Prediction
5.1 Model fusion objectives
Fuse the models obtained after several rounds of parameter tuning, combining multiple models into one final prediction.
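One of the simplest ways to fuse tuned regression models is to average their predictions, optionally with weights. The sketch below is only an illustration: the function name `weighted_average` and the three prediction lists are hypothetical and do not come from the original post, which may use a different fusion method (for example stacking).

```python
def weighted_average(predictions, weights=None):
    """Fuse per-model predictions by (weighted) averaging.

    predictions: list of equal-length lists, one list per model.
    weights: optional list with one non-negative weight per model.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models  # plain average
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical price predictions from three tuned models for four cars.
model_a = [1000.0, 2000.0, 3000.0, 4000.0]
model_b = [1200.0, 1800.0, 3300.0, 3900.0]
model_c = [1100.0, 2200.0, 2700.0, 4100.0]

fused = weighted_average([model_a, model_b, model_c])
# Trust model_a twice as much as the other two.
fused_weighted = weighted_average([model_a, model_b, model_c], weights=[2, 1, 1])
```

In practice the weights would usually be chosen from validation scores rather than by hand.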
5.2 Content introduction
...
Added by pbs on Sat, 19 Feb 2022 19:23:15 +0200
Classification of film reviews using naive Bayes
1. Dataset description:
The dataset is a subset of the IMDB movie review dataset and has already been split into a training set and a test set of 25,000 reviews each. The dataset has been preprocessed to convert the specific word sequence of each rev ...
Added by thestars on Sat, 19 Feb 2022 08:24:35 +0200
A record of a data analysis: child vision data
Recently I analyzed a child's vision data and recorded the process here. Readers who need the data can download it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import os
import seaborn as sns
import scipy.stats as ss
plt.rcParams['font.family'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
res_dir = ...
Added by fatmart on Tue, 15 Feb 2022 22:41:07 +0200
Ensemble learning case 2 (steam volume prediction)
Background
The basic principle of thermal power generation is as follows: burning fuel heats water to produce steam, the steam pressure drives a steam turbine, and the turbine in turn drives a generator to produce electricity. In this series of energy conversions, the core affecting the power ge ...
Added by dgrinberg on Fri, 11 Feb 2022 23:02:01 +0200
Preprocessing of time series data
Time series data is everywhere. Before a time series can be analyzed, the data must be preprocessed, and the preprocessing techniques used have a significant impact on the accuracy of the resulting models. In this article, we will mainly discuss the following points:
Definition and importance of time series data.
Preprocessing steps of time series ...
Added by gillypogi on Fri, 11 Feb 2022 17:10:19 +0200
8 Python libraries that can improve the efficiency of data science and save valuable time
In data science, you can waste a lot of time writing code and waiting for the computer to run something. So I have picked some Python libraries that can help you save valuable time.
1. Optuna
Optuna is an open-source hyperparameter optimization framework that can automatically find the best hyperparameters for machine learning models. The most basic (and po ...
Added by Horatiu on Fri, 11 Feb 2022 08:29:27 +0200
Exploratory Data Analysis (EDA) with Python
With deep respect to the Python community for their dedication and wisdom.
Dataset related:
First, the UCI wine dataset:
The UCI datasets are commonly used standard test datasets for machine learning. It is a database for machine learning proposed by the University of ...
Added by iacataca on Thu, 10 Feb 2022 05:31:49 +0200
Hadoop + Spark big data analysis: Hadoop cluster setup
Table of contents
Preface
1, Download and configuration of the cluster environment
1. Download Hadoop
2. Configure Hadoop environment variables
Configure the Hadoop core environment
Configure core-site.xml
Configure hdfs-site.xml
Configure mapred-site.xml
Configure yarn-site.xml
Configure workers
Disable the firewall
2, Clone ...
Added by jonniejoejonson on Tue, 08 Feb 2022 05:25:06 +0200
Crawling the movie reviews of Water Gate Bridge to generate data visualizations and word clouds
1, Crawling Movie Reviews
To analyze data on the Renyin-year Spring Festival film "The Battle at Lake Changjin II: Water Gate Bridge", more than 40,000 reviews were crawled from Maoyan with a web crawler.
1. To keep the IP address from being banned, a proxy address pool is used for crawling:
To set the proxy add ...
Added by ugh82 on Sun, 06 Feb 2022 21:53:19 +0200
Collaborative filtering based on a regression model (stochastic gradient descent + alternating least squares optimization)
Collaborative filtering based on a regression model
If we treat the rating as a continuous value rather than a discrete one, we can predict a target user's rating of an item using the idea of linear regression. One simple implementation strategy is called Baseline.
1. Baseline: benchmark prediction
The Baseline design idea ...
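The excerpt is cut off before the design is spelled out, so the following is only a rough sketch of one common baseline formulation, assumed here rather than taken from the post: the predicted rating is the global mean plus a per-user bias and a per-item bias, with the biases fitted by stochastic gradient descent. The function names, hyperparameters, and toy ratings are all hypothetical.

```python
def fit_baseline(ratings, n_epochs=20, lr=0.05, reg=0.02):
    """Fit r_hat(u, i) = mu + b_u + b_i with SGD (assumed formulation).

    ratings: list of (user, item, rating) tuples.
    Returns the global mean and the user/item bias dictionaries.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            # Gradient step on the squared error with L2 regularization.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
    return mu, b_u, b_i


def predict(mu, b_u, b_i, user, item):
    """Predict a rating; unseen users or items fall back to a zero bias."""
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


# Hypothetical toy ratings: u1 rates generously, u2 rates harshly.
toy_ratings = [
    ("u1", "a", 5.0),
    ("u1", "b", 4.0),
    ("u2", "a", 2.0),
    ("u2", "b", 1.0),
]
mu, b_u, b_i = fit_baseline(toy_ratings)
```

The title also mentions alternating least squares; that variant would replace the per-rating updates with closed-form updates of all user biases and then all item biases in turn.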
Added by snap2000 on Fri, 04 Feb 2022 09:19:13 +0200