Datawhale zero-basics introduction to data mining, Task 5: model fusion

5. Model fusion. Competition title: Zero-basics introduction to data mining - used car transaction price prediction. 5.1 Model fusion objectives: fuse the models obtained after several rounds of parameter tuning, and complete the fusion of multiple models. 5.2 Content introduction ...

Added by pbs on Sat, 19 Feb 2022 19:23:15 +0200

Classification of film reviews using naive Bayes

1. Data set explanation: The data set is a subset of the IMDB movie data set and has already been divided into a training set and a test set. The training set contains 25,000 movie reviews, and the test set contains another 25,000. The data set has been preprocessed to convert the specific word sequence of each rev ...

Added by thestars on Sat, 19 Feb 2022 08:24:35 +0200

A record of a data analysis process: children's vision data

Recently, I analyzed a set of children's vision data and recorded the process. Readers who need the data can download it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import os
import seaborn as sns
import scipy.stats as ss
plt.rcParams['font.family'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
res_dir = ...

Added by fatmart on Tue, 15 Feb 2022 22:41:07 +0200

Ensemble learning case 2 (steam volume prediction)

Background: The basic principle of thermal power generation is as follows: burning fuel heats water to generate steam, the steam pressure drives the steam turbine, and the turbine in turn drives the generator to produce electricity. In this series of energy conversions, the core affecting the power ge ...

Added by dgrinberg on Fri, 11 Feb 2022 23:02:01 +0200

Preprocessing of time series data

Time series data can be seen everywhere. To analyze a time series, we must first preprocess the data. Time series preprocessing techniques have a significant impact on the accuracy of data modeling. In this article, we will mainly discuss the following points: Definition and importance of time series data. Preprocessing steps of time series ...

Added by gillypogi on Fri, 11 Feb 2022 17:10:19 +0200

8 Python libraries that can improve the efficiency of data science and save valuable time

In data science, you may waste a lot of time coding and waiting for the computer to run something. So I chose some Python libraries that can help you save valuable time. 1. Optuna: Optuna is an open source hyperparameter optimization framework that can automatically find the best hyperparameters for machine learning models. The most basic (and po ...

Added by Horatiu on Fri, 11 Feb 2022 08:29:27 +0200

Exploratory Data Analysis (EDA) with Python

With deep respect to the Python community for their dedication and wisdom. Dataset related: First, the UCI wine dataset: the UCI data set is a commonly used standard test data set for machine learning. It is a database for machine learning proposed by the University of ...

Added by iacataca on Thu, 10 Feb 2022 05:31:49 +0200

Hadoop + Spark big data analysis: Hadoop cluster setup

Table of contents: Preface. 1. Download and configuration of the cluster environment: 1. Download Hadoop; 2. Configure Hadoop environment variables; configure the Hadoop core environment; configure core-site.xml; configure hdfs-site.xml; configure mapred-site.xml; configure yarn-site.xml; configure workers; disable the firewall. 2. Clone ...

Added by jonniejoejonson on Tue, 08 Feb 2022 05:25:06 +0200

Crawling the movie reviews of "Water Gate Bridge" to generate data visualizations and word clouds

1. Crawling movie reviews. To analyze data on the Renyin-year Spring Festival film "Changjin Lake - Shuimen Bridge", more than 40,000 film reviews were crawled from Maoyan using a web crawler. 1. To prevent the address from being banned, a proxy address pool is used for crawling: To set the proxy add ...

Added by ugh82 on Sun, 06 Feb 2022 21:53:19 +0200

Collaborative filtering based on regression model (stochastic gradient descent + alternating least squares optimization)

Treating the rating as a continuous value rather than a discrete one, we can predict the target user's rating of an item using the idea of linear regression. One basic implementation strategy is called Baseline. 1. Baseline: benchmark prediction. The Baseline design idea ...

Added by snap2000 on Fri, 04 Feb 2022 09:19:13 +0200