[Several dataset sampling methods]

PyTorch Sampler: When training a neural network on a dataset too large to feed into the network all at once, the data must be read in batches, which raises the question of how samples are drawn from the dataset. The PyTorch framework provides a Sampler base class and several subclasses that implement data sampling in ...

Added by lth2h on Wed, 02 Mar 2022 15:14:19 +0200
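For the Sampler entry above, here is a minimal, dependency-free sketch of what a sampler does: it yields dataset indices, and a batch sampler groups those indices into batches. The function names `sequential_indices`, `random_indices` and `batch_indices` are hypothetical plain-Python stand-ins for the roles of PyTorch's `SequentialSampler`, `RandomSampler` and `BatchSampler`; they are not the PyTorch API itself.

```python
import random
from typing import Iterator, List, Sequence

# Hypothetical plain-Python stand-ins for the roles of PyTorch's
# SequentialSampler, RandomSampler and BatchSampler. They only yield
# indices; a data loader would use these indices to fetch the samples.

def sequential_indices(n: int) -> Iterator[int]:
    """Yield 0, 1, ..., n-1 in order."""
    yield from range(n)

def random_indices(n: int, seed: int = 0) -> Iterator[int]:
    """Yield a random permutation of 0..n-1 (sampling without replacement)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    yield from order

def batch_indices(indices: Iterator[int], batch_size: int,
                  drop_last: bool = False) -> Iterator[List[int]]:
    """Group a stream of indices into lists of length batch_size."""
    batch: List[int] = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A final, shorter batch is kept unless drop_last is set.
    if batch and not drop_last:
        yield batch

if __name__ == "__main__":
    data: Sequence[str] = [f"sample_{i}" for i in range(10)]
    for batch in batch_indices(sequential_indices(len(data)), batch_size=4):
        print(batch)  # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
    for batch in batch_indices(random_indices(len(data)), batch_size=4, drop_last=True):
        print([data[i] for i in batch])
```

Separating "which index comes next" from "how indices are grouped" is what lets one batching step work with either ordering strategy.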

NLP: Text Clustering [PCA --> K-means]

What is text clustering? Text clustering converts raw natural-language text into numerical form, representing each document as a point in a high-dimensional space. By computing the distances between these points, nearby points are grouped into clusters, and the center of each cluster is called the cluster ...

Added by pbsperry on Tue, 22 Feb 2022 20:49:19 +0200
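For the text-clustering entry above, the sketch below shows the core idea in plain Python: turn each text into a point (a word-count vector), then run K-means, alternating between assigning points to the nearest center and moving each center to the mean of its members. It is only an illustration under stated assumptions: it skips the PCA step the entry's title mentions, uses simple bag-of-words counts, and seeds the centers with the first k points for reproducibility; `bag_of_words` and `kmeans` are hypothetical names, not a library API.

```python
import math
from typing import Dict, List

Vector = List[float]

def bag_of_words(docs: List[str]) -> List[Vector]:
    """Map each document to a word-count vector over the shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index: Dict[str, int] = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors

def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Return a cluster label for each point."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    # Assumption: the first k points are the initial centers (deterministic;
    # practical implementations use smarter seeding such as k-means++).
    centers = [p[:] for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: distance(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its member points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

if __name__ == "__main__":
    docs = ["cat sat on the mat", "stock market falls",
            "dog sat on the mat", "stock prices rise in market"]
    print(kmeans(bag_of_words(docs), k=2))  # [0, 1, 0, 1]
```

In a fuller pipeline, a dimensionality-reduction step such as PCA would sit between `bag_of_words` and `kmeans`, shrinking the sparse, high-dimensional vectors before distances are computed.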

5-minute NLP: a summary of three pretrained libraries for quickly implementing NER

Among NLP tasks for automatic text understanding, named entity recognition (NER) is a fundamental one. An NER model identifies named entities in a text corpus, such as person names, organizations, locations, and languages. NER models can be used to understand the meaning of a sentence or phrase. They can recognize the words that ...

Added by Trafalger on Mon, 21 Feb 2022 03:23:06 +0200

Workflow of the Rasa chatbot framework

Overview of the chat process (Rasa bot): Rasa is a framework for building conversational bots (dialogue systems). Dialogue systems built with Rasa can serve a range of industrial intelligent voice-service scenarios, such as telemedicine consultation, intelligent customer service, insurance product sales, financial co ...

Added by cheerio on Wed, 16 Feb 2022 13:02:43 +0200

Automatic Title Generation Based on BERT

🐱 Text title generation based on BERT: A good title skillfully distills the content of an article and quickly arouses readers' interest. To generate news headlines quickly and accurately, this project uses the classic BERT model to generate them automatically. This project refer ...

Added by sethi_kapil on Sun, 13 Feb 2022 08:37:06 +0200

[Python] Using KFold in sklearn to implement cross-validation for a model

In the previous article, the dataset was split 3:7 in its original order, which can make the results inaccurate. This article therefore uses sklearn's KFold to perform cross-validation and obtain more reliable results. Previous article: Python post processing data format run model (pycruise) - verify data validity ...

Added by POGRAN on Sat, 12 Feb 2022 18:52:26 +0200
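For the KFold entry above, here is a small stdlib sketch of how k-fold cross-validation partitions a dataset: every sample lands in exactly one test fold, and each fold's model is scored on data it did not train on. `kfold_indices` is a hypothetical re-implementation for illustration; with no shuffling, its test folds should follow the same contiguous layout as `sklearn.model_selection.KFold(n_splits=k).split(X)`, where the first `n % k` folds get one extra sample. The "predict the training mean" baseline in `cross_val_mse` is an assumed stand-in for a real model.

```python
from typing import Iterator, List, Tuple

def kfold_indices(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of n_splits contiguous folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    # The first n_samples % n_splits folds receive one extra sample.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_val_mse(y: List[float], n_splits: int) -> float:
    """Average test MSE of a hypothetical baseline that predicts the training mean."""
    scores = []
    for train, test in kfold_indices(len(y), n_splits):
        mean = sum(y[i] for i in train) / len(train)
        scores.append(sum((y[i] - mean) ** 2 for i in test) / len(test))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    for train, test in kfold_indices(10, 3):
        print("test fold:", test)  # [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
```

If the data are sorted in a meaningful order (the problem with the ordered 3:7 split), contiguous folds inherit that order; sklearn's `KFold(shuffle=True, random_state=...)` shuffles indices before folding to avoid this.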

[Algorithms] Word statistics on Bob Dylan's lyrics using a hash table

1. Overview: Bob Dylan is a great American poet and songwriter whose work has contributed greatly to American culture and to world culture as a whole. This article uses NLTK to extract nouns from Bob Dylan's lyrics, stores the words in a hash table to compute word-frequency statistics, and visualizes the high-frequency words using ...

Added by zimick on Sat, 12 Feb 2022 02:12:22 +0200
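For the lyrics entry above, the hash-table counting step can be sketched with a plain Python dict, which is a hash table mapping each word to its count. This is an assumption-laden simplification: it tokenizes with a regular expression and filters a small hand-written stopword set instead of using NLTK to extract nouns, and the sample text is made up, not taken from any lyrics. `word_frequencies` and `top_n` are hypothetical names.

```python
import re
from typing import Dict, List, Set, Tuple

def word_frequencies(text: str, stopwords: Set[str]) -> Dict[str, int]:
    """Count lowercase words with a dict (hash table): word -> occurrences."""
    counts: Dict[str, int] = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in stopwords:
            # Average O(1) lookup and update per word.
            counts[word] = counts.get(word, 0) + 1
    return counts

def top_n(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Most frequent words first; ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

if __name__ == "__main__":
    text = "Rain falls on the road, the road runs to the sea, the sea meets the rain"
    counts = word_frequencies(text, stopwords={"the", "on", "to"})
    print(top_n(counts, 3))  # [('rain', 2), ('road', 2), ('sea', 2)]
```

The resulting (word, count) pairs are the kind of input a frequency chart or word cloud would consume in the visualization step.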

Word2vec (Skip-gram and CBOW) - PyTorch

A word vector is a vector that represents the meaning of a word; it can also be viewed as the word's feature vector. The technique of mapping words to real-valued vectors is called word embedding. 1. Word embedding (word2vec): One-hot vectors cannot accurately express the similarity between different words. word2vec is proposed to so ...

Added by j0n on Fri, 11 Feb 2022 13:14:28 +0200
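For the word2vec entry above, the sketch below illustrates two of its points in plain Python. First, the cosine similarity between one-hot vectors of two different words is always 0, so one-hot encoding carries no notion of similarity. Second, skip-gram and CBOW build training examples from a sliding context window in opposite directions: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its context. Only data preparation is shown (no embedding training), and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

def one_hot(index: int, size: int) -> List[float]:
    """Vector of zeros with a single 1.0 at position index."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def context_positions(i: int, n: int, window: int) -> List[int]:
    """Positions within `window` of i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]

def skip_gram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """(center, context) pairs: skip-gram predicts context words from the center."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_positions(i, len(tokens), window)]

def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """(context words, center) examples: CBOW predicts the center from its context."""
    return [([tokens[j] for j in context_positions(i, len(tokens), window)], tokens[i])
            for i in range(len(tokens))]

if __name__ == "__main__":
    print(cosine(one_hot(0, 5), one_hot(3, 5)))  # 0.0
    tokens = ["the", "man", "loves", "his", "son"]
    print(skip_gram_pairs(tokens, window=1)[:3])  # [('the', 'man'), ('man', 'the'), ('man', 'loves')]
    print(cbow_examples(tokens, window=1)[2])     # (['man', 'his'], 'loves')
```

In a PyTorch implementation these pairs would be mapped to integer word indices and fed to an embedding layer, whose learned vectors (unlike one-hot vectors) can place related words close together.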

Attention Is All You Need: paper notes and PyTorch code notes

References: Li Mu's paper-reading session and PyTorch code. Points I did not understand: residual network, position-wise, layer norm, encoder attention. Parameter settings (dimensions): d_model = 512 # dimension of the sub-layers, embedding layers and outputs (kept equal so that the residual connection's addition works); d_inner_hid = 2048 # dimension of the feed-forward (MLP) layer ...

Added by lyasian on Wed, 09 Feb 2022 17:29:29 +0200
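For the Transformer notes above, the attention used in the encoder is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The pure-Python sketch below computes it for tiny matrices so each step is visible; it assumes toy dimensions (d_k = 2 rather than the d_model = 512 mentioned in the notes), omits masking and the multi-head projections, and uses hypothetical helper names rather than any PyTorch API.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, attention weights) for queries q, keys k, values v."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # one row of query-key similarities per query
    # Scaling by sqrt(d_k) keeps large dot products from saturating the softmax.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output row is a weighted mix of value rows

if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(q, k, v)
    print(w)    # each row sums to 1; the matching key gets the larger weight
    print(out)
```

Because each output row keeps the value dimension, stacking this layer with a residual connection only works when input and output widths match, which is why the notes keep d_model the same across sub-layers.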

PaddlePaddle regular season: Chinese News Text Title Classification - first-place solution for December

Regular season: Chinese News Text Title Classification. I. Solution overview. 1.1 Competition introduction: Text classification uses a computer to automatically classify and label a set of texts (or other entities or objects) according to a given classification system or standard. This competition targets news title text cla ...

Added by scofansnags on Thu, 03 Feb 2022 08:07:42 +0200