Film recommendation system (Xiamen University Database Laboratory version)
Resource address: http://dblab.xmu.edu.cn/post/movierecommend/
Project introduction
1. Recommendation system
Discover the potential needs of users according to their historical data.
2. Long-tail goods
Popular goods represent the general needs of users, while long-tail goods represent the personalized need ...
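As a rough illustration of the popular versus long-tail distinction, the sketch below splits items by how many interactions they received. It is not taken from the project itself: the function name split_head_and_tail, the (user_id, item_id) input shape, and the 80% cutoff are all assumptions chosen for the example.

```python
from collections import Counter


def split_head_and_tail(interactions, head_share=0.8):
    """Split items into popular ("head") and long-tail groups.

    interactions: iterable of (user_id, item_id) pairs (hypothetical shape).
    head_share: assumed cutoff; items are taken in descending order of
    interaction count until they cover this share of all interactions.
    """
    counts = Counter(item for _, item in interactions)
    total = sum(counts.values())
    head, tail = [], []
    covered = 0
    for item, n in counts.most_common():
        if covered < head_share * total:
            # Still below the cutoff: this item counts as popular.
            head.append(item)
            covered += n
        else:
            # Everything after the cutoff is treated as long tail.
            tail.append(item)
    return head, tail
```

A recommender that only ranks by popularity would keep surfacing the head group; reaching the tail group requires looking at each user's own history.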
Added by linusx007 on Sun, 23 Jan 2022 18:04:39 +0200
Spark Development Learning: using IDEA to develop Spark applications
Spark Learning: using IDEA to develop Spark applications
This article assumes JDK 1.8, with the IntelliJ IDEA development tool and Maven already configured.
Background
The Spark service has been deployed on a remote CentOS server, but the code of the Spark-based application is developed in the local IDEA, so how to make the locally developed Spark code ...
Added by jon2396 on Thu, 20 Jan 2022 19:55:28 +0200
Spark performance optimization guide - approach
Preface
Spark job optimization is a broad topic: a job may simply be "slow", yet the right fix differs from case to case. I want to lay out every aspect of optimization so that an overall optimization plan can be worked out systematically.
Sorting out optimization ideas
How should a so-called slow job be approached? I sorted the issues as follows:
themeresou ...
Added by jber on Fri, 14 Jan 2022 22:46:36 +0200
org.apache.spark.SparkException: Task not serializable
Preface
This article belongs to the column "Spark exception summary" and is the author's original work. Please cite the source when quoting it, and point out any deficiencies or errors in the comment area. Thank you!
Please refer to "Spark exception summary" for the directory structure and references of this column.
Main text
If ...
Added by johnska7 on Fri, 14 Jan 2022 14:37:54 +0200
How to invoke a Python script in a Spark Scala/Java application
Abstract: This article introduces how to call a Python script from a Spark Scala program; the procedure for a Spark Java program is basically the same.
This article is shared from the Huawei Cloud community post "[Spark] how to invoke Python script in Spark Scala/Java application", author: little rabbit 615.
1. PythonRunner
For programs run ...
Added by MrRosary on Thu, 13 Jan 2022 09:18:20 +0200
RPC architecture of Spark source code
1. Overview
Many parts of Spark involve network communication, such as message exchange between Spark components, upload of user files and JAR packages, shuffle data transfer between nodes, and copy and backup of Block data. Before Spark 1.6, Spark RPC was implemented on Akka, which is an asynchronous message ...
Added by ploppy on Tue, 11 Jan 2022 01:14:43 +0200
Big data - Summary of common Spark RDD operators
The core of Spark is built on a single abstraction, the Resilient Distributed Dataset (RDD), which lets the components of Spark integrate seamlessly and complete big data processing within the same application.
1. Basic concepts of RDD
The RDD is the most important abstraction provided by Spark. It is a special data set with fault-tolerant mechani ...
Added by cainfool on Mon, 10 Jan 2022 21:37:05 +0200
Teach you how to call a Python script in a Spark Scala/Java application
Abstract: This article introduces how to call a Python script from a Spark Scala program; the procedure for a Spark Java program is basically the same.
This article is shared from the Huawei Cloud community post "[Spark] how to invoke Python script in Spark Scala/Java application", author: little rabbit 615.
1. PythonRunner
For programs run ...
Added by billynastie on Mon, 10 Jan 2022 04:18:39 +0200
Spark SQL basics: DataFrame, Dataset
Spark SQL
Overview
Spark SQL is the Spark module for structured data processing.
For developers, Spark SQL simplifies RDD development, improves development efficiency, and executes quickly, so it is what is mostly used in practical work. In order to simplify the development of RDD and impr ...
Added by Asnom on Thu, 06 Jan 2022 08:03:44 +0200
[Big data framework and practice] - Chapter 1 Spark basic course
Section 1 Introduction to Spark
1. What is Spark?
1. Apache Spark is a unified computing engine and a set of libraries. Processing data with Spark is said to be up to 100 times faster than the traditional way.
2. This does not mean that Spark is 100 times faster than Python on a single computer; rather, Spark is mainly used for parallel data processing on c ...
Added by hotcigar on Wed, 05 Jan 2022 08:29:00 +0200