[Mastering Spark series] Is getting started always the hardest part? This article makes it easy to get started with Spark

🚀 Author: "big data Zen" 🚀 Introduction: This article is part of a series on Spark. The column covers Spark from the basics to advanced topics, including an introduction to Spark, cluster setup, core components, RDDs, the use of operators, underlying principles, SparkCore, SparkSQL, SparkStreaming, etc, S ...

Added by stringman on Sun, 05 Dec 2021 18:32:19 +0200

2021-11-29: The 38th step toward becoming a programmer

Contents: 1. Linux overview 2. Installing VMware 3. Installing Linux 4. Common Linux commands 4.1 Command format 4.2 Three common commands 4.3 Help commands 4.4 File-handling commands 4.5 File-viewing commands 4.6 File-search commands 4.7 File (de)compression commands 4.8 Time commands ...

Added by programming.name on Mon, 29 Nov 2021 15:20:28 +0200

Feature processing for the personal loan default prediction task in the CCF Big Data and Computing Intelligence Contest

Competition link: CCF Big Data and Computing Intelligence Contest. First, read the data: import matplotlib.pyplot as plt import seaborn as sns import gc import re import pandas as pd import lightgbm as lgb import numpy as np from sklearn.metrics import roc_auc_score, precision_recall_curve, roc_curve, average_precision_score from sklearn ...

Added by genesysmedia on Sun, 28 Nov 2021 08:15:17 +0200

Big data: Flume enterprise development in practice

1 Replication and multiplexing 1.1 Case requirements Flume-1 monitors a file for changes. Flume-1 passes the changes to Flume-2, which is responsible for storing them in HDFS. At the same time, Flume-1 passes the changes to Flume-3, which is responsible for writing them to the local file system. 1.2 Requirements analysis: single data ...

Added by ss-mike on Fri, 26 Nov 2021 15:40:56 +0200

Hadoop deployment and configuration

Hadoop download address: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/ 1. Hadoop installation 1. Upload hadoop-3.1.3.tar.gz to the /opt/software directory on Linux. 2. Extract hadoop-3.1.3.tar.gz to /opt/server/: [linux@node1 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/server/ 3. Modify /etc/profile. ...

Added by the7soft.com on Fri, 26 Nov 2021 13:00:49 +0200

Scala -- control flow + yield comprehensions + does Scala have no continue or break?

1. Control flow structures 1.1 Overview In real-world development we write thousands of lines of code, and the order in which that code runs affects the result. Some code should run only when specific conditions are met, and some code needs to run repeatedly. How to reasonably plan these ...

Added by xmanofsteel69 on Fri, 26 Nov 2021 02:52:49 +0200

Flink Core Programming

Flink Core Programming 1. Environment When a Flink job is submitted for execution, it must first establish a connection with the Flink framework, that is, obtain the current Flink runtime environment. Only once the environment information is available can tasks be scheduled to different TaskManagers for execution. This environment object is relatively simp ...

Added by MadRhino on Wed, 24 Nov 2021 22:44:02 +0200

The six steps of connecting to a database with JDBC, and a hand-written simple database connection pool

We often use frameworks such as Hibernate, MyBatis, and JPA in our study and work. These frameworks encapsulate the database connection pool so well that we may overlook the underlying database access implementation. Today, let's look at how to write a simple database connection pool. Before that, let's recall the steps of ja ...

Added by venradio on Wed, 24 Nov 2021 01:48:05 +0200

How to limit the amount of temporary file data spilled to disk in a data warehouse

Abstract: If a query's intermediate result set is too large, the temporary data files it generates spill to disk. This article provides two schemes for limiting the amount of temporary file data spilled to disk, so that normal business operations are not affected. This article is shared from the Huawei Cloud community: <How doe ...

Added by Brusca on Tue, 23 Nov 2021 12:29:29 +0200

Summary of computation and algorithm implementations for massive data processing

This blog post mainly explains the computation and algorithm implementations used in massive data processing. To learn about the processing methods, see Summary of massive data processing methods; to learn about the data processing problems, see Summary of massive data processing problems. Method review ...

Added by hash1 on Fri, 19 Nov 2021 19:08:04 +0200