[Mastering Spark series] Is getting started always the hardest part? This article makes it easy to get started with Spark
Author: "big data Zen"
Introduction: This article is part of a series on Spark. The column covers Spark from the basics to advanced topics, including an introduction to Spark, cluster setup, core components, RDDs, the use of operators, underlying principles, SparkCore, SparkSQL, SparkStreaming, etc, S ...
Added by stringman on Sun, 05 Dec 2021 18:32:19 +0200
2021-11-29: The 38th step toward becoming a programmer
Contents
1, Linux overview
2, Installing VMware
3, Installing Linux
4, Common Linux commands
4.1 description of the command format
4.2 three common commands
4.3 help commands
4.4 file handling commands
4.5 file viewing commands
4.6 file search commands
4.7 file compression and decompression commands
4.8 time commands ...
Added by programming.name on Mon, 29 Nov 2021 15:20:28 +0200
Feature processing for the personal loan default prediction task in the CCF Big Data and Computing Intelligence Contest
Competition link:
CCF Big Data and Computing Intelligence Contest
First, read the data
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import re
import pandas as pd
import lightgbm as lgb
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, roc_curve, average_precision_score
from sklearn ...
Added by genesysmedia on Sun, 28 Nov 2021 08:15:17 +0200
Big data Flume enterprise development practice
1 replication and multiplexing
1.1 case requirements
Flume-1 monitors a file for changes. Flume-1 passes the changes to Flume-2, which is responsible for storing them in HDFS. At the same time, Flume-1 passes the changes to Flume-3, which is responsible for writing them to the local file system.
1.2 requirements analysis: single data ...
Added by ss-mike on Fri, 26 Nov 2021 15:40:56 +0200
Hadoop deployment and configuration
Hadoop download address
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
1, Hadoop installation
1. Upload hadoop-3.1.3.tar.gz to the /opt/software directory on Linux
hadoop-3.1.3.tar.gz
2. Extract hadoop-3.1.3.tar.gz to /opt/server/
[linux@node1 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/server/
3. Modify /etc/profile. ...
Added by the7soft.com on Fri, 26 Nov 2021 13:00:49 +0200
Scala -- control flow + yield comprehensions + does Scala really have no continue or break?
1. Control flow structures
1.1 general
In real-world development we write thousands of lines of code, and the order in which that code runs affects the result. Some code should run only when specific conditions are met, and some code needs to run repeatedly. How to reasonably plan these ...
Added by xmanofsteel69 on Fri, 26 Nov 2021 02:52:49 +0200
Flink Core Programming
1, Environment
When a Flink job is submitted for execution, it first has to establish a connection with the Flink framework, that is, with the current Flink runtime environment. Tasks can be scheduled onto different TaskManagers only once this environment information is available. This environment object is relatively simp ...
Added by MadRhino on Wed, 24 Nov 2021 22:44:02 +0200
Six steps for connecting to a database with JDBC, and a hand-written simple database connection pool
We often use frameworks such as Hibernate, MyBatis, and JPA in our study and work. These frameworks encapsulate the database connection pool so well that we may overlook the underlying implementation. Today, let's take a look at how to write a simple database connection pool. Before that, let's recall the steps of ja ...
Added by venradio on Wed, 24 Nov 2021 01:48:05 +0200
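As a rough illustration of the connection-pool idea mentioned in the entry above, here is a minimal sketch. It is not the article's implementation (the article targets JDBC in Java); it uses Python's standard-library sqlite3 as a stand-in driver, and the class name SimpleConnectionPool, its size parameter, and its methods are hypothetical names chosen for this example. The core idea it shows is that connections are created once, borrowed for a unit of work, and returned to the pool rather than closed.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimpleConnectionPool:
    """Hypothetical fixed-size pool; sqlite3 stands in for a JDBC driver."""

    def __init__(self, database, size=2):
        # Create all connections up front and keep them in a thread-safe queue.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False allows a pooled connection to be
            # borrowed by a thread other than the one that created it.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while the pool is exhausted; raises queue.Empty if a
        # timeout is given and no connection becomes free in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()

    def close(self):
        # Close every connection that is currently idle.
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = SimpleConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    rows = conn.execute("SELECT x FROM t").fetchall()
```

Note that with ":memory:" each pooled connection is its own private database, so the table created above is visible only through that one connection; a file path or a real database server would be shared across the pool.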
How to limit the volume of temporary data files spilled to disk in a data warehouse
Abstract: If a query's intermediate result set is too large and the temporary data files it generates spill to disk, this article provides two schemes for limiting the amount of data those temporary files write to disk, so that normal business operation is not affected.
This article is shared from the Huawei Cloud community < How doe ...
Added by Brusca on Tue, 23 Nov 2021 12:29:29 +0200
Summary of massive data processing calculation and algorithm implementation
This blog mainly explains the computation and algorithm implementation of massive data processing. To learn about massive data processing methods, see "Summary of massive data processing methods"; to learn about massive data processing problems, see "Summary of massive data processing problems".
Method review ...
Added by hash1 on Fri, 19 Nov 2021 19:08:04 +0200