Hadoop Distributed File System (HDFS)

A brief introduction to the Hadoop Distributed File System. HDFS (Hadoop Distributed File System) is a core component of Hadoop and a distributed storage service. Distributed file systems can span multiple computers and have broad application prospects in the era of big data. They provide the scalability required for storing and processing s ...

Added by ajaybuilder on Mon, 03 Jan 2022 16:43:34 +0200

The pct package in R: drawing a road network map (UK cycling data)

This article mainly draws on: PCT Get started; International application of the PCT methods. It introduces the R package pct, whose goal is to improve the accessibility and reproducibility of the data generated by the Propensity to Cycle Tool (PCT), which is hosted at www.pct.bike. The cycling data study (Propensity to Cycle - ...

Added by DJTim666 on Mon, 03 Jan 2022 16:29:51 +0200

Elasticsearch 7.x IK source code walkthrough and a custom remote dynamic dictionary

1. IK remote dictionary. The previous article covered IK as a whole, including the remote dynamic dictionary. However, that article relied on nginx serving a static txt file: after the file is modified, nginx automatically returns an updated Last-Modified header. This approach is also the officially recommended one: the official documentation recommends using another t ...

Added by socalnate on Mon, 03 Jan 2022 12:30:30 +0200

[review] Spark core programming --- RDD

To process data with high concurrency and high throughput, the Spark computing framework encapsulates three data structures for different application scenarios. The three data structures are: RDD: resilient distributed dataset; Accumulator: distributed shared write-only variable; Broadcast variable: distributed shared read-o ...

Added by faraco on Mon, 03 Jan 2022 03:37:59 +0200

Hive [environment setup 02] [configuring and using HiveServer2/beeline in hive-3.1.2]

Hive has built-in HiveServer and HiveServer2 services, both of which allow clients to connect using multiple programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, so HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in various programmi ...

Added by greenber on Sun, 02 Jan 2022 03:01:29 +0200

[Tushare big data community: meeting your financial data needs]

Tushare big data community: everything you need. Is Wind (Wande) too expensive? Can't write a web crawler? But you still need financial data? Tushare big data community: we have it all! (tushare ID: 436348) For researchers in economics and management, financial data is a necessity: even the cleverest cook cannot make a meal without rice. In most empirica ...

Added by davidjam on Sat, 01 Jan 2022 13:06:47 +0200

Detailed explanation of Scala pattern matching

Pattern matching in Scala is similar to the switch syntax in Java: int i = 10; switch (i) { case 10: System.out.println("10"); break; case 20 ...

Added by Altairzq on Sat, 01 Jan 2022 03:26:02 +0200

2. Build a Hadoop cluster

1. Create the template machine. 1.1 Modify the IP settings in the configuration file: vim /etc/sysconfig/network-scripts/ifcfg-ens33 # Change: ONBOOT=yes BOOTPROTO=static IPADDR=192.168.150.211 NETMASK=255.255.255.0 GATEWAY=192.168.150.2 DNS1=192.168.150.2 1.2 Change the host name to hadoop01: vim /etc/hostname 1.3 Restart the network servic ...

Added by SoccerGloves on Fri, 31 Dec 2021 05:15:31 +0200
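
The static-IP step in the Hadoop cluster entry above can be sketched in Python as a small helper that updates KEY=value lines in ifcfg-style text. The settings are the ones listed in the excerpt; the helper name apply_settings and the sample starting file content are assumptions made for illustration, not part of the original article.

```python
# Settings taken from the excerpt for the template machine (hadoop01).
STATIC_SETTINGS = {
    "ONBOOT": "yes",
    "BOOTPROTO": "static",
    "IPADDR": "192.168.150.211",
    "NETMASK": "255.255.255.0",
    "GATEWAY": "192.168.150.2",
    "DNS1": "192.168.150.2",
}


def apply_settings(text, settings):
    """Return ifcfg text with each KEY=value in `settings` updated or appended.

    Hypothetical helper: existing keys are rewritten in place, comment lines
    are left untouched, and keys that were missing are appended at the end.
    """
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if "=" in line and not is_comment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any settings that were not present in the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Assumed starting content; a real ifcfg-ens33 file will contain more keys.
    sample = "TYPE=Ethernet\nBOOTPROTO=dhcp\nONBOOT=no\n"
    print(apply_settings(sample, STATIC_SETTINGS), end="")
```

Working on text rather than the file itself keeps the sketch side-effect free; writing the result back to /etc/sysconfig/network-scripts/ifcfg-ens33 would require root privileges and a network service restart, as the excerpt notes.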

Elasticsearch in Depth: Histograms / Time-Based Aggregation / Range Limits

1. Data preparation 1. Create an index mapping: PUT /cars { "mappings": { "properties": { "price":{ "type": "integer" }, "color":{ "type": "keyword" }, "make":{ "type": "keyword" }, "sold":{ "type": "date" } } } } 2. Index documents: POST /cars ...

Added by hws on Fri, 31 Dec 2021 04:48:02 +0200
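
As a rough illustration of the Elasticsearch entry above, the cars mapping from its excerpt can be written as a Python dictionary and serialized into the JSON body of PUT /cars. The date_histogram request that follows it is an assumption about what the title's aggregation by time refers to; the aggregation name sales_over_time and the monthly interval are hypothetical and do not come from the article.

```python
import json

# Mapping copied from the excerpt: the request body for PUT /cars.
CARS_MAPPING = {
    "mappings": {
        "properties": {
            "price": {"type": "integer"},
            "color": {"type": "keyword"},
            "make": {"type": "keyword"},
            "sold": {"type": "date"},
        }
    }
}


def monthly_sales_query(field="sold"):
    """Build a date_histogram search body (assumed example, not from the article)."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "sales_over_time": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": field,
                    # calendar_interval is the parameter name in Elasticsearch 7.2+
                    "calendar_interval": "month",
                }
            }
        },
    }


if __name__ == "__main__":
    print("PUT /cars")
    print(json.dumps(CARS_MAPPING, indent=2))
    print("GET /cars/_search")
    print(json.dumps(monthly_sales_query(), indent=2))
```

The script only prints the request bodies, so it runs without a cluster; they can be pasted into Kibana Dev Tools or sent with any HTTP client.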

Hive: permission management

Storage Based Authorization in the Metastore Server: storage-based authorization protects the metadata in the Metastore, but it does not provide finer-grained access control (such as column level and row level). SQL Standards Based Authorization in HiveServer2: Hive authorization based on the SQL standard is fully compatible with SQL auth ...

Added by bgbs on Thu, 30 Dec 2021 15:39:51 +0200