Apache DolphinScheduler
Component introduction
A distributed, extensible, visual DAG workflow task scheduling system. It aims to solve the complex task dependencies in data processing, so that the scheduling system can be used out of the box in data pipelines.
Official website: https://dolphinscheduler.apache.org/en-us/
GitHub: https://github.com/apache/incubator-dolphinscheduler
Deployment environment
- CDH test environment
  - 6 machines
  - The gateway node deploys the worker
  - The CM node deploys the master and the monitoring web UI
  - Hive and Spark gateways are already deployed on the gateway node
- Platform versions
  - CDH 5.16.2
  - DolphinScheduler 1.2.0
- Basic software
  - PostgreSQL or MySQL to store metadata
Front-end deployment
Installation package download
https://dolphinscheduler.apache.org/en-us/docs/release/download.html
- Create the deployment directory /opt/ds, upload the tar package to it, and unzip it

# create deploy dir
mkdir -p /opt/ds
# decompress
tar -zxvf apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-front-bin.tar.gz -C /opt/ds/
mv /opt/ds/apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-front-bin /opt/ds/ds-1.2.0-ui
Select Automated Deployment
- Check the yum source; this development environment reaches the Internet through a proxy, and nginx will be installed
- Enter the ds-1.2.0-ui directory and execute the install-dolphinscheduler-ui.sh installation script
- Change the front-end port to 8886 to avoid a conflict with the Hue port
- Enter the API server IP address
- Enter the API server port
- Select the CentOS 7 installation; the script writes an nginx server block like the sketch below
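For reference, a minimal sketch of the nginx configuration the script produces, assuming the API server listens on its default port 12345 and the front end was unpacked to /opt/ds/ds-1.2.0-ui; [api-server-ip] is a placeholder, and the exact generated file may differ:

# /etc/nginx/conf.d/dolphinscheduler.conf (sketch)
server {
    listen       8886;                        # customized UI port (default 8888)
    server_name  localhost;
    location / {
        root   /opt/ds/ds-1.2.0-ui/dist;      # unpacked front-end files
        index  index.html;
    }
    location /dolphinscheduler {
        proxy_pass http://[api-server-ip]:12345;   # back-end API server
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
    }
}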
Modify nginx upload size parameter
- Add nginx configuration client_max_body_size 1024m;
- Restart nginx
- This step is required; otherwise large files cannot be uploaded to the resource center
vi /etc/nginx/nginx.conf
# add the parameter
client_max_body_size 1024m;
# restart nginx
systemctl restart nginx
Visit the front end on port 8888 (customized here to 8886); once the page loads, the front-end web installation is complete.
Back-end deployment
Preparation
Download installation package
https://dolphinscheduler.apache.org/en-us/docs/release/download.html
Upload the tar package to /opt/ds and decompress it

tar -zxvf apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-backend-bin.tar.gz -C /opt/ds/
mv /opt/ds/apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-backend-bin /opt/ds/ds-1.2.0-backend
Create deployment user
- Create the deployment user and set its password (on all deployment machines)
- Add the deployment user to the hadoop group so that HDFS can be used as the resource center
- Configure passwordless sudo

# add user
useradd dscheduler
# add the user to the hadoop group (for the HDFS resource center)
usermod -aG hadoop dscheduler
# set the user password
passwd dscheduler
# grant passwordless sudo: add the following line via vi /etc/sudoers
dscheduler ALL=(ALL) NOPASSWD: ALL
- Switch to the deployment user and configure passwordless SSH login between the machines; even a pseudo-distributed (single-machine) setup needs passwordless login to itself

su dscheduler
ssh-keygen -t rsa
# copy the key to every host that needs passwordless access (including this
# machine itself); replace [hostname] with each target host
ssh-copy-id -i ~/.ssh/id_rsa.pub dscheduler@[hostname]
Database initialization
- Log in to MySQL on the CDH cluster
- mysql -uroot -p
- The default metadata database is PostgreSQL; to use MySQL instead, add the mysql-connector-java JAR to the lib directory
- Execute the database initialization commands and set the access account and password
CREATE DATABASE dscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dscheduler.* TO 'dscheduler'@'%' IDENTIFIED BY 'xxxx';
GRANT ALL PRIVILEGES ON dscheduler.* TO 'dscheduler'@'localhost' IDENTIFIED BY 'xxxx';
FLUSH PRIVILEGES;
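Optionally, a quick check that the new account can connect before moving on (password and host as set above):

# verify the dscheduler account works
mysql -udscheduler -pxxxx -h localhost dscheduler -e "SELECT 1;"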
- Create tables and import the base data
  - Modify the application-dao.properties file in the conf directory: comment out the PostgreSQL settings and switch to MySQL (see the sketch after this list)
  - Add the mysql-connector-java JAR to the lib directory
  - Execute the create-dolphinscheduler.sh script in the script directory
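A sketch of the MySQL section of conf/application-dao.properties; [mysql-host] is a placeholder, and the database name and credentials must match what was created in the initialization step:

# conf/application-dao.properties (sketch)
# comment out the default PostgreSQL driver/url entries, then set:
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://[mysql-host]:3306/dscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dscheduler
spring.datasource.password=xxxx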
Configure environment variables
- Modify directory permissions
chown -R dscheduler:dscheduler ds-1.2.0-backend/
chmod -R 755 ds-1.2.0-backend/
- Modify the .dolphinscheduler_env.sh file in the conf/env directory (a sketch follows below)
  - The Spark task component in ds-1.2.0 can only submit Spark 1 tasks
  - Configure both SPARK_HOME1 and SPARK_HOME2 as the cluster's Spark 2 home
  - Alternatively, you can comment out SPARK_HOME1
  - Flink is not deployed in the cluster, so its parameters are left unchanged
- Soft-link the JDK's java binary to /usr/bin/java

ln -s /usr/java/jdk1.8.0_131/bin/java /usr/bin/java
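As referenced above, a sketch of .dolphinscheduler_env.sh for this CDH layout; the parcel paths are assumptions and should be verified against the actual cluster:

# conf/env/.dolphinscheduler_env.sh (sketch; verify paths against your parcels)
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
# point both variables at the Spark 2 home (or comment out SPARK_HOME1)
export SPARK_HOME1=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_HOME2=/opt/cloudera/parcels/SPARK2/lib/spark2
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export JAVA_HOME=/usr/java/jdk1.8.0_131
# FLINK_HOME is left untouched: Flink is not deployed in this cluster
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$HIVE_HOME/bin:$JAVA_HOME/bin:$PATH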
- Modify the install.sh configuration to match your cluster (key parameters sketched below)
- Parameters to watch
  - installPath: where ds is installed, e.g. /opt/ds-agent
  - zkQuorum: must be ip:2181; remember to include the 2181 port
  - deployUser: the deployment user, which needs permission to operate HDFS
  - To use HDFS as the resource center with HA enabled, copy the cluster's core-site.xml and hdfs-site.xml files into the conf directory
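A sketch of those parameters as they might look in install.sh; the values are illustrative, [zk-host] entries are placeholders, and resUploadStartupType is an assumed parameter name to verify against your copy of the script:

# install.sh (excerpt, illustrative values)
installPath="/opt/ds-agent"        # where ds is installed
deployUser="dscheduler"            # needs permission to operate HDFS
zkQuorum="[zk-host-1]:2181,[zk-host-2]:2181,[zk-host-3]:2181"   # must include port 2181
resUploadStartupType="HDFS"        # use HDFS as the resource center

# HA resource center: copy the cluster client configs into conf/
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml /opt/ds/ds-1.2.0-backend/conf/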
Deploy and install kazoo
- Install kazoo, the ZooKeeper client library for Python
- The CDH cluster defaults to Python 2.7
yum -y install python-pip
pip install kazoo
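A quick way to confirm kazoo is importable under the cluster's Python 2.7:

python -c "import kazoo; print(kazoo.__version__)"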
- Execute the install script: sh install.sh
- Use jps on the worker and master machines to check whether the services have started (expected processes are sketched after this list)
- Access the front end
  - Username: admin
  - Password: dolphinscheduler123
- DolphinScheduler 1.2.0 deployment is complete
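Roughly, jps should show the following service processes for 1.2.0; which host runs which depends on the host lists configured in install.sh:

jps
# on master / API hosts, expect processes such as:
#   MasterServer
#   ApiApplicationServer
#   AlertServer
# on worker hosts:
#   WorkerServer
#   LoggerServer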
DAG test
- Create a tenant
- Create a user
- If tenant creation fails, check whether the resource center is enabled
- Create a new project and a new workflow
- Run the workflow and view the execution results
- At this point, the DolphinScheduler 1.2.0 DAG demo test is complete