How to deploy Apache Dolphin Scheduler 1.2.0 in CDH5.16.2

Apache Dolphin Scheduler

Component introduction

A distributed, extensible, visual DAG workflow task scheduling system. It aims to untangle the complex dependencies in data processing so that the scheduling system can be used out of the box.

Official website: https://dolphinscheduler.apache.org/en-us/

GitHub: https://github.com/apache/incubator-dolphinscheduler

Deployment environment

  • CDH test environment
    • 6 machines
    • The worker is deployed on the gateway node
    • The master and the monitoring web UI are deployed on the CM node
    • The Hive and Spark gateways are already deployed on the gateway node
  • Platform version
    • CDH5.16.2
    • Dolphin Scheduler 1.2.0
  • Basic software
    • PostgreSQL or MySQL to store the metadata

Front end deployment

Installation package download

https://dolphinscheduler.apache.org/en-us/docs/release/download.html

  • Create the deployment folder /opt/ds, upload the tar package to it, and extract it
# create deploy dir
mkdir -p /opt/ds/ds-ui;
# work from the directory the tar package was uploaded to
cd /opt/ds;
# extract the package
tar -zxvf apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-front-bin.tar.gz -C /opt/ds/;
# rename the extracted directory
mv apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-front-bin ds-1.2.0-ui;

Select Automated Deployment

  • Check the yum source. This is a development environment, so a proxy is needed to reach the Internet and install nginx
  • Enter the ds-1.2.0-ui directory and execute the install-dolphinscheduler-ui.sh installation script
  • Change the front-end port to 8886 to avoid a conflict with the Hue port
  • Set the IP address of the API server
  • Set the API server port
  • Select the CentOS 7 installation option

Modify nginx upload size parameter

  • Add the nginx setting client_max_body_size 1024m;
  • Restart nginx
  • This step is required; otherwise large resources cannot be uploaded to the resource center
vi /etc/nginx/nginx.conf

# add this parameter inside the http block (or the relevant server block)
client_max_body_size 1024m;

# restart nginx
systemctl restart nginx
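
Before restarting, it can help to confirm that the directive is really present on an active (non-commented) line. The following is a minimal sketch, assuming the directive was added to /etc/nginx/nginx.conf as described; the helper name has_upload_limit is made up for illustration and is not part of nginx.

```shell
#!/usr/bin/env bash
# Return 0 if the given nginx config file sets client_max_body_size
# on a non-commented line; return 1 otherwise.
# has_upload_limit is an illustrative helper, not part of nginx.
has_upload_limit() {
    local conf_file="$1"
    grep -Eq '^[[:space:]]*client_max_body_size[[:space:]]+[0-9]+[kKmMgG]?;' "$conf_file"
}

# Example usage on the real file (run as root, not executed here):
#   has_upload_limit /etc/nginx/nginx.conf && nginx -t && systemctl restart nginx
```

Running nginx -t first checks that the edited configuration parses, so a typo does not take the front end down on restart.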

Visit the front-end page on port 8888 (changed to 8886 here). If the page loads, the front-end web installation is complete.

Back end deployment

Preparation

Download installation package

https://dolphinscheduler.apache.org/en-us/docs/release/download.html

Upload the tar package to /opt/ds and extract it

# work from the directory the tar package was uploaded to
cd /opt/ds;
tar -zxvf apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-backend-bin.tar.gz -C /opt/ds/;
mv apache-dolphinscheduler-incubating-1.2.1-SNAPSHOT-dolphinscheduler-backend-bin ds-1.2.0-backend;

Create deployment user

  • Create the deployment user and set its password (on all deployment machines)
  • Add the deployment user to the hadoop group so that HDFS can be used as the resource center
  • Configure passwordless sudo
# add user dscheduler
useradd dscheduler;
# set the user's password
passwd dscheduler;
# add the user to the hadoop group
usermod -a -G hadoop dscheduler;
# edit sudoers (visudo checks the syntax before saving) and add the line below
visudo;
dscheduler      ALL=(ALL)       NOPASSWD: ALL

  • Switch to the deployment user and configure passwordless SSH login between the machines. A pseudo-distributed (single-machine) deployment also needs passwordless login to the local machine
su dscheduler;
ssh-keygen -t rsa;
# Configure passwordless login both between machines and to the local machine; replace [hostname] with each host that needs it
ssh-copy-id -i ~/.ssh/id_rsa.pub dscheduler@[hostname];
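
With six machines, repeating ssh-copy-id by hand is easy to get wrong. The loop below is a rough sketch of the same step for a list of hosts; the function name distribute_key, the DRY_RUN switch, and the host names in the usage comment are assumptions made for illustration, not part of Dolphin Scheduler.

```shell
#!/usr/bin/env bash
# Copy the current user's public key to every host in the list.
# With DRY_RUN=1 the commands are printed instead of executed.
# distribute_key and DRY_RUN are illustrative names, not part of the installer.
distribute_key() {
    local user="$1"
    shift
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub ${user}@${host}"
        else
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "${user}@${host}"
        fi
    done
}

# Example (hypothetical host names; include the local host as well):
#   DRY_RUN=1 distribute_key dscheduler cm-node gateway-node
```

Running it once with DRY_RUN=1 shows exactly which hosts will be touched before any password prompt appears.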

Database initialization

  • Log in to the MySQL instance of the CDH cluster
    • mysql -uroot -p
  • The default database is PostgreSQL. To use MySQL, add the mysql-connector-java package to the lib directory
  • Execute the database initialization statements and set the access account password
CREATE DATABASE dscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dscheduler.* TO 'dscheduler'@'%' IDENTIFIED BY 'xxxx';
GRANT ALL PRIVILEGES ON dscheduler.* TO 'dscheduler'@'localhost' IDENTIFIED BY 'xxxx';
FLUSH PRIVILEGES;

  • Create tables and import basic data
    • Modify the application-dao.properties file in the conf directory
    • Comment out the PostgreSQL settings and enable the MySQL settings
    • Add the mysql-connector-java package to the lib directory
    • Execute create-dolphinscheduler.sh in the script directory
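
The table-creation script cannot reach MySQL if the connector jar is missing, so a quick pre-flight check can save a failed run. This is a minimal sketch, assuming the backend was unpacked to /opt/ds/ds-1.2.0-backend as earlier in this article; has_mysql_connector is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if at least one mysql-connector-java jar exists in the given lib directory.
# has_mysql_connector is an illustrative helper name.
has_mysql_connector() {
    local lib_dir="$1"
    local jar
    for jar in "$lib_dir"/mysql-connector-java*.jar; do
        # If the glob matched nothing, the pattern itself is returned and -f fails.
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Example, using the backend directory from this article (not executed here):
#   backend=/opt/ds/ds-1.2.0-backend
#   has_mysql_connector "$backend/lib" && cd "$backend" && sh ./script/create-dolphinscheduler.sh
```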

Configure environment variables

  • Modify directory permissions
chown -R dscheduler:dscheduler ds-1.2.0-backend/;
chmod -R 755 ds-1.2.0-backend/;

  • Modify the .dolphinscheduler_env.sh file in the conf/env directory
    • The Spark task component in ds-1.2.0 can only submit Spark 1 tasks
    • So SPARK_HOME1 and SPARK_HOME2 are both configured as the Spark 2 home of the cluster
    • You can also comment out SPARK_HOME1
    • Flink is not deployed in the cluster, so its parameters are left unchanged
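
As a sketch of the Spark settings described above, the fragment below points both variables at the same Spark 2 installation. The parcel path is an assumption (the usual location of the Cloudera SPARK2 parcel) and the variable SPARK2_PARCEL_HOME is introduced only for illustration; use the Spark 2 home of your own cluster.

```shell
# Fragment for conf/env/.dolphinscheduler_env.sh.
# ASSUMPTION: Spark 2 is installed from the Cloudera SPARK2 parcel at its
# usual location; replace the path with the Spark 2 home of your own cluster.
SPARK2_PARCEL_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2

# Point both variables at Spark 2 so Spark tasks are submitted with Spark 2.
export SPARK_HOME1="$SPARK2_PARCEL_HOME"
export SPARK_HOME2="$SPARK2_PARCEL_HOME"
```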

  • Create a soft link from the JDK's java binary to /usr/bin/java
ln -s /usr/java/jdk1.8.0_131/bin/java /usr/bin/java

  • Modify the configuration in install.sh to match the cluster
    • Parameters that need attention
      • installPath - where to install ds, for example /opt/ds-agent
      • zkQuorum - must be given as ip:2181; remember to include port 2181
      • deployUser - the deployment user, which needs permission to operate on HDFS
    • To use HDFS as the resource center with HA enabled, copy the cluster's core-site.xml and hdfs-site.xml files to the conf directory
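
A missing port in zkQuorum is an easy mistake to make, so it may be worth checking the value before running the installer. The sketch below shows hypothetical values for the three parameters named above, plus a small validation helper; the IP addresses, the install path, and the function zk_quorum_has_ports are assumptions for illustration, not part of install.sh.

```shell
#!/usr/bin/env bash
# Return 0 if every comma-separated entry of a zkQuorum value has the form host:port.
# zk_quorum_has_ports is an illustrative helper, not part of install.sh.
zk_quorum_has_ports() {
    local quorum="$1"
    local entry
    local -a entries
    local re='^[^:[:space:]]+:[0-9]+$'
    [ -n "$quorum" ] || return 1
    IFS=',' read -r -a entries <<< "$quorum"
    for entry in "${entries[@]}"; do
        [[ "$entry" =~ $re ]] || return 1
    done
    return 0
}

# Hypothetical values for the three parameters named above:
installPath="/opt/ds-agent"
zkQuorum="192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
deployUser="dscheduler"

zk_quorum_has_ports "$zkQuorum" || echo "zkQuorum entries must include the port" >&2
```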

Install kazoo and deploy

  • Install the ZooKeeper tools for Python
    • The CDH cluster defaults to Python 2.7
yum -y install python-pip;
pip install kazoo;

  • Execute the install script: sh install.sh
  • Use jps on the worker and master machines to check whether the services have started
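
The jps check can be scripted so that a missing service stands out. The following is a minimal sketch that reads jps output and prints any expected process that is absent; the helper name missing_services is made up, and the process names in the usage comments (MasterServer, ApiApplicationServer, WorkerServer, LoggerServer) are the ones I would expect from Dolphin Scheduler 1.2.x, so verify them against the jps output on your own nodes.

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin and print each expected process name that is missing.
# missing_services is an illustrative helper name.
missing_services() {
    local jps_output name
    jps_output="$(cat)"
    for name in "$@"; do
        # jps prints "<pid> <main class>"; match the class name exactly.
        if ! printf '%s\n' "$jps_output" | grep -Eq "^[0-9]+ ${name}\$"; then
            echo "$name"
        fi
    done
}

# Example on the master node (not executed here):
#   jps | missing_services MasterServer ApiApplicationServer
# Example on a worker node:
#   jps | missing_services WorkerServer LoggerServer
```

An empty result means every listed process was found.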

  • Access the front end
    • User name: admin
    • Password: dolphinscheduler123

  • The Dolphin Scheduler 1.2.0 deployment is complete

DAG test

  • Create tenant

  • Create user
  • If tenant creation fails, check whether the resource center is enabled

  • Create a new project and a new workflow

  • Run the workflow and view the execution results

  • This completes the Dolphin Scheduler 1.2.0 DAG demo test

Keywords: Big Data

Added by flunn on Thu, 02 Sep 2021 00:58:00 +0300