Atlas 2.1.0 installation record with big data components, based on the Apache open source versions (test environment)
Note: this Atlas installation draws on a large number of online materials; this record is kept only for future convenience. If anything here infringes, please contact me immediately.
Component version
Component name | Component version |
---|---|
Hadoop | 3.2.1 |
Hive | 3.1.2 |
Hbase | 2.3.4 |
Zookeeper | 3.5.9 |
Kafka | 2.6.2 |
Solr | 7.4.0 |
Atlas | 2.1.0 |
jdk | 1.8 |
Maven | 3.6.3 |
1, Atlas 2.1.0 compilation
Premise: I compile on a virtual machine running the CentOS 7.6 operating system.
1. Build the virtual machine
Omitted.
2. Install jdk
1) Uninstall the OpenJDK that ships with CentOS 7.6 (this must be uninstalled, otherwise there will be problems with compilation):

```bash
rpm -qa | grep openjdk
rpm -e --nodeps <each package returned by the query above>
```

2) Install your own JDK 1.8:

```bash
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8 jdk
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
java -version
```
3. Install Maven
The installed Maven version is 3.6.3.

```bash
tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /app
mv apache-maven-3.6.3 maven
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export MVN_HOME=/app/maven
export PATH=$PATH:$JAVA_HOME/bin:$MVN_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
mvn -version
```
Configure the Maven repository mirrors:

```bash
vim /app/maven/conf/settings.xml
```

Add:

```xml
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<!-- Central repository 1 -->
<mirror>
    <id>repo1</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>https://repo1.maven.org/maven2/</url>
</mirror>
<!-- Central repository 2 -->
<mirror>
    <id>repo2</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>https://repo2.maven.org/maven2/</url>
</mirror>
```
4. Compiling Atlas
```bash
tar -zxvf apache-atlas-2.1.0-sources.tar.gz -C /app
cd /app/apache-atlas-sources-2.1.0
```
Edit the project's top-level pom.xml file.
```bash
vim pom.xml
```

Modify the versions of the components, mainly as follows:

```xml
<hadoop.version>3.2.1</hadoop.version>
<hbase.version>2.3.4</hbase.version>
<solr.version>7.5.0</solr.version>
<hive.version>3.1.2</hive.version>
<kafka.version>2.2.1</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<calcite.version>1.16.0</calcite.version>
<zookeeper.version>3.5.9</zookeeper.version>
<falcon.version>0.8</falcon.version>
<sqoop.version>1.4.6.2.3.99.0-195</sqoop.version>
<storm.version>1.2.0</storm.version>
<curator.version>4.0.1</curator.version>
<elasticsearch.version>5.6.4</elasticsearch.version>
```
Source code that needs to be modified (online materials say this code must be changed; I modified it and the build ran successfully. So far I have only tested the Hive hook, with no problems; I do not know what happens if these changes are skipped.)
```bash
vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
```

At line 577, replace:

```java
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
```

with:

```java
String catalogName = null;
```

```bash
vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java
```

At line 81, replace:

```java
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
```

with:

```java
this.metastoreHandler = null;
```
Compile
```bash
cd /app/apache-atlas-sources-2.1.0
```

Package (this build uses external HBase and Solr; the packaging that embeds Atlas's own HBase/Solr is not considered here):

```bash
mvn clean -DskipTests package -Pdist -X
```
Note: errors may be reported during compilation, almost always due to network problems; retrying usually resolves them. If retrying still cannot download some jar packages, download the missing jars manually, put them into the local Maven repository, and package again.
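For reference, a manually downloaded jar can be installed into the local repository with Maven's install:install-file goal; the file path and coordinates below are placeholders and must match whatever artifact the build actually complains about:

```bash
# Install a hand-downloaded jar into the local Maven repository (~/.m2/repository)
# groupId/artifactId/version are placeholders - use the coordinates from the build error
mvn install:install-file \
  -Dfile=/root/downloads/some-missing-lib-1.0.0.jar \
  -DgroupId=org.example \
  -DartifactId=some-missing-lib \
  -Dversion=1.0.0 \
  -Dpackaging=jar
```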
atlas storage location after compilation
```bash
cd /app/apache-atlas-sources-2.1.0/distro/target
```

apache-atlas-2.1.0-bin.tar.gz is the package we need.
2, Component installation
Note: this Atlas installation uses external, independent HBase and Solr, so Hadoop, Hive, Zookeeper, Kafka, Solr and HBase need to be deployed separately. Three virtual machines are used for the test, as follows:
Virtual machine name | operating system | IP |
---|---|---|
hadoop01 | Centos7.6 | 192.168.190.15 |
hadoop02 | Centos7.6 | 192.168.190.16 |
hadoop03 | Centos7.6 | 192.168.190.17 |
The environment variables configured on the three machines are as follows (given here up front):
```bash
vim /etc/profile

export JAVA_HOME=/app/jdk
export ZK_HOME=/app/zookeeper
export HIVE_HOME=/app/hive
export HADOOP_HOME=/app/hadoop
export HBASE_HOME=/app/hbase
export KAFKA_HOME=/app/kafka
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$KAFKA_HOME/bin:
```
1.jdk installation
1) Uninstall the OpenJDK that ships with CentOS 7.6:

```bash
rpm -qa | grep openjdk
rpm -e --nodeps <each package returned by the query above>
```

2) Install your own JDK 1.8:

```bash
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8 jdk
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
java -version
```

Then copy the entire /app/jdk folder to hadoop02 and hadoop03 and configure the environment variables there:

```bash
scp -r /app/jdk hadoop02:/app/
scp -r /app/jdk hadoop03:/app/
```
2.Zookeeper installation
```bash
mkdir /app
tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /app
mv apache-zookeeper-3.5.9-bin zookeeper
cd /app/zookeeper/conf
# Make a copy of zoo_sample.cfg
cp zoo_sample.cfg zoo.cfg
```
```bash
vim zoo.cfg
```

```properties
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/app/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
```
Create the data directory:

```bash
mkdir /app/zookeeper/data
cd /app/zookeeper/data
touch myid && echo "1" > myid
```
Then copy the entire /app/zookeeper folder to hadoop02 and hadoop03, configure the environment variables there, and modify the /app/zookeeper/data/myid file on each machine (hadoop02: 2, hadoop03: 3):

```bash
scp -r /app/zookeeper hadoop02:/app/
scp -r /app/zookeeper hadoop03:/app/
```
Start ZooKeeper on all three machines:

```bash
zkServer.sh start
```
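A quick way to confirm the ensemble is healthy (run on each node; assumes the environment variables above are in effect):

```bash
# One node should report "Mode: leader" and the other two "Mode: follower"
zkServer.sh status

# The QuorumPeerMain process should also appear in jps
jps | grep QuorumPeerMain
```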
3. Install Hadoop
```bash
tar -zxvf hadoop-3.2.1.tar.gz -C /app
mv hadoop-3.2.1 hadoop
```

All the files that need to be edited are under the /app/hadoop/etc/hadoop directory.
core-site.xml
```bash
vim core-site.xml
```

```xml
<configuration>
    <!-- The HDFS entry point. "mycluster" is only the logical name of the cluster and can be anything,
         but it must match the dfs.nameservices value in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- hadoop.tmp.dir defaults to /tmp, which would keep NameNode and DataNode data in a volatile
         directory, so change it here -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>
    <!-- Static web user. If this is not configured, the web UI reports an error -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
    <!-- ZooKeeper cluster address; separate multiple hosts with commas -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
```
hadoop-env.sh
```bash
export JAVA_HOME=/app/jdk
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"
```
hdfs-site.xml
```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Set the HDFS nameservice to mycluster; must be consistent with core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- mycluster has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop02:8020</value>
    </property>
    <!-- HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop01:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop02:9870</value>
    </property>
    <!-- Where the NameNode edits metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
    </property>
    <!-- Location of JournalNode data on the local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha-hadoop/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used for automatic failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple mechanisms are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- The sshfence mechanism requires passwordless ssh login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- sshfence timeout -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
```
mapred-env.sh
```bash
export JAVA_HOME=/app/jdk
```
mapred-site.xml
```xml
<configuration>
    <!-- Use YARN as the MapReduce framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <!-- JobHistory server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /app/hadoop/etc/hadoop,
            /app/hadoop/share/hadoop/common/*,
            /app/hadoop/share/hadoop/common/lib/*,
            /app/hadoop/share/hadoop/hdfs/*,
            /app/hadoop/share/hadoop/hdfs/lib/*,
            /app/hadoop/share/hadoop/mapreduce/*,
            /app/hadoop/share/hadoop/mapreduce/lib/*,
            /app/hadoop/share/hadoop/yarn/*,
            /app/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
```
yarn-env.sh
```bash
export JAVA_HOME=/app/jdk
```
yarn-site.xml
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>
    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Address of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
    </property>
    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Whether virtual memory limits will be enforced for containers -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>
</configuration>
```
workers
```
hadoop01
hadoop02
hadoop03
```
Hadoop 3 has permission problems; to avoid startup failures caused by them, add the specified users in the following files.
```bash
vim /app/hadoop/sbin/start-dfs.sh
vim /app/hadoop/sbin/stop-dfs.sh
```

Add:

```bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
```

```bash
vim /app/hadoop/sbin/start-yarn.sh
vim /app/hadoop/sbin/stop-yarn.sh
```

Add:

```bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```
Startup
Startup order: Zookeeper -> JournalNode -> format NameNode -> format zkfc (create the namespace in ZooKeeper) -> NameNode -> DataNode -> ResourceManager -> NodeManager
Start JournalNode
Start the JournalNode on all three machines:

```bash
cd /app/hadoop/sbin/
./hadoop-daemon.sh start journalnode
```
Format namenode
Execute on hadoop01:

```bash
hadoop namenode -format
```
Copy the contents of the /data/hadoop/dfs/name directory to the standby NameNode host
If the standby NameNode host does not have this directory, create it first:

```bash
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/name/
```
Format zkfc
Format zkfc on the NameNode hosts:

```bash
hdfs zkfc -formatZK
```
Close JournalNode
Shut down the JournalNode on all three machines:

```bash
cd /app/hadoop/sbin/
./hadoop-daemon.sh stop journalnode
```
Start hadoop
On the hadoop01 machine:

```bash
start-all.sh
```
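A few sanity checks after startup can save time later; these are standard Hadoop 3 commands, and the service ids nn1/nn2 and rm1/rm2 come from the hdfs-site.xml and yarn-site.xml above:

```bash
# Expected daemons: NameNode and DFSZKFailoverController on hadoop01/02, ResourceManager on hadoop01/02,
# DataNode, JournalNode and NodeManager on all three nodes
jps

# One NameNode should be active and the other standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Same check for the ResourceManagers
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Overall HDFS report: live DataNodes and capacity
hdfs dfsadmin -report
```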
4. Install Hbase
```bash
tar -zxvf hbase-2.3.4-bin.tar.gz -C /app
mv hbase-2.3.4 hbase
```

All the files that need to be edited are under the /app/hbase/conf directory.
hbase-env.sh
```bash
export JAVA_HOME=/app/jdk
export HBASE_CLASSPATH=/app/hadoop/etc/hadoop
```
hbase-site.xml
```xml
<configuration>
    <!-- mycluster comes from the dfs.nameservices value in hdfs-site.xml -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>8020</value>
    </property>
    <!-- zookeeper cluster -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop01,hadoop02,hadoop03</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/app/zookeeper/conf</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/var/hbase/tmp</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- If HMaster does not start and the log reports the following error:
         "The procedure WAL relies on the ability to hsync for proper operation during component
         failures, but the underlying filesystem does not support doing so. Please check the config
         value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and
         ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it."
         then enable this configuration:
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    -->
</configuration>
```
regionservers
```
hadoop01
hadoop02
hadoop03
```
To enable high availability, HBase needs a backup-masters file (add the standby HMaster host in it):

```bash
vim backup-masters
```

```
hadoop03
```
Start Hbase
```bash
start-hbase.sh
```
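A quick check that HBase actually came up (assuming the PATH settings above; the shell command is just a minimal status probe):

```bash
# HMaster should be on hadoop01, the backup HMaster on hadoop03, and HRegionServer on all three nodes
jps | grep -E "HMaster|HRegionServer"

# Print a cluster status summary from the HBase shell in non-interactive mode
echo "status 'simple'" | hbase shell -n
```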
5. Install hive
MySQL installation is omitted.

```bash
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin hive
```

All the files that need to be edited are under the /app/hive/conf directory.
hive-env.sh
```bash
export HADOOP_HOME=/app/hadoop/
export HIVE_CONF_DIR=/app/hive/conf/
```
hive-site.xml
```xml
<configuration>
    <!-- Record Hive metadata in MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <!-- JDBC MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- MySQL username and password for hive -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>MySQL username for hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>MySQL password for hive</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>
    <!-- Log directory -->
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
    <!-- Metastore node information -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop01:9083</value>
    </property>
    <!-- Port for remote client connections -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>hive.server2.webui.host</name>
        <value>0.0.0.0</value>
    </property>
    <!-- Port of the HiveServer2 web UI -->
    <property>
        <name>hive.server2.webui.port</name>
        <value>10002</value>
    </property>
    <property>
        <name>hive.server2.long.polling.timeout</name>
        <value>5000</value>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.fixedDatastore</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>mr</value>
    </property>
</configuration>
```
Upload the MySQL JDBC driver jar to Hive's lib directory.
Initialize Hive's metadata database
```bash
schematool -dbType mysql -initSchema
```
Start Hive's metastore

```bash
hive --service metastore &
```
Enter Hive for verification:

```bash
hive
```
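A minimal smoke test from the shell (assumes the metastore started above is reachable; the table name is arbitrary):

```bash
# Should list at least the "default" database if the metastore connection works
hive -e "show databases;"

# Optional: create and drop a throwaway table to exercise HDFS and the metastore together
hive -e "create table smoke_test(id int); drop table smoke_test;"
```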
Distribute the /app/hive directory (so that all machines can use Hive without modifying any configuration):

```bash
scp -r /app/hive hadoop02:/app/
scp -r /app/hive hadoop03:/app/
```
6. Install Kafka
```bash
tar -zxvf kafka_2.12-2.6.2.tgz -C /app
mv kafka_2.12-2.6.2 kafka
```

All the files that need to be edited are under the /app/kafka/config directory.
server.properties
```properties
broker.id=0
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
```
Distribute /app/kafka to the remaining machines:

```bash
scp -r /app/kafka hadoop02:/app/
scp -r /app/kafka hadoop03:/app/
```

Then modify the broker.id value in /app/kafka/config/server.properties on each of them:

```bash
vim /app/kafka/config/server.properties
```

broker.id: hadoop02 -> 1, hadoop03 -> 2 (see the sketch below for a non-interactive way to do this)
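If you would rather not edit each file by hand, a sketch like the following (run from hadoop01, assuming passwordless ssh as root, which the Hadoop HA setup above already requires) does the same thing:

```bash
# Change broker.id in place on the other two brokers
ssh hadoop02 "sed -i 's/^broker.id=0/broker.id=1/' /app/kafka/config/server.properties"
ssh hadoop03 "sed -i 's/^broker.id=0/broker.id=2/' /app/kafka/config/server.properties"
```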
Start kafka
Start Kafka on each of the three machines:

```bash
cd /app/kafka/bin
# Start in the background:
./kafka-server-start.sh -daemon ../config/server.properties
```
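To confirm all three brokers registered with ZooKeeper (standard scripts shipped with Kafka; broker ids 0/1/2 follow the values set above):

```bash
# The registered broker ids should be [0, 1, 2]
zookeeper-shell.sh hadoop01:2181 ls /brokers/ids

# Listing topics also proves the broker <-> ZooKeeper connection works
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --list
```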
7. Install solr
```bash
tar -zxvf solr-7.4.0.tgz -C /app
mv solr-7.4.0 solr
```

All the files that need to be edited are under the /app/solr/bin directory.
solr.in.sh
```bash
ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
```
Distribute /app/solr to the remaining machines:

```bash
scp -r /app/solr hadoop02:/app/
scp -r /app/solr hadoop03:/app/
```

Then modify the SOLR_HOST value in /app/solr/bin/solr.in.sh on each of them:

```bash
vim /app/solr/bin/solr.in.sh
```

SOLR_HOST: hadoop02 -> "hadoop02", hadoop03 -> "hadoop03"
Start solr
Start Solr on each of the three machines:

```bash
cd /app/solr/bin
./solr start -force
```
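To check that Solr started in cloud mode against the ZooKeeper ensemble (Solr's default port 8983 is assumed):

```bash
# Prints the running Solr node(s) and a "cloud" section showing the ZK_HOST configured above
cd /app/solr/bin
./solr status
```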
8. Install Atlas
Upload the apache-atlas-2.1.0-bin.tar.gz package compiled in section 1 (here it is uploaded to the hadoop03 machine).

```bash
tar -zxvf apache-atlas-2.1.0-bin.tar.gz -C /app
mv apache-atlas-2.1.0-bin atlas
```

All the files that need to be edited are under the /app/atlas/conf directory.
atlas-env.sh
```bash
export MANAGE_LOCAL_HBASE=false
# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false
# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false
# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/app/jdk
export HBASE_CONF_DIR=/app/hbase/conf
```
atlas-application.properties (the full contents are given here; only Hive is integrated as a test. For other components, install the component and configure the corresponding Atlas hook.)
# # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ######### Graph Database Configs ######### # Graph Database #Configures the graph database to use. Defaults to JanusGraph #atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase # Graph Storage # Set atlas.graph.storage.backend to the correct value for your desired storage # backend. Possible values: # # hbase # cassandra # embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr # berkeleyje # # See the configuration documentation for more information about configuring the various storage backends. # atlas.graph.storage.backend=hbase2 atlas.graph.storage.hbase.table=apache_atlas_janus #Hbase #For standalone mode , specify localhost #for distributed mode, specify zookeeper quorum here atlas.graph.storage.hostname=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.graph.storage.hbase.regions-per-server=1 atlas.graph.storage.lock.wait-time=10000 #In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the #the following properties #atlas.graph.storage.clustername= #atlas.graph.storage.port= # Gremlin Query Optimizer # # Enables rewriting gremlin queries to maximize performance. This flag is provided as # a possible way to work around any defects that are found in the optimizer until they # are resolved. #atlas.query.gremlinOptimizerEnabled=true # Delete handler # # This allows the default behavior of doing "soft" deletes to be changed. # # Allowed Values: # org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes # org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes # #atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 # Entity audit repository # # This allows the default behavior of logging entity changes to hbase to be changed. # # Allowed Values: # org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase # org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra # org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository # atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository # if Cassandra is used as a backend for audit from the above property, uncomment and set the following # properties appropriately. If using the embedded cassandra profile, these properties can remain # commented out. 
# atlas.EntityAuditRepository.keyspace=atlas_audit # atlas.EntityAuditRepository.replicationFactor=1 # Graph Search Index atlas.graph.index.search.backend=solr #Solr #Solr cloud mode properties atlas.graph.index.search.solr.mode=cloud atlas.graph.index.search.solr.zookeeper-url=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.graph.index.search.solr.zookeeper-connect-timeout=60000 atlas.graph.index.search.solr.zookeeper-session-timeout=60000 atlas.graph.index.search.solr.wait-searcher=true #Solr http mode properties #atlas.graph.index.search.solr.mode=http #atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr # ElasticSearch support (Tech Preview) # Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the # hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters. # # Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product # https://www.elastic.co/products/x-pack/security # # Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional # plugins: https://docs.janusgraph.org/latest/elasticsearch.html #atlas.graph.index.search.hostname=localhost #atlas.graph.index.search.elasticsearch.client-only=true # Solr-specific configuration property atlas.graph.index.search.max-result-set-size=150 ######### Import Configs ######### #atlas.import.temp.directory=/temp/import ######### Notification Configs ######### atlas.notification.embedded=false atlas.kafka.data=${sys:atlas.home}/data/kafka atlas.kafka.zookeeper.connect=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.kafka.bootstrap.servers=192.168.190.15:9092,192.168.190.16:9092,192.168.190.17:9092 atlas.kafka.zookeeper.session.timeout.ms=400 atlas.kafka.zookeeper.connection.timeout.ms=200 atlas.kafka.zookeeper.sync.time.ms=20 atlas.kafka.auto.commit.interval.ms=1000 atlas.kafka.hook.group.id=atlas atlas.kafka.enable.auto.commit=true atlas.kafka.auto.offset.reset=earliest atlas.kafka.session.timeout.ms=30000 atlas.kafka.offsets.topic.replication.factor=1 atlas.kafka.poll.timeout.ms=1000 atlas.notification.create.topics=true atlas.notification.replicas=1 atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES atlas.notification.log.failed.messages=true atlas.notification.consumer.retry.interval=500 atlas.notification.hook.retry.interval=1000 # Enable for Kerberized Kafka clusters #atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM #atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab ## Server port configuration #atlas.server.http.port=21000 #atlas.server.https.port=21443 ######### Security Properties ######### # SSL config atlas.enableTLS=false #truststore.file=/path/to/truststore.jks #cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks #following only required for 2-way SSL #keystore.file=/path/to/keystore.jks # Authentication config atlas.authentication.method.kerberos=false atlas.authentication.method.file=true #### ldap.type= LDAP or AD atlas.authentication.method.ldap.type=none #### user credentials file atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties ### groups from UGI #atlas.authentication.method.ldap.ugi-groups=true ######## LDAP properties ######### #atlas.authentication.method.ldap.url=ldap://<ldap server url>:389 
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com #atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com #atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com) #atlas.authentication.method.ldap.groupRoleAttribute=cn #atlas.authentication.method.ldap.base.dn=dc=example,dc=com #atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com #atlas.authentication.method.ldap.bind.password=<password> #atlas.authentication.method.ldap.referral=ignore #atlas.authentication.method.ldap.user.searchfilter=(uid={0}) #atlas.authentication.method.ldap.default.role=<default role> ######### Active directory properties ####### #atlas.authentication.method.ldap.ad.domain=example.com #atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389 #atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0}) #atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com #atlas.authentication.method.ldap.ad.bind.password=<password> #atlas.authentication.method.ldap.ad.referral=ignore #atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0}) #atlas.authentication.method.ldap.ad.default.role=<default role> ######### JAAS Configuration ######## #atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule #atlas.jaas.KafkaClient.loginModuleControlFlag = required #atlas.jaas.KafkaClient.option.useKeyTab = true #atlas.jaas.KafkaClient.option.storeKey = true #atlas.jaas.KafkaClient.option.serviceName = kafka #atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab #atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM ######### Server Properties ######### atlas.rest.address=http://192.168.190.17:21000 # If enabled and set to true, this will run setup steps when the server starts atlas.server.run.setup.on.start=false ######### Entity Audit Configs ######### atlas.audit.hbase.tablename=apache_atlas_entity_audit atlas.audit.zookeeper.session.timeout.ms=1000 atlas.audit.hbase.zookeeper.quorum=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 ######### High Availability Configuration ######## atlas.server.ha.enabled=false #### Enabled the configs below as per need if HA is enabled ##### #atlas.server.ids=id1 #atlas.server.address.id1=localhost:21000 #atlas.server.ha.zookeeper.connect=localhost:2181 #atlas.server.ha.zookeeper.retry.sleeptime.ms=1000 #atlas.server.ha.zookeeper.num.retries=3 #atlas.server.ha.zookeeper.session.timeout.ms=20000 ## if ACLs need to be set on the created nodes, uncomment these lines and set the values ## #atlas.server.ha.zookeeper.acl=<scheme>:<id> #atlas.server.ha.zookeeper.auth=<scheme>:<authinfo> ######### Atlas Authorization ######### atlas.authorizer.impl=simple atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json ######### Type Cache Implementation ######## # A type cache class which implements # org.apache.atlas.typesystem.types.cache.TypeCache. # The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache. 
#atlas.TypeCache.impl= ######### Performance Configs ######### #atlas.graph.storage.lock.retries=10 #atlas.graph.storage.cache.db-cache-time=120000 ######### CSRF Configs ######### atlas.rest-csrf.enabled=true atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.* atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE atlas.rest-csrf.custom-header=X-XSRF-HEADER ############ KNOX Configs ################ #atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera #atlas.sso.knox.enabled=true #atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso #atlas.sso.knox.publicKey= ############ Atlas Metric/Stats configs ################ # Format: atlas.metric.query.<key>.<name> atlas.metric.query.cache.ttlInSecs=900 #atlas.metric.query.general.typeCount= #atlas.metric.query.general.typeUnusedCount= #atlas.metric.query.general.entityCount= #atlas.metric.query.general.tagCount= #atlas.metric.query.general.entityDeleted= # #atlas.metric.query.entity.typeEntities= #atlas.metric.query.entity.entityTagged= # #atlas.metric.query.tags.entityTags= ######### Compiled Query Cache Configuration ######### # The size of the compiled query cache. Older queries will be evicted from the cache # when we reach the capacity. #atlas.CompiledQueryCache.capacity=1000 # Allows notifications when items are evicted from the compiled query # cache because it has become full. A warning will be issued when # the specified number of evictions have occurred. If the eviction # warning threshold <= 0, no eviction warnings will be issued. #atlas.CompiledQueryCache.evictionWarningThrottle=0 ######### Full Text Search Configuration ######### #Set to false to disable full text search. #atlas.search.fulltext.enable=true ######### Gremlin Search Configuration ######### #Set to false to disable gremlin search. atlas.search.gremlin.enable=false ########## Add http headers ########### #atlas.headers.Access-Control-Allow-Origin=* #atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST #atlas.headers.<headerName>=<headerValue> ######### UI Configuration ######## atlas.ui.default.version=v1 ######### Hive Hook Configs ####### atlas.hook.hive.synchronous=false atlas.hook.hive.numRetries=3 atlas.hook.hive.queueSize=10000 atlas.cluster.name=primary
Integrated hbase
```bash
ln -s /app/hbase/conf/ /app/atlas/conf/hbase/
cp /app/hbase/conf/* /app/atlas/conf/hbase/
```
Integrated solr
```bash
cp -r /app/atlas/conf/solr /app/solr/
cd /app/solr/
mv solr/ atlas-solr
scp -r ./atlas-solr/ hadoop01:/app/solr/
scp -r ./atlas-solr/ hadoop02:/app/solr/
```

Restart Solr:

```bash
cd /app/solr/bin/
./solr stop -force
./solr start -force
```

Create the indexes in Solr:

```bash
solr create -c vertex_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c edge_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c fulltext_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
```
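The three collections can be verified through Solr's Collections API (default port 8983; any of the three hosts should work):

```bash
# The response should list vertex_index, edge_index and fulltext_index
curl "http://hadoop01:8983/solr/admin/collections?action=LIST"
```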
kafka related operations
Create the related topics in Kafka:

```bash
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
```
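To confirm the topics were created with the expected partition and replica counts:

```bash
# Shows partitions, replication factor and leader placement for the Atlas topics
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic ATLAS_HOOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic ATLAS_ENTITIES
```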
Integrated hive
```bash
# This cd is very important: the zip command below must be run from this directory
cd /app/atlas/conf
zip -u /app/atlas/hook/hive/hive-bridge-shim-2.1.0.jar atlas-application.properties
cp -r /app/atlas/hook/hive/* /app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop01:/app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop02:/app/hive/lib/
cp ./atlas-application.properties /app/hive/conf/
scp ./atlas-application.properties hadoop01:/app/hive/conf/
scp ./atlas-application.properties hadoop02:/app/hive/conf/
```
hive related configuration
All three machines need this configuration:

```bash
cd /app/hive/conf
```

Add to hive-env.sh:

```bash
export JAVA_HOME=/app/jdk
export HIVE_AUX_JARS_PATH=/app/hive/lib/
```

Add to hive-site.xml:

```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```
Start atlas
```bash
cd /app/atlas/bin
./atlas_start.py
```

Note: the first start of Atlas takes a long time; even after the script reports that it has started, it still takes a while before the Atlas web UI can be accessed. Logs and errors can be viewed in the /app/atlas/logs directory.
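Once the web UI responds, the REST API can be probed as well (admin/admin is the default account from users-credentials.properties; adjust if it was changed):

```bash
# Atlas web UI: http://192.168.190.17:21000
# The version endpoint returns a small JSON document once the server is fully up
curl -u admin:admin http://192.168.190.17:21000/api/atlas/admin/version
```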
Import hive metadata after startup
```bash
cd /app/atlas/bin
./import-hive.sh
```

After that, lineage can be viewed normally.
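As a rough end-to-end check of the Hive hook (a sketch only; the table names are arbitrary), create one table from another and the CTAS should show up as lineage in Atlas:

```bash
# The hook configured on hive.exec.post.hooks sends completed operations to the ATLAS_HOOK topic
hive -e "
create table lineage_src(id int, name string);
create table lineage_dst as select id, name from lineage_src;
"
# Then search for 'lineage_dst' (type hive_table) in the Atlas web UI and open its Lineage tab
```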