Atlas 2.1.0 installation record with big data components, based on the Apache open source versions (test environment)
Note: this Atlas installation draws on a large number of online materials; this record is kept only for future convenience. If anything here infringes, please contact me immediately.
Component version
Component name | Component version |
---|---|
Hadoop | 3.2.1 |
Hive | 3.1.2 |
Hbase | 2.3.4 |
Zookeeper | 3.5.9 |
Kafka | 2.6.2 |
Solr | 7.4.0 |
Atlas | 2.1.0 |
jdk | 1.8 |
Maven | 3.6.3 |
1, Atlas 2.1.0 compilation
Premise: I compile on a virtual machine running the CentOS 7.6 operating system.
1. Build the virtual machine
Omitted.
2. Install jdk
1) Uninstall the OpenJDK that ships with CentOS 7.6 (this must be uninstalled, otherwise there will be problems with compilation):

```bash
rpm -qa | grep openjdk
rpm -e --nodeps <each package returned by the query above>
```

2) Install your own JDK 1.8:

```bash
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8 jdk
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
java -version
```
3. Install Maven
The installed Maven version is 3.6.3.

```bash
tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /app
mv apache-maven-3.6.3 maven
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export MVN_HOME=/app/maven
export PATH=$PATH:$JAVA_HOME/bin:$MVN_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
mvn -version
```
Configure the Maven repository mirrors:

```bash
vim /app/maven/conf/settings.xml
```

Add:

```xml
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<!-- Central repository 1 -->
<mirror>
    <id>repo1</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>https://repo1.maven.org/maven2/</url>
</mirror>
<!-- Central repository 2 -->
<mirror>
    <id>repo2</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>https://repo2.maven.org/maven2/</url>
</mirror>
```
4. Compiling Atlas
```bash
tar -zxvf apache-atlas-2.1.0-sources.tar.gz -C /app
cd /app/apache-atlas-sources-2.1.0
```
Edit the project's top-level pom.xml file.
```bash
vim pom.xml
```

Modify the versions of the components, mainly as follows:

```xml
<hadoop.version>3.2.1</hadoop.version>
<hbase.version>2.3.4</hbase.version>
<solr.version>7.5.0</solr.version>
<hive.version>3.1.2</hive.version>
<kafka.version>2.2.1</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<calcite.version>1.16.0</calcite.version>
<zookeeper.version>3.5.9</zookeeper.version>
<falcon.version>0.8</falcon.version>
<sqoop.version>1.4.6.2.3.99.0-195</sqoop.version>
<storm.version>1.2.0</storm.version>
<curator.version>4.0.1</curator.version>
<elasticsearch.version>5.6.4</elasticsearch.version>
```
Source code that needs to be modified (online materials say this code must be changed; I modified it and the build ran successfully. So far I have only tested the Hive hook, with no problems; I do not know what happens if these changes are skipped.)
```bash
vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
```

At line 577, replace:

```java
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
```

with:

```java
String catalogName = null;
```

```bash
vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java
```

At line 81, replace:

```java
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
```

with:

```java
this.metastoreHandler = null;
```
Compile
```bash
cd /app/apache-atlas-sources-2.1.0
```

Package (this build uses external HBase and Solr; the packaging that embeds Atlas's own HBase/Solr is not considered here):

```bash
mvn clean -DskipTests package -Pdist -X
```
Note: errors may be reported during compilation, almost always due to network problems; retrying usually resolves them. If retrying still cannot download some jar packages, download the missing jars manually, put them into the local Maven repository, and package again.
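For reference, a manually downloaded jar can be installed into the local repository with Maven's install:install-file goal; the file path and coordinates below are placeholders and must match whatever artifact the build actually complains about:

```bash
# Install a hand-downloaded jar into the local Maven repository (~/.m2/repository)
# groupId/artifactId/version are placeholders - use the coordinates from the build error
mvn install:install-file \
  -Dfile=/root/downloads/some-missing-lib-1.0.0.jar \
  -DgroupId=org.example \
  -DartifactId=some-missing-lib \
  -Dversion=1.0.0 \
  -Dpackaging=jar
```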
atlas storage location after compilation
```bash
cd /app/apache-atlas-sources-2.1.0/distro/target
```

apache-atlas-2.1.0-bin.tar.gz is the package we need.
2, Component installation
Note: this Atlas installation uses external, independent HBase and Solr, so Hadoop, Hive, Zookeeper, Kafka, Solr and HBase need to be deployed separately. Three virtual machines are used for the test, as follows:
Virtual machine name | operating system | IP |
---|---|---|
hadoop01 | Centos7.6 | 192.168.190.15 |
hadoop02 | Centos7.6 | 192.168.190.16 |
hadoop03 | Centos7.6 | 192.168.190.17 |
The environment variables configured on the three machines are as follows (given here up front):
```bash
vim /etc/profile

export JAVA_HOME=/app/jdk
export ZK_HOME=/app/zookeeper
export HIVE_HOME=/app/hive
export HADOOP_HOME=/app/hadoop
export HBASE_HOME=/app/hbase
export KAFKA_HOME=/app/kafka
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$KAFKA_HOME/bin:
```
1.jdk installation
1) Uninstall the OpenJDK that ships with CentOS 7.6:

```bash
rpm -qa | grep openjdk
rpm -e --nodeps <each package returned by the query above>
```

2) Install your own JDK 1.8:

```bash
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8 jdk
```

Configure environment variables:

```bash
vim /etc/profile
# Add the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin:
```

Save and exit, make the environment variables take effect, and verify:

```bash
source /etc/profile
java -version
```

Then copy the entire /app/jdk folder to hadoop02 and hadoop03 and configure the environment variables there:

```bash
scp -r /app/jdk hadoop02:/app/
scp -r /app/jdk hadoop03:/app/
```
2.Zookeeper installation
```bash
mkdir /app
tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /app
mv apache-zookeeper-3.5.9-bin zookeeper
cd /app/zookeeper/conf
# Make a copy of zoo_sample.cfg
cp zoo_sample.cfg zoo.cfg
```
```bash
vim zoo.cfg
```

```properties
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/app/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
```
Create the data directory:

```bash
mkdir /app/zookeeper/data
cd /app/zookeeper/data
touch myid && echo "1" > myid
```
Then copy the entire /app/zookeeper folder to hadoop02 and hadoop03, configure the environment variables there, and modify the /app/zookeeper/data/myid file on each machine (hadoop02: 2, hadoop03: 3):

```bash
scp -r /app/zookeeper hadoop02:/app/
scp -r /app/zookeeper hadoop03:/app/
```
Start ZooKeeper on all three machines:

```bash
zkServer.sh start
```
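A quick way to confirm the ensemble is healthy (run on each node; assumes the environment variables above are in effect):

```bash
# One node should report "Mode: leader" and the other two "Mode: follower"
zkServer.sh status

# The QuorumPeerMain process should also appear in jps
jps | grep QuorumPeerMain
```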
3. Install Hadoop
```bash
tar -zxvf hadoop-3.2.1.tar.gz -C /app
mv hadoop-3.2.1 hadoop
```

All the files that need to be edited are under the /app/hadoop/etc/hadoop directory.
core-site.xml
```bash
vim core-site.xml
```

```xml
<configuration>
    <!-- The HDFS entry point. "mycluster" is only the logical name of the cluster and can be anything,
         but it must match the dfs.nameservices value in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- hadoop.tmp.dir defaults to /tmp, which would keep NameNode and DataNode data in a volatile
         directory, so change it here -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>
    <!-- Static web user. If this is not configured, the web UI reports an error -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
    <!-- ZooKeeper cluster address; separate multiple hosts with commas -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
```
hadoop-env.sh
```bash
export JAVA_HOME=/app/jdk
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"
```
hdfs-site.xml
```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Set the HDFS nameservice to mycluster; must be consistent with core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- mycluster has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop02:8020</value>
    </property>
    <!-- HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop01:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop02:9870</value>
    </property>
    <!-- Where the NameNode edits metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
    </property>
    <!-- Location of JournalNode data on the local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha-hadoop/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used for automatic failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple mechanisms are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- The sshfence mechanism requires passwordless ssh login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- sshfence timeout -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
```
mapred-env.sh
```bash
export JAVA_HOME=/app/jdk
```
mapred-site.xml
```xml
<configuration>
    <!-- Use YARN as the MapReduce framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <!-- JobHistory server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /app/hadoop/etc/hadoop,
            /app/hadoop/share/hadoop/common/*,
            /app/hadoop/share/hadoop/common/lib/*,
            /app/hadoop/share/hadoop/hdfs/*,
            /app/hadoop/share/hadoop/hdfs/lib/*,
            /app/hadoop/share/hadoop/mapreduce/*,
            /app/hadoop/share/hadoop/mapreduce/lib/*,
            /app/hadoop/share/hadoop/yarn/*,
            /app/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
```
yarn-env.sh
```bash
export JAVA_HOME=/app/jdk
```
yarn-site.xml
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>
    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Address of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
    </property>
    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Whether virtual memory limits will be enforced for containers -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>
</configuration>
```
workers
```
hadoop01
hadoop02
hadoop03
```
Hadoop 3 has permission problems; to avoid startup failures caused by them, add the specified users in the following files.
```bash
vim /app/hadoop/sbin/start-dfs.sh
vim /app/hadoop/sbin/stop-dfs.sh
```

Add:

```bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
```

```bash
vim /app/hadoop/sbin/start-yarn.sh
vim /app/hadoop/sbin/stop-yarn.sh
```

Add:

```bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```
Startup
Startup order: Zookeeper -> JournalNode -> format NameNode -> format zkfc (create the namespace in ZooKeeper) -> NameNode -> DataNode -> ResourceManager -> NodeManager
Start JournalNode
Start the JournalNode on all three machines:

```bash
cd /app/hadoop/sbin/
./hadoop-daemon.sh start journalnode
```
Format namenode
Execute on hadoop01:

```bash
hadoop namenode -format
```
Copy the contents of the /data/hadoop/dfs/name directory to the standby NameNode host
If the standby NameNode host does not have this directory, create it first:

```bash
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/name/
```
Format zkfc
Format zkfc on the NameNode hosts:

```bash
hdfs zkfc -formatZK
```
Close JournalNode
Shut down the JournalNode on all three machines:

```bash
cd /app/hadoop/sbin/
./hadoop-daemon.sh stop journalnode
```
Start hadoop
On the hadoop01 machine:

```bash
start-all.sh
```
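A few sanity checks after startup can save time later; these are standard Hadoop 3 commands, and the service ids nn1/nn2 and rm1/rm2 come from the hdfs-site.xml and yarn-site.xml above:

```bash
# Expected daemons: NameNode and DFSZKFailoverController on hadoop01/02, ResourceManager on hadoop01/02,
# DataNode, JournalNode and NodeManager on all three nodes
jps

# One NameNode should be active and the other standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Same check for the ResourceManagers
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Overall HDFS report: live DataNodes and capacity
hdfs dfsadmin -report
```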
4. Install Hbase
```bash
tar -zxvf hbase-2.3.4-bin.tar.gz -C /app
mv hbase-2.3.4 hbase
```

All the files that need to be edited are under the /app/hbase/conf directory.
hbase-env.sh
```bash
export JAVA_HOME=/app/jdk
export HBASE_CLASSPATH=/app/hadoop/etc/hadoop
```
hbase-site.xml
```xml
<configuration>
    <!-- mycluster comes from the dfs.nameservices value in hdfs-site.xml -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>8020</value>
    </property>
    <!-- zookeeper cluster -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop01,hadoop02,hadoop03</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/app/zookeeper/conf</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/var/hbase/tmp</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- If HMaster does not start and the log reports the following error:
         "The procedure WAL relies on the ability to hsync for proper operation during component
         failures, but the underlying filesystem does not support doing so. Please check the config
         value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and
         ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it."
         then enable this configuration:
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    -->
</configuration>
```
regionservers
```
hadoop01
hadoop02
hadoop03
```
To enable high availability, HBase needs a backup-masters file (add the standby HMaster host in it):

```bash
vim backup-masters
```

```
hadoop03
```
Start Hbase
```bash
start-hbase.sh
```
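A quick check that HBase actually came up (assuming the PATH settings above; the shell command is just a minimal status probe):

```bash
# HMaster should be on hadoop01, the backup HMaster on hadoop03, and HRegionServer on all three nodes
jps | grep -E "HMaster|HRegionServer"

# Print a cluster status summary from the HBase shell in non-interactive mode
echo "status 'simple'" | hbase shell -n
```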
5. Install hive
MySQL installation is omitted.

```bash
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin hive
```

All the files that need to be edited are under the /app/hive/conf directory.
hive-env.sh
```bash
export HADOOP_HOME=/app/hadoop/
export HIVE_CONF_DIR=/app/hive/conf/
```
hive-site.xml
```xml
<configuration>
    <!-- Record Hive metadata in MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <!-- JDBC MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- MySQL username and password for hive -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>MySQL username for hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>MySQL password for hive</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>
    <!-- Log directory -->
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
    <!-- Metastore node information -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop01:9083</value>
    </property>
    <!-- Port for remote client connections -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>hive.server2.webui.host</name>
        <value>0.0.0.0</value>
    </property>
    <!-- Port of the HiveServer2 web UI -->
    <property>
        <name>hive.server2.webui.port</name>
        <value>10002</value>
    </property>
    <property>
        <name>hive.server2.long.polling.timeout</name>
        <value>5000</value>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.fixedDatastore</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>mr</value>
    </property>
</configuration>
```
Upload the MySQL JDBC driver jar to Hive's lib directory.
Initialize Hive's metadata database
```bash
schematool -dbType mysql -initSchema
```
Start Hive's metastore

```bash
hive --service metastore &
```
Enter Hive for verification:

```bash
hive
```
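A minimal smoke test from the shell (assumes the metastore started above is reachable; the table name is arbitrary):

```bash
# Should list at least the "default" database if the metastore connection works
hive -e "show databases;"

# Optional: create and drop a throwaway table to exercise HDFS and the metastore together
hive -e "create table smoke_test(id int); drop table smoke_test;"
```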
Distribute the /app/hive directory (so that all machines can use Hive without modifying any configuration):

```bash
scp -r /app/hive hadoop02:/app/
scp -r /app/hive hadoop03:/app/
```
6. Install Kafka
```bash
tar -zxvf kafka_2.12-2.6.2.tgz -C /app
mv kafka_2.12-2.6.2 kafka
```

All the files that need to be edited are under the /app/kafka/config directory.
server.properties
```properties
broker.id=0
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
```
Distribute /app/kafka to the remaining machines:

```bash
scp -r /app/kafka hadoop02:/app/
scp -r /app/kafka hadoop03:/app/
```

Then modify the broker.id value in /app/kafka/config/server.properties on each of them:

```bash
vim /app/kafka/config/server.properties
```

broker.id: hadoop02 -> 1, hadoop03 -> 2 (see the sketch below for a non-interactive way to do this)
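If you would rather not edit each file by hand, a sketch like the following (run from hadoop01, assuming passwordless ssh as root, which the Hadoop HA setup above already requires) does the same thing:

```bash
# Change broker.id in place on the other two brokers
ssh hadoop02 "sed -i 's/^broker.id=0/broker.id=1/' /app/kafka/config/server.properties"
ssh hadoop03 "sed -i 's/^broker.id=0/broker.id=2/' /app/kafka/config/server.properties"
```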
Start kafka
Start Kafka on each of the three machines:

```bash
cd /app/kafka/bin
# Start in the background:
./kafka-server-start.sh -daemon ../config/server.properties
```
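To confirm all three brokers registered with ZooKeeper (standard scripts shipped with Kafka; broker ids 0/1/2 follow the values set above):

```bash
# The registered broker ids should be [0, 1, 2]
zookeeper-shell.sh hadoop01:2181 ls /brokers/ids

# Listing topics also proves the broker <-> ZooKeeper connection works
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --list
```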
7. Install solr
```bash
tar -zxvf solr-7.4.0.tgz -C /app
mv solr-7.4.0 solr
```

All the files that need to be edited are under the /app/solr/bin directory.
solr.in.sh
```bash
ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
```
Distribute /app/solr to the remaining machines:

```bash
scp -r /app/solr hadoop02:/app/
scp -r /app/solr hadoop03:/app/
```

Then modify the SOLR_HOST value in /app/solr/bin/solr.in.sh on each of them:

```bash
vim /app/solr/bin/solr.in.sh
```

SOLR_HOST: hadoop02 -> "hadoop02", hadoop03 -> "hadoop03"
Start solr
Start Solr on each of the three machines:

```bash
cd /app/solr/bin
./solr start -force
```
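To check that Solr started in cloud mode against the ZooKeeper ensemble (Solr's default port 8983 is assumed):

```bash
# Prints the running Solr node(s) and a "cloud" section showing the ZK_HOST configured above
cd /app/solr/bin
./solr status
```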
8. Install Atlas
Upload the apache-atlas-2.1.0-bin.tar.gz package compiled in section 1 (here it is uploaded to the hadoop03 machine).

```bash
tar -zxvf apache-atlas-2.1.0-bin.tar.gz -C /app
mv apache-atlas-2.1.0-bin atlas
```

All the files that need to be edited are under the /app/atlas/conf directory.
atlas-env.sh
```bash
export MANAGE_LOCAL_HBASE=false
# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false
# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false
# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/app/jdk
export HBASE_CONF_DIR=/app/hbase/conf
```
atlas-application.properties (the full contents are given here; only Hive is integrated as a test. For other components, install the component and configure the corresponding Atlas hook.)
# # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ######### Graph Database Configs ######### # Graph Database #Configures the graph database to use. Defaults to JanusGraph #atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase # Graph Storage # Set atlas.graph.storage.backend to the correct value for your desired storage # backend. Possible values: # # hbase # cassandra # embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr # berkeleyje # # See the configuration documentation for more information about configuring the various storage backends. # atlas.graph.storage.backend=hbase2 atlas.graph.storage.hbase.table=apache_atlas_janus #Hbase #For standalone mode , specify localhost #for distributed mode, specify zookeeper quorum here atlas.graph.storage.hostname=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.graph.storage.hbase.regions-per-server=1 atlas.graph.storage.lock.wait-time=10000 #In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the #the following properties #atlas.graph.storage.clustername= #atlas.graph.storage.port= # Gremlin Query Optimizer # # Enables rewriting gremlin queries to maximize performance. This flag is provided as # a possible way to work around any defects that are found in the optimizer until they # are resolved. #atlas.query.gremlinOptimizerEnabled=true # Delete handler # # This allows the default behavior of doing "soft" deletes to be changed. # # Allowed Values: # org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes # org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes # #atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 # Entity audit repository # # This allows the default behavior of logging entity changes to hbase to be changed. # # Allowed Values: # org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase # org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra # org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository # atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository # if Cassandra is used as a backend for audit from the above property, uncomment and set the following # properties appropriately. If using the embedded cassandra profile, these properties can remain # commented out. 
# atlas.EntityAuditRepository.keyspace=atlas_audit # atlas.EntityAuditRepository.replicationFactor=1 # Graph Search Index atlas.graph.index.search.backend=solr #Solr #Solr cloud mode properties atlas.graph.index.search.solr.mode=cloud atlas.graph.index.search.solr.zookeeper-url=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.graph.index.search.solr.zookeeper-connect-timeout=60000 atlas.graph.index.search.solr.zookeeper-session-timeout=60000 atlas.graph.index.search.solr.wait-searcher=true #Solr http mode properties #atlas.graph.index.search.solr.mode=http #atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr # ElasticSearch support (Tech Preview) # Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the # hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters. # # Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product # https://www.elastic.co/products/x-pack/security # # Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional # plugins: https://docs.janusgraph.org/latest/elasticsearch.html #atlas.graph.index.search.hostname=localhost #atlas.graph.index.search.elasticsearch.client-only=true # Solr-specific configuration property atlas.graph.index.search.max-result-set-size=150 ######### Import Configs ######### #atlas.import.temp.directory=/temp/import ######### Notification Configs ######### atlas.notification.embedded=false atlas.kafka.data=${sys:atlas.home}/data/kafka atlas.kafka.zookeeper.connect=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 atlas.kafka.bootstrap.servers=192.168.190.15:9092,192.168.190.16:9092,192.168.190.17:9092 atlas.kafka.zookeeper.session.timeout.ms=400 atlas.kafka.zookeeper.connection.timeout.ms=200 atlas.kafka.zookeeper.sync.time.ms=20 atlas.kafka.auto.commit.interval.ms=1000 atlas.kafka.hook.group.id=atlas atlas.kafka.enable.auto.commit=true atlas.kafka.auto.offset.reset=earliest atlas.kafka.session.timeout.ms=30000 atlas.kafka.offsets.topic.replication.factor=1 atlas.kafka.poll.timeout.ms=1000 atlas.notification.create.topics=true atlas.notification.replicas=1 atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES atlas.notification.log.failed.messages=true atlas.notification.consumer.retry.interval=500 atlas.notification.hook.retry.interval=1000 # Enable for Kerberized Kafka clusters #atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM #atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab ## Server port configuration #atlas.server.http.port=21000 #atlas.server.https.port=21443 ######### Security Properties ######### # SSL config atlas.enableTLS=false #truststore.file=/path/to/truststore.jks #cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks #following only required for 2-way SSL #keystore.file=/path/to/keystore.jks # Authentication config atlas.authentication.method.kerberos=false atlas.authentication.method.file=true #### ldap.type= LDAP or AD atlas.authentication.method.ldap.type=none #### user credentials file atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties ### groups from UGI #atlas.authentication.method.ldap.ugi-groups=true ######## LDAP properties ######### #atlas.authentication.method.ldap.url=ldap://<ldap server url>:389 
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com #atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com #atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com) #atlas.authentication.method.ldap.groupRoleAttribute=cn #atlas.authentication.method.ldap.base.dn=dc=example,dc=com #atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com #atlas.authentication.method.ldap.bind.password=<password> #atlas.authentication.method.ldap.referral=ignore #atlas.authentication.method.ldap.user.searchfilter=(uid={0}) #atlas.authentication.method.ldap.default.role=<default role> ######### Active directory properties ####### #atlas.authentication.method.ldap.ad.domain=example.com #atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389 #atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0}) #atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com #atlas.authentication.method.ldap.ad.bind.password=<password> #atlas.authentication.method.ldap.ad.referral=ignore #atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0}) #atlas.authentication.method.ldap.ad.default.role=<default role> ######### JAAS Configuration ######## #atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule #atlas.jaas.KafkaClient.loginModuleControlFlag = required #atlas.jaas.KafkaClient.option.useKeyTab = true #atlas.jaas.KafkaClient.option.storeKey = true #atlas.jaas.KafkaClient.option.serviceName = kafka #atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab #atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM ######### Server Properties ######### atlas.rest.address=http://192.168.190.17:21000 # If enabled and set to true, this will run setup steps when the server starts atlas.server.run.setup.on.start=false ######### Entity Audit Configs ######### atlas.audit.hbase.tablename=apache_atlas_entity_audit atlas.audit.zookeeper.session.timeout.ms=1000 atlas.audit.hbase.zookeeper.quorum=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181 ######### High Availability Configuration ######## atlas.server.ha.enabled=false #### Enabled the configs below as per need if HA is enabled ##### #atlas.server.ids=id1 #atlas.server.address.id1=localhost:21000 #atlas.server.ha.zookeeper.connect=localhost:2181 #atlas.server.ha.zookeeper.retry.sleeptime.ms=1000 #atlas.server.ha.zookeeper.num.retries=3 #atlas.server.ha.zookeeper.session.timeout.ms=20000 ## if ACLs need to be set on the created nodes, uncomment these lines and set the values ## #atlas.server.ha.zookeeper.acl=<scheme>:<id> #atlas.server.ha.zookeeper.auth=<scheme>:<authinfo> ######### Atlas Authorization ######### atlas.authorizer.impl=simple atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json ######### Type Cache Implementation ######## # A type cache class which implements # org.apache.atlas.typesystem.types.cache.TypeCache. # The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache. 
#atlas.TypeCache.impl= ######### Performance Configs ######### #atlas.graph.storage.lock.retries=10 #atlas.graph.storage.cache.db-cache-time=120000 ######### CSRF Configs ######### atlas.rest-csrf.enabled=true atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.* atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE atlas.rest-csrf.custom-header=X-XSRF-HEADER ############ KNOX Configs ################ #atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera #atlas.sso.knox.enabled=true #atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso #atlas.sso.knox.publicKey= ############ Atlas Metric/Stats configs ################ # Format: atlas.metric.query.<key>.<name> atlas.metric.query.cache.ttlInSecs=900 #atlas.metric.query.general.typeCount= #atlas.metric.query.general.typeUnusedCount= #atlas.metric.query.general.entityCount= #atlas.metric.query.general.tagCount= #atlas.metric.query.general.entityDeleted= # #atlas.metric.query.entity.typeEntities= #atlas.metric.query.entity.entityTagged= # #atlas.metric.query.tags.entityTags= ######### Compiled Query Cache Configuration ######### # The size of the compiled query cache. Older queries will be evicted from the cache # when we reach the capacity. #atlas.CompiledQueryCache.capacity=1000 # Allows notifications when items are evicted from the compiled query # cache because it has become full. A warning will be issued when # the specified number of evictions have occurred. If the eviction # warning threshold <= 0, no eviction warnings will be issued. #atlas.CompiledQueryCache.evictionWarningThrottle=0 ######### Full Text Search Configuration ######### #Set to false to disable full text search. #atlas.search.fulltext.enable=true ######### Gremlin Search Configuration ######### #Set to false to disable gremlin search. atlas.search.gremlin.enable=false ########## Add http headers ########### #atlas.headers.Access-Control-Allow-Origin=* #atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST #atlas.headers.<headerName>=<headerValue> ######### UI Configuration ######## atlas.ui.default.version=v1 ######### Hive Hook Configs ####### atlas.hook.hive.synchronous=false atlas.hook.hive.numRetries=3 atlas.hook.hive.queueSize=10000 atlas.cluster.name=primary
Integrated hbase
```bash
ln -s /app/hbase/conf/ /app/atlas/conf/hbase/
cp /app/hbase/conf/* /app/atlas/conf/hbase/
```
Integrated solr
```bash
cp -r /app/atlas/conf/solr /app/solr/
cd /app/solr/
mv solr/ atlas-solr
scp -r ./atlas-solr/ hadoop01:/app/solr/
scp -r ./atlas-solr/ hadoop02:/app/solr/
```

Restart Solr:

```bash
cd /app/solr/bin/
./solr stop -force
./solr start -force
```

Create the indexes in Solr:

```bash
solr create -c vertex_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c edge_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c fulltext_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
```
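The three collections can be verified through Solr's Collections API (default port 8983; any of the three hosts should work):

```bash
# The response should list vertex_index, edge_index and fulltext_index
curl "http://hadoop01:8983/solr/admin/collections?action=LIST"
```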
kafka related operations
Create the related topics in Kafka:

```bash
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
```
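To confirm the topics were created with the expected partition and replica counts:

```bash
# Shows partitions, replication factor and leader placement for the Atlas topics
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic ATLAS_HOOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic ATLAS_ENTITIES
```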
Integrated hive
```bash
# This cd is very important: the zip command below must be run from this directory
cd /app/atlas/conf
zip -u /app/atlas/hook/hive/hive-bridge-shim-2.1.0.jar atlas-application.properties
cp -r /app/atlas/hook/hive/* /app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop01:/app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop02:/app/hive/lib/
cp ./atlas-application.properties /app/hive/conf/
scp ./atlas-application.properties hadoop01:/app/hive/conf/
scp ./atlas-application.properties hadoop02:/app/hive/conf/
```
hive related configuration
All three machines need this configuration:

```bash
cd /app/hive/conf
```

Add to hive-env.sh:

```bash
export JAVA_HOME=/app/jdk
export HIVE_AUX_JARS_PATH=/app/hive/lib/
```

Add to hive-site.xml:

```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```
Start atlas
```bash
cd /app/atlas/bin
./atlas_start.py
```

Note: the first start of Atlas takes a long time; even after the script reports that it has started, it still takes a while before the Atlas web UI can be accessed. Logs and errors can be viewed in the /app/atlas/logs directory.
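Once the web UI responds, the REST API can be probed as well (admin/admin is the default account from users-credentials.properties; adjust if it was changed):

```bash
# Atlas web UI: http://192.168.190.17:21000
# The version endpoint returns a small JSON document once the server is fully up
curl -u admin:admin http://192.168.190.17:21000/api/atlas/admin/version
```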
Import hive metadata after startup
```bash
cd /app/atlas/bin
./import-hive.sh
```

After that, lineage can be viewed normally.
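As a rough end-to-end check of the Hive hook (a sketch only; the table names are arbitrary), create one table from another and the CTAS should show up as lineage in Atlas:

```bash
# The hook configured on hive.exec.post.hooks sends completed operations to the ATLAS_HOOK topic
hive -e "
create table lineage_src(id int, name string);
create table lineage_dst as select id, name from lineage_src;
"
# Then search for 'lineage_dst' (type hive_table) in the Atlas web UI and open its Lineage tab
```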