Set Up the GFS (GlusterFS) Distributed File System - Practice

1. Introduction to GlusterFS:

GFS is an extensible distributed file system for large, distributed applications that access large amounts of data. It runs on inexpensive commodity hardware, provides fault tolerance, and can deliver high aggregate performance to a large number of clients.

GlusterFS is an open-source distributed file system. It consists of storage servers, clients, and an NFS/Samba storage gateway.
(1) GlusterFS features:

Scalability and high performance;
High availability;
Global unified namespace;
Flexible volume management;
Based on standard protocols.
(2) Modular, stackable architecture:

1. Modular, stackable design;
2. Complex functions are implemented by combining modules.

(3) GlusterFS workflow:

(4) Elastic HASH algorithm:

(1) A 32-bit integer is computed with a HASH algorithm;
(2) The 32-bit hash space is divided into N contiguous sub-ranges, one per Brick.

Advantages of the elastic HASH algorithm:

(1) Data is distributed evenly across all Bricks;
(2) It removes the dependence on a metadata server, eliminating that single point of failure and service-access bottleneck.
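
To illustrate the idea, here is a minimal, hypothetical sketch of the range-based mapping (not the real implementation): it assumes 4 bricks and uses cksum as a stand-in for the hash function GlusterFS actually uses.

#!/bin/bash
# Illustration only: map a file name to one of N bricks by splitting the
# 32-bit hash space into N equal, contiguous sub-ranges.
# Usage: ./hash-demo.sh <filename>   (hash-demo.sh is a hypothetical name)
N=4                                                  # assumed number of bricks
FILE="$1"
HASH=$(echo -n "$FILE" | cksum | awk '{print $1}')   # 32-bit stand-in hash
RANGE=$(( 4294967296 / N ))                          # 2^32 split into N ranges
BRICK=$(( HASH / RANGE ))
echo "$FILE  (hash=$HASH)  ->  Brick $BRICK"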

2. GlusterFS volume types:

(1) Distributed volumes:

(1) Files are not split into blocks;
(2) HASH values are stored in extended file attributes;
(3) The supported underlying file systems include ext3, ext4, ZFS, XFS, etc.

Characteristics:

(1) Files are distributed across different servers with no redundancy;
(2) It is easy and cheap to expand the size of the volume;
(3) A single point of failure may cause data loss;
(4) Data protection depends on the underlying file system.
(2) Striped volumes:

(1) Files are divided into N blocks based on offset (N = number of stripe nodes) and stored round-robin across the Bricks of the N server nodes;
(2) Performance is particularly strong when storing large files;
(3) No redundancy, similar to RAID 0.

Characteristics:

(1) Data is divided into smaller blocks and distributed across different stripes in the Brick server cluster;
(2) The distribution reduces the load, and the smaller chunks speed up access;
(3) No data redundancy.
(3) Replicated volumes:

(1) One or more copies of the same file are kept;
(2) Disk utilization is low in replication mode because the replicas must also be stored;
(3) If the storage capacity of the nodes differs, the barrel effect applies: the smallest node's capacity becomes the total capacity of the volume.

Characteristics:

(1) All servers in the volume keep a complete copy;
(2) The number of replicas is set by the client when the volume is created;
(3) At least two Brick servers are required;
(4) It provides disaster tolerance.
(4) Distributed striped volumes:

(1) Combine the functions of distributed and striped volumes;
(2) Mainly used for large-file access;
(3) At least four servers are required.

(5) Distributed replicated volumes:

(1) Combine the functions of distributed and replicated volumes;
(2) Used when redundancy is required.
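
For reference, the volume type is decided purely by the options passed to gluster volume create; a rough sketch of the general shape of the command is shown below (the concrete commands used in this practice follow in section 4).

# gluster volume create <volume-name> [stripe N] [replica N] \
#     <node>:<brick-path> <node>:<brick-path> ... [force]
#
# With neither option the result is a plain distributed volume; supplying more
# bricks than the stripe/replica count yields the distributed-striped or
# distributed-replicated combinations.
# Example of the replica form (same as the rep-vol command in section 4):
#   gluster volume create rep-vol replica 2 node3:/data/sdb1 node4:/data/sdb1 force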

3. GlusterFS deployment and operation:

Five virtual machines are used: one as the client and four as nodes, each node with four new disks (20 GB per disk).

1. First partition, format, and mount each disk. You can use the following script:

vim disk.sh //Disk-mounting script, one-click operation

#!/bin/bash
# Partition, format and mount every new disk in one go.
echo "the disks exist list:"
fdisk -l | grep 'Disk /dev/sd[a-z]'
echo "=================================================="
PS3="chose which disk you want to create:"
select VAR in `ls /dev/sd* | grep -o 'sd[b-z]' | uniq` quit
do
    case $VAR in
    sda)
        fdisk -l /dev/sda
        break ;;
    sd[b-z])
        # create a single primary partition, accepting the default sizes
        echo -e "n\np\n\n\n\nw" | fdisk /dev/$VAR

        # make the filesystem
        mkfs.xfs -i size=512 /dev/${VAR}1 &> /dev/null

        # mount the filesystem and record it in fstab
        mkdir -p /data/${VAR}1 &> /dev/null
        echo "/dev/${VAR}1 /data/${VAR}1 xfs defaults 0 0" >> /etc/fstab
        mount -a &> /dev/null
        break ;;
    quit)
        break ;;
    *)
        echo "wrong disk,please check again" ;;
    esac
done
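
One possible way to use the script on each node, assuming the four new disks appear as sdb through sde:

chmod +x disk.sh
./disk.sh                 # pick sdb at the prompt, then run it again for sdc, sdd and sde
df -hT | grep /data       # verify the four XFS partitions are mounted under /data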

2. Operations on the four nodes
(1) Change the host names (node1, node2, node3, node4) and disable the firewall.
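
A minimal sketch of these steps, assuming CentOS 7 with firewalld and SELinux (shown for node1; use the matching name on node2..node4):

hostnamectl set-hostname node1     # use node2 / node3 / node4 on the other machines
systemctl stop firewalld           # stop the firewall
systemctl disable firewalld        # keep it off after reboot
setenforce 0                       # assumption: SELinux is also set to permissive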

(2) Edit the hosts file and add the host names and IP addresses. (When a host name is used, the system first looks it up in the hosts file; if a matching entry is found, that IP address is used directly. Otherwise the name is handed to the DNS server for resolution.)

vim   /etc/hosts

192.168.220.172 node1
192.168.220.131 node2
192.168.220.140 node3
192.168.220.136 node4

(3) Set up a local yum repository and install GlusterFS:

cd /opt/
mkdir /abc
mount.cifs //192.168.10.157/MHA /abc   //Mount the remote share locally
cd /etc/yum.repos.d/
mkdir bak
mv Cent* bak/   //Move all of the original repo files into a backup folder

vim GLFS.repo   //Create a new repo file
[GLFS]
name=glfs
baseurl=file:///abc/gfsrepo
gpgcheck=0
enabled=1
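
After saving the repo file it is usually worth refreshing the yum metadata so the new local repository is picked up (standard yum commands, nothing GlusterFS-specific):

yum clean all        # drop the old metadata
yum makecache        # rebuild the cache from the GLFS repo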

(4) Install software packages

yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma

(5) Start the service:

systemctl start glusterd

(6) Check the status:

systemctl status glusterd
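
Optionally, the service can also be enabled so it starts automatically after a reboot:

systemctl enable glusterd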

3. Time synchronization (required on every node):

ntpdate ntp1.aliyun.com   //Synchronize the time

Build the storage trusted pool by probing the other three nodes from one host.
This is done on node1:

gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

gluster peer status //View the status of all nodes

4. Creating the various volumes

1. Create a distributed volume

gluster volume create dis-vol node1:/data/sdb1 node2:/data/sdb1 force
  //Created from one brick on node1 and one on node2; dis-vol is the volume name; force forces the creation

gluster volume start dis-vol    //Start the volume
gluster volume info dis-vol     //View its status

2. Create a striped volume

gluster volume create stripe-vol stripe 2 node1:/data/sdc1 node2:/data/sdc1 force

gluster volume start stripe-vol
gluster volume info stripe-vol

3. Create a replicated volume

gluster volume create rep-vol replica 2 node3:/data/sdb1 node4:/data/sdb1 force

gluster volume start rep-vol
gluster volume info rep-vol

4. Create a distributed striped volume

gluster volume create dis-stripe stripe 2 node1:/data/sdd1 node2:/data/sdd1 node3:/data/sdd1 node4:/data/sdd1 force

gluster volume start dis-stripe
gluster volume info dis-stripe

5. Create a distributed replicated volume

gluster volume create dis-rep replica 2 node1:/data/sde1 node2:/data/sde1 node3:/data/sde1 node4:/data/sde1 force

gluster volume start dis-rep
gluster volume info dis-rep

6. Client configuration
(1) Disable the firewall

(2) Configure and install from the GFS repository:

cd /opt/
mkdir /abc
mount.cifs //192.168.10.157/MHA /abc   //Mount the remote share locally
cd /etc/yum.repos.d/
cd /etc/yum.repos.d/

vim GLFS.repo   //Create a new source
[GLFS]
name=glfs
baseurl=file:///abc/gfsrepo
gpgcheck=0
enabled=1

(3) Install the packages

yum -y install glusterfs glusterfs-fuse  

(4) Modify the hosts file:

vim /etc/hosts

192.168.220.172 node1
192.168.220.131 node2
192.168.220.140 node3
192.168.220.136 node4

(5) Create temporary mount points:

mkdir -p /text/dis   //Recursively create a mount point
mount.glusterfs node1:dis-vol /text/dis/         //Mount the distributed volume

mkdir /text/strip
mount.glusterfs node1:stripe-vol /text/strip/     //Mount the striped volume

mkdir /text/rep
mount.glusterfs node3:rep-vol /text/rep/          //Mount the replicated volume

mkdir /text/dis-str
mount.glusterfs node2:dis-stripe /text/dis-str/    //Mount the distributed striped volume

mkdir /text/dis-rep
mount.glusterfs node4:dis-rep /text/dis-rep/        //Mount the distributed replicated volume
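
If the mounts should survive a reboot, an fstab entry can be added for each volume. For example, for the distributed volume (the _netdev option is an assumption here, used so mounting waits for the network):

echo "node1:dis-vol  /text/dis  glusterfs  defaults,_netdev  0 0" >> /etc/fstab
mount -a        # check that the entry mounts without errors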

(6) Use df -hT to view the mount information:

5. Testing individual volumes

(1) Create five 40M files:

dd if=/dev/zero of=/demo1.log bs=1M count=40
dd if=/dev/zero of=/demo2.log bs=1M count=40
dd if=/dev/zero of=/demo3.log bs=1M count=40
dd if=/dev/zero of=/demo4.log bs=1M count=40
dd if=/dev/zero of=/demo5.log bs=1M count=40

(2) Copy the five files to the different volumes:

cp /demo* /text/dis
cp /demo* /text/strip
cp /demo* /text/rep/
cp /demo* /text/dis-str
cp /demo* /text/dis-rep

(3) Check how the files are distributed on each volume (run ll -h /data/sdb1, and likewise for the other brick paths, on the node servers):

1. Distributed volume:
You can see that each file is stored complete (not split into blocks).

2. Striped volume:
Each file is split in half, and the halves are stored on different bricks.

3. Replicated volume:
Every file is stored as a complete copy on both bricks.

4. Distributed striped volume:
The files are distributed between the two brick pairs, and within a pair each file is split in half.

5. Distributed replicated volume:
The files are distributed between the two brick pairs, and within a pair each file is kept as a complete copy on both bricks.

(4) Failure test:
Now shut down node2 to simulate a failure, then check each volume from the client (a quick check loop is sketched after the summary below):

Summary:

1. All files on the distributed volume are still present;
2. All files on the replicated volume are still present;
3. On the distributed striped volume mount, only demo5.log remains; the other four files are missing;
4. On the distributed replicated volume mount, all files are still present;
5. All files on the striped volume are missing.
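
A quick way to compare what each mount still shows after node2 goes down (the mount points are the ones created above):

for d in dis strip rep dis-str dis-rep; do
    echo "== /text/$d =="
    ls /text/$d
done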
(5) Other operations:

1. Delete the volume (stop before deleting):

gluster volume stop <volume-name>
gluster volume delete <volume-name>
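
For example, to remove the distributed volume created earlier:

gluster volume stop dis-vol      # answer y at the confirmation prompt
gluster volume delete dis-vol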

2. Blacklist/whitelist settings:

gluster volume set <volume-name> auth.reject 192.168.220.100     //Deny the specified host from mounting the volume

gluster volume set <volume-name> auth.allow 192.168.220.100      //Allow the specified host to mount the volume
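
auth.allow and auth.reject also accept comma-separated lists and * wildcards; for example, to allow the whole client subnet on the distributed volume:

gluster volume set dis-vol auth.allow 192.168.220.*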
