1. Introduction to GlusterFS:
GlusterFS is a scalable distributed file system suited to large, distributed applications that access large amounts of data. It runs on inexpensive commodity hardware, provides fault tolerance, and can deliver high overall performance to a large number of users.
It is an open-source distributed file system consisting of a storage server, a client, and an NFS/Samba storage gateway.
(1) GlusterFS features:
Extensibility and high performance;
High availability;
Global Unified Namespace;
Flexible volume management;
Based on standard protocols.
(2) Modular stack architecture:
1. Modular, stacked structure;
2. Complex functions are implemented by combining modules.
(3) GlusterFS workflow: a client request passes through the VFS and the FUSE kernel module to the GlusterFS client process, which forwards it over the network to the brick processes on the storage servers, where the data is finally written to the underlying file system.
(4) Elastic HASH algorithm:
(1) The HASH algorithm produces a 32-bit integer from the file path;
(2) The 32-bit range is divided into N contiguous subspaces, one for each Brick.
Advantages of the elastic HASH algorithm:
(1) Data is distributed evenly across the Bricks;
(2) There is no dependence on a metadata server, which eliminates the single point of failure and the service access bottleneck.
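To see the hash range that a Brick is responsible for, you can read the DHT extended attribute on the brick directory. This is a minimal sketch, assuming a distributed volume has already been created with a brick at /data/sdb1 as in the deployment steps below:
getfattr -d -m . -e hex /data/sdb1    //trusted.glusterfs.dht holds the 32-bit hash range assigned to this Brick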
2. Volume types of GlusterFS:
(1) Distributed volumes:
(1) Files are not split into blocks;
(2) HASH values are stored in extended file attributes;
(3) The supported underlying file systems include ext3, ext4, ZFS, XFS, etc.
Characteristics:
(1) Files are distributed across different servers with no redundancy;
(2) Expanding the size of the volume is easy and cheap;
(3) A single point of failure can cause data loss;
(4) Data protection depends on the underlying file system.
(2) Striped volumes:
(1) A file is divided into N blocks by offset (N = number of stripe nodes) and stored round-robin in the Bricks of the server nodes;
(2) Performance is particularly good when storing large files;
(3) There is no redundancy, similar to RAID 0.
Characteristics:
(1) Data is divided into smaller blocks and distributed across different stripes in the Brick server cluster;
(2) The distribution reduces the load, and the smaller blocks speed up access;
(3) There is no data redundancy.
(3) Replicated volumes:
(1) One or more copies of the same file are kept;
(2) Disk utilization is low because the replicas must also be stored;
(3) If the storage capacities of the nodes differ, the bucket effect applies: the capacity of the smallest node becomes the total capacity of the volume.
Characteristics:
(1) Every server in the volume keeps a complete copy;
(2) The number of replicas is specified by the client when the volume is created;
(3) At least two Brick servers are required;
(4) Provides disaster tolerance.
(4) Distributed striped volumes:
(1) Combines the functions of distributed and striped volumes;
(2) Mainly used for processing access to large files;
(3) At least four servers are required.
(5) Distributed replicated volumes:
(1) Combines the functions of distributed and replicated volumes;
(2) Used when redundancy is required.
3. Getting started with GlusterFS:
Five virtual machines: one as the client and four as storage nodes; each node has four new disks (20 GB per disk).
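Before partitioning, it is worth confirming that the four new disks are visible on every node (a quick check; the device names sdb through sde are an assumption and may differ):
lsblk    //the four new 20G disks should appear, e.g. as sdb, sdc, sdd and sde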
1. Partition, format and mount each disk first. You can use the following script:
vim disk.sh    //Disk-mounting script, one-click operation
#!/bin/bash
echo "the disks exist list:"
fdisk -l | grep 'Disk /dev/sd[a-z]'
echo "=================================================="
PS3="chose which disk you want to create:"
select VAR in `ls /dev/sd* | grep -o 'sd[b-z]' | uniq` quit
do
    case $VAR in
    sda)
        fdisk -l /dev/sda
        break ;;
    sd[b-z])
        #create the partition (n = new, p = primary, accept defaults, w = write)
        echo -e "n\np\n\n\n\nw" | fdisk /dev/$VAR
        #make the filesystem
        mkfs.xfs -i size=512 /dev/${VAR}1 &> /dev/null
        #mount the filesystem
        mkdir -p /data/${VAR}1 &> /dev/null
        echo "/dev/${VAR}1 /data/${VAR}1 xfs defaults 0 0" >> /etc/fstab
        mount -a &> /dev/null
        break ;;
    quit)
        break ;;
    *)
        echo "wrong disk, please check again" ;;
    esac
done
2. Operations on the four nodes
(1) Modify the host name (node1, node2, node3, node4), and close the firewall.
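For example, on the first node (a sketch; repeat with the matching hostname on the other machines, and note that disabling SELinux is an extra step not mentioned above):
hostnamectl set-hostname node1    //node2/node3/node4 on the other nodes
systemctl stop firewalld          //Close the firewall
setenforce 0                      //Put SELinux into permissive mode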
(2) Edit the hosts file and add the hostname-to-IP mappings. (When a name needs to be resolved, the system first looks for the corresponding IP address in the hosts file; if it is found, that address is used immediately, otherwise the name is submitted to a DNS server for resolution.)
vim /etc/hosts
192.168.220.172 node1
192.168.220.131 node2
192.168.220.140 node3
192.168.220.136 node4
(3) Configure a yum repository and install GlusterFS:
cd /opt/
mkdir /abc
mount.cifs //192.168.10.157/MHA /abc    //Mount the remote share locally
cd /etc/yum.repos.d/
mkdir bak
mv Cent* bak/                           //Move all the original repo files into a backup folder
vim GLFS.repo                           //Create the new repo file
[GLFS]
name=glfs
baseurl=file:///abc/gfsrepo
gpgcheck=0
enabled=1
(4) Install software packages
yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
(5) Start the service:
systemctl start glusterd
(6) View the status:
systemctl status glusterd
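Optionally, glusterd can also be enabled at boot so the service survives a restart (not part of the original steps):
systemctl enable glusterd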
3. Time synchronization (required on every node):
ntpdate ntp1.aliyun.com //time synchronization
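To keep the clocks aligned afterwards, the same command can be scheduled periodically; an optional sketch using cron:
crontab -e
*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com    //resync every 30 minutes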
Build the storage trusted pool by adding the other three nodes from any one host:
This is the operation on the node1 node:
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4
gluster peer status    //View the status of all nodes
4. Creating the various volumes
1. Build a distributed volume
gluster volume create dis-vol node1:/data/sdb1 node2:/data/sdb1 force    //Create the volume using one disk each on node1 and node2; dis-vol is the volume name; force makes the creation mandatory
gluster volume start dis-vol                                             //Start the volume
gluster volume info dis-vol                                              //View its status
2. Build a striped volume
gluster volume create stripe-vol stripe 2 node1:/data/sdc1 node2:/data/sdc1 force
gluster volume start stripe-vol
gluster volume info stripe-vol
3. Build a replicated volume
gluster volume create rep-vol replica 2 node3:/data/sdb1 node4:/data/sdb1 force
gluster volume start rep-vol
gluster volume info rep-vol
4. Build a distributed striped volume
gluster volume create dis-stripe stripe 2 node1:/data/sdd1 node2:/data/sdd1 node3:/data/sdd1 node4:/data/sdd1 force
gluster volume start dis-stripe
gluster volume info dis-stripe
5. Build a distributed replicated volume
gluster volume create dis-rep replica 2 node1:/data/sde1 node2:/data/sde1 node3:/data/sde1 node4:/data/sde1 force
gluster volume start dis-rep
gluster volume info dis-rep
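With all five volumes created, a quick sanity check (standard gluster commands, not part of the original steps) confirms they exist and that every brick process is online:
gluster volume list      //should show dis-vol, stripe-vol, rep-vol, dis-stripe and dis-rep
gluster volume status    //shows whether each brick process is online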
6. Client Configuration
(1) Close the firewall
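As on the nodes, for example (disabling SELinux is an assumption, included for completeness):
systemctl stop firewalld
setenforce 0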
(2) Configure and install the GFS source:
cd /opt/
mkdir /abc
mount.cifs //192.168.10.157/MHA /abc    //Mount the remote share locally
cd /etc/yum.repos.d/
vim GLFS.repo                           //Create the new repo file
[GLFS]
name=glfs
baseurl=file:///abc/gfsrepo
gpgcheck=0
enabled=1
(3) Install the packages
yum -y install glusterfs glusterfs-fuse
(4) Modify the hosts file:
vim /etc/hosts
192.168.220.172 node1
192.168.220.131 node2
192.168.220.140 node3
192.168.220.136 node4
(5) Create temporary mount points:
mkdir -p /text/dis                                 //Recursively create a mount point
mount.glusterfs node1:dis-vol /text/dis/           //Mount the distributed volume
mkdir /text/strip
mount.glusterfs node1:stripe-vol /text/strip/      //Mount the striped volume
mkdir /text/rep
mount.glusterfs node3:rep-vol /text/rep/           //Mount the replicated volume
mkdir /text/dis-str
mount.glusterfs node2:dis-stripe /text/dis-str/    //Mount the distributed striped volume
mkdir /text/dis-rep
mount.glusterfs node4:dis-rep /text/dis-rep/       //Mount the distributed replicated volume
(6) df -hT: View the mount information.
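If a mount should survive a reboot, it can also be written to /etc/fstab; an optional sketch for the distributed volume (the other volumes follow the same pattern):
echo "node1:dis-vol /text/dis glusterfs defaults,_netdev 0 0" >> /etc/fstab
mount -a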
5. Testing individual volumes
(1) Create five 40 MB files:
dd if=/dev/zero of=/demo1.log bs=1M count=40
dd if=/dev/zero of=/demo2.log bs=1M count=40
dd if=/dev/zero of=/demo3.log bs=1M count=40
dd if=/dev/zero of=/demo4.log bs=1M count=40
dd if=/dev/zero of=/demo5.log bs=1M count=40
(2) Copy the five files to the different volumes:
cp /demo* /text/dis
cp /demo* /text/strip
cp /demo* /text/rep/
cp /demo* /text/dis-str
cp /demo* /text/dis-rep
(3) See how each volume distributes the data (run on the storage nodes): ll -h /data/sdb1
1. Distributed volume:
You can see that each file is complete.
2. Striped volume:
Each file is split in half, with the halves stored on different nodes.
3. Replicated volume:
All files are fully copied and stored.
4. Distributed striped volume:
Each file is hashed to one pair of nodes and then split in half across that pair.
5. Distributed replicated volume:
Each file is hashed to one pair of nodes and stored as a complete copy on both.
(4) Failure test:
Now shut down the node2 server to simulate downtime, then view the individual volumes on the client:
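A sketch of the test itself (the first command is run on node2, the second on the client):
poweroff                                                              //on node2: simulate the outage
ls -lh /text/dis /text/strip /text/rep /text/dis-str /text/dis-rep    //on the client: list what is still readable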
Summary:
1. All files in the distributed volume are still present;
2. All files in the replicated volume are still present;
3. On the distributed striped volume mount, only demo5.log is left; the other four files are missing;
4. On the distributed replicated volume mount, all files are present;
5. All files on the striped volume are missing.
(5) Other operations:
1. Delete a volume (it must be stopped before it can be deleted):
gluster volume stop <volume name>
gluster volume delete <volume name>
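For example, to remove the distributed volume created earlier (illustration only):
gluster volume stop dis-vol
gluster volume delete dis-vol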
2. Blacklist/whitelist settings:
gluster volume set <volume name> auth.reject 192.168.220.100    //Deny a host from mounting
gluster volume set <volume name> auth.allow 192.168.220.100     //Allow a host to mount
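For example, applied to the distributed volume (note that setting auth.allow to a single address restricts mounting to that address only; "*" restores the default of allowing everyone):
gluster volume set dis-vol auth.reject 192.168.220.100    //this client can no longer mount dis-vol
gluster volume set dis-vol auth.allow "*"                 //re-allow all clients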