GlusterFS -- distributed file system

I Overview

  • GlusterFS: an open source distributed file system without a metadata server
  • FS: file system

1) File system composition

  • File system interface (API)
  • Function modules (management and storage tools): the collection of software that manages objects
  • Objects and their properties (the consumers that use the file system)

2) Function of a file system (FS)

From the system's point of view, a file system organizes and allocates the space on a storage device, stores files, and protects and retrieves them. Its main tasks are creating files for users; storing, reading, modifying and dumping files; and controlling access to them.

3) Mounting a file system (FS)

Apart from the root file system, a file system must be mounted on a mount point before it can be accessed.
The mount point is a directory associated with the partition's device file.

4) Composition of the distributed file system (GFS)

  • Storage servers
  • Clients
  • NFS/Samba storage gateway

5) Features of the distributed file system (GFS)

  • Scalability and high performance
  • High availability (redundancy)
  • Global unified namespace
  • Elastic volume management (replicated volumes, striped volumes)
  • Based on standard protocols (HTTP, HTTPS)

6) Distributed file system (GFS) terminology

  • Brick (storage block): a directory on a storage server that actually holds user data
  • Volume: a logical collection of bricks, comparable to a partition of a local file system
  • FUSE: Filesystem in Userspace, a kernel module that exposes a pseudo file system. It connects the client to the servers: requests made on the client's mount point pass through FUSE and are then sent over the network
  • VFS: the kernel's virtual file system interface. The user submits a request to VFS, VFS hands it to FUSE, FUSE hands it to the GFS client, and the client forwards it to remote storage
  • glusterd: the management daemon that runs on every storage server

7) Role of the metadata server

A metadata server stores metadata that helps users locate files: file locations, indexes and similar information.
In a file system with a metadata server, damage to the metadata makes the whole file system unavailable (a single point of failure).
See the figure below for details: if A goes down, data cannot be obtained.

GFS does away with the metadata server and presents the data to the client by mounting it. The mount gateway is NFS/Samba, which shares the data of the three GFS servers A, B and C. Even if server A goes down, the data can still be obtained from server B or C.

II Modular stack architecture

  • A modular, stacked architecture: a variety of functional modules are combined into a stack
  • Complex functions are realized by combining modules

Module part:

  • GigE: Gigabit Ethernet port
  • TCP/IP: network protocol
  • InfiniBand: network interconnect. TCP/IP has to retransmit lost packets, whereas with InfiniBand the probability of packet loss is small
  • RDMA: Remote Direct Memory Access, the data transfer protocol; it reduces the delay of data processing between client and server during transmission
  • POSIX: Portable Operating System Interface, which addresses portability across different operating systems

III Working principle of GFS

User: install the client and FUSE, mount the volume from the server, and store files

1. The system call first reaches VFS (the virtual file system interface), which passes the data to FUSE (the pseudo file system) that connects client and server
2. The data is handed over through a virtual device file
3. The client transfers the data to the server
4. The server goes through its own VFS interface and saves the data in the actual file system

GFS workflow

Writing process (GFS client to GFS server)

Requirement: save data in the GFS file system (the client issues a write request locally)

GFS client:

  • The VFS API accepts the request, the function modules are combined, and the data is handed to FUSE (the kernel pseudo file system)
  • FUSE emulates a file system towards the operating system and passes the data on through the virtual device file /dev/fuse
  • The /dev/fuse virtual device file delivers the data to the GFS client
  • The GFS client processes the data according to its configuration file and sends it to the GFS server over the network (TCP, InfiniBand or RDMA)

GFS server:

  • After receiving the data, the server saves it on the corresponding storage node

See the figure below for details

IV GFS volume type

1) Distributed volume

  • Files are not split into blocks
  • HASH values are saved in extended file attributes
  • Supported underlying file systems include EXT3, EXT4, ZFS and XFS

  • Each file is stored whole, without splitting, so access efficiency is not improved
  • Files are distributed across different servers, with no redundancy
  • A single point of failure can cause data loss
  • Data protection relies on the underlying layer

Create command

  • Create a distributed volume named dis-volume; files will be distributed across server1:/dir1, server2:/dir2 and server3:/dir3 according to HASH
gluster volume create dis-volume server1:/dir1 server2:/dir2 server3:/dir3
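
To make the hash-based placement concrete, here is a minimal sketch in shell. It is not GlusterFS's actual hashing algorithm; it only illustrates the idea that the file name alone decides the brick, so no metadata server has to be asked. The function name pick_brick and the use of cksum as the hash are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative only: map a file name to one of the given bricks by hashing
# the name. GlusterFS uses its own hash and per-directory hash ranges;
# cksum is a stand-in chosen because it is available everywhere.
# Usage: pick_brick FILE_NAME BRICK...
pick_brick() {
    name=$1
    shift
    # CRC of the file name, as a decimal number
    hash=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # 1-based index of the chosen brick among the remaining arguments
    idx=$(( hash % $# + 1 ))
    eval "echo \"\${$idx}\""
}

for f in demo1.log demo2.log demo3.log; do
    echo "$f -> $(pick_brick "$f" server1:/dir1 server2:/dir2 server3:/dir3)"
done
```

The same name always maps to the same brick, which is why losing one brick loses exactly the files that hashed to it.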

2) Striped volume

  • Files are split by size into smaller pieces, which are stored round-robin across the brick servers in the cluster
  • Distribution reduces load, and the smaller pieces speed up access
  • No data redundancy

Create command

  • Create a striped volume named stripe-volume; files will be stored block by block, round-robin, in the two bricks server1:/dir1 and server2:/dir2
gluster volume create stripe-volume stripe 2 transport tcp server1:/dir1 server2:/dir2
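
The round-robin ("block polling") distribution can be sketched as follows. This is an illustration of the idea, not GlusterFS code; the function name stripe_layout and the sizes used are made up for the example.

```shell
#!/bin/sh
# Illustrative only: show which brick each chunk of a file would land on
# when chunks are handed out round-robin.
# Usage: stripe_layout FILE_SIZE CHUNK_SIZE BRICK...
stripe_layout() {
    size=$1
    chunk=$2
    shift 2
    # Number of chunks, rounding up for a partial last chunk
    count=$(( (size + chunk - 1) / chunk ))
    i=0
    while [ "$i" -lt "$count" ]; do
        # 1-based index of the brick that receives chunk i
        idx=$(( i % $# + 1 ))
        eval "brick=\${$idx}"
        echo "chunk $i -> $brick"
        i=$(( i + 1 ))
    done
}

# A 20-unit file cut into 5-unit chunks across two bricks
stripe_layout 20 5 server1:/dir1 server2:/dir2
```

Every file touches every brick, so a single failed brick makes all files on the volume incomplete.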

3) Replicated volume

  • Redundant backup. If the nodes offer different amounts of storage space, the usable capacity follows the smallest node (the bucket effect)
  • All servers in the volume keep a complete copy
  • The number of copies is chosen by the user when the volume is created
  • Requires at least two brick servers
  • Provides redundancy

Create command

  • Create a replicated volume named rep-volume; each file is stored as two copies, one in each of the bricks server1:/dir1 and server2:/dir2
gluster volume create rep-volume replica 2 transport tcp server1:/dir1 server2:/dir2
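
The bucket effect mentioned above is simple arithmetic: every brick in a replica set holds the same data, so the set can hold no more than its smallest brick. A minimal sketch (the function name and the sizes are illustrative):

```shell
#!/bin/sh
# Usable capacity of one replica set is bounded by its smallest brick.
# Usage: replica_capacity SIZE_GB...
replica_capacity() {
    min=$1
    for s in "$@"; do
        if [ "$s" -lt "$min" ]; then
            min=$s
        fi
    done
    echo "$min"
}

# Bricks of 100, 60 and 80 GB give 60 GB of usable space
replica_capacity 100 60 80
```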

4) Distributed striped volume

  • Combines the functions of distributed and striped volumes
  • Used for access to large files; requires at least 4 servers

Create command

  • Create a distributed striped volume named dis-stripe. When configuring a distributed striped volume, the number of bricks in the volume must be a multiple of the stripe count (>= 2 times)
gluster volume create dis-stripe stripe 2 transport tcp server1:/dir1 server2:/dir2 server3:/dir3 server4:/dir4

5) Distributed replicated volume

  • Combines the functions of distributed and replicated volumes

Create command

  • Create a distributed replicated volume named dis-rep. When configuring a distributed replicated volume, the number of bricks in the volume must be a multiple of the replica count (>= 2 times)
gluster volume create dis-rep replica 2 transport tcp server1:/dir1 server2:/dir2 server3:/dir3 server4:/dir4
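
The "multiple of the replica count" rule follows from how bricks are grouped: bricks listed next to each other on the command line form one replica set, and files are then distributed across the sets. A minimal sketch of that grouping (the function name replica_sets is made up for illustration):

```shell
#!/bin/sh
# Group bricks into replica sets in the order they are listed.
# Usage: replica_sets REPLICA_COUNT BRICK...
replica_sets() {
    replica=$1
    shift
    if [ $(( $# % replica )) -ne 0 ]; then
        echo "brick count $# is not a multiple of replica count $replica" >&2
        return 1
    fi
    set_no=0
    while [ "$#" -gt 0 ]; do
        members=""
        n=0
        # Take the next REPLICA_COUNT bricks as one set
        while [ "$n" -lt "$replica" ]; do
            members="$members $1"
            shift
            n=$(( n + 1 ))
        done
        echo "replica set $set_no:$members"
        set_no=$(( set_no + 1 ))
    done
}

replica_sets 2 server1:/dir1 server2:/dir2 server3:/dir3 server4:/dir4
```

With this ordering server1 and server2 mirror each other, as do server3 and server4, so the order of the bricks on the command line decides which servers can fail together without data loss.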

6) Striped replicated volume

Similar to RAID 10: it has the characteristics of both striped and replicated volumes

7) Distributed striped replicated volume

A composite of the three basic volume types, commonly used for MapReduce-like applications

V Relevant maintenance commands

#View the list of GlusterFS volumes
gluster volume list
#View information for all volumes
gluster volume info
#View the status of all volumes
gluster volume status
#Stop a volume
gluster volume stop dis-stripe
#Delete a volume
gluster volume delete dis-stripe

Note: before deleting a volume you must stop it, and no host in the trusted pool may be down; otherwise the deletion will not succeed
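
The stop-then-delete order can be wrapped in a small helper so that the delete only runs when the stop succeeded. This is a sketch: the function name remove_volume and the GLUSTER variable are made up here, and --mode=script is used on the assumption that your gluster CLI supports it for skipping the interactive confirmation.

```shell
#!/bin/sh
# Stop a volume and delete it only if the stop succeeded.
# GLUSTER can be overridden (for example with "echo gluster") for a dry run.
GLUSTER=${GLUSTER:-gluster}

remove_volume() {
    vol=$1
    # --mode=script suppresses the interactive y/n confirmation
    $GLUSTER --mode=script volume stop "$vol" &&
    $GLUSTER --mode=script volume delete "$vol"
}

# Dry run: print the commands instead of executing them.
# Remove the next line to run against a real cluster.
GLUSTER="echo gluster"
remove_volume dis-stripe
```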

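#Set access control for a volume
#Deny only (replace <IP> with the client address to reject)
gluster volume set dis-rep auth.reject <IP>
#Allow only (replace <IP> with the client address to allow)
gluster volume set dis-rep auth.allow <IP>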

VI GFS deployment

Node name    IP address
Node1        192.168.10.129

Turn off the firewall:

systemctl stop firewalld
setenforce 0

1. Partition and mount the disk on the node

Node1 node: 192.168.10.129
Node2 node: 192.168.10.134
Node3 node: 192.168.10.130
Node4 node: 192.168.10.142

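#!/bin/bash
#Find every disk other than sda, then partition, format and mount it
NEWDEV=`ls /dev/sd* | grep -o 'sd[b-z]' | uniq`
for VAR in $NEWDEV
do
   #Create one primary partition that spans the whole disk
   echo -e "n\np\n\n\n\nw\n" | fdisk /dev/$VAR &> /dev/null
   mkfs.xfs /dev/${VAR}"1" &> /dev/null
   mkdir -p /data/${VAR}"1" &> /dev/null
   echo "/dev/${VAR}"1" /data/${VAR}"1" xfs defaults 0 0" >> /etc/fstab
done
mount -a &> /dev/null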

chmod +x


#!/bin/bash
echo "the disks exist list:"
##grep out the disks of the system
fdisk -l | grep 'Disk /dev/sd[a-z]'
echo "=================================================="
PS3="chose which disk you want to create:"
##Select the disk number you want to create
select VAR in `ls /dev/sd* | grep -o 'sd[b-z]' | uniq` quit
do
    case $VAR in
    sda)
        ##The local system disk: show it and leave the case statement
        fdisk -l /dev/sda
        break ;;
    sd[b-z])
        #create partitions
        echo -e "n\np\n\n\n\nw\n" | fdisk /dev/$VAR
        #make filesystem
        mkfs.xfs -i size=512 /dev/${VAR}"1" &> /dev/null
        #mount the system
        mkdir -p /data/${VAR}"1" &> /dev/null
        ###Permanent mount
        echo "/dev/${VAR}"1" /data/${VAR}"1" xfs defaults 0 0" >> /etc/fstab
        ###Make mount effective
        mount -a &> /dev/null
        break ;;
    quit)
        break ;;
    *)
        echo "wrong disk,please check again" ;;
    esac
done
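
Both scripts append to /etc/fstab unconditionally, so running them twice leaves duplicate entries. A minimal sketch of an idempotent append, assuming the same /data/<device>1 layout as above (the function name add_fstab_entry is made up for illustration):

```shell
#!/bin/sh
# Append the fstab entry for a data disk only if it is not already present,
# so that re-running the partition script does not create duplicates.
# Usage: add_fstab_entry DEVICE_NAME [FSTAB_FILE]
add_fstab_entry() {
    dev=$1
    fstab=${2:-/etc/fstab}
    line="/dev/${dev}1 /data/${dev}1 xfs defaults 0 0"
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF "$line" "$fstab" 2>/dev/null || echo "$line" >> "$fstab"
}

# Try it against a scratch file rather than the real /etc/fstab
scratch=$(mktemp)
add_fstab_entry sdb "$scratch"
add_fstab_entry sdb "$scratch"   # second call adds nothing
cat "$scratch"
rm -f "$scratch"
```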

2. Modify the host name and configure the /etc/hosts file

#Take Node1 node as an example:

hostnamectl set-hostname node1

echo "192.168.10.129 node1" >> /etc/hosts
echo "192.168.10.134 node2" >> /etc/hosts
echo "192.168.10.130 node3" >> /etc/hosts
echo "192.168.10.142 node4" >> /etc/hosts
#Replace <client-IP> with the address of the client machine
echo "<client-IP> client" >> /etc/hosts

-----Install and start GlusterFS (operate on all nodes)-----
#Upload gfsrepo.zip to the /opt directory and extract it

cd /etc/yum.repos.d/
mkdir repo.bak
mv *.repo repo.bak

Edit the repository configuration file

vim  /etc/yum.repos.d/glfs.repo
yum clean all && yum makecache
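
The content of glfs.repo is not shown above. As a minimal sketch, a local yum repository definition could be written like this. The package directory /opt/gfsrepo, the section name glfs and the function name write_glfs_repo are assumptions; adjust them to wherever the archive was actually extracted.

```shell
#!/bin/sh
# Write a local yum repository definition.
# Usage: write_glfs_repo [REPO_FILE] [PACKAGE_DIR]
write_glfs_repo() {
    repo_file=${1:-/etc/yum.repos.d/glfs.repo}
    pkg_dir=${2:-/opt/gfsrepo}     # assumed location of the extracted packages
    cat > "$repo_file" <<EOF
[glfs]
name=glfs
baseurl=file://$pkg_dir
gpgcheck=0
enabled=1
EOF
}

# Write to a scratch file to inspect the result before touching /etc
scratch=$(mktemp)
write_glfs_repo "$scratch"
cat "$scratch"
rm -f "$scratch"
```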


#yum -y install centos-release-gluster			#To install from the official YUM source instead, point directly at the online repository
yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma

If the installation fails because a higher version of some GlusterFS packages is already installed, uninstall the higher version and reinstall it:

rpm -e --nodeps glusterfs-api
rpm -e --nodeps glusterfs-libs
rpm -e --nodeps glusterfs-fuse
rpm -e --nodeps glusterfs-cli
yum -y install glusterfs-client-xlators.x86_64
yum -y install glusterfs-api.x86_64
yum install -y glusterfs-cli
If the versions of other packages are also too high, uninstall and reinstall them in the same way


systemctl start glusterd.service    #Start the service
systemctl enable glusterd.service   #Enable the service at boot
systemctl status glusterd.service   #View the status

Time synchronization


-----Add nodes to the storage trusted pool (operate on node1)-----
#Adding the other nodes from any one node is enough

gluster peer probe node1
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

#View cluster status on each node

gluster peer status
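
Reading the peer list by eye gets tedious with more nodes. A minimal sketch that counts the connected peers, assuming the usual output layout in which each peer has a "State: Peer in Cluster (Connected)" line. The function name is made up, and the sample text below uses placeholder UUIDs rather than real output.

```shell
#!/bin/sh
# Count peers reported as connected in `gluster peer status` output
# read from standard input.
count_connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)'
}

# Typical use on a node (requires a running glusterd):
#   gluster peer status | count_connected_peers
# Offline demonstration with hand-written sample text:
count_connected_peers <<'EOF'
Number of Peers: 2

Hostname: node2
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: node3
Uuid: 00000000-0000-0000-0000-000000000003
State: Peer in Cluster (Disconnected)
EOF
```

On a healthy four-node pool each node should report three connected peers, since a node does not list itself.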

1. Create a distributed volume
#No type is specified, so a distributed volume is created by default

gluster volume create dis-volume node1:/data/sdb1 node2:/data/sdb1 force	

#View volume list

gluster volume list

#Start new distributed volume

gluster volume start dis-volume

#View information about the new distributed volume

gluster volume info dis-volume

2. Create a striped volume
#The type is stripe with a value of 2, followed by 2 bricks, so a striped volume is created

gluster volume create stripe-volume stripe 2 node1:/data/sdc1 node2:/data/sdc1 force
gluster volume start stripe-volume
gluster volume info stripe-volume

3. Create a replicated volume
#The type is replica with a value of 2, followed by 2 bricks, so a replicated volume is created

gluster volume create rep-volume replica 2 node3:/data/sdb1 node4:/data/sdb1 force
gluster volume start rep-volume
gluster volume info rep-volume

4. Create a distributed striped volume
#The type is stripe with a value of 2, followed by 4 bricks (twice the stripe count), so a distributed striped volume is created

gluster volume create dis-stripe stripe 2 node1:/data/sdd1 node2:/data/sdd1 node3:/data/sdd1 node4:/data/sdd1 force
gluster volume start dis-stripe
gluster volume info dis-stripe

5. Create a distributed replicated volume
#The type is replica with a value of 2, followed by 4 bricks (twice the replica count), so a distributed replicated volume is created

gluster volume create dis-rep replica 2 node1:/data/sde1 node2:/data/sde1 node3:/data/sde1 node4:/data/sde1 force
gluster volume start dis-rep
gluster volume info dis-rep	
gluster volume list
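
The brick arguments in the commands above all follow the pattern node:/data/partition, so they can be generated instead of typed. A minimal sketch (the function name brick_list is made up for illustration):

```shell
#!/bin/sh
# Build the brick arguments for `gluster volume create` from a partition
# name and a list of nodes.
# Usage: brick_list PARTITION NODE...
brick_list() {
    part=$1
    shift
    list=""
    for node in "$@"; do
        list="$list $node:/data/$part"
    done
    # Strip the leading space
    echo "${list# }"
}

brick_list sde1 node1 node2 node3 node4
# To create the volume for real (on a node in the trusted pool):
#   gluster volume create dis-rep replica 2 $(brick_list sde1 node1 node2 node3 node4) force
```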

-----Deploy the Gluster client-----
1. Install client software
#Upload the gfsrepo software to /opt

cd /etc/yum.repos.d/
mkdir repo.bak
mv *.repo repo.bak
vim glfs.repo
yum clean all && yum makecache
yum -y install glusterfs glusterfs-fuse

2. Create a mount directory
mkdir -p /test/{dis,stripe,rep,dis_stripe,dis_rep}
ls /test

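3. Configure the /etc/hosts file

echo "192.168.10.129 node1" >> /etc/hosts
echo "192.168.10.134 node2" >> /etc/hosts
echo "192.168.10.130 node3" >> /etc/hosts
echo "192.168.10.142 node4" >> /etc/hosts
#Replace <client-IP> with the address of the client machine
echo "<client-IP> client" >> /etc/hosts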

4. Mount the Gluster file system
#Temporary mount

mount.glusterfs node1:dis-volume /test/dis
mount.glusterfs node1:stripe-volume /test/stripe
mount.glusterfs node1:rep-volume /test/rep
mount.glusterfs node1:dis-stripe /test/dis_stripe
mount.glusterfs node1:dis-rep /test/dis_rep

df -Th
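
The mounts above are temporary and do not survive a reboot. A permanent mount is usually expressed as /etc/fstab entries of type glusterfs with the _netdev option, so that mounting waits for the network. The sketch below only prints the entries for review; the function name and the SERVER variable are made up for illustration.

```shell
#!/bin/sh
# Print /etc/fstab entries for GlusterFS volumes.
# Each argument is VOLUME=MOUNTPOINT; the server defaults to node1.
gluster_fstab_entries() {
    server=${SERVER:-node1}
    for pair in "$@"; do
        vol=${pair%%=*}     # text before the first "="
        dir=${pair#*=}      # text after the first "="
        echo "$server:$vol $dir glusterfs defaults,_netdev 0 0"
    done
}

gluster_fstab_entries \
    dis-volume=/test/dis \
    stripe-volume=/test/stripe \
    rep-volume=/test/rep \
    dis-stripe=/test/dis_stripe \
    dis-rep=/test/dis_rep
```

After checking the printed lines, append them to /etc/fstab and run mount -a to verify that they mount cleanly.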

-----Testing the Gluster file system-----
1. Write files to the volumes (on the client)

cd /opt
dd if=/dev/zero of=/opt/demo1.log bs=1M count=20
dd if=/dev/zero of=/opt/demo2.log bs=1M count=20
dd if=/dev/zero of=/opt/demo3.log bs=1M count=20
dd if=/dev/zero of=/opt/demo4.log bs=1M count=20
dd if=/dev/zero of=/opt/demo5.log bs=1M count=20
cp demo* /test/dis
cp demo* /test/stripe/
cp demo* /test/rep/
cp demo* /test/dis_stripe/
cp demo* /test/dis_rep/
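
The five dd commands differ only in the file number, so they can be written as a loop. A minimal sketch (the function name make_demo_files is made up; bs=1M assumes GNU dd, as on CentOS):

```shell
#!/bin/sh
# Create COUNT zero-filled files named demo1.log, demo2.log, ... in DIR.
# Usage: make_demo_files DIR COUNT SIZE_MB
make_demo_files() {
    dir=$1
    count=$2
    size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/demo$i.log" bs=1M count="$size_mb" 2>/dev/null
        i=$(( i + 1 ))
    done
}

# Small demonstration in a scratch directory
# (the walkthrough above uses 5 files of 20 MB in /opt)
scratch=$(mktemp -d)
make_demo_files "$scratch" 5 1
ls -l "$scratch"
rm -rf "$scratch"
```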

2. View file distribution
#View distributed volume file distribution

[root@node1 ~]# ls -lh /data/sdb1					#The data is not split
[root@node2 ~]# ll -h /data/sdb1

#View striped volume file distribution

[root@node1 ~]# ls -lh /data/sdc1					#The data is split; each brick holds 50%, with no copies and no redundancy

[root@node2 ~]# ll -h /data/sdc1					#The data is split; each brick holds 50%, with no copies and no redundancy

#View replicated volume file distribution

[root@node3 ~]# ll -h /data/sdb1					#The data is not split; there are copies, so there is redundancy

[root@node4 ~]# ll -h /data/sdb1

#View distributed striped volume file distribution

[root@node1 ~]# ll -h /data/sdd1

[root@node2 ~]# ll -h /data/sdd1

[root@node3 ~]# ll -h /data/sdd1

[root@node4 ~]# ll -h /data/sdd1

#View distributed replicated volume file distribution
#The data is not split; there are copies, so there is redundancy

[root@node1 ~]# ll -h /data/sde1

[root@node2 ~]# ll -h /data/sde1

[root@node3 ~]# ll -h /data/sde1

[root@node4 ~]# ll -h /data/sde1

-----Destructive test-----
#Suspend node2 or stop its glusterd service to simulate a failure
[root@node2 ~]# systemctl stop glusterd.service

#Check on the client whether the files are still intact
#Distributed volume

[root@localhost dis]# ll 						#demo5.log is missing on the client; that file is stored on node2

#Striped volume

[root@localhost test]# cd stripe/					#Inaccessible: striped volumes have no redundancy
[root@localhost stripe]# ll

#Distributed striped volume

[root@localhost dis_stripe]# ll					#Inaccessible: distributed striped volumes have no redundancy

#Distributed replicated volume

[root@localhost dis_rep]# ll					#Accessible: distributed replicated volumes have redundancy

Once node2 is repaired and rejoins the cluster, the data is normal again

#Suspend node2 and node4, then check on the client whether the files are still intact
#Test whether the replicated volume is normal

[root@localhost rep]# ls -l						#Accessible on the client, and the data is present

#Test whether the distributed striped volume is normal

[root@localhost dis_stripe]# ll						#Accessible on the client, but there is no data

#Test whether the distributed replicated volume is normal

[root@localhost dis_rep]# ll					#Accessible on the client, and the data is present
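
Instead of comparing directory listings by eye, the client can check which of the test files written earlier are no longer visible on a mount. A minimal sketch, assuming the demo*.log files are still in /opt (the function name missing_files is made up for illustration):

```shell
#!/bin/sh
# List files that exist in SRC_DIR but are not visible under MOUNT_DIR.
# Usage: missing_files SRC_DIR MOUNT_DIR
missing_files() {
    src=$1
    mnt=$2
    for f in "$src"/demo*.log; do
        [ -e "$f" ] || continue          # no matching files at all
        name=${f##*/}                    # strip the directory part
        [ -e "$mnt/$name" ] || echo "$name"
    done
}

# On the client, after stopping a node:
#   missing_files /opt /test/dis
#   missing_files /opt /test/dis_rep
```

An empty result means every test file is still reachable on that volume.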

#Other maintenance commands:
1. View GlusterFS volumes
gluster volume list

2. View the information of all volumes
gluster volume info

3. View the status of all volumes
gluster volume status

4. Stop a volume
gluster volume stop dis-stripe

5. Delete a volume. Note: before deleting a volume you must stop it, and no host in the trusted pool may be down; otherwise the deletion will not succeed
gluster volume delete dis-stripe

6. Set the access control of the volume
#Deny only (replace <client-IP> with the address to reject)
gluster volume set dis-rep auth.reject <client-IP>

#Allow only
gluster volume set dis-rep auth.allow 192.168.10.*	#All IP addresses in the 192.168.10.0 network segment can access the dis-rep volume (distributed replicated volume)
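
To show how a wildcard pattern such as 192.168.10.* reads, here is a small sketch that tests addresses against a pattern using the shell's own glob matching. It mimics the pattern's meaning only; it is not GlusterFS's matching code, and the function name ip_matches is made up for illustration.

```shell
#!/bin/sh
# Illustrative only: test an address against a wildcard pattern using the
# shell's glob matching.
# Usage: ip_matches ADDRESS PATTERN
ip_matches() {
    ip=$1
    pattern=$2
    case $ip in
        $pattern) return 0 ;;   # unquoted, so "*" acts as a wildcard
        *)        return 1 ;;
    esac
}

for ip in 192.168.10.20 192.168.20.20; do
    if ip_matches "$ip" '192.168.10.*'; then
        echo "$ip allowed"
    else
        echo "$ip not matched"
    fi
done
```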

Added by artweb on Fri, 24 Dec 2021 15:06:35 +0200