Preceding text
- Assume you have basic knowledge of Azure and SQL Server HA
- Assume you have basic knowledge of Azure Cli
- The goal is to create an availability group with three replicas on Azure Linux VM and implement listener and Fencing configuration
Environmental Science
- SQL Server 2019 Developer on Linux
- Azure VM Fencing agent
- Azure Cli implementation part configuration
- CentOS 7.7 Azure VM, SQL19N1,SQL19N2,SQL19N3 respectively, located in the same VNet
step
-
Create resource groups and availability sets for VM S
# East China 2 create resource group az group create --name SQL-DEMO-RG --location chinaeast2 # Create Availability Set for VM, configure 2 fault domains and 2 update domains az vm availability-set create \ --resource-group SQL-DEMO-RG \ --name AGLinux-AvailabilitySet \ --platform-fault-domain-count 2 \ --platform-update-domain-count 2
-
Using Template to deploy 3 VM S
When creating a VM for the first time, a template will be generated, then downloaded and saved. After modifying the parameter values, you can easily create a VM with similar configuration. VM configuration mainly includes:
- Use previous availability set
- Use the same subnet
- IP uses Standard
- SSH public key configuration
The template and parameter files are too long to display. It can be obtained on Azure Portal.
# The following is the configuration of SQL19N2. After modifying the parameter file, it can be directly used to create SQL19N3 templateFile="./templateFile" paramFile="./vmParams-sql19n2.json" az deployment group validate --name sql19n2vm \ -g SQL-DEMO-RG --template-file $templateFile --parameters $paramFile
-
Configure VM to use fixed intranet IP and public DNS Label
The configuration of all three VMS needs to be modified. The following is just an example of one VM
# Find out nic and IP information az network nic list -g SQL-DEMO-RG --query "[].{nicName:name,configuration:ipConfigurations[].{ipName:name,ip:privateIpAddress,method:privateIpAllocationMethod}}" -o yaml # Modify privateIpAllocationMethod to Static az network nic ip-config update -g SQL-DEMO-RG --nic-name sql19n1152 --name ipconfig1 --set privateIpAllocationMethod=Static # Find out the pbulic ip name az network public-ip list -g SQL-DEMO-RG --query "[].name" -o tsv # Configure DNS name of Public IP, only data and small letters can be used az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name sql19n1
-
Install HA related packages
It is better to update the software package of the system before installing HA related software.
yum update -y yum install -y pacemaker pcs fence-agents-all resource-agents fence-agents-azure-arm reboot
-
Open firewall ports for clusters and SQL Server
# Ports for Pacemaker and Corosync # TCP: Ports 2224,3121,21064,5405 # UDP: Port 5405 firewall-cmd --add-port=2224/tcp --permanent firewall-cmd --add-port=2224/tcp --permanent firewall-cmd --add-port=21064/tcp --permanent firewall-cmd --add-port=5405/tcp --permanent firewall-cmd --add-port=5405/udp --permanent # SQL Server port and AG mirror port # TCP: 1433,5022 firewall-cmd --add-port=1433/tcp --permanent firewall-cmd --add-port=5022/tcp --permanent firewall-cmd --reload
-
Add hosts record
vi /etc/hosts 172.17.2.8 SQL19N1 172.17.2.9 SQL19N2 172.17.2.10 SQL19N3
-
Create a Pacemaker cluster
# Set the default user password of Pacemaker on three VM S passwd hacluster # Setting up pacemaker and pcsd self start on three VM S systemctl enable pcsd systemctl start pcsd systemctl enable pacemaker # Create a cluster on the master node sudo pcs cluster auth SQL19N1 SQL19N2 SQL19N3 -u hacluster sudo pcs cluster setup --name agcluster SQL19N1 SQL19N2 SQL19N3 --token 30000 --force sudo pcs cluster start --all sudo pcs cluster enable --all # View cluster status pcs status # Modify the expected votes of quorum to 3 on three nodes, in fact, the default value of three node cluster is 3 # The settings indicate that three tickets are required for the cluster to survive. This modification only affects the current running cluster and will not become the permanent configuration of the cluster to be saved pcs quorum expected-votes 3
-
Configure servic principinpal for Fencing Agent on Azure
# 1. Create aad app, and record the corresponding appID after success az ad app create --display-name sqldemorg-app --identifier-uris http://localhost \ --password "1qaz@WSX3edc" --end-date '2030-04-27' --credential-description "sql19 ag secret" # 2. Create the Service Principal of aad App az ad sp create --id <appID> # 3. Assign the service Principal to the management role corresponding to the VM, and execute the # I'm assigning the Owner role here, which is not a safe way. You should use a custom role to give only the minimum permissions # Custom role requires that Azure subscription be PP1 or PP2 level az role assignment create --assignee <appID> --role owner \ --scope /subscriptions/<subscription-ID>/resourceGroups/<resourceGroup-Name>/providers/Microsoft.Compute/virtualMachines/SQL19N1
-
Create Azure's tonith device
I use Azure China, so I need to specify cloud=china. If I use global Azure, I don't need to specify this parameter.
Execute fence? Azure? Arm - h to see more help for this resource agent
pcs property set stonith-timeout=900 pcs stonith create rsc_st_azure fence_azure_arm login="<ApplicationID>" passwd="<servicePrincipalPassword>" resourceGroup="<resourceGroupName>" tenantId="<tenantID>" subscriptionId="<subscriptionId>" power_timeout=240 pcmk_reboot_timeout=900 cloud=china
-
Install SQL 2019 and tools
# Install SQL 2019 and HA resource agent sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo sudo yum install -y mssql-server sudo /opt/mssql/bin/mssql-conf setup sudo yum install mssql-server-ha # Install MSSQL tools sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo sudo yum install -y mssql-tools unixODBC-devel # Add MSSQL Tools Directory to aPATH for easy use echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc source ~/.bashrc # Install MSSQL cli sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc sudo curl -o /etc/yum.repos.d/mssql-cli.repo https://packages.microsoft.com/config/rhel/7/prod.repo sudo yum install mssql-cli # View SQL status systemctl status mssql-server
If you are familiar with SQL Server related PowerShell, it is recommended to install PowerShell on as well as SQL server module. For the configuration of SQL server, using PowerShell will be very convenient
yum install powershell -y pwsh Install-Module SQLServer # View SQL related commands Get-Command -Module SQLServer
-
Configure AG
-
Create PowerShell function to facilitate subsequent T-SQL execution
# Open the profile file of PowerShell and create it if necessary vi /root/.config/powershell/Microsoft.PowerShell_profile.ps1 # Add the following functions to the profile file, which can be called directly every time pwsh is opened # The function has two parameters, $sql represents the T-SQL to be executed. It is better to use here string to avoid character escape # $servers represents the target instance, array type. The default is three instances in the current environment function run-sql ($sql,$servers=("SQL19N1","SQL19N2","SQL19N3")) { $secpasswd = "1qaz@WSX"|ConvertTo-SecureString -AsPlainText -Force $cred=New-Object System.Management.Automation.PSCredential -ArgumentList 'sa', $secpasswd $sql "---------" foreach($svr in $servers) {"Running T-SQL on $svr..."; Invoke-Sqlcmd -ServerInstance $svr -Credential $cred -Query $sql} }
-
Enable hadr function, each instance
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1 sudo systemctl restart mssql-server
-
Start Ag extended event session
# T-SQL, each instance ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON); GO
-
Create a certificate on the primary replica instance that verifies Mirroring endpoint communication. Copy the certificate and private key to the same directory location on another node. Grant mssql user access
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX'; GO CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm'; GO BACKUP CERTIFICATE dbm_certificate TO FILE = '/var/opt/mssql/data/dbm_certificate.cer' WITH PRIVATE KEY ( FILE = '/var/opt/mssql/data/dbm_certificate.pvk', ENCRYPTION BY PASSWORD = '1qaz@WSX' );
# Copy certificate and private key to secondary replica hosts SQL19N2 and SQL19N3 cd /var/opt/mssql/data scp dbm_certificate.* root@SQL19N2:/var/opt/mssql/data/ scp dbm_certificate.* root@SQL19N3:/var/opt/mssql/data/ # Modify permission on secondary replica node cd /var/opt/mssql/data chown mssql:mssql dbm_certificate.*
-
Create master key in secondary replica instance and import certificate
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX'; GO CREATE CERTIFICATE dbm_certificate FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer' WITH PRIVATE KEY ( FILE = '/var/opt/mssql/data/dbm_certificate.pvk', DECRYPTION BY PASSWORD = '1qaz@WSX' );
-
Create the mirror port of AG, note the exception of firewall and NSG configuration port
CREATE ENDPOINT [Hadr_endpoint] AS TCP (LISTENER_PORT = 5022) FOR DATABASE_MIRRORING ( ROLE = ALL, AUTHENTICATION = CERTIFICATE dbm_certificate, ENCRYPTION = REQUIRED ALGORITHM AES ); GO ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
-
Create three replicas, AG in synchronization mode, execute on the primary replica instance
CREATE AVAILABILITY GROUP [ag1] WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL) FOR REPLICA ON N'SQL19N1' WITH ( ENDPOINT_URL = N'tcp://SQL19N1:5022', AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = EXTERNAL, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL) ), N'SQL19N2' WITH ( ENDPOINT_URL = N'tcp://SQL19N2:5022', AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = EXTERNAL, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL) ), N'SQL19N3' WITH( ENDPOINT_URL = N'tcp://SQL19N3:5022', AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = EXTERNAL, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL) ); GO ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE; GO
-
Create sql login and authorization for Pacemaker, each instance
USE [master] GO CREATE LOGIN [pacemakerLogin] with PASSWORD= N'1qaz@WSX' go ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]; GO
-
Save the login information of pacemaker to a local file
echo "pacemakerLogin" >> /var/opt/mssql/secrets/passwd echo "1qaz@WSX" >> /var/opt/mssql/secrets/passwd # Only root is allowed to read chown root:root /var/opt/mssql/secrets/passwd chmod 400 /var/opt/mssql/secrets/passwd
-
Add secondary replica to AG, secondary replica executes
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL); GO # Permissions required by auto editing function ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE; GO
-
If you do not want pacemaker login to have sysadmin privileges, you can remove it from sysadmin and grant the following privileges. Each instance
ALTER SERVER ROLE [sysadmin] DROP MEMBER [pacemakerLogin] GO GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin; GO GRANT VIEW SERVER STATE TO pacemakerLogin; GO
-
Add database to AG, primary copy execution
CREATE DATABASE [db1]; GO ALTER DATABASE [db1] SET RECOVERY FULL; GO BACKUP DATABASE [db1] TO DISK = N'nul'; GO ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1]; GO
-
Availability database status
SELECT * FROM sys.databases WHERE name = 'db1'; GO SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
-
Configuring AG in a Pacemaker cluster
-
Create AG resource, AG name to be specified as the AG name created before
pcs resource create agcluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=30s master notify=true
-
Create virtual IP resource
# Disable fencing pcs property set stonith-enabled=false # Create VIP pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=172.17.2.7
-
To create a collaboration constraint, vip and master must be started on the same node
pcs constraint colocation add virtualip agcluster-master INFINITY with-rsc-role=Master
-
To create an ordering constraint, vip must be started before the master replica resource
pcs constraint order promote agcluster-master then start virtualip # View current constraints pcs constraint show --full
-
Re enable stop and view cluster status
pcs property set stonith-enabled=true pcs status # Status information in my environment --------------------------------------- Cluster name: agcluster Stack: corosync Current DC: SQL19N3 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum Last updated: Wed Apr 29 04:24:50 2020 Last change: Wed Apr 29 04:24:45 2020 by root via cibadmin on SQL19N1 3 nodes configured 5 resources configured Online: [ SQL19N1 SQL19N2 SQL19N3 ] Full list of resources: rsc_st_azure (stonith:fence_azure_arm): Started SQL19N1 Master/Slave Set: agcluster-master [agcluster] Masters: [ SQL19N1 ] Slaves: [ SQL19N2 SQL19N3 ] virtualip (ocf::heartbeat:IPaddr2): Started SQL19N1 Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled
-
Testing Failover and Fencing
# Manual failover pcs resource move agcluster-master SQL19N2 --master pcs status # Manual failover will generate a constraint to avoid AG resources returning to the original node # If you want AG to fail over in the future, you need to delete it manually pcs constraint show --full pcs constraint remove cli-prefer-agcluster-master # Try Fencing cluster nodes. Each node tries # The fencing of the following command only restarts the node. If you want to safely shut down the node, use the -- off parameter pcs stonith fence SQL19N3 --debug