Velero (formerly Heptio Ark) is an open source tool for safely backing up and restoring Kubernetes cluster resources and persistent volumes, performing disaster recovery, and migrating them between clusters. Velero can be deployed in a self-built Kubernetes cluster or in a Kubernetes environment hosted by a public cloud, such as QKE (KubeSphere). Velero can be used to:
- Back up cluster resources and restore them in case of loss.
- Migrate cluster resources to other clusters.
- Copy production cluster resources to development and test clusters.
Velero has two backup methods:
- Restic backup, which backs up persistent volume data at the file system level and sends it to Velero's object storage. Its speed depends on local I/O performance, network bandwidth and object storage performance, so it is slower than snapshot backup. However, because all resources and data are stored in remote object storage, an application backed up with restic can still be restored easily even if the current cluster or its storage fails.
- Snapshot backup, in which Velero uses a set of BackupItemAction plug-ins to back up PersistentVolumeClaims. It is fast. For each PersistentVolumeClaim it creates a VolumeSnapshot object, in the same namespace as the source PersistentVolumeClaim. The VolumeSnapshotContent object bound to that VolumeSnapshot is a cluster-scoped resource that points to the actual disk snapshot in the storage system. During a Velero backup, all VolumeSnapshot and VolumeSnapshotContent objects are uploaded to object storage, but the snapshot data itself stays on the cluster's storage. Data availability therefore depends on the high availability of that local storage: if the application failure is caused by a storage failure, Velero's snapshot backup cannot recover the application data.
To work around this limitation of Velero snapshot backup, this article manually exports the backed-up applications and data to AWS S3-compatible object storage, such as MinIO in a private environment or QingStor on QingCloud's public cloud. QingStor is used as the example here.
This experiment deploys a WordPress application in the wordpress project (namespace) of a KubeSphere cluster. First, Velero snapshot backup is used to back up the applications and data in this namespace to QingStor, and the data is then manually exported from the primary storage to QingStor. Finally, a primary storage failure is simulated, and the data is recovered from QingStor and applied to another cluster. The specific steps are as follows.
Experimental environment and preconditions:
Install Velero and configure its object storage. Create the wordpress namespace with storage provided by Rook Ceph, and run the WordPress and MySQL applications in it:
```
root$ kubectl -n wordpress get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-8a3b3541-1718-4af5-94fc-e24ebe026172   10Gi       RWO            rook-ceph-block   47m
wp-pv-claim      Bound    pvc-355352e5-0ecf-4b40-9056-5514015eb392   2Gi        RWO            rook-ceph-block   47m
root$ kubectl -n wordpress get pods
NAME                              READY   STATUS    RESTARTS   AGE
wordpress-589f976cd5-4ns55        1/1     Running   0          45m
wordpress-mysql-d9b8d8884-2kmtb   1/1     Running   0          45m
```
Backup data
- To show that the data can be recovered, first publish a new post in WordPress. After the whole backup and recovery process, check whether the post is still there.
<img src="https://gitee.com/jibutech/tech-docs/raw/master/images/wordpress-demo.png" style="zoom:50%;" />
- Use Velero to take a snapshot backup of the wordpress project. The Backup CR below is created in the velero namespace, where Velero runs:
```yaml
# wp-snap-manual.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.19.5
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "19"
  namespace: velero
  name: wp-snap-manual
spec:
  defaultVolumesToRestic: false
  hooks: {}
  snapshotVolumes: true
  includedNamespaces:
  - wordpress
  storageLocation: qingstor-vbbf8
  volumeSnapshotLocations:
  - qingstor-bd0a9b2b-7add-4b97-ba26-d8182d1a2d8e
  ttl: 2h0m0s
```
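The snapshot backup here and the file-system backup used later differ mainly in two spec fields: defaultVolumesToRestic and snapshotVolumes. The snapshot variant also lists volumeSnapshotLocations. The following Go sketch builds either variant as JSON, which kubectl apply also accepts. It assumes hand-written structs that mirror only the fields used in this article; they are not Velero's own API types. newBackup is a hypothetical helper, and the 2h TTL is simply copied from the manifest above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupSpec mirrors only the velero.io/v1 Backup spec fields used in this
// article (assumption: hand-written, not Velero's Go types).
type backupSpec struct {
	DefaultVolumesToRestic  bool     `json:"defaultVolumesToRestic"`
	SnapshotVolumes         bool     `json:"snapshotVolumes"`
	IncludedNamespaces      []string `json:"includedNamespaces"`
	StorageLocation         string   `json:"storageLocation"`
	VolumeSnapshotLocations []string `json:"volumeSnapshotLocations,omitempty"`
	TTL                     string   `json:"ttl"`
}

type metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type backup struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   metadata   `json:"metadata"`
	Spec       backupSpec `json:"spec"`
}

// newBackup (hypothetical) builds a Backup of namespace ns in the velero
// namespace. useRestic selects file-system backup; otherwise CSI snapshots
// are taken and vsl names the volume snapshot locations.
func newBackup(name, ns, bsl string, vsl []string, useRestic bool) backup {
	spec := backupSpec{
		DefaultVolumesToRestic: useRestic,
		SnapshotVolumes:        !useRestic,
		IncludedNamespaces:     []string{ns},
		StorageLocation:        bsl,
		TTL:                    "2h0m0s",
	}
	if !useRestic {
		spec.VolumeSnapshotLocations = vsl
	}
	return backup{"velero.io/v1", "Backup", metadata{name, "velero"}, spec}
}

func main() {
	b := newBackup("wp-snap-manual", "wordpress", "qingstor-vbbf8",
		[]string{"qingstor-bd0a9b2b-7add-4b97-ba26-d8182d1a2d8e"}, false)
	out, err := json.MarshalIndent(b, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```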
Create the Backup CR:
```
root$ kubectl apply -f wp-snap-manual.yaml
backup.velero.io/wp-snap-manual created
```
Velero creates a VolumeSnapshot for each PVC in the wordpress namespace. List them, and look up the VolumeSnapshotContent each one is bound to:
```
root$ kubectl -n wordpress get volumesnapshot
NAME                          AGE
velero-mysql-pv-claim-hmthh   58m
velero-wp-pv-claim-lgmh5      58m
root$ kubectl -n wordpress get volumesnapshot velero-mysql-pv-claim-hmthh -o yaml | grep bound
  boundVolumeSnapshotContentName: snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
root$ kubectl -n wordpress get volumesnapshot velero-wp-pv-claim-lgmh5 -o yaml | grep bound
  boundVolumeSnapshotContentName: snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
```
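Instead of grepping the YAML, a script can read the bound content name from the VolumeSnapshot's status. The minimal Go sketch below assumes its input is the output of `kubectl -n wordpress get volumesnapshot NAME -o json`. contentFor is a hypothetical helper. The embedded sample is an abbreviated illustration built from the names above, not captured output. The sketch also checks status.readyToUse, so that an unfinished snapshot is not carried over.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volumeSnapshot holds only the fields needed from the kubectl JSON output.
type volumeSnapshot struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Source struct {
			PersistentVolumeClaimName string `json:"persistentVolumeClaimName"`
		} `json:"source"`
	} `json:"spec"`
	Status struct {
		BoundVolumeSnapshotContentName string `json:"boundVolumeSnapshotContentName"`
		ReadyToUse                     *bool  `json:"readyToUse"`
	} `json:"status"`
}

// contentFor (hypothetical) returns the source PVC and the bound
// VolumeSnapshotContent name, or an error if the snapshot is not bound and ready.
func contentFor(raw []byte) (pvc, content string, err error) {
	var vs volumeSnapshot
	if err = json.Unmarshal(raw, &vs); err != nil {
		return "", "", err
	}
	st := vs.Status
	if st.BoundVolumeSnapshotContentName == "" || st.ReadyToUse == nil || !*st.ReadyToUse {
		return "", "", fmt.Errorf("volumesnapshot %s is not bound and ready", vs.Metadata.Name)
	}
	return vs.Spec.Source.PersistentVolumeClaimName, st.BoundVolumeSnapshotContentName, nil
}

func main() {
	// Abbreviated, illustrative sample shaped like the MySQL snapshot above.
	sample := []byte(`{"metadata":{"name":"velero-mysql-pv-claim-hmthh"},
	  "spec":{"source":{"persistentVolumeClaimName":"mysql-pv-claim"}},
	  "status":{"boundVolumeSnapshotContentName":"snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa","readyToUse":true}}`)
	pvc, content, err := contentFor(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(pvc, "->", content)
}
```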
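- Delete the VolumeSnapshots in the wordpress namespace. The snapshot data survives this only if the bound VolumeSnapshotContents have deletionPolicy Retain; with Delete, the content and the underlying snapshot are removed as well.
```
root$ kubectl -n wordpress delete volumesnapshot velero-mysql-pv-claim-hmthh velero-wp-pv-claim-lgmh5
volumesnapshot.snapshot.storage.k8s.io "velero-mysql-pv-claim-hmthh" deleted
volumesnapshot.snapshot.storage.k8s.io "velero-wp-pv-claim-lgmh5" deleted
```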
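- Create a new namespace, poc, and create a VolumeSnapshot in it for each of the two snapshots. Set spec.source.volumeSnapshotContentName to the boundVolumeSnapshotContentName found above, so that each new VolumeSnapshot binds to the existing snapshot content instead of taking a new snapshot: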
```yaml
# velero-wp-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  labels:
    velero.io/backup-name: wp-snap-manual
    manager: snapshot-controller
  name: velero-wp-snapshot
  namespace: poc
spec:
  source:
    volumeSnapshotContentName: snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
  volumeSnapshotClassName: csi-rbdplugin-snapclass
```
```yaml
# velero-mysql-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  labels:
    velero.io/backup-name: wp-snap-manual
    manager: snapshot-controller
  name: velero-mysql-snapshot
  namespace: poc
spec:
  source:
    volumeSnapshotContentName: snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
  volumeSnapshotClassName: csi-rbdplugin-snapclass
```
```
root$ kubectl create ns poc
root$ kubectl -n poc apply -f velero-mysql-snapshot.yaml -f velero-wp-snapshot.yaml
```
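- Edit the two VolumeSnapshotContents so that their volumeSnapshotRef points to the corresponding VolumeSnapshot in the new namespace:
```
root$ kubectl edit volumesnapshotcontent snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
root$ kubectl edit volumesnapshotcontent snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
```
Find the volumeSnapshotRef field and change it to reference the VolumeSnapshot in poc. The uid is the UID of that new VolumeSnapshot, for example from `kubectl -n poc get volumesnapshot velero-mysql-snapshot -o jsonpath='{.metadata.uid}'`. For snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa (MySQL):
```yaml
volumeSnapshotRef:
  apiVersion: snapshot.storage.k8s.io/v1beta1
  kind: VolumeSnapshot
  name: velero-mysql-snapshot
  namespace: poc
  uid: 4c1a4a4a-9949-4277-a3a9-1970f494aaff
```
For snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958 (WordPress):
```yaml
volumeSnapshotRef:
  apiVersion: snapshot.storage.k8s.io/v1beta1
  kind: VolumeSnapshot
  name: velero-wp-snapshot
  namespace: poc
  uid: 4c1a4a4a-9949-425a-a3a9-1970f494aaca
```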
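The interactive kubectl edit can also be replaced by a JSON merge patch. The Go sketch below builds that patch. It assumes the uid has already been read from the new VolumeSnapshot in poc. refPatch is a hypothetical helper, and the printed command uses kubectl patch with --type=merge.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type snapshotRef struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	Namespace  string `json:"namespace"`
	UID        string `json:"uid"`
}

// refPatch (hypothetical) builds a JSON merge patch that re-points a
// VolumeSnapshotContent's spec.volumeSnapshotRef at the VolumeSnapshot
// identified by name, namespace and uid.
func refPatch(name, ns, uid string) (string, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"volumeSnapshotRef": snapshotRef{
				APIVersion: "snapshot.storage.k8s.io/v1beta1",
				Kind:       "VolumeSnapshot",
				Name:       name,
				Namespace:  ns,
				UID:        uid,
			},
		},
	}
	b, err := json.Marshal(patch)
	return string(b), err
}

func main() {
	p, err := refPatch("velero-mysql-snapshot", "poc", "4c1a4a4a-9949-4277-a3a9-1970f494aaff")
	if err != nil {
		panic(err)
	}
	fmt.Printf("kubectl patch volumesnapshotcontent snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa --type=merge -p '%s'\n", p)
}
```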
- Create PVCs in the new namespace whose dataSource is the corresponding VolumeSnapshot in that namespace. Request the same size as the original volume (10Gi for MySQL, 2Gi for WordPress):
```yaml
# mysql-pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-ceph-block
  dataSource:
    name: velero-mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
```yaml
# wp-pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
spec:
  storageClassName: rook-ceph-block
  dataSource:
    name: velero-wp-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```
```
root$ kubectl -n poc apply -f wp-pv-claim.yaml -f mysql-pv-claim.yaml
root$ kubectl -n poc get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-7fcf6a03-abd5-4692-92d5-63504891de57   10Gi       RWO            rook-ceph-block   4s
wp-pv-claim      Bound    pvc-a1234890-18a0-4d5a-9435-1b74632e8f17   2Gi        RWO            rook-ceph-block   4s
```
- Once the PVCs are bound, the cluster has provisioned a new PV for each of them. Change the reclaim policy of these PVs to Retain, so that the PVs, and the data on them, are kept when the PVCs are deleted:
```
kubectl patch pv pvc-7fcf6a03-abd5-4692-92d5-63504891de57 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
kubectl patch pv pvc-a1234890-18a0-4d5a-9435-1b74632e8f17 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
```
- Delete the PVCs mysql-pv-claim and wp-pv-claim in poc, then recreate them with spec.volumeName set to the names of the PVs created above, so that each new PVC binds to the existing PV. Deleting a PVC leaves its PV in the Released state with a stale spec.claimRef, so remove that reference before recreating the PVC (the data-mover output below performs the same step), for example with `kubectl patch pv <pv-name> --type=json -p '[{"op":"remove","path":"/spec/claimRef"}]'`. Through this chain (VolumeSnapshotContent → PVC → PV → new PVC), the data behind the current PVCs is identical to the WordPress snapshot data.
```yaml
# mysql-pv-claim-2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-7fcf6a03-abd5-4692-92d5-63504891de57
```
```yaml
# wp-pv-claim-2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  volumeName: pvc-a1234890-18a0-4d5a-9435-1b74632e8f17
```
```
root$ kubectl -n poc delete pvc mysql-pv-claim wp-pv-claim
root$ kubectl -n poc apply -f wp-pv-claim-2.yaml -f mysql-pv-claim-2.yaml
```
- Change the reclaim policy of the two PVs back to Delete:
```
kubectl patch pv pvc-7fcf6a03-abd5-4692-92d5-63504891de57 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}' --type=merge
kubectl patch pv pvc-a1234890-18a0-4d5a-9435-1b74632e8f17 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}' --type=merge
```
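- In the new namespace, create temporary (stage) pods that mount the PVCs. Velero's restic integration reads volume data through the pods that mount it, so a PVC that no pod mounts would not be backed up:
```
root$ kubectl -n poc get pods
NAME                                          READY   STATUS    RESTARTS   AGE
stage-wordpress-589f976cd5-4ns55-krjxs        1/1     Running   0          56s
stage-wordpress-mysql-d9b8d8884-2kmtb-q9qnf   1/1     Running   0          56s
```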
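The stage pod manifest itself is not shown above. The minimal Go sketch below generates one under stated assumptions: the pod only needs to keep the PVCs mounted while restic reads them, so it runs a placeholder busybox image with a sleep loop. The image tag, pod names, volume names and mount paths are all hypothetical, and stagePod is an illustrative helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type volumeMount struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
}

type container struct {
	Name         string        `json:"name"`
	Image        string        `json:"image"`
	Command      []string      `json:"command"`
	VolumeMounts []volumeMount `json:"volumeMounts"`
}

type claimSource struct {
	ClaimName string `json:"claimName"`
}

type volume struct {
	Name                  string      `json:"name"`
	PersistentVolumeClaim claimSource `json:"persistentVolumeClaim"`
}

type pod struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"metadata"`
	Spec struct {
		Containers []container `json:"containers"`
		Volumes    []volume    `json:"volumes"`
	} `json:"spec"`
}

// stagePod (hypothetical) returns a pod that only keeps the given PVCs
// mounted, so that Velero's restic integration can read their data.
func stagePod(name, ns string, claims []string) pod {
	var p pod
	p.APIVersion, p.Kind = "v1", "Pod"
	p.Metadata.Name, p.Metadata.Namespace = name, ns
	c := container{
		Name:    "stage",
		Image:   "busybox:1.36", // placeholder image (assumption)
		Command: []string{"sh", "-c", "while true; do sleep 3600; done"},
	}
	for i, claim := range claims {
		vol := fmt.Sprintf("data-%d", i)
		p.Spec.Volumes = append(p.Spec.Volumes, volume{vol, claimSource{claim}})
		c.VolumeMounts = append(c.VolumeMounts, volumeMount{vol, "/data/" + claim})
	}
	p.Spec.Containers = []container{c}
	return p
}

func main() {
	for _, claim := range []string{"mysql-pv-claim", "wp-pv-claim"} {
		out, err := json.MarshalIndent(stagePod("stage-"+claim, "poc", []string{claim}), "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```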
- Run a Velero file-system-level (restic) backup of the poc namespace. Once the backup succeeds, all WordPress data has been exported to the remote QingStor:
```yaml
# poc-filesystem-manual.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.19.5
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "19"
  namespace: velero
  name: poc-filesystem-manual
spec:
  defaultVolumesToRestic: true
  hooks: {}
  snapshotVolumes: false
  includedNamespaces:
  - poc
  storageLocation: qingstor-vbbf8
  ttl: 2h0m0s
```
```
root$ kubectl -n velero apply -f poc-filesystem-manual.yaml
root$ kubectl -n velero get backups.velero.io poc-filesystem-manual
NAME                    AGE
poc-filesystem-manual   95s
root$ kubectl -n velero describe backups.velero.io poc-filesystem-manual
...
Status:
  Completion Timestamp:  2021-11-18T05:49:18Z
  Expiration:            2021-12-18T05:47:35Z
  Format Version:        1.1.0
  Phase:                 Completed
  Progress:
    Items Backed Up:  31
    Total Items:      31
  Start Timestamp:    2021-11-18T05:47:35Z
  Version:            1
Events:               <none>
```
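Instead of re-running kubectl describe, a script can poll the Backup's status.phase. The Go sketch below assumes its input is the output of `kubectl -n velero get backups.velero.io NAME -o json`. backupDone is a hypothetical helper. It treats PartiallyFailed, Failed and FailedValidation as terminal failures and any other phase as still running.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type backupStatus struct {
	Status struct {
		Phase    string `json:"phase"`
		Progress struct {
			ItemsBackedUp int `json:"itemsBackedUp"`
			TotalItems    int `json:"totalItems"`
		} `json:"progress"`
	} `json:"status"`
}

// backupDone (hypothetical) reports whether the Backup completed, and returns
// an error if it ended in a failure phase.
func backupDone(raw []byte) (bool, error) {
	var b backupStatus
	if err := json.Unmarshal(raw, &b); err != nil {
		return false, err
	}
	switch b.Status.Phase {
	case "Completed":
		return true, nil
	case "PartiallyFailed", "Failed", "FailedValidation":
		return false, fmt.Errorf("backup ended in phase %s (%d/%d items)",
			b.Status.Phase, b.Status.Progress.ItemsBackedUp, b.Status.Progress.TotalItems)
	default: // any other phase is treated as still running
		return false, nil
	}
}

func main() {
	// Values taken from the describe output above.
	sample := []byte(`{"status":{"phase":"Completed","progress":{"itemsBackedUp":31,"totalItems":31}}}`)
	done, err := backupDone(sample)
	fmt.Println(done, err)
}
```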
Recover data
After the backup succeeds, all data has been transferred to remote object storage. The following steps show how to recover the namespace and its data from remote object storage if the cluster or its storage fails at this point.
- Simulate a disaster by deleting the wordpress namespace:
```
root$ kubectl delete ns wordpress
```
- Create a Velero Restore CR that restores the file-system backup taken above into the wordpress namespace (namespaceMapping maps poc to wordpress), and delete the two temporary pods after the restore succeeds:
```yaml
# poc-restore-manual.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: poc-restore
  namespace: velero
spec:
  backupName: poc-filesystem-manual
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  hooks: {}
  namespaceMapping:
    poc: wordpress
  restorePVs: true
```
```
root$ kubectl -n velero apply -f poc-restore-manual.yaml
```
- Next, use Velero to restore the remaining resources from the WordPress snapshot backup into the wordpress namespace. Exclude PVs and PVCs, which were already restored with their data in the previous step, and give this Restore a name that is not already in use. Wait for the pods to start:
```yaml
# wp-restore-manual.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: wp-restore
  namespace: velero
spec:
  backupName: wp-snap-manual
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - persistentvolumes
  - persistentvolumeclaims
  hooks: {}
  namespaceMapping:
    wordpress: wordpress
  restorePVs: true
```
```
root$ kubectl -n velero apply -f wp-restore-manual.yaml
```
```
root$ kubectl -n wordpress get pods
NAME                              READY   STATUS    RESTARTS   AGE
wordpress-mysql-d9b8d8884-2kmtb   1/1     Running   0          18s
wordpress-589f976cd5-4ns55        1/1     Running   0          18s
```
- Finally, verify that all WordPress data has been recovered. Open WordPress: the post shown below still exists, which confirms that the recovery succeeded.
<img src="https://gitee.com/jibutech/tech-docs/raw/master/images/wordpress-demo-2.png" style="zoom:50%;" />
Automation
The author has written a small tool, data-mover, that automates the manual backup and recovery process above. You are welcome to try it and share suggestions.
The following is the program run output:
1. Backup data
```
root data-mover % go run main.go --action backup --backupName wp-backup-snap-76mxp-hzb2f --namespace wordpress
=== Step 0. Create temporay namespace + dm-wp-backup-snap-76mxp-hzb2f
=== Step 1. Create new volumesnapshot in temporary namespace
name: velero-mysql-pv-claim-q6jgv, uid: 532b6050-1bd7-4a6f-abfc-1a900bb52fc1, pvc: mysql-pv-claim, content_name: snapcontent-532b6050-1bd7-4a6f-abfc-1a900bb52fc1
Deleted volumesnapshot: velero-mysql-pv-claim-q6jgv in namesapce wordpress
Created volumesnapshot: velero-mysql-pv-claim-q6jgv in dm-wp-backup-snap-76mxp-hzb2f
name: velero-wp-pv-claim-p4lhl, uid: ada383d6-c23d-48fc-93fd-cad20f863cf4, pvc: wp-pv-claim, content_name: snapcontent-ada383d6-c23d-48fc-93fd-cad20f863cf4
Deleted volumesnapshot: velero-wp-pv-claim-p4lhl in namesapce wordpress
Created volumesnapshot: velero-wp-pv-claim-p4lhl in dm-wp-backup-snap-76mxp-hzb2f
=== Step 2. Update volumesnapshot content to new volumesnapshot in temporary namespace
Update volumesnapshotcontent snapcontent-532b6050-1bd7-4a6f-abfc-1a900bb52fc1 to remove snapshot reference
Update volumesnapshotcontent snapcontent-ada383d6-c23d-48fc-93fd-cad20f863cf4 to remove snapshot reference
=== Step 3. Create pvc reference to the new volumesnapshot in temporary namespace
Created pvc mysql-pv-claim in dm-wp-backup-snap-76mxp-hzb2f
Created pvc wp-pv-claim in dm-wp-backup-snap-76mxp-hzb2f
=== Step 4. Recreate pvc to reference pv created in step 3
Get pvc mysql-pv-claim and pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346
Patch pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 with retain option
Deleted pvc mysql-pv-claim
Update pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Update pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Create pvc mysql-pv-claim in dm-wp-backup-snap-76mxp-hzb2f with pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346
Patch pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 with delete option
Get pvc wp-pv-claim and pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a
Patch pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a with retain option
Deleted pvc wp-pv-claim
Update pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Update pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Create pvc wp-pv-claim in dm-wp-backup-snap-76mxp-hzb2f with pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a
Patch pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a with delete option
=== Step 5. Create pod with pvc created in step 4
build stage pod wordpress-589f976cd5-vbj5z
build stage pod wordpress-mysql-d9b8d8884-9g4r5
=== Step 6. Invoke velero to backup the temporary namespace using file system copy
Get velero backup plan wp-backup-snap-76mxp-hzb2f
Created velero backup plan generate-backup-kql6f
```
2. Recover data
```
root data-mover % go run main.go --action restore --backupName wp-backup-snap-76mxp-hzb2f --namespace wordpress
=== Step 1. Get filesystem copy backup generate-backup-kql6f
=== Step 2. Delete namespace
=== Step 3. Invoke velero to restore the temporary namespace to given namespace
Created velero restore plan generate-restore-ppdmp
=== Step 4. Delete pod in given namespace
Deleted pod stage-wordpress-589f976cd5-vbj5z-d4zg7
Deleted pod stage-wordpress-mysql-d9b8d8884-9g4r5-xzr8r
=== Step 5. Invoke velero to restore original namespace
Created velero restore plan generate-restore-tfqhz
```