1. Addressing of Ceph rbd and rgw (downloading an rbd block / object-storage file)
1.1. Storage of indexes
Ceph stores its indexes in omap:
- rbd – each rbd pool has one rbd_directory object
- rgw – each bucket has one or more index objects
1.2. rbd addressing
- Receive the rbd client request, which carries:
  - rbd authentication information
  - rbd pool name
  - rbd name
- How do we find the corresponding 4 MB object files on the osds from this request information?
- Each rbd pool has one rbd_directory object. Its data (format 1) or its omap metadata (format 2) stores bidirectional mappings between all image ids and image names in the pool, for example:
```
rados -p test001 listomapvals rbd_directory
id_603d6b8b4567
value (8 bytes) :
00000000  04 00 00 00 74 65 73 74                           |....test|
00000008

id_60486b8b4567
value (8 bytes) :
00000000  04 00 00 00 61 61 61 61                           |....aaaa|
00000008

name_aaaa
value (16 bytes) :
00000000  0c 00 00 00 36 30 34 38 36 62 38 62 34 35 36 37  |....60486b8b4567|
00000010

name_test
value (16 bytes) :
00000000  0c 00 00 00 36 30 33 64 36 62 38 62 34 35 36 37  |....603d6b8b4567|
00000010
```
  The corresponding rbd id can therefore be found quickly from the rbd name, and vice versa.
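A minimal sketch of that lookup from the CLI, reusing the pool and keys from the dump above (values are length-prefixed, hence the 4 leading bytes):

```
# name -> id
rados -p test001 getomapval rbd_directory name_test
# id -> name
rados -p test001 getomapval rbd_directory id_603d6b8b4567
```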
- The data objects of an rbd image are named rbd_data.{id}.{sequence number}, so once the id is known, all the object files belonging to the volume can be found, as sketched below.
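A sketch of enumerating those objects, assuming the image id 60486b8b4567 from the directory dump:

```
# list every data object belonging to image id 60486b8b4567
rados -p test001 ls | grep '^rbd_data.60486b8b4567'
```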
- Other information about the rbd, such as snapshots, can be obtained from the omap of the rbd_header.{id} object:
  - create_timestamp – creation timestamp
  - features – enabled rbd features
  - object_prefix – prefix of the data objects (rbd_data.{id})
  - order – object size: 2**order bytes per rbd block (here order 0x16 = 22, i.e. 4 MB objects)
  - parent – parent volume id and snap id (parent volume -> snap -> clone)
  - size – rbd size (here 0x280000000 little-endian = 10 GiB)
  - snap_seq – snapshot information
```
create_timestamp
value (8 bytes) :
00000000  bb 39 68 60 72 aa 49 1b                           |.9h`r.I.|
00000008

features
value (8 bytes) :
00000000  3d 00 00 00 00 00 00 00                           |=.......|
00000008

object_prefix
value (25 bytes) :
00000000  15 00 00 00 72 62 64 5f 64 61 74 61 2e 36 30 34  |....rbd_data.604|
00000010  38 36 62 38 62 34 35 36 37                        |86b8b4567|
00000019

order
value (1 bytes) :
00000000  16                                                |.|
00000001

parent
value (48 bytes) :
00000000  01 01 2a 00 00 00 08 00 00 00 00 00 00 00 0e 00  |..*.............|
00000010  00 00 66 62 64 66 61 63 35 63 65 66 61 31 66 38  |..fbdfac5cefa1f8|
00000020  de 00 00 00 00 00 00 00 00 00 00 00 19 00 00 00  |................|
00000030

size
value (8 bytes) :
00000000  00 00 00 80 02 00 00 00                           |........|
00000008

snap_seq
snapshot_0000000000000173
value (113 bytes) :
00000000  06 01 6b 00 00 00 73 01 00 00 00 00 00 00 04 00  |..k...s.........|
00000010  00 00 74 65 73 74 00 00 00 00 19 00 00 00 3d 00  |..test........=.|
00000020  00 00 00 00 00 00 01 01 2a 00 00 00 08 00 00 00  |........*.......|
00000030  00 00 00 00 0e 00 00 00 66 62 64 66 61 63 35 63  |........fbdfac5c|
00000040  65 66 61 31 66 38 de 00 00 00 00 00 00 00 00 00  |efa1f8..........|
00000050  00 00 19 00 00 00 00 00 00 00 00 00 00 00 00 01  |................|
00000060  01 04 00 00 00 00 00 00 00 c4 7e 69 60 bc 83 2c  |..........~i`..,|
00000070  0b                                                |.|
00000071
```
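The same fields can be cross-checked with the rbd CLI; a sketch, assuming the image aaaa from the dumps above:

```
# prints size, object size (2**order), object_prefix, features and parent
rbd info test001/aaaa
# the snapshots behind the snap_seq / snapshot_* keys
rbd snap ls test001/aaaa
```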
- Once all the objects have been found and returned, the client assembles them into the complete rbd.
1.3. Side note
As an aside, remember the flow of omap commands such as rados listomapvals:
- The client obtains the cluster maps from the mon and uses the CRUSH algorithm to calculate which osd holds the object
- The omap on that osd is then queried and the result is returned to the client
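The placement half of that flow can be reproduced by hand; a small sketch using the pool above:

```
# prints the pg of rbd_directory and the osd set it maps to
ceph osd map test001 rbd_directory
```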
1.4. rgw addressing
- Receive the client request, which carries:
  - Authentication information
  - Bucket name
  - Object name
- How do we obtain all the 4 MB small-file objects that make up the rgw object from this request information?
  - The bucket name resolves to the id of the bucket's index object
  - CRUSH calculates the osd that holds the index object
  - Querying the index object's omap yields the information for all the objects
  - This is returned to rgw
  - rgw assembles the pieces and returns them to the client (see the sketch after this list)
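A sketch of the first steps from the CLI, assuming a hypothetical bucket named mybucket and the usual default.rgw.buckets.index pool:

```
# the bucket's instance id (marker) names its index object
radosgw-admin bucket stats --bucket=mybucket | grep '"id"'
# the index object is .dir.<bucket_id>; its omap keys are the objects in the bucket
rados -p default.rgw.buckets.index listomapkeys .dir.<bucket_id>
```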
1.5. Data recovery ideas
1.5.1. Scenario
After a power failure in the machine room, data and omap are damaged and the services cannot start.
The cluster can neither read nor write, and rbd information cannot be retrieved.
1.5.2. Approach
Create a new cluster, import the metadata, and then import the data:
- The new cluster is sized the same as the current cluster
- The information needed to recreate the pools can be obtained from the osdmap in the meta directory (and the crush map embedded in it):
```
# extract and decompile the crush map
osdmaptool osdmap.43__0_641716DC__none --export-crush /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
```
```
# print the osdmap
osdmaptool --print osdmap.43__0_641716DC__none
#----
epoch 43
fsid 8685ec71-96a6-413a-9e4d-ff47071dc4f5
created 2020-12-22 16:57:37.845173
modified 2021-04-04 15:51:46.729929
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 6
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous

pool 1 '.rgw.root' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 7 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 2 'test001' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 40 flags hashpspool stripe_width 0 removed_snaps [1~3]
pool 3 'default.rgw.control' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 21 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 23 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 25 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.index' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 34 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw

max_osd 1
osd.0 up in weight 1 up_from 42 up_thru 42 down_at 41 last_clean_interval [37,40) 9.134.1.121:6801/12622 9.134.1.121:6802/12622 9.134.1.121:6803/12622 9.134.1.121:6804/12622 exists,up 59ebf8d5-e7f7-4c46-8e05-bac5140eee89
```
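From this dump the pools can be recreated in the new cluster with matching parameters; a minimal sketch for one pool, with the name and pg counts taken from the output above:

```
ceph osd pool create test001 8 8 replicated
ceph osd pool set test001 size 1
ceph osd pool set test001 min_size 1
```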
- Get the rbd_directory metadata of each pool
- Get the rbd_header metadata
- Use the crush map to find the osds where rbd_directory and rbd_header live, and write the metadata into their omap
- After the metadata is rebuilt, the omap can replace that of the old cluster, or the old cluster's data can be copied into the new cluster before starting the osds; a sketch of the omap surgery follows.
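A sketch of that surgery with ceph-objectstore-tool against a stopped osd; the '<object-spec>' placeholder stands for the JSON object spec printed by --op list:

```
# locate rbd_directory on the stopped osd
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list rbd_directory
# dump an omap key, then inject it into the target osd
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 '<object-spec>' get-omap name_test > /tmp/name_test.val
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 '<object-spec>' set-omap name_test /tmp/name_test.val
```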
This looks feasible, but the relationship between pgs and the osdmap, and the omap versioning, are not easy to solve. Revisit when a concrete scenario or need arises.