Addressing of Ceph rbd (download of rbd block file)

1. Addressing of Ceph rbd and rgw (download of rbd block / object storage files)

1.1. Storage of indexes

The indexes of Ceph are stored in omap

  • rbd – each rbd pool has one rbd_directory file
  • rgw – each bucket has one or more index files

1.2. rbd addressing

  • Receive the rbd client request

    The request carries

    • rbd authentication information
    • rbd pool name
    • rbd name
  • How the request information is used to locate the corresponding 4M object files on the osds

    • Each rbd pool has one rbd_directory file
      The omap of this file stores bidirectional mappings between all image ids and image names in the pool (format 1 / format 2)

      for example

      rados -p test001  listomapvals rbd_directory
      id_603d6b8b4567
      value (8 bytes) :
      00000000  04 00 00 00 74 65 73 74                           |....test|
      00000008
      
      id_60486b8b4567
      value (8 bytes) :
      00000000  04 00 00 00 61 61 61 61                           |....aaaa|
      00000008
      
      name_aaaa
      value (16 bytes) :
      00000000  0c 00 00 00 36 30 34 38  36 62 38 62 34 35 36 37  |....60486b8b4567|
      00000010
      
      name_test
      value (16 bytes) :
      00000000  0c 00 00 00 36 30 33 64  36 62 38 62 34 35 36 37  |....603d6b8b4567|
      00000010
      

      A given rbd name can thus be quickly resolved to its rbd id, and vice versa

    • The data objects of an rbd are named rbd_data.{id}.{sequence number}
      So once the id is known, all the object files belonging to this volume can be found

    • Other rbd information, such as snapshots, can be read from the omap of the rbd_header.{id} object

      • create_timestamp – creation timestamp
      • features – rbd features
      • object_prefix – rbd object prefix (rbd_data.{id})
      • order – object size is 2**order bytes
      • parent – parent volume id and snap id (parent volume -> snap -> clone)
      • size – rbd size
      • snap_seq – snapshot information
      create_timestamp
      value (8 bytes) :
      00000000  bb 39 68 60 72 aa 49 1b                           |.9h`r.I.|
      00000008
      
      features
      value (8 bytes) :
      00000000  3d 00 00 00 00 00 00 00                           |=.......|
      00000008
      
      object_prefix
      value (25 bytes) :
      00000000  15 00 00 00 72 62 64 5f  64 61 74 61 2e 36 30 34  |....rbd_data.604|
      00000010  38 36 62 38 62 34 35 36  37                       |86b8b4567|
      00000019
      
      order
      value (1 bytes) :
      00000000  16                                                |.|
      00000001
      
      parent
      value (48 bytes) :
      00000000  01 01 2a 00 00 00 08 00  00 00 00 00 00 00 0e 00  |..*.............|
      00000010  00 00 66 62 64 66 61 63  35 63 65 66 61 31 66 38  |..fbdfac5cefa1f8|
      00000020  de 00 00 00 00 00 00 00  00 00 00 00 19 00 00 00  |................|
      00000030
      
      size
      value (8 bytes) :
      00000000  00 00 00 80 02 00 00 00                           |........|
      00000008
      
      snap_seq
      snapshot_0000000000000173
      value (113 bytes) :
      00000000  06 01 6b 00 00 00 73 01  00 00 00 00 00 00 04 00  |..k...s.........|
      00000010  00 00 74 65 73 74 00 00  00 00 19 00 00 00 3d 00  |..test........=.|
      00000020  00 00 00 00 00 00 01 01  2a 00 00 00 08 00 00 00  |........*.......|
      00000030  00 00 00 00 0e 00 00 00  66 62 64 66 61 63 35 63  |........fbdfac5c|
      00000040  65 66 61 31 66 38 de 00  00 00 00 00 00 00 00 00  |efa1f8..........|
      00000050  00 00 19 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
      00000060  01 04 00 00 00 00 00 00  00 c4 7e 69 60 bc 83 2c  |..........~i`..,|
      00000070  0b                                                |.|
      00000071
      
  • Once the objects are located and returned, the client assembles them into the complete rbd
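
The omap values above follow a simple encoding: strings are stored with a 4-byte little-endian length prefix, and integer fields such as size are plain little-endian. A minimal Python sketch (the helper names are mine, not Ceph's) that decodes the dumps above and builds data-object names:

```python
import struct

def decode_lenprefix_str(value: bytes) -> str:
    # rbd_directory values: 4-byte little-endian length + string bytes
    (n,) = struct.unpack_from("<I", value, 0)
    return value[4:4 + n].decode()

def decode_le_u64(value: bytes) -> int:
    # plain little-endian integer, used e.g. by the size key
    return struct.unpack("<Q", value)[0]

def data_object_name(image_id: str, seq: int) -> str:
    # data objects: {object_prefix}.{16-hex-digit sequence number}
    return f"rbd_data.{image_id}.{seq:016x}"

# value of key id_60486b8b4567 from the dump above
print(decode_lenprefix_str(bytes.fromhex("0400000061616161")))  # aaaa
# value of the size key: 10 GiB
print(decode_le_u64(bytes.fromhex("0000008002000000")))         # 10737418240
# with order = 0x16 = 22, each data object covers 2**22 = 4 MiB
print(data_object_name("60486b8b4567", 1))  # rbd_data.60486b8b4567.0000000000000001
```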

1.3. Small notes

As an aside, the flow of omap query commands such as rados listomapvals:

  • The client uses the osdmap obtained from the mon and the crush algorithm to calculate which osd holds the object
  • The omap on that osd is then queried and the result is returned to the client
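
The two steps above can be sketched as a toy model. This is a simplified stand-in: real Ceph hashes object names with rjenkins, applies ceph_stable_mod to get the pg, and maps the pg to osds with CRUSH; the hash and placement functions below are illustrative only.

```python
import hashlib

def object_to_pg(object_name: str, pg_num: int) -> int:
    # toy stand-in for rjenkins + ceph_stable_mod
    h = int.from_bytes(hashlib.sha256(object_name.encode()).digest()[:4], "little")
    return h % pg_num

def pg_to_osds(pg: int, osds: list, size: int) -> list:
    # toy stand-in for CRUSH: deterministically pick `size` osds for the pg
    return [osds[(pg + i) % len(osds)] for i in range(size)]

pg = object_to_pg("rbd_directory", pg_num=8)
print(pg, pg_to_osds(pg, osds=[0, 1, 2], size=1))
```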

1.4. Addressing of rgw

  • Receive the client request

    The request carries

    • authentication information
    • bucket name
    • object name
  • How the request information is used to locate all the 4M object files that make up the rgw object

    • The index object id is derived from the bucket
    • The osd holding the index is calculated via crush
    • All object information is obtained by querying the omap
    • The results are returned to rgw
    • rgw assembles the objects and returns them to the client
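
For the first step, bucket index objects live in the default.rgw.buckets.index pool and are conventionally named .dir.{bucket_id}, with a shard suffix when the index is sharded. A hedged sketch of that naming, assuming the common .dir.{bucket_id}.{shard} convention (verify against your RGW version):

```python
def bucket_index_objects(bucket_id: str, num_shards: int) -> list:
    # unsharded index: a single ".dir.{bucket_id}" object;
    # sharded index: one ".dir.{bucket_id}.{shard}" object per shard
    if num_shards <= 1:
        return [f".dir.{bucket_id}"]
    return [f".dir.{bucket_id}.{shard}" for shard in range(num_shards)]

# "example-bucket-id" is a placeholder, not a real bucket id
print(bucket_index_objects("example-bucket-id", 2))
# ['.dir.example-bucket-id.0', '.dir.example-bucket-id.1']
```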

1.5. Data recovery ideas

1.5.1. Scenario

After a machine-room power failure, the data and omap are damaged and the service cannot start
The cluster can neither read nor write, and the rbd information cannot be read either

1.5.2. Approach

Create a new cluster, import metadata, and then import data

  • The new cluster has the same number of osds as the old cluster
  • The pool creation information can be obtained from the meta directory: the crush map can be extracted from the osdmap, or the osdmap printed directly
# extract the crush map from the osdmap
osdmaptool osdmap.43__0_641716DC__none --export-crush /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
# print the osdmap
osdmaptool --print osdmap.43__0_641716DC__none
#----
epoch 43
fsid 8685ec71-96a6-413a-9e4d-ff47071dc4f5
created 2020-12-22 16:57:37.845173
modified 2021-04-04 15:51:46.729929
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 6
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous

pool 1 '.rgw.root' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 7 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 2 'test001' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 40 flags hashpspool stripe_width 0
	removed_snaps [1~3]
pool 3 'default.rgw.control' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 21 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 23 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 25 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.index' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 34 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw

max_osd 1
osd.0 up   in  weight 1 up_from 42 up_thru 42 down_at 41 last_clean_interval [37,40) 9.134.1.121:6801/12622 9.134.1.121:6802/12622 9.134.1.121:6803/12622 9.134.1.121:6804/12622 exists,up 59ebf8d5-e7f7-4c46-8e05-bac5140eee89
  • Get the pool's rbd_directory metadata
  • Get the rbd_header metadata
  • Use the crush map to find the osds where rbd_directory and rbd_header live, and write the metadata into their omap
  • After the metadata is rebuilt, the omap can replace that of the old cluster, or the old cluster's data can be copied into the new cluster and the osds started
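
Rebuilding the rbd_directory omap boils down to re-emitting the key/value pairs in the format shown in section 1.2: id_{id} and name_{name} keys whose values are 4-byte little-endian length-prefixed strings. A sketch (helper names are mine) that produces the pairs to write back, e.g. with rados setomapval:

```python
import struct

def encode_lenprefix_str(s: str) -> bytes:
    # 4-byte little-endian length prefix + string bytes
    data = s.encode()
    return struct.pack("<I", len(data)) + data

def rbd_directory_entries(images: dict) -> dict:
    # images maps image name -> image id; emit both directions,
    # matching the id_*/name_* keys seen in listomapvals rbd_directory
    out = {}
    for name, image_id in images.items():
        out[f"id_{image_id}"] = encode_lenprefix_str(name)
        out[f"name_{name}"] = encode_lenprefix_str(image_id)
    return out

entries = rbd_directory_entries({"test": "603d6b8b4567"})
print(entries["id_603d6b8b4567"].hex())  # 0400000074657374
```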

This looks feasible, but the relationship between pg and osdmap, as well as the omap version, are not easy to handle. To be revisited when an actual scenario or need arises

Keywords: Ceph

Added by artist-ink on Sat, 01 Jan 2022 19:52:31 +0200