Ceph: Deleting and Adding an OSD

1. Scenario

Current state of cluster

# ceph -s                                               
    cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d
     health HEALTH_OK
     monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}                
            election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4
     osdmap e3132: 27 osds: 26 up, 26 in                    
     flags sortbitwise                 
      pgmap v13259021: 4096 pgs, 4 pools, 2558 GB data, 638 kobjects
            7631 GB used, 89048 GB / 96680 GB avail
                4096 active+clean
  client io 34720 kB/s wr, 0 op/s rd, 69 op/s wr

You can see that the cluster health is OK, but only 26 of the 27 OSDs are up and in, which means one OSD is down and out.

> Supplementary knowledge: OSD states

up: the daemon is running and can serve IO;
down: the daemon is not running and cannot serve IO;
in: the OSD is in the data distribution and holds data;
out: the OSD is out of the data distribution and holds no data.
# ceph osd tree |grep down                              
 0  3.63129         osd.0     down        0          1.00000
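The same flags can also be read straight from the osdmap; a quick, hedged check (the exact field layout varies between Ceph releases, but the line for osd.0 should show it as down and out with weight 0):

# ceph osd dump | grep osd.0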

This means the osd.0 daemon is not running and it no longer holds any of the cluster's data (the cluster has already re-replicated that data onto the remaining OSDs); both points can be verified.

  • First, check the daemon

    # systemctl status ceph-osd@0 
    ● ceph-osd@0.service - Ceph object storage daemon
       Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
       Active: failed (Result: start-limit) since Thu 2017-04-06 09:26:04 CST; 1h 2min ago
      Process: 480723 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
      Process: 480669 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS) 
     Main PID: 480723 (code=exited, status=1/FAILURE)  
    Apr 06 09:26:04 bdc2 systemd[1]: Unit ceph-osd@0.service entered failed state.
    Apr 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service failed.
    Apr 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service holdoff time over, scheduling restart.
    Apr 06 09:26:04 bdc2 systemd[1]: start request repeated too quickly for ceph-osd@0.service
    Apr 06 09:26:04 bdc2 systemd[1]: Failed to start Ceph object storage daemon.
    Apr 06 09:26:04 bdc2 systemd[1]: Unit ceph-osd@0.service entered failed state.
    Apr 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service failed.
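    The "start request repeated too quickly" message means systemd has hit the unit's start limit. If you want to retry the start by hand after investigating, one approach (a generic systemd sketch, nothing here is specific to this cluster) is to clear the failed state and then watch the journal:

    # systemctl reset-failed ceph-osd@0
    # systemctl start ceph-osd@0
    # journalctl -u ceph-osd@0.service -n 50 --no-pager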

    View logs for osd.0

    # tail -f /var/log/ceph/ceph-osd.0.log
    2017-04-06 09:26:04.531004 7f75f33d5800  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)  
    2017-04-06 09:26:04.531520 7f75f33d5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
    2017-04-06 09:26:04.531528 7f75f33d5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
    2017-04-06 09:26:04.531548 7f75f33d5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported  
    2017-04-06 09:26:04.532318 7f75f33d5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)  
    2017-04-06 09:26:04.532384 7f75f33d5800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf   
    2017-04-06 09:26:04.730841 7f75f33d5800 -1 filestore(/var/lib/ceph/osd/ceph-0) Error initializing leveldb : IO error: /var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-004467: Input/output error
    
    2017-04-06 09:26:04.730870 7f75f33d5800 -1 osd.0 0 OSD:init: unable to mount object store  
    2017-04-06 09:26:04.730879 7f75f33d5800 -1  ** ERROR: osd init failed: (1) Operation not permitted
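    An Input/output error on the omap MANIFEST file usually points at a problem below Ceph, in the filesystem or the disk itself. Assuming this OSD lives on /dev/sdc, as the df output further down confirms, a couple of hedged checks worth running on the host are:

    # dmesg | grep -i -E 'sdc|i/o error'
    # smartctl -a /dev/sdc     (requires the smartmontools package)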

    Next, check the data

    # cd /var/lib/ceph/osd/ceph-0/current 
    # ls -lrt |tail -10 
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:45 4.2e9_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:45 4.355_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:45 4.36c_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:45 4.3ae_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:46 4.3b2_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:46 4.3e8_TEMP
    drwxr-xr-x  2 ceph ceph   58 Apr  2 00:46 4.3ea_TEMP
    -rw-r--r--. 1 ceph ceph   10 Apr  2 08:53 commit_op_seq
    drwxr-xr-x. 2 ceph ceph  349 Apr  5 10:01 omap
    -rw-r--r--. 1 ceph ceph    0 Apr  6 09:26 nosnap

    Pick any two PGs and check where their replicas live

    
    # ceph pg dump |grep 4.3ea
    dumped all in format plain
    4.3ea   2   0   0   0   0   8388608   254    254    active+clean   2017-04-06 01:55:04.754593   1322'254    3132:122   [26,2,12]   26   [26,2,12]   26   1322'254    2017-04-06 01:55:04.754546   1322'254    2017-04-02 00:46:12.611726
    # ceph pg dump |grep 4.3e8
    dumped all in format plain
    4.3e8   1   0   0   0   0   4194304   1226   1226   active+clean   2017-04-06 01:26:43.827061   1323'1226   3132:127   [2,15,5]    2    [2,15,5]    2    1323'1226   2017-04-06 01:26:43.827005   1323'1226   2017-04-06 01:26:43.827005

You can see that the three replicas of PGs 4.3ea and 4.3e8 sit on OSDs [26,2,12] and [2,15,5] respectively, and neither set includes osd.0.
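If you only care about where one particular PG is mapped, the ceph pg map command gives the same answer without dumping the whole PG table; the output line below is illustrative of the format rather than copied from this cluster:

# ceph pg map 4.3ea
osdmap e3132 pg 4.3ea (4.3ea) -> up [26,2,12] acting [26,2,12]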

  • Summary:

    > Clearly, the cluster itself is healthy and has already kicked osd.0 out: its daemon cannot start, restarting the service only reproduces the error shown in the log (root cause unknown), and the data it used to hold is no longer part of the cluster. In other words this OSD is dead weight, so the plan is to remove it from the cluster completely, wipe (zap) its disk, and then add it back.

2. Removing the OSD

  1. Mark it out of the cluster (run on the admin node)
    # ceph osd out 0 (its REWEIGHT drops to 0 in ceph osd tree)
  2. Stop the service (run on the OSD's host)
    # systemctl stop ceph-osd@0 (its status changes to down in ceph osd tree)

    This OSD is already out and down, so steps 1 and 2 are skipped here.

  3. Remove it from the CRUSH map
    # ceph osd crush remove osd.0
  4. Delete its authentication key
    # ceph auth del osd.0
  5. Remove the OSD from the osdmap
    # ceph osd rm 0
  6. Unmount its data directory
    # df -h |grep ceph-0
    /dev/sdc1                  3.7T  265G  3.4T    8% /var/lib/ceph/osd/ceph-0                                                                                                     
    # umount /var/lib/ceph/osd/ceph-0
     umount: /var/lib/ceph/osd/ceph-0: target is busy.
             (In some cases useful info about processes that use
              the device is found by lsof(8) or fuser(1))

    The unmount fails because the mount point is busy; use fuser to see what is holding it

    # fuser -m -v /var/lib/ceph/osd/ceph-0                                                                                                                            
                     USER        PID ACCESS COMMAND
    /var/lib/ceph/osd/ceph-0:                                                                                                                                                      
                     root     kernel mount /var/lib/ceph/osd/ceph-0                                                                                                            
                     root      212444 ..c.. bash  

    Kill the bash process that is holding the mount point

    # kill -9 212444
     Or use fuser to kill
    # fuser -m -v -i -k /var/lib/ceph/osd/ceph-0
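    If you prefer lsof, which the umount hint also mentions, roughly the same information is available with:

    # lsof /var/lib/ceph/osd/ceph-0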

Unmount again

# umount /var/lib/ceph/osd/ceph-0

This time the unmount succeeds

7. Wipe the disk. The df -h output above already shows that ceph-0 corresponds to /dev/sdc (you can also check with ceph-disk list)

# ceph-disk zap /dev/sdc  
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************   
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk   
verification and recovery are STRONGLY recommended.
****************************************************************************   
GPT data structures destroyed! You may now partition the disk using fdisk or   
other utilities.   
Creating new GPT entries.  
The operation has completed successfully.    
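Before moving on, an optional sanity check: all three of the commands below should now return nothing, confirming that osd.0 is gone from the CRUSH map, the auth database, and the osdmap (just a hedged double-check of the steps above):

# ceph osd tree | grep osd.0
# ceph auth list | grep osd.0
# ceph osd dump | grep osd.0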

View the ceph status at this time

# ceph -s                                                                                                                                                   
    cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d                                                                                                                             
     health HEALTH_WARN                                                                                                                                                        
            170 pgs backfill_wait                                                                                                                                              
            10 pgs backfilling                                                                                                                                                 
            362 pgs degraded                                                                                                                                                   
            362 pgs recovery_wait                                                                                                                                              
            436 pgs stuck unclean                                                                                                                                              
            recovery 5774/2136302 objects degraded (0.270%)                                                                                                                    
            recovery 342126/2136302 objects misplaced (16.015%)                                                                                                                
     monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}                                                                         
            election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4                                                                                                                     
     osdmap e3142: 26 osds: 26 up, 26 in; 180 remapped pgs                                                                                                                     
            flags sortbitwise                                                                                                                                                  
      pgmap v13264634: 4096 pgs, 4 pools, 2558 GB data, 639 kobjects                                                                                                           
            7651 GB used, 89029 GB / 96680 GB avail                                                                                                                            
            5774/2136302 objects degraded (0.270%)                                                                                                                             
            342126/2136302 objects misplaced (16.015%)                                                                                                                         
                3554 active+clean                                                                                                                                              
                 362 active+recovery_wait+degraded                                                                                                                             
                 170 active+remapped+wait_backfill                                                                                                                             
                  10 active+remapped+backfilling
     recovery io 354 MB/s, 89 objects/s
     client io 1970 kB/s wr, 0 op/s rd, 88 op/s wr 

Wait for the recovery to finish and the cluster to return to the OK state before adding the new OSD. (I am not sure why it is necessary to wait for OK before adding it.)
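To keep an eye on the recovery without re-running ceph -s by hand, either of the following is convenient (purely optional):

# watch -n 30 ceph health
# ceph -w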

3. Adding the OSD

The disk belonging to the removed OSD has already been wiped above, so adding it back is straightforward: just use the ceph-deploy tool.

# ceph-deploy --overwrite-conf osd create bdc2:/dev/sdc                                                                                                  
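For reference, osd create wraps a prepare phase and an activate phase; if they ever need to be run separately, the equivalent for this host and disk would look roughly like this (a hedged sketch, the exact syntax depends on the ceph-deploy version):

# ceph-deploy --overwrite-conf osd prepare bdc2:/dev/sdc
# ceph-deploy --overwrite-conf osd activate bdc2:/dev/sdc1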

When the osd create run finishes, you can see that the OSD has been added back and given id 0 again (Ceph assigns the lowest free OSD id, which is exactly the one just released).

# df -h |grep ceph-0
/dev/sdc1                  3.7T   74M  3.7T    1% /var/lib/ceph/osd/ceph-0  

The OSD is brand new, so almost no space is used yet

# ceph-disk list |grep osd                                                                                                                                 
 /dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
 /dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
 /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sde2
 /dev/sdf1 ceph data, active, cluster ceph, osd.3, journal /dev/sdf2
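The CRUSH tree should likewise show osd.0 back under its host with a non-zero weight (an optional, illustrative check):

# ceph osd tree | grep osd.0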

Check the ceph status again: the cluster is back to 27 OSDs; wait until it returns to the OK state


# ceph -s                                                                                                                                                  
    cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d                                                                                                                               
     health HEALTH_WARN                                                                                                                                                        
            184 pgs backfill_wait                                                                                                                                              
            6 pgs backfilling                                                                                                                                                  
            374 pgs degraded                                                                                                                                                   
            374 pgs recovery_wait                                                                                                                                              
            83 pgs stuck unclean                                                                                                                                               
            recovery 4605/2114056 objects degraded (0.218%)                                                                                                                    
            recovery 298454/2114056 objects misplaced (14.118%)                                                                                                                
     monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}                                                                         
            election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4                                                                                                                     
     osdmap e3501: 27 osds: 27 up, 27 in; 190 remapped pgs                                                                                                                     
            flags sortbitwise                                                                                                                                                  
      pgmap v13275552: 4096 pgs, 4 pools, 2558 GB data, 639 kobjects                                                                                                           
            7647 GB used, 92751 GB / 100398 GB avail                                                                                                                           
            4605/2114056 objects degraded (0.218%)                                                                                                                             
            298454/2114056 objects misplaced (14.118%)                                                                                                                         
                3532 active+clean                                                                                                                                              
                 374 active+recovery_wait+degraded                                                                                                                             
                 184 active+remapped+wait_backfill                                                                                                                             
                   6 active+remapped+backfilling                                                                                                                            
     recovery io 264 MB/s, 67 objects/s
     client io 1737 kB/s rd, 63113 kB/s wr, 60 op/s rd, 161 op/s wr

Reference link: [http://www.cnblogs.com/sammyliu/p/5555218.html](http://www.cnblogs.com/sammyliu/p/5555218.html)
