Detailed usage of the ceph-kvstore-tool

Brief introduction

ceph-kvstore-tool is used to read the key-value metadata stored in a leveldb or rocksdb database. It can also modify the data in the kvstore, much like operating on the osdmap of an offline osd.
To use this tool you need to install the ceph-test package.

This description of the tool is based on Ceph 12.2.1 (Luminous).


Execute ceph-kvstore-tool -h to see the following help information:

[root@node1 ~]# ceph-kvstore-tool -h
Usage: ceph-kvstore-tool <leveldb|rocksdb|bluestore-kv> <store path> command [args...]

  list [prefix]
  list-crc [prefix]
  exists <prefix> [key]
  get <prefix> <key> [out <file>]
  crc <prefix> <key>
  get-size [<prefix> <key>]
  set <prefix> <key> [ver <N>|in <file>]
  rm <prefix> <key>
  rm-prefix <prefix>
  store-copy <path> [num-keys-per-tx]
  store-crc <path>
  compact-prefix <prefix>
  compact-range <prefix> <start> <end>

Let's go through the subcommands one by one and see how to use them.

  • list [prefix] lists the key-value entries stored in the database; if a prefix is supplied, only keys under that prefix are listed. Keys are printed in URL-escaped form
    1. View the key value contents of the mon database
      a. cat /var/lib/ceph/mon/ceph-node1/kv_backend shows the mon database type
      [root@node1 ~]# cat /var/lib/ceph/mon/ceph-node1/kv_backend 
      b. systemctl stop ceph-mon@node1 — the mon service must be stopped first; while mon is running, the store directory is locked and the database content cannot be read.
      If you do not stop the mon service, the error below occurs: the LOCK file held by the running mon prevents other tools from opening the database
      [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list
      failed to open type rocksdb path /var/lib/ceph/mon/ceph-node1/store.db/: (22) Invalid argument
      2019-08-09 19:59:31.796330 7fb705a48e80 -1 rocksdb: IO error: lock /var/lib/ceph/mon/ceph-node1/store.db//LOCK: Resource temporarily unavailable
      After stopping mon, access succeeds.

      The list command lists all the entries stored in the current mon database; each line is a prefix (table name) and a key, such as auth 251
      c. See what table items are in the mon database
      ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list|awk '{print $1}'|uniq
      The output (not shown here) reveals that mon maintains a very large number of cluster tables, covering essentially every ceph component
      d. View the osdmap table entries
      ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list|grep osdmap |head -10
      The output is as follows:
      health	osdmap
      osdmap	1000
      osdmap	1001
      osdmap	1002
      osdmap	1003
      osdmap	1004
      osdmap	1005
      osdmap	1006
      osdmap	1007
      osdmap	1008
    2. View the contents of key values stored in the bluestore database
      a. Similarly, you need to stop an osd first
      systemctl stop ceph-osd@1
      b. Since bluestore's rocksdb does not exist as a plain on-disk directory, it first has to be exported with the companion tool ceph-bluestore-tool (covered in a separate article on ceph-bluestore-tool)
      mkdir /ceph-1 creates a folder to hold the exported bluestore rocksdb database
      ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-1 --out-dir /ceph-1/
      The output is as follows
      infering bluefs devices from bluestore path
      action bluefs-export
       slot 1 /var/lib/ceph/osd/ceph-1/block
      The /ceph-1 directory now holds the exported db folder.
      Bluestore mainly stores object metadata, so it has noticeably more sorted string table (.sst) files than mon does
      c. View the kv content of the bluestore as follows
      ceph-kvstore-tool rocksdb /ceph-1/db/ list|head -10
      B	blocks
      B	blocks_per_key
      B	bytes_per_block
      B	size
      C	1.0s2_head
      C	1.10s0_head
      C	1.11s0_head
      C	1.12s2_head
      C	1.13s0_head
      C	1.14s0_head
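A side note on the prefix-counting one-liner in step c: piping awk straight into uniq works there only because list prints keys sorted by prefix. The pipeline can also be extended to count how many keys each table holds. A self-contained sketch on sample data (sample_list.txt is a stand-in for real list output):

```shell
# Count how many keys each prefix (table) holds in `list` output.
# sample_list.txt stands in for: ceph-kvstore-tool rocksdb <store.db> list
cat > sample_list.txt <<'EOF'
auth	250
auth	251
osdmap	1000
osdmap	1001
osdmap	full_1000
EOF

# sort before uniq -c so repeats are counted even when not adjacent
awk '{print $1}' sample_list.txt | sort | uniq -c | sort -rn
```

On real list output this prints one line per table with its key count, largest first.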
  • list-crc [prefix] prints the CRC checksum of each key-value pair in the database
    The output is as follows:
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list-crc |grep osdmap |head -10
    health	osdmap	3928512586
    osdmap	1000	798511892
    osdmap	1001	1507770938
    osdmap	1002	2750577144
    osdmap	1003	4273498913
    osdmap	1004	1590290088
    osdmap	1005	636668385
    osdmap	1006	1658794114
    osdmap	1007	2689193714
    osdmap	1008	2971171276
    Verify that the crc checksum is unique for each entry by comparing the two counts below:
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list-crc |grep osdmap |wc -l
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ list-crc |grep osdmap|awk '{print $3}'|uniq |wc -l
    You can see that the crc checksum of each entry is different. Note that a CRC is an integrity checksum, not an encryption mechanism; ceph computes these with its crc32c implementation, which is worth studying for the curious
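One caveat about the check above: uniq only collapses adjacent duplicate lines, and list-crc output is sorted by key rather than by checksum, so a repeated CRC on non-adjacent lines would go unnoticed. Sorting first gives a true distinct count; a minimal demonstration on sample data:

```shell
# uniq removes only *adjacent* duplicates; sort first for a real distinct count
cat > crcs.txt <<'EOF'
1111
2222
1111
EOF

uniq crcs.txt | wc -l          # prints 3: the repeated 1111 is not adjacent
sort crcs.txt | uniq | wc -l   # prints 2: the true number of distinct values
```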
  • exists <prefix> [key] checks whether the given prefix exists in the kv database and, if a key is also supplied, whether that key exists under the prefix. This subcommand is useful for checking whether a component's entries are missing from the database
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ exists osdmap
    (osdmap, ) exists
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ exists osdmap 1005
    (osdmap, 1005) exists
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ exists osdmap 6005
    (osdmap, 6005) does not exist
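For scripting, the human-readable exists output can be awkward to parse; one alternative is to grep the list output directly. The key_exists helper below is a hypothetical sketch and assumes the tab-separated prefix/key listing shown earlier:

```shell
# key_exists PREFIX KEY - succeeds if "PREFIX<tab>KEY" appears on stdin.
# Hypothetical helper; feed it: ceph-kvstore-tool rocksdb <store.db> list
key_exists() {
    grep -q "^$(printf '%s\t%s' "$1" "$2")\$"
}

# simulated `list` output for demonstration
printf 'osdmap\t1005\nosdmap\t1006\n' | key_exists osdmap 1005 && echo "(osdmap, 1005) exists"
printf 'osdmap\t1005\nosdmap\t1006\n' | key_exists osdmap 6005 || echo "(osdmap, 6005) does not exist"
```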
  • get <prefix> <key> [out <file>] retrieves the value of the given entry; with out <file> the raw bytes are written to a file instead of being hexdumped
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ get osdmap 1000
    (osdmap, 1000)
    00000000  08 07 94 16 00 00 05 01  97 15 00 00 fa 27 f0 41  |.............'.A|
    00000010  0c e9 4d f1 a4 bd 5e 37  67 88 34 bd e8 03 00 00  |..M...^7g.4.....|
    00000020  95 b5 4a 5d a5 ba 74 35  ff ff ff ff ff ff ff ff  |..J]..t5........|
    00000030  ff ff ff ff 00 00 00 00  00 00 00 00 ff ff ff ff  |................|
    00000040  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
    00000050  03 00 00 00 01 01 01 1c  00 00 00 01 00 00 00 19  |................|
    00000060  48 00 00 10 00 00 00 02  00 1a 90 0a c0 37 b1 00  |H............7..|
    00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000080  01 00 00 01 01 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000090  ff ff ff ff 00 00 00 00  01 01 00 00 00 00 00 00  |................|
    000000a0  00 01 00 00 00 ff ff ff  ff 00 00 00 00 01 01 00  |................|
    000000b0  00 00 00 00 00 00 02 00  00 00 ff ff ff ff 00 00  |................|
    000000c0  00 00 01 01 00 00 00 00  00 00 00 03 00 00 00 ff  |................|
    The output above is ceph's serialized (encoded) data, so we cannot tell what the osdmap actually contains; for that we can bring in another tool, ceph-dencoder
    1. Take a recent osdmap as an example; first dump the full_999 version into a file
      ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ get osdmap full_999 out ./osdmap.full
    2. Deserialize it with the ceph-dencoder tool to get the OSDMap content
      ceph-dencoder import osdmap.full type OSDMap decode dump_json
      Since a single osdmap version contains a great deal of information, only part of it is shown here
      "epoch": 999,
      "fsid": "fa27f041-0ce9-4df1-a4bd-5e37678834bd",
      "created": "2019-07-22 15:43:30.494296",
      "modified": "2019-08-07 19:26:59.891852",
      "flags": "noout,nobackfill,norecover,sortbitwise,recovery_deletes,purged_snapdirs",
      "crush_version": 30,
      "full_ratio": 0.950000,
      "backfillfull_ratio": 0.900000,
      "nearfull_ratio": 0.850000,
      "cluster_snapshot": "",
      "pool_max": 15,
      "max_osd": 10,
      "require_min_compat_client": "jewel",
      "min_compat_client": "jewel",
      "require_osd_release": "luminous",
      "pools": [
              "pool": 1,
              "pool_name": "data",
              "flags": 5,
              "flags_names": "hashpspool,ec_overwrites",
              "type": 3,
  • crc <prefix> <key> gets the CRC checksum of the given entry
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ crc osdmap 1000
    (osdmap, 1000)  crc 4064685290
  • get-size [<prefix> <key>] gets the estimated size of the whole store and, if an entry is given, the storage size of that entry
    [root@node1 ~]# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ get-size osdmap 1000
    log - 0
    misc - 8580
    sst - 17752013
    total - 17760593
    total: 17760593
    estimated store size: 17760593
    (osdmap,1000) size 5786
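The first block of numbers appears to describe the whole store rather than the requested entry: log, misc and sst look like the bytes held in write-ahead logs, miscellaneous files and .sst table files, and total is their sum, which doubles as the estimated store size; the final line is the size of the (osdmap, 1000) value itself. The arithmetic checks out:

```shell
# the reported total is simply log + misc + sst
log=0; misc=8580; sst=17752013
echo $((log + misc + sst))   # prints 17760593, matching both "total" lines above
```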
  • set <prefix> <key> [ver <N>|in <file>] sets the value of an entry, either to a version number N or to the contents of a file
    For example, the version value of an osdmap entry can be reassigned as below; a file previously fetched with get <prefix> <key> out <file> can likewise be injected with in <file>
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-node1/store.db/ set osdmap 1000 ver 1001
    In the same way, we can craft our own cluster map, encode (serialize) it after modification, and inject it into the database
  • rm <prefix> <key> deletes the specified entry
  • rm-prefix <prefix> deletes every entry under the given prefix; use with caution
  • store-copy <path> [num-keys-per-tx] copies every key-value pair into a new store at the given path; num-keys-per-tx is the number of keys copied in each copy transaction
    Unlike a plain cp of the store directory, this appears to be a logical copy: the data is rewritten through normal kvstore transactions into a fresh database at <path> rather than cloning the on-disk files
  • store-crc <path> dumps the CRC of every key-value pair to the given file
  • compact triggers a compaction of the entire rocksdb database; some disk space is usually freed afterwards. This applies to all key-value entries
  • compact-prefix <prefix> triggers a compaction of the entries under the given prefix
  • compact-range <prefix> <start> <end> compacts the given key range under the prefix


In my view, this tool is mainly used to check whether a component's database is damaged, and it provides an interface to the metadata layer of each ceph component. Understanding how ceph stores its metadata gives us a deeper appreciation of the reliability and self-management of the ceph storage system.
The get and set subcommands are also useful for manually repairing corrupted metadata map versions.

Tags: Ceph Database osd CentOS

Posted on Fri, 09 Aug 2019 19:37:09 -0700 by abid786