Details of the ZooKeeper data structure

ZooKeeper

https://zookeeper.apache.org/doc/current/zookeeperOver.html

ZooKeeper is a distributed, open-source coordination service for distributed applications.
It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming.

It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. 

ZooKeeper is a high-performance, highly available, strictly ordered distributed coordination service that provides unified configuration, naming, synchronization, and group services.

ZooKeeper itself is replicated across a set of servers (an ensemble). The servers must all know about each other; each one keeps the data state in memory and stores transaction logs and snapshots persistently. As long as a majority of the servers are available, the ZooKeeper service is available. It is especially fast in read-dominant workloads: ZooKeeper applications run on thousands of machines, and it performs best when reads are more common than writes, at a ratio of around 10:1.
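
The majority rule above can be illustrated with a small sketch. The function names here are hypothetical and are not part of any ZooKeeper API; the code only shows the arithmetic of "a majority of servers".

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def service_available(ensemble_size: int, servers_up: int) -> bool:
    """True when the servers that are still up form a majority."""
    return servers_up >= quorum_size(ensemble_size)


# Hypothetical ensemble sizes, chosen only for illustration.
for n in (3, 4, 5):
    print(n, "servers tolerate", n - quorum_size(n), "failure(s)")
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which is one reason odd ensemble sizes are commonly chosen.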

Guarantees provided by ZooKeeper

  • Sequential Consistency: updates from a client are applied in the order in which they were sent.
  • Atomicity: an update either succeeds or fails; there are no partial results.
  • Single System Image: a client sees the same view of the service no matter which server it connects to.
  • Reliability: once an update has been applied, it persists until a client overwrites it.
  • Timeliness: the client's view of the system is guaranteed to be up to date within a certain time bound.

Data structure of Zookeeper

The namespace provided by ZooKeeper is very similar to that of a standard file system.

Names are a series of path elements separated by slashes (/). Each node in the ZooKeeper namespace is uniquely identified by a path.
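
As a rough illustration of how a path breaks down into elements, here is a minimal sketch. The helper name path_elements is hypothetical, and the checks are a simplified subset of what the real server enforces (absolute paths only, no trailing slash, no empty or relative elements).

```python
from typing import List


def path_elements(path: str) -> List[str]:
    """Split an absolute znode path into its elements (simplified checks)."""
    if not path.startswith("/"):
        raise ValueError("path must start with /")
    if path == "/":
        return []  # the root has no elements
    if path.endswith("/"):
        raise ValueError("path must not end with /")
    parts = path[1:].split("/")
    if any(p in ("", ".", "..") for p in parts):
        raise ValueError("empty or relative path element")
    return parts


print(path_elements("/test/a"))
```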

Hierarchical namespace of ZooKeeper

Unlike in a standard file system, each node in the ZooKeeper namespace can have data associated with it as well as child nodes. It is like having a file system that allows a file to also be a directory. Each node can store data, but the capacity is limited, generally to no more than 1 MiB per node.

Znode type
  • Conceptually, there are three kinds of znode:

    • Persistent node: it stays until it is explicitly deleted.
    • Ephemeral (temporary) node: ZooKeeper deletes it automatically once the session of the client that created it ends.
    • Sequential node: each time one is created, ZooKeeper appends a 10-digit, zero-padded counter to the path (for example s0000000001). The counter is a signed 32-bit integer, so its maximum is 2147483647 (2^31-1). It is monotonically increasing and is kept per parent znode, so all sequential children of one parent share it.
  • In practice these combine into four creation modes, and the default is persistent:

    • PERSISTENT (persistent node): created with create <path> <data>, for example create /test/a "hello"
    • PERSISTENT_SEQUENTIAL (persistent sequential node, e.g. /s0000000001): created with create -s <path> <data>
    • EPHEMERAL (ephemeral node): created with create -e <path> <data>
    • EPHEMERAL_SEQUENTIAL (ephemeral sequential node, e.g. /s0000000001): created with create -s -e <path> <data>
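
The naming of sequential nodes can be sketched as follows. This is only an illustration of the 10-digit zero padding and the 32-bit limit; sequential_name is a hypothetical helper, not a ZooKeeper call.

```python
INT_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def sequential_name(prefix: str, counter: int) -> str:
    """Append a 10-digit, zero-padded counter to the requested path."""
    if not 0 <= counter <= INT_MAX:
        raise ValueError("counter does not fit in a signed 32-bit integer")
    return f"{prefix}{counter:010d}"


names = [sequential_name("/seq_test/s", n) for n in (2, 10, 1)]
# Zero padding makes plain string sorting agree with numeric order.
print(sorted(names))
```

Because of the padding, clients can sort the children of a node as strings and still get them in creation order.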

  • Here are some examples:

(1) Create a sequential node

[zk: 127.0.0.1:2281(CONNECTED) 0] create /seq_test ""
Created /seq_test
[zk: 127.0.0.1:2281(CONNECTED) 1] create -s /seq_test/s "hello"
Created /seq_test/s0000000001
[zk: 127.0.0.1:2281(CONNECTED) 2] create -s /seq_test/s "hello"
Created /seq_test/s0000000002
[zk: 127.0.0.1:2281(CONNECTED) 3] ls /seq_test
[s0000000001, s0000000002]

(2) Create an ephemeral node

[zk: 127.0.0.1:2281(CONNECTED) 0] create /ephe_test ""
Created /ephe_test
[zk: 127.0.0.1:2281(CONNECTED) 1] create -e /ephe_test/e "hello"
Created /ephe_test/e
[zk: 127.0.0.1:2281(CONNECTED) 2] ls /ephe_test
[e]

Disconnect and reconnect. The old session has ended, so the ephemeral node is gone:

[zk: 127.0.0.1:2281(CONNECTED) 0] ls /ephe_test
[]

(3) Create an ephemeral sequential node

[zk: 127.0.0.1:2281(CONNECTED) 0] create /ephe_seq_test ""
Created /ephe_seq_test
[zk: 127.0.0.1:2281(CONNECTED) 1] create -e -s /ephe_seq_test/s "hello"
Created /ephe_seq_test/s0000000001
[zk: 127.0.0.1:2281(CONNECTED) 2] ls /ephe_seq_test
[s0000000001]

Zxid (ZooKeeper Transaction Id)

Every change produces a unique transaction ID, the Zxid (ZooKeeper Transaction Id), which is maintained by the ZooKeeper leader. Changes include:

  • A client connecting to the server
  • A client disconnecting from the server
  • Any znode being created (create), modified (set), or deleted (delete, rmr)

A Zxid is a 64-bit number. The high 32 bits are the epoch, which starts from 1 and increases by 1 each time a new leader is elected. The low 32 bits are a monotonically increasing counter within the current epoch, also starting from 1.
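
The split into epoch and counter is plain bit arithmetic. The following is a minimal sketch; split_zxid and make_zxid are hypothetical helper names used only for illustration.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a 64-bit Zxid."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


def make_zxid(epoch: int, counter: int) -> int:
    """Combine an epoch and a counter back into a single Zxid."""
    return (epoch << 32) | counter


print(split_zxid(0x100000002))   # (1, 2)
print(hex(make_zxid(1, 3)))      # 0x100000003
```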

Properties of Znode

  • cZxid: the transaction ID that created the node.
  • ctime: the creation timestamp.
  • mZxid: the transaction ID of the last modification. mZxid and mtime are updated after each modification (set).
  • mtime: the timestamp of the last modification.
  • pZxid: the transaction ID of the last change to the list of direct child nodes. It is updated when a direct child is created or deleted (create, delete, rmr); changing a child's data with set does not update it.
  • cversion: the version number of the list of direct child nodes. Each time a direct child is created or deleted, cversion increases by 1.
  • dataVersion: the version number of the node data. Each time the node is modified (set), dataVersion increases by 1, even if the same data is set again.
  • aclVersion: the version number of the node ACL. Each time the ACL of the node changes, aclVersion increases by 1.
  • ephemeralOwner: when the node is ephemeral, the session ID of the client that owns it; otherwise 0x0.
  • dataLength: the length of the data stored in the node, in bytes.
  • numChildren: the number of direct child nodes.

➜ zkCli.sh -server 127.0.0.1:2281
[zk: 127.0.0.1:2281(CONNECTED) 0] get /

cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = -1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1

The root node / and the /zookeeper node exist by default, which is why the cZxid is 0x0.

[zk: 127.0.0.1:2281(CONNECTED) 1] ls /
[zookeeper]
[zk: 127.0.0.1:2281(CONNECTED) 2] create /test "hello"
Created /test
[zk: 127.0.0.1:2281(CONNECTED) 3] ls /
[zookeeper, test]

After the /test node is created, the root node has one more child.

[zk: 127.0.0.1:2281(CONNECTED) 4] get /test
hello
cZxid = 0x100000002
ctime = Sat May 23 15:43:10 CST 2020
mZxid = 0x100000002
mtime = Sat May 23 15:43:10 CST 2020
pZxid = 0x100000002
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0

The high 32 bits of cZxid are 0x1, which means the current epoch is the first one. The low 32 bits are 0x00000002, which means the /test node was created by the second transaction of this epoch. Why 2? Because connecting to the server with zkCli.sh had already consumed one Zxid.

[zk: 127.0.0.1:2281(CONNECTED) 5] set /test "hello world"
cZxid = 0x100000002
ctime = Sat May 23 15:43:10 CST 2020
mZxid = 0x100000003
mtime = Sat May 23 15:43:41 CST 2020
pZxid = 0x100000002
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 11
numChildren = 0

After the data of the /test node is modified, mZxid, mtime, and dataVersion have changed, and dataLength has changed to 11 ("hello world" takes 11 bytes).

[zk: 127.0.0.1:2281(CONNECTED) 6] create /test/a ""   
Created /test/a
[zk: 127.0.0.1:2281(CONNECTED) 7] get /test              
hello world
cZxid = 0x100000002
ctime = Sat May 23 15:43:10 CST 2020
mZxid = 0x100000003
mtime = Sat May 23 15:43:41 CST 2020
pZxid = 0x100000004
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 11
numChildren = 1

After creating the child node /test/a, check the attributes of the /test node: pZxid, cversion, and numChildren have changed.

[zk: 127.0.0.1:2281(CONNECTED) 8] get /test/a

cZxid = 0x100000004
ctime = Sat May 23 15:44:12 CST 2020
mZxid = 0x100000004
mtime = Sat May 23 15:44:12 CST 2020
pZxid = 0x100000004
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0

Looking at the attributes of the /test/a node, its cZxid equals the pZxid of the parent node /test. This confirms that pZxid records the transaction that last changed the direct child nodes (here, the creation of /test/a).
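
To tie the transcript together, here is a toy in-memory model that replays the same three operations and updates the same fields. It is a sketch under simplifying assumptions, not how the server is implemented: the class and method names (Stat, ToyTree) are hypothetical, timestamps, ACLs and deletes are left out, and the starting Zxid is copied from the transcript.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Stat:
    czxid: int
    mzxid: int
    pzxid: int
    data: bytes = b""
    cversion: int = 0
    data_version: int = 0
    num_children: int = 0


class ToyTree:
    def __init__(self, next_zxid: int) -> None:
        self.next_zxid = next_zxid
        self.nodes: Dict[str, Stat] = {"/": Stat(0, 0, 0)}

    def _take_zxid(self) -> int:
        zxid = self.next_zxid
        self.next_zxid += 1
        return zxid

    def create(self, path: str, data: bytes = b"") -> None:
        parent = self.nodes[path.rsplit("/", 1)[0] or "/"]
        zxid = self._take_zxid()
        self.nodes[path] = Stat(zxid, zxid, zxid, data)
        # Creating a child touches the parent's child-related fields.
        parent.pzxid = zxid
        parent.cversion += 1
        parent.num_children += 1

    def set(self, path: str, data: bytes) -> None:
        node = self.nodes[path]
        # A set touches only the node's own data-related fields.
        node.mzxid = self._take_zxid()
        node.data = data
        node.data_version += 1


tree = ToyTree(next_zxid=0x100000002)
tree.create("/test", b"hello")
tree.set("/test", b"hello world")
tree.create("/test/a")
t = tree.nodes["/test"]
print(hex(t.czxid), hex(t.mzxid), hex(t.pzxid), t.cversion, t.data_version, t.num_children)
```

The printed values line up with the get /test output above: cZxid 0x100000002, mZxid 0x100000003, pZxid 0x100000004, cversion 1, dataVersion 1, numChildren 1.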

Watches on znodes (Watch)

ZooKeeper supports watches. A client can set a watch on a znode.

When the znode changes, the watch is triggered and then removed, so it fires only once. When a watch is triggered, the client receives a packet saying that the znode has changed.

If the connection between the client and one of the ZooKeeper servers is broken, the client receives a local notification.

New in 3.6.0:

A client can also set permanent, recursive watches on a znode. These are not removed when triggered, and they fire for changes to the registered znode as well as to all of its descendant znodes.
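
The difference between a one-shot watch and a permanent recursive watch can be sketched with a toy registry. This is not the ZooKeeper client API; ToyWatchManager and its methods are hypothetical names, and only data-change events are modeled.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]  # called with (event_type, path)


class ToyWatchManager:
    def __init__(self) -> None:
        self.one_shot: DefaultDict[str, List[Callback]] = defaultdict(list)
        self.persistent_recursive: DefaultDict[str, List[Callback]] = defaultdict(list)

    def watch(self, path: str, cb: Callback) -> None:
        self.one_shot[path].append(cb)

    def watch_persistent_recursive(self, path: str, cb: Callback) -> None:
        self.persistent_recursive[path].append(cb)

    def data_changed(self, path: str) -> None:
        # One-shot watches are removed (popped) as they fire.
        for cb in self.one_shot.pop(path, []):
            cb("NodeDataChanged", path)
        # Permanent recursive watches stay and also cover descendants.
        for root, cbs in self.persistent_recursive.items():
            if path == root or path.startswith(root.rstrip("/") + "/"):
                for cb in cbs:
                    cb("NodeDataChanged", path)


events = []
wm = ToyWatchManager()
wm.watch("/test/d", lambda t, p: events.append(("one-shot", t, p)))
wm.watch_persistent_recursive("/test", lambda t, p: events.append(("persistent", t, p)))
wm.data_changed("/test/d")  # both watches fire
wm.data_changed("/test/d")  # only the permanent watch fires
print(events)
```

After the second change only the permanent watch reports an event; a client that relies on one-shot watches has to set the watch again after each notification.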

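Client commands that support Watch (this is the older zkCli.sh syntax used in this article; from 3.5 on the watch is set with a -w flag instead, for example get -w path):

  • stat path [watch]
  • ls path [watch]
  • ls2 path [watch]
  • get path [watch]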

[zk: 127.0.0.1:2281(CONNECTED) 3] get /test/d watch

cZxid = 0x100000013
ctime = Sat May 23 16:47:41 CST 2020
mZxid = 0x100000013
mtime = Sat May 23 16:47:41 CST 2020
pZxid = 0x100000013
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0

Now use another client to change the data of the /test/d node. The original client automatically receives a WATCHER notification:

[zk: 127.0.0.1:2281(CONNECTED) 4] 
WATCHER::

WatchedEvent state:SyncConnected type:NodeDataChanged path:/test/d

@SvenAugustus(https://www.flysium.xyz/)


Added by sfmnetsys on Sun, 24 May 2020 12:33:08 +0300