Saturday, November 19, 2016

How Couchbase Identifies the Node for Data Access

This post explains how Couchbase identifies exactly where a document is stored, so that reads and updates stay fast.

Couchbase uses the term Bucket, the rough equivalent of a Database in the relational world, to logically group all documents across the cluster (the data nodes). Being a distributed database, it tries to evenly distribute, or shard, the data into virtual buckets known as vBuckets. Each vBucket owns a subset of the keys (document ids), along with the corresponding data. A document is mapped to a vBucket by applying a hash function to its key. Once the vBucket is identified, a separate lookup table tells which node hosts that vBucket. This mapping of vBuckets to nodes is known as the vBucket map. (Note: the Cluster Map contains the mapping of which service belongs to which node at any given point in time.)
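To make the key-to-vBucket step concrete, here is a minimal Python sketch. The CRC32-based hash and the exact bit manipulation are only an illustration of the idea (the SDKs use their own CRC32 variant internally), and the sample key is made up:

    import zlib

    NUM_VBUCKETS = 1024  # Couchbase's default, more on this below

    def vbucket_for_key(key: str) -> int:
        # CRC32-based hash of the key, folded into the vBucket range.
        # Illustrative only; not the SDK's internal implementation.
        return (zlib.crc32(key.encode("utf-8")) >> 16) % NUM_VBUCKETS

    print(vbucket_for_key("user::1001"))   # some id in 0..1023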



Steps Involved (as shown in the diagram; a code sketch follows the list):
  1. Hash the key to get the vBucket identifier (vb-k) that owns the key.
  2. Look up the vBucket map to find that vb-k is owned by node (server) n-t.
  3. Send the request directly to the primary node n-t to fetch the document.
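Building on vbucket_for_key from the sketch above, the map lookup and routing could look like this. The map contents and node names are invented for illustration; a real client receives the current vBucket map from the cluster and refreshes it whenever the topology changes:

    NODES = ["node A", "node B", "node C", "node D"]

    # Illustrative vBucket map: vbucket id -> (active node, replica node).
    vbucket_map = {vb: (NODES[vb % len(NODES)], NODES[(vb + 1) % len(NODES)])
                   for vb in range(NUM_VBUCKETS)}

    def node_for_key(key: str) -> str:
        # Step 1: hash the key; step 2: look up the map; step 3: return the active node.
        vb = vbucket_for_key(key)
        active, _replica = vbucket_map[vb]
        return active

    # The client then sends the read/update request directly to this node.
    print(node_for_key("user::1001"))      # e.g. "node C"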

Both the hashing function and the number of vBuckets are configurable. By default, Couchbase automatically divides each bucket into 1024 active vBuckets and 1024 replica vBuckets (per replica). When there is only one node, all vBuckets reside on that node.

What if a new node gets added to the cluster?
When the number of nodes scales up or down, the vBuckets (and the data they own) are redistributed among the available nodes, and the vBucket map is updated accordingly. This entire process is known as rebalancing. Rebalancing doesn't happen automatically; as an administrator/developer you need to trigger it either from the UI or through the CLI.
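As a rough picture of what rebalancing does to the map, here is a sketch using a hypothetical round-robin assignment (the real rebalancer also streams each vBucket's data to its new owner and tries to move as little data as possible):

    def build_vbucket_map(nodes, num_vbuckets=1024):
        # Round-robin assignment of active and replica copies, purely illustrative.
        return {vb: (nodes[vb % len(nodes)], nodes[(vb + 1) % len(nodes)])
                for vb in range(num_vbuckets)}

    before = build_vbucket_map(["node A", "node B", "node C", "node D"])
    after  = build_vbucket_map(["node A", "node B", "node C", "node D", "node E"])
    # Each node now owns roughly 205 active vBuckets instead of roughly 256.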

What if the primary node is down?
All read/update requests go to the primary (active) node by default. So, if a primary node fails for some reason, Couchbase takes that node out of the cluster (if configured to do so) and promotes the replica to become the primary. This fail-over to the replica node can happen manually or automatically. You can then fix the faulty node and add it back to the cluster by performing a rebalance. The table below shows a sample mapping; a small fail-over sketch follows it.
+------------+--------+---------+
| vbucket id | active | replica |
+------------+--------+---------+
| 0          | node A | node B  |
| 1          | node B | node C  |
| 2          | node C | node D  |
| 3          | node D | node A  |
+------------+--------+---------+
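Continuing with these made-up node names, a minimal sketch of how fail-over changes the sample map above (real fail-over also rebuilds the lost replicas once you rebalance):

    # The sample mapping from the table above, as a dict: id -> (active, replica).
    sample_map = {
        0: ("node A", "node B"),
        1: ("node B", "node C"),
        2: ("node C", "node D"),
        3: ("node D", "node A"),
    }

    def failover(vmap, failed_node):
        # Promote the replica to active wherever the active copy sat on the
        # failed node. Illustrative only.
        return {vb: ((replica, None) if active == failed_node else (active, replica))
                for vb, (active, replica) in vmap.items()}

    print(failover(sample_map, "node A")[0])   # ('node B', None)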



---
happy learning !!!
