Sunday, April 9, 2017

What If One of the Couchbase Data Nodes Goes Down

This post discusses the impact on your system when one of the Couchbase data nodes drops out of the cluster for some reason (software, hardware, or network failure). In CAP theorem terms, the rest of the cluster sees this as a network partition: the node is unreachable, whatever the underlying cause.
Couchbase version: 4+ (supports Multi-Dimensional Scaling).


Let's make one thing very clear before digging deeper into this situation. Overall, your system is still functional, because only one node is out of the cluster (hopefully you have more data nodes :D, and they are live and running!). Only the data stored on the down/unreachable node is impacted.

Couchbase shards data into vBuckets, which are distributed across all the data nodes. Each data node therefore hosts a set of vBuckets: some hold primary (active) data and others hold replica data. Losing the replica vBuckets that sat on the down node does not affect requests, because that data is still served by its primary copies on other nodes.
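To make the sharding concrete, here is a minimal sketch of how a key could map to a vBucket and how a vBucket map tells you which data is affected when a node goes down. It is a simplified model, not Couchbase's actual implementation: the CRC32-based hash follows the scheme commonly described for Couchbase clients, while the round-robin layout, the node names, and the helper functions `build_vbucket_map` and `affected_vbuckets` are assumptions made up for illustration.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vBucket count for this sketch


def vbucket_id(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a document key to a vBucket id (CRC32-based, simplified)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical layout: active copy on one node, replica on the next."""
    return [
        {
            "active": nodes[vb % len(nodes)],
            "replica": nodes[(vb + 1) % len(nodes)],
        }
        for vb in range(num_vbuckets)
    ]


def affected_vbuckets(vb_map, down_node):
    """vBuckets whose active (primary) copy lived on the down node."""
    return [vb for vb, entry in enumerate(vb_map) if entry["active"] == down_node]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
vb_map = build_vbucket_map(nodes)
down = affected_vbuckets(vb_map, "node-b")

key = "user::42"
vb = vbucket_id(key)
print(f"{key} -> vBucket {vb}, active on {vb_map[vb]['active']}")
print(f"{len(down)} of {NUM_VBUCKETS} vBuckets lose their active copy")
```

In this toy layout roughly a third of the vBuckets have their active copy on the failed node; keys that hash to any other vBucket keep working as before.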

Now the real question is how Couchbase deals with requests (read/update/delete) for the primary data that lived on the down node:

To preserve strong consistency, Couchbase serves each piece of data only from its primary node. Consider what could happen if this were not enforced: the primary node writes some data and goes down immediately, before replication completes. If Couchbase then serviced requests from the replica node, it would return stale data or apply updates on top of stale data. That would make the system highly available, but inconsistent.

The Couchbase designers chose consistency over availability. So any request for primary data on the down node fails until that node is failed over, which activates the replica data on some other node. To fail over, click Fail Over on the down node in the Couchbase UI. After that, you can rebalance the cluster manually. Any data that had not been replicated before the node went down is lost.
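The behaviour described above can be sketched with a toy model of a single vBucket that has one active copy and one replica. This is only an illustration under stated assumptions: `ToyVBucket`, `NodeDownError`, and the `replicated` flag are hypothetical names invented for this sketch, and nothing here uses the Couchbase SDK.

```python
class NodeDownError(Exception):
    """Raised when the node holding the active copy is unreachable."""


class ToyVBucket:
    """One vBucket with an active copy and a replica copy (illustrative only)."""

    def __init__(self):
        self.active = {}       # data on the primary node
        self.replica = {}      # data on the replica node
        self.active_up = True  # is the primary node reachable?

    def write(self, key, value, replicated=True):
        if not self.active_up:
            raise NodeDownError("active node is down; fail over first")
        self.active[key] = value
        if replicated:
            # Models replication finishing before any failure happens.
            self.replica[key] = value

    def read(self, key):
        if not self.active_up:
            # Consistency over availability: never answer from the replica.
            raise NodeDownError("active node is down; fail over first")
        return self.active.get(key)

    def node_down(self):
        self.active_up = False

    def fail_over(self):
        # Promote the replica: it becomes the new active copy.
        self.active = dict(self.replica)
        self.active_up = True


vb = ToyVBucket()
vb.write("a", 1)                    # replicated in time
vb.write("b", 2, replicated=False)  # node dies before replication
vb.node_down()

try:
    vb.read("a")
    failed_before_failover = False
except NodeDownError:
    failed_before_failover = True

vb.fail_over()
print(failed_before_failover, vb.read("a"), vb.read("b"))
```

Before failover every request for this vBucket fails, even for the key that was safely replicated. After failover the replicated key is readable again, while the write that never reached the replica is gone.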



References
https://forums.couchbase.com/t/what-happens-when-a-node-in-the-cluster-goes-down/36
https://blog.jtclark.ca/2014/07/simple-recovery-of-a-couchbase-cluster-when-one-or-all-nodes-fail/
