Saturday, January 21, 2017

Scaling your Data Storage

The traditional approach to storing data is to keep it all in a single data source and handle every read and write operation centrally. As the load on your system grows, you keep adding memory and/or processing power (or buy a bigger, more powerful machine). This works in most cases, but eventually it stops scaling: upgrading hardware doesn't always help you achieve scalability.

The obvious solution to this problem is to start partitioning your data.
This post covers the possible partitioning approaches.

Let's take the example of an online system that stores user profile details and photographs (along with other details, of course).

Dividing Schema / Vertical Partitioning

One way to partition data is to store profile details on one server and photographs on another. Each server then receives only the read and write queries for the data it holds, which spreads the load and makes scaling possible. This is known as vertical partitioning, and it essentially divides your schema. The trade-off is that to fetch the complete data for a given user, you need to query two different data sources.
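To make the split concrete, here is a minimal sketch of vertical partitioning. It assumes two in-memory dictionaries standing in for the two servers; the names profile_store, photo_store, save_user and fetch_complete_user are made up for illustration and do not come from any particular database.

```python
# Two in-memory dicts stand in for two separate servers (an assumption
# made for illustration; a real system would use two database connections).
profile_store = {}  # "server A": holds profile details only
photo_store = {}    # "server B": holds photographs only


def save_user(user_id, profile, photo):
    """Write each part of the user record to the store that owns it."""
    profile_store[user_id] = profile
    photo_store[user_id] = photo


def fetch_profile(user_id):
    """A profile-only query touches just one store."""
    return profile_store[user_id]


def fetch_complete_user(user_id):
    """Complete user data needs one lookup per store."""
    return {
        "profile": profile_store[user_id],
        "photo": photo_store[user_id],
    }


save_user(42, {"name": "Asha", "city": "Pune"}, b"<jpeg bytes>")
complete = fetch_complete_user(42)
```

Note that fetch_complete_user makes two lookups, one per store, which is exactly the cost described above.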

Replicating Schema / Sharding / Scale Out

In this approach, you replicate the schema across instances and then decide which data goes where. All instances therefore have exactly the same schema, but each holds a different subset of the users. Both the profile details and the photograph of a given user live in a single instance.
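One common way to "decide which data goes where" is to hash the user ID and take it modulo the number of shards. The sketch below assumes that routing rule (the post itself does not prescribe one) and uses in-memory dictionaries as stand-ins for shard instances; NUM_SHARDS, shard_for, save_user and load_user are illustrative names.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for this example

# Each dict stands in for one database instance with the same schema.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def save_user(user_id, profile, photo):
    """Profile and photo are written together to the user's shard."""
    shards[shard_for(user_id)][user_id] = {"profile": profile, "photo": photo}


def load_user(user_id):
    """A single lookup on a single shard returns the complete record."""
    return shards[shard_for(user_id)].get(user_id)


save_user("alice", {"name": "Alice"}, b"<jpeg bytes>")
record = load_user("alice")
```

A stable hash such as SHA-256 is used here instead of Python's built-in hash(), because hash() on strings varies between interpreter runs. Simple modulo routing also has a known drawback: changing NUM_SHARDS remaps most users to a different shard.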

Advantages:
  • If an instance goes down, only some of the users are affected, so the system as a whole stays available (high availability).
  • There is no master-slave setup, so writes can be performed in parallel across shards. Writes are a major bottleneck for many systems.
  • We can load balance our web servers and access shard instances over different paths, which leads to faster queries.
  • Data that is used together is stored together. This leads to denormalization, but keep in mind that the data is still separated logically (profile and photographs are still stored as logically separate items). There are no complex joins, and data is read and written in one operation.
  • Splitting data into smaller shards helps performance, since more of each shard's data can remain in cache. It also makes data easier to manage, with faster backup and restore.
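
The first advantage in the list can be illustrated with a small simulation. This is only a sketch under assumptions: the Shard class, the ShardDown exception, the three-shard setup and the hash-based routing are all hypothetical, and in-memory dictionaries replace real database instances.

```python
import hashlib


class ShardDown(Exception):
    """Raised when a request reaches a shard that is offline."""


class Shard:
    """In-memory stand-in for one shard instance."""

    def __init__(self):
        self.rows = {}
        self.up = True

    def put(self, user_id, record):
        if not self.up:
            raise ShardDown()
        self.rows[user_id] = record

    def get(self, user_id):
        if not self.up:
            raise ShardDown()
        return self.rows[user_id]


shards = [Shard() for _ in range(3)]


def shard_index(user_id):
    """Route a user to a shard with a stable hash (assumed routing rule)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(shards)


users = [f"user-{n}" for n in range(30)]
for user_id in users:
    record = {"profile": {"id": user_id}, "photo": b""}
    shards[shard_index(user_id)].put(user_id, record)

# Simulate an outage of one instance.
shards[0].up = False

reachable, affected = [], []
for user_id in users:
    try:
        shards[shard_index(user_id)].get(user_id)
        reachable.append(user_id)
    except ShardDown:
        affected.append(user_id)
```

After the simulated outage, only the users routed to the failed shard end up in affected; everyone else is still served from the remaining shards.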

