Thursday, December 21, 2017

PUT vs POST for Modifying a Resource

The debate around PUT vs POST for updating a resource is quite common; I have had my share of it as well. The debate is NOT unnecessary, as the difference is very subtle. One simple line of defence many people use is that if the update is IDEMPOTENT we should use PUT, otherwise we can use POST. This explanation is correct to a good extent, provided we clearly understand whether a request is truly idempotent or not.

Also, a lot of the content available online adds to the confusion. So, I tried to see what the originators of the REST architectural style themselves say. This post might again be opinionated, or might have missed a few important aspects. I have tried to be as objective as possible. Feel free to post your comments/opinions :)

Updating a Resource

For our understanding, let's assume we are dealing with an account resource which has three attributes: firstName, lastName and status.

Updating Status field:
Advocates of PUT consider the request below to be IDEMPOTENT.

HTTP 1.1 PUT /account/a-123

In reality, the request above is NOT a proper idempotent PUT, because it updates only part of the document, while PUT is meant to replace the whole resource. To make it one, you need to send all the attributes. So that line of defence is NOT perfect.

HTTP 1.1 PUT /account/a-123

The article below states very clearly that if you want to use PUT to update a resource, you must send all attributes of the resource, whereas you can use POST for either a partial or a full update.

So, you can use POST for either a full or a partial update (until PATCH support becomes universal).
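
To make the difference concrete, here is a minimal sketch in Python. It is not a real HTTP server: the in-memory `accounts` dictionary, the sample values and the function names `put` and `partial_update` are all assumptions made for illustration. It only models the two update styles discussed above: a PUT that replaces the whole document, and a partial update that touches only the attributes sent.

```python
import copy

# Hypothetical in-memory store standing in for the server-side account resource.
accounts = {"a-123": {"firstName": "Jane", "lastName": "Doe", "status": "ACTIVE"}}


def put(store, resource_id, document):
    """PUT semantics: the stored resource is replaced by the request body."""
    store[resource_id] = copy.deepcopy(document)
    return store[resource_id]


def partial_update(store, resource_id, changes):
    """Partial update (POST here, or PATCH): only the supplied attributes change."""
    store[resource_id].update(changes)
    return store[resource_id]


# A full document sent twice leaves the resource in the same state both times.
full_doc = {"firstName": "Jane", "lastName": "Doe", "status": "SUSPENDED"}
after_first = copy.deepcopy(put(accounts, "a-123", full_doc))
after_second = copy.deepcopy(put(accounts, "a-123", full_doc))
print(after_first == after_second)

# Sending only the status under replace semantics drops the other attributes.
print(put(accounts, "a-123", {"status": "ACTIVE"}))

# The same body sent as a partial update keeps firstName and lastName intact.
accounts["a-123"] = copy.deepcopy(full_doc)
print(partial_update(accounts, "a-123", {"status": "ACTIVE"}))
```

The point of the sketch is that a status-only body is safe only when the server treats it as a partial update; under replace semantics the same body silently loses firstName and lastName.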

What the originator of REST style says

The master himself suggests that you can use POST if you are modifying part of the resource.

My Final Recommendation

Prefer POST if you are doing a partial update of the resource. If you are doing a full update of the resource, use PUT if the request is IDEMPOTENT, else use POST.

Thursday, December 14, 2017

Build Tools in Java World

Build tools in the Java world (or the JVM ecosystem) have evolved over a period of time. Each successive build tool has tried to solve some of the pains of the previous one. But before going further into the tools, let’s start with the basic features of standard build tools.

Dependency Management
Each project requires some external libraries for its build to succeed. These incoming files/jars/libraries are called the dependencies of the project. Managing dependencies in a centralized manner is a de facto feature of modern build tools. The output artifact of the project also gets published and is then managed by dependency management. Apache IVY and Maven are the two most popular tools which support dependency management.

Build By Convention
A build script needs to be as simple and compact as possible. Imagine specifying each and every action which needs to be performed during the build (compile all files from the src directory, copy them to an output directory, create the jar file and so on); this will definitely make the script huge, and managing and evolving it becomes a daunting task. So, modern build tools use conventions: for example, by default (and this can be configured as well) the tool knows that source files are in the src directory. This minimizes the number of lines in the build file, and hence it becomes easier to write and manage build scripts.
So any standard build tool should have the above two features as de facto. Below is a list of tools which have these features.

  • ANT + IVY (Apache IVY for dependency management)
  • MAVEN
  • GRADLE

I have listed only the most popular build tools above. ANT doesn’t have dependency management by default, but the other two have native support for it. The Java world is basically divided between MAVEN and GRADLE, so I have focused on these two tools below.

Maven vs Gradle

  • MAVEN uses XML to write the build script, whereas GRADLE uses a DSL based on Groovy (one of the JVM languages). A GRADLE build script tends to be shorter and cleaner than a MAVEN build script.
  • A GRADLE build script is written in Groovy (and can also be extended using Java). This definitely gives more flexibility to customize the build process, since Groovy is a real programming language (unlike XML). Also, GRADLE doesn’t force you to always follow the convention; it can be overridden.
  • GRADLE has first-class support for multi-project builds, whereas the multi-project build of MAVEN is broken. GRADLE supports dependency management natively using the Apache open source project IVY (an excellent dependency management tool). Dependency management in GRADLE is better than in MAVEN.
  • MAVEN is quite a popular tool, so it has a wide community, and Java developers have been using it for a while; GRADLE, on the other hand, is quite new, so there will be a learning curve for developers.
  • Both are plugin based, and with GRADLE being the newer one, finding a plugin might be difficult for GRADLE. But adoption of GRADLE is growing at a good pace; Google supports GRADLE for Android. Integration of GRADLE with servers, IDEs and CI tools is not as extensive as that of MAVEN (as of now).


    Most of the cons for GRADLE are mainly because it’s the new kid on the block. Other than this, everything else about GRADLE looks quite impressive. It scores better on both core features, i.e. Dependency Management and Build by Convention. IMO, configuring the build through a programming language is going to be more seamless once we overcome the initial learning curve.
    Also, considering we are going down the microservices path, we will have the option and flexibility to experiment with the build tool as well (along with language/framework).


Tuesday, December 5, 2017

How AWS structures its Infrastructure

This post talks about how AWS structures its global infrastructure. 

The most basic unit of AWS infrastructure is the data center. A single data center houses several thousand servers. AWS core applications are deployed in an N+1 configuration to ensure smooth functioning in the event of a data center failure.

AWS data centers are organized into Availability Zones (AZs). One data center (DC) can be part of only one AZ. Each AZ is designed as an independent failure zone for fault isolation. AZs are interconnected with high-speed private links.

Two or more AZs form a Region. As of now (Dec '17) AWS has 16 regions across the globe. Communication among regions uses public infrastructure (i.e. the internet), so use appropriate encryption methods to protect sensitive data. Data stored in a specific region is not automatically replicated to other regions.

AWS also has 60+ global Edge Locations. Edge locations help lower latency and improve performance for end users. They are helpful for services like Route 53 and CloudFront.

Guidelines for designing

  • Design your system to survive temporary or prolonged failure of an Availability Zone. This brings resiliency to your system in case of natural disasters or system failures. 
  • AWS recommends replicating across AZs for resiliency. 
  • When you put data in a specific region, it's your job to move it to other regions if required.
  • AWS products and services are available by region, so a given service may not be available in your region.
  • Choose your region appropriately to reduce latency for your end-users. 

Saturday, October 21, 2017

How Kafka Achieves Durability

Durability is a guarantee that once the Kafka broker confirms that the data is written, it will be permanent. Databases implement it by storing the data in non-volatile storage. Kafka doesn't follow the DB approach!

Short Answer
The short answer is that Kafka doesn't rely on physical storage (i.e. the file system) as the criterion that a message write is complete. It relies on the replicas.

Long Answer
When a message arrives at the broker, it is first written to the in-memory copy of the leader replica. The broker then has the following things to do before considering the write successful.
Assume that the replication factor is > 1.

1. Persist the message in the file system of the partition leader.
2. Replicate the message to all the ISRs (in-sync replicas).

In an ideal scenario, both of the above are important and should be done, irrespective of order. But the real question is, when does Kafka consider the message write complete? To answer this, let's try to answer the question below:

If a consumer asks for message 4, which just got persisted on the leader, will the leader return the data? The answer is NO!

It's interesting to note that not all data that exists on the leader is available for clients to read. Clients can read only those messages that were written to all in-sync replicas. The leader knows which messages were replicated to which replica, so until a message is replicated it will not be returned to the client. An attempt to read such messages will result in an empty response.

So, now it's obvious that just writing the message to the leader (including persisting it to the file system) is hardly of any use. Kafka considers a message written only once it is replicated to all in-sync replicas.
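
The rule above can be sketched as a toy model in Python. This is not Kafka's implementation or API: the class `LeaderPartition`, its method names and the replica ids are assumptions made for illustration. It only shows the idea that the leader tracks how far each in-sync replica has caught up and serves reads only below that point (Kafka calls this point the high watermark).

```python
class LeaderPartition:
    """Toy model of a partition leader; names here are illustrative only."""

    def __init__(self, follower_ids):
        self.log = []  # messages appended on the leader
        # Next offset each in-sync follower still has to copy.
        self.follower_offsets = {fid: 0 for fid in follower_ids}

    def append(self, message):
        """Write to the leader's log and return the offset of the new message."""
        self.log.append(message)
        return len(self.log) - 1

    def follower_fetched(self, follower_id, up_to_offset):
        """Record that a follower has copied everything up to and including up_to_offset."""
        current = self.follower_offsets[follower_id]
        self.follower_offsets[follower_id] = max(current, up_to_offset + 1)

    def high_watermark(self):
        """First offset that is NOT yet present on every in-sync replica."""
        return min([len(self.log)] + list(self.follower_offsets.values()))

    def read(self, offset):
        """Consumers only see messages below the high watermark."""
        if 0 <= offset < self.high_watermark():
            return [self.log[offset]]
        return []  # not replicated everywhere yet: empty response


leader = LeaderPartition(["replica-1", "replica-2"])
offset = leader.append("order-created")
print(leader.read(offset))  # empty while the followers lag behind

leader.follower_fetched("replica-1", offset)
print(leader.read(offset))  # still empty: replica-2 has not caught up

leader.follower_fetched("replica-2", offset)
print(leader.read(offset))  # now the message is visible to consumers
```

Note that the sketch ignores replicas falling out of the in-sync set, which in a real cluster changes which followers the leader waits for.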

~Happy replication!

Saturday, October 7, 2017

My favourite fizz-buzz problem for Senior Programmers

In this post, I will be discussing one of my favourite fizz-buzz problems for senior programmers/engineers.

Find Kth largest element from a list of 1 million integers. 


Find Kth largest element at any given point of time from a stream of integers, count is not known. 

This problem is interesting as it has multiple approaches and it checks the fundamentals of algorithms and data structures. Quite often, candidates start by asking questions like: is the list sorted? Or, can I sort the list? In such cases, I check which sorting algorithm the candidate proposes. This gives me an opportunity to start a conversation around the complexity of the approach (particularly, time complexity). Most candidates are quick to point out algorithms (like Quick Sort or Merge Sort) which take O(NlogN) to sort a list. This is the right time to ask why you need to sort the complete array/list if you just need to find the 100th or Kth largest/smallest element. Now the conversation usually goes in one of two directions:
  1. Candidates sometimes suggest that sorting is the quicker way to solve this problem, missing the complexity aspect altogether. If someone doesn't even realize that sorting is not the right way to handle this problem, then it is kind of a red signal for me.
  2. At times candidates acknowledge the inefficiency of the sorting approach and then start looking for a better one. I suggest that candidates think out loud, which gives me insight into their thought process and how they are approaching the problem. When I see them not moving ahead, I nudge them towards optimizing the Quick Sort approach: Is there any way to cut the problem size in half in every iteration? Can you use divide and conquer to improve on your O(NlogN) complexity?
This problem can be solved by Quick Select as well as by using a Heap data structure. It also has a brute force approach (i.e. run a loop k times; in each iteration find the maximum number lower than the last one).
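
For reference, here is a minimal sketch of the two better approaches in Python. The function and class names are my own choices for illustration, and duplicates are counted as separate elements (so in 9, 9, 7 the 2nd largest is 9); adjust if the interviewer defines it differently. Quick Select handles the fixed list in O(N) on average, and a min-heap of size k handles the stream in O(log k) per element.

```python
import heapq
import random


def kth_largest_quickselect(values, k):
    """Quick Select: keep only the side of the pivot that contains the answer."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        larger = [v for v in items if v > pivot]
        equal_count = sum(1 for v in items if v == pivot)
        if k <= len(larger):
            items = larger  # answer is among the values above the pivot
        elif k <= len(larger) + equal_count:
            return pivot  # the pivot itself is the Kth largest
        else:
            k -= len(larger) + equal_count
            items = [v for v in items if v < pivot]


class KthLargestStream:
    """Min-heap holding the k largest values seen so far; its root is the Kth largest."""

    def __init__(self, k):
        self.k = k
        self.heap = []

    def add(self, value):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, value)
        elif value > self.heap[0]:
            heapq.heapreplace(self.heap, value)  # evict the smallest of the top k

    def kth_largest(self):
        # None until at least k values have arrived.
        return self.heap[0] if len(self.heap) == self.k else None


numbers = [7, 2, 9, 4, 9, 1]
print(kth_largest_quickselect(numbers, 3))

stream = KthLargestStream(3)
for n in numbers:
    stream.add(n)
print(stream.kth_largest())
```

The heap version never stores more than k values, which is what makes it suitable when the total count is not known in advance.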

If the candidate doesn't make much progress, I try to simplify the problem by asking them to find the 3rd or 2nd largest element. I have seen some senior programmers fail to solve this trivial version as well. This is a clear Reject sign for me.

Also, sometimes I don't even ask the candidate to code. I use this problem just to get an idea, and I skip the coding part if I see a programmer sitting right across from me :)

-Happy problem solving !