Sunday, March 25, 2018

Designing REST URIs to support multiple content types

Resources can be represented in multiple formats - JSON, XML, Atom, binary formats like PNG, plain text, and even proprietary formats. When a client requests a resource, the REST service transfers the state of that resource (not the resource itself) in the appropriate format.

Assume you are designing a RESTful interface that provides metadata for cars, and your service is consumed by many clients - some traditional enterprises as well as a few startups. Each of them has its own requirement for the response format. Let's see the available options.

Approach 1: One URI per representation

GET /cars
GET /cars.xml

The first URI is the default representation of the resource, and the second one returns the response in XML format. The URIs are different, so there will be different handlers (endpoints), and hence the response can easily be returned in the appropriate format.

Approach 2: Use a URI parameter

GET /cars?format=xml

This approach is easy to read and understand.

Approach 3: A single URI for all representations

This approach comes from the fact that if the client is essentially asking for the same resource, why do we need different URIs? Remember, REST uses HTTP, so we can leverage the HTTP Accept header to get different representations of the same resource. This process of selecting the best representation for a given resource is termed Content Negotiation.

Content Types
HTTP uses Internet media types (originally known as MIME types) in the Content-Type and Accept header fields. Internet media types are divided into five top-level categories: text, image, audio, video and application. These types are further divided into several subtypes:
  • text/plain : default content type for plain text messages
  • text/html : commonly used type in browsers
  • text/xml, application/xml : formats used for XML exchanges
  • image/gif, image/jpeg, image/png : image types
  • application/json : language-independent, lightweight data-interchange text format
GET /cars/
Accept: application/json

So respect the HTTP headers and everything works out.
This approach can be a bit code-intensive in some frameworks like Django, where you need to dig into the headers and decode what the client wants. Most Java frameworks, however, handle it through annotations.
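To see what such a framework does under the hood, here is a minimal sketch of an Accept-header matcher in Java. The supported types and the matching rules are simplified for illustration (real negotiation also weighs quality values like ;q=0.9); frameworks such as JAX-RS do this for you via annotations like @Produces.

```java
import java.util.Arrays;
import java.util.List;

public class ContentNegotiation {

    // Media types our hypothetical /cars endpoint can produce, in preference order.
    static final List<String> SUPPORTED = Arrays.asList("application/json", "application/xml");

    // Returns the first supported type the client accepts, or null if none match.
    static String negotiate(String acceptHeader) {
        for (String part : acceptHeader.split(",")) {
            String type = part.split(";")[0].trim();   // drop parameters like ;q=0.9
            if (type.equals("*/*")) return SUPPORTED.get(0);
            if (SUPPORTED.contains(type)) return type;
        }
        return null;   // caller should respond with 406 Not Acceptable
    }

    public static void main(String[] args) {
        System.out.println(negotiate("application/xml;q=0.9, */*;q=0.1")); // application/xml
        System.out.println(negotiate("text/csv"));                         // null -> 406
    }
}
```

A null result maps naturally to an HTTP 406 (Not Acceptable) response.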

Final Note

No matter which approach you use, it would be great if you stick to one across your services. Prefer to be consistent, even if that means not being perfectly right!


Thursday, December 21, 2017

PUT vs POST for Modifying a Resource

The debate around PUT vs POST for resource updates is quite common; I have had my share of it as well. The debate is NOT unnecessary, as the difference is very subtle. One simple line of defence used by many people is that if the update is IDEMPOTENT then we should use PUT, else we can use POST. This explanation is correct to a good extent, provided we clearly understand whether a request is truly idempotent or not.

Also, a lot of the content available online causes confusion. So I tried to see what the originators of the REST architectural style themselves say. This post might again be opinionated, or might have missed a few important aspects; I have tried to be as objective as possible. Feel free to post your comments/opinions :)

Updating a Resource

For our understanding, let's take the case of an account resource which has three attributes: firstName, lastName and status.

Updating the status field:
Advocates of PUT consider the below request (with a body carrying only the changed attribute) to be IDEMPOTENT.

HTTP 1.1 PUT /account/a-123
{ "status": "suspended" }

The reality is that the above request is NOT idempotent, as it updates a partial document. To make it idempotent you need to send all the attributes, as below. So that line of defence is not perfect.

HTTP 1.1 PUT /account/a-123
{ "firstName": "Jane", "lastName": "Doe", "status": "suspended" }

HTTP 1.1 PUT /account/a-123

The HTTP specification (RFC 7231) makes it very clear that if you want to use PUT to update a resource, you must send the complete representation (all attributes) of the resource, whereas you can use POST for either a partial or a full update.

So, you can use POST for either full or partial updates (until PATCH support becomes universal).

What the originator of the REST style says

Roy Fielding himself suggests that we can use POST if we are modifying part of the resource.

My Final Recommendation

Prefer POST if you are doing a partial update of the resource. If you are doing a full update of the resource, use PUT if the request is IDEMPOTENT, else use POST.
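To make the distinction concrete, here is a small sketch using the JDK's java.net.http API (the host, path and JSON bodies are made up for illustration): the PUT carries the complete representation of the account, while the POST sends only the changed field.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PutVsPost {

    // Full replacement: a proper PUT carries every attribute of the resource.
    static HttpRequest fullUpdate() {
        String body = "{\"firstName\":\"Jane\",\"lastName\":\"Doe\",\"status\":\"active\"}";
        return HttpRequest.newBuilder(URI.create("https://api.example.com/account/a-123"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Partial update: POST (or PATCH, where supported) may send only what changed.
    static HttpRequest partialUpdate() {
        String body = "{\"status\":\"suspended\"}";
        return HttpRequest.newBuilder(URI.create("https://api.example.com/account/a-123"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(fullUpdate().method());    // PUT
        System.out.println(partialUpdate().method()); // POST
    }
}
```

Repeating the PUT any number of times leaves the resource in the same state, which is exactly the idempotency guarantee we want from it.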

Thursday, December 14, 2017

Build Tools in Java World

Build tools in Java (or the JVM ecosystem) have evolved over a period of time. Each successive build tool has tried to solve some of the pains of the previous one. But before going further into the tools, let's start with the basic features of a standard build tool.

Dependency Management
Each project requires some external libraries for its build to be successful. These incoming files/jars/libraries are called the dependencies of the project. Managing dependencies in a centralized manner is a de-facto feature of modern build tools. The output artifact of the project also gets published and is then managed by dependency management. Apache Ivy and Maven are the two most popular tools which support dependency management.

Build By Convention
A build script needs to be as simple and compact as possible. Imagine specifying each and every action that needs to be performed during a build (compile all files from the src directory, copy them to the output directory, create the jar file, and so on); this would definitely make the script huge, and managing and evolving it would become a daunting task. So modern build tools use conventions: by default (and configurably), the tool knows that source files live in the src directory. This minimizes the number of lines in the build file, making build scripts easier to write and manage.
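Build by convention can be illustrated with a near-empty Gradle script. The following is a complete, working build.gradle for a conventional project layout (nothing project-specific is assumed beyond the standard directories):

```groovy
// A complete minimal build.gradle: by convention Gradle already knows that
// production sources live in src/main/java and tests in src/test/java,
// so 'gradle build' compiles, tests and packages without further configuration.
apply plugin: 'java'

repositories {
    mavenCentral()
}
```

Everything else - compile tasks, jar packaging, test execution - comes from the convention, not from the script.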
So any standard build tool should have the above two features as de-facto. Below is a list of tools which have these features:

  • ANT + IVY (Apache Ivy for dependency management)
  • MAVEN
  • GRADLE

I have listed only the most popular build tools above. ANT doesn't have dependency management by default, but the other two support it natively. The Java world is basically divided between MAVEN and GRADLE, so below I focus on these two tools.

Maven vs Gradle

  • MAVEN uses XML for its build script whereas GRADLE uses a DSL based on Groovy (one of the JVM languages). GRADLE build scripts tend to be shorter and cleaner than Maven build scripts.
  • A GRADLE build script is written in Groovy (and can also be extended using Java). This definitely gives more flexibility to customize the build process; Groovy is a real programming language (unlike XML). Also, GRADLE doesn't force you to always use the convention - it can be overridden.
  • GRADLE has first-class support for multi-project builds, whereas multi-project builds in MAVEN are a weak point. GRADLE supports dependency management natively, using the Apache open source project Ivy (an excellent dependency management tool). GRADLE's dependency management is better than MAVEN's.
  • MAVEN is a quite popular tool, so it has a wide community and the Java community has been using it for a while; GRADLE, on the other hand, is quite new, so there will be a learning curve for developers.
  • Both are plugin-based, and GRADLE being newer, finding a plugin might be harder for GRADLE. But adoption of GRADLE is growing at a good pace - Google supports GRADLE for Android. Integration of GRADLE with servers, IDEs and CI tools is not yet as extensive as MAVEN's (as of now).
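The verbosity difference in the first bullet is easy to see by declaring a single dependency both ways (Guava is used here purely as an example library):

```xml
<!-- Maven: pom.xml fragment for one dependency -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>
```

```groovy
// Gradle: the same dependency in build.gradle
dependencies {
    implementation 'com.google.guava:guava:31.1-jre'
}
```

One line versus five, and the gap grows with every dependency and plugin the project adds.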


Most of the cons of GRADLE are mainly because it's the new kid on the block. Other than that, everything looks quite impressive about GRADLE. It scores better on both core features, i.e. dependency management and build by convention. IMO, configuring the build through a programming language will be more seamless once we overcome the initial learning curve.
Also, considering we are going down the microservices path, we will have the option and flexibility to experiment with the build tool as well (along with language/framework).


Tuesday, December 5, 2017

How AWS structures its Infrastructure

This post talks about how AWS structures its global infrastructure. 

AWS' most basic unit of infrastructure is the data center. A single data center houses several thousand servers. AWS core applications are deployed in an N+1 configuration to ensure smooth functioning in the event of a data center failure.

AWS data centers are organized into Availability Zones (AZs). One DC is part of exactly one AZ. Each AZ is designed as an independent failure zone for fault isolation. AZs are interconnected with high-speed private links.

Two or more AZs form a Region. As of now (Dec '17), AWS has 16 regions across the globe. Communication among regions uses public infrastructure (i.e. the internet), so use appropriate encryption methods for sensitive data. Data stored in a specific region is not automatically replicated to other regions.

AWS also has 60+ global Edge Locations. Edge locations help lower latency and improve performance for end users. They are used by services like Route 53 and CloudFront.

Guidelines for designing

  • Design your system to survive temporary or prolonged failure of an Availability Zone. This brings resiliency to your system in case of natural disasters or system failures.
  • AWS recommends replicating across AZs for resiliency.
  • When you put data in a specific region, it's your job to replicate it to other regions if you require it elsewhere.
  • AWS products and services are rolled out by region, so you may not see a particular service available in your region.
  • Choose your region appropriately to reduce latency for your end users.

Saturday, October 21, 2017

How Kafka Achieves Durability

Durability is the guarantee that once the Kafka broker confirms that data is written, it will be permanent. Databases implement durability by storing data in non-volatile storage. Kafka doesn't follow the DB approach!

Short Answer
The short answer is that Kafka doesn't rely on physical storage (i.e. the file system) as the criterion for a completed message write. It relies on the replicas.

Long Answer
When a message arrives at the broker, it is first written to the in-memory copy of the leader replica. The broker then has the following things to do before considering the write successful.
Assume that the replication factor is > 1.

1. Persist the message in the file system of the partition leader.
2. Replicate the message to all ISRs (in-sync replicas).

In an ideal scenario, both of the above are important and should be done irrespective of order. But the real question is: when does Kafka consider the message write complete? To answer this, let's try to answer the question below.

If a consumer asks for message 4, which just got persisted on the leader, will the leader return the data? The answer is NO!

It's interesting to note that not all data that exists on the leader is available for clients to read. Clients can read only those messages that have been written to all in-sync replicas. The leader knows which messages have been replicated to which replica, so until a message is fully replicated it will not be returned to the client; attempts to read such messages result in an empty response.

So now it's obvious: just writing the message to the leader (including persisting it to the file system) is hardly of any use on its own. Kafka considers a message written only once it has been replicated to all in-sync replicas.
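The read-visibility rule above can be sketched as a toy model (this is an illustration of the idea, not the Kafka API): each in-sync replica has persisted messages up to some offset, and the leader only serves messages up to the minimum of those offsets (conceptually, Kafka's high watermark).

```java
import java.util.List;

public class HighWatermarkDemo {

    // Each entry is the highest offset that one ISR (leader included) has persisted.
    // A message at offset n is readable only once every ISR has reached n.
    static long highWatermark(List<Long> isrPersistedOffsets) {
        return isrPersistedOffsets.stream().mapToLong(Long::longValue).min().orElse(-1L);
    }

    static boolean readable(long offset, List<Long> isrPersistedOffsets) {
        return offset <= highWatermark(isrPersistedOffsets);
    }

    public static void main(String[] args) {
        // Leader has persisted up to offset 4, but the two followers are at 3.
        List<Long> isr = List.of(4L, 3L, 3L);
        System.out.println(readable(4, isr)); // false: offset 4 not yet replicated
        System.out.println(readable(3, isr)); // true: all ISRs have it
    }
}
```

Once the lagging followers fetch offset 4, the minimum advances and the message becomes visible to consumers.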

~Happy replication!