Wednesday, October 12, 2016

Containerization to achieve Scale

This post talks about containers in general: their evolution and their contribution to scaling systems.

Once upon a time, applications ran on servers configured on bare metal, sitting in companies' own data centers. Provisioning used to take anywhere from a few days to a few weeks. Then came Virtual Machines, which use hardware virtualization to provide isolation. They take minutes to create, as they require significant resources. Then, finally, comes a brand new guy in the race which takes 300 ms to a couple of seconds to bootstrap a new instance; yes, I am talking about containers. They don't use hardware virtualization; they interface directly with the host's Linux kernel.

Managing VMs at scale is not easy. In fact, I find it difficult to manage even a couple of VMs :D So just imagine how difficult it would be for companies like Google and Amazon, which operate at internet scale.

Two features which have been part of the Linux kernel since 2007 are cgroups and namespaces. Engineers at Google started exploring process isolation using these kernel features (to manage and scale their millions of computing units). This eventually resulted in what we know today as containers. Containers are inherently lightweight, and that makes them super flexible and fast. If containers even think of misbehaving, they can easily be replaced by brand new containers, because the cost of doing so is not high at all. This means they need to run in a managed and well-guarded environment. Their small footprint helps in using them for a specific purpose, and they can easily be scheduled and rearranged/load balanced.

So one thing is quite clear: containers are not a brand new product or technology. They use existing features of the OS.

With containers, the old problem of making every component of a system resilient and bulletproof no longer holds. This seems contradictory: we want to make systems more resilient, but containers themselves are very fragile, which means any component deployed in them automatically becomes unreliable.
The answer is to design the system with the assumption that containers are fragile. If any instance fails, just mark it bad and replace it with a new instance. With containers, the real hard problems are not isolation but orchestration and scheduling.

Read more details on Containers vs VMs.

Containers are also described as a jail which guards the inmates to make sure that they behave themselves. Currently, one of the most popular container platforms is Docker. At the same time, there are tools available to manage or orchestrate containers (one of the most popular is Kubernetes from Google).

Happy learning!!!

Wednesday, July 27, 2016

Vert.x Event Loop

The Event Loop is not new and is also not specific to Vert.x (it is used in Node.js as well). I will talk about it here in a top-down fashion.

Vert.x is a library (yes, it's just a jar) which allows you to develop reactive applications on the JVM. A reactive system is defined in the Reactive Manifesto. Vert.x supports this manifesto, so it enables applications written in Vert.x to be Responsive, Elastic, Resilient and Message Driven.

The last point (Message Driven) of the Reactive Manifesto defines the essence of Vert.x: it is event/message driven and non-blocking/asynchronous. This means different components interact with each other asynchronously by passing messages.

//Synchronous and blocking API call - the calling thread waits for the result
Object response = service.blockingCall();   //illustrative call, not a real API

Traditional applications make blocking API calls, so the calling thread waits until the response is available. This means that until the response is available, the thread is sitting idle, doing nothing. This is not a good thing from a resource-utilization point of view. Now, how about making that thread more specialized, so that its job is only to post requests, i.e. it's not going to wait for responses to arrive? The thread will go on doing only one thing till the sky falls. This way the thread will not sit idle (unless there are no requests). Putting it in more generic terms, the thread will be passing on messages or events.

The Event Loop is basically a thread (or a small group of threads; Vert.x matches the count closely to the CPU cores) whose job is to keep passing messages to their specific handlers. The thread picks an event from a queue and then hands the event over to the right handler. The event loop maintains ordering (as it picks events from a queue). Vert.x allows you to configure the number of event-loop threads (one per core of the CPU). Handlers are always called by the same event loop, so there is no need for synchronization.

Event-loop threads are limited and therefore special, so blocking them would be a disaster. The event loop makes calls asynchronously and in a non-blocking manner; once the response arrives, the same event loop invokes the callback.
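The mechanics above can be sketched in plain Java: a minimal, single-threaded event loop that drains a queue and dispatches each event to its handler, in order, always on the same thread. This is only an illustration of the idea (the class and method names are mine), not how Vert.x is actually implemented.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class MiniEventLoop {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread loop;
    private volatile boolean running = true;

    public MiniEventLoop() {
        // The one event-loop thread: pick events from the queue, run handlers in order.
        loop = new Thread(() -> {
            while (running || !queue.isEmpty()) {
                try {
                    Runnable task = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (task != null) task.run();   // handler always runs on this one thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "event-loop");
        loop.start();
    }

    // Post an event with its handler; the caller never blocks waiting for a result.
    public <T> void emit(T event, Consumer<T> handler) {
        queue.add(() -> handler.accept(event));
    }

    // Drain remaining events, then stop the loop thread.
    public void stop() {
        running = false;
        try { loop.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        MiniEventLoop el = new MiniEventLoop();
        StringBuilder out = new StringBuilder();
        el.emit("a", out::append);
        el.emit("b", out::append);
        el.stop();
        System.out.println(out);   // prints "ab": both events handled, in order
    }
}
```

Because every handler runs on the single loop thread, no synchronization is needed inside the handlers, which is exactly the guarantee the prose above describes.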


Friday, July 22, 2016

Pushing a new project to Github

This post talks about pushing a new project to GitHub. Make sure that the project has not already been created on GitHub.

I have illustrated this using a sample application named websockets. I create a project with just one file and then push the project to GitHub. You can run the below commands in a terminal or Git Bash.

$ mkdir websockets
$ cd websockets
$ echo "# websockets" >>

$ git init
Initialized empty Git repository in /Users/siddheshwar/Documents/gitRepo/websockets/.git/

$ git add

$ git commit -m "first commit"
[master (root-commit) 24fac01] first commit
1 file changed, 1 insertion(+)
create mode 100644

$ git remote add origin

$ git push -u origin master
Username for '':
Password for '': 
Counting objects: 3, done.
Writing objects: 100% (3/3), 233 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
 * [new branch]      master -> master

That's it, you are done !!!

Wednesday, July 20, 2016

Gradle - Create Project Structure Automatically

The Gradle init plugin can be used to bootstrap the process of creating a new Java, Groovy or Scala project. This plugin needs to be applied to a project before it can be used. So if we want to create the default directory structure of a Java project, this plugin can be handy (especially if you don't have a Gradle plugin in your IDE).

$ gradle init --type java-library

The init plugin supports multiple types (it's 'java-library' in the above command). Below is the command sequence and the directory contents created after successful execution.

$ mkdir hello-gradle
$ cd hello-gradle/
$ gradle init --type java-library


Total time: 8.766 secs

$ ls -ltr
total 20
drwxrwxr-x. 3 vagrant vagrant   20 Jul 20 06:00 gradle
-rwxrwxr-x. 1 vagrant vagrant 5080 Jul 20 06:00 gradlew
-rw-rw-r--. 1 vagrant vagrant 2404 Jul 20 06:00 gradlew.bat
-rw-rw-r--. 1 vagrant vagrant  643 Jul 20 06:00 settings.gradle
-rw-rw-r--. 1 vagrant vagrant 1212 Jul 20 06:00 build.gradle
drwxrwxr-x. 4 vagrant vagrant   28 Jul 20 06:00 src

The above command also creates the Gradle wrapper files needed to run the build (i.e. gradlew, gradlew.bat). If you don't know the appropriate type for your project, specify any value and Gradle will list the valid types.

$ gradle init --type something
Execution failed for task ':init'.
> The requested build setup type 'something' is not supported. Supported types: 'basic', 'groovy-library', 'java-library', 'pom', 'scala-library'.

So, if you type any random text as the type, Gradle tells you the allowed types.

If you just use $ gradle init, then Gradle tries its best to automatically detect the type. If it fails to identify the type, it applies the basic type.

Importing Gradle Project to Eclipse

Note that the above command created Gradle-specific files along with the default Java directories (like src), but it didn't create Eclipse-specific files. This means that if you try to import the above-created project into Eclipse, it will not work. To fix that, do the below:
  1. Add the eclipse plugin in the Gradle build file (i.e. build.gradle). Put the below line after the java plugin.
          apply plugin: 'eclipse'
  2. Run $ gradle eclipse

This creates the files .classpath, .project and .settings.

Now you are good to import the above project into Eclipse.

Alternatively, you can clone from my github repository

Happy coding !!!

Thursday, July 7, 2016

Microservices Explained

I have been reading about microservices for a while (I must admit, I delayed it thinking it's just old wine in a new bottle), and the more I dive deeper, the more exciting I find it. I am a big fan of the Single Responsibility Principle (SRP), as it helps in putting boundaries on classes (and even on methods). SRP helps in making code simpler and cleaner (of course, other design principles are equally important, but my love for SRP is boundless!). And I always used to wonder: why can't we apply SRP at the service layer? Finally, the microservices gods have heard me!

For a service to be called micro, it should be really small in size, and that is only possible if your service does just one thing (i.e. follows SRP). And it should do that one thing really well. This in turn helps to easily implement, change, build, distribute, and deploy the service. It also helps in creating highly decentralized and scalable systems. I tried looking on the web for a definition of microservices; the one which I found covering all aspects is from Martin Fowler (link).

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

Let's cover some of the aspects of microservices:
  1. Each service (or a small set of them) should have its own data store, so having hundreds of data stores is normal. You can choose between a relational DB, a NoSQL DB, or even an in-memory DB depending on your need. If you have 2 DBs and a dozen services around them, you are not micro yet.
  2. They are organized around business functions. This helps in keeping boundaries separate and clear.
  3. They are loosely coupled and highly cohesive (or, to put it another way, they do one thing really well).
  4. They are usually RESTful (to maintain simplicity): they receive a request, apply logic and produce a response. But they support other interaction styles as well, like RPC, messages, events, and streams.
  5. They are usually asynchronous and use simple message queues. The real intelligence of the system lies at the ends of the queue (i.e. with the services).
  6. The complete pipeline from build to production deployment should be automated (i.e. there should be CI/CD).
  7. In the microservice world, a single monolithic service could be replaced by hundreds of microservices. You should design each microservice keeping the worst in mind; design for failure. A service being down in production is a normal thing, and your system should be able to handle it gracefully.
  8. Microservices put a lot of stress on real-time monitoring metrics, like average response time, or generating a warning if a service is not responding.
  9. In the ideal scenario, an event bus should replace your operational database. Kafka is one of the fastest buses, and it's fast because it's dumb (i.e. it just delivers events).
  10. Microservices make your system/application more distributed, which in turn adds more complexity.
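Point 4 above (receive a request, apply logic, produce a response) can be sketched with the JDK's built-in HTTP server. The service name, route, port and payload here are made up for illustration; a real microservice would sit behind proper deployment and monitoring.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// A toy "balance" microservice: one business capability exposed as one
// plain HTTP resource, sharing no data store with any other service.
public class BalanceService {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/balance", exchange -> {
            // Receive request, apply logic, produce response.
            byte[] body = "{\"balance\": 42}".getBytes();
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080);
        System.out.println("balance service listening on http://localhost:8080/balance");
    }
}
```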

I found this video interesting as it talks about the challenges in implementing microservices, link.


Sunday, February 14, 2016

EJB Good Practices

EJBs abstract your middleware or business-logic layer. They are transactional in nature, so when you hit your persistence layer (mostly through JPA), a transaction is already there for your database session. As a result, either all DB operations complete or none of them do, i.e. an EJB operation is atomic. Let's cover some of the good practices:

Don't create EJB methods for CRUD operations

Imagine creating operations in your EJB for creating, fetching, updating or deleting your entity. It's not going to serve the purpose; quite clearly, CRUD operations are not your business logic!

In fact, CRUD operations will be part of your more sophisticated business operations. Suppose you want to transfer x amount from bank account A to another account B. There should be just a single method which reads the appropriate records from the DB, modifies them and performs the update.
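The transfer example can be sketched in plain Java: one coarse-grained business method instead of exposed CRUD operations. This is my own illustration; in a real EJB the class would be annotated @Stateless, the method would run inside a container-managed transaction, and the in-memory map would be JPA entities.

```java
import java.util.HashMap;
import java.util.Map;

public class AccountManager {
    private final Map<String, Long> accounts = new HashMap<>();  // stands in for JPA entities

    public void open(String id, long initialBalance) { accounts.put(id, initialBalance); }
    public long balance(String id) { return accounts.get(id); }

    // The one business operation: read the records, modify them, update them.
    // The CRUD steps are internal details, not the EJB's public interface.
    public void transfer(String from, String to, long amount) {
        long source = accounts.get(from);
        if (source < amount) throw new IllegalStateException("insufficient funds");
        accounts.put(from, source - amount);
        accounts.put(to, accounts.get(to) + amount);
    }
}
```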

Also, creating CRUD operations gives the impression that an EJB should be created for each entity. Instead, we should create an EJB for a group of related problems, like account management or policy management. You can abstract your CRUD operations in a Data Access Layer, though!

Minimise Injecting EJBs in Presentation layer

Imagine yourself working in the presentation layer (Servlets, web services, ...) and having to deal with multiple EJBs to delegate calls to. You are going to struggle to find the appropriate EJB, and then the right method, for delegating the incoming calls. Especially when someone else is taking care of the business layer!

This defeats the separation-of-concerns principle, which is important for managing a complex distributed system. So what's the solution? Bundle related EJBs in a single (super) EJB and inject this super EJB.

But make sure that, in doing so, you are not putting unrelated EJBs together just for the sake of minimizing the number of EJBs. Each EJB (including the super one) should adhere to the single responsibility principle.

Reusing EJB methods

Suppose you have quite complex use cases which have resulted in a big EJB class definition. The obvious question is: how do you achieve reusability with EJB methods?

Just like with normal Java classes, you can create helper EJBs with reusable methods. Multiple EJBs can use the services provided by such a helper EJB. And to make these helper utilities easy to find, you can put them in a utility or helper package inside your main EJB package.


I would love to hear your suggestions/feedback about this post.

Sunday, January 24, 2016

The Cost of Concurrency

Concurrency is not free!

Modern libraries provide wonderful abstractions for programmers, so doing certain tasks concurrently or asynchronously is quite trivial. It is as simple as instantiating an object and calling a few methods on it, and you are done! These libraries are abstracted in such a way that they don't even remind programmers that they are dealing with threads. And this is where a lazy programmer can take things for granted.

You need to process 100 tasks? Create 50 threads.

     Collection<Task> tasks = fetchTasks();   //from somewhere
     int numberOfThreads = 50;
     obj.executeConcurrently(tasks, numberOfThreads);   //illustrative API

In the object-oriented world, all it takes is a method call.

To understand the cost of concurrency, let's take a step back and ask ourselves how it is implemented. It is implemented through locks. Locks provide mutual exclusion and ensure that changes become visible in an ordered manner.

Locks are expensive because they require arbitration when contended. This arbitration is achieved by a context switch at the OS level, which suspends threads waiting for the lock until it is released. A context switch can cause a performance penalty, because the OS might decide to do some other housekeeping job and so lose the cached instructions and data. In the worst case, this can cause latency equivalent to that of an I/O operation.

Another aspect of concurrency is managing the life cycle of threads. The OS does the dirty job of creating and managing threads on behalf of your platform (or runtime environment). There are limits on the number of threads which can be created at the system level. So proper thought should definitely be given to how many threads are required to accomplish a job.
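A bounded thread pool is the usual way to keep the thread count deliberate instead of creating one thread per task. A minimal sketch using the JDK's ExecutorService (the class name, task and pool size are mine, for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolDemo {
    // Process taskCount trivial tasks on a pool of at most poolSize threads.
    public static int processAll(int taskCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);  // bounded thread count
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> { processed.incrementAndGet(); });  // each "task" just counts itself
        }
        pool.shutdown();   // accept no new tasks, let the queued ones finish
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // 100 tasks, but only 4 threads ever exist; the rest queue up.
        System.out.println(processAll(100, 4));   // prints 100
    }
}
```

The point of the fixed pool is exactly the trade-off discussed above: thread creation and context-switch costs are paid a bounded number of times, no matter how many tasks arrive.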

Don't blindly decide to execute tasks concurrently!

Friday, January 22, 2016

Concurrency or Thread Model of Java

The thread model in Java is built around shared memory and locking!

Concurrency basically means two or more tasks happening in parallel and competing to access a resource. In the object-oriented world the resource is an object, which could abstract a database, file, socket, network connection, etc. In a concurrent environment, multiple threads try to get hold of the same resource. Locks are used to ensure consistency wherever there is a possibility of concurrent execution of a piece of code. Let's cover these aspects briefly:

Concurrent Execution is about

  1. Mutual Exclusion or Mutex
  2. Memory Consistency or Visibility of change

Mutual Exclusion
Mutual exclusion ensures that at a given point of time only one thread can modify a resource.  If you write an algorithm which guarantees that a given resource can be modified by (only) one thread then mutual exclusion is not required.  

Visibility of Change
This is about propagating changes to all threads. If a thread modifies the value of a resource and (right after that) another thread wants to read it, the thread model should ensure that the reading thread/task gets the updated value.

The most costly operation in a concurrent environment is contended write access. Write access to a resource by multiple threads requires expensive and complex coordination. Both reads and writes require that all changes are made visible to other threads.


Locks provide mutual exclusion and ensure that visibility of change is guaranteed (Java implements locks using the synchronized keyword, which can be applied to a code block or a method).
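A minimal illustration (my own example) of what synchronized buys you: mutual exclusion, so increments cannot interleave and get lost, and visibility, so a reader sees every increment that completed before it acquired the lock.

```java
public class SafeCounter {
    private long count = 0;

    // Only one thread can be inside a synchronized method of this object
    // at a time (mutual exclusion), and releasing the lock publishes the
    // updated count to the next thread that acquires it (visibility).
    public synchronized void increment() { count++; }
    public synchronized long get() { return count; }
}
```

Without synchronized, `count++` is a read-modify-write that two threads can interleave, and a plain `long` field gives no guarantee that other threads ever see the latest value.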

Read about cost of locks, here

Wednesday, January 20, 2016

Preventing logging overhead in Log4J

Have you seen something like the below in your application, and ever wondered about the overhead (due to the if condition)? This post covers ways to get rid of that overhead with Log4j 2.x and Java 8.

   if (log.isDebugEnabled()) {
       log.debug("Person=" + person);
   }

The above style is quite common with Log4j 1.x; though it adds a few extra lines, it improves performance by skipping message construction when debug logging is disabled.

The log call below invokes the toString method on person even if the message is not going to be logged.
log.debug(person);  //big NO; avoid this !

So how do we remove the overhead of the if check?

The if condition is an overhead, and it's going to appear in multiple places in a method/class. Also, if you don't do logging judiciously, it can spread all over.

log4j 2.x
Log4j 2.x came out after a long gap, and this particular issue has been addressed. Inspired by SLF4J, it has added parameterized logging.

log.debug("{}", person);              //will not call the toString method on person unless logged
log.debug("{}", person.toString());   //this is still going to be an issue

So Log4j 2.x's parameterized logging will not call the implicit toString method on person; but if you call it explicitly, the call is still made. So Log4j 2 has only partially addressed the problem.

log4j 2.4 + Java 8 lambda
Log4j 2.4 supports a lambda-based mechanism which solves the problem completely. It doesn't call the method (implicit or explicit) at all if the statement being logged is at a level less specific than the current level.

log.debug(() -> person.toString());            //all good
log.debug("{}", () -> expensiveOperation());   //all good

Final Note:
Be mindful of logging overhead when you do code review!


Saturday, January 16, 2016

Extracting root cause from Stack Trace

Don't tell me problem; show me the logs first!

Whether you are a fresh graduate, an experienced programmer, a QA engineer, a production engineer or even a product manager, a good understanding of stack traces is vital for cracking critical issues. Your ability to find the real culprit in a lengthy stack trace will be instrumental in resolving a problem. This is even more important if you work on a distributed system where you use many libraries, so the stack trace is long and not well structured. Let's start with a simple scenario:

Scenario 1: Simple

This is the most trivial case: an exception gets thrown by a method of your project and, during the call, it doesn't go outside your code base. It is the most common scenario you will encounter, and it is very important for understanding how a stack trace gets printed.

Shown below is an Eclipse screenshot with the two classes. Right-click and run the program.
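In case the screenshot does not render here, below is a minimal stand-in with the same shape (my reconstruction; the line numbers in the decoded list refer to the original screenshot, not to this code):

```java
class MyService {
    void three() { four(); }
    void four() { throw new RuntimeException("here, comes the exception!"); }
}

public class MyController {
    void one() { two(); }
    void two() { new MyService().three(); }

    public static void main(String[] args) {
        // The printed trace shows the deepest frame (MyService.four) first,
        // then each caller in turn, ending at main.
        new MyController().one();
    }
}
```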

Let's Decode Above stack trace:
  • RuntimeException is shown in line number 29, in method MyService.four()
  • Method MyService.four() gets called by MyService.three() in line number 25
  • Method MyService.three() gets called by MyController.two() in line 11
  • Method MyController.two() gets called by in line 6
  • Method gets called by MyController.main() in line 17

The first frame of the stack trace holds all the important information required to find the root cause.
Be mindful of the very important line numbers!

Scenario 2: Chain Of Exception

Let's modify the above code a bit by catching the exception at its origin and then throwing a brand new exception.
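Again, in case the screenshot does not render, here is a minimal reconstruction of the idea (class and method names are illustrative): the original exception is caught and wrapped in a new one, and the wrapped original surfaces as the Caused by section.

```java
public class ChainDemo {
    static void four() {
        throw new RuntimeException("here, comes the exception!");
    }

    static void three() {
        try {
            four();
        } catch (RuntimeException e) {
            // Wrap the original exception; it will appear as "Caused by:" in the trace.
            throw new IllegalStateException("service call failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            three();
        } catch (IllegalStateException e) {
            // Top section: IllegalStateException; below it, Caused by: RuntimeException.
            e.printStackTrace();
        }
    }
}
```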

Let's Decode Above stack trace:

This stack trace has a Caused by section. There is only one Caused by here, but in real applications you can have multiple Caused by sections. The last Caused by will contain the root cause of the exception.

Caused by: java.lang.RuntimeException: here, comes the exception!
at MyService.four(

... 4 more

But if you are using external jars or libraries, finding the root cause can be a bit tricky, as the real reason might be nested deep inside. In such cases you should look for the Class.method names which belong to your application. Also, you should read the complete stack trace carefully, as the real root cause could lie in any part of it.