Sunday, December 18, 2016

Git branching strategy

This post explores different Git branching strategies for managing a code base effectively. There are primarily two strategies most teams use - Git Flow and GitHub Flow. Let's cover them in detail.

Git Flow

Git Flow is organized around releases: you have several branches, each specializing in a specific stage of the application's life cycle. The branches are named master, develop, feature, release and hotfix.

You create a develop branch from the master branch; all developers use it for feature development and bug fixes for a specific release. When a developer needs to add a new feature, they create a separate branch from develop so that both branches can evolve independently of each other. When the feature is fully implemented, that branch gets merged back into develop. The idea is to keep develop stable and working, so a feature branch is merged back only when it's complete. Once all the features for a release are implemented, develop is branched into a release branch, where formal testing of the release commences. The release branch should be used only for bug fixes, not development. Finally, when the release is ready for deployment, it is merged into master and tagged there, so master carries a tag for every release.
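The whole cycle can be walked through in a throwaway repo with plain Git commands (branch and tag names like feature/login, release/1.0 and v1.0 are just examples):

```shell
# A throwaway walk-through of the Git Flow cycle described above.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master        # ensure the initial branch is named master
git commit -q --allow-empty -m "initial commit"

git branch develop                             # long-lived integration branch
git checkout -q -b feature/login develop       # feature branch off develop
git commit -q --allow-empty -m "add login feature"
git checkout -q develop
git merge -q --no-ff -m "merge feature/login" feature/login   # feature complete

git checkout -q -b release/1.0 develop         # stabilize; bug fixes only here
git checkout -q master
git merge -q --no-ff -m "release 1.0" release/1.0
git tag v1.0                                   # every release is tagged on master

git branch --list
```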


http://nvie.com/posts/a-successful-git-branching-model/
This flow is not ideal for projects that release often. It works well when releases happen roughly every month or two.

GitHub Flow


GitHub has popularized its own flow, which is quite effective for open source projects and other projects that don't have long release cycles (with this flow, you could release every day). It's based on the idea of features.

https://guides.github.com/introduction/flow/

It's a lightweight version of Git Flow that removes the overhead of maintaining so many branches (there are just feature branches apart from master). When you want to work on a feature or bug, you start a separate branch from master. Once the feature/bug is implemented, it gets merged back into the master branch. Feature branches are given quite descriptive names like oauth-support or bug#123. Developers can raise a pull request to collaborate with co-workers or get a code review on the feature branch. Once the code is reviewed and signed off, the feature branch is merged back into master. You can have as many feature branches as you wish, and once a feature branch is merged it can be deleted (if you wish), as master retains the full commit history.


GitHub flow assumes that every time you merge changes into master, you are ready for a production deployment.
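A typical cycle looks like this in a throwaway repo (the branch name oauth-support is just an example; the push/pull-request steps are comments since they need a remote):

```shell
# A throwaway feature-branch cycle in the GitHub Flow style.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"

git checkout -q -b oauth-support               # descriptive branch off master
git commit -q --allow-empty -m "add oauth support"
# git push -u origin oauth-support  ... open a pull request, review, sign off ...

git checkout -q master
git merge -q --no-ff -m "merge oauth-support" oauth-support   # ready to deploy
git branch -q -d oauth-support                 # safe to delete; history is on master
```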

There is a third branching strategy as well: GitLab Flow (see the reference below).


References:
https://about.gitlab.com/2014/09/29/gitlab-flow/

--happy branching !!!

Tuesday, December 6, 2016

Show Directory Tree structure in Unix

Graphics-based operating systems let you view the complete directory structure graphically. But in console-based Linux/Unix operating systems this is not possible by default. At most, you can go inside each directory and run ls - or, if you are a scripting geek, you can write a script to do the same for you.

Recently, I came across a directory listing program, tree, which makes this possible in Unix/Linux from the terminal.

Installation

CentOS/RHEL: $sudo yum install tree -y

Ubuntu: $sudo apt-get install tree

Using Tree

Use tree followed by the directory name. That's it!

$tree cookbooks/
cookbooks/
`-- learn_chef_httpd
    |-- Berksfile
    |-- chefignore
    |-- metadata.rb
    |-- README.md
    |-- recipes
    |   `-- default.rb
    |-- spec
    |   |-- spec_helper.rb
    |   `-- unit
    |       `-- recipes
    |           `-- default_spec.rb
    `-- test
        `-- recipes
            `-- default_test.rb

7 directories, 8 files

Saturday, November 19, 2016

How Couchbase identifies Node for Data Access

This post talks about how Couchbase identifies where exactly a document is stored, to facilitate quick reads and updates.

Couchbase uses the term bucket, equivalent to the term database in the relational world, to logically partition all documents across the cluster (or data nodes). Being a distributed DB, it tries to evenly distribute, partition (or shard) data into virtual buckets known as vBuckets. Each vBucket owns a subset of the keys or document IDs (and, of course, the corresponding data as well). A document is mapped to a vBucket by applying a hash function to its key. Once the vBucket is identified, a separate lookup table tells which node hosts that vBucket. This table, mapping virtual buckets to nodes, is known as the vBucket map. (Note: the cluster map contains the mapping of which service belongs to which node at any given point in time.)



Steps Involved (as shown in the diagram):
  1. Hash(key) gives the vBucket identifier (vb-k) which hosts/owns the key.
  2. A lookup in the vBucket map tells that vb-k is owned by node (server) n-t.
  3. The request is sent directly to the primary node, n-t, to fetch the document.

Both the hashing function and the number of vBuckets are configurable. By default, Couchbase automatically divides each bucket into 1024 active vBuckets and 1024 replica vBuckets (per replica). When there is only one node, all vBuckets reside on that node.
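The key-to-vBucket step can be sketched in a few lines of Java. Couchbase clients use a CRC32-based hash for this mapping; treat the exact function below as illustrative, not the server's precise implementation:

```java
import java.util.zip.CRC32;

// Illustrative only: a CRC32-mod-1024 key-to-vBucket mapping in the spirit of
// Couchbase's partitioning (the server's exact hash variant may differ).
public class VBucketDemo {
    static final int NUM_VBUCKETS = 1024;

    // Map a document key to a vBucket id in [0, NUM_VBUCKETS).
    static int vbucketId(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes());
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    public static void main(String[] args) {
        String key = "user::123";
        int vb = vbucketId(key);
        // The vBucket map (vb -> node) would then route the request to the owning node.
        System.out.println(key + " -> vBucket " + vb);
    }
}
```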

What if a new node gets added to the cluster?
When the number of nodes scales up or down, the information stored in vBuckets is redistributed among the available nodes, and the corresponding vBucket map is updated. This entire process is known as rebalancing. Rebalancing doesn't happen automatically; as an administrator/developer you need to trigger it either from the UI or through the CLI.

What if the primary node is down?
All read/update requests by default go to the primary node. So, if the primary node fails for some reason, Couchbase takes that node out of the cluster (if configured to do so) and promotes a replica to become the primary node. So you can fail over to a replica node manually or automatically. You can then fix the issue with the failed node and add it back to the cluster by performing a rebalance. The table below shows a sample mapping.
+------------+--------+---------+
| vbucket id | active | replica |
+------------+--------+---------+
| 0          | node A | node B  |
| 1          | node B | node C  |
| 2          | node C | node D  |
| 3          | node D | node A  |
+------------+--------+---------+



---
happy learning !!!

Wednesday, November 16, 2016

Why Multi-Dimensional Scaling in Couchbase

Couchbase has supported horizontal scaling in a monolithic fashion since its inception: you keep adding more nodes to the cluster to scale and improve performance (all nodes being exactly the same). This single-dimension scaling works to a great extent, as all services - query, index, and data - scale at the same rate. But these services are unique and have specific resource requirements.

Let's profile these services in detail and their specific hardware requirements to drive home the point - why MDS is required! 

This feature was added in Couchbase 4.0.


The Query Service primarily executes Couchbase's native query language, N1QL (similar to SQL, pronounced "nickel" - it leverages the flexibility of JSON and the power of SQL). The query engine parses a query, generates an execution plan, and then executes the query in collaboration with the index service and data service. The faster queries are executed, the better the performance.

Faster query processing requires more CPU cores or a faster processor (and less memory and disk). More cores help in processing queries in parallel.

Reference - Query Data with N1QL
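For a flavor of N1QL, a query might look like this (the bucket and field names are examples, not from the original post):

```sql
-- A simple N1QL query (bucket and field names are illustrative)
SELECT name, country
FROM `travel-sample`
WHERE type = "airline"
LIMIT 5;
```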

The Index Service performs indexing with Global Secondary Indexes (GSI - similar to the B+tree commonly used in relational DBs). An index is a data structure that provides a quick and efficient means to access data. The index service creates and maintains secondary indexes and also performs index scans for N1QL queries. GSIs are global across the cluster and are defined using the CREATE INDEX statement in N1QL.

The index service is disk intensive, so optimized storage/SSD will help boost performance. It needs only a basic processor and less RAM/memory. As an administrator, you can configure GSI with either the standard GSI storage, which uses ForestDB underneath (since version 4.0), for indexes that cannot fit in memory, or pick the memory-optimized GSI for faster in-memory indexing and queries.
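A GSI is defined with a statement like the following (index, bucket, and field names are illustrative, not from the original post):

```sql
-- A GSI definition in N1QL (names are illustrative)
CREATE INDEX idx_airline_country
ON `travel-sample`(country)
WHERE type = "airline";
```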

The Data Service is central to Couchbase, as data is the reason for any DB. It stores all data and handles all fetch and update requests. The data service is also responsible for creating and managing MapReduce views. Active documents that exist only on disk take much longer to access, which creates a bottleneck for both reading and writing data, so Couchbase tries to keep as much data as possible in memory.
Data here refers to (document) keys, metadata, and the working set (the actual documents). Couchbase relies on extensive caching to achieve high throughput and low read/write latency. In a perfect world, all data would sit in memory.

Data Service: Managed Cache (based on Memcached) + Storage Engine + View Query Engine

Memory and the speed of the storage device affect performance (IO operations are queued by the server, so faster storage helps drain the queue faster).


Why MDS

So, each type of service has its own resource constraints. Couchbase introduced multi-dimensional scaling in version 4.0 so that these services can be independently optimized and assigned the kind of hardware that helps them excel. One size fits all is not going to work (especially when you are looking for higher throughput, i.e. sub-millisecond response times). For example, storing data and executing queries on the same node will cause CPU contention. Similarly, storing data and indexes on the same node will cause disk IO contention.

http://blog.couchbase.com/introducing-multi-dimensional-scaling

Through MDS, we can separate, isolate, and scale these three services independently of each other, which improves resource utilization as well as performance.

http://developer.couchbase.com/documentation/server/4.5/concepts/distributed-data-management.html



References
http://www.couchbase.com/multi-dimensional-scalability-overview
http://developer.couchbase.com/documentation/server/4.5/concepts/distributed-data-management.html

---
happy learning !!!

Friday, November 11, 2016

IP Address in Private Range

Ah, again - I forgot the range of private IP addresses. So, no more cursing my memory. Now, instead of googling, I will search my own blog :D

Below are permissible private IP ranges:
  • 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
  • 172.16.0.0 - 172.31.255.255 (172.16.0.0/12)
  • 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)
These IP addresses are used for computers inside a network that need to access internal resources. Routers inside the private network can route traffic between these private addresses. However, if these hosts want to access resources in the outside world (like the internet), they must have a public address for the response to reach them. This is where NAT is used.


Representing IP address in CIDR format


The number following the forward slash (/) is the prefix length, the number of shared initial bits, counting from the most significant bit of the address. 

Thus, a /20 block is a CIDR block with an unspecified 20-bit prefix. 

In the example below (a /16 block), the first two segments (10.0) form the network address and the last two segments are device addresses.
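The prefix arithmetic can be sketched in a few lines of Java (the helper names are mine, not a standard API):

```java
// A small sketch: derive address count and netmask from a CIDR prefix length.
public class CidrDemo {
    // Number of addresses in an IPv4 block with the given prefix length.
    static long addressCount(int prefixLength) {
        return 1L << (32 - prefixLength);
    }

    // Dotted-quad netmask for the given prefix length, e.g. /20 -> 255.255.240.0
    static String netmask(int prefixLength) {
        long mask = prefixLength == 0 ? 0 : (0xFFFFFFFFL << (32 - prefixLength)) & 0xFFFFFFFFL;
        return ((mask >> 24) & 0xFF) + "." + ((mask >> 16) & 0xFF) + "."
             + ((mask >> 8) & 0xFF) + "." + (mask & 0xFF);
    }

    public static void main(String[] args) {
        // A /20 leaves 12 bits for hosts: 2^12 = 4096 addresses.
        System.out.println("/20 -> " + addressCount(20) + " addresses, netmask " + netmask(20));
    }
}
```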



Wednesday, October 12, 2016

Containerize your application to achieve Scale

This post talks in general about Containers; their evolution and contribution in scaling systems.

Once upon a time, applications used to run on servers configured on bare metal sitting in companies' own data centers. Provisioning used to take anywhere from a few days to a few weeks. Then came virtual machines, which use hardware virtualization to provide isolation. They take minutes to create, as they require significant resources. Then, finally, comes a brand new guy in the race which takes 300 ms to a couple of seconds to bootstrap a new instance; yes, I am talking about containers. They don't use hardware virtualization - they interface directly with the host's Linux kernel.


Managing VMs at scale is not easy. In fact, I find it difficult to manage even a couple of VMs :D So just imagine how difficult it would be for companies like Google and Amazon, which operate at internet scale.

Two features which have been part of the Linux kernel since 2007 are cgroups and namespaces. Engineers at Google started exploring process isolation using these kernel features (to manage and scale their millions of computing units). This eventually resulted in what we know today as containers. Containers are inherently lightweight, which makes them super flexible and fast. If a container even thinks of misbehaving, it can easily be replaced by a brand-new container, because the cost of doing so is not high at all. This also means they need to run in a managed and well-guarded environment. Their small footprint helps in using them for a specific purpose, and they can easily be scheduled and rearranged/load balanced.

So one thing is quite clear: containers are not a brand-new product or technology. They use existing features of the OS.

With containers, the old goal of making every component of a system resilient and bulletproof no longer holds. This seems contradictory - we want to make systems more resilient, but containers themselves are very fragile, which means any component deployed in them automatically becomes non-reliable.
Instead, we design the system with the assumption that containers are fragile: if any instance fails, just mark it bad and replace it with a new instance. With containers, the real hard problems are not isolation but orchestration and scheduling.

Read more in details on Containers vs VMs


Containers are also described as a jail which guards the inmates to make sure they behave. Currently, one of the most popular containers is Docker, and there are tools available to manage or orchestrate them (one of the most popular being Kubernetes, from Google).


Happy learning!!!

Wednesday, July 27, 2016

Vert.x Event Loop

The event loop is not new, and it's also not specific to Vert.x (it's used in Node.js as well). I will cover it in a top-down fashion.

Vert.x is a library (yes, it's just a jar) which allows you to develop reactive applications on the JVM. A reactive system is defined in the Reactive Manifesto. Vert.x supports this manifesto, so it enables applications written in Vert.x to be responsive, elastic, resilient and message driven.

The last point (message driven) of the Reactive Manifesto defines the essence of Vert.x - it's event/message driven and non-blocking/asynchronous. This means different components interact with each other asynchronously by passing messages.

//Synchronous and blocking API call
MyObject obj = new MyObject();
Data data = obj.fetchDataFromUrl(url);   //thread blocks here until the response arrives

A traditional application makes blocking API calls, so the calling thread waits until the response is available. This means that until the response arrives, the thread is sitting idle and doing nothing - not good from a resource utilization point of view. Now, how about making that thread more specialized, with the only job of posting requests - i.e., it's not going to wait for a response to arrive. The thread will go on doing that one thing till the sky falls. This way, the thread will never sit idle (unless there are no requests). Putting it in more generic terms, the thread will be passing on messages or events.

The event loop is basically a thread (or a small group of threads; Vert.x matches the count to CPU cores) whose job is to keep passing messages to their specific handlers. The thread picks an event from a queue and hands it over to the right handler. The event loop maintains ordering, as it picks events from a queue internally. Vert.x allows configuring the number of event loop threads (by default, one per CPU core). A given handler is always called by the same event loop, so there is no need for synchronization.

Event loops are limited in number, and thus special, so blocking them would be a disaster. The event loop calls a method asynchronously, in a non-blocking manner; once the response arrives, the same event loop invokes the callback.
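The mechanics above can be sketched in plain Java. This is a toy, not Vert.x's actual event loop: one thread drains a queue in order and dispatches each event to its handler, so handlers on the same loop never need synchronization.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy single-threaded event loop (illustrative; NOT Vert.x's implementation).
public class ToyEventLoop {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread loop = new Thread(() -> {
        try {
            while (true) {
                queue.take().run();   // pick the next event and call its handler, in order
            }
        } catch (InterruptedException stopped) {
            // loop shut down
        }
    });

    public void start() {
        loop.setDaemon(true);
        loop.start();
    }

    // Handlers must be quick: blocking here would stall every other event.
    public void post(Runnable handler) {
        queue.add(handler);
    }

    public static void main(String[] args) throws InterruptedException {
        ToyEventLoop el = new ToyEventLoop();
        el.start();
        el.post(() -> System.out.println("handling request-1"));
        el.post(() -> System.out.println("handling request-2"));
        Thread.sleep(100);   // let the daemon loop run before the JVM exits
    }
}
```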


References
http://vertx.io/docs/vertx-core/java/#_in_the_beginning_there_was_vert_x
http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf
http://vertx-lab.dynamis-technologies.com/
https://github.com/vietj/vertx-materials/blob/master/src/main/asciidoc/Demystifying_the_event_loop.adoc
https://strongloop.com/strongblog/node-js-event-loop/
http://www.enterprise-integration.com/blog/vertx-understanding-core-concepts/


Friday, July 22, 2016

Pushing a new project to Github

This post talks about pushing a new project to GitHub. Make sure that the project has not already been created on GitHub.

I have illustrated this using a sample application named websockets: creating a project with just one file (README.md) and then pushing it to GitHub. You can run the commands below in a terminal or Git Bash.

$ mkdir websockets
$ cd websockets
$ echo "# websockets" >> README.md

$ git init
Initialized empty Git repository in /Users/siddheshwar/Documents/gitRepo/websockets/.git/

$ git add README.md

$ git commit -m "first commit"
[master (root-commit) 24fac01] first commit
1 file changed, 1 insertion(+)
create mode 100644 README.md

$ git remote add origin https://github.com/raiskumar/websockets.git

$ git push -u origin master
Username for 'https://github.com': rai.skumar@gmail.com
Password for 'https://rai.skumar@gmail.com@github.com': 
Counting objects: 3, done.
Writing objects: 100% (3/3), 233 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/raiskumar/websockets.git
 * [new branch]      master -> master

That's it, you are done !!!

Wednesday, July 20, 2016

Gradle - Create Java Project Structure

The Gradle init plugin can be used to bootstrap the process of creating a new Java, Groovy or Scala project. This plugin needs to be applied to a project before it can be used. So if you want to create the default directory structure of a Java project, this plugin can be handy (especially if you don't have a Gradle plugin in your IDE).

$gradle init --type java-library

The init plugin supports multiple types (it's 'java-library' in the above command). Below is the command sequence and the directory listing after successful execution.


$ mkdir hello-gradle
$ cd hello-gradle/
$ gradle init --type java-library
:wrapper
:init

BUILD SUCCESSFUL


Total time: 8.766 secs

$ ls -ltr
total 20
drwxrwxr-x. 3 vagrant vagrant   20 Jul 20 06:00 gradle
-rwxrwxr-x. 1 vagrant vagrant 5080 Jul 20 06:00 gradlew
-rw-rw-r--. 1 vagrant vagrant 2404 Jul 20 06:00 gradlew.bat
-rw-rw-r--. 1 vagrant vagrant  643 Jul 20 06:00 settings.gradle
-rw-rw-r--. 1 vagrant vagrant 1212 Jul 20 06:00 build.gradle
drwxrwxr-x. 4 vagrant vagrant   28 Jul 20 06:00 src
-- 

So the above command also generates the Gradle wrapper files needed to run the build (i.e. gradlew, gradlew.bat). If you don't know the appropriate type for your project, specify any value and it will list the valid types.

$ gradle init --type something

Execution failed for task ':init'.
> The requested build setup type 'something' is not supported. Supported types: 'basic', 'groovy-library', 'java-library', 'pom', 'scala-library'.
--

So, if you type any random text as the type, Gradle tells you the allowed types.

If you just run $gradle init, then Gradle tries its best to automatically detect the type. If it fails to identify one, it applies the basic type.

Importing Gradle Project to Eclipse

Note that the above command created Gradle-specific files along with the default Java directories (like src), but it didn't create Eclipse-specific files. This means that if you try to import the project into Eclipse, it will not work. To achieve that, do the below:
  1. Add the Eclipse plugin to the Gradle build file (i.e. build.gradle). Put the below after the Java plugin:
          apply plugin: 'eclipse'
  2. Run $gradle eclipse

This basically creates the files .classpath, .project and .settings.

Now you are good to import the project into Eclipse.

Alternatively, you can clone from my GitHub repository


---
Happy coding !!!


Thursday, July 7, 2016

Microservices Explained

I have been reading about microservices for a while (I must admit, I delayed it thinking it was just old wine in a new bottle), and the more I dive in, the more exciting I find it. I am a big fan of the Single Responsibility Principle (SRP), as it helps put boundaries on a class (and even on methods). SRP helps make code simpler and cleaner (of course, other design principles are equally important, but my love for SRP is boundless!). And I always used to wonder: why can't we apply SRP at the service layer? Finally, the microservices gods have heard me!

For a service to be called micro, it should be really small in size, and that is only possible if your service does one thing (i.e. follows SRP). And it should do that one thing really well. This in turn makes the service easy to implement, change, build, distribute and deploy. It also helps in creating highly decentralized and scalable systems. I tried looking on the web for a definition of microservices; the one I found covering all aspects is from Martin Fowler (link).


In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

Let's cover some of the aspects of microservices:
  1. Each service (or a small set of them) should have its own data store, so having hundreds of data stores is normal. You can choose between a relational DB, a NoSQL DB, or even an in-memory DB depending on your need. If you have 2 DBs and a dozen services around them, you are not micro yet.
  2. They are organized functionally or around business functions. This helps in keeping boundaries separate and clear.
  3. Loosely coupled and highly cohesive (or, to put it another way, they do one thing really well).
  4. They are usually RESTful (to maintain simplicity): they receive a request, apply logic and produce a response. But they support other interaction styles as well, like RPC, messaging, events and streams.
  5. They are usually asynchronous and use simple message queues; the real intelligence of the system lies at either end of the queue (i.e. with the services).
  6. The complete pipeline from build to production deployment should be automated (i.e. it should have CI/CD).
  7. In a microservice world, a single monolithic service could be replaced by hundreds of microservices. You should design each microservice with the worst in mind; design for failure. A service being down in production is normal, and your system should handle it gracefully.
  8. Microservices stress real-time monitoring metrics, like average response time, and generating a warning if a service is not responding.
  9. In an ideal scenario, an event bus replaces your operational database. Kafka is one such fast bus - and it's fast because it's dumb (i.e. it just delivers events).
  10. Microservices make your system/application more distributed, which in turn adds more complexity.

Found this video interesting as it talks about challenges in implementing microservices, link.





Sunday, February 14, 2016

EJB Good Practices

EJBs abstract your middleware or business-logic layer. They are transactional in nature, so when you hit your persistence layer (mostly through JPA), the transaction is already there for your database session. As a result, all DB operations either complete or none of them do - i.e., an EJB operation is atomic. Let's cover some good practices:

Don't create EJB methods for CRUD operations

Imagine creating operations in your EJB for creating, fetching, updating or deleting an entity. It's not going to serve the purpose; quite clearly, CRUD operations are not your business logic!

In fact, CRUD operations will be part of your more sophisticated business operations. Say you want to transfer amount x from bank account A to account B: there should be just a single method which reads the appropriate records from the DB, modifies them and performs the update.

Also, creating CRUD operations gives the impression that an EJB is created for each entity. We should create an EJB for a group of related problems, like an account manager or a policy manager. You can abstract your CRUD operations in a data access layer, though!
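The account-transfer idea can be sketched in plain Java. This is not a real EJB - in a deployed bean the class would carry @Stateless and the method would run inside a container transaction; here an in-memory map stands in for JPA, and all names are mine:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of one coarse-grained business method instead of CRUD endpoints.
// In a real EJB this class would be annotated @Stateless and the transfer
// would be atomic under the container-managed transaction.
public class AccountManager {
    private final Map<String, Long> balances = new HashMap<>(); // account id -> cents

    public AccountManager() {
        balances.put("A", 10_000L);
        balances.put("B", 5_000L);
    }

    // One business operation: read both records, modify them, perform the update.
    public void transfer(String from, String to, long amountCents) {
        long src = balances.get(from);
        if (src < amountCents) throw new IllegalStateException("insufficient funds");
        balances.put(from, src - amountCents);
        balances.put(to, balances.get(to) + amountCents);
    }

    public long balanceOf(String account) { return balances.get(account); }

    public static void main(String[] args) {
        AccountManager mgr = new AccountManager();
        mgr.transfer("A", "B", 2_500L);
        System.out.println("A=" + mgr.balanceOf("A") + " B=" + mgr.balanceOf("B"));
    }
}
```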

Minimise Injecting EJBs in the Presentation layer

Imagine yourself working in the presentation layer (servlets, web services, ...) and having to deal with multiple EJBs to delegate calls to. You are going to struggle to find the appropriate EJB, and then a method, for each incoming call - especially when someone else is taking care of the business layer!

This defeats the separation-of-concerns principle, which is important for managing a complex distributed system. So what's the solution? Bundle related EJBs behind a single (super) EJB and inject this super EJB.

But make sure that, in doing so, you are not putting unrelated EJBs together just for the sake of minimizing the number of EJBs. Each EJB (including the super EJB) should adhere to the single responsibility principle.

Reusing EJB methods

Suppose you have quite complex use cases which result in a big EJB class definition. The obvious question would be: how do you achieve reusability with EJB methods?

Just like with a normal Java class, you can create helper EJBs with reusable methods. Multiple EJBs can use the services provided by such a helper EJB. And to make these helper utilities clearer, you can put them in a utility or helper package inside your main EJB package.



---
would love to hear your suggestions/feedback about this post.

Sunday, January 24, 2016

The Cost of Concurrency

Concurrency is not free!

Modern libraries provide wonderful abstractions for programmers, so doing a task concurrently or asynchronously is quite trivial: it is as simple as instantiating an object and calling a few methods on it, and you are done! These libraries are abstracted in such a way that they don't even remind programmers that they are dealing with threads. And this is where a lazy programmer can take things for granted.

Need to process 100 tasks? Create 50 threads:

     Collection<Task> tasks = fetchTasks();   //from somewhere
     int numberOfThreads = 50;
     obj.executeConcurrently(tasks, numberOfThreads);

In the object-oriented world, all it takes is a method call.

To understand the cost of concurrency, let's take a step back and ask: how is it implemented? It is implemented through locks. Locks provide mutual exclusion and ensure that changes become visible in an ordered manner.

Locks are expensive because they require arbitration when contended. This arbitration is achieved by a context switch at the OS level, which suspends threads waiting for the lock until it is released. A context switch can cause a performance penalty, as the OS may decide to do some other housekeeping and thus lose the cached instructions and data. This is even more evident on multicore CPUs, where each core has its own cache. In the worst case, this can cause latency comparable to an I/O operation.

Another aspect of concurrency is managing the life cycle of threads. The OS does the dirty job of creating and managing threads on behalf of your platform (or runtime environment). There are limits on the number of threads that can be created at the system level, so proper thought should be given to how many threads are required to accomplish a job.
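A more deliberate alternative to "100 tasks, 50 threads" is to size a pool to the hardware and let the JDK's executor own the thread life cycle. A minimal sketch (the task count and the trivial workload are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a fixed pool sized to the CPU core count instead of an arbitrary 50.
public class PoolDemo {

    // Run taskCount trivial tasks on a pool sized to the available cores.
    static int runTasks(int taskCount) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors(); // not 50!
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(completed::incrementAndGet);
        }
        pool.shutdown();                               // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);   // wait for the queue to drain
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed " + runTasks(100) + " tasks");
    }
}
```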


Don't blindly decide to execute task concurrently!

Friday, January 22, 2016

Concurrency or Thread Model of Java

Thread Model in Java is built around shared memory and Locking!

Concurrency basically means two or more tasks happening in parallel, competing to access a resource. In the object-oriented world, the resource would be an object which could abstract a database, file, socket, network connection, etc. In a concurrent environment, multiple threads try to get hold of the same resource. Locks are used to ensure consistency wherever a piece of code may execute concurrently. Let's cover these aspects briefly:

Concurrent Execution is about

  1. Mutual Exclusion or Mutex
  2. Memory Consistency or Visibility of change

Mutual Exclusion
Mutual exclusion ensures that at a given point in time only one thread can modify a resource. If you write an algorithm which guarantees that a given resource is modified by only one thread, then mutual exclusion is not required.


Visibility of Change
This is about propagating changes to all threads. If a thread modifies the value of a resource, and right after that another thread wants to read it, the thread model should ensure that the reading thread gets the updated value.

The most costly operation in a concurrent environment is contended write access. Write access to a resource by multiple threads requires expensive and complex coordination. Both reads and writes require that all changes are made visible to other threads.


Locks

Locks provide mutual exclusion and guarantee visibility of change. (Java implements intrinsic locks using the synchronized keyword, which can be applied to a code block or a method.)
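A minimal sketch of synchronized providing both guarantees (the class and method names are mine): the increment is mutually exclusive, and the result is visible to any thread that reads through the same lock.

```java
// synchronized gives both mutual exclusion and visibility of change:
// the increment is atomic, and readers see the latest write.
public class Counter {
    private int value;

    public synchronized void increment() { value++; }   // mutex on 'this'
    public synchronized int get() { return value; }     // sees the latest write

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Runnable work = () -> { for (int i = 0; i < 10_000; i++) c.increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Always 20000; without synchronized, lost updates could make it less.
        System.out.println(c.get());
    }
}
```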

Read about the cost of locks, here

Wednesday, January 20, 2016

Preventing logging overhead in Log4J

Have you seen something like the snippet below in your application, and ever wondered about the overhead (of the if condition)? This post covers ways to get rid of that overhead with Log4j 2.x and Java 8.

if(log.isDebugEnabled()){
   log.debug("Person="+ person);
}

The above style is quite common with Log4j 1.x; though it adds a few extra lines, it improves performance.


The call below builds the full message (string concatenation plus toString on person) even if it's not going to get logged:
log.debug("Person=" + person);  //big NO; avoid this !

So how do we remove the overhead of the if check?

The if condition is an overhead, and it's going to appear in multiple places in a method/class. Also, if you don't do logging judiciously, it can be spread all over.

log4j 2.x
Log4j 2.x arrived after a long gap, and this particular issue has been addressed. Inspired by SLF4J, it added parameterized logging.

log.debug("Person={}", person);             //will not call .toString unless DEBUG is enabled
log.debug("Person={}", person.toString());  //this is still going to be an issue

So Log4j 2.x's parameterized logging will not call the implicit toString method on person, but if you call it explicitly the call still happens. So Log4j 2 has partially addressed the problem.

log4j 2.4 + Java 8 lambda
Log4j 2.4 supports a lambda-based mechanism which solves the problem completely: the lambda (or method reference) is not invoked at all if the statement is being logged at a level less specific than the current level.

log.debug(() -> person.toString());          //all good
log.debug("{}", this::expensiveOperation);   //all good
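The lambda trick can be demonstrated without Log4j, using a plain java.util.function.Supplier. This mimics, but is not, Log4j's implementation - all names here are mine:

```java
import java.util.function.Supplier;

// Sketch of why the lambda form is cheap: the Supplier is only invoked when
// the level is enabled (mimicking Log4j 2.4's Supplier overloads).
public class LazyLogDemo {
    static boolean debugEnabled = false;
    static int expensiveCalls = 0;

    static String expensiveOperation() {
        expensiveCalls++;                 // e.g. a big toString() or a DB lookup
        return "expensive result";
    }

    static void debug(Supplier<String> msg) {
        if (debugEnabled) System.out.println(msg.get());  // evaluated only here
    }

    public static void main(String[] args) {
        debug(LazyLogDemo::expensiveOperation);  // level off: never evaluated
        System.out.println("expensive calls: " + expensiveCalls);
    }
}
```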

Final Note:
Be mindful of logging overhead when you do code review!


References
https://logging.apache.org/log4j/2.0/manual/api.html
http://marxsoftware.blogspot.pt/2015/10/log4j2-non-logging-performance.html
http://www.jroller.com/bantunes/entry/java_lazy_logging_features_ways
http://www.infoq.com/news/2014/07/apache-log4j2
http://www.grobmeier.de/log4j-2-performance-close-to-insane-20072013.html

Saturday, January 16, 2016

Extracting root cause from Stack Trace

Don't tell me the problem; show me the logs first!

Whether you are a fresh graduate, an experienced programmer, a QA engineer, a production engineer or even a product manager, a good understanding of stack traces is vital for cracking critical issues. Your ability to find the real culprit in a lengthy stack trace will be instrumental in resolving a problem. This is even more important if you work on a distributed system that uses many libraries, where the stack trace is not well structured. Let's start with a simple scenario.

Scenario 1: Simple

This is the most trivial case: the exception gets thrown by a method of your project and, during the call, it never leaves your code base. Simple as it is, this scenario is very important for understanding how the stack trace gets printed.

Shown below is an Eclipse screenshot of MyController.java, which has two classes. Right-click and run the program. 

Let's decode the above stack trace:
  • RuntimeException is shown in line number 29, in method MyService.four()
  • Method MyService.four() gets called by MyService.three() in line number 25
  • Method MyService.three() gets called by MyController.two() in line 11
  • Method MyController.two() gets called by MyController.one() in line 6
  • Method MyController.one() gets called by MyController.main() in line 17


The first frame of the stack trace holds all the important information required to find the root cause. 
Be mindful of the line number; it's very important. 
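Since the original screenshot is not reproduced here, the following is a minimal reconstruction of the scenario (class names mirror the post; the line numbers in your trace will differ). It shows that the first frame of the trace is exactly where the exception was thrown:

```java
// Minimal reconstruction of Scenario 1: the exception never leaves our code base
public class MyController {
    static class MyService {
        void three() { four(); }
        void four() { throw new RuntimeException("here, comes the exception!"); }
    }

    void one() { two(); }
    void two() { new MyService().three(); }

    public static void main(String[] args) {
        try {
            new MyController().one();
        } catch (RuntimeException e) {
            // The first frame is where the exception originated, e.g. MyService.four
            StackTraceElement origin = e.getStackTrace()[0];
            System.out.println("thrown in: " + origin.getClassName()
                    + "." + origin.getMethodName());
            e.printStackTrace();
        }
    }
}
```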

Scenario 2: Chain Of Exception

Let's modify the above code a bit by catching the exception at its origin and then throwing a brand new Exception. 


Let's decode the above stack trace:

This stack trace has a Caused by section. Here there is only one, but in real applications you can have multiple Caused by sections. The last Caused by holds the root cause of the exception.


Caused by: java.lang.RuntimeException: here, comes the exception!
at MyService.four(MyController.java:30)


... 4 more


But if you are using external jars or libraries, finding the root cause can be a bit tricky, as the real reason might be nested deep inside. In such a case, look for the Class.method name that belongs to your application. Also, read the complete stack trace carefully, as the real root cause could lie in any part of it. 
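To locate the last Caused by programmatically, you can walk the cause chain yourself; a small sketch follows (Apache Commons Lang provides ExceptionUtils.getRootCause for the same purpose, if you already have it on the classpath):

```java
// Sketch: walk the cause chain to find the deepest exception,
// i.e. the last "Caused by" that would appear in the printed trace.
public class RootCauseDemo {
    static Throwable rootCause(Throwable t) {
        Throwable cause = t;
        // Guard against self-referential causes to avoid an infinite loop
        while (cause.getCause() != null && cause.getCause() != cause) {
            cause = cause.getCause();
        }
        return cause;
    }

    public static void main(String[] args) {
        RuntimeException original = new RuntimeException("here, comes the exception!");
        Exception wrapped = new Exception("service failure", original);
        Exception top = new Exception("controller failure", wrapped);

        // prints the message of the deepest cause
        System.out.println(rootCause(top).getMessage());
    }
}
```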


References:
http://www.nurkiewicz.com/2011/09/logging-exceptions-root-cause-first.html
http://stackoverflow.com/questions/12688068/how-to-read-and-understand-the-java-stack-trace
http://stackoverflow.com/questions/3988788/what-is-a-stack-trace-and-how-can-i-use-it-to-debug-my-application-errors


Friday, January 15, 2016

Logging Hibernate Query

To log Hibernate SQL queries, we need to set the properties below to true in the session factory configuration (only the first one is mandatory).

<property name="hibernate.show_sql" value="true" />
<property name="hibernate.format_sql" value="true" />
<property name="hibernate.use_sql_comments" value="true" />

show_sql: logs SQL queries
format_sql: pretty-prints the SQL
use_sql_comments: adds an explanatory comment

This is how a sample query will look after enabling the above properties:

/* insert com.co.orm.AuditLogRecord */
insert into 
audit_log
(creationTime, createdBy, deleted, name, operationType, rowId, details)
values
(?, ?, ?, ?, ?, ?, ?)

Decoding Above Query

Hibernate logs the above query, which then gets sent to the JDBC driver. Hibernate logs the prepared statement, and that's why you are seeing ? placeholders instead of the actual values.

Hibernate only knows about the prepared statement that it sends to the JDBC driver. It's the JDBC driver that binds the actual values and sends the query to the database for execution.

This means that if you want Hibernate to log the actual SQL query with the values embedded, it would have to generate that query just for logging! 

If you want to see the actual query with the values filled in, you have to capture it at the JDBC level, for example with a proxy driver such as P6Spy that logs the final SQL sent to the database.
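Alternatively, the values bound to the ? placeholders can be logged by Hibernate itself by raising its parameter-binding logger to TRACE. A sketch of the relevant entries, assuming a Log4j-style properties configuration:

```properties
# log the SQL statements (equivalent to hibernate.show_sql)
log4j.logger.org.hibernate.SQL=DEBUG
# log the values bound to the ? placeholders
log4j.logger.org.hibernate.type=TRACE
```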

Thursday, January 14, 2016

Binary Tree Level Order Traversal

Pre-order, in-order and post-order traversals of a tree use Depth First Search (DFS). These are called DFS since they visit the tree deeper and deeper until reaching a leaf node. Level-order traversal (as the name suggests) visits the nodes level by level; it is Breadth First Search (BFS). The diagram below shows the level-order traversal of a binary tree. 



Let's see implementation techniques:

Without Using External Storage

Applying recursion in BFS traversal is non-trivial. So in this case, the most obvious approach is to print the nodes at each level iteratively. The above tree has three levels, which equals the height of the tree. Now, how do we print all the nodes at a given level?

To print nodes at level 2 (i.e. 2 and 3), we start at the root and keep traversing down the tree recursively. Each time we go one level down, the level value is reduced by 1. Both the left and right subtrees keep going down independently until the level becomes 1.

 public void levelOrderIteratively() {
  int height = this.getHeight(root);
  for (int i = 1; i <= height; i++) {
   printNodesAtGivenLevel(root, i);
  }
 }

//other methods are given in full implementation

The implementation below prints all the nodes on the same line. Since the above method calls a separate print method for each level, we can format the output any way we want; something like the following can easily be done by tweaking the above for loop. 


Nodes at level 1 = 1 

Nodes at level 2 = 2 3 
Nodes at level 3 = 4 5 6 
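The per-level labels above can be produced with a small variation of the loop. A self-contained sketch (the class name LevelPrinter is made up; the helper methods mirror getHeight and printNodesAtGivenLevel from the full implementation further below):

```java
// Self-contained sketch: prints each level of a binary tree on its own line
public class LevelPrinter {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    // Height = max of subtree heights + 1 (mirrors getHeight below)
    static int height(Node n) {
        return n == null ? 0 : Math.max(height(n.left), height(n.right)) + 1;
    }

    // Collects node values at the given level (mirrors printNodesAtGivenLevel below)
    static void printLevel(Node n, int level, StringBuilder out) {
        if (n == null) return;
        if (level == 1) {
            out.append(n.data).append(' ');
            return;
        }
        printLevel(n.left, level - 1, out);
        printLevel(n.right, level - 1, out);
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.right.left = new Node(5);
        root.right.right = new Node(6);

        // Tweaked loop: label each level and break the line after it
        for (int i = 1; i <= height(root); i++) {
            StringBuilder line = new StringBuilder();
            printLevel(root, i, line);
            System.out.println("Nodes at level " + i + " = " + line.toString().trim());
        }
    }
}
```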

Using external storage / Using Queue

One of the de-facto approaches to implementing any breadth-first traversal is using a Queue (a First-In-First-Out data structure). BFS visits the nodes of the tree level by level. 

Java Implementation

The Java implementation below provides level-order traversal both via the iterative technique, which doesn't require extra storage, and via a Queue. 

package algo;

import java.util.LinkedList;
import java.util.Queue;

/**
 * Generic Binary Tree Implementation
 * @author Siddheshwar
 *
 * @param <E>
 */
public class BinaryTree<E> {
 /**
  * Root node of the tree
  */
 Node<E> root;

 /**
  * Node class to represent each node of the tree
  */
 static class Node<E> {
  E data;
  Node<E> left;
  Node<E> right;

  Node(E d) {
   this.data = d;
   this.left = null;
   this.right = null;
  }
 }

 /**
  * Get height of the tree
  * 
  * @param node
  *            root node
  * @return height of the tree which is max of left subtree height or right
  *         subtree height + 1 (for root)
  */
 public int getHeight(Node<E> node) {
  if (node == null) {
   return 0;
  }

  return Math.max(getHeight(node.left), getHeight(node.right)) + 1;
 }

 /**
  * Print binary tree in level-order
  */
 public void levelOrderIteratively() {
  int height = this.getHeight(root);
  for (int i = 1; i <= height; i++) {
   printNodesAtGivenLevel(root, i);
  }
 }

 /**
  * Print nodes at the given level
  * 
  * @param node
  *            first call with root node
  * @param level
  *            first call with the level for which we need to print
  */
 private void printNodesAtGivenLevel(Node<E> node, int level) {
  if (node == null)
   return;

  if (level == 1) {
   System.out.print(node.data + " ");
   return;
  }

  printNodesAtGivenLevel(node.left, level - 1);
  printNodesAtGivenLevel(node.right, level - 1);

 }

 /**
  * Prints level order traversal using Queue
  */
 public void levelOrderQueue() {
  if (root == null) {
   return;
  }

  Queue<Node<E>> queue = new LinkedList<>();
  queue.add(root);

  while (!queue.isEmpty()) {
   Node<E> node = queue.poll();
   System.out.print(node.data + " ");

   if (node.left != null) {
    queue.add(node.left);
   }
   if (node.right != null) {
    queue.add(node.right);
   }
  }
 }

 /**
  * Method to test the tree construction
  */
 public static void main(String[] args) {
  BinaryTree<Integer> bt = new BinaryTree<Integer>();
  bt.root = new Node<>(1);
  bt.root.left = new Node<>(2);
  bt.root.right = new Node<>(3);
  bt.root.left.left = new Node<>(4);
  bt.root.right.left = new Node<>(5);
  bt.root.right.right = new Node<>(6);

  System.out.println(" height =" + bt.getHeight(bt.root));
  bt.levelOrderIteratively();
  System.out.println();
  bt.levelOrderQueue();
 }
}

--
keep coding !!!