Saturday, February 18, 2017

Java 8 method references

This post talks about one of the coolest features of Java 8: method references!

Methods are no longer second-class values in Java 8. Just like object references, which can be passed around as values, methods can also be passed around as values. To achieve this, Java 8 provides the :: notation (along with lambdas, of course).

You might argue that lambda expressions do the same thing, and you are right to a great extent. The only fundamental difference is that a method reference lets you simply refer to a method by name without writing out its full body (as we do in a lambda).

Lambda vs Method Reference
Function<Double, Double> square = (Double d) -> d * d;   //lambda
Function<Double, Double> square = Arithmatic::square;    //method reference

Arithmatic::square is a method reference to the square method defined in the Arithmatic class. Brackets are not needed because we aren't actually calling the method.

(String s) -> System.out.println(s)     //lambda
System.out::println                            //method reference

In a lambda, you call the method without naming it, whereas in a method reference you explicitly refer to it by name. This makes the code more readable.
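To see both forms side by side, here's a minimal, runnable sketch; the Arithmatic class body below is assumed for illustration, mirroring the example above:

```java
import java.util.function.Function;

public class Arithmatic {
    // static method whose signature matches Function<Double, Double>
    static Double square(Double d) {
        return d * d;
    }

    public static void main(String[] args) {
        Function<Double, Double> squareLambda = (Double d) -> d * d;  // lambda
        Function<Double, Double> squareRef = Arithmatic::square;      // method reference

        System.out.println(squareLambda.apply(4.0));  // 16.0
        System.out.println(squareRef.apply(4.0));     // 16.0
    }
}
```

Both behave identically; the method reference just names an existing method instead of restating its body.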

happy functional programming !

Sunday, January 29, 2017

Applying RxJava on a Fibonacci Sequence Generator

This post shows how we can marry an existing method which generates a Collection into an event stream and then react to it using RxJava. I have discussed the fundamentals of RxJava in this previous post.

Let's assume that we have a method which generates the first k Fibonacci numbers and returns them as a Collection.

//Generates first k fibonacci numbers
private static List<Integer> generateFibonacci(int count) {
        List<Integer> resp = new ArrayList<>();
        resp.add(0);
        resp.add(1);
        for (int i = 2; i < count; i++) {
            resp.add(resp.get(i - 1) + resp.get(i - 2));
        }
        return resp;
}

Reactive Programming is programming with asynchronous data streams. This post will show different ways to apply RxJava constructs to the above method.

Observable Creation

Observable provides methods to convert an array, Collection, Future, or even a normal function into an Observable.

from - Converts an Iterable sequence into an Observable that emits the items in the sequence. 

Observable.from (generateFibonacci (10) );

Please note that the method returns a Collection, which is an Iterable, so it can be converted into an Observable.

fromCallable - Returns an Observable that, when an observer subscribes to it, invokes a function you specify and then emits the value returned from that function.
Observable.fromCallable (() -> generateFibonacci (10));

Transforming Observable

RxJava provides different operators which can be applied on an Observable to transform it. Let's transform the above observable so that only the Fibonacci elements which are less than 15 finally get pushed to the subscriber.
filter - Filters items emitted by an Observable by only emitting those that satisfy a specified predicate.
        //1st Approach
        Predicate<Integer> predicate = new Predicate<Integer>() {
            public boolean test(Integer integer) {
                return integer < 15;
            }
        };

        Observable.from(generateFibonacci(10))
                .filter(i -> predicate.test(i));

        //2nd Approach
        Observable.from(generateFibonacci(10))
                .filter(integer -> {
                    return integer < 15;
                });

        //3rd Approach
        Observable.from(generateFibonacci(10))
                .filter(integer -> integer < 15);

Subscribing Observable

Observables emit events and subscribers react to them. Observable and Subscriber/Observer are independent of each other; they just agree on the contract.


        Observable.from(generateFibonacci(10))
                .subscribe(System.out::println);

        //Or a more detailed and formal subscriber which gives all 3 event handlers!

        Observable.from(generateFibonacci(10))
                .subscribe(new Subscriber<Integer>() {
                    public void onNext(Integer s) { System.out.println(s); }

                    public void onCompleted() { System.out.println("Completed!"); }

                    public void onError(Throwable e) { System.out.println("Ouch!"); }
                });

--happy Rx !!!

Thursday, January 26, 2017

Reactive Programming through RxJava

Rx was originally developed as Reactive Extensions for C#. Later, Netflix ported it to the JVM platform under the Apache license and named it RxJava.
RxJava is also an extension of the Observer design pattern.
In this post, I will focus mostly on the fundamentals and constructs of RxJava.

Rx Building Blocks

The two fundamental constructs of Rx are Observables and Subscribers. Observables abstract event streams and emit events asynchronously to the subscribers (or observers). There can be zero or more events, and a stream terminates either by completing successfully or by ending with an error. Streams can be created out of anything - a list of integers, user clicks in a UI, DB queries, Twitter feeds and what not. It's up to you; you can convert any sequence of data, input, function etc. into a stream. And on top of streams we have a functional toolkit to filter, modify, combine and merge these events/streams. We can merge two streams to form a single stream, or map values from one stream to another. The image below shows UI clicks represented as events. An instance of Subscription represents the connection between an Observable and a Subscriber.

Observables: rx.Observable<T>

Observable is the first core component of Rx. Let's look at the different ways in which observables can be created:

List<Integer> list = Arrays.asList(1, 2, 3, 4);
Observable<List<Integer>> listObservable = Observable.just(list);

//from can take an Array, Iterable or Future
Observable<Integer> fromObservable = Observable.from(list);

Callable<String> call = () -> function1();
Observable<String> callObservable = Observable.fromCallable(call);

Observable.fromCallable(() -> { throw new RuntimeException("some error"); });

The Observable class provides the subscribe method, which is used to push values to the observers. A subscriber is basically an implementation of the Observer interface. An Observable pushes three kinds of events, as shown below as part of the Observer interface.

The Observer interface provides a mechanism for receiving push-based notifications. It declares three methods which get called accordingly.

public interface Observer<T> {
    void onCompleted();
    void onError(Throwable e);
    void onNext(T t);
}

onCompleted() notifies the observer that the observable has finished sending events.
onError() notifies the observer that the observable has encountered an error / unhandled exception.
onNext() provides the subscribed observer with a new item/event.

Usually, in real code you will not explicitly notice Observers and those three callback methods, thanks to lambdas and some other shorthands. To subscribe to an Observable, it's not necessary to provide an instance of Observer at all. Also, it's not always necessary to provide implementations of onNext, onCompleted and onError; you can provide just a subset of them.

Consuming Observables (i.e. Subscribing)

Consuming Observables basically means subscribing to them. Only when you subscribe can you receive the events abstracted by the observable.

        Observable
                .just(1, 2, 3)
                .subscribe(new Subscriber<Integer>() {
                    public void onCompleted() {
                        System.out.println("Completed Observable.");
                    }

                    public void onError(Throwable throwable) {
                        System.err.println("Whoops: " + throwable.getMessage());
                    }

                    public void onNext(Integer integer) {
                        System.out.println("Got: " + integer);
                    }
                });
Got: 1
Got: 2
Got: 3

Completed Observable.

A well-formed Observable will call onNext zero or more times and then will call either onError or onCompleted exactly once.

Now let's simulate through an example how an observable can throw an error.

        Observable
                .just(1, 2, 3)
                .doOnNext(new Action1<Integer>() {
                    public void call(Integer integer) {
                        if (integer.equals(2)) {
                            throw new RuntimeException("I don't like 2");
                        }
                    }
                })
                .subscribe(new Subscriber<Integer>() {
                    public void onCompleted() {
                        System.out.println("Completed Observable.");
                    }

                    public void onError(Throwable throwable) {
                        System.err.println("Oops: " + throwable.getMessage());
                    }

                    public void onNext(Integer integer) {
                        System.out.println("Got: " + integer);
                    }
                });
Got: 1

Oops: I don't like 2

The first value gets through without any obstacle, but the second value throws an exception which terminates the Observable. You can confirm this by running the snippet above.

We don't always need to implement the full subscriber every time.

        Observable
                .just(1, 2, 3)
                .subscribe(new Action1<Integer>() {
                    public void call(Integer integer) {
                        System.out.println("Got: " + integer);
                    }
                });
In this case, if an error happens it will blow up at runtime, as there is no handler for it. It's best to always implement the error handler right from the beginning.

Note: RxJava is single threaded by default.


Saturday, January 21, 2017

Scaling your Data Storage

The traditional approach to storing data is to keep it all in a single data source and handle all read/write operations in a centralized fashion. In this approach, you keep adding more memory and/or processing power (or buy a bigger, more powerful machine) as the load on your system grows. This works for most cases, but a time will come when it doesn't scale. Upgrading hardware doesn't always help to achieve scalability.

So the obvious solution to this problem is to start partitioning your data.
This post covers possible partitioning approaches.

Let's take the example of an online system where you store user profile details and photographs (of course, along with other details).

Dividing Schema / Scale Up

One way to partition data is to store profile details on one server and photographs on another. This way, only specific read and write queries are sent to each server and hence scaling is possible. This approach is known as vertical partitioning; it basically divides your schema. So if you want to fetch the complete data of a given user, you need to fetch it from two different data sources.
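As a rough sketch of what this routing looks like in application code (the data source URLs here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class VerticalRouter {
    // hypothetical mapping of schema section -> data source
    private static final Map<String, String> SOURCES = new HashMap<>();
    static {
        SOURCES.put("profile", "jdbc:mysql://profile-db.example.com/app");
        SOURCES.put("photos",  "jdbc:mysql://photo-db.example.com/app");
    }

    static String sourceFor(String section) {
        return SOURCES.get(section);
    }

    public static void main(String[] args) {
        // fetching the complete data of one user requires two lookups,
        // one against each data source
        System.out.println(sourceFor("profile"));
        System.out.println(sourceFor("photos"));
    }
}
```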

Replicating Schema / Sharding / Scale Out

In this approach you replicate the schema and then decide what data goes where, so all instances have exactly the same schema. Here, the profile details as well as the photographs of a given user will live on a single instance.

  • If an instance goes down, only some of the users get affected (high availability), so your system overall doesn't get impacted. 
  • There is no master-slave arrangement here, so we can perform writes in parallel. Writing is a major bottleneck for many systems. 
  • We can load balance our web servers and access shard instances on different paths, which leads to faster queries. 
  • Data which are used together are stored together. This leads to data denormalization, but keep in mind that it doesn't mean the data isn't separated logically (we are still storing profile and photographs logically separate). So there are no complex joins; data is read and written in one operation. 
  • Segregating data into smaller shards helps performance, as data can remain in cache. It also makes managing data easier, with fast backup and restore. 
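The routing logic behind sharding can be sketched as follows - a simple modulo scheme over hypothetical shard hosts (real systems often use consistent hashing or a lookup service instead):

```java
public class ShardRouter {
    // hypothetical shard hosts; all share the same schema
    private static final String[] SHARDS = {
        "db-shard-0.example.com",
        "db-shard-1.example.com",
        "db-shard-2.example.com",
        "db-shard-3.example.com"
    };

    // Profile and photographs of the same user land on the same shard,
    // so they can be read and written in one operation.
    static String shardFor(long userId) {
        return SHARDS[(int) (userId % SHARDS.length)];
    }

    public static void main(String[] args) {
        System.out.println(shardFor(42L));  // db-shard-2.example.com
    }
}
```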

Functional Programming in Java - Function as First Class Citizens

Java (in version 8) added support for treating functions as first-class citizens. Before this, Java programmers had to wrap standalone functions in Runnables or other anonymous interface implementations to be able to refer to them.

Function as first class citizen/type/object/entity/value:
Functions can be passed around like any other value!

What this basically means is that:
  • functions can be passed as arguments to other functions, 
  • functions can be returned as values from other functions, 
  • functions can be assigned to variables, 
  • functions can be stored in data structures. 

Before Java 8, objects and primitive data types were treated as first-class citizens (classes and functions were treated as second class, as they were not values). But functions are just as valuable to manipulate and reuse. Java inherited this modern feature from other functional programming languages (like JavaScript and Scala). It helps improve productivity, as programmers can now code at a different level of abstraction.

Read more in detail about what first class means, here

A lambda expression enables you to provide the implementation of a functional interface (an interface which has only one abstract method) directly inline, and the whole expression is treated as an instance of the functional interface. Below, we will see how lambdas enable functional interfaces to be treated as first-class objects.
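As a minimal illustration (the Greeting interface is made up for this example), a lambda expression literally becomes an instance of the functional interface:

```java
// A functional interface declares exactly one abstract method
@FunctionalInterface
interface Greeting {
    String greet(String name);
}

public class LambdaAsInstance {
    public static void main(String[] args) {
        // the lambda expression is treated as an instance of Greeting
        Greeting hello = name -> "Hello, " + name + "!";
        System.out.println(hello.greet("geekrai"));  // Hello, geekrai!
    }
}
```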

Function As Value

The java.util.function package (ref) provides multi-purpose functional interfaces which come in handy.

import java.util.function.BiFunction;
import java.util.function.Predicate;

/**
 * Creates inline lambda functions and stores them in variables. The function
 * can then be called directly from the variable.
 */
public class FunctionAsValue {
    public static void main(String[] args) {
        Predicate<String> predicate = s -> s != null && !s.isEmpty();
        System.out.println(predicate.test("geekrai"));

        //static method which conforms to Predicate's method signature can be stored and invoked
        Predicate<String> predicateRef = FunctionAsValue::isNotEmpty;
        System.out.println(predicateRef.test("geekrai"));

        //BiFunction accepts two arguments and produces a result
        BiFunction<String, String, String> concat = (s, t) -> s + t;
        System.out.println(concat.apply("hello", " world"));
    }

    private static boolean isNotEmpty(String s) {
        return s != null && !s.isEmpty();
    }
}
true
true
hello world

Function As Parameter

import java.util.function.Function;

/**
 * Illustrates how to pass a function as a parameter.
 */
public class FunctionAsParam {
    public static void main(String[] args) {
        //pass function as parameter to transform method
        System.out.println(transform("geek", "Rai", s -> s.toUpperCase()));

        Function<String, String> toLower = s -> s.toLowerCase();
        System.out.println(transform("geek", "Rai", toLower));
    }

    private static String transform(String a, String b, Function<String, String> fn) {
        if (fn != null) {
            a = fn.apply(a);
            b = fn.apply(b);
        }
        return a + b;
    }
}

--happy functional programming !!!

Friday, January 6, 2017

Pure Functions

This post discusses one of the important aspects of functional programming, pure functions, using Java examples.

Pure functions are side-effect-free, or stateless, functions. The output of a pure function depends purely (or completely) on its input. Side-effect free basically means that pure functions should not modify their arguments or any other state variables.

Now, the obvious question is why this is so important. We will see later how it's one of the important ingredients of functional and reactive programming. Let's take a simple Java class to understand pure functions.

package functional;

import java.util.Random;

public class PureFunction {
    private int state = 420;

    // pure function
    public int addTwo(int x) {
        return x + 2;
    }

    // Impure function
    public int addState(int x) {
        return x + state;
    }

    // Impure function: unseeded Random, so the result varies per call
    public int randomizeArgument(int x) {
        Random r = new Random();
        return x + r.nextInt(1000);
    }
}
The method addTwo depends purely on its input/argument. No matter how many times you call it, it behaves in a predictable manner (it just adds 2 to the argument and returns the result). The response of the other two methods (addState and randomizeArgument) is not going to be consistent even if they are called with the same argument multiple times. The response of addState depends on an instance variable of the class; if that gets mutated somewhere else, the method's response will not be consistent. Similarly, the response of randomizeArgument varies depending on the random value.

Let's cover some of the important points

  • If a pure function references any instance variable of the class, then that instance variable should be declared as final.
  • Pure functions help in highly concurrent and scalable environment by making the behavior predictable. 
  • You get parallelism free of cost if your method doesn't access a shared mutable data to perform its job. 
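The last point can be demonstrated with a small sketch: because addTwo is pure, mapping it over a parallel stream is safe and deterministic:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PureParallel {
    // pure: output depends only on the input
    static int addTwo(int x) {
        return x + 2;
    }

    public static void main(String[] args) {
        // Running a pure function in parallel needs no locks and always
        // produces the same (ordered) result as the sequential version.
        List<Integer> result = IntStream.rangeClosed(1, 5)
                .parallel()
                .map(PureParallel::addTwo)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(result);  // [3, 4, 5, 6, 7]
    }
}
```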

--happy learning !!!

Sunday, December 18, 2016

Git branching strategy

This post explores different Git branching options for optimally managing a code base. Primarily, there are two options which most teams use: Git flow and GitHub flow. Let's cover them in detail.

Git Flow

Git Flow works on the idea of releases; you have many branches, each specializing in a specific stage of the application. The branches are named master, develop, feature, release and hotfix.

You create a develop branch from the master branch, which is used by all developers for feature development and bug fixes for a specific release. If a developer needs to add a new feature, they create a separate branch from the develop branch so that both branches can evolve independently of each other. When the feature is fully implemented, that branch gets merged back into develop. The idea is to have a stable and working develop branch, so a feature gets merged back only when it's complete. Once all the features for a release are implemented, the develop branch is branched into a release branch, where formal testing of the release commences. The release branch should be used only for bug fixes, not development. Finally, when it's ready for deployment, it is merged into the master branch and tagged, to support multiple releases.
This flow is not ideal for projects where you release quite often. It helps where releases happen every other month, or even every month.

GitHub Flow

GitHub has popularized its own flow, which is quite effective for open source projects and other projects which don't have long release cycles (with it, you could release every day). It's based on the idea of features.

It's a lightweight version of Git flow which removes all the unwarranted overhead of maintaining so many branches (it has just feature branches apart from master). If you want to work on a feature or bug, you start a separate branch from master. Once the feature/bug is implemented, it gets merged back into the master branch. Feature branches are given quite descriptive names like oauth-support or bug#123. Developers can raise a pull request to collaborate with co-workers or to do code review on the feature branch. Once the code is reviewed and sign-off is received, the feature branch is merged back into the master branch. You can have as many feature branches as you wish, and once a feature branch is merged it can also be deleted, as the master branch will have the full commit history (unless you want to keep it).

GitHub flow assumes that every time you merge changes to master; you are ready for production deployment.

There is a 3rd branching strategy as well: GitLab Flow.


--happy branching !!!

Tuesday, December 6, 2016

Show Directory Tree structure in Unix

Graphics-based operating systems let you see the complete directory structure graphically. But in console-based Linux/Unix operating systems this is not possible (by default). At most, you can go inside each directory and do ls, or if you are a scripting geek you can write a script to do it for you.

Recently, I came across a directory listing program, tree, which makes this job possible in Unix/Linux from the terminal.


CentOS/RHEL: $sudo yum install tree -y

Ubuntu: $sudo apt-get install tree

Using Tree

Use tree and the directory name. That's it!

$tree cookbooks/
`-- learn_chef_httpd
    |-- Berksfile
    |-- chefignore
    |-- metadata.rb
    |-- recipes
    |   `-- default.rb
    |-- spec
    |   |-- spec_helper.rb
    |   `-- unit
    |       `-- recipes
    |           `-- default_spec.rb
    `-- test
        `-- recipes
            `-- default_test.rb

7 directories, 8 files

Saturday, November 19, 2016

How Couchbase identifies Node for Data Access

This post talks about how Couchbase identifies where exactly a document is stored, to facilitate quick reads/updates.

Couchbase uses the term Bucket, equivalent to the term Database in the relational world, to logically partition documents across the cluster (or data nodes). Being a distributed DB, it tries to evenly distribute or partition (shard) data into virtual buckets known as vBuckets. Each vBucket owns a subset of the keys or document ids (and of course the corresponding data as well). Documents are mapped to a vBucket by applying a hash function on the key. Once the vBucket is identified, a separate lookup table tells which node hosts that vBucket. This mapping of virtual buckets to nodes is known as the vBucket map. (Note: the Cluster Map contains the mapping of which service belongs to which node at any given point in time.)

Steps Involved (as shown in diagram):

  1. Hash(key) to get the vBucket identifier (vb-k) which hosts/owns the key.
  2. Looking up the vBucket map tells us that vb-k is owned by node/server n-t.
  3. The request is sent directly to the primary server node, n-t, to fetch the document. 
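The three steps above can be sketched in Java. This is a toy illustration, not Couchbase's actual implementation: the CRC32 hash mirrors Couchbase's CRC32-based key hashing, but the 4-entry map and node names are made up:

```java
import java.util.zip.CRC32;

public class VBucketLookup {
    static final int NUM_VBUCKETS = 1024;

    // Step 1: hash the document key to get a vBucket id
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes());
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    // Step 2: look up the vBucket map to find the owning node
    static String nodeFor(int vbucketId, String[] vbucketMap) {
        return vbucketMap[vbucketId % vbucketMap.length];
    }

    public static void main(String[] args) {
        // toy vBucket map: 4 nodes instead of a full 1024-entry map
        String[] vbucketMap = {"node A", "node B", "node C", "node D"};

        int vb = vbucketFor("user::1001");
        // Step 3: the request would be sent directly to this node
        System.out.println("vBucket " + vb + " -> " + nodeFor(vb, vbucketMap));
    }
}
```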

Both the hashing function and the number of vBuckets are configurable, so the mapping will change if either changes. By default, Couchbase automatically divides each bucket into 1024 active vBuckets and 1024 replica vBuckets (per replica). When there is only one node, all vBuckets reside on that node.

What if a new node gets added to the cluster?
When the number of nodes scales up or down, the information stored in the vBuckets is re-distributed among the available nodes, and then the corresponding vBucket map is updated. This entire process is known as rebalancing. Rebalancing doesn't happen automatically; as an administrator/developer you need to trigger it either from the UI or through the CLI.

What if the primary node is down?
All read/update requests by default go to the primary node. So, if a primary node fails for some reason, Couchbase takes that node out of the cluster (if configured to do so) and promotes the replica to become the primary node. So you can fail over to a replica node manually or automatically. You can then address the issue with the node, fix it and add it back to the cluster by performing a rebalance. The table below shows a sample mapping.
| vbucket id | active  | replica |
|     0      | node A  | node B  |
|     1      | node B  | node C  |
|     2      | node C  | node D  |
|     3      | node D  | node A  |


happy learning !!!

Wednesday, November 16, 2016

Why Multi-Dimensional Scaling in Couchbase

Couchbase has supported horizontal scaling in a monolithic fashion since its inception: you keep adding more nodes to the cluster to scale and improve performance (all nodes being exactly the same). This single-dimension scaling works to a great extent, as all services - Query, Index and Data - scale at the same rate. But they are all unique and have specific resource requirements.

Let's profile these services and their specific hardware requirements in detail to drive home the point of why MDS is required.
This feature was added in Couchbase 4.0.

The Query Service primarily executes Couchbase's native query language, N1QL (similar to SQL, pronounced "nickel"; it leverages the flexibility of JSON and the power of SQL). The query engine parses the query, generates the execution plan and then executes the query in collaboration with the index service and the data service. The faster queries are executed, the better the performance.

Faster query processing requires more CPU or a faster processor (and less memory and disk). More cores help in processing queries in parallel.

Reference - Query Data with N1QL

The Index Service performs indexing with Global Secondary Indexes (GSI - similar to the B+tree commonly used in relational DBs). An index is a data structure which provides a quick and efficient means to access data. The index service creates and maintains secondary indexes and also performs index scans for N1QL queries. GSIs/indexes are global across the cluster and are defined using the CREATE INDEX statement in N1QL.

The index service is disk intensive, so optimized storage/SSDs help boost performance. It needs only a basic processor and less RAM/memory. As an administrator, you can configure GSI either with standard GSI storage, which uses ForestDB underneath, for indexes that cannot fit in memory, or pick memory-optimized GSI for faster in-memory indexing and queries.

The Data Service is central to Couchbase, as data is the reason for any DB. It stores all data and handles all fetch and update requests on data. The data service is also responsible for creating and managing MapReduce views. Active documents that exist only on disk take much longer to access, which creates a bottleneck for both reading and writing data, so Couchbase tries to keep as much data as possible in memory.
Data here refers to (document) keys, metadata and the working set, i.e. the actual documents. Couchbase relies on extensive caching to achieve high throughput and low read/write latency. In a perfect world, all data would sit in memory.

Data Service: Managed Cache (based on Memcached) + Storage Engine + View Query Engine

Memory and the speed of the storage device affect performance (IO operations are queued by the server, so faster storage helps drain the queue faster).


So, each type of service has its own resource constraints. Couchbase introduced multi-dimensional scaling in version 4.0 so that these services can be independently optimized and assigned the kind of hardware which helps them excel. One size fits all is not going to work (especially when you are looking for higher throughput, i.e. sub-millisecond response times). For example, storing data and executing queries on the same node will cause CPU contention. Similarly, storing data and indexes on the same node will cause disk IO contention.

Through MDS, we can separate, isolate and scale these three services independently of each other, which improves resource utilization as well as performance.


happy learning !!!