2PC (Two Phase Commit)

In this post, I will discuss the two-phase commit (2PC) protocol for distributed transactions and some of the problems associated with it. What is a distributed commit protocol? A commit protocol is an algorithm for atomically committing a transaction. Atomicity implies that either all the changes (writes/updates) in the transaction will...

December 30, 2021

Long JVM pauses without GC

Folks using Java for systems engineering might already be familiar with garbage collection (GC) and how important GC logs can be for debugging performance and related issues. GC activity can severely impact the behavior of a system built in Java. If it's a distributed system, then the impact could be worse since there are several moving components...

January 20, 2020

Optimistic Locking

In this post, I will briefly discuss the optimistic locking technique, its advantages, and potential use cases. Pessimistic locking protocol: Let's first discuss the opposite of optimistic locking to set up the context. Pessimistic locking is the main locking paradigm used for guaranteeing mutual exclusion for a given piece of code subject to execution by reader and...

December 23, 2019

Volatile vs Synchronized

In this post, I will talk about the use of volatile fields in Java and how volatile differs from synchronized. Both volatile and synchronized are used in multi-threaded programs to get some degree of thread safety, depending on the operations performed by different threads. Consider the following piece of code: class VolatileDemo { private int...

August 18, 2019

Building RPC layer in a distributed system using Netty – An introductory tutorial

In this post, I will talk about how we can build a minimal RPC layer for a distributed system using Netty. By the end of this post, readers will have some familiarity with Netty concepts, protocol buffers, and how these can be put together to build an initial (somewhat rudimentary) version of a messaging component in...

June 29, 2019

Vectorized Processing in Analytical Query Engines

Traditional query processing algorithms are based on the "iterator" or "tuple-at-a-time" model, where a single tuple is pushed up through the query plan tree from one operator to another. Each operator typically has a next() method that outputs a tuple or record, which is then consumed as an input record by the calling operator...

April 26, 2018

Why Analytic Workloads are faster on Columnar Databases?

In this post, I will briefly summarize why analytic (OLAP) workloads perform better on columnar (aka column-oriented) databases than on traditional row-based (aka row-oriented) databases. Contents: Introduction, Storage Organization, Vectorized Query Execution, CPU Cache Friendly, Late Materialization, Compression. Introduction: Analytic workloads consist of operations like scans, joins, and aggregations. These operations are concerned with data...

May 4, 2017

Notes on Lock Free Programming (Part 1)

With the advent of multi-core architectures, it is becoming increasingly important to build scalable data structures that support the basic operations (insert, search) without taking coarse-grained locks. Coarse-grained locks are usually taken on the entire data structure and prevent any other concurrent threads from operating even on disjoint/orthogonal parts of the data...

March 17, 2017

Notes on Distributed Systems (Part 1)

Eventual consistency in a replicated, shared-data distributed system implies the following: a read operation R on data object X is not guaranteed to return the value of the most recently completed write operation W on X. In other words, in a distributed system based on eventual consistency, it is possible for readers to see...

March 15, 2017

Use of Associativity in Hashing

In this short post, I will talk about the concept of associativity in hashing. This is a fairly well-known method, so the article might be old hat for some readers. When implementing a collision resolution strategy for a hash table design, there are usually two directions that we take: Chaining - There is...