About two months ago I joined GridGain Systems and was introduced to Apache Ignite. At my very first meetup last month in New York, someone challenged me on how Ignite could guarantee ACID-level consistency if it was a highly available distributed system. I was familiar with the CAP theorem from my work with Apache Cassandra, but for the first time in my otherwise data-warehouse-heavy career, I was having to stare ACID concepts right in the face. It was time to fire up the old interweb and really dig into what the C in ACID actually meant and how it compared to the one in CAP.
What does this mean in terms of Apache Ignite? Can it be ACID compliant AND adhere to Brewer’s CAP theorem? Despite the inconsistent nature of how Consistency is defined (pun intended), it sure can. But the devil, as always, is in the details.
Let’s first look at ACID. Apache Ignite can provide fully ACID-compliant transactions, but it doesn’t have to for all operations. If you are looking for a strict consistency model, you will find solace in the TRANSACTIONAL (vs ATOMIC) atomicity mode. In TRANSACTIONAL mode, Ignite will lock data either on access (PESSIMISTIC concurrency) or on prepare (OPTIMISTIC concurrency). Either way, if you try to access data that is locked by another transaction, you will have to wait for the lock to be released, or you can optionally have the transaction fail after a timeout if you have tight SLAs to meet.
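To make that a little more concrete, here is a minimal sketch of a PESSIMISTIC transaction with a timeout against a TRANSACTIONAL cache. The cache name `accounts`, the keys, the amounts, and the 2-second timeout are all assumptions made up for illustration, and the node is started with Ignite's default configuration; treat it as a starting point rather than a recommended setup.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TransferSketch {
    public static void main(String[] args) {
        // Starts a node with default settings (assumption: fine for a local experiment).
        try (Ignite ignite = Ignition.start()) {
            // Hypothetical cache name; TRANSACTIONAL enables ACID transactions on it.
            CacheConfiguration<String, Integer> cfg = new CacheConfiguration<>("accounts");
            cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

            IgniteCache<String, Integer> accounts = ignite.getOrCreateCache(cfg);
            accounts.put("alice", 100);
            accounts.put("bob", 0);

            // PESSIMISTIC: locks are acquired as each key is first accessed.
            // The third argument is a timeout in milliseconds (illustrative value);
            // the fourth is the expected number of entries, 0 meaning unknown.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    2_000,
                    0)) {
                int alice = accounts.get("alice");
                int bob = accounts.get("bob");

                accounts.put("alice", alice - 25);
                accounts.put("bob", bob + 25);

                // Both updates become visible together, or not at all.
                tx.commit();
            }
        }
    }
}
```

Swapping `PESSIMISTIC` for `TransactionConcurrency.OPTIMISTIC` moves the locking to the prepare phase. If the block exits without reaching `commit()`, closing the transaction rolls it back.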
Now that we understand that Ignite can be tuned for strict ACID consistency, let’s shift focus back to the CAP theorem. Contrary to popular belief, we can’t just “pick 2” from there and call it good. A distributed system cannot opt out of network partitions, so realistically Ignite has to be either CP or AP. Given what we have discussed above, Ignite in TRANSACTIONAL mode favors consistency of reads over availability, so I would say that it is CP. But there is another atomicity mode available in Ignite that changes the equation a bit: ATOMIC.
Before I get into what ATOMIC mode means, let’s dig a little into the details. Apache Ignite is a distributed system that allows you to spread data over many nodes in a cluster. Each piece of data is assigned to a PRIMARY node, and you can determine how many BACKUP nodes you need to maintain your uptime guarantees. When you read data from an Ignite cluster, you always read from the PRIMARY node unless that node is down, in which case the read goes to one of the BACKUPs. Because Ignite only reads from a single node at a time, the system does not compare values from the various replicas. And because Ignite is an in-memory system, and most RAM in use today is volatile, a node that goes down loses its data and is removed from the cluster. When it rejoins, it will have different data assigned to it, but those details are a blog post for another day.
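The read path described above can be modeled in a few lines of plain Java. This is a toy model, not Ignite code: `Node`, `pickReadNode`, and `read` are hypothetical names invented for this sketch, and real Ignite routing involves far more machinery.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ReadRouting {
    /** Hypothetical stand-in for a cluster node; this is not an Ignite class. */
    public static final class Node {
        private final String name;
        private final Map<String, String> data = new HashMap<>();
        private boolean up = true;

        public Node(String name) { this.name = name; }
        public String name() { return name; }
        public void put(String key, String value) { data.put(key, value); }
        public void setUp(boolean up) { this.up = up; }
    }

    /** Picks the primary if it is up, otherwise the first live backup. */
    public static Optional<Node> pickReadNode(Node primary, List<Node> backups) {
        if (primary.up) {
            return Optional.of(primary);
        }
        return backups.stream().filter(b -> b.up).findFirst();
    }

    /** Reads from exactly one node; values on other replicas are never compared. */
    public static Optional<String> read(String key, Node primary, List<Node> backups) {
        return pickReadNode(primary, backups).map(n -> n.data.get(key));
    }

    public static void main(String[] args) {
        Node primary = new Node("primary");
        Node backup = new Node("backup-1");
        primary.put("k", "v");
        backup.put("k", "v");
        List<Node> backups = List.of(backup);

        System.out.println(pickReadNode(primary, backups).get().name()); // primary
        primary.setUp(false); // simulate the primary going down
        System.out.println(pickReadNode(primary, backups).get().name()); // backup-1
        System.out.println(read("k", primary, backups).orElse("unavailable")); // v
    }
}
```

The point of the model is that a read is served by exactly one replica, so availability depends on at least one copy being up, not on the copies agreeing with each other.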
If you choose to use ATOMIC atomicity mode, each mutation will either succeed or fail on its own, and neither WRITES nor READS will lock any data. This is great for speed and for data that can tolerate a weaker consistency model, and it favors Availability over Consistency. Therefore, in ATOMIC mode, Ignite can be considered AP.
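Atomicity mode is set per cache, so the choice boils down to a line of configuration. The fragment below is a sketch of two cache configurations side by side; the cache names `pageViews` and `accounts`, the key and value types, and the single backup are assumptions chosen purely for illustration.

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheModes {
    /** Hypothetical cache for data that can tolerate a weaker consistency model. */
    static CacheConfiguration<String, Long> pageViews() {
        CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("pageViews");
        cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC); // no locks, each update stands alone
        cfg.setBackups(1);                               // illustrative: one BACKUP copy
        return cfg;
    }

    /** Hypothetical cache for data that needs ACID transactions. */
    static CacheConfiguration<String, Integer> accounts() {
        CacheConfiguration<String, Integer> cfg = new CacheConfiguration<>("accounts");
        cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        cfg.setBackups(1);
        return cfg;
    }
}
```

Either configuration can then be passed to `ignite.getOrCreateCache(...)`, and both caches can live in the same cluster.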
TL;DR: As Apache Ignite is a distributed system, the CAP theorem does apply. In TRANSACTIONAL atomicity mode, you get a CP system with the option of fully ACID-compliant transactions. In ATOMIC atomicity mode, you gain the speed and uptime guarantees of an AP system. You can mix both modes in your application: you can have an ACID batch set, then a few eventually consistent ATOMIC mutations. Now, who says you have to choose?