There are two significant categories in in-memory computing: In-Memory Databases and In-Memory Data Grids. This post presents a concise version of my thoughts on the topic; a recent analyst call helped me organize them.
Nomenclature of In-Memory Database vs In-Memory Data Grid
Let's start by clarifying the terminology and buzzwords. The term "In-Memory Database" (IMDB) is a well-known category name that is generally used without ambiguity. However, it's important to note that there are now traditional databases with significant in-memory "options". Examples of such databases include MS SQL 2014, Oracle's Exalytics and Exadata, and IBM DB2 with BLU offerings. The distinction between these databases and pure in-memory databases is becoming blurred. For the sake of simplicity, I will refer to all of them as "In-Memory Databases."
On the other hand, "In-Memory Data Grids" (IMDGs) are sometimes referred to as "In-Memory NoSQL/NewSQL Databases", although the latter term may be more accurate in certain cases. In this article, I will use the term "In-Memory Data Grid" as it is more commonly used.
It's worth noting that there are also In-Memory Compute Grids and In-Memory Computing Platforms that include or enhance many features found in both In-Memory Data Grids and In-Memory Databases. I know this can be confusing, so for the sake of consistency I will use the following terms to refer to the two main categories:
- In-Memory Database
- In-Memory Data Grid
Tiered Storage for In-Memory Database and In-Memory Data Grid
Defining the concept of "In-Memory" is crucial in this context. Interestingly, there is considerable confusion, because vendors apply the label "In-Memory" to SSDs, Flash-on-PCI, Memory Channel Storage, and, of course, DRAM.
In reality, most vendors adopt a Tiered Storage Model, where a portion of the data resides in DRAM (the fastest storage option but with limited capacity). When the DRAM reaches its capacity, the excess data is overflowed to various flash or disk devices, which are slower but offer greater capacity. As a result, these products are rarely solely based on DRAM or flash storage; instead, they utilize a combination of both. However, it should be noted that the architectural design of most products tends to lean towards either predominantly DRAM or predominantly flash/disk storage.
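The overflow behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `TieredStore` class, its least-recently-used eviction policy, and the plain dictionary standing in for flash or disk are all hypothetical and do not reflect any particular vendor's design.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded 'DRAM' tier that overflows to a slower tier."""

    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram = OrderedDict()  # fast tier, kept in least-recently-used order
        self.overflow = {}         # stands in for flash or disk (assumption)

    def put(self, key, value):
        self.overflow.pop(key, None)   # a key lives in exactly one tier
        self.dram[key] = value
        self.dram.move_to_end(key)     # mark as most recently used
        while len(self.dram) > self.dram_capacity:
            # DRAM is full: push the coldest entry down to the slower tier.
            old_key, old_value = self.dram.popitem(last=False)
            self.overflow[old_key] = old_value

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.overflow:
            value = self.overflow[key]
            self.put(key, value)       # promote back into the fast tier
            return value
        raise KeyError(key)

    def tier_of(self, key):
        if key in self.dram:
            return "dram"
        if key in self.overflow:
            return "overflow"
        return None


store = TieredStore(dram_capacity=2)
for k in ("a", "b", "c"):
    store.put(k, k.upper())

# With room for two entries in DRAM, the oldest key has been pushed down.
for k in ("a", "b", "c"):
    print(k, store.tier_of(k))
```

Whether a product leans toward DRAM or toward flash/disk corresponds, in this toy model, to how large `dram_capacity` is relative to the data set.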
The key takeaway is that the interpretation of "In-Memory" can vary significantly among different products. Nevertheless, all these products share a significant component that revolves around storing data in memory.
Technical Differences of In-Memory Database vs In-Memory Data Grid
Comparing the two categories, let's start with the technical differences, which are quite straightforward.
Most In-Memory Databases are essentially traditional RDBMSs with one key distinction: they store data in memory instead of on disk. That's their main characteristic. They offer robust SQL support, with only a few unsupported SQL features. They come with ODBC/JDBC drivers and can often replace an existing RDBMS with minimal changes.
On the other hand, In-Memory Data Grids have a different approach. They don't provide full ANSI SQL support like In-Memory Databases. Instead, they focus on providing capabilities for Massively Parallel Processing (MPP). This means that data is distributed across a large cluster of commodity servers and processed in a parallel manner. The primary access pattern in In-Memory Data Grids is key/value access, along with features like MapReduce, various types of HPC-like processing, and limited capabilities for distributed SQL querying and indexing.
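To make the key/value and MapReduce access pattern more concrete, here is a minimal sketch. It assumes a hypothetical four-node cluster simulated inside one process: `owner_of`, `partition`, and `map_reduce` are illustrative names, and a thread pool stands in for the servers. It is not the API of any real data grid.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def owner_of(key):
    """Map a key to a node index with a stable hash."""
    return crc32(str(key).encode("utf-8")) % NUM_NODES


def partition(items):
    """Distribute key/value pairs across nodes by key hash."""
    nodes = defaultdict(dict)
    for key, value in items.items():
        nodes[owner_of(key)][key] = value
    return nodes


def map_reduce(nodes, map_fn, reduce_fn, initial):
    """Run map_fn on every partition in parallel, then fold the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(map_fn, nodes.values()))
    result = initial
    for partial in partials:
        result = reduce_fn(result, partial)
    return result


# Example data: 100 orders whose values are 10, 20, ..., 1000.
orders = {f"order-{i}": i * 10 for i in range(1, 101)}
nodes = partition(orders)

# Each node sums only its own partition; only the small partial sums are combined.
total = map_reduce(
    nodes,
    map_fn=lambda part: sum(part.values()),
    reduce_fn=lambda a, b: a + b,
    initial=0,
)
print(total)
```

The point of the pattern is that the computation travels to the data: each partition is processed where it lives, and only the partial results cross the network.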
It's worth noting that there is some overlap between In-Memory Data Grids and In-Memory Databases in terms of SQL support. For example, GridGain offers extensive and constantly expanding SQL support, including features like pluggable indexing, optimized distributed joins, custom SQL functions, and more.
Speed Only vs. Speed + Scalability of In-Memory Database vs In-Memory Data Grid
One of the key distinctions between In-Memory Data Grids and In-Memory Databases is their ability to handle large-scale operations across multiple servers. In-Memory Data Grids have an inherent capability for such scalability, thanks to their MPP architecture. On the other hand, In-Memory Databases struggle with scaling due to the inefficiency of performing SQL joins in a distributed context.
This is a little-known drawback of In-Memory Databases: SQL joins are a valuable feature, but they become an Achilles' heel when it comes to scalability. This fundamental limitation is why most existing SQL databases (whether disk- or memory-based) are built on a vertically scalable SMP (Symmetric Multiprocessing) architecture, unlike In-Memory Data Grids, which use a horizontally scalable MPP approach.
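A rough way to see why joins are costly in a distributed setting is to count how many rows sit on a different node than the row they must be joined with. The sketch below is a deliberate simplification under assumed names and data (`node_for`, a four-node cluster, synthetic orders and customers); real engines use further strategies that are not modeled here.

```python
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def node_for(key):
    """Assign a partitioning key to a node with a stable hash."""
    return crc32(str(key).encode("utf-8")) % NUM_NODES


# Synthetic data: 1000 orders spread over 50 customers, as (order_id, customer_id).
orders = [(order_id, 1 + order_id % 50) for order_id in range(1, 1001)]


def rows_to_move(orders, partition_key):
    """Count orders stored on a different node than the customer they join with."""
    moved = 0
    for order_id, customer_id in orders:
        order_node = node_for(partition_key(order_id, customer_id))
        customer_node = node_for(customer_id)  # customers are partitioned by their id
        if order_node != customer_node:
            moved += 1
    return moved


# Orders partitioned by their own id: many join partners live on another node.
by_order_id = rows_to_move(orders, lambda order_id, customer_id: order_id)

# Orders partitioned by customer id: every join partner is local.
by_customer_id = rows_to_move(orders, lambda order_id, customer_id: customer_id)

print(by_order_id, by_customer_id)
```

When orders are partitioned by customer id, nothing has to move for this particular join. An arbitrary SQL join cannot rely on such a layout, which is where the network cost comes from.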
It's worth noting that In-Memory Data Grids and In-Memory Databases can achieve similar speed in a local, non-distributed context, because both do all their processing in memory. However, only In-Memory Data Grids can natively scale to hundreds or thousands of nodes, which is where their advantage in scalability and throughput comes from.
Replace Database vs. Change Application
In addition to scalability, there is another important difference to consider when using In-Memory Data Grids or In-Memory Databases to speed up existing systems or applications.
An In-Memory Data Grid operates alongside an existing database and acts as a distributed in-memory storage and processing layer between the database and the application. The application then relies on this layer for extremely fast data access and processing. Most In-Memory Data Grids seamlessly read from and write to databases, if needed, and are generally well-integrated with existing databases.
However, developers need to make some changes to the application to take advantage of these new capabilities. The application can no longer rely solely on SQL; its developers must also learn to use MPP (Massively Parallel Processing), MapReduce, or other data processing techniques.
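The in-memory layer that sits between the application and the database can be sketched as a read-through/write-through cache. Everything here is an assumption made for illustration: the `ReadWriteThroughCache` class, the `kv` table, and the use of SQLite as a stand-in for the existing database. A real data grid would distribute the in-memory part across many nodes.

```python
import sqlite3


class ReadWriteThroughCache:
    """Toy in-memory layer in front of a database (single process, not distributed)."""

    def __init__(self, conn):
        self.conn = conn
        self.memory = {}
        self.db_reads = 0  # counts how often the database had to be consulted

    def get(self, key):
        if key in self.memory:
            return self.memory[key]  # served from memory, no database access
        # Read-through: on a miss, load from the database and keep a copy in memory.
        self.db_reads += 1
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self.memory[key] = row[0]
        return row[0]

    def put(self, key, value):
        # Write-through: update the database first, then the in-memory copy.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()
        self.memory[key] = value


# SQLite stands in for the existing database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv (k, v) VALUES ('user:1', 'Ada')")
conn.commit()

cache = ReadWriteThroughCache(conn)
first = cache.get("user:1")   # miss: loaded from the database
second = cache.get("user:1")  # hit: served from memory
cache.put("user:2", "Grace")  # written to both the database and memory
print(first, second, cache.db_reads)
```

Note that the application now calls `get` and `put` on the layer instead of issuing SQL directly, which is the kind of application change this approach requires.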
On the other hand, In-Memory Databases present the opposite scenario. They usually require replacing the existing database, unless you use one of the in-memory "options" of a traditional database to boost its performance temporarily. However, they demand fewer changes to the application itself, which can still rely on SQL, albeit a modified version of it.
Ultimately, both approaches have their own advantages and disadvantages. The choice between them may depend on organizational policies, politics, and technical considerations.
Conclusion
If you are developing a completely new system or application, the obvious choice is to go for In-Memory Data Grids. This allows you to work with the existing databases in your organization when needed, while also benefiting from the exceptional performance and scalability offered by In-Memory Data Grids. These two aspects are seamlessly integrated, giving you the best of both worlds.
However, if you are modernizing an existing enterprise system or application, the decision becomes more nuanced:
- If you can replace or upgrade your current disk-based RDBMS and cannot make changes to your applications, then opting for an In-Memory Database is the way to go. By replacing or upgrading your RDBMS, you can significantly boost your application's speed without having to extensively modify the application itself. While scalability may not be your primary concern, speed gains are still achievable.
- On the other hand, if you are unable to replace your existing disk-based RDBMS but have the flexibility to make changes to the data access subsystem of your application, then choosing an In-Memory Data Grid is recommended. With an In-Memory Data Grid, you can enhance your application's speed and achieve substantial scalability without altering your current database. This option allows you to strike a balance between speed and scalability.
The following table summarizes the decision:
| | In-Memory Data Grid | In-Memory Database |
|---|---|---|
| Existing Application | Changed | Unchanged |
| Existing RDBMS | Unchanged | Changed or Replaced |
| Speed | Yes | Yes |
| Max. Scalability | Yes | No |