How In-Memory Computing Works
In-memory computing is about two things: making computing faster and scaling it to support potentially petabytes of in-memory data. It achieves this with two key technologies: random-access memory (RAM) storage and parallelization.
Speed: RAM Storage
The first key is that in-memory computing takes the data from your disk drives and moves it into RAM. The hard drive is by far the slowest part of your server. A typical hard drive is literally a spinning disk, like an old-fashioned turntable. It has moving parts: the platter spins inside a sealed enclosure while an actuator arm, like the arm of a turntable, physically sweeps across the disk to read your data. In addition, moving data from your disk to RAM for processing takes time, which adds more delay to the speed at which you can process data. RAM, meanwhile, is the second-fastest component in your server. Only the processor is faster.
With RAM, there are no moving parts. Memory is just a chip. In physical terms, an electrical signal reads the information stored in RAM, traveling at a large fraction of the speed of light. When you move data from a disk to RAM storage, access to that data becomes anywhere from five thousand to a million times faster.
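To make those orders of magnitude concrete, here is a back-of-the-envelope calculation in Python. The latency figures are rough, widely cited ballpark numbers, not measurements from any particular machine.

```python
# Rough ballpark latencies (order of magnitude only, not measurements).
RAM_ACCESS_S = 100e-9   # ~100 nanoseconds for a main-memory access
DISK_SEEK_S = 10e-3     # ~10 milliseconds for a random read on a spinning disk

speedup = DISK_SEEK_S / RAM_ACCESS_S
print(f"RAM access is roughly {speedup:,.0f}x faster than a random disk read")
# -> RAM access is roughly 100,000x faster than a random disk read
```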
The human mind has a hard time grasping that kind of speed. We are talking about nanoseconds, microseconds, and milliseconds. A good analogy is that traditional computing is like a banana slug crawling through your garden at 0.007 miles per hour, while in-memory computing is like an F-18 fighter jet traveling at 1,190 miles per hour, nearly twice the speed of sound. In other words, disk drives are really, really slow. And when you copy all of your data from disk into RAM, computing becomes really, really fast.
You can look at it like a chef in a restaurant. The chef needs ingredients to cook his meals: that's your data. The ingredients might be in the chef's refrigerator or they might be ten miles down the road at the grocery store. The refrigerator is like RAM storage: The chef can instantly access the ingredients he needs. When he's done with the ingredients and the meal is finished, he puts the leftovers back in the refrigerator, all at the same time. The grocery store is like disk storage. The chef has to drive to the store to get the ingredients he needs. Worse, he has to pick them up one at a time. If he needs cheese, garlic, and pasta, he has to make one trip to the grocery store for the cheese, bring it back, and use it. Then he has to go through the whole process again for the garlic and the pasta. If that isn't enough, he has to drive the leftover ingredients back to the grocery store again, one by one, right after he's done using each of them.
But that's not all. Suppose you could make a disk drive as fast as RAM; flash drives are a step in that direction. The route traditional computing takes to reach information on a disk (processor to RAM to controller to disk) would still make it much slower than in-memory computing.
To return to our example, let's say there are two chefs: one representing in-memory computing and the other traditional computing. The chef representing in-memory computing has his refrigerator right next to him and he also knows exactly where everything is on the shelves. Meanwhile, the chef representing traditional computing doesn't know where any of the ingredients are in the grocery store. He has to walk down all of the aisles until he finds the cheese. Then he has to walk down the same aisles again for the garlic, then the pasta, and so on. That's the difference in efficiency between RAM and disk storage.
RAM versus Flash
Flash storage was created to replace the disk drive. When it's used for that purpose, it is called a solid-state drive, or SSD. SSDs are made of silicon and are typically five to ten times faster than disk drives. However, both flash storage and disk drives are usually attached to the same kind of controller in your computer. Even with flash, you still go through the same process of reading and writing as with a disk: the processor goes to RAM, RAM goes to the controller, and the controller retrieves the information from the drive.
Flash accesses the information faster than disk, but it still uses the same slow path to get the data to the processor. Moreover, because of an inherent limitation in flash's physical design, it endures only a finite number of writes before it needs to be replaced. Modern RAM, on the other hand, can be rewritten effectively without limit. Flash may be five to ten times faster than a standard disk drive, but RAM is up to a million times faster than the disk. Combined with the other benefits, there's no comparison.
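The same back-of-the-envelope arithmetic shows where flash sits between the two. Access times vary widely by device and workload, so treat these Python numbers as order-of-magnitude illustrations only.

```python
# Rough ballpark access latencies (order of magnitude only).
latencies = {
    "spinning disk": 10e-3,    # ~10 ms for a random read
    "flash (SSD)":   100e-6,   # ~100 microseconds
    "RAM":           100e-9,   # ~100 nanoseconds
}

ram = latencies["RAM"]
for device, seconds in latencies.items():
    print(f"{device:>13}: {seconds / ram:>9,.0f}x slower than RAM")
# spinning disk:   100,000x slower than RAM
#   flash (SSD):     1,000x slower than RAM
#           RAM:         1x slower than RAM
```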
Scale: Parallelization
RAM handles the speed of in-memory computing. The scalability of the technology comes from parallelization. Parallelization took off in the early 2000s to solve a different problem: the inadequacy of 32-bit processors. By 2012, most servers had switched to 64-bit processors, which can handle far more data. But in 2003, 32-bit processors were common and they were very limited: a 32-bit processor can form only 2^32 distinct memory addresses, so it couldn't manage more than four gigabytes of RAM at a time. Even if you put more RAM in the computer, the 32-bit processor couldn't see it. But the demand for more RAM storage kept growing anyway.
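The four-gigabyte ceiling falls straight out of that address arithmetic, as this quick Python check shows:

```python
# A 32-bit processor can form 2**32 distinct byte addresses,
# which caps the RAM it can see at 4 GiB.
print(2**32 / 2**30, "GiB")   # -> 4.0 GiB

# A 64-bit processor can form 2**64 addresses: 16 EiB in theory,
# far more RAM than you could actually install in one machine.
print(2**64 / 2**60, "EiB")   # -> 16.0 EiB
```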
The solution was to put the data into RAM across a lot of different computers. Once the data was broken up this way, each processor could address its own share. To the application, the cluster of computers looked like one computer with a huge pool of RAM. You split up the data and the tasks, you use the collective RAM for storage, and you use all the computers for processing. That was how you handled a heavy load in the 32-bit world, and it was called parallelization, or massively parallel processing (MPP).
When 64-bit processors were released, they could address a practically unlimited amount of RAM, and parallelization was no longer necessary for its original purpose. But the in-memory computing world found a different way to take advantage of it: scalability.
Even though 64-bit processors could handle far more data, it was still impossible for a single computer to support a billion users. But when you distributed the processing load across many computers, that kind of support became possible. Better yet, if the number of users increased, all you had to do was add a few more computers to grow with them.
Picture a row of six computers. You could have thousands of computers, but we'll use six for this example. These computers are connected through a network, so we call them a cluster. Now imagine you have an application that will draw a lot of traffic, too much traffic to store all of the data on one computer. With parallelization, you take your application and break its data into pieces. Then you put one piece in computer 1, another piece in computer 2, and so on until the data is distributed evenly across the cluster. Your single application runs on the whole cluster. When the cluster gets a request, it knows which computer holds the relevant data and processes it in that machine's RAM. The work goes to the data; the data doesn't move around the way it does in traditional computing.
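Here is a minimal sketch of that idea in Python. The node names and the hash-based placement rule are illustrative assumptions, not any particular product's API; real systems use more sophisticated schemes such as consistent hashing.

```python
from zlib import crc32

# Hypothetical six-node cluster from the example above.
NODES = [f"computer-{i}" for i in range(1, 7)]

def node_for(key: str) -> str:
    """Map a key to the node whose RAM holds its piece of the data."""
    # Simple hash partitioning: any node can compute this locally,
    # so a request goes straight to the machine that owns the data.
    return NODES[crc32(key.encode()) % len(NODES)]

# Each node keeps its share of the data in its own RAM.
ram = {node: {} for node in NODES}

def put(key: str, value) -> None:
    ram[node_for(key)][key] = value

def get(key: str):
    return ram[node_for(key)].get(key)

put("user:42", {"name": "Ada"})
print(node_for("user:42"), get("user:42"))
```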
Even better, you can replicate specific parts of your data on different computers in the same cluster. In our example, let's say the data on computer 6 is in high demand. You can add another computer to the cluster that carries the same data. That way, not only can you handle things faster, but if computer 6 goes down, the extra one just takes over and carries on as usual.
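Replication can be sketched the same way: keep a second copy of each piece on a neighboring node, and read from the replica when the primary is down. Again, this is a self-contained illustration with assumed names, not a production design; real systems handle consistency and rebalancing far more carefully.

```python
from zlib import crc32

NODES = [f"computer-{i}" for i in range(1, 7)]
ram = {node: {} for node in NODES}
down = set()                      # nodes that have failed

def owners(key: str) -> list[str]:
    """Primary node plus one replica on the next node in the ring."""
    i = crc32(key.encode()) % len(NODES)
    return [NODES[i], NODES[(i + 1) % len(NODES)]]

def put(key: str, value) -> None:
    for node in owners(key):      # write to primary and replica
        ram[node][key] = value

def get(key: str):
    for node in owners(key):      # fall back to the replica on failure
        if node not in down:
            return ram[node].get(key)
    raise RuntimeError("all replicas down")

put("user:42", {"name": "Ada"})
down.add(owners("user:42")[0])    # simulate the primary failing
print(get("user:42"))             # the replica answers as usual
```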
If you tried to scale up like this with a single computer, it would get more and more expensive, and at the end of the day it would still slow you down. With parallelization, in-memory computing lets you scale out to meet demand roughly linearly and with no practical limit.
Let's return to the chef analogy, where a computer processor is a chef and memory storage is the chef's stove. A customer comes in and orders an appetizer. The chef cooks the appetizer on his one stove right away and the customer is happy.
Now what happens when 20 customers order appetizers? The one chef with his one stove can't handle it. That 20th customer is going to wait three hours to get her appetizer. The solution is to bring in more chefs with more stoves, all of them trained to cook the appetizer the same way. The more customers you get, the more chefs and stoves you bring into the picture so that no one has to wait. And if one stove breaks, it's no big deal: plenty of other stoves in the kitchen can take its place.
The Internet has created a level of scale that would have been unheard of just 15 or 20 years ago. Parallelization gives in-memory computing the power to scale to fit the world.