Hadoop® Accelerator

The GridGain In-Memory Accelerator for Hadoop® is a purpose-built product developed on top of the GridGain In-Memory Computing Platform. It is a plug-and-play solution optimized for in-memory processing that can be downloaded and installed in 10 minutes, and it accelerates MapReduce and Hive jobs by up to 10x. It offers the industry’s first dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS. In-memory HDFS, in-memory MapReduce and in-memory Hive provide an easy-to-use extension to disk-based HDFS and traditional Hadoop environments, with file-system operations running up to roughly 25x faster in the benchmarks listed under Performance.

The In-Memory Accelerator for Hadoop requires zero code change and works with any commercial or open source distribution of Hadoop 2.x.

Features

The GridGain In-Memory Accelerator for Hadoop enhances existing Hadoop technology to enable fast data processing with the tools and technology your organization already uses. It was designed to eliminate the trade-offs of adding real-time capabilities to existing Hadoop systems.

GridGain’s in-memory file system (GGFS) supports a dual mode: it can work either as a standalone primary file system in the Hadoop cluster, or in tandem with HDFS, serving as an intelligent caching layer with HDFS configured as the primary file system. As a caching layer it provides highly tunable read-through and write-through logic, and users can freely select which files or directories are cached and how.
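The caching mode described above can be illustrated with a small toy model. This is only a sketch of read-through and write-through caching with user-selected paths, not the GGFS API: the names `DictStore`, `CachingLayer` and `cached_patterns` are hypothetical, and a plain Python dictionary stands in for both HDFS and the in-memory tier.

```python
from fnmatch import fnmatch


class DictStore:
    """Hypothetical stand-in for the primary (disk-based) file system."""

    def __init__(self):
        self.files = {}
        self.reads = 0  # counts how often the backing store is actually read

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer in front of a backing store (not the GGFS API)."""

    def __init__(self, backing, cached_patterns):
        self.backing = backing
        self.cached_patterns = cached_patterns  # which paths the user wants cached
        self.memory = {}

    def _is_cached(self, path):
        return any(fnmatch(path, pattern) for pattern in self.cached_patterns)

    def read(self, path):
        if path in self.memory:
            return self.memory[path]      # served from memory
        data = self.backing.read(path)    # read-through to the backing store
        if self._is_cached(path):
            self.memory[path] = data      # keep it in memory for next time
        return data

    def write(self, path, data):
        self.backing.write(path, data)    # write-through: backing store stays current
        if self._is_cached(path):
            self.memory[path] = data


hdfs = DictStore()
hdfs.write("/cold/b.txt", b"cold")
hdfs.write("/hot/c.txt", b"warm-me")

fs = CachingLayer(hdfs, cached_patterns=["/hot/*"])
fs.write("/hot/a.txt", b"hot")   # lands in both memory and the backing store
fs.read("/hot/c.txt")            # first read goes to the backing store
fs.read("/hot/c.txt")            # second read is served from memory
fs.read("/cold/b.txt")           # not selected for caching: always read-through
fs.read("/cold/b.txt")
fs.read("/hot/a.txt")            # served from memory after the write-through

print("backing-store reads:", hdfs.reads)
```

In this model only paths matching the selected patterns occupy memory, while every write still reaches the backing store, so the primary file system remains the source of truth.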

GridGain’s in-memory implementation of MapReduce allows users to effectively parallelize the processing of in-memory data stored in GGFS. It requires zero code change to the users’ Hadoop MapReduce code, and eliminates the overhead associated with job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.

GridGain MapReduce

The ETL-free architecture of GridGain’s In-Memory Accelerator for Hadoop enables companies to process live data in Hadoop without offloading it to downstream systems to gain the performance advantage of in-memory computing. It avoids duplication of data and eliminates unnecessary data movement that typically clogs the network and I/O subsystems.

The GridGain In-Memory Accelerator for Hadoop comes with a comprehensive GUI-based management and monitoring tool (GridGain Visor).

Performance

The GridGain In-Memory Accelerator for Hadoop delivers up to a 10x performance gain for existing I/O-, network- or CPU-intensive Hadoop MapReduce jobs and HDFS operations, providing instant acceleration to existing Hadoop-based systems and products.

The GridGain File System (GGFS) is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The benchmarks below compare raw GGFS and HDFS performance on the same set of operations:

Benchmark             GGFS, ms   HDFS, ms   Boost, %
File Scan                   27        667      2470%
File Create                 96        961      1001%
File Random Access         413       2931       710%
File Delete                185       1234       667%

The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10 GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
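The "Boost, %" column appears to be the HDFS time divided by the GGFS time, expressed as a percentage; that reading is an inference from the figures rather than something the text states. The short sketch below recomputes the column under that assumption, using only the timings from the table (the names `BENCHMARKS` and `boost_percent` are illustrative).

```python
# Timings copied from the benchmark table: (GGFS ms, HDFS ms).
BENCHMARKS = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(ggfs_ms, hdfs_ms):
    """Assumed definition: HDFS time as a percentage of GGFS time."""
    return round(hdfs_ms / ggfs_ms * 100)


results = {name: boost_percent(ggfs, hdfs) for name, (ggfs, hdfs) in BENCHMARKS.items()}

for name, boost in results.items():
    # A boost of N% corresponds to the operation running N/100 times faster.
    print(f"{name}: {boost}% (about {boost / 100:.1f}x)")
```

Under this definition the recomputed values match the published column, so a boost of 2470% means the file scan ran about 24.7 times faster on GGFS than on HDFS.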

The GridGain In-Memory Accelerator for Hadoop relies on an in-memory optimized, YARN-based MapReduce implementation that eliminates the standard overhead of Hadoop’s job tracker polling and of task tracker process creation, deployment and provisioning.

Plug-and-Play

The In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) and can be downloaded and installed in less than 10 minutes. It is a completely free, first-of-its-kind in-memory Hadoop plugin that works with your choice of open source or commercial Hadoop distribution.

The GridGain in-memory file system (GGFS) is 100% HDFS-compatible and provides plug-and-play integration that works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce and Hive support require zero code change to existing application logic and provide dramatic performance boosts for CPU-intensive tasks.