GridGain Developers Hub

Data Partitioning

Data partitioning is a method of subdividing large sets of data into smaller chunks and distributing them between all server nodes in a balanced manner.

How Data is Partitioned

When the table is created, it is always assigned to a distribution zone. Based on the distribution zone parameters, data is separated into PARTITIONS parts, called partitions, stored REPLICAS times across the cluster. Each partition is identified by a number from a limited set (0 to 24 by default). Each individual copy of a partition is called a replica, and is stored on separate nodes, if possible.

Once partitions and all replicas are created, they are distributed across the available cluster nodes that are included in the distribution zone following the DATA_NODES_FILTER parameter and according to the affinity function. Thus, each key is mapped to a list of nodes owning the corresponding partition and is stored on those nodes. When data is added, it is distributed evenly between partitions.

The nodes store partitions in the folder specified in the ignite.system.partitionsBasePath, or in the work/partitions folder. The nodes also store partition-specific RAFT logs in the ignite.system.partitionsLogPath with the information on RAFT elections and consensus.

Primary Replicas and Leases

Once the partitions are distributed on the nodes, GridGain forms replication groups for all partitions of the table, and each group elects its leader. To linearize writes to partitions, GridGain designates one replica of each partition as a primary replica.

To designate a primary replica, GridGain uses a process of granting a lease. Leases are granted by the lease placement driver, and signify the node that houses the primary replica, called a lease holder. Once the lease is granted, information about it is written to the metastorage, and provided to all nodes in the cluster. Usually, the primary replica will be the same as replication group leader.

Granted leases are valid for a short period of time and are extended every couple of seconds, preserving the continuity of each lease. A lease cannot be revoked until it expires. In exceptional situations (for example, when primary replica is unable to serve as primary anymore, leaseholder node goes offline, replication group is inoperable, etc.) the placement driver waits for the lease to expire and then initiates the negotiation of the new one.

Only the primary replica can handle operations of read-write transactions. Other replicas of the partition can be read from by using read-only transactions.

If a new replica is chosen to receive the lease, it first makes sure it is up-to-date with its replication group by stored data. In scenarios where replication group is no longer operable (for example, a node unexpectedly leaves the cluster and the group loses majority), it follows the disaster recovery procedure, and you may need to reset the partitions manually.

Partition Rebalance

When the cluster size changes, GridGain waits for the timeout specified in the DATA_NODES_AUTO_ADJUST_SCALE_UP or DATA_NODES_AUTO_ADJUST_SCALE_DOWN distribution zone properties, and then redistributes partitions according to affinity function and transfers data to make it up-to-date with the replication group. This process is called data rebalance