GridGain Developers Hub

Cluster Lifecycle

This topic covers the information covering GridGain 9 cluster initialization and lifecycle.

Node Start Without a Running Cluster

When nodes start, they check all addresses listed in the node finder configuration (network.nodeFinder).

All nodes found are added to the physical topology - the nodes that know each other. All nodes also share information about other nodes, so the topology will include all nodes whose address is listed in at least one other node. If there are nodes that are completely separated (for example, nodes A and B only know about each other, same with C and D), they will form separate physical topologies (in this example, you would have cluster with A and B, and another cluster with C and D).

If there is no running cluster, the nodes exchange information about the network, but do not start any processes until the cluster initialization command is received, nor do they form a logical topology - the nodes that are verified and form a cluster.

Node Requirements

All nodes in cluster must have similar time, that can be different by no more than schemaSync.maxClockSkew. This is necessary for correct transaction operation.

As network latency can be unpredictable, some requests may take so long to arrive that the time will be different on the receiving node by the time request arrives. To account for the delay, set the schemaSync.delayDuration property to the time that is long enough for the schema updates to be delivered to all nodes in the cluster. However, this delay also affects how long it takes for DDL to be executed, as all nodes need to wait for the delay to pass before applying the update.

Cluster Initialization

When the cluster init command is received by any node in the cluster, it starts the initialization process.

First, the nodes specified in the --cluster-management-group argument form a RAFT group and take the role of cluster management group (CMG) - a group of nodes that handles cluster operation. If --cluster-management-group is not specified, the nodes listed in the --metastorage-group are used.

Then, the nodes specified in the --metastorage-group argument form a RAFT group and take the role of metastorage group. These nodes will hold the authoritative copy of cluster meta information.

Once the 2 raft groups are started and elect their leaders, all other nodes in the topology are notified that the cluster is started, and they can join it. At this point, the cluster is considered initialized and can start receiving requests.

Each non-leader node receives the invitation from the CMG to join the cluster and forms a validation request. Then, the request is sent to the CMG, and, after validation, the node receives cluster meta information from the metastorage group and joins the cluster.

The nodes are also added to the cluster logical topology - the nodes that are verified and accepted by CMG as part of the cluster. When nodes shut down or leave the physical topology for any other reason, the cluster logical topology is immediately adjusted.

Cluster Management Group

Cluster management group stores information about the cluster, the list of nodes that are in the cluster, and handles all cluster logical topology changes. Due to using RAFT consensus algorithm, the CMG improves the protection from split-brain (as any cluster group losing the CMG majority will no longer be fully functional).

It is recommended to have the CMG of 3, 5 or 7 nodes. Larger management group improves stability, as it reduces the odds of losing the majority of CMG nodes, but may cause a minor performance hit.

Losing the majority of CMG nodes leaves the cluster mostly functional. The cluster without the CMG majority can still handle transactions and user requests, but cannot:

  • Add new nodes to logical topology.

  • Re-add nodes that left the cluster to logical topology.

  • Create new table indexes. In this scenario, CREATE INDEX DDL operation will never be fully resolved and will hang the application.

To restore full cluster functionality, bring the offline members of CMG back online.

The CMG stores the following information:

  • Current cluster state, including what nodes are in CMG and metastorage groups, what GridGain version is used and cluster tag.

  • Consistent IDs of all nodes in the logical topology.

  • Node validation status.

Cluster Metastorage Group

Cluster metastorage group stores information about the data stored in the cluster, and handles data distribution.

It is recommended to have the metastorage of 3, 5 or 7 nodes. Larger management group improves stability, as it reduces the odds of losing the majority of metastorage nodes, but may cause a minor performance hit.

Losing the majority of metastorage nodes will turn the cluster inoperable and may lead to data loss.

The metastorage contains the following information:

  • Cluster catalog - the single storage of all meta information about the cluster - table schemas, indexes, views, distribution zone information, etc.

  • Logical topology history.

  • Other data required for cluster operation.

Node Join Scenarios

New Nodes Joining the Cluster

When a new node is started, it adds itself to the physical topology. Then, the CMG receives the event that a new node has joined the topology, and sends it an invitation to join the cluster. Once the node receives it, it sends the validation request with node information, which the CMG verifies and adds the node to the logical topology.

Node Rejoins the Cluster

If the node leaves the physical topology (for example, because the machine with the node is unreachable), the cluster logical topology is immediately adjusted, and the node is excluded from it. It can no longer rejoin the cluster with the same node ID.

To rejoin the cluster, the node must be restarted. During the restart, a new ID will be generated and the node will be able to join the physical and logical topology.

When a node reappears in the physical topology, the CMG sends it an invitation to join. The node then asks the CMG to validate itself, and, if this is successful, it starts its components (doing local recovery on the way), after which it tells the CMG that it’s ready to join. The CMG then adds it to the logical topology. This is the same process as the first join of a blank node.