Disaster Recovery for System Groups
A GridGain 9 cluster includes two system RAFT groups, the Cluster Management Group (CMG) and the Metastorage Group (MG). Both are essential for the cluster’s normal operation.
You perform disaster recovery operations on system RAFT groups to recover from a permanent loss of majority. When a system RAFT group loses its majority, it becomes unavailable. When CMG is unavailable, the cluster itself remains available with limitations: it can still process most operations, but it cannot join new nodes, start or restart existing nodes, or start building new indexes. When MG is unavailable, the cluster becomes unusable; it cannot handle even GET/PUT/SQL requests.
Majority loss is reported in the cluster logs, either in the console or in the rotated log files. When a RAFT group becomes unavailable, the logs show a message similar to one of the following:
Send with retry timed out [retryCount = 11, groupId = cmg_group].
Send with retry timed out [retryCount = 11, groupId = metastorage_group].
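The following is a minimal sketch of how such messages could be picked out of a log automatically. It relies only on the message text quoted above; the timestamp and log-level prefix in the sample lines and the function name find_unavailable_groups are assumptions made for illustration, not part of GridGain.

```python
import re

# Matches the timeout message quoted above. Only the two system group IDs
# that appear in the sample messages are recognized.
TIMEOUT_RE = re.compile(
    r"Send with retry timed out "
    r"\[retryCount = (\d+), groupId = (cmg_group|metastorage_group)\]"
)


def find_unavailable_groups(log_lines):
    """Return the set of system group IDs that reported a retry timeout."""
    groups = set()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            groups.add(match.group(2))
    return groups


# Hypothetical log lines: the prefix format is an assumption.
sample = [
    "2024-01-01 12:00:00 WARN Send with retry timed out "
    "[retryCount = 11, groupId = cmg_group].",
    "2024-01-01 12:00:01 INFO unrelated message",
]
print(find_unavailable_groups(sample))
```

A single timeout message does not by itself prove that the loss is permanent, so a result from this check is best treated as a prompt to investigate further.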
One indicator that CMG is down is a node that does not start after a restart command. In the log, this appears as the message "Local CMG state recovered, starting the CMG" that is not followed by "Successfully joined the cluster".
If a node tries to start when CMG is available but MG is not, the log shows "Metastorage info on start" that is not followed by "Performing MetaStorage recovery".
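The "message A not followed by message B" checks described above can be sketched as follows. The four message strings come from the text; the function names and the returned labels are hypothetical, and the sketch assumes the log lines are supplied in chronological order.

```python
CMG_START = "Local CMG state recovered, starting the CMG"
CMG_OK = "Successfully joined the cluster"
MG_START = "Metastorage info on start"
MG_OK = "Performing MetaStorage recovery"


def stalled_after(log_lines, start_marker, success_marker):
    """True if the last start_marker has no success_marker after it."""
    last_start = None
    for index, line in enumerate(log_lines):
        if start_marker in line:
            last_start = index
    if last_start is None:
        # The node never reached this startup stage in the given log.
        return False
    following = log_lines[last_start + 1:]
    return not any(success_marker in line for line in following)


def diagnose_startup(log_lines):
    """Classify a node's startup log using the two indicators above."""
    if stalled_after(log_lines, CMG_START, CMG_OK):
        return "cmg_unavailable"
    if stalled_after(log_lines, MG_START, MG_OK):
        return "mg_unavailable"
    return "ok"


print(diagnose_startup([CMG_START]))
```

A node that is still in the middle of a normal startup looks the same to this check, so it is only meaningful once the node has had enough time to finish starting.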
Cluster Management Group
If CMG loses majority:
- Restart CMG nodes to restore the lost majority.
- If the above fails, forcefully assign a new majority using the following CLI command (manually or via REST):
  recovery cluster reset --url=<node-url> --cluster-management-group=<new-cmg-nodes>
  The command is sent to the node indicated by the --url parameter, which must belong to the new-cmg-nodes RAFT group. This node becomes the Repair Conductor, and it initiates the reset procedure.
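As an illustration of the constraint that the node addressed by --url must be one of the new CMG nodes, here is a small sketch that assembles the command line and rejects an invalid choice of Repair Conductor. The helper name, the node names, and the URLs are hypothetical, and passing the new CMG nodes as a comma-separated list is an assumption about the value format.

```python
def build_cmg_reset_command(conductor_name, new_cmg_nodes, node_urls):
    """Build the 'recovery cluster reset' command line for a CMG reset.

    conductor_name: name of the node that will receive the command.
    new_cmg_nodes:  names of the nodes that should form the new CMG.
    node_urls:      mapping of node name to node URL (hypothetical input).
    """
    if conductor_name not in new_cmg_nodes:
        raise ValueError(
            "The node addressed by --url must be one of the new CMG nodes"
        )
    if conductor_name not in node_urls:
        raise ValueError("No URL known for node " + conductor_name)
    return (
        "recovery cluster reset"
        " --url=" + node_urls[conductor_name]
        # Assumption: node names are passed as a comma-separated list.
        + " --cluster-management-group=" + ",".join(new_cmg_nodes)
    )


# Hypothetical node names and URLs.
urls = {"node1": "http://localhost:10300", "node2": "http://localhost:10301"}
print(build_cmg_reset_command("node1", ["node1", "node2"], urls))
```

Validating the choice before issuing the command keeps a simple mistake, such as addressing a node outside the new group, from reaching the cluster.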
The above procedure might fail for the following reasons:
- Some of the nodes specified in new-cmg-nodes are not in the physical topology.
- The Repair Conductor does not have all the information it needs to start the procedure.
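The first failure reason can be checked before the command is issued. The sketch below assumes you already have the list of node names in the physical topology, however it was obtained; the function name and the node names are hypothetical.

```python
def missing_from_topology(new_cmg_nodes, physical_topology):
    """Return the requested CMG nodes that are absent from the topology.

    The order of new_cmg_nodes is preserved in the result.
    """
    present = set(physical_topology)
    return [name for name in new_cmg_nodes if name not in present]


# Hypothetical node names.
missing = missing_from_topology(["node1", "node4"], ["node1", "node2", "node3"])
if missing:
    print("Not in the physical topology: " + ", ".join(missing))
```

An empty result only rules out the first reason; it says nothing about whether the Repair Conductor has the information it needs.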