Configuring Replication
This chapter explains how to configure your GridGain clusters for data replication by using the CLI tool.
Preparing for Replication
By default, data center replication assumes that the clusters share the same schema and that the replica cluster database is empty. If the replica cluster's tables already contain data, that data is not overwritten; new data is added to the tables instead.
Data replication pulls updates from a remote cluster, called the source cluster. The cluster the data is copied to is called the replica cluster.
Connecting to Source Server
To connect to a source cluster, use the dcr create command:
dcr create --name replication_name --source-cluster-address=127.0.0.1:10800
The address of the source cluster should be the address of any node in that cluster.
This command creates a connection, but does not start replication on its own. The replication name is used to refer to this replication in later commands.
Starting Replication
Active-Passive Replication
To start replication, use the dcr start command:
dcr start replication_name --schema=PUBLIC --all
The cluster will start the data replication process from the cluster running on the 127.0.0.1 address, and will copy all tables from it.
When the replication is first started, the source cluster will perform a special operation called full state transfer, fully replicating the data from the source cluster to the replica cluster.
Afterwards, updates to the source cluster will be copied to the replica cluster.
You can configure more specific replication by limiting the scope or nodes that are allowed to participate as described in the Replication Configuration section.
Active-Active Replication
The only difference between starting active-passive and active-active replication is that you need to configure both clusters to replicate data to each other.
The example below assumes that you have 2 nodes on addresses 127.0.0.1 and 127.0.0.2 that belong to different clusters. It then configures replication on both clusters, each targeting the other.
connect http://127.0.0.1:10300
dcr create --name replication_name --source-cluster-address=127.0.0.2:10800
dcr start replication_name --schema=PUBLIC --all
disconnect
connect http://127.0.0.2:10300
dcr create --name replication_name --source-cluster-address=127.0.0.1:10800
dcr start replication_name --schema=PUBLIC --all
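Because the two halves of the setup mirror each other, the command sequence can be generated instead of typed by hand. The following Python sketch only assembles the same CLI lines used in the example above. The function name active_active_script is hypothetical, and the ports 10300 and 10800 are simply the values that appear in the example, assumed here to be the same on both hosts.

```python
def active_active_script(host_a, host_b, name="replication_name",
                         rest_port=10300, client_port=10800):
    """Return the CLI lines that make two clusters replicate to each other.

    The ports are the ones used in the example above; adjust them if your
    clusters listen elsewhere (assumption: both clusters use the same ports).
    """
    lines = []
    pairs = [(host_a, host_b), (host_b, host_a)]
    for index, (local, remote) in enumerate(pairs):
        # Connect the CLI to the local cluster, then point it at the remote one.
        lines.append(f"connect http://{local}:{rest_port}")
        lines.append(
            f"dcr create --name {name} "
            f"--source-cluster-address={remote}:{client_port}"
        )
        lines.append(f"dcr start {name} --schema=PUBLIC --all")
        # Disconnect only between the two halves, as in the example.
        if index < len(pairs) - 1:
            lines.append("disconnect")
    return lines


script = active_active_script("127.0.0.1", "127.0.0.2")
print("\n".join(script))
```

The printed lines can be pasted into the CLI session one by one; the sketch does not run them itself.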
Conflict Resolution
If the data was updated on both clusters before the data was replicated, the update with a later timestamp will be kept.
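The rule above is a form of last-write-wins resolution. The Python sketch below illustrates the idea only; it is not GridGain's implementation. The Update class and resolve function are hypothetical names, and the behavior on equal timestamps is an assumption of the sketch, since this chapter does not describe it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    timestamp: datetime  # when the update was made


def resolve(local, remote):
    """Keep whichever update carries the later timestamp."""
    # Assumption: on equal timestamps this sketch keeps the local update.
    # The chapter does not specify how ties are handled.
    return remote if remote.timestamp > local.timestamp else local


a = Update("k1", "from cluster A",
           datetime(2024, 10, 1, 12, 0, 0, tzinfo=timezone.utc))
b = Update("k1", "from cluster B",
           datetime(2024, 10, 1, 12, 0, 5, tzinfo=timezone.utc))

winner = resolve(a, b)
print(winner.value)  # from cluster B
```

Both clusters reach the same result regardless of which side treats its own update as local, because the later timestamp wins in either direction.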
One-Time Replication
In some scenarios, it is not necessary to copy data dynamically. Instead, you can flush data once. When running the flush command, you specify the replication time in ISO format. All data up to that point will be replicated, and then the replication will stop automatically.
dcr flush replication_name --flush-point=2024-10-01T12:00:00+01:00
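If the flush point is computed by a script, the timestamp has to be rendered in the same ISO form as in the command above. The Python sketch below shows one way to do that with the standard library. The helper name flush_command is hypothetical, and rejecting timestamps without a UTC offset is a choice made by this sketch to avoid ambiguity, not a documented CLI requirement.

```python
from datetime import datetime, timedelta, timezone


def flush_command(name, point):
    """Build a dcr flush line with an ISO 8601 flush point."""
    # Assumption of this sketch: insist on an explicit UTC offset so the
    # flush point is unambiguous.
    if point.tzinfo is None:
        raise ValueError("flush point must carry a UTC offset")
    stamp = point.isoformat(timespec="seconds")
    return f"dcr flush {name} --flush-point={stamp}"


point = datetime(2024, 10, 1, 12, 0, 0,
                 tzinfo=timezone(timedelta(hours=1)))
cmd = flush_command("replication_name", point)
print(cmd)
# dcr flush replication_name --flush-point=2024-10-01T12:00:00+01:00
```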
Replication Configuration
Limiting Replicating Nodes
By default, all nodes in the cluster are involved in the replication process. Sending data to the replica cluster can increase the load on nodes and slow down your application. To avoid this, you can narrow down the list of nodes that send data by listing the consistent IDs of all nodes involved in replication in the replication-nodes option.
dcr start replication_name --schema=PUBLIC --replication-nodes=defaultNode,otherNode --all
Limiting Replication Scope
Replication is tracked on a per-table basis, so you can replicate data only from some tables in the cluster. To do this, use the tables option instead of all.
dcr start replication_name --schema=PUBLIC --tables=myTable1,myTable2
You can add more tables to the replication process after it is started by starting the replication for them:
dcr start replication_name --schema=PUBLIC --tables=myTable3,myTable4
You can stop replication of only specific tables as well:
dcr stop replication_name --tables=myTable1
In this case, the replication will continue for all other tables.
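The effect of the three commands above on the replication scope can be pictured as operations on a set of table names. The Python sketch below is a toy model for illustration only; the ReplicationScope class is hypothetical and says nothing about how the CLI or the cluster stores this state.

```python
class ReplicationScope:
    """Toy model of the set of tables a replication currently covers."""

    def __init__(self):
        self.tables = set()

    def start(self, *tables):
        # Starting replication for more tables adds them to the scope.
        self.tables.update(tables)

    def stop(self, *tables):
        # Stopping specific tables leaves all other tables untouched.
        self.tables.difference_update(tables)


scope = ReplicationScope()
scope.start("myTable1", "myTable2")   # dcr start ... --tables=myTable1,myTable2
scope.start("myTable3", "myTable4")   # dcr start ... --tables=myTable3,myTable4
scope.stop("myTable1")                # dcr stop ... --tables=myTable1

print(sorted(scope.tables))  # ['myTable2', 'myTable3', 'myTable4']
```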
Replication on Secured Clusters
If your cluster has security enabled, you need to provide user credentials to the dcr create command to establish a secure connection and authorize the replication. The user needs to have a role with permissions that allow reading from and writing to the tables that are being replicated.
dcr create replication_auth --source-cluster-address=127.0.0.1:10800 --username admin --password myPass
dcr start replication_auth --schema=PUBLIC --all
If your cluster is further secured with SSL, you need to provide the keystore and truststore used to connect to it in the dcr create command:
dcr create replication_ssl --source-cluster-address=127.0.0.1:10800 --keyStorePath=path-to-keystore --keyStorePassword=myPass --trustStorePath=path-to-truststore --trustStorePassword=myPass
dcr start replication_ssl --schema=PUBLIC --all
Checking Replication Status
You can list the currently existing replications by using the list command:
dcr list
This command lists all existing replications and their statuses. You can get more in-depth information on each replication by using the status command:
dcr status replication_name
Ending Replication
To stop replication, first stop the replication process for all tables:
dcr stop replication_name --all
This will stop the source cluster from sending updates, but will not delete the replication connection. Replication can be resumed if necessary.
To completely remove the replication, use the delete command:
dcr delete replication_name
This will permanently delete the replication configuration. If replication is recreated, it will need to first synchronize the clusters, instead of resuming where it stopped before.