Generic Metrics
This page describes generic GridGain metrics, a new system designed to replace the legacy metrics system. Generic metrics cleanly separate collecting and storing metrics from viewing and exporting them. This page covers the basics of the new metrics system and explains how you can use it to monitor your cluster.
There are different types of metrics in GridGain. Each metric has a name and a return value. The return value can be a simple value such as string, long, or double. Alternatively, the value can represent a Java object. Some metrics represent histograms.
Metrics are exported through components called metric exporters.
Metric Registers
Metrics are grouped into categories called registers. Each register has a name. The full name of a specific metric within the register consists of the register name followed by a dot, followed by the name of the metric: <register_name>.<metric_name>.
For example, the register for data storage metrics is called io.datastorage. The metric that returns the storage size is called io.datastorage.StorageSize.
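The naming rule can be sketched in a few lines of plain Java. The class and method names below are hypothetical and are not part of the GridGain API. The sketch matches on the register prefix rather than splitting on the last dot, because some metric names listed on this page (for example, memory.heap.used in the sys register) contain dots themselves.

```java
/** Hypothetical helper that illustrates the <register_name>.<metric_name> rule. */
final class MetricNames {
    /** Builds the full metric name from a register name and a metric name. */
    static String fullName(String register, String metric) {
        return register + "." + metric;
    }

    /** Returns true if the full metric name belongs to the given register. */
    static boolean inRegister(String fullName, String register) {
        return fullName.startsWith(register + ".");
    }

    /** Strips the register prefix and returns the metric name. */
    static String metricPart(String fullName, String register) {
        if (!inRegister(fullName, register))
            throw new IllegalArgumentException(fullName + " is not in register " + register);

        return fullName.substring(register.length() + 1);
    }

    public static void main(String[] args) {
        String name = fullName("io.datastorage", "StorageSize");

        System.out.println(name);
        System.out.println(metricPart("sys.memory.heap.used", "sys"));
    }
}
```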
Metric Exporters
An exporter provides a mechanism for accessing all the available metrics.
GridGain includes the following exporters:
- JMX (default)
You can create a custom exporter by implementing the MetricExporterSpi interface.
If you want to enable metrics, configure one or multiple metric exporters in the node configuration. This is a node-specific configuration, which means it enables metrics only on the node where it is specified.
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="metricExporterSpi">
        <list>
            <bean class="org.apache.ignite.spi.metric.jmx.JmxMetricExporterSpi"/>
            <bean class="org.apache.ignite.spi.metric.log.LogExporterSpi"/>
            <bean class="org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi"/>
        </list>
    </property>
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setMetricExporterSpi(new JmxMetricExporterSpi());
Ignite ignite = Ignition.start(cfg);
This API is not presently available for C#/.NET. You can use XML configuration.
This API is not presently available for C++. You can use XML configuration.
The following sections describe the exporters available in Ignite by default.
JMX
org.apache.ignite.spi.metric.jmx.JmxMetricExporterSpi exposes metrics via JMX beans.
IgniteConfiguration cfg = new IgniteConfiguration();
JmxMetricExporterSpi jmxExporter = new JmxMetricExporterSpi();
//export cache metrics only
jmxExporter.setExportFilter(mreg -> mreg.name().startsWith("cache."));
cfg.setMetricExporterSpi(jmxExporter);
This API is not presently available for C++.
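An export filter is an ordinary predicate over registers, so several conditions can be combined in one lambda. The sketch below is self-contained and does not use the GridGain API: it applies a predicate to plain register names, whereas the real filter receives a register object and reads its name through name(). The class name and the sample list of registers are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative stand-in: filters register names the way an export filter would. */
final class ExportFilterSketch {
    /** Accepts every cache register plus the transaction register. */
    static final Predicate<String> CACHE_AND_TX =
        name -> name.startsWith("cache.") || name.equals("tx");

    /** Returns the register names that pass the filter, in their original order. */
    static List<String> exported(List<String> registerNames, Predicate<String> filter) {
        return registerNames.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("sys", "cache.myCache", "tx", "io.datastorage");

        System.out.println(exported(all, CACHE_AND_TX)); // prints [cache.myCache, tx]
    }
}
```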
SQL View
SqlViewMetricExporterSpi is enabled by default and exposes metrics via the SYS.METRICS view.
Each metric is displayed as a single record.
You can use any supported SQL tool to view the metrics:
> select name, value from SYS.METRICS where name LIKE 'cache.myCache.%';
+----------------------------------------+-------------+
| NAME                                   | VALUE       |
+----------------------------------------+-------------+
| cache.myCache.CacheTxRollbacks         | 0           |
| cache.myCache.OffHeapRemovals          | 0           |
| cache.myCache.QueryCompleted           | 0           |
| cache.myCache.QueryFailed              | 0           |
| cache.myCache.EstimatedRebalancingKeys | 0           |
| cache.myCache.CacheEvictions           | 0           |
| cache.myCache.CommitTime               | [J@2eb66498 |
....
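The same query can be issued from Java through plain JDBC. This is a minimal sketch under stated assumptions: it assumes the thin JDBC driver is on the classpath and that a node accepts thin client connections on the local host at jdbc:ignite:thin://127.0.0.1. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper that reads one register from the SYS.METRICS view. */
final class SysMetricsQuery {
    /** Same query as in the console example, with the LIKE pattern as a parameter. */
    static final String QUERY = "select name, value from SYS.METRICS where name like ?";

    /** Builds the LIKE pattern that matches every metric of the register. */
    static String likePattern(String registerName) {
        return registerName + ".%";
    }

    /** Prints each metric of the register as "name = value". */
    static void print(Connection conn, String registerName) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, likePattern(registerName));

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next())
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: a node is running locally and accepts thin JDBC connections.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
            print(conn, "cache.myCache");
        }
    }
}
```

Values are read as strings here because the view holds metrics of different types in the same column.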
Log
org.apache.ignite.spi.metric.log.LogExporterSpi prints the metrics to the log file at regular intervals (1 min by default) at INFO level.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util.xsd">
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="metricExporterSpi">
            <list>
                <bean class="org.apache.ignite.spi.metric.log.LogExporterSpi"/>
            </list>
        </property>
    </bean>
</beans>
If you use programmatic configuration, you can change the print frequency as follows:
IgniteConfiguration cfg = new IgniteConfiguration();
LogExporterSpi logExporter = new LogExporterSpi();
// Print metrics every 10 minutes (the period is given in milliseconds).
logExporter.setPeriod(600_000);
// Export cache metrics only.
logExporter.setExportFilter(mreg -> mreg.name().startsWith("cache."));
cfg.setMetricExporterSpi(logExporter);
Ignite ignite = Ignition.start(cfg);
This API is not presently available for C#/.NET. You can use XML configuration.
This API is not presently available for C++. You can use XML configuration.
OpenCensus
org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi adds integration with the OpenCensus library.
To use the OpenCensus exporter:
- Add org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi to the list of exporters in the node configuration.
- Configure OpenCensus StatsCollector to export to a specific system. See OpenCensusMetricsExporterExample.java for an example and OpenCensus documentation for additional information.
Configuration parameters:
- filter - predicate that filters metrics.
- period - export period.
- sendInstanceName - if enabled, a tag with the Ignite instance name is added to each metric.
- sendNodeId - if enabled, a tag with the Ignite node id is added to each metric.
- sendConsistentId - if enabled, a tag with the Ignite node consistent id is added to each metric.
Available Metrics
System
System metrics such as JVM or CPU metrics.
Register name: sys
Name | Type | Description |
---|---|---|
CpuLoad | double | CPU load. |
CurrentThreadCpuTime | long | ThreadMXBean.getCurrentThreadCpuTime() |
CurrentThreadUserTime | long | ThreadMXBean.getCurrentThreadUserTime() |
DaemonThreadCount | integer | ThreadMXBean.getDaemonThreadCount() |
GcCpuLoad | double | GC CPU load. |
PeakThreadCount | integer | ThreadMXBean.getPeakThreadCount() |
SystemLoadAverage | java.lang.Double | OperatingSystemMXBean.getSystemLoadAverage() |
ThreadCount | integer | ThreadMXBean.getThreadCount() |
TotalExecutedTasks | long | Total executed tasks. |
TotalStartedThreadCount | long | ThreadMXBean.getTotalStartedThreadCount() |
UpTime | long | RuntimeMXBean.getUptime() |
memory.heap.committed | long | MemoryMXBean.getHeapMemoryUsage().getCommitted() |
memory.heap.init | long | MemoryMXBean.getHeapMemoryUsage().getInit() |
memory.heap.used | long | MemoryMXBean.getHeapMemoryUsage().getUsed() |
memory.nonheap.committed | long | MemoryMXBean.getNonHeapMemoryUsage().getCommitted() |
memory.nonheap.init | long | MemoryMXBean.getNonHeapMemoryUsage().getInit() |
memory.nonheap.max | long | MemoryMXBean.getNonHeapMemoryUsage().getMax() |
memory.nonheap.used | long | MemoryMXBean.getNonHeapMemoryUsage().getUsed() |
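Several descriptions in this table refer to standard JDK management beans. The sketch below reads a few of those values directly from java.lang.management, which can help when interpreting the numbers. It uses only the JDK; the class and method names are hypothetical, and the values come from the JVM that runs the sketch, not from a GridGain node.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;

/** Reads the JDK values that some sys metrics are described in terms of. */
final class JvmValuesSketch {
    /** Live thread count, as returned by ThreadMXBean.getThreadCount(). */
    static int threadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        return threads.getThreadCount();
    }

    /** JVM uptime in milliseconds, as returned by RuntimeMXBean.getUptime(). */
    static long upTime() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        return runtime.getUptime();
    }

    /** Used heap in bytes, as returned by MemoryMXBean.getHeapMemoryUsage().getUsed(). */
    static long heapUsed() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("ThreadCount = " + threadCount());
        System.out.println("UpTime = " + upTime());
        System.out.println("memory.heap.used = " + heapUsed());
    }
}
```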
Caches
Cache metrics.
Register name: cache.{cache_name}.{near}
Name | Type | Description |
---|---|---|
CacheEvictions | long | The total number of evictions from the cache. |
CacheGets | long | The total number of gets to the cache. |
CacheHits | long | The number of get requests that were satisfied by the cache. |
CacheMisses | long | A miss is a get request that is not satisfied. |
CachePuts | long | The total number of puts to the cache. |
CacheRemovals | long | The total number of removals from the cache. |
CacheTxCommits | long | Total number of transaction commits. |
CacheTxRollbacks | long | Total number of transaction rollbacks. |
CacheSize | long | Local cache size. |
CommitTime | histogram | Commit time in nanoseconds. |
CommitTimeTotal | long | The total time of commit, in nanoseconds. |
EntryProcessorHits | long | The total number of invocations on keys that exist in the cache. |
EntryProcessorInvokeTimeNanos | long | The total time of cache invocations for which this node is the initiator, in nanoseconds. |
EntryProcessorMaxInvocationTime | long | So far, the maximum time to execute cache invokes for which this node is the initiator. |
EntryProcessorMinInvocationTime | long | So far, the minimum time to execute cache invokes for which this node is the initiator. |
EntryProcessorMisses | long | The total number of invocations on keys that don't exist in the cache. |
EntryProcessorPuts | long | The total number of cache invocations that caused updates. |
EntryProcessorReadOnlyInvocations | long | The total number of cache invocations that caused no updates. |
EntryProcessorRemovals | long | The total number of cache invocations that caused removals. |
EstimatedRebalancingKeys | long | Estimated number of keys to be rebalanced. |
GetAllTime | histogram | GetAll time for which this node is the initiator, in nanoseconds. |
GetTime | histogram | Get time for which this node is the initiator, in nanoseconds. |
GetTimeTotal | long | The total time of cache gets for which this node is the initiator, in nanoseconds. |
HeapEntriesCount | long | Onheap entries count. |
IndexRebuildKeysProcessed | long | The number of keys with rebuilt indexes. |
IsIndexRebuildInProgress | boolean | True if index build or rebuild is in progress. |
OffHeapBackupEntriesCount | long | Offheap backup entries count. |
OffHeapEntriesCount | long | Offheap entries count. |
OffHeapEvictions | long | The total number of evictions from the off-heap memory. |
OffHeapGets | long | The total number of get requests to the off-heap memory. |
OffHeapHits | long | The number of get requests that were satisfied by the off-heap memory. |
OffHeapMisses | long | A miss is a get request that is not satisfied by off-heap memory. |
OffHeapPrimaryEntriesCount | long | Offheap primary entries count. |
OffHeapPuts | long | The total number of put requests to the off-heap memory. |
OffHeapRemovals | long | The total number of removals from the off-heap memory. |
PutAllTime | histogram | PutAll time for which this node is the initiator, in nanoseconds. |
PutTime | histogram | Put time for which this node is the initiator, in nanoseconds. |
PutTimeTotal | long | The total time of cache puts for which this node is the initiator, in nanoseconds. |
QueryCompleted | long | Count of completed queries. |
QueryExecuted | long | Count of executed queries. |
QueryFailed | long | Count of failed queries. |
QueryMaximumTime | long | Maximum query execution time. |
QueryMinimalTime | long | Minimum query execution time. |
QuerySumTime | long | Total (summed) query execution time. |
RebalanceClearingPartitionsLeft | long | Number of partitions that need to be cleared before the actual rebalance starts. |
RebalanceStartTime | long | Rebalance start time. |
RebalancedKeys | long | Number of already rebalanced keys. |
RebalancingBytesRate | long | Estimated rebalancing speed in bytes. |
RebalancingKeysRate | long | Estimated rebalancing speed in keys. |
RemoveAllTime | histogram | RemoveAll time for which this node is the initiator, in nanoseconds. |
RemoveTime | histogram | Remove time for which this node is the initiator, in nanoseconds. |
RemoveTimeTotal | long | The total time of cache removal, in nanoseconds. |
RollbackTime | histogram | Rollback time in nanoseconds. |
RollbackTimeTotal | long | The total time of rollback, in nanoseconds. |
TotalRebalancedBytes | long | Number of already rebalanced bytes. |
getCacheTouches | long | The total number of touch() requests to the cache. Equal to the sum of hits and misses. |
getCacheTouchHits | long | The number of touch() requests that were satisfied by the cache, i.e., "hits." |
getCacheTouchMisses | long | The number of touch() requests that were not satisfied by the cache, i.e., "misses," either because the requested key was not found in the cache or TTL value was not changed. |
getCacheTouchHitPercentage | float | The percentage of cache touch() requests that were satisfied by the cache. Calculated as getCacheTouchHits divided by getCacheTouches multiplied by 100. |
getCacheTouchMissPercentage | float | The percentage of cache touch() requests that were not satisfied by the cache. Calculated as getCacheTouchMisses divided by getCacheTouches multiplied by 100. |
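The two touch percentages follow directly from the counters in the same table. The sketch below works through the arithmetic in plain Java. The class and method names are hypothetical, and returning 0 when there were no touches at all is an assumption of this sketch, not documented behavior.

```java
/** Worked example of the touch hit and miss percentage formulas. */
final class TouchPercentages {
    /** hits / touches * 100; returns 0 when there were no touches (assumption). */
    static float hitPercentage(long hits, long touches) {
        return touches == 0 ? 0f : (float) hits / touches * 100f;
    }

    /** misses / touches * 100; returns 0 when there were no touches (assumption). */
    static float missPercentage(long misses, long touches) {
        return touches == 0 ? 0f : (float) misses / touches * 100f;
    }

    public static void main(String[] args) {
        long hits = 30;
        long misses = 10;
        long touches = hits + misses; // getCacheTouches is the sum of hits and misses.

        System.out.println(hitPercentage(hits, touches));    // prints 75.0
        System.out.println(missPercentage(misses, touches)); // prints 25.0
    }
}
```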
Cache Groups
Register name: cacheGroups.{group_name}
Name | Type | Description |
---|---|---|
AffinityPartitionsAssignmentMap | java.util.Map | Affinity partitions assignment map. |
Caches | java.util.ArrayList | List of caches. |
IndexBuildCountPartitionsLeft | long | Number of partitions that still need to be processed to finish index creation or rebuilding. |
LocalNodeMovingPartitionsCount | integer | Count of partitions with state MOVING for this cache group located on this node. |
LocalNodeOwningPartitionsCount | integer | Count of partitions with state OWNING for this cache group located on this node. |
LocalNodeRentingEntriesCount | long | Count of entries remaining to be evicted from RENTING partitions located on this node for this cache group. |
LocalNodeRentingPartitionsCount | integer | Count of partitions with state RENTING for this cache group located on this node. |
MaximumNumberOfPartitionCopies | integer | Maximum number of partition copies for all partitions of this cache group. |
MinimumNumberOfPartitionCopies | integer | Minimum number of partition copies for all partitions of this cache group. |
MovingPartitionsAllocationMap | java.util.Map | Allocation map of partitions with state MOVING in the cluster. |
OwningPartitionsAllocationMap | java.util.Map | Allocation map of partitions with state OWNING in the cluster. |
PartitionIds | java.util.ArrayList | Local partition ids. |
SparseStorageSize | long | Storage space allocated for group adjusted for possible sparsity, in bytes. |
StorageSize | long | Storage space allocated for group, in bytes. |
TotalAllocatedPages | long | Cache group total allocated pages. |
TotalAllocatedSize | long | Total size of memory allocated for group, in bytes. |
Transactions
Transaction metrics.
Register name: tx
Name | Type | Description |
---|---|---|
AllOwnerTransactions | java.util.HashMap | Map of local node owning transactions. |
LockedKeysNumber | long | The number of keys locked on the node. |
OwnerTransactionsNumber | long | The number of active transactions for which this node is the initiator. |
TransactionsHoldingLockNumber | long | The number of active transactions holding at least one key lock. |
LastCommitTime | long | Last commit time. |
nodeSystemTimeHistogram | histogram | Transactions system times on node represented as histogram. |
nodeUserTimeHistogram | histogram | Transactions user times on node represented as histogram. |
LastRollbackTime | long | Last rollback time. |
totalNodeSystemTime | long | Total transactions system time on node. |
totalNodeUserTime | long | Total transactions user time on node. |
txCommits | integer | Number of transaction commits. |
txRollbacks | integer | Number of transaction rollbacks. |
Partition Map Exchange
Partition map exchange metrics.
Register name: pme
Name | Type | Description |
---|---|---|
CacheOperationsBlockedDuration | long | Current PME cache operations blocked duration in milliseconds. |
CacheOperationsBlockedDurationHistogram | histogram | Histogram of cache operations blocked PME durations in milliseconds. |
Duration | long | Current PME duration in milliseconds. |
DurationHistogram | histogram | Histogram of PME durations in milliseconds. |
Compute Jobs
Register name: compute.jobs
Name | Type | Description |
---|---|---|
compute.jobs.Active | long | Number of active jobs currently executing. |
compute.jobs.Canceled | long | Number of cancelled jobs that are still running. |
compute.jobs.ExecutionTime | long | Total execution time of jobs. |
compute.jobs.Finished | long | Number of finished jobs. |
compute.jobs.Rejected | long | Number of jobs rejected after the most recent collision resolution operation. |
compute.jobs.Started | long | Number of started jobs. |
compute.jobs.Waiting | long | Number of currently queued jobs waiting to be executed. |
compute.jobs.WaitingTime | long | Total time jobs spent in the waiting queue. |
Thread Pools
Register name: threadPools.{thread_pool_name}
Name | Type | Description |
---|---|---|
ActiveCount | long | Approximate number of threads that are actively executing tasks. |
CompletedTaskCount | long | Approximate total number of tasks that have completed execution. |
CorePoolSize | long | The core number of threads. |
KeepAliveTime | long | Thread keep-alive time, which is the amount of time that threads in excess of the core pool size may remain idle before being terminated. |
LargestPoolSize | long | Largest number of threads that have ever simultaneously been in the pool. |
MaximumPoolSize | long | The maximum allowed number of threads. |
PoolSize | long | Current number of threads in the pool. |
QueueSize | long | Current size of the execution queue. |
RejectedExecutionHandlerClass | string | Class name of current rejection handler. |
Shutdown | boolean | True if this executor has been shut down. |
TaskCount | long | Approximate total number of tasks that have been scheduled for execution. |
Terminated | boolean | True if all tasks have completed following shut down. |
Terminating | boolean | True if terminating but not yet terminated. |
ThreadFactoryClass | string | Class name of thread factory used to create new threads. |
Cache Group IO
Register name: io.statistics.cacheGroups.{group_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS | long | Number of logical reads |
PHYSICAL_READS | long | Number of physical reads |
grpId | integer | Group id |
name | string | Name of the index |
startTime | long | Statistics collection start time |
Sorted Indexes
Register name: io.statistics.sortedIndexes.{cache_name}.{index_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS_INNER | long | Number of logical reads for inner tree node |
LOGICAL_READS_LEAF | long | Number of logical reads for leaf tree node |
PHYSICAL_READS_INNER | long | Number of physical reads for inner tree node |
PHYSICAL_READS_LEAF | long | Number of physical reads for leaf tree node |
indexName | string | Name of the index |
name | string | Name of the cache |
startTime | long | Statistics collection start time |
Hash Indexes
Register name: io.statistics.hashIndexes.{cache_name}.{index_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS_INNER | long | Number of logical reads for inner tree node |
LOGICAL_READS_LEAF | long | Number of logical reads for leaf tree node |
PHYSICAL_READS_INNER | long | Number of physical reads for inner tree node |
PHYSICAL_READS_LEAF | long | Number of physical reads for leaf tree node |
indexName | string | Name of the index |
name | string | Name of the cache |
startTime | long | Statistics collection start time |
Communication IO
Register name: io.communication
Name | Type | Description |
---|---|---|
ActiveSessionsCount | integer | Active TCP sessions count. |
OutboundMessagesQueueSize | integer | Outbound messages queue size. |
SentMessagesCount | integer | Sent messages count. |
SentBytesCount | long | Sent bytes count. |
ReceivedBytesCount | long | Received bytes count. |
ReceivedMessagesCount | integer | Received messages count. |
RejectedSslSessionsCount | integer | TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled). |
SslEnabled | boolean | Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram | histogram | Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Ignite Thin Client Connector
Register name: client.connector
Name | Type | Description |
---|---|---|
ActiveSessionsCount | integer | Active TCP sessions count. |
ReceivedBytesCount | long | Received bytes count. |
RejectedSslSessionsCount | integer | TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled). |
RejectedSessionsTimeout | integer | TCP sessions count that were rejected due to handshake timeout. |
RejectedSessionsAuthenticationFailed | integer | TCP sessions count that were rejected due to failed authentication. |
RejectedSessionsTotal | integer | Total number of rejected TCP connections. |
{clientType}.AcceptedSessions | integer | Number of successfully established sessions for the client type. |
{clientType}.ActiveSessions | integer | Number of active sessions for the client type. |
SentBytesCount | long | Sent bytes count. |
SslEnabled | boolean | Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram | histogram | Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Ignite REST Client Connector
Register name: rest.client
Name | Type | Description |
---|---|---|
ActiveSessionsCount | integer | Active TCP sessions count. |
ReceivedBytesCount | long | Received bytes count. |
RejectedSslSessionsCount | integer | TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled). |
SentBytesCount | long | Sent bytes count. |
SslEnabled | boolean | Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram | histogram | Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Discovery IO
Register name: io.discovery
Name | Type | Description |
---|---|---|
CoordinatorSince | long | Timestamp at which the local node became the coordinator (metric is exported only from server nodes). |
Coordinator | UUID | Coordinator ID (metric is exported only from server nodes). |
CurrentTopologyVersion | long | Current topology version. |
JoinedNodes | integer | Joined nodes count. |
LeftNodes | integer | Left nodes count. |
MessageWorkerQueueSize | integer | Current message worker queue size. |
PendingMessagesRegistered | integer | Pending registered messages count. |
RejectedSslConnectionsCount | integer | TCP discovery connections count that were rejected due to the SSL errors. |
SslEnabled | boolean | Indicates whether SSL is enabled. |
TotalProcessedMessages | integer | Total processed messages count. |
TotalReceivedMessages | integer | Total received messages count. |
Data Region IO
Register name: io.dataregion.{data_region_name}
Name | Type | Description |
---|---|---|
AllocationRate | long | Allocation rate (pages per second) averaged across rateTimeInterval. |
CheckpointBufferSize | long | Checkpoint buffer size in bytes. |
DirtyPages | long | Number of pages in memory not yet synchronized with persistent storage. |
EmptyDataPages | long | Number of empty data pages in the region. Only totally free pages that can be reused are counted (e.g., pages that are contained in the reuse bucket of the free list). |
EvictionRate | long | Eviction rate (pages per second). |
LargeEntriesPagesCount | long | Count of pages that are fully occupied by large entries that go beyond the page size. |
OffHeapSize | long | Offheap size in bytes. |
OffheapUsedSize | long | Offheap used size in bytes. |
PagesFillFactor | double | The percentage of the used space. |
PagesRead | long | Number of pages read from last restart. |
PagesReplaceAge | long | Average age at which pages in memory are replaced with pages from persistent storage (milliseconds). |
PagesReplaceRate | long | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
PagesReplaced | long | Number of pages replaced from last restart. |
PagesWritten | long | Number of pages written from last restart. |
PhysicalMemoryPages | long | Number of pages residing in physical RAM. |
PhysicalMemorySize | long | Total size of pages loaded into RAM, in bytes. |
TotalAllocatedPages | long | Total number of allocated pages. |
TotalAllocatedSize | long | Total size of memory allocated in the data region, in bytes. |
TotalThrottlingTime | long | Total throttling threads time in milliseconds. Ignite throttles threads that generate dirty pages during the ongoing checkpoint. |
UsedCheckpointBufferSize | long | Used checkpoint buffer size in bytes. |
Data Storage
Data Storage metrics.
Register name: io.datastorage
Name | Type | Description |
---|---|---|
CheckpointBeforeLockHistogram | histogram | Histogram of the duration of checkpoint actions performed before taking the write lock, in milliseconds. |
CheckpointFsyncHistogram | histogram | Histogram of checkpoint fsync duration in milliseconds. |
CheckpointHistogram | histogram | Histogram of checkpoint duration in milliseconds. |
CheckpointListenersExecuteHistogram | histogram | Histogram of the duration of checkpoint listeners execution under the write lock, in milliseconds. |
CheckpointLockHoldHistogram | histogram | Histogram of checkpoint lock hold duration in milliseconds. |
CheckpointLockWaitHistogram | histogram | Histogram of checkpoint lock wait duration in milliseconds. |
CheckpointMarkHistogram | histogram | Histogram of checkpoint mark duration in milliseconds. |
CheckpointPagesWriteHistogram | histogram | Histogram of checkpoint pages write duration in milliseconds. |
CheckpointSplitAndSortPagesHistogram | histogram | Histogram of splitting and sorting checkpoint pages duration in milliseconds. |
CheckpointTotalTime | long | Total duration of checkpoint. |
CheckpointWalRecordFsyncHistogram | histogram | Histogram of the duration of the WAL fsync after logging CheckpointRecord at the beginning of a checkpoint, in milliseconds. |
CheckpointWriteEntryHistogram | histogram | Histogram of entry buffer writing to file duration in milliseconds. |
LastCheckpointBeforeLockDuration | long | Duration of the checkpoint actions performed before taking the write lock, in milliseconds. |
LastCheckpointCopiedOnWritePagesNumber | long | Number of pages copied to a temporary checkpoint buffer during the last checkpoint. |
LastCheckpointDataPagesNumber | long | Total number of data pages written during the last checkpoint. |
LastCheckpointDuration | long | Duration of the last checkpoint in milliseconds. |
LastCheckpointFsyncDuration | long | Duration of the sync phase of the last checkpoint in milliseconds. |
LastCheckpointListenersExecuteDuration | long | Duration of checkpoint listeners execution under the write lock, in milliseconds. |
LastCheckpointLockHoldDuration | long | Duration of the checkpoint lock hold in milliseconds. |
LastCheckpointLockWaitDuration | long | Duration of the checkpoint lock wait in milliseconds. |
LastCheckpointMarkDuration | long | Duration of the checkpoint mark in milliseconds. |
LastCheckpointPagesWriteDuration | long | Duration of the checkpoint pages write in milliseconds. |
LastCheckpointTotalPagesNumber | long | Total number of pages written during the last checkpoint. |
LastCheckpointSplitAndSortPagesDuration | long | Duration of splitting and sorting checkpoint pages of the last checkpoint in milliseconds. |
LastCheckpointStart | long | Start timestamp of the last checkpoint. |
LastCheckpointWalRecordFsyncDuration | long | Duration of the WAL fsync after logging CheckpointRecord on the start of the last checkpoint in milliseconds. |
LastCheckpointWriteEntryDuration | long | Duration of entry buffer writing to file of the last checkpoint in milliseconds. |
SparseStorageSize | long | Storage space allocated adjusted for possible sparsity, in bytes. |
StorageSize | long | Storage space allocated, in bytes. |
WalArchiveSegments | integer | Current number of WAL segments in the WAL archive. |
WalBuffPollSpinsRate | long | WAL buffer poll spins number over the last time interval. |
WalFsyncTimeDuration | long | Total duration of fsync. |
WalFsyncTimeNum | long | Total count of fsync. |
WalLastRollOverTime | long | Time of the last WAL segment rollover. |
WalLoggingRate | long | Average number of WAL records per second written during the last time interval. |
WalTotalSize | long | Total size of the storage WAL files, in bytes. |
WalWritingRate | long | Average number of bytes per second written during the last time interval. |
Cluster
Cluster metrics.
Register name: cluster
Name | Type | Description |
---|---|---|
ActiveBaselineNodes | integer | Active baseline nodes count. |
Rebalanced | boolean | True if the cluster has fully achieved rebalanced state. Note that an inactive cluster always reports False for this metric regardless of the real partitions state. |
TotalBaselineNodes | integer | Total baseline nodes count. |
TotalClientNodes | integer | Client nodes count. |
TotalServerNodes | integer | Server nodes count. |
SQL
SQL metrics.
Memory Quotas
Register name: sql.memory.quotas
Name | Type | Description |
---|---|---|
OffloadedQueriesNumber | number | Number of queries that were offloaded to disk locally |
OffloadingRead | bytes | Number of bytes read from the disk during SQL query offloading |
OffloadingWritten | bytes | Number of bytes written to the disk during SQL query offloading |
freeMem | bytes | Amount of memory left available for the queries on this node, in bytes (negative value if SQL memory quotas are disabled) |
maxMem | bytes | Total amount of memory available for all queries on the current node (negative value if SQL memory quotas are disabled) |
requests | number | Total number of times memory quota has been requested on the current node by all the queries |
Parser Cache
Register name: sql.parser.cache
Name | Type | Description |
---|---|---|
hits | number | Number of hits for queries cache |
misses | number | Number of misses for queries cache |
User Queries
Register name: sql.queries.user
Name | Type | Description |
---|---|---|
canceled | number | Number of canceled queries initiated by the current node. This number is included in the general 'failed' metric. |
failed | number | Number of failed queries (including OOME) initiated by the current node |
failedByOOM | number | Number of queries failed due to out of memory protection initiated by the current node. This number is included in the general 'failed' metric. |
success | number | Number of successfully executed queries initiated by the current node |
Throttling
Throttling metrics for the Write operation. Speed-based throttling protects the checkpoint buffer and the clean pages in the region. The checkpoint buffer needs a stronger protection because an overflow of this buffer makes a node crash. When performing a Write operation, we first check whether the checkpoint buffer is in danger. If it is, we employ the exponential backoff algorithm to protect the buffer: each subsequent "sleep" time is K times longer than the previous one. If the checkpoint buffer is not in danger, we calculate the "sleep" time using the speed-based algorithm to protect the pool of clean pages.
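The exponential backoff rule described above can be sketched as follows. This is an illustration only: the starting park time and the value of K used here are hypothetical and are not the values used by the product, and the class and method names are made up for the example.

```java
/** Illustrative exponential backoff: each sleep is K times longer than the previous one. */
final class BackoffSketch {
    /**
     * Park time for the given attempt, counted from 0:
     * startNanos * k^attempt, truncated to whole nanoseconds.
     */
    static long parkTimeNanos(long startNanos, double k, int attempt) {
        return (long) (startNanos * Math.pow(k, attempt));
    }

    public static void main(String[] args) {
        long startNanos = 1_000; // Hypothetical starting park time.
        double k = 2.0;          // Hypothetical backoff ratio K.

        // Prints 1000, 2000, 4000, 8000, 16000: the sequence grows geometrically.
        for (int attempt = 0; attempt < 5; attempt++)
            System.out.println(parkTimeNanos(startNanos, k, attempt));
    }
}
```

A geometric sequence slows writers down quickly when the checkpoint buffer stays in danger, while the first few park times remain short.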
Register name: io.dataregion
Name | Type | Description |
---|---|---|
SpeedBasedThrottlingPercentage | double | Fraction of throttling time within average marking time (e.g., "quarter" = 0.25). |
MarkDirtySpeed | long | Speed of marking pages dirty, in pages/second. Value is averaged over the last 3 fragments, 0.25 sec each, plus the current fragment, 0-0.25 sec (0.75-1.0 sec total). |
CpWriteSpeed | long | Checkpoint write speed, in pages/second. Value is averaged over the last 3 checkpoints plus the current one. |
LastEstimatedSpeedForMarkAll | long | Last estimated speed of marking all clean pages dirty to the end of a checkpoint, in pages/second. |
CurrDirtyRatio | double | Current ratio of dirty pages (dirty vs total), expressed as a fraction. The fraction is computed for each segment in the current region, and the highest value becomes "current." |
TargetDirtyRatio | double | Ratio of dirty pages (dirty vs total), expressed as a fraction. Throttling starts when this ratio is reached. |
ThrottleParkTime | long | Park (sleep) time for the Write operation, in nanoseconds. Value is averaged over the last 3 fragments, 0.25 sec each, plus the current fragment, 0-0.25 sec (0.75-1.0 sec total). It defines park periods for either the checkpoint buffer protection or the clean page pool protection. |
CpTotalPages | int | Number of pages in the current checkpoint. |
CpEvictedPages | int | Number of evicted pages in the current checkpoint. |
CpWrittenPages | int | Number of written pages in the current checkpoint. |
CpSyncedPages | int | Number of fsynced pages in the current checkpoint. |
CheckpointBufferPagesCount | int | Number of occupied pages in the checkpoint buffer. |
CheckpointBufferPagesSize | int | Total number of pages in the checkpoint buffer. |
Histograms
Metrics that represent histograms are available in the JMX exporter only. Histogram metrics are exported as a set of values where each value corresponds to a specific bucket, and is available through a separate JMX bean attribute. The attribute names of a histogram metric have the following format:
{metric_name}_{low_bound}_{high_bound}
where
- {metric_name} - the name of the metric
- {low_bound} - start of the bound, 0 for the first bound
- {high_bound} - end of the bound, inf for the last bound
Examples of the metric names if the bounds are [10,100]:
- histogram_0_10 - less than 10
- histogram_10_100 - between 10 and 100
- histogram_100_inf - more than 100
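The naming scheme can be reproduced with a short helper, which is handy when you need to compute the attribute names to read for a given set of bounds. This is a sketch that follows the format described above. The class and method names are hypothetical and are not part of the GridGain API.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds JMX attribute names for a histogram metric from its bucket bounds. */
final class HistogramAttributeNames {
    /**
     * Returns one name per bucket: {metric_name}_{low_bound}_{high_bound}.
     * The first bucket starts at 0 and the last bucket ends at inf.
     */
    static List<String> attributeNames(String metricName, long[] bounds) {
        List<String> names = new ArrayList<>();
        long low = 0;

        for (long high : bounds) {
            names.add(metricName + "_" + low + "_" + high);
            low = high;
        }

        names.add(metricName + "_" + low + "_inf");

        return names;
    }

    public static void main(String[] args) {
        // Prints [histogram_0_10, histogram_10_100, histogram_100_inf]
        System.out.println(attributeNames("histogram", new long[] {10, 100}));
    }
}
```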
© 2025 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.