Data Manipulation Language (DML)
This page includes all data manipulation language (DML) commands supported by GridGain.
SELECT
Retrieve data from a table or multiple tables.
SELECT
[TOP term] [DISTINCT | ALL] selectExpression [,...]
FROM tableExpression [,...] [WHERE expression]
[GROUP BY expression [,...]] [HAVING expression]
[{UNION [ALL] | MINUS | EXCEPT | INTERSECT} select]
[ORDER BY order [,...]]
[{ LIMIT expression [OFFSET expression]
[SAMPLE_SIZE rowCountInt]} | {[OFFSET expression {ROW | ROWS}]
[{FETCH {FIRST | NEXT} expression {ROW | ROWS} ONLY}]}]
Parameters
-
DISTINCT
- removes duplicate rows from a result set. -
GROUP BY
- groups the result by the given expression(s). -
HAVING
- filters rows after grouping. -
ORDER BY
- sorts the result by the given column(s) or expression(s). -
LIMIT and FETCH FIRST/NEXT ROW(S) ONLY
- limits the number of rows returned by the query (no limit if null or smaller than zero). -
OFFSET
- specifies how many rows to skip. -
UNION, INTERSECT, MINUS, EXPECT
- combines the result of this query with the results of another query. -
tableExpression
- Joins a table. The join expression is not supported for cross and natural joins. A natural join is an inner join, where the condition is automatically on the columns with the same name.
tableExpression = [[LEFT | RIGHT]{OUTER}] | INNER | CROSS | NATURAL]
JOIN tableExpression
[ON expression]
-
LEFT
- LEFT JOIN performs a join starting with the first (left-most) table and then any matching second (right-most) table records. -
RIGHT
- RIGHT JOIN performs a join starting with the second (right-most) table and then any matching first (left-most) table records. -
OUTER
- Outer joins subdivide further into left outer joins, right outer joins, and full outer joins, depending on which table’s rows are retained (left, right, or both). -
INNER
- An inner join requires each row in the two joined tables to have matching column values. -
CROSS
- CROSS JOIN returns the Cartesian product of rows from tables in the join. -
NATURAL
- The natural join is a special case of equi-join. -
ON
- Value or condition to join on.
Description
SELECT
queries can be executed against both replicated and partitioned data.
When queries are executed against fully replicated data, GridGain will send a query to a single cluster node and run it over the local data there.
On the other hand, if a query is executed over partitioned data, then the execution flow will be the following:
-
The query will be parsed and split into multiple map queries and a single reduce query.
-
All the map queries are executed on all the nodes where required data resides.
-
All the nodes provide result sets of local execution to the query initiator (reducer) that, in turn, will accomplish the reduce phase by properly merging provided result sets.
JOINs
GridGain supports colocated and non-colocated distributed SQL joins. Furthermore, if the data resides in different tables (aka. caches in GridGain), GridGain allows for cross-table joins as well.
Joins between partitioned and replicated data sets always work without any limitations.
However, if you join partitioned data sets, then you have to make sure that the keys you are joining on are either colocated or make sure you switched on the non-colocated joins parameter for a query.
Refer to Distributed Joins page for more details.
Group By and Order By Optimizations
SQL queries with ORDER BY
clauses do not require loading the whole result set to a query initiator (reducer) node in order to complete the sorting. Instead, every node to which a query will be mapped will sort its own part of the overall result set and the reducer will do the merge in a streaming fashion.
The same optimization is implemented for sorted GROUP BY
queries - there is no need to load the whole result set to the reducer in order to do the grouping before giving it to an application. In GridGain, partial result sets from the individual nodes can be streamed, merged, aggregated, and returned to the application gradually.
Examples
Retrieve all rows from the Person
table:
SELECT * FROM Person;
Get all rows in alphabetical order:
SELECT * FROM Person ORDER BY name;
Calculate the number of Persons
from a specific city:
SELECT city_id, COUNT(*) FROM Person GROUP BY city_id;
Join data stored in the Person
and City
tables:
SELECT p.name, c.name
FROM Person p, City c
WHERE p.city_id = c.id;
INSERT
Inserts data into a table.
INSERT INTO tableName
{[( columnName [,...])]
{VALUES {({DEFAULT | expression} [,...])} [,...] | [DIRECT] [SORTED] select}}
| {SET {columnName = {DEFAULT | expression}} [,...]}
Parameters
-
tableName
- name of the table to be updated. -
columnName
- name of a column to be initialized with a value from the VALUES clause.
Description
INSERT
adds an entry or entries into a table (aka. cache in GridGain).
Since GridGain stores all the data in a form of key-value pairs, all the INSERT
statements are finally transformed into a set of key-value operations.
If a single key-value pair is being added into a cache then, eventually, an INSERT
statement will be converted into a cache.putIfAbsent(…)
operation. In other cases, when multiple key-value pairs are inserted, the DML engine creates an EntryProcessor
for each pair and uses cache.invokeAll(…)
to propagate the data into a cache.
Examples
Insert a new Person into the table:
INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
Fill in Person table with the data retrieved from Account table:
INSERT INTO Person(id, name, city_id)
(SELECT a.id + 1000, concat(a.firstName, a.secondName), a.city_id
FROM Account a WHERE a.id > 100 AND a.id < 1000);
UPDATE
Update data in a table.
UPDATE tableName [[AS] newTableAlias]
SET {{columnName = {DEFAULT | expression}} [,...]} |
{(columnName [,...]) = (select)}
[WHERE expression][LIMIT expression]
Parameters
-
table
- the name of the table to be updated. -
columnName
- the name of a column to be updated with a value from aSET
clause.
Description
UPDATE
alters existing entries stored in a table.
Since GridGain stores all the data in a form of key-value pairs, all the UPDATE
statements are finally transformed into a set of key-value operations.
Initially, the SQL engine generates and executes a SELECT
query based on the UPDATE WHERE
clause and only after that does it modify the existing values that satisfy the clause result.
The modification is performed via a cache.invokeAll(…)
operation. This means that once the result of the SELECT
query is ready, the SQL engine will prepare a number of EntryProcessors
and will execute all of them using a cache.invokeAll(…)
operation. While the data is being modified using EntryProcessors
, additional checks are performed to make sure that nobody has interfered between the SELECT
and the actual update.
Primary Keys Updates
Ignite does not allow updating a primary key because the latter defines a partition the key and its value belong to statically. While the partition with all its data can change several cluster owners, the key always belongs to a single partition. The partition is calculated using a hash function applied to the key’s value.
Thus, if a key needs to be updated it has to be removed and then inserted.
Examples
Update the name
column of an entry:
UPDATE Person SET name = 'John Black' WHERE id = 2;
Update the Person
table with the data taken from the Account
table:
UPDATE Person p SET name = (SELECT a.first_name FROM Account a WHERE a.id = p.id)
WITH
Used to name a sub-query, can be referenced in other parts of the SQL statement.
WITH { name [( columnName [,...] )] AS ( select ) [,...] }
{ select | insert | update | merge | delete | createTable }
Parameters
-
query_name
- the name of the sub-query to be created. The name assigned to the sub-query is treated as though it was an inline view or table.
Description
WITH
creates a sub-query. One or more common table entries can be referred to by name. Column name declarations are optional - the column names will be inferred from the named select queries. The final action in a WITH statement can be a select
, insert
, update
, merge
, delete
, or create table
.
Example
WITH cte1 AS (
SELECT 1 AS FIRST_COLUMN
), cte2 AS (
SELECT FIRST_COLUMN+1 AS FIRST_COLUMN FROM cte1
)
SELECT sum(FIRST_COLUMN) FROM cte2;
MERGE
Merge data into a table.
MERGE INTO tableName [(columnName [,...])]
[KEY (columnName [,...])]
{VALUES {({ DEFAULT | expression } [,...])} [,...] | select}
Parameters
-
tableName
- the name of the table to be updated. -
columnName
- the name of a column to be initialized with a value from aVALUES
clause.
Description
MERGE
updates existing entries and inserts new entries.
Because GridGain stores all the data in a form of key-value pairs, all the MERGE
statements are transformed into a set of key-value operations.
MERGE
is one of the most straightforward operations because it is translated into cache.put(…)
and cache.putAll(…)
operations depending on the number of rows that need to be inserted or updated as part of the MERGE
query.
Examples
Merge some rows into the Person
table:
MERGE INTO Person(id, name, city_id) VALUES
(1, 'John Smith', 5),
(2, 'Mary Jones', 5);
Fill in the Person
table with the data retrieved from the Account
table:
MERGE INTO Person(id, name, city_id)
(SELECT a.id + 1000, concat(a.firstName, a.secondName), a.city_id
FROM Account a WHERE a.id > 100 AND a.id < 1000);
DELETE
Delete data from a table.
DELETE
[TOP term] FROM tableName
[WHERE expression]
[LIMIT term]
Parameters
-
tableName
- the name of the table to delete data from. -
TOP, LIMIT
- specifies the number of entries to be deleted (no limit if null or smaller than zero).
Description
DELETE
removes data from a table (aka. cache in GridGain).
Because GridGain stores all the data in a form of key-value pairs, all the DELETE
statements are transformed into a set of key-value operations.
A DELETE
statements' execution is split into two phases and is similar to the execution of UPDATE
statements.
First, using a SELECT
query, the SQL engine gathers those keys that satisfy the WHERE
clause in the DELETE
statement. Next, after having all those keys in place, it creates a number of EntryProcessors
and executes them with cache.invokeAll(…)
. While the data is being deleted, additional checks are performed to make sure that nobody has interfered between the SELECT
and the actual removal of the data.
Examples
Delete all the Persons
with a specific name:
DELETE FROM Person WHERE name = 'John Doe';
© 2025 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.