GridGain JSON Basics
Overview
For users storing data in the JSON format - for example, in NoSQL document databases - GridGain offers a number of new functions that simplify working with JSON data. This tutorial shows you how to populate a database with sample JSON data and use the JSON functions to improve query performance.
Sample Data
The sample JSON data for this tutorial are as follows:
The datasets are are relatively large (~15 Kb minified), with many data rows. You may want to tune the node memory defaults (for example, in /bin/ignite
).
Populate the Database
-
Run a GridGain node.
-
Connect to the node using the embedded sqlline:
/bin/sqlline.sh --verbose=true -u jdbc:ignite:thin://127.0.0.1:10800
-
Create a table to hold the the JSON data.
CREATE TABLE shows(id INT PRIMARY KEY, data VARCHAR);
-
For the sake of example comparison, produce duplicates and a few unique rows to be included in the query results.
INSERT INTO shows(id, data) SELECT X, 'put-here-planet-earth-json-sample' FROM SYSTEM_RANGE(1, 300000); INSERT INTO shows(id, data) VALUES (300001, 'put-here-the-blue-planet-json-sample');
Query JSON Data
Retrieve a Great Britain (GB) TV show whose first episode aired before 2005:
SELECT JSON_VALUE(data, '$.name') as name, JSON_VALUE(data, '$.network.name') as network, JSON_QUERY(data, '$.rating') as rating
FROM shows
WHERE TRUE
AND JSON_VALUE(data, '$.status') = 'Ended'
AND JSON_VALUE(data, '$.webChannel') is null
AND ARRAY_CONTAINS(ARRAY['US', 'GB'], JSON_VALUE(data, '$.network.country.code'))
AND CAST(JSON_VALUE(data, '$.rating.average') AS DOUBLE) > 9
AND JSON_VALUE(data, '$._embedded.episodes[0].name') = 'Introduction'
AND JSON_VALUE(data, '$._embedded.episodes[0].airstamp') < '2005-01-01'
;
The query plan uses PUBLIC.SHOWS._SCAN
and simply scans the entire table:
SELECT
"PUBLIC".JSON_VALUE("__Z0"."DATA", '$.name') AS "__C0_0",
"PUBLIC".JSON_VALUE("__Z0"."DATA", '$.network.name') AS "__C0_1",
"PUBLIC".JSON_QUERY("__Z0"."DATA", '$.rating') AS "__C0_2"
FROM "PUBLIC"."SHOWS" "__Z0"
/* PUBLIC.SHOWS.__SCAN_ */
WHERE ("PUBLI |
| SELECT
"__C0_0" AS "NAME",
"__C0_1" AS "NETWORK",
"__C0_2" AS "RATING"
FROM "PUBLIC"."__T0"
/* PUBLIC.merge_scan */ |
The query returns the following result:
+-------------------+------------+--------------------+ | NAME | NETWORK | RATING | +-------------------+------------+--------------------+ | The Blue Planet | BBC One | {"average":9.1} | +-------------------+------------+--------------------+ 1 row selected (23.731 seconds)
Improve Performance with Indexes
Let’s put "JSON" and "performance" in the same sentence by preparing our data and adding indexes.
Our query uses a couple of common fields related to the first episode, hence we can flatten these into separate columns and use them as indexes (read the automation section below):
ALTER TABLE shows ADD COLUMN first_episode_name VARCHAR;
UPDATE shows SET first_episode_name = JSON_VALUE(data, '$._embedded.episodes[0].name');
ALTER TABLE shows ADD COLUMN first_episode_airstamp timestamp;
UPDATE shows SET first_episode_airstamp = JSON_VALUE(data, '$._embedded.episodes[0].airstamp');
CREATE INDEX shows_first_episode_name_airstamp on shows(first_episode_name, first_episode_airstamp);
Now, rewrite the initial query to use the new index:
SELECT JSON_VALUE(data, '$.name') as name, JSON_VALUE(data, '$.network.name') as network, JSON_QUERY(data, '$.rating') as rating
FROM shows
WHERE TRUE
AND JSON_VALUE(data, '$.status') = 'Ended'
AND JSON_VALUE(data, '$.webChannel') is null
AND ARRAY_CONTAINS(ARRAY['US', 'GB'], JSON_VALUE(data, '$.network.country.code'))
AND CAST(JSON_VALUE(data, '$.rating.average') AS DOUBLE) > 9
AND first_episode_name = 'Introduction'
AND first_episode_airstamp < '2005-01-01'
;
Note that the query plan uses the created index - PUBLIC.SHOWS_FIRST_EPISODE_NAME_AIRST
:
SELECT
"PUBLIC".JSON_VALUE("__Z0"."DATA", '$.name') AS "__C0_0",
"PUBLIC".JSON_VALUE("__Z0"."DATA", '$.network.name') AS "__C0_1",
"PUBLIC".JSON_QUERY("__Z0"."DATA", '$.rating') AS "__C0_2"
FROM "PUBLIC"."SHOWS" "__Z0"
/* PUBLIC.SHOWS_FIRST_EPISODE_NAME_AIRST |
| SELECT
"__C0_0" AS "NAME",
"__C0_1" AS "NETWORK",
"__C0_2" AS "RATING"
FROM "PUBLIC"."__T0"
/* PUBLIC.merge_scan */
The query result is:
+-------------------+------------+--------------------+ | NAME | NETWORK | RATING | +-------------------+------------+--------------------+ | The Blue Planet | BBC One | {"average":9.1} | +-------------------+------------+--------------------+ 1 row selected (0.007 seconds)
Wasn’t that fast? From 23.731 sec to 0.007 sec - more than x3300 improvement!
Now let’s clean up (optional):
DROP TABLE IF EXISTS shows;
Automation
Flattening using UPDATE
is good for this tutorial. To automate this, you can add CacheInterceptor
to your CacheConfiguration
:
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheInterceptorAdapter;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;
import org.gridgain.internal.processors.query.h2.sql.JsonFunctions;
import javax.cache.Cache;
import java.io.Serializable;
public class JsonBasics {
public static void main(String[] args) {
CacheConfiguration cacheCfg = new CacheConfiguration("shows");
// set up cache mode, backup, etc.
cacheCfg.setIndexedTypes(Integer.class, ShowJson.class);
cacheCfg.setInterceptor(new ShowJsonFlattenInterceptor());
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setCacheConfiguration(cacheCfg);
Ignition.start(cfg);
}
}
class ShowJsonFlattenInterceptor extends CacheInterceptorAdapter<Integer, ShowJson> {
@Override
public ShowJson onBeforePut(Cache.Entry<Integer, ShowJson> entry, ShowJson newVal) {
// This method is called within IgniteCache.put(k, v)
String json = newVal.getData();
String firstEpisodeName = JsonFunctions.jsonValue(json, "$._embedded.episodes[0].name");
String firstEpisodeAirstamp = JsonFunctions.jsonValue(json, "$._embedded.episodes[0].airstamp");
ShowJson result = new ShowJson(json);
result.setFirstEpisodeName(firstEpisodeName);
result.setFirstEpisodeAirstamp(firstEpisodeAirstamp);
return result;
}
}
class ShowJson implements Serializable {
@QuerySqlField
private String data;
@QuerySqlField(name = "first_episode_name")
private String firstEpisodeName;
@QuerySqlField(name = "first_episode_airstamp")
private String firstEpisodeAirstamp;
public ShowJson(String data) {
this.data = data;
}
// getters and setters
}
Conclusion
You have successfully completed the tutorial. You have learned how to query semi-structured JSON data, as well as how to improve the JSON query performance by rearranging the data and adding well-chosen indexes.
© 2024 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.