GridGain JSON Basics

Vadim Kolodin

Senior Software Engineer

Overview

For users storing data in the JSON format - for example, in NoSQL document databases - GridGain offers a number of new functions that simplify working with JSON data. This tutorial shows you how to populate a database with sample JSON data and use the JSON functions to improve query performance.

Sample Data

The sample JSON data for this tutorial are as follows:

The datasets are are relatively large (~15 Kb minified), with many data rows. You may want to tune the node memory defaults (for example, in /bin/ignite).

Populate the Database

Run a GridGain node.

Connect to the node using the embedded sqlline:

/bin/sqlline.sh --verbose=true -u jdbc:ignite:thin://127.0.0.1:10800

Create a table to hold the the JSON data.
```
CREATE TABLE shows(id INT PRIMARY KEY, data VARCHAR);
```
Do not forget to escape single quotes ' with '' while using SQL.

For the sake of example comparison, produce duplicates and a few unique rows to be included in the query results.

INSERT INTO shows(id, data)
SELECT X, 'put-here-planet-earth-json-sample'
FROM SYSTEM_RANGE(1, 300000);

INSERT INTO shows(id, data)
VALUES (300001, 'put-here-the-blue-planet-json-sample');

Query JSON Data

Retrieve a Great Britain (GB) TV show whose first episode aired before 2005:

SELECT JSON_VALUE(data, '$.name') as name, JSON_VALUE(data, '$.network.name') as network, JSON_QUERY(data, '$.rating') as rating
FROM shows
WHERE TRUE
  AND JSON_VALUE(data, '$.status') = 'Ended'
  AND JSON_VALUE(data, '$.webChannel') is null
  AND ARRAY_CONTAINS(ARRAY['US', 'GB'], JSON_VALUE(data, '$.network.country.code'))
  AND CAST(JSON_VALUE(data, '$.rating.average') AS DOUBLE) > 9
  AND JSON_VALUE(data, '$._embedded.episodes[0].name') = 'Introduction'
  AND JSON_VALUE(data, '$._embedded.episodes[0].airstamp') < '2005-01-01'
;

The query plan uses PUBLIC.SHOWS._SCAN and simply scans the entire table:

 SELECT
    "PUBLIC".JSON_VALUE("__Z0"."DATA", '$.name') AS "__C0_0",
    "PUBLIC".JSON_VALUE("__Z0"."DATA", '$.network.name') AS "__C0_1",
    "PUBLIC".JSON_QUERY("__Z0"."DATA", '$.rating') AS "__C0_2"
FROM "PUBLIC"."SHOWS" "__Z0"
    /* PUBLIC.SHOWS.__SCAN_ */
WHERE ("PUBLI |
| SELECT
    "__C0_0" AS "NAME",
    "__C0_1" AS "NETWORK",
    "__C0_2" AS "RATING"
FROM "PUBLIC"."__T0"
    /* PUBLIC.merge_scan */ |

The query returns the following result:

+-------------------+------------+--------------------+
|       NAME        |   NETWORK  |       RATING       |
+-------------------+------------+--------------------+
| The Blue Planet   | BBC One    | {"average":9.1}    |
+-------------------+------------+--------------------+

1 row selected (23.731 seconds)

Improve Performance with Indexes

Let’s put "JSON" and "performance" in the same sentence by preparing our data and adding indexes.

Our query uses a couple of common fields related to the first episode, hence we can flatten these into separate columns and use them as indexes (read the automation section below):

ALTER TABLE shows ADD COLUMN first_episode_name VARCHAR;
UPDATE shows SET first_episode_name = JSON_VALUE(data, '$._embedded.episodes[0].name');

ALTER TABLE shows ADD COLUMN first_episode_airstamp timestamp;
UPDATE shows SET first_episode_airstamp = JSON_VALUE(data, '$._embedded.episodes[0].airstamp');

CREATE INDEX shows_first_episode_name_airstamp on shows(first_episode_name, first_episode_airstamp);

Now, rewrite the initial query to use the new index:

SELECT JSON_VALUE(data, '$.name') as name, JSON_VALUE(data, '$.network.name') as network, JSON_QUERY(data, '$.rating') as rating
FROM shows
WHERE TRUE
  AND JSON_VALUE(data, '$.status') = 'Ended'
  AND JSON_VALUE(data, '$.webChannel') is null
  AND ARRAY_CONTAINS(ARRAY['US', 'GB'], JSON_VALUE(data, '$.network.country.code'))
  AND CAST(JSON_VALUE(data, '$.rating.average') AS DOUBLE) > 9
  AND first_episode_name = 'Introduction'
  AND first_episode_airstamp < '2005-01-01'
;

Note that the query plan uses the created index - PUBLIC.SHOWS_FIRST_EPISODE_NAME_AIRST:

SELECT
    "PUBLIC".JSON_VALUE("__Z0"."DATA", '$.name') AS "__C0_0",
    "PUBLIC".JSON_VALUE("__Z0"."DATA", '$.network.name') AS "__C0_1",
    "PUBLIC".JSON_QUERY("__Z0"."DATA", '$.rating') AS "__C0_2"
FROM "PUBLIC"."SHOWS" "__Z0"
    /* PUBLIC.SHOWS_FIRST_EPISODE_NAME_AIRST |
| SELECT
    "__C0_0" AS "NAME",
    "__C0_1" AS "NETWORK",
    "__C0_2" AS "RATING"
FROM "PUBLIC"."__T0"
    /* PUBLIC.merge_scan */

The query result is:

+-------------------+------------+--------------------+
|       NAME        |   NETWORK  |       RATING       |
+-------------------+------------+--------------------+
| The Blue Planet   | BBC One    | {"average":9.1}    |
+-------------------+------------+--------------------+

1 row selected (0.007 seconds)

Wasn’t that fast? From 23.731 sec to 0.007 sec - more than x3300 improvement!

Now let’s clean up (optional):

DROP TABLE IF EXISTS shows;

Automation

Flattening using UPDATE is good for this tutorial. To automate this, you can add CacheInterceptor to your CacheConfiguration:

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheInterceptorAdapter;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;
import org.gridgain.internal.processors.query.h2.sql.JsonFunctions;
import javax.cache.Cache;
import java.io.Serializable;

public class JsonBasics {

    public static void main(String[] args) {
        CacheConfiguration cacheCfg = new CacheConfiguration("shows");
        // set up cache mode, backup, etc.
        cacheCfg.setIndexedTypes(Integer.class, ShowJson.class);
        cacheCfg.setInterceptor(new ShowJsonFlattenInterceptor());

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}

class ShowJsonFlattenInterceptor extends CacheInterceptorAdapter<Integer, ShowJson> {

    @Override
    public ShowJson onBeforePut(Cache.Entry<Integer, ShowJson> entry, ShowJson newVal) {
        // This method is called within IgniteCache.put(k, v)
        String json = newVal.getData();
        String firstEpisodeName = JsonFunctions.jsonValue(json, "$._embedded.episodes[0].name");
        String firstEpisodeAirstamp = JsonFunctions.jsonValue(json, "$._embedded.episodes[0].airstamp");

        ShowJson result = new ShowJson(json);
        result.setFirstEpisodeName(firstEpisodeName);
        result.setFirstEpisodeAirstamp(firstEpisodeAirstamp);
        return result;
    }
}

class ShowJson implements Serializable {

    @QuerySqlField
    private String data;

    @QuerySqlField(name = "first_episode_name")
    private String firstEpisodeName;

    @QuerySqlField(name = "first_episode_airstamp")
    private String firstEpisodeAirstamp;

    public ShowJson(String data) {
        this.data = data;
    }
    // getters and setters
}

Conclusion

You have successfully completed the tutorial. You have learned how to query semi-structured JSON data, as well as how to improve the JSON query performance by rearranging the data and adding well-chosen indexes.

© 2024 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.

Last updated on Apr 10, 2024