
Operational Commands

COPY INTO

Description

Imports data from an external source into a table, or exports data from a table to a file. The table the data is imported into must already exist.

COPY FROM source INTO target FORMAT formatType [WITH parameter [, parameter]...]

Parameters

  • source - the full path to the file to import the data from (if the target is a table), or the name of the table and the list of columns to export the data from (if the target is a file). If the source is a table, you can further narrow down the selection by using a query.

  • target - the table and its columns when importing data into a table, or the path to the file when exporting data.

  • formatType - the format of the file to work with:

    • CSV

    • PARQUET

    • ICEBERG

  • WITH - accepts additional parameters. All properties are case-sensitive:

    • delimiter - for CSV files only; the default delimiter is ','. The delimiter syntax is 'char', for example '|'. Any alphanumeric character can be a delimiter.

    • pattern - a regular expression specifying which files to import when importing partitioned Parquet tables. The regular expression must be enclosed in single quotes. For example, '.*' imports all files. The partition column itself is not imported.

    • header - for CSV files only; if set to true, specifies that the file contains (when importing) or should be created with (when exporting) a header line with column names. Column names of the table are used to create the header line.

    • S3:

      • s3.client-region - when using S3 storage, the region to store the data in.

      • s3.access-key-id - when using S3 storage, the AWS access key.

      • s3.secret-access-key - when using S3 storage, the AWS secret key.

    • Iceberg-specific:

      • The Apache Iceberg catalog properties are supported.

      • The table-identifier property describes Apache Iceberg TableIdentifier names. The names can be dot-separated, for example db_name.table_name or table.

      • The warehouse path can be defined explicitly with the 'warehouse'='path' property, or implicitly as the source or target of the COPY FROM source INTO target command. If both are defined, the explicit property takes precedence (see the sketch after this list).
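
For example, the warehouse path could be set explicitly alongside a catalog implementation (a minimal sketch; the paths and the table name below are hypothetical):

COPY FROM Table1 (id, name)
INTO '/tmp/iceberg-out/'
FORMAT ICEBERG
WITH 'table-identifier'='person',
     'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog',
     'warehouse'='/tmp/iceberg-warehouse'

Here both the target path and the 'warehouse' property are defined, so the explicit 'warehouse' path is used.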

Examples

Import data from the name and age columns of a CSV file with a header line into the name and age columns of Table1:

/* Import data from CSV with column headers */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV
WITH 'header' = 'true'

Import data from the first two columns of a CSV file without a header line into the name and age columns of Table1:

/* Import data from CSV without column headers */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV

Export data from Table1 to a CSV file:

/* Export data to CSV */
COPY FROM (SELECT name, age FROM Table1)
INTO '/path/to/dir/data.csv'
FORMAT CSV

Import a CSV file from AWS S3 into Table1:

/* Import CSV file from s3 */
COPY FROM 's3://mybucket/data.csv'
INTO Table1 (name, age)
FORMAT CSV
WITH 'delimiter' = '|', 's3.access-key-id' = 'keyid', 's3.secret-access-key' = 'secretkey'

A simple example of exporting data to Iceberg. When working with the local file system, you can use HadoopCatalog, which does not need to connect to a Hive MetaStore.

COPY FROM Table1 (id, name, height)
INTO '/tmp/person.i/'
FORMAT ICEBERG
WITH 'table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog'

Export data to Iceberg on AWS S3:

COPY FROM person (id, name, height)
INTO 's3://iceberg-warehouse/glue-catalog'
FORMAT ICEBERG
WITH
    'table-identifier'='iceberg_db_1.person',
    'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
    'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
    's3.client-region'='eu-central-1',
    's3.access-key-id'='YOUR_KEY',
    's3.secret-access-key'='YOUR_SECRET'

Import data from a partitioned Parquet table:

COPY FROM '/tmp/partitioned_table_dir'
INTO city (id, name, population)
FORMAT PARQUET
WITH 'pattern' = '.*'

Where the Parquet table looks like this:

partitioned_table_dir/
├─ CountryCode=USA/
│  ├─ 000000_0.parquet
├─ CountryCode=FR/
│  ├─ 000000_0.parquet
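
Assuming the pattern is matched against file paths under the source directory (an assumption to verify for your version), a narrower expression could import a single partition:

COPY FROM '/tmp/partitioned_table_dir'
INTO city (id, name, population)
FORMAT PARQUET
WITH 'pattern' = 'CountryCode=USA/.*'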

KILL QUERY

Cancels a running query. When a query is canceled with the KILL command, all parts of the query running on all other nodes are canceled too.

KILL QUERY query_id [NO WAIT]

Parameters

  • query_id - query identifier that can be retrieved via the SQL_QUERIES system view.

  • NO WAIT - if specified, the command returns control immediately, without waiting for the query to be canceled. You can monitor the query status through the SQL_QUERIES system view to make sure it was canceled.
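
For example (a minimal sketch; the identifier below is a hypothetical UUID, and the snippet assumes system views are exposed in the SYSTEM schema):

/* Find the identifier of the query to cancel */
SELECT * FROM SYSTEM.SQL_QUERIES;

/* Cancel the query without waiting for the cancellation to complete */
KILL QUERY 'c0fc3f1f-0000-0000-0000-000000000001' NO WAIT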

KILL TRANSACTION

Cancels an active transaction.

KILL TRANSACTION transaction_id [NO WAIT]

Parameters

  • transaction_id - transaction identifier that can be retrieved via the TRANSACTIONS system view.

  • NO WAIT - if specified, the command returns control immediately, without waiting for the transaction to be canceled. You can monitor the transaction status through the TRANSACTIONS system view to make sure it was canceled.
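
For example (a minimal sketch; the identifier below is a hypothetical UUID, and the snippet assumes system views are exposed in the SYSTEM schema):

/* Find the identifier of the transaction to cancel */
SELECT * FROM SYSTEM.TRANSACTIONS;

/* Cancel the transaction and wait for the cancellation to complete */
KILL TRANSACTION 'c0fc3f1f-0000-0000-0000-000000000002'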

KILL COMPUTE

Cancels a running compute job. When a job is canceled with the KILL command, all parts of the job running on other nodes are canceled too.

KILL COMPUTE job_id [NO WAIT]

Parameters

  • job_id - job identifier that can be retrieved via the COMPUTE_JOBS system view.

  • NO WAIT - if specified, the command returns control immediately, without waiting for the job to be canceled. You can monitor the job status through the COMPUTE_JOBS system view to make sure it was canceled.
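
For example (a minimal sketch; the identifier below is a hypothetical UUID, and the snippet assumes system views are exposed in the SYSTEM schema):

/* Find the identifier of the compute job to cancel */
SELECT * FROM SYSTEM.COMPUTE_JOBS;

/* Cancel the job without waiting for the cancellation to complete */
KILL COMPUTE 'c0fc3f1f-0000-0000-0000-000000000003' NO WAIT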