GridGain Developers Hub

Operational Commands

COPY INTO

Description

Imports data from an external source into a table or exports data to a file from the table. The table the data is imported into must exist.

Diagram( Terminal('COPY'), Terminal('FROM'), NonTerminal('source'), Terminal('INTO'), NonTerminal('target'), Terminal('FORMAT'), NonTerminal('formatType'), Optional( Sequence( Terminal('WITH'), OneOrMore (NonTerminal('parameter', {href:'./grammar-reference/#parameter'}), Terminal(',')) ) ) )

Parameters

  • source - the full path to the file to import the data data from (if target is a table). Then name of the table and the list of columns to export the data from (if target is a file). If the source is a table, you can further narrow down the selection by using a query.

  • target - the table and table columns when importing data to the table. The path to the file when exporting data.

  • formatType - the format of the file to work with:

    • CSV

    • PARQUET

    • ICEBERG

  • WITH - accepts additional parameters. All properties are case-sensitive:

    • delimiter - for CSV files only; default delimiter is ,. Delimiter syntax is 'char'. Any alphanumeric character can be a delimiter.

    • pattern - the file pattern used when importing partitioned Parquet tables in the regular expression format. The regular expression must be enclosed in ' signs. For example, '.*' imports all files. Partitioned column will not be imported.

    • header - for CSV files only; if set to true, specifies that the created file should contain a header line with column names. Column names of the table are used to create the header line.

    • S3:

      • s3.client-region - when using S3 storage, the region to store the data in.

      • s3.access-key-id - when using S3 storage, the AWS access key.

      • s3.secret-access-key - when using S3 storage, the AWS secret key.

      • Iceberg-specific:

        • The catalog properties are supported.

        • The table-identifier property describes Apache Iceberg TableIdentifier names. The names can be dot-separated. For example, db_name.table_name or table.

        • The warehouse path can be defined explicitly as a 'warehouse'='path' property, or implicitly as a source or target COPY FROM source INTO target. If both ways are defined, the explicit property is used.

Examples

Import data from columns name and age of a CSV file with header into Table1 columns name and age:

/* Import data from CSV with column headers */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV

Import data from the first two columns of a CSV file without header into Table1 columns name and age:

/* Import data from CSV without column headers  */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)

Exports data from Table1 to a CSV file:

/* Export data to CSV */
COPY FROM (SELECT name, age FROM Table1)
INTO  '/path/to/dir/data.csv'
FORMAT CSV

Imports CSV file from AWS S3 into Table1:

/* Import CSV file from s3 */
COPY FROM 's3://mybucket/data.csv'
INTO Table1 (name, age)
FORMAT CSV
WITH 'delimiter'= '|', 's3.access-key-id' = 'keyid', 's3.secret-access-key' = 'secretkey'

A simple example of exporting data to Iceberg. For working with local file system you can use HadoopCatalog that does not need to connect to a Hive MetaStore.

COPY FROM Table1 (id,name,height)
INTO '/tmp/person.i/'
FORMAT ICEBERG
WITH 'table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog'

Exports data into Iceberg on AWS S3:

COPY FROM person (id,name,height)
INTO 's3://iceberg-warehouse/glue-catalog'
FORMAT ICEBERG
WITH
    'table-identifier'='iceberg_db_1.person',
    'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
    'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
    's3.client-region'='eu-central-1',
    's3.access-key-id'='YOUR_KEY',
    's3.secret-access-key'='YOUR_SECRET'

Imports data from partitioned Parquet database:

COPY FROM '/tmp/partitioned_table_dir'
INTO city (id, name, population)
FORMAT PARQUET
WITH 'pattern' = '.*'

Where the Parquet table looks like this:

partitioned_table_dir/
├─ CountryCode=USA/
│  ├─ 000000_0.parquet
├─ CountryCode=FR/
│  ├─ 000000_0.parquet