Operational Commands
COPY INTO
Description
Imports data from an external source into a table, or exports data from a table to a file. The table the data is imported into must exist.
COPY FROM source INTO target FORMAT formatType [WITH parameter [, parameter ...]]
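For instance, a minimal CSV import takes the following shape (the table, column, and file names here are placeholders; complete scenarios are shown in the Examples section below):
/* Minimal sketch: import a CSV file into an existing table (placeholder names) */
COPY FROM '/path/to/data.csv'
INTO my_table (col1, col2)
FORMAT CSV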
Parameters
- source - the full path to the file to import the data from (if target is a table), or the name of the table and the list of columns to export the data from (if target is a file). If the source is a table, you can further narrow down the selection by using a query.
- target - the table and table columns when importing data into a table, or the path to the file when exporting data.
- formatType - the format of the file to work with:
  - CSV
  - PARQUET
  - ICEBERG
- WITH - accepts additional parameters. All properties are case-sensitive:
  - delimiter - for CSV files only; the default delimiter is ','. Delimiter syntax is 'char'. Any alphanumeric character can be a delimiter.
  - pattern - the file pattern used when importing partitioned Parquet tables, in regular expression format. The regular expression must be enclosed in ' signs. For example, '.*' imports all files. The partitioned column will not be imported.
  - header - for CSV files only; if set to true, the created file contains a header line with column names. The column names of the table are used to create the header line (see the CSV export example with a custom delimiter and a header line below).
  - S3-specific:
    - s3.client-region - when using S3 storage, the region to store the data in.
    - s3.access-key-id - when using S3 storage, the AWS access key.
    - s3.secret-access-key - when using S3 storage, the AWS secret key.
  - Iceberg-specific:
    - The catalog properties are supported.
    - The table-identifier property describes Apache Iceberg TableIdentifier names. The names can be dot-separated, for example db_name.table_name or table.
    - The warehouse path can be defined explicitly as a 'warehouse'='path' property, or implicitly as the source or target in COPY FROM source INTO target. If both are defined, the explicit property is used (see the Iceberg export example with an explicit warehouse property below).
Examples
Import data from the name and age columns of a CSV file with a header line into the name and age columns of Table1:
/* Import data from CSV with column headers */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV
Import data from the first two columns of a CSV file without a header line into the name and age columns of Table1:
/* Import data from CSV without column headers */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV
Export data from Table1 to a CSV file:
/* Export data to CSV */
COPY FROM (SELECT name, age FROM Table1)
INTO '/path/to/dir/data.csv'
FORMAT CSV
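As an illustration of the delimiter and header options described in the Parameters section, the same export can be written with a custom separator and a header line. This is a sketch; the semicolon delimiter and the output path are arbitrary placeholders:
/* Export to CSV with a custom delimiter and a header line (placeholder path) */
COPY FROM (SELECT name, age FROM Table1)
INTO '/path/to/dir/data_with_header.csv'
FORMAT CSV
WITH 'delimiter' = ';', 'header' = 'true'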
Import a CSV file from AWS S3 into Table1:
/* Import CSV file from s3 */
COPY FROM 's3://mybucket/data.csv'
INTO Table1 (name, age)
FORMAT CSV
WITH 'delimiter' = '|', 's3.access-key-id' = 'keyid', 's3.secret-access-key' = 'secretkey'
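A variant of the same import that also sets the client region through the s3.client-region property described above; the bucket name, region, and credentials are placeholders:
/* Import a CSV file from S3 with an explicit client region (placeholder values) */
COPY FROM 's3://mybucket/data.csv'
INTO Table1 (name, age)
FORMAT CSV
WITH 'delimiter' = '|', 's3.client-region' = 'eu-central-1', 's3.access-key-id' = 'keyid', 's3.secret-access-key' = 'secretkey'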
A simple example of exporting data to Iceberg. When working with the local file system, you can use HadoopCatalog, which does not need to connect to a Hive Metastore.
COPY FROM Table1 (id,name,height)
INTO '/tmp/person.i/'
FORMAT ICEBERG
WITH 'table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog'
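A sketch of the same export with the warehouse location passed explicitly through the 'warehouse' property; per the precedence rule in the Parameters section, the explicit property overrides the path given in the target (all paths here are placeholders):
/* Export to Iceberg with an explicit warehouse property (placeholder paths) */
COPY FROM Table1 (id,name,height)
INTO '/tmp/person.i/'
FORMAT ICEBERG
WITH 'table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog', 'warehouse'='/tmp/iceberg-warehouse/'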
Export data to Iceberg on AWS S3:
COPY FROM person (id,name,height)
INTO 's3://iceberg-warehouse/glue-catalog'
FORMAT ICEBERG
WITH
'table-identifier'='iceberg_db_1.person',
'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
's3.client-region'='eu-central-1',
's3.access-key-id'='YOUR_KEY',
's3.secret-access-key'='YOUR_SECRET'
Import data from a partitioned Parquet table:
COPY FROM '/tmp/partitioned_table_dir'
INTO city (id, name, population)
FORMAT PARQUET
WITH 'pattern' = '.*'
Where the Parquet table looks like this:
partitioned_table_dir/
├─ CountryCode=USA/
│  ├─ 000000_0.parquet
├─ CountryCode=FR/
│  ├─ 000000_0.parquet
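Because 'pattern' is a regular expression, it can also select only a subset of the partition files. A sketch, assuming only the CountryCode=USA partition from the layout above should be loaded:
/* Import only the files under the CountryCode=USA partition (placeholder pattern) */
COPY FROM '/tmp/partitioned_table_dir'
INTO city (id, name, population)
FORMAT PARQUET
WITH 'pattern' = '.*CountryCode=USA.*'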