Parquet Import

To read data from a Parquet file, use the read_parquet function in the FROM clause of a query:

SELECT * FROM read_parquet('input.parquet');

Alternatively, you can omit the read_parquet function and let Goose infer it from the extension:

SELECT * FROM 'input.parquet';

To create a new table using the result from a query, use the CREATE TABLE ... AS SELECT statement:

CREATE TABLE new_tbl AS
    SELECT * FROM read_parquet('input.parquet');

To load data into an existing table from a query, use INSERT INTO from a SELECT statement:

INSERT INTO tbl
    SELECT * FROM read_parquet('input.parquet');

Alternatively, use the COPY statement to load data from a Parquet file into an existing table:

COPY tbl FROM 'input.parquet' (FORMAT parquet);

Adjusting the Schema on the Fly

You can load a Parquet file into a slightly different schema (e.g., different number of columns, more relaxed types) using the following trick.

Suppose you have a Parquet file with two columns, c1 and c2:

COPY (FROM (VALUES (42, 43)) t(c1, c2))
TO 'f.parquet';

To add another column c3 that is not present in the file, run:

FROM (VALUES (NULL::VARCHAR, NULL, NULL)) t(c1, c2, c3)
WHERE false
UNION ALL BY NAME
FROM 'f.parquet';

The first FROM clause generates an empty table with three columns where c1 is a VARCHAR. Then, use UNION ALL BY NAME to union the Parquet file. The result is:

┌─────────┬───────┬───────┐
│   c1    │  c2   │  c3   │
│ varchar │ int32 │ int32 │
├─────────┼───────┼───────┤
│ 42      │  43   │ NULL  │
└─────────┴───────┴───────┘

For additional options, see the Parquet loading reference.

Adjusting the Schema on the Fly​

Adjusting the Schema on the Fly