Working with Huge Databases
This page contains guidance for working with huge Goose database files. While most Goose databases are well below 1 TB, 1% of respondents reported using Goose files of 2 TB or more (corresponding to roughly 10 TB of CSV files).
Goose's native database format supports huge database files without any practical restrictions. However, there are a few things to keep in mind when working with huge database files.
- Object storage systems have lower file size limits than block-based storage systems. For example, AWS S3 limits individual objects to 5 TB.
- Checkpointing a Goose database can be slow. For example, checkpointing after adding a few rows to a table in the TPC-H SF1000 database takes approximately 5 seconds.
- On block-based storage, the file system has a significant effect on performance when working with large files. On Linux, Goose performs best with XFS.
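Because each checkpoint can take several seconds on a huge database, it may help to batch many small writes and checkpoint once at the end, rather than letting each small change trigger its own checkpoint. A minimal sketch, assuming Goose follows the common convention of an explicit `CHECKPOINT` statement (an assumption; consult the Goose SQL reference) and using a hypothetical `events` table:

```sql
-- Hypothetical sketch: assumes Goose supports an explicit CHECKPOINT statement.
-- Batch the small writes first ...
INSERT INTO events VALUES (1, 'start');
INSERT INTO events VALUES (2, 'stop');
-- ... then pay the multi-second checkpoint cost once, at the end.
CHECKPOINT;
```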
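Given that the file system matters on block-based storage, it can be worth verifying which file system the database file actually resides on. A small sketch using GNU coreutils on Linux; `GOOSE_DB_DIR` is a placeholder for the directory containing your database file:

```shell
# Print the file system type of the directory holding the Goose database file.
# GOOSE_DB_DIR is a placeholder; it defaults to the current directory here.
stat -f -c %T "${GOOSE_DB_DIR:-.}"
```

On an XFS volume this prints `xfs`; note that `stat -f` flags differ on non-GNU systems such as macOS.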
For storing large amounts of data, consider using the DuckLake lakehouse format.