Skip to main content

Summarize

The SUMMARIZE command can be used to easily compute a number of aggregates over a table or a query. The SUMMARIZE command launches a query that computes a number of aggregates over all columns (min, max, approx_unique, avg, std, q25, q50, q75, count), and return these along the column name, column type, and the percentage of NULL values in the column. Note that the quantiles and percentiles are approximate values.

Usage

To summarize the contents of a table, use SUMMARIZE followed by the table name.

SUMMARIZE tbl;

To summarize a query, prepend SUMMARIZE to a query.

SUMMARIZE SELECT * FROM tbl;

Example

Below is an example of SUMMARIZE on the lineitem table of TPC-H SF1 table, generated using the tpch extension.

INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);
SUMMARIZE lineitem;
column_namecolumn_typeminmaxapprox_uniqueavgstdq25q50q75countnull_percentage
l_orderkeyINTEGER1600000015082273000279.6042049821732187.873480351915094472989869448523260012150.0%
l_partkeyINTEGER1200000202598100017.9893299940257735.69082650496499139999215003960012150.0%
l_suppkeyINTEGER110000100615000.6026061389242886.961998730611425014999750060012150.0%
l_linenumberINTEGER1773.00057571675069121.732431403651932823460012150.0%
l_quantityDECIMAL(15,2)1.0050.005025.50796713665482714.42626253701691813263860012150.0%
l_extendedpriceDECIMAL(15,2)901.00104949.5092313938255.13848465685423300.4387109622118756367245515960012150.0%
l_discountDECIMAL(15,2)0.000.10110.049999430115401630.0316198551081259600060012150.0%
l_taxDECIMAL(15,2)0.000.0890.040013508931108120.02581655179884272800060012150.0%
l_returnflagVARCHARAR3NULLNULLNULLNULLNULL60012150.0%
l_linestatusVARCHARFO2NULLNULLNULLNULLNULL60012150.0%
l_shipdateDATE1992-01-021998-12-012516NULLNULLNULLNULLNULL60012150.0%
l_commitdateDATE1992-01-311998-10-312460NULLNULLNULLNULLNULL60012150.0%
l_receiptdateDATE1992-01-041998-12-312549NULLNULLNULLNULLNULL60012150.0%
l_shipinstructVARCHARCOLLECT CODTAKE BACK RETURN4NULLNULLNULLNULLNULL60012150.0%
l_shipmodeVARCHARAIRTRUCK7NULLNULLNULLNULLNULL60012150.0%
l_commentVARCHARTiresiaszzle? furiously iro3558599NULLNULLNULLNULLNULL60012150.0%

Using SUMMARIZE in a Subquery

SUMMARIZE can be used as a subquery. This allows creating a table from the summary, for example:

CREATE TABLE tbl_summary AS SELECT * FROM (SUMMARIZE tbl);

Summarizing Remote Tables

It is possible to summarize remote tables via the httpfs extension using the SUMMARIZE TABLE statement. For example:

SUMMARIZE TABLE 'https://${uri}/Star_Trek-Season_1.csv';