Relational API
Relational API 是一种可用于增量构建查询的替代 API。
该 API 以 GoosePyRelation 节点为核心。关系对象可以视为 SQL 查询的符号化表示。
惰性求值
关系对象不持有任何数据,也不会执行任何操作,直到调用触发执行的方法。
例如,我们创建一个会加载 10 亿行数据的关系对象:
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("from range(1_000_000_000)")
在此时,rel 不持有任何数据,也不会从数据库中读取数据。
调用 rel.show() 或在终端直接打印 rel 时,会抓取前 10K 行数据。
如果行数超过 10K,由于关系中的总行数未知,输出窗口会显示 >9999 行。
调用输出方法后,数据会被读取并存储为指定格式:
rel.to_table("example_rel")
# 100% ▕████████████████████████████████████████████████████████████▏
关系创建
本节介绍关系对象的创建方式。相关方法采用惰性求值。
| 名称 | 说明 |
|---|---|
from_arrow | 从 Arrow 对象创建关系对象 |
from_csv_auto | 从 name 中的 CSV 文件创建关系对象 |
from_df | 从 df 中的 DataFrame 创建关系对象 |
from_parquet | 从 Parquet 文件创建关系对象 |
from_query | 运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。 |
query | 运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。 |
read_csv | 从 name 中的 CSV 文件创建关系对象 |
read_json | Create a relation object from the JSON file in 'name' |
read_parquet | 从 Parquet 文件创建关系对象 |
sql | 运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。 |
table | 为指定名称的表创建关系对象 |
table_function | 通过给定参数从指定名称的表函数创建关系对象 |
values | 从传入的值创建关系对象 |
view | 为指定名称的视图创建关系对象 |
from_arrow
签名
from_arrow(self: _goose.GoosePyConnection, arrow_object: object) -> _goose.GoosePyRelation
说明
从 Arrow 对象创建关系对象
参数
-
arrow_object : pyarrow.Table, pyarrow.RecordBatch
用于创建关系对象的 Arrow 对象
示例
import goose
import pyarrow as pa
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
goose_conn = goose.connect()
rel = goose_conn.from_arrow(example_table)
rel.show()
结果
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
from_csv_auto
签名
from_csv_auto(self: _goose.GoosePyConnection, path_or_buffer: object, **kwargs) -> _goose.GoosePyRelation
说明
从 name 中的 CSV 文件创建关系对象
Aliases: read_csv
参数
-
path_or_buffer : Union[str, StringIO, TextIOBase]
CSV 文件路径或要读取的缓冲区。
-
header : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
-
compression : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
-
sep : Optional[str]
Delimiter to use; defaults to comma.
-
delimiter : Optional[str]
Alternative delimiter to use.
-
dtype : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
-
na_values : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
-
skiprows : Optional[int]
Number of rows to skip at the start.
-
quotechar : Optional[str]
Character used to quote fields.
-
escapechar : Optional[str]
Character used to escape delimiter or quote characters.
-
encoding : Optional[str]
Encoding to use for UTF when reading/writing.
-
parallel : Optional[bool]
Enable parallel reading.
-
date_format : Optional[str]
Format to parse dates.
-
timestamp_format : Optional[str]
Format to parse timestamps.
-
sample_size : Optional[int]
Number of rows to sample for schema inference.
-
all_varchar : Optional[bool]
Treat all columns as VARCHAR.
-
normalize_names : Optional[bool]
Normalize column names to lowercase.
-
null_padding : Optional[bool]
Enable null padding for rows with missing columns.
-
names : Optional[List[str]]
List of column names to use.
-
lineterminator : Optional[str]
Character to break lines on.
-
columns : Optional[Dict[str, str]]
Column mapping for schema.
-
auto_type_candidates : Optional[List[str]]
List of columns for automatic type inference.
-
max_line_size : Optional[int]
Maximum line size in bytes.
-
ignore_errors : Optional[bool]
Ignore parsing errors.
-
store_rejects : Optional[bool]
Store rejected rows.
-
rejects_table : Optional[str]
Table name to store rejected rows.
-
rejects_scan : Optional[str]
Scan to use for rejects.
-
rejects_limit : Optional[int]
Limit number of rejects stored.
-
force_not_null : Optional[List[str]]
List of columns to force as NOT NULL.
-
buffer_size : Optional[int]
Buffer size in bytes.
-
decimal : Optional[str]
Character to recognize as decimal point.
-
allow_quoted_nulls : Optional[bool]
Allow quoted NULL values.
-
filename : Optional[bool], Optional[str]
Add filename column or specify filename.
-
hive_partitioning : Optional[bool]
Enable Hive-style partitioning.
-
union_by_name : Optional[bool]
Union files by column name instead of position.
-
hive_types : Optional[Dict[str, str]]
Hive types for columns.
-
hive_types_autocast : Optional[bool]
Automatically cast Hive types.
-
connection : GoosePyConnection
要使用的 Goose 连接。
示例
import csv
import goose
goose_conn = goose.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
fieldnames = ['id', 'text']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'id': '1', 'text': 'a'})
rel = goose_conn.from_csv_auto("code_example.csv")
rel.show()
结果
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
from_df
签名
from_df(self: _goose.GoosePyConnection, df: pandas.DataFrame) -> _goose.GoosePyRelation
说明
从 df 中的 DataFrame 创建关系对象
参数
-
df : pandas.DataFrame
A pandas DataFrame to be converted into a Goose relation.
示例
import goose
import pandas as pd
df = pd.DataFrame(data = {'id': [1], "text":["a"]})
goose_conn = goose.connect()
rel = goose_conn.from_df(df)
rel.show()
结果
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
from_parquet
签名
from_parquet(*args, **kwargs)
Overloaded function.
1. from_parquet(self: _goose.GoosePyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _goose.GoosePyRelation
Create a relation object from the Parquet files in file_glob
2. from_parquet(self: _goose.GoosePyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _goose.GoosePyRelation
Create a relation object from the Parquet files in file_globs
说明
从 Parquet 文件创建关系对象
Aliases: read_parquet
参数
-
file_glob : str
File path or glob pattern pointing to Parquet files to be read.
-
binary_as_string : bool, default: False
Interpret binary columns as strings instead of blobs.
-
file_row_number : bool, default: False
Add a column containing the row number within each file.
-
filename : bool, default: False
Add a column containing the name of the file each row came from.
-
hive_partitioning : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
-
union_by_name : bool, default: False
Union Parquet files by matching column names instead of positions.
-
compression : object
Optional compression codec to use when reading the Parquet files.
示例
import goose
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
goose_conn = goose.connect()
rel = goose_conn.from_parquet("code_example.parquet")
rel.show()
结果
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
from_query
签名
from_query(self: _goose.GoosePyConnection, query: object, *, alias: str = '', params: object = None) -> _goose.GoosePyRelation
说明
运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。
Warning. Passing
paramsto this method is discouraged due to significant performance overhead. Useexecute()for parameterized queries instead.
参数
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters. Discouraged due to significant performance overhead. Use
execute()for parameterized queries instead.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.from_query("from range(1,2) tbl(id)")
rel.show()
结果
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
query
签名
query(self: _goose.GoosePyConnection, query: object, *, alias: str = '', params: object = None) -> _goose.GoosePyRelation
说明
运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。
Warning. Passing
paramsto this method is discouraged due to significant performance overhead. Useexecute()for parameterized queries instead.
Aliases: from_query, sql
参数
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters. Discouraged due to significant performance overhead. Use
execute()for parameterized queries instead.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.query("from range(1,2) tbl(id)")
rel.show()
结果
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
read_csv
签名
read_csv(self: _goose.GoosePyConnection, path_or_buffer: object, **kwargs) -> _goose.GoosePyRelation
说明
从 name 中的 CSV 文件创建关系对象
Aliases: from_csv_auto
参数
-
path_or_buffer : Union[str, StringIO, TextIOBase]
CSV 文件路径或要读取的缓冲区。
-
header : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
-
compression : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
-
sep : Optional[str]
Delimiter to use; defaults to comma.
-
delimiter : Optional[str]
Alternative delimiter to use.
-
dtype : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
-
na_values : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
-
skiprows : Optional[int]
Number of rows to skip at the start.
-
quotechar : Optional[str]
Character used to quote fields.
-
escapechar : Optional[str]
Character used to escape delimiter or quote characters.
-
encoding : Optional[str]
Encoding to use for UTF when reading/writing.
-
parallel : Optional[bool]
Enable parallel reading.
-
date_format : Optional[str]
Format to parse dates.
-
timestamp_format : Optional[str]
Format to parse timestamps.
-
sample_size : Optional[int]
Number of rows to sample for schema inference.
-
all_varchar : Optional[bool]
Treat all columns as VARCHAR.
-
normalize_names : Optional[bool]
Normalize column names to lowercase.
-
null_padding : Optional[bool]
Enable null padding for rows with missing columns.
-
names : Optional[List[str]]
List of column names to use.
-
lineterminator : Optional[str]
Character to break lines on.
-
columns : Optional[Dict[str, str]]
Column mapping for schema.
-
auto_type_candidates : Optional[List[str]]
List of columns for automatic type inference.
-
max_line_size : Optional[int]
Maximum line size in bytes.
-
ignore_errors : Optional[bool]
Ignore parsing errors.
-
store_rejects : Optional[bool]
Store rejected rows.
-
rejects_table : Optional[str]
Table name to store rejected rows.
-
rejects_scan : Optional[str]
Scan to use for rejects.
-
rejects_limit : Optional[int]
Limit number of rejects stored.
-
force_not_null : Optional[List[str]]
List of columns to force as NOT NULL.
-
buffer_size : Optional[int]
Buffer size in bytes.
-
decimal : Optional[str]
Character to recognize as decimal point.
-
allow_quoted_nulls : Optional[bool]
Allow quoted NULL values.
-
filename : Optional[bool], Optional[str]
Add filename column or specify filename.
-
hive_partitioning : Optional[bool]
Enable Hive-style partitioning.
-
union_by_name : Optional[bool]
Union files by column name instead of position.
-
hive_types : Optional[Dict[str, str]]
Hive types for columns.
-
hive_types_autocast : Optional[bool]
Automatically cast Hive types.
-
connection : GoosePyConnection
要使用的 Goose 连接。
示例
import csv
import goose
goose_conn = goose.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
fieldnames = ['id', 'text']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'id': '1', 'text': 'a'})
rel = goose_conn.read_csv("code_example.csv")
rel.show()
结果
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
read_json
签名
read_json(self: _goose.GoosePyConnection, path_or_buffer: object, *, columns: typing.Optional[object] = None, sample_size: typing.Optional[object] = None, maximum_depth: typing.Optional[object] = None, records: typing.Optional[str] = None, format: typing.Optional[str] = None, date_format: typing.Optional[object] = None, timestamp_format: typing.Optional[object] = None, compression: typing.Optional[object] = None, maximum_object_size: typing.Optional[object] = None, ignore_errors: typing.Optional[object] = None, convert_strings_to_integers: typing.Optional[object] = None, field_appearance_threshold: typing.Optional[object] = None, map_inference_threshold: typing.Optional[object] = None, maximum_sample_files: typing.Optional[object] = None, filename: typing.Optional[object] = None, hive_partitioning: typing.Optional[object] = None, union_by_name: typing.Optional[object] = None, hive_types: typing.Optional[object] = None, hive_types_autocast: typing.Optional[object] = None) -> _goose.GoosePyRelation
说明
Create a relation object from the JSON file in 'name'
参数
-
path_or_buffer : object
File path or file-like object containing JSON data to be read.
-
columns : object
Optional list of column names to project from the JSON data.
-
sample_size : object
Number of rows to sample for inferring JSON schema.
-
maximum_depth : object
Maximum depth to which JSON objects should be parsed.
-
records : str
Format string specifying whether JSON is in records mode.
-
format : str
Format of the JSON data (e.g., 'auto', 'newline_delimited').
-
date_format : object
Format string for parsing date fields.
-
timestamp_format : object
Format string for parsing timestamp fields.
-
compression : object
Compression codec used on the JSON data (e.g., 'gzip').
-
maximum_object_size : object
Maximum size in bytes for individual JSON objects.
-
ignore_errors : object
If True, skip over JSON records with parsing errors.
-
convert_strings_to_integers : object
If True, attempt to convert strings to integers where appropriate.
-
field_appearance_threshold : object
Threshold for inferring optional fields in nested JSON.
-
map_inference_threshold : object
Threshold for inferring maps from JSON object patterns.
-
maximum_sample_files : object
Maximum number of files to sample for schema inference.
-
filename : object
If True, include a column with the source filename for each row.
-
hive_partitioning : object
If True, enable Hive partitioning based on directory structure.
-
union_by_name : object
If True, align JSON columns by name instead of position.
-
hive_types : object
If True, use Hive types from directory structure for schema.
-
hive_types_autocast : object
If True, automatically cast data types to match Hive types.
示例
import goose
import json
with open("code_example.json", mode="w") as f:
json.dump([{'id': 1, "text":"a"}], f)
goose_conn = goose.connect()
rel = goose_conn.read_json("code_example.json")
rel.show()
结果
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
read_parquet
签名
read_parquet(*args, **kwargs)
Overloaded function.
1. read_parquet(self: _goose.GoosePyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _goose.GoosePyRelation
Create a relation object from the Parquet files in file_glob
2. read_parquet(self: _goose.GoosePyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _goose.GoosePyRelation
Create a relation object from the Parquet files in file_globs
说明
从 Parquet 文件创建关系对象
Aliases: from_parquet
参数
-
file_glob : str
File path or glob pattern pointing to Parquet files to be read.
-
binary_as_string : bool, default: False
Interpret binary columns as strings instead of blobs.
-
file_row_number : bool, default: False
Add a column containing the row number within each file.
-
filename : bool, default: False
Add a column containing the name of the file each row came from.
-
hive_partitioning : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
-
union_by_name : bool, default: False
Union Parquet files by matching column names instead of positions.
-
compression : object
Optional compression codec to use when reading the Parquet files.
示例
import goose
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
goose_conn = goose.connect()
rel = goose_conn.read_parquet("code_example.parquet")
rel.show()
结果
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
sql
签名
sql(self: _goose.GoosePyConnection, query: object, *, alias: str = '', params: object = None) -> _goose.GoosePyRelation
说明
运行 SQL 查询。如果是 SELECT 语句,则根据给定 SQL 查询创建关系对象;否则按原样执行该查询。
Warning. Passing
paramsto this method is discouraged due to significant performance overhead. Useexecute()for parameterized queries instead.
Aliases: from_query, query
参数
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters. Discouraged due to significant performance overhead. Use
execute()for parameterized queries instead.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("from range(1,2) tbl(id)")
rel.show()
结果
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
table
签名
table(self: _goose.GoosePyConnection, table_name: str) -> _goose.GoosePyRelation
说明
为指定名称的表创建关系对象
参数
-
table_name : str
Name of the table to create a relation from.
示例
import goose
goose_conn = goose.connect()
goose_conn.sql("create table code_example as select * from range(1,2) tbl(id)")
rel = goose_conn.table("code_example")
rel.show()
结果
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
table_function
签名
table_function(self: _goose.GoosePyConnection, name: str, parameters: object = None) -> _goose.GoosePyRelation
说明
通过给定参数从指定名称的表函数创建关系对象
参数
-
name : str
Name of the table function to call.
-
parameters : object
Optional parameters to pass to the table function.
示例
import goose
goose_conn = goose.connect()
goose_conn.sql("""
create macro get_record_for(x) as table
select x*range from range(1,2)
""")
rel = goose_conn.table_function(name="get_record_for", parameters=[1])
rel.show()
结果
┌───────────────┐
│ (1 * "range") │
│ int64 │
├───────────────┤
│ 1 │
└───────────────┘
values
签名
values(self: _goose.GoosePyConnection, *args) -> _goose.GoosePyRelation
说明
从传入的值创建关系对象
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.values([1, 'a'])
rel.show()
结果
┌───────┬─────────┐
│ col0 │ col1 │
│ int32 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
view
签名
view(self: _goose.GoosePyConnection, view_name: str) -> _goose.GoosePyRelation
说明
为指定名称的视图创建关系对象
参数
-
view_name : str
Name of the view to create a relation from.
示例
import goose
goose_conn = goose.connect()
goose_conn.sql("create table code_example as select * from range(1,2) tbl(id)")
rel = goose_conn.view("code_example")
rel.show()
结果
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
关系定义细节
This section contains the details on how to inspect a relation.
| 名称 | 说明 |
|---|---|
alias | Get the name of the current alias |
columns | Return a list containing the names of the columns of the relation. |
describe | Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation. |
description | Return the description of the result |
dtypes | Return a list containing the types of the columns of the relation. |
explain | explain(self: _goose.GoosePyRelation, type: _goose.ExplainType = 'standard') -> str |
query | Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object |
set_alias | Rename the relation object to new alias |
shape | Tuple of # of rows, # of columns in relation. |
show | Display a summary of the data |
sql_query | Get the SQL query that is equivalent to the relation |
type | Get the type of the relation. |
types | Return a list containing the types of the columns of the relation. |
alias
说明
Get the name of the current alias
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.alias
结果
unnamed_relation_43c808c247431be5
columns
说明
Return a list containing the names of the columns of the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.columns
结果
['id', 'description', 'value', 'created_timestamp']
describe
签名
describe(self: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.describe()
结果
┌─────────┬──────────────────────────────────────┬─────────────────┬────────────────────┬────────────────────────────┐
│ aggr │ id │ description │ value │ created_timestamp │
│ varchar │ varchar │ varchar │ double │ varchar │
├─────────┼──────────────────────────────────────┼─────────────────┼────────────────────┼────────────────────────────┤
│ count │ 9 │ 9 │ 9.0 │ 9 │
│ mean │ NULL │ NULL │ 5.0 │ NULL │
│ stddev │ NULL │ NULL │ 2.7386127875258306 │ NULL │
│ min │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is even │ 1.0 │ 2025-04-09 15:41:20.642+02 │
│ max │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is uneven │ 9.0 │ 2025-04-09 15:49:20.642+02 │
│ median │ NULL │ NULL │ 5.0 │ NULL │
└─────────┴──────────────────────────────────────┴─────────────────┴────────────────────┴────────────────────────────┘
description
说明
Return the description of the result
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.description
结果
[('id', 'UUID', None, None, None, None, None),
('description', 'STRING', None, None, None, None, None),
('value', 'NUMBER', None, None, None, None, None),
('created_timestamp', 'DATETIME', None, None, None, None, None)]
dtypes
说明
Return a list containing the types of the columns of the relation.
Aliases: types
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dtypes
结果
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
explain
说明
explain(self: _goose.GoosePyRelation, type: _goose.ExplainType = 'standard') -> str
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.explain()
结果
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ id │
│ description │
│ value │
│ created_timestamp │
│ │
│ ~9 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ RANGE │
│ ──────────────────── │
│ Function: RANGE │
│ │
│ ~9 Rows │
└───────────────────────────┘
query
签名
query(self: _goose.GoosePyRelation, virtual_table_name: str, sql_query: str) -> _goose.GoosePyRelation
说明
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object
参数
-
virtual_table_name : str
The name to assign to the current relation when referenced in the SQL query.
-
sql_query : str
The SQL query string that uses the virtual table name to query the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.query(virtual_table_name="rel_view", sql_query="from rel")
goose_conn.sql("show rel_view")
结果
┌───────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ id │ UUID │ YES │ NULL │ NULL │ NULL │
│ description │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ value │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ created_timestamp │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
└───────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘
set_alias
签名
set_alias(self: _goose.GoosePyRelation, alias: str) -> _goose.GoosePyRelation
说明
Rename the relation object to new alias
参数
-
alias : str
The alias name to assign to the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.set_alias('abc').select('abc.id')
结果
In the SQL query, the alias will be `abc`
shape
说明
Tuple of # of rows, # of columns in relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.shape
结果
(9, 4)
show
签名
show(self: _goose.GoosePyRelation, *, max_width: typing.Optional[typing.SupportsInt] = None, max_rows: typing.Optional[typing.SupportsInt] = None, max_col_width: typing.Optional[typing.SupportsInt] = None, null_value: typing.Optional[str] = None, render_mode: object = None) -> None
说明
Display a summary of the data
参数
-
max_width : int
Maximum display width for the entire output in characters.
-
max_rows : int
Maximum number of rows to display.
-
max_col_width : int
Maximum number of characters to display per column.
-
null_value : str
String to display in place of NULL values.
-
render_mode : object
Render mode for displaying the output.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.show()
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │ value is uneven │ 1 │ 2025-04-09 15:41:20.642+02 │
│ 6817dd31-297c-40a8-8e40-8521f00b2d08 │ value is even │ 2 │ 2025-04-09 15:42:20.642+02 │
│ 45143f9a-e16e-4e59-91b2-3a0800eed6d6 │ value is uneven │ 3 │ 2025-04-09 15:43:20.642+02 │
│ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is even │ 4 │ 2025-04-09 15:44:20.642+02 │
│ 111ced5c-9155-418e-b087-c331b814db90 │ value is uneven │ 5 │ 2025-04-09 15:45:20.642+02 │
│ 66a870a6-aef0-4085-87d5-5d1b35d21c66 │ value is even │ 6 │ 2025-04-09 15:46:20.642+02 │
│ a7e8e796-bca0-44cd-a269-1d71090fb5cc │ value is uneven │ 7 │ 2025-04-09 15:47:20.642+02 │
│ 74908d48-7f2d-4bdd-9c92-1e7920b115b5 │ value is even │ 8 │ 2025-04-09 15:48:20.642+02 │
│ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is uneven │ 9 │ 2025-04-09 15:49:20.642+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
sql_query
签名
sql_query(self: _goose.GoosePyRelation) -> str
说明
Get the SQL query that is equivalent to the relation
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sql_query()
结果
SELECT
gen_random_uuid() AS id,
concat('value is ', CASE WHEN ((mod("range", 2) = 0)) THEN ('even') ELSE 'uneven' END) AS description,
"range" AS "value",
(now() + CAST(concat("range", ' ', 'minutes') AS INTERVAL)) AS created_timestamp
FROM "range"(1, 10)
type
说明
Get the type of the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.type
结果
QUERY_RELATION
types
说明
Return a list containing the types of the columns of the relation.
Aliases: dtypes
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.types
结果
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
Transformation
This section contains the methods which can be used to chain queries. The methods are lazy evaluated.
| 名称 | 说明 |
|---|---|
aggregate | Compute the aggregate aggr_expr by the optional groups group_expr on the relation |
apply | Compute the function of a single column or a list of columns by the optional groups on the relation |
cross | Create cross/cartesian product of two relational objects |
except_ | Create the set except of this relation object with another relation object in other_rel |
filter | Filter the relation object by the filter in filter_expr |
insert | Inserts the given values into the relation |
insert_into | Inserts the relation object into an existing table named table_name |
intersect | Create the set intersection of this relation object with another relation object in other_rel |
join | Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti' |
limit | Only retrieve the first n rows from this relation object, starting at offset |
map | Calls the passed function on the relation |
order | Reorder the relation object by order_expr |
project | Project the relation object by the projection in project_expr |
select | Project the relation object by the projection in project_expr |
sort | Reorder the relation object by the provided expressions |
union | Create the set union of this relation object with another relation object in other_rel |
update | Update the given relation with the provided expressions |
aggregate
签名
aggregate(self: _goose.GoosePyRelation, aggr_expr: object, group_expr: str = '') -> _goose.GoosePyRelation
说明
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
参数
-
aggr_expr : str, list[Expression]
The list of columns and aggregation functions.
-
group_expr : str, default: ''
The list of columns to be included in
group_by. IfNone,group by allis applied.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.aggregate('max(value)')
结果
┌──────────────┐
│ max("value") │
│ int64 │
├──────────────┤
│ 9 │
└──────────────┘
apply
签名
apply(self: _goose.GoosePyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Compute the function of a single column or a list of columns by the optional groups on the relation
参数
-
function_name : str
Name of the function to apply over the relation.
-
function_aggr : str
The list of columns to apply the function over.
-
group_expr : str, default: ''
Optional SQL expression for grouping.
-
function_parameter : str, default: ''
Optional parameters to pass into the function.
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.apply(
function_name="count",
function_aggr="id",
group_expr="description",
projected_columns="description"
)
结果
┌─────────────────┬───────────┐
│ description │ count(id) │
│ varchar │ int64 │
├─────────────────┼───────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴───────────┘
cross
签名
cross(self: _goose.GoosePyRelation, other_rel: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Create cross/cartesian product of two relational objects
参数
-
other_rel : _goose.GoosePyRelation
Another relation to perform a cross product with.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cross(other_rel=rel.set_alias("other_rel"))
结果
┌─────────────────────────────┬─────────────────┬───────┬───────────────────────────┬──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────┐
│ id │ description │ value │ created_timestamp │ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │ uuid │ varchar │ int64 │ timestamp with time zone │
├─────────────────────────────┼─────────────────┼───────┼───────────────────────────┼──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────┤
│ cb2b453f-1a06-4f5e-abe1-b… │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │ cb2b453f-1a06-4f5e-abe1-bfd413581bcf │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │
...
except_
签名
except_(self: _goose.GoosePyRelation, other_rel: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Create the set except of this relation object with another relation object in other_rel
参数
-
other_rel : _goose.GoosePyRelation
The relation to subtract from the current relation (set difference).
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.except_(other_rel=rel.set_alias("other_rel"))
结果
The relation query is executed twice, therefore generating different ids and timestamps:
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ f69ed6dd-a7fe-4de2-b6af-1c2418096d69 │ value is uneven │ 3 │ 2025-04-10 11:43:05.711+02 │
│ 08ad11dc-a9c2-4aaa-9272-760b27ad1f5d │ value is uneven │ 7 │ 2025-04-10 11:47:05.711+02 │
...
filter
签名
filter(self: _goose.GoosePyRelation, filter_expr: object) -> _goose.GoosePyRelation
说明
Filter the relation object by the filter in filter_expr
参数
-
filter_expr : str, Expression
The filter expression to apply over the relation.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.filter("value = 2")
结果
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────┤
│ b0684ab7-fcbf-41c5-8e4a-a51bdde86926 │ value is even │ 2 │ 2025-04-10 09:54:29.78+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────┘
insert
签名
insert(self: _goose.GoosePyRelation, values: object) -> None
说明
Inserts the given values into the relation
参数
-
values : object
A tuple of values matching the relation column list, to be inserted.
示例
import goose
from datetime import datetime
from uuid import uuid4
goose_conn = goose.connect()
goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = goose_conn.table("code_example")
rel.insert(
(
uuid4(),
'value is even',
10,
datetime.now()
)
)
rel.filter("value = 10")
结果
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ c6dfab87-fae6-4213-8f76-1b96a8d179f6 │ value is even │ 10 │ 2025-04-10 10:02:24.652218+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
insert_into
签名
insert_into(self: _goose.GoosePyRelation, table_name: str) -> None
说明
Inserts the relation object into an existing table named table_name
参数
-
table_name : str
The table name to insert the data into. The relation must respect the column order of the table.
示例
import goose
from datetime import datetime
from uuid import uuid4
goose_conn = goose.connect()
goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = goose_conn.values(
[
uuid4(),
'value is even',
10,
datetime.now()
]
)
rel.insert_into("code_example")
goose_conn.table("code_example").filter("value = 10")
结果
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ 271c5ddd-c1d5-4638-b5a0-d8c7dc9e8220 │ value is even │ 10 │ 2025-04-10 14:29:18.616379+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
intersect
签名
intersect(self: _goose.GoosePyRelation, other_rel: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Create the set intersection of this relation object with another relation object in other_rel
参数
-
other_rel : _goose.GoosePyRelation
The relation to intersect with the current relation (set intersection).
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.intersect(other_rel=rel.set_alias("other_rel"))
结果
The relation query is executed once with `rel` and once with `other_rel`,
therefore generating different ids and timestamps:
┌──────┬─────────────┬───────┬──────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────┴─────────────┴───────┴──────────────────────────┤
│ 0 rows │
└───────────────────────────────────────────────────────┘
join
签名
join(self: _goose.GoosePyRelation, other_rel: _goose.GoosePyRelation, condition: object, how: str = 'inner') -> _goose.GoosePyRelation
说明
Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti'
Depending on how the condition parameter is provided, the JOIN clause generated is:
USING
import goose
goose_conn = goose.connect()
rel1 = goose_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = goose_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition="id", how="inner").sql_query()
with following SQL:
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
USING (id)
ON
import goose
goose_conn = goose.connect()
rel1 = goose_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = goose_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition=f"{rel1.alias}.id = {rel2.alias}.id", how="inner").sql_query()
with the following SQL:
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
ON ((unnamed_relation_41bc15e744037078.id = unnamed_relation_307e245965aa2c2b.id))
NATURAL,POSITIONALandASOFjoins are not provided by the relational API.CROSSjoins are provided through the cross method.
参数
-
other_rel : _goose.GoosePyRelation
The relation to join with the current relation.
-
condition : object
The join condition, typically a SQL expression or the duplicated column name to join on.
-
how : str, default: 'inner'
The type of join to perform: 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.set_alias("rel").join(
other_rel=rel.set_alias("other_rel"),
condition="rel.id = other_rel.id",
how="left"
)
rel.count("*")
结果
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 9 │
└──────────────┘
limit
签名
limit(self: _goose.GoosePyRelation, n: typing.SupportsInt, offset: typing.SupportsInt = 0) -> _goose.GoosePyRelation
说明
Only retrieve the first n rows from this relation object, starting at offset
参数
-
n : int
The maximum number of rows to return.
-
offset : int, default: 0
The number of rows to skip before starting to return rows.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1)
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 4135597b-29e7-4cb9-a443-41f3d54f25df │ value is uneven │ 1 │ 2025-04-10 10:52:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
map
签名
map(self: _goose.GoosePyRelation, map_function: collections.abc.Callable, *, schema: typing.Optional[object] = None) -> _goose.GoosePyRelation
说明
Calls the passed function on the relation
参数
-
map_function : Callable
A Python function that takes a DataFrame and returns a transformed DataFrame.
-
schema : object, default: None
Optional schema describing the structure of the output relation.
示例
import goose
from pandas import DataFrame
def multiply_by_2(df: DataFrame):
df["id"] = df["id"] * 2
return df
goose_conn = goose.connect()
rel = goose_conn.sql("select range as id, 'dummy' as text from range(1,3)")
rel.map(multiply_by_2, schema={"id": int, "text": str})
结果
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 2 │ dummy │
│ 4 │ dummy │
└───────┴─────────┘
order
签名
order(self: _goose.GoosePyRelation, order_expr: str) -> _goose.GoosePyRelation
说明
Reorder the relation object by order_expr
参数
-
order_expr : str
SQL expression defining the ordering of the result rows.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.order("value desc").limit(1, offset=4)
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 55899131-e3d3-463c-a215-f65cb8aef3bf │ value is uneven │ 5 │ 2025-04-10 10:56:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
project
签名
project(self: _goose.GoosePyRelation, *args, groups: str = '') -> _goose.GoosePyRelation
说明
Project the relation object by the projection in project_expr
Aliases: select
参数
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.project("description").limit(1)
结果
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is uneven │
└─────────────────┘
select
签名
select(self: _goose.GoosePyRelation, *args, groups: str = '') -> _goose.GoosePyRelation
说明
Project the relation object by the projection in project_expr
Aliases: project
参数
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description").limit(1)
结果
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is uneven │
└─────────────────┘
sort
签名
sort(self: _goose.GoosePyRelation, *args) -> _goose.GoosePyRelation
说明
Reorder the relation object by the provided expressions
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sort("description")
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 5e0dfa8c-de4d-4ccd-8cff-450dabb86bde │ value is even │ 6 │ 2025-04-10 16:52:15.605+02 │
│ 95f1ad48-facf-4a84-a971-0a4fecce68c7 │ value is even │ 2 │ 2025-04-10 16:48:15.605+02 │
...
union
签名
union(self: _goose.GoosePyRelation, union_rel: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Create the set union of this relation object with another relation object in other_rel
The union is
union all. In order to retrieve distinct values, apply distinct.
参数
-
union_rel : _goose.GoosePyRelation
The relation to union with the current relation (set union).
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.union(union_rel=rel)
rel.count("*")
结果
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 18 │
└──────────────┘
update
签名
update(self: _goose.GoosePyRelation, set: object, *, condition: object = None) -> None
说明
Update the given relation with the provided expressions
参数
-
set : object
Mapping of columns to new values for the update operation.
-
condition : object, default: None
Optional condition to filter which rows to update.
示例
import goose
from goose import ColumnExpression
goose_conn = goose.connect()
goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = goose_conn.table("code_example")
rel.update(set={"description":None}, condition=ColumnExpression("value") == 1)
# the update is executed on the table, but not reflected on the relationship
# the relationship has to be recreated to retrieve the modified data
rel = goose_conn.table("code_example")
rel.show()
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 66dcaa14-f4a6-4a55-af3b-7f6aa23ab4ad │ NULL │ 1 │ 2025-04-10 16:54:49.317+02 │
│ c6a18a42-67fb-4c95-827b-c966f2f95b88 │ value is even │ 2 │ 2025-04-10 16:55:49.317+02 │
...
Functions
This section contains the functions which can be applied to a relation, in order to get a (scalar) result. The functions are lazy evaluated.
| 名称 | 说明 |
|---|---|
any_value | Returns the first non-null value from a given column |
arg_max | Finds the row with the maximum value for a value column and returns the value of that row for an argument column |
arg_min | Finds the row with the minimum value for a value column and returns the value of that row for an argument column |
avg | Computes the average on a given column |
bit_and | Computes the bitwise AND of all bits present in a given column |
bit_or | Computes the bitwise OR of all bits present in a given column |
bit_xor | Computes the bitwise XOR of all bits present in a given column |
bitstring_agg | Computes a bitstring with bits set for each distinct value in a given column |
bool_and | Computes the logical AND of all values present in a given column |
bool_or | Computes the logical OR of all values present in a given column |
count | Computes the number of elements present in a given column |
cume_dist | Computes the cumulative distribution within the partition |
dense_rank | Computes the dense rank within the partition |
distinct | Retrieve distinct rows from this relation object |
favg | Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
first | Returns the first value of a given column |
first_value | Computes the first value within the group or partition |
fsum | Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
geomean | Computes the geometric mean over all values present in a given column |
histogram | Computes the histogram over all values present in a given column |
lag | Computes the lag within the partition |
last | Returns the last value of a given column |
last_value | Computes the last value within the group or partition |
lead | Computes the lead within the partition |
list | Returns a list containing all values present in a given column |
max | Returns the maximum value present in a given column |
mean | Computes the average on a given column |
median | Computes the median over all values present in a given column |
min | Returns the minimum value present in a given column |
mode | Computes the mode over all values present in a given column |
n_tile | Divides the partition as equally as possible into num_buckets |
nth_value | Computes the nth value within the partition |
percent_rank | Computes the relative rank within the partition |
product | Returns the product of all values present in a given column |
quantile | Computes the exact quantile value for a given column |
quantile_cont | Computes the interpolated quantile value for a given column |
quantile_disc | Computes the exact quantile value for a given column |
rank | Computes the rank within the partition |
rank_dense | Computes the dense rank within the partition |
row_number | Computes the row number within the partition |
select_dtypes | Select columns from the relation, by filtering based on type(s) |
select_types | Select columns from the relation, by filtering based on type(s) |
std | Computes the sample standard deviation for a given column |
stddev | Computes the sample standard deviation for a given column |
stddev_pop | Computes the population standard deviation for a given column |
stddev_samp | Computes the sample standard deviation for a given column |
string_agg | Concatenates the values present in a given column with a separator |
sum | Computes the sum of all values present in a given column |
unique | Returns the distinct values in a column. |
value_counts | Computes the number of elements present in a given column, also projecting the original column |
var | Computes the sample variance for a given column |
var_pop | Computes the population variance for a given column |
var_samp | Computes the sample variance for a given column |
variance | Computes the sample variance for a given column |
any_value
签名
any_value(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the first non-null value from a given column
参数
-
column : str
The column name from which to retrieve any value.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...). -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.any_value('id')
结果
┌──────────────────────────────────────┐
│ any_value(id) │
│ uuid │
├──────────────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │
└──────────────────────────────────────┘
arg_max
签名
arg_max(self: _goose.GoosePyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Finds the row with the maximum value for a value column and returns the value of that row for an argument column
参数
-
arg_column : str
The column name for which to find the argument maximizing the value.
-
value_column : str
The column name containing values used to determine the maximum.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...). -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_max(arg_column="value", value_column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────────────┐
│ description │ arg_max("value", "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────┤
│ value is uneven │ 9 │
│ value is even │ 8 │
└─────────────────┴───────────────────────────┘
arg_min
签名
arg_min(self: _goose.GoosePyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Finds the row with the minimum value for a value column and returns the value of that row for an argument column
参数
-
arg_column : str
The column name for which to find the argument minimizing the value.
-
value_column : str
The column name containing values used to determine the minimum.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_min(arg_column="value", value_column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────────────┐
│ description │ arg_min("value", "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────┘
avg
签名
avg(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the average on a given column
参数
-
column : str
The column name to calculate the average on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.avg('value')
结果
┌──────────────┐
│ avg("value") │
│ double │
├──────────────┤
│ 5.0 │
└──────────────┘
bit_and
签名
bit_and(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the bitwise AND of all bits present in a given column
参数
-
column : str
The column name to perform the bitwise AND aggregation on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_and(column="value_bit", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_and(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000000001 │
│ value is even │ 0000000000000000000000000000000000000000000000000000000000000000 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bit_or
签名
bit_or(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the bitwise OR of all bits present in a given column
参数
-
column : str
The column name to perform the bitwise OR aggregation on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_or(column="value_bit", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_or(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001111 │
│ value is even │ 0000000000000000000000000000000000000000000000000000000000001110 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bit_xor
签名
bit_xor(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the bitwise XOR of all bits present in a given column
参数
-
column : str
The column name to perform the bitwise XOR aggregation on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_xor(column="value_bit", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_xor(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is even │ 0000000000000000000000000000000000000000000000000000000000001000 │
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001001 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bitstring_agg
签名
bitstring_agg(self: _goose.GoosePyRelation, column: str, min: typing.Optional[object] = None, max: typing.Optional[object] = None, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes a bitstring with bits set for each distinct value in a given column
参数
-
column : str
The column name to aggregate as a bitstring.
-
min : object, default: None
Optional minimum bitstring value for aggregation.
-
max : object, default: None
Optional maximum bitstring value for aggregation.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.bitstring_agg(column="value", groups="description", projected_columns="description", min=1, max=9)
结果
┌─────────────────┬────────────────────────┐
│ description │ bitstring_agg("value") │
│ varchar │ bit │
├─────────────────┼────────────────────────┤
│ value is uneven │ 101010101 │
│ value is even │ 010101010 │
└─────────────────┴────────────────────────┘
bool_and
签名
bool_and(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the logical AND of all values present in a given column
参数
-
column : str
The column name to perform the boolean AND aggregation on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_and(column="uneven", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────┐
│ description │ bool_and(uneven) │
│ varchar │ boolean │
├─────────────────┼──────────────────┤
│ value is even │ false │
│ value is uneven │ true │
└─────────────────┴──────────────────┘
bool_or
签名
bool_or(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the logical OR of all values present in a given column
参数
-
column : str
The column name to perform the boolean OR aggregation on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_or(column="uneven", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────┐
│ description │ bool_or(uneven) │
│ varchar │ boolean │
├─────────────────┼─────────────────┤
│ value is even │ false │
│ value is uneven │ true │
└─────────────────┴─────────────────┘
count
签名
count(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the number of elements present in a given column
参数
-
column : str
The column name to perform count on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.count("id")
结果
┌───────────┐
│ count(id) │
│ int64 │
├───────────┤
│ 9 │
└───────────┘
cume_dist
签名
cume_dist(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the cumulative distribution within the partition
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cume_dist(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬──────────────────────────────────────────────────────────────┐
│ description │ value │ cume_dist() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ double │
├─────────────────┼───────┼──────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 0.2 │
│ value is uneven │ 3 │ 0.4 │
│ value is uneven │ 5 │ 0.6 │
│ value is uneven │ 7 │ 0.8 │
│ value is uneven │ 9 │ 1.0 │
│ value is even │ 2 │ 0.25 │
│ value is even │ 4 │ 0.5 │
│ value is even │ 6 │ 0.75 │
│ value is even │ 8 │ 1.0 │
└─────────────────┴───────┴──────────────────────────────────────────────────────────────┘
dense_rank
签名
dense_rank(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the dense rank within the partition
Aliases: rank_dense
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dense_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
distinct
签名
distinct(self: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Retrieve distinct rows from this relation object
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("select range from range(1,4)")
rel = rel.union(union_rel=rel)
rel.distinct().order("range")
结果
┌───────┐
│ range │
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
favg
签名
favg(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum)
参数
-
column : str
The column name to calculate the average on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.favg(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────┐
│ description │ favg("value") │
│ varchar │ double │
├─────────────────┼───────────────┤
│ value is uneven │ 5.0 │
│ value is even │ 5.0 │
└─────────────────┴───────────────┘
first
签名
first(self: _goose.GoosePyRelation, column: str, groups: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the first value of a given column
参数
-
column : str
The column name from which to retrieve the first value.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────┐
│ description │ "first"("value") │
│ varchar │ int64 │
├─────────────────┼──────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴──────────────────┘
first_value
签名
first_value(self: _goose.GoosePyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the first value within the group or partition
参数
-
column : str
The column name from which to retrieve the first value.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first_value(column="value", window_spec="over (partition by description order by value)", projected_columns="description").distinct()
结果
┌─────────────────┬───────────────────────────────────────────────────────────────────────┐
│ description │ first_value("value") OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────────────────────────────────────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────────────────────────────────────────────────┘
fsum
签名
fsum(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum)
参数
-
column : str
The column name to calculate the sum on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fsum(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────┐
│ description │ fsum("value") │
│ varchar │ double │
├─────────────────┼───────────────┤
│ value is even │ 20.0 │
│ value is uneven │ 25.0 │
└─────────────────┴───────────────┘
geomean
签名
geomean(self: _goose.GoosePyRelation, column: str, groups: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the geometric mean over all values present in a given column
参数
-
column : str
The column name to calculate the geometric mean on.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.geomean(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────┐
│ description │ geomean("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is uneven │ 3.936283427035351 │
│ value is even │ 4.426727678801287 │
└─────────────────┴───────────────────┘
histogram
签名
histogram(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the histogram over all values present in a given column
参数
-
column : str
The column name to calculate the histogram on.
-
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.histogram(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────────────┐
│ description │ histogram("value") │
│ varchar │ map(bigint, ubigint) │
├─────────────────┼───────────────────────────┤
│ value is uneven │ {1=1, 3=1, 5=1, 7=1, 9=1} │
│ value is even │ {2=1, 4=1, 6=1, 8=1} │
└─────────────────┴───────────────────────────┘
lag
签名
lag(self: _goose.GoosePyRelation, column: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the lag within the partition
参数
-
column : str
The column name to apply the lag function on.
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
offset : int, default: 1
The number of rows to lag behind.
-
default_value : str, default: 'NULL'
The default value to return when the lag offset goes out of bounds.
-
ignore_nulls : bool, default: False
Whether to ignore NULL values when computing the lag.
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lag(column="description", window_spec="over (order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬───────────────────────────────────────────────────┐
│ description │ value │ lag(description, 1, NULL) OVER (ORDER BY "value") │
│ varchar │ int64 │ varchar │
├─────────────────┼───────┼───────────────────────────────────────────────────┤
│ value is uneven │ 1 │ NULL │
│ value is even │ 2 │ value is uneven │
│ value is uneven │ 3 │ value is even │
│ value is even │ 4 │ value is uneven │
│ value is uneven │ 5 │ value is even │
│ value is even │ 6 │ value is uneven │
│ value is uneven │ 7 │ value is even │
│ value is even │ 8 │ value is uneven │
│ value is uneven │ 9 │ value is even │
└─────────────────┴───────┴───────────────────────────────────────────────────┘
last
签名
last(self: _goose.GoosePyRelation, column: str, groups: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the last value of a given column
参数
-
column : str
The column name from which to retrieve the last value.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────┐
│ description │ "last"("value") │
│ varchar │ int64 │
├─────────────────┼─────────────────┤
│ value is even │ 8 │
│ value is uneven │ 9 │
└─────────────────┴─────────────────┘
last_value
签名
last_value(self: _goose.GoosePyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the last value within the group or partition
参数
-
column : str
The column name from which to retrieve the last value within the window.
-
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last_value(column="value", window_spec="over (order by description)", projected_columns="description").distinct()
结果
┌─────────────────┬─────────────────────────────────────────────────┐
│ description │ last_value("value") OVER (ORDER BY description) │
│ varchar │ int64 │
├─────────────────┼─────────────────────────────────────────────────┤
│ value is uneven │ 9 │
│ value is even │ 8 │
└─────────────────┴─────────────────────────────────────────────────┘
lead
签名
lead(self: _goose.GoosePyRelation, column: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the lead within the partition
参数
-
column : str
The column name to apply the lead function on.
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
offset : int, default: 1
The number of rows to lead ahead.
-
default_value : str, default: 'NULL'
The default value to return when the lead offset goes out of bounds.
-
ignore_nulls : bool, default: False
Whether to ignore NULL values when computing the lead.
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lead(column="description", window_spec="over (order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬────────────────────────────────────────────────────┐
│ description │ value │ lead(description, 1, NULL) OVER (ORDER BY "value") │
│ varchar │ int64 │ varchar │
├─────────────────┼───────┼────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ value is even │
│ value is even │ 2 │ value is uneven │
│ value is uneven │ 3 │ value is even │
│ value is even │ 4 │ value is uneven │
│ value is uneven │ 5 │ value is even │
│ value is even │ 6 │ value is uneven │
│ value is uneven │ 7 │ value is even │
│ value is even │ 8 │ value is uneven │
│ value is uneven │ 9 │ NULL │
└─────────────────┴───────┴────────────────────────────────────────────────────┘
list
签名
list(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns a list containing all values present in a given column
参数
-
column : str
The column name to aggregate values into a list.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.list(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────┐
│ description │ list("value") │
│ varchar │ int64[] │
├─────────────────┼─────────────────┤
│ value is even │ [2, 4, 6, 8] │
│ value is uneven │ [1, 3, 5, 7, 9] │
└─────────────────┴─────────────────┘
max
签名
max(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the maximum value present in a given column
参数
-
column : str
The column name to calculate the maximum value of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.max(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────┐
│ description │ max("value") │
│ varchar │ int64 │
├─────────────────┼──────────────┤
│ value is even │ 8 │
│ value is uneven │ 9 │
└─────────────────┴──────────────┘
mean
签名
mean(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the average on a given column
参数
-
column : str
The column name to calculate the mean value of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mean(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────┐
│ description │ avg("value") │
│ varchar │ double │
├─────────────────┼──────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴──────────────┘
median
签名
median(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the median over all values present in a given column
参数
-
column : str
The column name to calculate the median value of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.median(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────┐
│ description │ median("value") │
│ varchar │ double │
├─────────────────┼─────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴─────────────────┘
min
签名
min(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the minimum value present in a given column
参数
-
column : str
The column name to calculate the min value of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.min(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────┐
│ description │ min("value") │
│ varchar │ int64 │
├─────────────────┼──────────────┤
│ value is uneven │ 1 │
│ value is even │ 2 │
└─────────────────┴──────────────┘
mode
签名
mode(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the mode over all values present in a given column
参数
-
column : str
The column name to calculate the mode (most frequent value) of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mode(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────┐
│ description │ "mode"("value") │
│ varchar │ int64 │
├─────────────────┼─────────────────┤
│ value is uneven │ 1 │
│ value is even │ 2 │
└─────────────────┴─────────────────┘
n_tile
签名
n_tile(self: _goose.GoosePyRelation, window_spec: str, num_buckets: typing.SupportsInt, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Divides the partition as equally as possible into num_buckets
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
num_buckets : int
The number of buckets to divide the rows into.
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.n_tile(window_spec="over (partition by description)", num_buckets=2, projected_columns="description, value")
结果
┌─────────────────┬───────┬──────────────────────────────────────────┐
│ description │ value │ ntile(2) OVER (PARTITION BY description) │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼──────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 1 │
│ value is uneven │ 5 │ 1 │
│ value is uneven │ 7 │ 2 │
│ value is uneven │ 9 │ 2 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 1 │
│ value is even │ 6 │ 2 │
│ value is even │ 8 │ 2 │
└─────────────────┴───────┴──────────────────────────────────────────┘
nth_value
签名
nth_value(self: _goose.GoosePyRelation, column: str, window_spec: str, offset: typing.SupportsInt, ignore_nulls: bool = False, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the nth value within the partition
参数
-
column : str
The column name from which to retrieve the nth value within the window.
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
offset : int
The position of the value to retrieve within the window (1-based index).
-
ignore_nulls : bool, default: False
Whether to ignore NULL values when computing the nth value.
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.nth_value(column="value", window_spec="over (partition by description)", projected_columns="description", offset=1)
结果
┌─────────────────┬───────────────────────────────────────────────────────┐
│ description │ nth_value("value", 1) OVER (PARTITION BY description) │
│ varchar │ int64 │
├─────────────────┼───────────────────────────────────────────────────────┤
│ value is even │ 2 │
│ value is even │ 2 │
│ value is even │ 2 │
│ value is even │ 2 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────────────────────────────────┘
percent_rank
签名
percent_rank(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the relative rank within the partition
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.percent_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬─────────────────────────────────────────────────────────────────┐
│ description │ value │ percent_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ double │
├─────────────────┼───────┼─────────────────────────────────────────────────────────────────┤
│ value is even │ 2 │ 0.0 │
│ value is even │ 4 │ 0.3333333333333333 │
│ value is even │ 6 │ 0.6666666666666666 │
│ value is even │ 8 │ 1.0 │
│ value is uneven │ 1 │ 0.0 │
│ value is uneven │ 3 │ 0.25 │
│ value is uneven │ 5 │ 0.5 │
│ value is uneven │ 7 │ 0.75 │
│ value is uneven │ 9 │ 1.0 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────────────┘
product
签名
product(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Returns the product of all values present in a given column
参数
-
column : str
The column name to calculate the product of.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.product(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────┐
│ description │ product("value") │
│ varchar │ double │
├─────────────────┼──────────────────┤
│ value is uneven │ 945.0 │
│ value is even │ 384.0 │
└─────────────────┴──────────────────┘
quantile
签名
quantile(self: _goose.GoosePyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the exact quantile value for a given column
参数
-
column : str
The column name to compute the quantile for.
-
q : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_disc("value", 0.500000) │
│ varchar │ int64 │
├─────────────────┼──────────────────────────────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴──────────────────────────────────┘
quantile_cont
签名
quantile_cont(self: _goose.GoosePyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the interpolated quantile value for a given column
参数
-
column : str
The column name to compute the continuous quantile for.
-
q : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_cont(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_cont("value", 0.500000) │
│ varchar │ double │
├─────────────────┼──────────────────────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴──────────────────────────────────┘
quantile_disc
签名
quantile_disc(self: _goose.GoosePyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the exact quantile value for a given column
参数
-
column : str
The column name to compute the discrete quantile for.
-
q : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_disc(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_disc("value", 0.500000) │
│ varchar │ int64 │
├─────────────────┼──────────────────────────────────┤
│ value is even │ 4 │
│ value is uneven │ 5 │
└─────────────────┴──────────────────────────────────┘
rank
签名
rank(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the rank within the partition
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬─────────────────────────────────────────────────────────┐
│ description │ value │ rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼─────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────┘
rank_dense
签名
rank_dense(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the dense rank within the partition
Aliases: dense_rank
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank_dense(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
row_number
签名
row_number(self: _goose.GoosePyRelation, window_spec: str, projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the row number within the partition
参数
-
window_spec : str
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.row_number(window_spec="over (partition by description order by value)", projected_columns="description, value")
结果
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ row_number() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
select_dtypes
签名
select_dtypes(self: _goose.GoosePyRelation, types: object) -> _goose.GoosePyRelation
说明
Select columns from the relation, by filtering based on type(s)
Aliases: select_types
参数
-
types : object
Data type(s) to select columns by. Can be a single type or a collection of types.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_dtypes(types=[goose.sqltypes.VARCHAR]).distinct()
结果
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
select_types
签名
select_types(self: _goose.GoosePyRelation, types: object) -> _goose.GoosePyRelation
说明
Select columns from the relation, by filtering based on type(s)
Aliases: select_dtypes
参数
-
types : object
Data type(s) to select columns by. Can be a single type or a collection of types.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_types(types=[goose.sqltypes.VARCHAR]).distinct()
结果
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
std
签名
std(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample standard deviation for a given column
Aliases: stddev, stddev_samp
参数
-
column : str
The column name to calculate the standard deviation for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.std(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is uneven │ 3.1622776601683795 │
│ value is even │ 2.581988897471611 │
└─────────────────┴──────────────────────┘
stddev
签名
stddev(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample standard deviation for a given column
Aliases: std, stddev_samp
参数
-
column : str
The column name to calculate the standard deviation for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is even │ 2.581988897471611 │
│ value is uneven │ 3.1622776601683795 │
└─────────────────┴──────────────────────┘
stddev_pop
签名
stddev_pop(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the population standard deviation for a given column
参数
-
column : str
The column name to calculate the standard deviation for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_pop(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬─────────────────────┐
│ description │ stddev_pop("value") │
│ varchar │ double │
├─────────────────┼─────────────────────┤
│ value is even │ 2.23606797749979 │
│ value is uneven │ 2.8284271247461903 │
└─────────────────┴─────────────────────┘
stddev_samp
签名
stddev_samp(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample standard deviation for a given column
参数
-
column : str
The column name to calculate the standard deviation for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_samp(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is even │ 2.581988897471611 │
│ value is uneven │ 3.1622776601683795 │
└─────────────────┴──────────────────────┘
string_agg
签名
string_agg(self: _goose.GoosePyRelation, column: str, sep: str = ',', groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Concatenates the values present in a given column with a separator
参数
-
column : str
The column name to concatenate values from.
-
sep : str, default: ','
Separator string to use between concatenated values.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.string_agg(column="value", sep=",", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────────────┐
│ description │ string_agg("value", ',') │
│ varchar │ varchar │
├─────────────────┼──────────────────────────┤
│ value is even │ 2,4,6,8 │
│ value is uneven │ 1,3,5,7,9 │
└─────────────────┴──────────────────────────┘
sum
签名
sum(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sum of all values present in a given column
参数
-
column : str
The column name to calculate the sum for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sum(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────┐
│ description │ sum("value") │
│ varchar │ int128 │
├─────────────────┼──────────────┤
│ value is even │ 20 │
│ value is uneven │ 25 │
└─────────────────┴──────────────┘
unique
签名
unique(self: _goose.GoosePyRelation, unique_aggr: str) -> _goose.GoosePyRelation
说明
Returns the distinct values in a column.
参数
-
unique_aggr : str
The column to get the distinct values for.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.unique(unique_aggr="description")
结果
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
value_counts
签名
value_counts(self: _goose.GoosePyRelation, column: str, groups: str = '') -> _goose.GoosePyRelation
说明
Computes the number of elements present in a given column, also projecting the original column
参数
-
column : str
The column name to count values from.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.value_counts(column="description", groups="description")
结果
┌─────────────────┬────────────────────┐
│ description │ count(description) │
│ varchar │ int64 │
├─────────────────┼────────────────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴────────────────────┘
var
签名
var(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample variance for a given column
参数
-
column : str
The column name to calculate the sample variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
var_pop
签名
var_pop(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the population variance for a given column
参数
-
column : str
The column name to calculate the population variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_pop(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬──────────────────┐
│ description │ var_pop("value") │
│ varchar │ double │
├─────────────────┼──────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 8.0 │
└─────────────────┴──────────────────┘
var_samp
签名
var_samp(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample variance for a given column
参数
-
column : str
The column name to calculate the sample variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_samp(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
variance
签名
variance(self: _goose.GoosePyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _goose.GoosePyRelation
说明
Computes the sample variance for a given column
参数
-
column : str
The column name to calculate the sample variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the
group by. -
window_spec : str, default: ''
Optional window specification for window functions, provided as
over (partition by ... order by ...) -
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.variance(column="value", groups="description", projected_columns="description")
结果
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
输出
This section contains the functions which will trigger an SQL execution and retrieve the data.
| 名称 | 说明 |
|---|---|
arrow | Execute and return an Arrow Record Batch Reader that yields all rows |
close | Closes the result |
create | Creates a new table named table_name with the contents of the relation object |
create_view | Creates a view named view_name that refers to the relation object |
df | Execute and fetch all rows as a pandas DataFrame |
execute | Transform the relation into a result set |
fetch_arrow_reader | Execute and return an Arrow Record Batch Reader that yields all rows |
fetch_arrow_table | Execute and fetch all rows as an Arrow Table |
fetch_df_chunk | Execute and fetch a chunk of the rows |
fetch_record_batch | Execute and return an Arrow Record Batch Reader that yields all rows |
fetchall | Execute and fetch all rows as a list of tuples |
fetchdf | Execute and fetch all rows as a pandas DataFrame |
fetchmany | Execute and fetch the next set of rows as a list of tuples |
fetchnumpy | Execute and fetch all rows as a Python dict mapping each column to one numpy arrays |
fetchone | Execute and fetch a single row as a tuple |
pl | Execute and fetch all rows as a Polars DataFrame |
record_batch | record_batch(self: object, batch_size: typing.SupportsInt = 1000000) -> object |
tf | Fetch a result as dict of TensorFlow Tensors |
to_arrow_table | Execute and fetch all rows as an Arrow Table |
to_csv | Write the relation object to a CSV file in 'file_name' |
to_df | Execute and fetch all rows as a pandas DataFrame |
to_parquet | Write the relation object to a Parquet file in 'file_name' |
to_table | Creates a new table named table_name with the contents of the relation object |
to_view | Creates a view named view_name that refers to the relation object |
torch | Fetch a result as dict of PyTorch Tensors |
write_csv | Write the relation object to a CSV file in 'file_name' |
write_parquet | Write the relation object to a Parquet file in 'file_name' |
arrow
签名
arrow(self: _goose.GoosePyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
说明
Execute and return an Arrow Record Batch Reader that yields all rows
Aliases: fetch_arrow_table, to_arrow_table
参数
-
batch_size : int, default: 1000000
The batch size of writing the data to the Arrow table
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_table = rel.arrow()
pa_table
结果
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["3ac9e0ba-8390-4a02-ad72-33b1caea6354","8b844392-1404-4bbc-b731-120f42c8ca27","ca5584ca-8e97-4fca-a295-ae3c16c32f5b","926d071e-5f64-488f-ae02-d19e315f9f5c","aabeedf0-5783-4eff-9963-b3967a6ea5d8","1f20db9a-bee8-4b65-b7e8-e7c36b5b8fee","795c678e-3524-4b52-96ec-7b48c24eeab1","9ffbd403-169f-4fe4-bc41-09751066f1f1","8fdb0a60-29f0-4f5b-afcc-c736a03cd083"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:07:12.614000Z,2025-04-10 09:08:12.614000Z,2025-04-10 09:09:12.614000Z,2025-04-10 09:10:12.614000Z,2025-04-10 09:11:12.614000Z,2025-04-10 09:12:12.614000Z,2025-04-10 09:13:12.614000Z,2025-04-10 09:14:12.614000Z,2025-04-10 09:15:12.614000Z]]
close
签名
close(self: _goose.GoosePyRelation) -> None
说明
Closes the result
create
签名
create(self: _goose.GoosePyRelation, table_name: str) -> None
说明
Creates a new table named table_name with the contents of the relation object
Aliases: to_table
参数
-
table_name : str
The name of the table to be created. There shouldn't be any other table with the same name.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create("table_code_example")
goose_conn.table("table_code_example").limit(1)
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
create_view
签名
create_view(self: _goose.GoosePyRelation, view_name: str, replace: bool = True) -> _goose.GoosePyRelation
说明
Creates a view named view_name that refers to the relation object
Aliases: to_view
参数
-
view_name : str
The name of the view to be created.
-
replace : bool, default: True
If the view should be created with
CREATE OR REPLACE. When set toFalse, there shouldn't be another view with the sameview_name.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create_view("view_code_example", replace=True)
goose_conn.table("view_code_example").limit(1)
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
df
签名
df(self: _goose.GoosePyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
说明
Execute and fetch all rows as a pandas DataFrame
参数
-
date_as_object : bool, default: False
If the date columns should be interpreted as Python date objects.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.df()
结果
id description value created_timestamp
0 3ac9e0ba-8390-4a02-ad72-33b1caea6354 value is uneven 1 2025-04-10 11:07:12.614000+02:00
1 8b844392-1404-4bbc-b731-120f42c8ca27 value is even 2 2025-04-10 11:08:12.614000+02:00
2 ca5584ca-8e97-4fca-a295-ae3c16c32f5b value is uneven 3 2025-04-10 11:09:12.614000+02:00
...
execute
签名
execute(self: _goose.GoosePyRelation) -> _goose.GoosePyRelation
说明
Transform the relation into a result set
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.execute()
结果
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
│ 8b844392-1404-4bbc-b731-120f42c8ca27 │ value is even │ 2 │ 2025-04-10 11:08:12.614+02 │
│ ca5584ca-8e97-4fca-a295-ae3c16c32f5b │ value is uneven │ 3 │ 2025-04-10 11:09:12.614+02 │
fetch_arrow_reader
签名
fetch_arrow_reader(self: _goose.GoosePyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
说明
Execute and return an Arrow Record Batch Reader that yields all rows
参数
-
batch_size : int, default: 1000000
The batch size for fetching the data.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_reader = rel.fetch_arrow_reader(batch_size=1)
pa_reader.read_next_batch()
结果
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
fetch_arrow_table
签名
fetch_arrow_table(self: _goose.GoosePyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
说明
Execute and fetch all rows as an Arrow Table
Aliases: arrow, to_arrow_table
参数
-
batch_size : int, default: 1000000
The batch size for fetching the data.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_arrow_table()
结果
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["1587b4b0-3023-49fe-82cf-06303ca136ac","e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd","3f8ad67a-290f-4a22-b41b-0173b8e45afa","9a4e37ef-d8bd-46dd-ab01-51cf4973549f","12baa624-ebc9-45ae-b73e-6f4029e31d2d","56d41292-53cc-48be-a1b8-e1f5d6ca5581","1accca18-c950-47c1-9108-aef8afbd5249","56d8db75-72c4-4d40-90d2-a3c840579c37","e19f6201-8646-401c-b019-e37c42c39632"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:24:51.259000Z,2025-04-10 09:25:51.259000Z,2025-04-10 09:26:51.259000Z,2025-04-10 09:27:51.259000Z,2025-04-10 09:28:51.259000Z,2025-04-10 09:29:51.259000Z,2025-04-10 09:30:51.259000Z,2025-04-10 09:31:51.259000Z,2025-04-10 09:32:51.259000Z]]
fetch_df_chunk
签名
fetch_df_chunk(self: _goose.GoosePyRelation, vectors_per_chunk: typing.SupportsInt = 1, *, date_as_object: bool = False) -> pandas.DataFrame
说明
Execute and fetch a chunk of the rows
参数
-
vectors_per_chunk : int, default: 1
Number of data chunks to be processed before converting to dataframe.
-
date_as_object : bool, default: False
If the date columns should be interpreted as Python date objects.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_df_chunk()
结果
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
fetch_record_batch
签名
fetch_record_batch(self: _goose.GoosePyRelation, rows_per_batch: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
说明
Execute and return an Arrow Record Batch Reader that yields all rows
Deprecated
fetch_record_batch()is deprecated since 1.4.0. Userecord_batch()instead.
Aliases: record_batch
参数
-
rows_per_batch : int, default: 1000000
The number of rows per batch.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_reader = rel.fetch_record_batch(rows_per_batch=1)
pa_reader.read_next_batch()
结果
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"]
description: ["value is uneven"]
value: [1]
created_timestamp: [2025-04-10 09:52:55.249000Z]
fetchall
签名
fetchall(self: _goose.GoosePyRelation) -> list
说明
Execute and fetch all rows as a list of tuples
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1).fetchall()
结果
[(UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
'value is uneven',
1,
datetime.datetime(2025, 4, 10, 11, 24, 51, 259000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
fetchdf
签名
fetchdf(self: _goose.GoosePyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
说明
Execute and fetch all rows as a pandas DataFrame
参数
-
date_as_object : bool, default: False
If the date columns should be interpreted as Python date objects.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchdf()
结果
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
fetchmany
签名
fetchmany(self: _goose.GoosePyRelation, size: typing.SupportsInt = 1) -> list
说明
Execute and fetch the next set of rows as a list of tuples
Warning Executing any operation during the retrieval of the data from an aggregate relation, will close the result set.
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
agg_rel = rel.aggregate("value")
while res := agg_rel.fetchmany(size=1):
print(res)
rel.show()
参数
-
size : int, default: 1
The number of records to be fetched.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchmany(size=1):
print(res)
结果
[(UUID('cf4c5e32-d0aa-4699-a3ee-0092e900f263'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 16, 23, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('cec335ac-24ac-49a3-ae9a-bb35f71fc88d'), 'value is even', 2, datetime.datetime(2025, 4, 30, 16, 24, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('2423295d-9bb0-453c-a385-21bdacba03b6'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 16, 25, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('88806b21-192d-41e7-a293-c789aad636ba'), 'value is even', 4, datetime.datetime(2025, 4, 30, 16, 26, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('05837a28-dacf-4121-88a6-a374aefb8a07'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 16, 27, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('b9c1f7e9-6156-4554-b80e-67d3b5d810bb'), 'value is even', 6, datetime.datetime(2025, 4, 30, 16, 28, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('4709c7fa-d286-4864-bb48-69748b447157'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 16, 29, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('30e48457-b103-4fa5-95cf-1c7f0143335b'), 'value is even', 8, datetime.datetime(2025, 4, 30, 16, 30, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('036b7f4b-bd78-4ffb-a351-964d93f267b7'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 16, 31, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
fetchnumpy
签名
fetchnumpy(self: _goose.GoosePyRelation) -> dict
说明
Execute and fetch all rows as a Python dict mapping each column to one numpy arrays
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchnumpy()
结果
{'id': array([UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
UUID('e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd'),
UUID('3f8ad67a-290f-4a22-b41b-0173b8e45afa'),
UUID('9a4e37ef-d8bd-46dd-ab01-51cf4973549f'),
UUID('12baa624-ebc9-45ae-b73e-6f4029e31d2d'),
UUID('56d41292-53cc-48be-a1b8-e1f5d6ca5581'),
UUID('1accca18-c950-47c1-9108-aef8afbd5249'),
UUID('56d8db75-72c4-4d40-90d2-a3c840579c37'),
UUID('e19f6201-8646-401c-b019-e37c42c39632')], dtype=object),
'description': array(['value is uneven', 'value is even', 'value is uneven',
'value is even', 'value is uneven', 'value is even',
'value is uneven', 'value is even', 'value is uneven'],
dtype=object),
'value': array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
'created_timestamp': array(['2025-04-10T09:24:51.259000', '2025-04-10T09:25:51.259000',
'2025-04-10T09:26:51.259000', '2025-04-10T09:27:51.259000',
'2025-04-10T09:28:51.259000', '2025-04-10T09:29:51.259000',
'2025-04-10T09:30:51.259000', '2025-04-10T09:31:51.259000',
'2025-04-10T09:32:51.259000'], dtype='datetime64[us]')}
fetchone
签名
fetchone(self: _goose.GoosePyRelation) -> typing.Optional[tuple]
说明
Execute and fetch a single row as a tuple
Warning Executing any operation during the retrieval of the data from an aggregate relation, will close the result set.
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
agg_rel = rel.aggregate("value")
while res := agg_rel.fetchone():
print(res)
rel.show()
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchone():
print(res)
结果
(UUID('fe036411-f4c7-4f52-9ddd-80cd2bb56613'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 12, 59, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('466c9b43-e9f0-4237-8f26-155f259a5b59'), 'value is even', 2, datetime.datetime(2025, 4, 30, 13, 0, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('5755cf16-a94f-41ef-a16d-21e856d71f9f'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 13, 1, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('05b52c93-bd68-45e1-b02a-a08d682c33d5'), 'value is even', 4, datetime.datetime(2025, 4, 30, 13, 2, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('cf61ef13-2840-4541-900d-f493767d7622'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 13, 3, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('033e7c68-e800-4ee8-9787-6cf50aabc27b'), 'value is even', 6, datetime.datetime(2025, 4, 30, 13, 4, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('8b8d6545-ff54-45d6-b69a-97edb63dfe43'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 13, 5, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('7da79dfe-b29c-462b-a414-9d5e3cc80139'), 'value is even', 8, datetime.datetime(2025, 4, 30, 13, 6, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('f83ffff2-33b9-4f86-9d14-46974b546bab'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 13, 7, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
pl
签名
pl(self: _goose.GoosePyRelation, batch_size: typing.SupportsInt = 1000000, *, lazy: bool = False) -> goose::PolarsDataFrame
说明
Execute and fetch all rows as a Polars DataFrame
参数
-
batch_size : int, default: 1000000
The number of records to be fetched per batch.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.pl(batch_size=1)
结果
shape: (9, 4)
┌─────────────────────────────────┬─────────────────┬───────┬────────────────────────────────┐
│ id ┆ description ┆ value ┆ created_timestamp │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ datetime[μs, Europe/Amsterdam] │
╞═════════════════════════════════╪═════════════════╪═══════╪════════════════════════════════╡
│ b2f92c3c-9372-49f3-897f-2c86fc… ┆ value is uneven ┆ 1 ┆ 2025-04-10 11:49:51.886 CEST │
record_batch
说明
record_batch(self: object, batch_size: typing.SupportsInt = 1000000) -> object
参数
-
batch_size : int, default: 1000000
The batch size for fetching the data.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_batch = rel.record_batch(batch_size=1)
pa_batch.read_next_batch()
结果
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"]
description: ["value is uneven"]
value: [1]
created_timestamp: [2025-04-10 09:52:55.249000Z]
tf
签名
tf(self: _goose.GoosePyRelation) -> dict
说明
Fetch a result as dict of TensorFlow Tensors
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description, value").tf()
结果
{'description': <tf.Tensor: shape=(9,), dtype=string, numpy=
array([b'value is uneven', b'value is even', b'value is uneven',
b'value is even', b'value is uneven', b'value is even',
b'value is uneven', b'value is even', b'value is uneven'],
dtype=object)>,
'value': <tf.Tensor: shape=(9,), dtype=int64, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9])>}
to_arrow_table
签名
to_arrow_table(self: _goose.GoosePyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
说明
Execute and fetch all rows as an Arrow Table
Aliases: fetch_arrow_table, arrow
参数
-
batch_size : int, default: 1000000
The batch size for fetching the data.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_arrow_table()
结果
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["86b2011d-3818-426f-a41e-7cd5c7321f79","07fa4f89-0bba-4049-9acd-c933332a66d5","f2f1479e-f582-4fe4-b82f-9b753b69634c","529d3c63-5961-4adb-b0a8-8249188fc82a","aa9eea7d-7fac-4dcf-8f32-4a0b5d64f864","4852aa32-03f2-40d3-8006-b8213904775a","c0127203-f2e3-4925-9810-655bc02a3c19","2a1356ba-5707-44d6-a492-abd0a67e5efb","800a1c24-231c-4dae-bd68-627654c8a110"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:54:24.015000Z,2025-04-10 09:55:24.015000Z,2025-04-10 09:56:24.015000Z,2025-04-10 09:57:24.015000Z,2025-04-10 09:58:24.015000Z,2025-04-10 09:59:24.015000Z,2025-04-10 10:00:24.015000Z,2025-04-10 10:01:24.015000Z,2025-04-10 10:02:24.015000Z]]
to_csv
签名
to_csv(self: _goose.GoosePyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
说明
Write the relation object to a CSV file in 'file_name'
Aliases: write_csv
参数
-
file_name : str
The name of the output CSV file.
-
sep : str, default: ','
Field delimiter for the output file.
-
na_rep : str, default: ''
Missing data representation.
-
header : bool, default: True
Whether to write column headers.
-
quotechar : str, default: '"'
Character used to quote fields containing special characters.
-
escapechar : str, default: None
Character used to escape the delimiter if quoting is set to QUOTE_NONE.
-
date_format : str, default: None
Custom format string for DATE values.
-
timestamp_format : str, default: None
Custom format string for TIMESTAMP values.
-
quoting : int, default: csv.QUOTE_MINIMAL
Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
-
encoding : str, default: 'utf-8'
Character encoding for the output file.
-
compression : str, default: auto
Compression type (e.g., 'gzip', 'bz2', 'zstd').
-
overwrite : bool, default: False
When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with
partition_by. -
per_thread_output : bool, default: False
When
true, write one file per thread, rather than one file in total. This allows for faster parallel writing. -
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with
partition_by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_csv("code_example.csv")
结果
The data is exported to a CSV file, named code_example.csv
to_df
签名
to_df(self: _goose.GoosePyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
说明
Execute and fetch all rows as a pandas DataFrame
参数
-
date_as_object : bool, default: False
If the date columns should be interpreted as Python date objects.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_df()
结果
id description value created_timestamp
0 e1f79925-60fd-4ee2-ae67-5eff6b0543d1 value is uneven 1 2025-04-10 11:56:04.452000+02:00
1 caa619d4-d79c-4c00-b82e-9319b086b6f8 value is even 2 2025-04-10 11:57:04.452000+02:00
2 64c68032-99b9-4e8f-b4a3-6c522d5419b3 value is uneven 3 2025-04-10 11:58:04.452000+02:00
...
to_parquet
签名
to_parquet(self: _goose.GoosePyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None, filename_pattern: object = None, file_size_bytes: object = None) -> None
说明
Write the relation object to a Parquet file in 'file_name'
Aliases: write_parquet
参数
-
file_name : str
The name of the output Parquet file.
-
compression : str, default: 'snappy'
The compression format to use (
uncompressed,snappy,gzip,zstd,brotli,lz4,lz4_raw). -
field_ids : STRUCT
The field_id for each column. Pass auto to attempt to infer automatically.
-
row_group_size_bytes : int, default: row_group_size * 1024
The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued
SET preserve_insertion_order = false;, otherwise, it is ignored. -
row_group_size : int, default: 122880
The target size, i.e., number of rows, of each row group.
-
overwrite : bool, default: False
If True, overwrite the file if it exists.
-
per_thread_output : bool, default: False
When
True, write one file per thread, rather than one file in total. This allows for faster parallel writing. -
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with
partition_by. -
append : bool, default: False
When
True, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used withpartition_by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_parquet("code_example.parquet")
结果
The data is exported to a Parquet file, named code_example.parquet
to_table
签名
to_table(self: _goose.GoosePyRelation, table_name: str) -> None
说明
Creates a new table named table_name with the contents of the relation object
Aliases: create
参数
-
table_name : str
The name of the table to be created. There shouldn't be any other table with the same name.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_table("table_code_example")
结果
A table, named table_code_example, is created with the data of the relation
to_view
签名
to_view(self: _goose.GoosePyRelation, view_name: str, replace: bool = True) -> _goose.GoosePyRelation
说明
Creates a view named view_name that refers to the relation object
Aliases: create_view
参数
-
view_name : str
The name of the view to be created.
-
replace : bool, default: True
If the view should be created with
CREATE OR REPLACE. When set toFalse, there shouldn't be another view with the sameview_name.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_view("view_code_example", replace=True)
结果
A view, named view_code_example, is created with the query definition of the relation
torch
签名
torch(self: _goose.GoosePyRelation) -> dict
说明
Fetch a result as dict of PyTorch Tensors
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("value").torch()
结果
{'value': tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])}
write_csv
签名
write_csv(self: _goose.GoosePyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
说明
Write the relation object to a CSV file in 'file_name'
Aliases: to_csv
参数
-
file_name : str
The name of the output CSV file.
-
sep : str, default: ','
Field delimiter for the output file.
-
na_rep : str, default: ''
Missing data representation.
-
header : bool, default: True
Whether to write column headers.
-
quotechar : str, default: '"'
Character used to quote fields containing special characters.
-
escapechar : str, default: None
Character used to escape the delimiter if quoting is set to QUOTE_NONE.
-
date_format : str, default: None
Custom format string for DATE values.
-
timestamp_format : str, default: None
Custom format string for TIMESTAMP values.
-
quoting : int, default: csv.QUOTE_MINIMAL
Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
-
encoding : str, default: 'utf-8'
Character encoding for the output file.
-
compression : str, default: auto
Compression type (e.g., 'gzip', 'bz2', 'zstd').
-
overwrite : bool, default: False
When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with
partition_by. -
per_thread_output : bool, default: False
When
true, write one file per thread, rather than one file in total. This allows for faster parallel writing. -
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with
partition_by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_csv("code_example.csv")
结果
The data is exported to a CSV file, named code_example.csv
write_parquet
签名
write_parquet(self: _goose.GoosePyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None, filename_pattern: object = None, file_size_bytes: object = None) -> None
说明
Write the relation object to a Parquet file in 'file_name'
Aliases: to_parquet
参数
-
file_name : str
The name of the output Parquet file.
-
compression : str, default: 'snappy'
The compression format to use (
uncompressed,snappy,gzip,zstd,brotli,lz4,lz4_raw). -
field_ids : STRUCT
The field_id for each column. Pass auto to attempt to infer automatically.
-
row_group_size_bytes : int, default: row_group_size * 1024
The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued
SET preserve_insertion_order = false;, otherwise, it is ignored. -
row_group_size : int, default: 122880
The target size, i.e., number of rows, of each row group.
-
overwrite : bool, default: False
If True, overwrite the file if it exists.
-
per_thread_output : bool, default: False
When
True, write one file per thread, rather than one file in total. This allows for faster parallel writing. -
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with
partition_by. -
append : bool, default: False
When
True, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used withpartition_by.
示例
import goose
goose_conn = goose.connect()
rel = goose_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_parquet("code_example.parquet")
结果
The data is exported to a Parquet file, named code_example.parquet