Conversion between Goose and Python

This page documents the rules for converting Python objects to Goose and Goose results to Python.

Object Conversion: Python Object to Goose

This is a mapping of Python object types to Goose Logical Types:

None → NULL
bool → BOOLEAN
datetime.timedelta → INTERVAL
str → VARCHAR
bytearray → BLOB
memoryview → BLOB
decimal.Decimal → DECIMAL / DOUBLE
uuid.UUID → UUID

The rest of the conversion rules are as follows.

`int`

Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints. Instead we perform these casts in order until one succeeds:

BIGINT
INTEGER
UBIGINT
UINTEGER
DOUBLE

When using the Goose Value class, it's possible to set a target type, which will influence the conversion.

`float`

These casts are tried in order until one succeeds:

DOUBLE
FLOAT

`datetime.datetime`

For datetime we will check pandas.isnull if it's available and return NULL if it returns true. We check against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.

If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.

`datetime.time`

If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.

`datetime.date`

date converts to the DATE type. We check against datetime.date.min and datetime.date.max to convert to -inf and +inf respectively.

`bytes`

bytes converts to BLOB by default, when it's used to construct a Value object of type BITSTRING, it maps to BITSTRING instead.

`list`

list becomes a LIST type of the “most permissive” type of its children, for example:

my_list_value = [
    12345,
    "test"
]

Will become VARCHAR[] because 12345 can convert to VARCHAR but test can not convert to INTEGER.

[12345, test]

`dict`

The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure. If the dict has a structure similar to:

import goose

my_map_dict = {
    "key": [
        1, 2, 3
    ],
    "value": [
        "one", "two", "three"
    ]
}

goose.values(my_map_dict)

Then we'll convert it to a MAP of key-value pairs of the two lists zipped together. The example above becomes a MAP(INTEGER, VARCHAR):

┌─────────────────────────┐
│ {1=one, 2=two, 3=three} │
│  map(integer, varchar)  │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘

If the dict is returned by a function, the function will return a MAP, therefore the function return_type has to be specified. Providing a return type which cannot convert to MAP will raise an error:

import goose
goose_conn = goose.connect()

def get_map() -> dict[str,list[str]|list[int]]:
    return {
        "key": [
            1, 2, 3
        ],
        "value": [
            "one", "two", "three"
        ]
    }

goose_conn.create_function("get_map", get_map, return_type=dict[int, str])

goose_conn.sql("select get_map()").show()

goose_conn.create_function("get_map_error", get_map)

goose_conn.sql("select get_map_error()").show()

┌─────────────────────────┐
│        get_map()        │
│  map(bigint, varchar)   │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘

ConversionException: Conversion Error: Type VARCHAR can't be cast as UNION(u1 VARCHAR[], u2 BIGINT[]). VARCHAR can't be implicitly cast to any of the union member types: VARCHAR[], BIGINT[]

The names of the fields matter and the two lists need to have the same size.

Otherwise we'll try to convert it to a STRUCT.

import goose

my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

goose.values(my_struct_dict)

Becomes:

┌────────────────────────────────────────────────────────────────────┐
│      {'1': 'one', '2': 2, 'three': [1, 2, 3], 'False': true}       │
│ struct("1" varchar, "2" integer, three integer[], "false" boolean) │
├────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}              │
└────────────────────────────────────────────────────────────────────┘

If the dict is returned by a function, the function will return a MAP, due to automatic conversion. To return a STRUCT, the return_type has to be provided:

import goose
from goose.sqltypes import BOOLEAN, INTEGER, VARCHAR
from goose import list_type, struct_type

goose_conn = goose.connect()

my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

def get_struct() -> dict[str|int|bool,str|int|list[int]|bool]:
    return my_struct_dict

goose_conn.create_function("get_struct_as_map", get_struct)

goose_conn.sql("select get_struct_as_map()").show()

goose_conn.create_function("get_struct", get_struct, return_type=struct_type({
    1: VARCHAR,
    "2": INTEGER,
    "three": list_type(INTEGER),
    False: BOOLEAN
}))

goose_conn.sql("select get_struct()").show()

┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         get_struct_as_map()                                          │
│ map(union(u1 varchar, u2 bigint, u3 boolean), union(u1 varchar, u2 bigint, u3 bigint[], u4 boolean)) │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {1=one, 2=2, three=[1, 2, 3], false=true}                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│                            get_struct()                            │
│ struct("1" varchar, "2" integer, three integer[], "false" boolean) │
├────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}              │
└────────────────────────────────────────────────────────────────────┘

Every key of the dictionary is converted to string.

`tuple`

tuple converts to LIST by default, when it's used to construct a Value object of type STRUCT it will convert to STRUCT instead.

`numpy.ndarray` and `numpy.datetime64`

ndarray and datetime64 are converted by calling tolist() and converting the result of that.

Result Conversion: Goose Results to Python

Goose's Python client provides multiple additional methods that can be used to efficiently retrieve data.

NumPy

fetchnumpy() fetches the data as a dictionary of NumPy arrays

Pandas

df() fetches the data as a Pandas DataFrame
fetchdf() is an alias of df()
fetch_df() is an alias of df()
fetch_df_chunk(vector_multiple) fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).

Apache Arrow

fetch_arrow_table() fetches the data as an Arrow table
to_arrow_table() is an alias of fetch_arrow_table()
arrow() returns an Arrow record batch reader
fetch_record_batch(chunk_size) returns an Arrow record batch reader with chunk_size rows per batch

Polars

pl() fetches the data as a Polars DataFrame

Examples

Below are some examples using this functionality. See the Python guides for more examples.

Fetch as Pandas DataFrame:

df = con.execute("SELECT * FROM items").fetchdf()
print(df)

       item   value  count
   jeans    20.0      1
  hammer    42.2      2
  laptop  2000.0      1
chainsaw   500.0     10
  iphone   300.0      2

Fetch as dictionary of NumPy arrays:

arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)

{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
             mask=[False, False, False, False, False],
       fill_value='?',
            dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
             mask=[False, False, False, False, False],
       fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
             mask=[False, False, False, False, False],
       fill_value=999999,
            dtype=int32)}

Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:

tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())

       item    value  count
   jeans    20.00      1
  hammer    42.20      2
  laptop  2000.00      1
chainsaw   500.00     10
  iphone   300.00      2

Object Conversion: Python Object to Goose​

int​

float​

datetime.datetime​

datetime.time​

datetime.date​

bytes​

list​

dict​

tuple​

numpy.ndarray and numpy.datetime64​

Result Conversion: Goose Results to Python​

NumPy​

Pandas​

Apache Arrow​

Polars​

Examples​