Tables are one of the core data structures in Ibis.

Table

Table(self, arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

Methods

NameDescription
aliasCreate a table expression with a specific name alias.
as_scalarInform ibis that the table expression should be treated as a scalar.
countCompute the number of rows in the table.
differenceCompute the set difference of multiple table expressions.
distinctReturn a Table with duplicate rows removed.
dropnaDeprecated - use drop_null instead.
fillnaDeprecated - use fill_null instead.
filterSelect rows from table based on predicates.
intersectCompute the set intersection of multiple table expressions.
limitSelect n rows from self starting at offset.
order_bySort a table by one or more expressions.
sampleSample a fraction of rows from a table.
selectCompute a new table expression using exprs and named_exprs.
sqlRun a SQL query against a table expression.
unionCompute the set union of multiple table expressions.
viewCreate a new table expression distinct from the current one.
cacheCache the results of a computation to improve performance on subsequent executions.
into_backendConverts the Expr to a table in the given backend con with an optional table name name.

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias creates a temporary view in the database.

This side effect will be removed in a future version of xorq and is not part of the public API.

Parameters

NameTypeDescriptionDefault
aliasstrName of the child expressionrequired

Returns

NameTypeDescription
TableAn table expression

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.examples.penguins.fetch()
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Torgersen │           39.118.7181.0 │ … │
│ Adelie  │ Torgersen │           39.517.4186.0 │ … │
│ Adelie  │ Torgersen │           40.318.0195.0 │ … │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │ … │
│ Adelie  │ Torgersen │           36.719.3193.0 │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

NameTypeDescription
ScalarA scalar subquery

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string  │ int64         │
├─────────┼───────────────┤
│ Adelie  │            44
│ Gentoo  │           124
└─────────┴───────────────┘

count

count(where=None)

Compute the number of rows in the table.

Parameters

NameTypeDescriptionDefault
whereir.BooleanValue | NoneOptional boolean expression to filter rows when counting.None

Returns

NameTypeDescription
IntegerScalarNumber of rows in the table

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.memtable({"a": ["foo", "bar", "baz"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘
>>> t.count()
┌───┐
3
└───┘
>>> t.count(t.a != "foo")
┌───┐
2
└───┘
>>> type(t.count())
<class 'letsql.vendor.ibis.expr.types.numeric.IntegerScalar'>

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

NameTypeDescriptionDefault
tableTableA table expressionrequired
*restTableAdditional table expressions()
distinctboolOnly diff distinct rows not occurring in the calling tableTrue

See Also

ibis.difference

Returns

NameTypeDescription
TableThe rows present in self that are not present in tables.

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t1 = ls.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
1
2
└───────┘
>>> t2 = ls.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
2
3
└───────┘
>>> t1.difference(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
1
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

NameTypeDescriptionDefault
onstr | Iterable[str] | s.Selector | NoneOnly consider certain columns for identifying duplicates. By default, deduplicate all of the columns.None
keepLiteral[‘first’, ‘last’] | NoneDetermines which duplicates to keep. - "first": Drop duplicates except for the first occurrence. - "last": Drop duplicates except for the last occurrence. - None: Drop all duplicates'first'

Examples

>>> import letsql as ls
>>> import letsql.examples as ex
>>> import letsql.selectors as s
>>> ls.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Torgersen │           39.118.7181.0 │ … │
│ Adelie  │ Torgersen │           39.517.4186.0 │ … │
│ Adelie  │ Torgersen │           40.318.0195.0 │ … │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │ … │
│ Adelie  │ Torgersen │           36.719.3193.0 │ … │
│ Adelie  │ Torgersen │           39.320.6190.0 │ … │
│ Adelie  │ Torgersen │           38.917.8181.0 │ … │
│ Adelie  │ Torgersen │           39.219.6195.0 │ … │
│ Adelie  │ Torgersen │           34.118.1193.0 │ … │
│ Adelie  │ Torgersen │           42.020.2190.0 │ … │
│ …       │ …         │              … │             … │                 … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species   ┃ island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ string    │
├───────────┼───────────┤
│ Adelie    │ Biscoe    │
│ Adelie    │ Dream     │
│ Adelie    │ Torgersen │
│ Chinstrap │ Dream     │
│ Gentoo    │ Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_… ┃ flipper_length_mm ┃  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━┩
│ string    │ string    │ float64        │ float64      │ float64           │  │
├───────────┼───────────┼────────────────┼──────────────┼───────────────────┼──┤
│ Adelie    │ Biscoe    │           37.818.3174.0 │  │
│ Adelie    │ Dream     │           39.516.7178.0 │  │
│ Adelie    │ Torgersen │           39.118.7181.0 │  │
│ Chinstrap │ Dream     │           46.517.9192.0 │  │
│ Gentoo    │ Biscoe    │           46.113.2211.0 │  │
└───────────┴───────────┴────────────────┴──────────────┴───────────────────┴──┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_… ┃ flipper_length_mm ┃  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━┩
│ string    │ string    │ float64        │ float64      │ float64           │  │
├───────────┼───────────┼────────────────┼──────────────┼───────────────────┼──┤
│ Adelie    │ Biscoe    │           42.718.3196.0 │  │
│ Adelie    │ Dream     │           41.518.5201.0 │  │
│ Adelie    │ Torgersen │           43.119.2197.0 │  │
│ Chinstrap │ Dream     │           50.218.7198.0 │  │
│ Gentoo    │ Biscoe    │           49.916.1213.0 │  │
└───────────┴───────────┴────────────────┴──────────────┴───────────────────┴──┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()
┌─────┐
273
└─────┘
>>> t.count()
┌─────┐
344
└─────┘

You can pass selectors to on

>>> t.distinct(on=~s.numeric())
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Torgersen │           39.118.7181 │ … │
│ Adelie  │ Torgersen │           39.517.4186 │ … │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │ … │
│ Adelie  │ Biscoe    │           37.818.3174 │ … │
│ Adelie  │ Biscoe    │           37.718.7180 │ … │
│ Adelie  │ Dream     │           39.516.7178 │ … │
│ Adelie  │ Dream     │           37.218.1178 │ … │
│ Adelie  │ Dream     │           37.518.9179 │ … │
│ Gentoo  │ Biscoe    │           46.113.2211 │ … │
│ Gentoo  │ Biscoe    │           50.016.3230 │ … │
│ …       │ …         │              … │             … │                 … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

The only valid values of keep are "first", "last" and .

>>> t.distinct(on="species", keep="second")  # quartodoc: +EXPECTED_FAILURE
Traceback (most recent call last):
  ...
letsql.vendor.ibis.common.exceptions.LetSQLError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

dropna

dropna(subset=None, how='any')

Deprecated - use drop_null instead.

fillna

fillna(replacements)

Deprecated - use fill_null instead.

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

NameTypeDescriptionDefault
predicatesir.BooleanValue | Sequence[ir.BooleanValue] | IfAnyAllBoolean value expressions used to select rows in table.()

Returns

NameTypeDescription
TableFiltered table expression

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Torgersen │           39.118.7181.0 │ … │
│ Adelie  │ Torgersen │           39.517.4186.0 │ … │
│ Adelie  │ Torgersen │           40.318.0195.0 │ … │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │ … │
│ Adelie  │ Torgersen │           36.719.3193.0 │ … │
│ Adelie  │ Torgersen │           39.320.6190.0 │ … │
│ Adelie  │ Torgersen │           38.917.8181.0 │ … │
│ Adelie  │ Torgersen │           39.219.6195.0 │ … │
│ Adelie  │ Torgersen │           34.118.1193.0 │ … │
│ Adelie  │ Torgersen │           42.020.2190.0 │ … │
│ …       │ …         │              … │             … │                 … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘
>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex    ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64     │
├────────┼───────────┤
│ female │        22
│ male   │        68
└────────┴───────────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

NameTypeDescriptionDefault
tableTableA table expressionrequired
*restTableAdditional table expressions()
distinctboolOnly return distinct rowsTrue

Returns

NameTypeDescription
TableA new table containing the intersection of all input tables.

See Also

ibis.intersect

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t1 = ls.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
1
2
└───────┘
>>> t2 = ls.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
2
3
└───────┘
>>> t1.intersect(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
2
└───────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.]

Parameters

NameTypeDescriptionDefault
nint | NoneNumber of rows to include. If None, the entire table is selected starting from offset.required
offsetintNumber of rows to skip first0

Returns

NameTypeDescription
TableThe first n rows of self starting at offset

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ c      │
1 │ a      │
2 │ a      │
└───────┴────────┘
>>> t.limit(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ c      │
1 │ a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ a      │
2 │ a      │
└───────┴────────┘

See Also

Table.order_by

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

NameTypeDescriptionDefault
bystr | ir.Column | s.Selector | Sequence[str] | Sequence[ir.Column] | Sequence[s.Selector] | NoneExpressions to sort the table by.()

Returns

NameTypeDescription
TableSorted table

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
3 │ a      │     4
2 │ B      │     6
1 │ c      │     5
3 │ D      │     7
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
2 │ B      │     6
3 │ D      │     7
3 │ a      │     4
1 │ c      │     5
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(ls.desc("b"))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
1 │ c      │     5
3 │ a      │     4
3 │ D      │     7
2 │ B      │     6
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from letsql import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
1 │ c      │     5
3 │ a      │     4
3 │ D      │     7
2 │ B      │     6
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
1 │ c      │     5
2 │ B      │     6
3 │ D      │     7
3 │ a      │     4
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
3 │ a      │     4
2 │ B      │     6
1 │ c      │     5
3 │ D      │     7
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(ls.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
1 │ c      │     5
3 │ D      │     7
3 │ a      │     4
2 │ B      │     6
└───────┴────────┴───────┘

Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria

>>> import letsql.selectors as s
>>> penguins = ls.examples.penguins.fetch()
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
2007 │ Torgersen │                20
2007 │ Biscoe    │                44
2007 │ Dream     │                46
2008 │ Torgersen │                16
2008 │ Dream     │                34
2008 │ Biscoe    │                64
2009 │ Torgersen │                16
2009 │ Dream     │                44
2009 │ Biscoe    │                60
└───────┴───────────┴───────────────────┘

Use the across selector to apply a specific order to multiple columns

>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
2009 │ Biscoe    │                60
2009 │ Dream     │                44
2009 │ Torgersen │                16
2008 │ Biscoe    │                64
2008 │ Dream     │                34
2008 │ Torgersen │                16
2007 │ Dream     │                46
2007 │ Biscoe    │                44
2007 │ Torgersen │                20
└───────┴───────────┴───────────────────┘

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

NameTypeDescriptionDefault
fractionfloatThe percentage of rows to include in the sample, expressed as a float between 0 and 1.required
methodLiteral[‘row’, ‘block’]The sampling method to use. The default is “row”, which includes each row with a probability of fraction. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.'row'
seedint | NoneAn optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.None

Returns

NameTypeDescription
TableThe input table, with fraction of rows selected.

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x     ┃ y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ a      │
2 │ b      │
3 │ c      │
4 │ d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)
┏━━━━━━━┳━━━━━━━━┓
┃ x     ┃ y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
2 │ b      │
3 │ c      │
└───────┴────────┘

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

NameTypeDescriptionDefault
exprsir.Value | str | Iterable[ir.Value | str]Column expression, string, or list of column expressions and strings.()
named_exprsir.Value | strColumn expressions{}

Returns

NameTypeDescription
TableTable expression

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t = ls.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Torgersen │           39.118.7181.0 │ … │
│ Adelie  │ Torgersen │           39.517.4186.0 │ … │
│ Adelie  │ Torgersen │           40.318.0195.0 │ … │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │ … │
│ Adelie  │ Torgersen │           36.719.3193.0 │ … │
│ Adelie  │ Torgersen │           39.320.6190.0 │ … │
│ Adelie  │ Torgersen │           38.917.8181.0 │ … │
│ Adelie  │ Torgersen │           39.219.6195.0 │ … │
│ Adelie  │ Torgersen │           34.118.1193.0 │ … │
│ Adelie  │ Torgersen │           42.020.2190.0 │ … │
│ …       │ …         │              … │             … │                 … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

Simple projection

>>> t.select("island", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1
│ Torgersen │           39.5
│ Torgersen │           40.3
│ Torgersen │           NULL │
│ Torgersen │           36.7
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1
│ Torgersen │           39.5
│ Torgersen │           40.3
│ Torgersen │           NULL │
│ Torgersen │           36.7
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(t[0], t[4]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ float64           │
├─────────┼───────────────────┤
│ Adelie  │             181.0
│ Adelie  │             186.0
│ Adelie  │             195.0
│ Adelie  │              NULL │
│ Adelie  │             193.0
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
2008
2008
2008
2008
2008
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from letsql import _
>>> t.select((_.year + 1).name("next_year")).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
2008
2008
2008
2008
2008
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island    ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ float64   │
├───────────┼───────────┤
│ Torgersen │  43.92193
│ Torgersen │  43.92193
│ Torgersen │  43.92193
│ Torgersen │  43.92193
│ Torgersen │  43.92193
└───────────┴───────────┘

Projection with a selector

>>> import letsql.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
39.118.7181.03750.0
39.517.4186.03800.0
40.318.0195.03250.0
│           NULL │          NULL │              NULL │        NULL │
36.719.3193.03450.0
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from letsql import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
43.9219317.15117200.9152054201.754386
43.9219317.15117200.9152054201.754386
43.9219317.15117200.9152054201.754386
43.9219317.15117200.9152054201.754386
43.9219317.15117200.9152054201.754386
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

NameTypeDescriptionDefault
querystrQuery stringrequired
dialectstr | NoneOptional string indicating the dialect of query. Defaults to the backend’s native dialect.None

Returns

NameTypeDescription
TableAn opaque table expression

Examples

>>> import letsql as ls
>>> from letsql import _
>>> ls.options.interactive = True
>>> t = ls.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ Biscoe    │       45.257485
│ Dream     │       44.167742
│ Torgersen │       38.950980
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = ls.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ biscoe    │            45.3
│ torgersen │            39.0
│ dream     │            44.2
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64         │
├────────┼─────────────────┤
│ biscoe │            45.3
│ dream  │            44.2
└────────┴─────────────────┘

See Also

Table.alias

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

NameTypeDescriptionDefault
tableTableA table expressionrequired
*restTableAdditional table expressions()
distinctboolOnly return distinct rowsFalse

Returns

NameTypeDescription
TableA new table containing the union of all input tables.

See Also

ibis.union

Examples

>>> import letsql as ls
>>> ls.options.interactive = True
>>> t1 = ls.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
1
2
└───────┘
>>> t2 = ls.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
2
3
└───────┘
>>> t1.union(t2)  # union all by default doctest: +SKIP
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
2
3
1
2
└───────┘
>>> t1.union(t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
1
2
3
└───────┘

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

NameTypeDescription
TableTable expression

cache

cache(storage=None)

Cache the results of a computation to improve performance on subsequent executions. This method allows you to cache the results of a computation either in memory, on disk using Parquet files, or in a database table. The caching strategy and storage location are determined by the storage parameter.

Parameters

NameTypeDescriptionDefault
storageCacheStorageThe storage strategy to use for caching. Can be one of: - ParquetCacheStorage: Caches results as Parquet files on disk - SourceStorage: Caches results in the source database - ParquetSnapshot: Creates a snapshot of data in Parquet format - SnapshotStorage: Creates a snapshot in the source database If None, uses the default storage configuration.None

Returns

NameTypeDescription
ExprA new expression that represents the cached computation.

Notes

The cache method supports two main strategies: 1. ModificationTimeStrategy: Tracks changes based on modification time 2. SnapshotStrategy: Creates point-in-time snapshots of the data

Each strategy can be combined with either Parquet or database storage.

Examples

Using ParquetCacheStorage:

>>> import letsql as ls
>>> from letsql.common.caching import ParquetCacheStorage
>>> from pathlib import Path
>>> pg = ls.postgres.connect_examples()
>>> con = ls.connect()
>>> storage = ParquetCacheStorage(source=con, path=Path.cwd())
>>> alltypes = pg.table("functional_alltypes")
>>> cached = (alltypes
...     .select(alltypes.smallint_col, alltypes.int_col, alltypes.float_col)
...     .cache(storage=storage))

Using SourceStorage with PostgreSQL:

>>> from letsql.common.caching import SourceStorage
>>> from letsql import _
>>> ddb = ls.duckdb.connect()
>>> path = ls.config.options.pins.get_path("batting")
>>> right = (ddb.read_parquet(path, table_name="batting")
...          .filter(_.yearID == 2014)
...          .pipe(con.register, table_name="ddb-batting"))
>>> left = (pg.table("batting")
...         .filter(_.yearID == 2015)
...         .pipe(con.register, table_name="pg-batting"))
>>> # Cache the joined result
>>> expr = left.join(right, "playerID").cache(SourceStorage(source=pg))

Using cache with filtering:

>>> cached = alltypes.cache(storage=storage)
>>> expr = cached.filter([
...     cached.float_col > 0,
...     cached.smallint_col > 4,
...     cached.int_col < cached.float_col * 2
... ])

See Also

ParquetCacheStorage : Storage implementation for Parquet files SourceStorage : Storage implementation for database tables ModificationTimeStrategy : Strategy for tracking changes by modification time SnapshotStrategy : Strategy for creating data snapshots

Notes

  • The cache is identified by a unique key based on the computation and strategy
  • Cache invalidation is handled automatically based on the chosen strategy
  • Cross-source caching (e.g., from PostgreSQL to DuckDB) is supported
  • Cache locations can be configured globally through letsql.config.options

into_backend

into_backend(con, name=None)

Converts the Expr to a table in the given backend con with an optional table name name.

The table is backed by a PyArrow RecordBatchReader, the RecordBatchReader is teed so it can safely be reaused without spilling to disk.

Parameters

NameTypeDescriptionDefault
conThe backend where the table should be createdrequired
nameThe name of the tableNone

Examples

>>> import letsql as ls
>>> from letsql import _
>>> ls.options.interactive = True
>>> ls_con = ls.connect()
>>> pg_con = ls.postgres.connect_examples()
>>> t = pg_con.table("batting").into_backend(ls_con, "ls_batting")
>>> expr = (
...     t.join(t, "playerID")
...     .order_by("playerID", "yearID")
...     .limit(15)
...     .select(player_id="playerID", year_id="yearID_right")
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━┓
┃ player_id ┃ year_id ┃
┡━━━━━━━━━━━╇━━━━━━━━━┩
│ string    │ int64   │
├───────────┼─────────┤
│ aardsda01 │    2015
│ aardsda01 │    2007
│ aardsda01 │    2006
│ aardsda01 │    2009
│ aardsda01 │    2008
│ aardsda01 │    2010
│ aardsda01 │    2004
│ aardsda01 │    2013
│ aardsda01 │    2012
│ aardsda01 │    2006
│ …         │       … │
└───────────┴─────────┘