10-minute tour of xorq
This tutorial will walk you through the key features of xorq, a data processing library that enables you to work seamlessly across multiple data engines.
Installation
First, install xorq with pip, including the duckdb extra to enable the DuckDB backend (for example, pip install "xorq[duckdb]").
Setting up Connections
Let’s start by creating connections to different backends:
In this section, we:
- Import xorq and its deferred object, which lets us refer to columns
- Create a xorq connection (backed by the xorq backend)
- Create a DuckDB connection
- Create a Postgres connection
Note that you can create a custom Postgres connection by specifying the different parameters, for example:
Reading Data
Now let’s read some data into xorq:
xorq can read data from various sources. Here we’re reading a Parquet file directly. The table_name parameter specifies how this table will be referenced inside the con backend.
Basic Operations
Let’s perform some basic data operations:
The output is
Note that xorq operations are lazy: they don’t execute until you call execute, which returns a pandas DataFrame.
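To make the idea of lazy evaluation concrete, here is a minimal, self-contained sketch in plain Python. It is not xorq's implementation or API: LazyTable, its methods, and the sample rows are hypothetical names invented for illustration. The only point is that building an expression records steps, and nothing runs until execute is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyTable:
    """Toy deferred table (hypothetical): records steps, runs them in execute()."""

    rows: tuple
    steps: tuple = ()

    def filter(self, predicate):
        # Record the filter; do not evaluate the predicate yet.
        return LazyTable(self.rows, self.steps + (("filter", predicate),))

    def select(self, *columns):
        # Record the projection; do not touch the rows yet.
        return LazyTable(self.rows, self.steps + (("select", columns),))

    def execute(self):
        # Only here are the recorded steps applied, in order.
        result = list(self.rows)
        for kind, arg in self.steps:
            if kind == "filter":
                result = [row for row in result if arg(row)]
            else:
                result = [{col: row[col] for col in arg} for row in result]
        return result


predicate_calls = []  # lets us observe when the predicate actually runs


def is_recent(row):
    predicate_calls.append(row["year"])
    return row["year"] > 2010


# Made-up sample data, not the tutorial's dataset.
table = LazyTable(
    rows=(
        {"player": "ann", "year": 2009, "hits": 120},
        {"player": "bob", "year": 2012, "hits": 98},
    )
)

expr = table.filter(is_recent).select("player", "year")
calls_before_execute = list(predicate_calls)  # still empty: nothing has run
result = expr.execute()
```

Because each method returns a new LazyTable rather than mutating the old one, an expression can be extended in several directions without the variants interfering with each other.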
Multi-Engine Operations
One of xorq’s powerful features is the ability to move data between different backends using into_backend().
This method converts an expression from one backend into a table in another backend, using a PyArrow RecordBatchReader as an intermediate format:
The corresponding output is:
into_backend() is particularly useful when you want to:
- Move data between different database engines
- Combine data from multiple sources
- Avoid writing intermediate results to disk
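The general pattern behind this kind of transfer can be sketched with the standard library alone. The example below is a stand-in under stated assumptions, not xorq code: two in-memory SQLite databases play the role of two engines, copy_query_result and quote_ident are hypothetical helpers, and fetchmany batches take the place of the PyArrow RecordBatchReader stream. All table and column names are made up.

```python
import sqlite3


def quote_ident(name):
    # Quote an SQL identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def copy_query_result(src, query, dst, table_name, batch_size=1000):
    """Stream the result of `query` on `src` into a new table on `dst`.

    Rows move in batches through memory; nothing is written to disk.
    """
    cursor = src.execute(query)
    columns = [desc[0] for desc in cursor.description]
    col_list = ", ".join(quote_ident(c) for c in columns)
    dst.execute(f"CREATE TABLE {quote_ident(table_name)} ({col_list})")

    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {quote_ident(table_name)} VALUES ({placeholders})"
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        dst.executemany(insert_sql, batch)
    dst.commit()


# "Engine" 1 holds awards; "engine" 2 holds batting statistics.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE awards (player TEXT, award TEXT)")
source.executemany(
    "INSERT INTO awards VALUES (?, ?)",
    [("ann", "MVP"), ("bob", "Rookie"), ("cy", "MVP")],
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE batting (player TEXT, hits INTEGER)")
target.executemany("INSERT INTO batting VALUES (?, ?)", [("ann", 150), ("bob", 90)])

# Move a filtered result from the source engine into the target engine...
copy_query_result(
    source,
    "SELECT player, award FROM awards WHERE award = 'MVP'",
    target,
    "mvp_awards",
    batch_size=1,
)

# ...then combine it with data that already lives in the target engine.
joined = target.execute(
    "SELECT b.player, b.hits, a.award "
    "FROM batting AS b JOIN mvp_awards AS a ON a.player = b.player "
    "ORDER BY b.player"
).fetchall()
```

Streaming in batches keeps memory use bounded by the batch size rather than by the size of the full result, which is the main reason to prefer a reader-style interface over materializing everything first.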
Leveraging Different Backends
Different backends have different strengths. Let’s use DuckDB for some aggregations:
The output:
Caching Expressions
xorq provides caching capabilities to optimize performance:
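How xorq keys and stores cached results is not described in this section, so the following is only a generic sketch of the idea, assuming a cache keyed by a hash of the query text. QueryCache is a hypothetical class, SQLite stands in for a real backend, and the data is made up. The sketch also does not invalidate entries when the underlying table changes, which a real cache would need to handle.

```python
import hashlib
import sqlite3


class QueryCache:
    """Toy result cache (hypothetical): one stored result per distinct query text."""

    def __init__(self, conn):
        self.conn = conn
        self._store = {}
        self.misses = 0  # number of times the engine actually ran a query

    def execute(self, sql):
        key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: run the query once and remember the rows.
            self.misses += 1
            self._store[key] = self.conn.execute(sql).fetchall()
        return self._store[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (team TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?)",
    [("NYA", 150), ("NYA", 90), ("BOS", 120)],
)

cache = QueryCache(conn)
sql = "SELECT team, SUM(hits) FROM batting GROUP BY team ORDER BY team"
first = cache.execute(sql)   # computed by the engine
second = cache.execute(sql)  # served from the cache
```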
Key Takeaways
- xorq provides a consistent interface across different data engines
- Operations are lazy until .execute() is called
- Data can be moved between backends using into_backend()
- Caching helps optimize performance for frequently used queries
- Different backends can be leveraged for their specific strengths