Hybrid Features: Python + SQL
Fabra v1.1.0 introduces Hybrid Features, allowing you to choose the best tool for the job: Python for complex logic, and SQL for heavy lifting.
The Two Modes
1. Python Features (On-Demand)
Perfect for real-time transformations, complex math, or logic that is hard to express in SQL.
@feature(entity=User)
def name_length(user_id: str) -> int:
# Complex logic in pure Python
user = fetch_user_profile(user_id)
return len(user.name)How it works:
- Online: The function is executed in real-time when you request the feature.
- Offline: The function is applied row-by-row to your entity dataframe using
apply().
2. SQL Features (Batch / Materialized)
Perfect for aggregations, joins, and large-scale data processing that should happen in the warehouse.
@feature(
entity=User,
sql="SELECT user_id, count(*) as txn_count FROM transactions GROUP BY user_id",
materialize=True,
refresh=timedelta(minutes=10)
)
def user_txn_count(user_id: str) -> int:
return 0 # Fallback value or type hintHow it works:
- Offline Retrieval: Fabra executes the SQL query against your Offline Store (DuckDB or Postgres) and joins the result to your entity dataframe.
- Materialization: The Scheduler executes the SQL query periodically and bulk-loads the results into Redis.
- Online Serving: The API serves the pre-computed value from Redis.
[!IMPORTANT] Point-in-Time Correctness: For SQL features, your entity dataframe must have a timestamp column (default:
event_timestamportimestamp) to perform the ASOF join correctly.
Mixing Both
You can mix both types in the same request!
training_df = await store.get_training_data(
entity_df,
features=["name_length", "user_txn_count"]
)Fabra automatically handles the orchestration:
- Computes
name_lengthin Python. - Delegates
user_txn_countto the database. - Merges everything into a single DataFrame.