F
Fabra

Use Case: Churn Prediction

Churn prediction is the "Hello World" of Point-in-Time Correctness. If you get this wrong, your model will look perfect in training but fail in production.

The Problem: Data Leakage

Imagine you are training a model to predict if a user will churn next month. You have a features table:

user_id txn_count timestamp
u1 10 2024-01-01
u1 50 2024-02-01

And a labels table (churn events):

user_id churned timestamp
u1 True 2024-01-15

If you naively join these tables on user_id, you might accidentally use the txn_count=50 feature (from Feb 1st) to predict the churn event (on Jan 15th). This is data leakage. You are using data from the future to predict the past.

The Solution: Point-in-Time Correctness

Fabra solves this automatically using ASOF JOIN (DuckDB) or LATERAL JOIN (Postgres).

# features.py
@feature(entity=User, sql="SELECT * FROM transactions", materialize=True)
def txn_count(user_id: str) -> int:
    return 0

# training.py
training_df = await store.get_training_data(
    entity_df=labels_df,  # Contains user_id and timestamp (Jan 15th)
    features=["txn_count"]
)

Fabra ensures that for the label on Jan 15th, it only sees the feature value from Jan 1st (txn_count=10). It ignores the future value.

Why This Matters

  • Correctness: Your offline metrics (AUC/F1) will match online performance.
  • Simplicity: You don't need to write complex window functions or temporal joins manually.
  • Consistency: The same logic applies whether you are using DuckDB locally or Postgres in production.

Next Steps