Fabra: The Inference Context Ledger
Prove what your AI knew.
Fabra captures exactly what data your AI used at decision time — with full lineage, freshness guarantees, and replay. From notebook to production in 30 seconds.
Get Started → | Try in Browser →
At a Glance
| | |
|---|---|
| What | Inference Context Ledger — we own the write path |
| Context Record | Immutable snapshot of AI decision context |
| Install | pip install fabra-ai |
| Features | @feature decorator for ML features |
| RAG | @retriever + @context for LLM context assembly |
| Vector DB | pgvector (Postgres extension) |
| Local | DuckDB + in-memory (zero setup) |
| Production | Postgres + Redis (one env var) |
| Deploy | fabra deploy fly\|cloudrun\|ecs\|railway\|render |
The Problem
You're building an AI app. You need:
- Structured features (user tier, purchase history) for personalization
- Unstructured context (relevant docs, chat history) for your LLM
- Vector search for semantic retrieval
- Token budgets to fit your context window
Today, this means stitching together LangChain, Pinecone, a feature store, Redis, and prayer.
Fabra stores, indexes, and serves the data your AI uses — and tracks exactly what was retrieved for every decision.
This is "write path ownership": we ingest and manage your context data, not just query it. This enables replay, lineage, and traceability that read-only wrappers cannot provide.
The 30-Second Quickstart
Fastest Path
```bash
pip install fabra-ai && fabra demo
```

That's it. The server starts, makes a test request, and prints a `context_id` (your receipt). No Docker. No config files. No API keys.
Next:
```bash
fabra context show <context_id>
fabra context verify <context_id>
```

Build Your Own
```bash
pip install fabra-ai
```

```python
from fabra.core import FeatureStore, entity, feature
from fabra.context import context, ContextItem
from fabra.retrieval import retriever
from datetime import timedelta

store = FeatureStore()

@entity(store)
class User:
    user_id: str

@feature(entity=User, refresh=timedelta(days=1))
def user_tier(user_id: str) -> str:
    return "premium" if hash(user_id) % 2 == 0 else "free"

@retriever(index="docs", top_k=3)
async def find_docs(query: str):
    pass  # Automatic vector search via pgvector

@context(store, max_tokens=4000)
async def build_prompt(user_id: str, query: str):
    tier = await store.get_feature("user_tier", user_id)
    docs = await find_docs(query)
    return [
        ContextItem(content=f"User is {tier}.", priority=0),
        ContextItem(content=str(docs), priority=1),
    ]
```

```bash
fabra serve features.py
# Server running on http://localhost:8000

curl localhost:8000/features/user_tier?entity_id=user123
# {"value": "premium", "freshness_ms": 0, "served_from": "online"}
```

That's it. No infrastructure. No config files. Just Python.
Why Fabra?
| | Traditional Stack | Fabra |
|---|---|---|
| Config | 500 lines of YAML | Python decorators |
| Infrastructure | Kubernetes + Spark + Pinecone | Your laptop (DuckDB) |
| RAG Pipeline | LangChain spaghetti | @retriever + @context |
| Feature Serving | Separate feature store | Same @feature decorator |
| Time to Production | Weeks | 30 seconds |
We Own the Write Path
LangChain and other frameworks are read-only wrappers — they query your data but don't manage it. Fabra is the system of record for inference context. Every context assembly becomes a durable Context Record with:
- Cryptographic integrity (tamper-evident hashes; see the sketch after this list)
- Full lineage (what data was used, when, from where)
- Point-in-time replay (reproduce any decision exactly)
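As a rough intuition for the tamper-evidence idea (this is not the actual CRS-001 Context Record format, which is linked under Specifications below), a content hash computed at write time lets any later mutation be detected:

```python
# Generic illustration of tamper-evident hashing; NOT the CRS-001 record format.
# Field names and values here are made up for the example.
import hashlib
import json

record = {
    "context_id": "example-uuidv7-id",
    "items": [{"content": "User is premium.", "priority": 0}],
    "assembled_at": "2024-03-01T12:00:00Z",
}

# Hash a canonical serialization at write time and store the digest alongside it.
digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def verify(stored_record: dict, stored_digest: str) -> bool:
    """Return True only if the record is byte-for-byte what was written."""
    recomputed = hashlib.sha256(
        json.dumps(stored_record, sort_keys=True).encode()
    ).hexdigest()
    return recomputed == stored_digest

assert verify(record, digest)
```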
Infrastructure, Not a Framework
Fabra is not an orchestration layer. It's the system of record for what your AI knows. Features, retrievers, and context assembly in one infrastructure layer with production reliability.
Local-First, Production-Ready
```bash
FABRA_ENV=development  # DuckDB + In-Memory (default)
FABRA_ENV=production   # Postgres + Redis + pgvector
```

Same code. Zero changes. Just flip an environment variable.
Point-in-Time Correctness
Training ML models? We use ASOF JOIN (DuckDB) and LATERAL JOIN (Postgres) to ensure your training data reflects the world exactly as it was — no data leakage, ever.
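If you're curious what that looks like under the hood, here is a minimal, standalone DuckDB sketch of the idea; the table names and rows below are made up for illustration and are not Fabra's internal schema:

```python
# Illustrative only: an ASOF JOIN picks the latest feature value at or before
# each label timestamp, so no future data leaks into training.
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE labels AS SELECT * FROM (VALUES
        ('user123', TIMESTAMP '2024-03-01 12:00:00'),
        ('user123', TIMESTAMP '2024-03-10 12:00:00')
    ) AS t(user_id, label_ts)
""")
con.execute("""
    CREATE TABLE feature_values AS SELECT * FROM (VALUES
        ('user123', TIMESTAMP '2024-02-28 00:00:00', 'free'),
        ('user123', TIMESTAMP '2024-03-05 00:00:00', 'premium')
    ) AS t(user_id, feature_ts, user_tier)
""")

# The 2024-03-01 label sees 'free'; only the 2024-03-10 label sees 'premium'.
rows = con.execute("""
    SELECT l.user_id, l.label_ts, f.user_tier
    FROM labels l
    ASOF JOIN feature_values f
      ON l.user_id = f.user_id AND l.label_ts >= f.feature_ts
""").fetchall()
print(rows)
```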
Token Budget Management
```python
@context(store, max_tokens=4000)
async def build_prompt(user_id: str, query: str):
    return [
        ContextItem(content=critical_info, priority=0, required=True),
        ContextItem(content=nice_to_have, priority=2),  # Dropped if over budget
    ]
```

Automatically assembles context that fits your LLM's window. Priority-based truncation. No more "context too long" errors.
Production-Grade Reliability
- Self-Healing: `fabra doctor` diagnoses environment issues
- Fallback Chain: Cache → Compute → Default (see the sketch after this list)
- Circuit Breakers: Built-in protection against cascading failures
- Observability: Prometheus metrics, structured JSON logging, OpenTelemetry
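The fallback chain behaves roughly like the sketch below; the function signatures and names are illustrative of the pattern, not Fabra's internal code:

```python
# Conceptual sketch of the Cache → Compute → Default fallback chain.
# The cache/compute callables are stand-ins, not Fabra internals.
from typing import Awaitable, Callable, Optional

async def serve_with_fallback(
    key: str,
    cache_get: Callable[[str], Awaitable[Optional[str]]],
    compute: Callable[[str], Awaitable[str]],
    default: str,
) -> str:
    cached = await cache_get(key)      # 1. Cache: fastest path
    if cached is not None:
        return cached
    try:
        return await compute(key)      # 2. Compute: run the feature function
    except Exception:
        return default                 # 3. Default: degrade, don't fail
```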
Key Capabilities
For AI Engineers (Context Store)
- Vector Search: Built-in pgvector with automatic chunking and embedding
- Magic Retrievers: `@retriever` auto-wires to your vector index
- Context Assembly: Token budgets, priority truncation, explainability API
- Semantic Cache: Cache expensive LLM calls and retrieval results
For ML Engineers (Feature Store)
- Hybrid Features: Mix Python logic and SQL in the same pipeline (see the sketch after this list)
- Event-Driven: Trigger updates via Redis Streams
- Point-in-Time Joins: Zero data leakage for training
- Hooks: Before/After hooks for custom pipelines
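As a sketch of the hybrid idea: the only Fabra API used below is the `@entity`/`@feature` decorator pair from the quickstart; the DuckDB query, the `orders.parquet` file, and the post-processing are illustrative assumptions, not a built-in SQL integration.

```python
# A hedged sketch of mixing SQL and Python inside one feature definition.
# Only @entity/@feature are Fabra API; the DuckDB call, 'orders.parquet',
# and the rounding logic are illustrative.
from datetime import timedelta

import duckdb
from fabra.core import FeatureStore, entity, feature

store = FeatureStore()

@entity(store)
class User:
    user_id: str

@feature(entity=User, refresh=timedelta(hours=1))
def lifetime_value(user_id: str) -> float:
    # SQL does the heavy aggregation...
    row = duckdb.execute(
        "SELECT coalesce(sum(amount), 0) FROM 'orders.parquet' WHERE user_id = ?",
        [user_id],
    ).fetchone()
    # ...and plain Python applies whatever business logic you need on top.
    return round(float(row[0]), 2)
```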
For Everyone
- One-Command Deploy: `fabra deploy fly|cloudrun|ecs|railway|render`
- Visual UI: Dependency graphs, live metrics, context debugging
- Unit Testing: Test features in isolation
For Compliance & Debugging
- Context Accountability: Full lineage tracking — every AI decision traces back through the data that informed it
- Context Replay: Reproduce exactly what your AI knew at any point in time for debugging and compliance
- Traceability: UUIDv7-based context IDs with complete data provenance
- Freshness SLAs: Ensure data freshness with configurable thresholds and degraded mode
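To make the freshness SLA idea concrete, here is a hedged client-side sketch that reads the `freshness_ms` field shown in the quickstart response; the 60-second threshold and the fallback value are illustrative, and Fabra's own SLA/degraded-mode configuration is not shown here.

```python
# Client-side enforcement of a freshness threshold using the freshness_ms
# field returned by the serving API (see the quickstart response).
# The 60-second SLA and the fallback value are illustrative only.
import requests

FRESHNESS_SLA_MS = 60_000  # example: require data no older than 60 seconds

resp = requests.get(
    "http://localhost:8000/features/user_tier",
    params={"entity_id": "user123"},
).json()

if resp["freshness_ms"] > FRESHNESS_SLA_MS:
    tier = "free"          # degrade to a safe default instead of failing
else:
    tier = resp["value"]
```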
Use Cases
- RAG Chatbot: Build a production RAG application
- Fraud Detection: Real-time feature serving
- Churn Prediction: Point-in-time correct training data
- Real-Time Recommendations: Async feature pipelines
Start Here
| I'm an ML Engineer | I'm an AI Engineer |
|---|---|
| "I need to serve features without Kubernetes" | "I need RAG with traceability" |
| Feature Store Without K8s → | Context Accountability → |
| Feast vs Fabra → | Context Store → |
| Quickstart (ML Track) → | Quickstart (AI Track) → |
Building in a regulated industry? Compliance Guide →
Documentation
Getting Started
- Quickstart — Zero to served features in 30 seconds
- Philosophy — Why we built this and who it's for
- Architecture — Boring technology, properly applied
For ML Engineers
- Feature Store Without Kubernetes — No K8s, no Docker, just Python
- Fabra vs Feast — The lightweight alternative
- Local to Production — Deploy when you're ready
For AI Engineers
- Context Store — RAG infrastructure with full lineage
- Context Accountability — Know what your AI knew
- Compliance Guide — GDPR, SOC2, and regulated industries
Guides
- Comparisons — vs Feast, LangChain, Pinecone, Tecton
Tools
- WebUI — Visual feature store & context explorer
Specifications
- Context Record Spec (CRS-001) — Technical specification for Context Records
Reference
- Glossary — Key terms defined
- FAQ — Common questions
- Troubleshooting — Common issues and fixes
- Changelog — Version history
Blog
- Why We Built a Feast Alternative
- Running a Feature Store Locally Without Docker
- RAG Without LangChain
- The Feature Store for Startups
- Context Assembly: Fitting LLM Prompts in Token Budgets
- Point-in-Time Features: Preventing ML Data Leakage
- pgvector vs Pinecone: When to Self-Host Vector Search
- Token Budget Management for Production RAG
- Python Decorators for ML Feature Engineering
- Deploy ML Features Without Kubernetes
- What Did Your AI Know? Introducing Context Replay
- Traceability for AI Decisions
- Freshness SLAs: When Your AI Needs Fresh Data
- Fabra vs Context Engineering Platforms: Choosing the Right Tool
Quick FAQ
Q: What is Fabra? A: Fabra is context infrastructure for AI applications. It stores, indexes, and serves the data your AI uses — and tracks exactly what was retrieved for every decision. We call this "write path ownership": we manage your context data, not just query it.
Q: How is Fabra different from LangChain? A: LangChain is a framework (orchestration). Fabra is infrastructure (storage + serving). LangChain queries external stores; Fabra owns the write path with freshness tracking, replay, and full lineage. You can use both together.
Q: How is Fabra different from Feast? A: Fabra is a lightweight alternative with Python decorators instead of YAML, plus built-in context/RAG support (vector search, token budgeting, lineage) that Feast doesn't have.
Q: Do I need Kubernetes or Docker? A: No. Fabra runs locally with DuckDB and an in-memory cache. For production, set FABRA_ENV=production with Postgres and Redis.
Q: What vector database does Fabra use? A: pgvector (Postgres extension). Your vectors live alongside your relational data—no separate vector database required.
Contributing
We love contributions! See CONTRIBUTING.md to get started.