Fabra: The Inference Context Ledger
Prove what your AI knew.
Fabra captures exactly what data your AI used at decision time — with full lineage, freshness guarantees, and replay. From notebook to production in 30 seconds.
Get Started → | Try in Browser →
At a Glance
| | |
|---|---|
| What | Inference Context Ledger — we own the write path |
| Context Record | Immutable snapshot of AI decision context |
| Install | pip install fabra-ai |
| Features | @feature decorator for ML features |
| RAG | @retriever + @context for LLM context assembly |
| Vector DB | pgvector (Postgres extension) |
| Local | DuckDB + in-memory (zero setup) |
| Production | Postgres + Redis (one env var) |
| Deploy | fabra deploy fly\|cloudrun\|ecs\|railway\|render |
The Problem
You're building an AI app. You need:
- Structured features (user tier, purchase history) for personalization
- Unstructured context (relevant docs, chat history) for your LLM
- Vector search for semantic retrieval
- Token budgets to fit your context window
Today, this means stitching together LangChain, Pinecone, a feature store, Redis, and prayer.
Fabra stores, indexes, and serves the data your AI uses — and tracks exactly what was retrieved for every decision.
This is "write path ownership": we ingest and manage your context data, not just query it. This enables replay, lineage, and traceability that read-only wrappers cannot provide.
The 30-Second Quickstart
Fastest Path
```bash
pip install fabra-ai && fabra demo
```

That's it. The server starts, makes a test request, and prints a `context_id` (your receipt). No Docker. No config files. No API keys.
Next:
```bash
fabra context show <context_id>
fabra context verify <context_id>
```

Build Your Own
```bash
pip install fabra-ai
```

```python
from fabra.core import FeatureStore, entity, feature
from fabra.context import context, ContextItem
from fabra.retrieval import retriever
from datetime import timedelta

store = FeatureStore()

@entity(store)
class User:
    user_id: str

@feature(entity=User, refresh=timedelta(days=1))
def user_tier(user_id: str) -> str:
    return "premium" if hash(user_id) % 2 == 0 else "free"

@retriever(index="docs", top_k=3)
async def find_docs(query: str):
    pass  # Automatic vector search via pgvector

@context(store, max_tokens=4000)
async def build_prompt(user_id: str, query: str):
    tier = await store.get_feature("user_tier", user_id)
    docs = await find_docs(query)
    return [
        ContextItem(content=f"User is {tier}.", priority=0),
        ContextItem(content=str(docs), priority=1),
    ]
```

```bash
fabra serve features.py
# Server running on http://localhost:8000

curl localhost:8000/features/user_tier?entity_id=user123
# {"value": "premium", "freshness_ms": 0, "served_from": "online"}
```

That's it. No infrastructure. No config files. Just Python.
Why Fabra?
| | Traditional Stack | Fabra |
|---|---|---|
| Config | 500 lines of YAML | Python decorators |
| Infrastructure | Kubernetes + Spark + Pinecone | Your laptop (DuckDB) |
| RAG Pipeline | LangChain spaghetti | @retriever + @context |
| Feature Serving | Separate feature store | Same @feature decorator |
| Time to Production | Weeks | 30 seconds |
We Own the Write Path
LangChain and other frameworks are read-only wrappers — they query your data but don't manage it. Fabra is the system of record for inference context. Every context assembly becomes a durable Context Record with:
- Cryptographic integrity (tamper-evident hashes; see the sketch after this list)
- Full lineage (what data was used, when, from where)
- Point-in-time replay (reproduce any decision exactly)
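As a rough intuition for the tamper-evidence idea (this is not the actual CRS-001 Context Record format, which is linked under Specifications below), a content hash computed at write time lets any later mutation be detected:

```python
# Generic illustration of tamper-evident hashing; NOT the CRS-001 record format.
# Field names and values here are made up for the example.
import hashlib
import json

record = {
    "context_id": "example-uuidv7-id",
    "items": [{"content": "User is premium.", "priority": 0}],
    "assembled_at": "2024-03-01T12:00:00Z",
}

# Hash a canonical serialization at write time and store the digest alongside it.
digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def verify(stored_record: dict, stored_digest: str) -> bool:
    """Return True only if the record is byte-for-byte what was written."""
    recomputed = hashlib.sha256(
        json.dumps(stored_record, sort_keys=True).encode()
    ).hexdigest()
    return recomputed == stored_digest

assert verify(record, digest)
```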
Infrastructure, Not a Framework
Fabra is not an orchestration layer. It's the system of record for what your AI knows. Features, retrievers, and context assembly in one infrastructure layer with production reliability.
Local-First, Production-Ready
```bash
FABRA_ENV=development  # DuckDB + In-Memory (default)
FABRA_ENV=production   # Postgres + Redis + pgvector
```

Same code. Zero changes. Just flip an environment variable.
Point-in-Time Correctness
Training ML models? We use ASOF JOIN (DuckDB) and LATERAL JOIN (Postgres) to ensure your training data reflects the world exactly as it was — no data leakage, ever.
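If you're curious what that looks like under the hood, here is a minimal, standalone DuckDB sketch of the idea; the table names and rows below are made up for illustration and are not Fabra's internal schema:

```python
# Illustrative only: an ASOF JOIN picks the latest feature value at or before
# each label timestamp, so no future data leaks into training.
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE labels AS SELECT * FROM (VALUES
        ('user123', TIMESTAMP '2024-03-01 12:00:00'),
        ('user123', TIMESTAMP '2024-03-10 12:00:00')
    ) AS t(user_id, label_ts)
""")
con.execute("""
    CREATE TABLE feature_values AS SELECT * FROM (VALUES
        ('user123', TIMESTAMP '2024-02-28 00:00:00', 'free'),
        ('user123', TIMESTAMP '2024-03-05 00:00:00', 'premium')
    ) AS t(user_id, feature_ts, user_tier)
""")

# The 2024-03-01 label sees 'free'; only the 2024-03-10 label sees 'premium'.
rows = con.execute("""
    SELECT l.user_id, l.label_ts, f.user_tier
    FROM labels l
    ASOF JOIN feature_values f
      ON l.user_id = f.user_id AND l.label_ts >= f.feature_ts
""").fetchall()
print(rows)
```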
Token Budget Management
```python
@context(store, max_tokens=4000)
async def build_prompt(user_id: str, query: str):
    return [
        ContextItem(content=critical_info, priority=0, required=True),
        ContextItem(content=nice_to_have, priority=2),  # Dropped if over budget
    ]
```

Automatically assembles context that fits your LLM's window. Priority-based truncation. No more "context too long" errors.
Production-Grade Reliability
- Self-Healing: `fabra doctor` diagnoses environment issues
- Fallback Chain: Cache → Compute → Default (see the sketch after this list)
- Circuit Breakers: Built-in protection against cascading failures
- Observability: Prometheus metrics, structured JSON logging, OpenTelemetry
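The fallback chain behaves roughly like the sketch below; the function signatures and names are illustrative of the pattern, not Fabra's internal code:

```python
# Conceptual sketch of the Cache → Compute → Default fallback chain.
# The cache/compute callables are stand-ins, not Fabra internals.
from typing import Awaitable, Callable, Optional

async def serve_with_fallback(
    key: str,
    cache_get: Callable[[str], Awaitable[Optional[str]]],
    compute: Callable[[str], Awaitable[str]],
    default: str,
) -> str:
    cached = await cache_get(key)      # 1. Cache: fastest path
    if cached is not None:
        return cached
    try:
        return await compute(key)      # 2. Compute: run the feature function
    except Exception:
        return default                 # 3. Default: degrade, don't fail
```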
Key Capabilities
For AI Engineers (Context Store)
- Vector Search: Built-in pgvector with automatic chunking and embedding
- Magic Retrievers: `@retriever` auto-wires to your vector index
- Context Assembly: Token budgets, priority truncation, explainability API
- Semantic Cache: Cache expensive LLM calls and retrieval results
For ML Engineers (Feature Store)
- Hybrid Features: Mix Python logic and SQL in the same pipeline (see the sketch after this list)
- Event-Driven: Trigger updates via Redis Streams
- Point-in-Time Joins: Zero data leakage for training
- Hooks: Before/After hooks for custom pipelines
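As a sketch of the hybrid idea: the only Fabra API used below is the `@entity`/`@feature` decorator pair from the quickstart; the DuckDB query, the `orders.parquet` file, and the post-processing are illustrative assumptions, not a built-in SQL integration.

```python
# A hedged sketch of mixing SQL and Python inside one feature definition.
# Only @entity/@feature are Fabra API; the DuckDB call, 'orders.parquet',
# and the rounding logic are illustrative.
from datetime import timedelta

import duckdb
from fabra.core import FeatureStore, entity, feature

store = FeatureStore()

@entity(store)
class User:
    user_id: str

@feature(entity=User, refresh=timedelta(hours=1))
def lifetime_value(user_id: str) -> float:
    # SQL does the heavy aggregation...
    row = duckdb.execute(
        "SELECT coalesce(sum(amount), 0) FROM 'orders.parquet' WHERE user_id = ?",
        [user_id],
    ).fetchone()
    # ...and plain Python applies whatever business logic you need on top.
    return round(float(row[0]), 2)
```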
For Everyone
- One-Command Deploy: `fabra deploy fly|cloudrun|ecs|railway|render`
- Visual UI: Dependency graphs, live metrics, context debugging
- Unit Testing: Test features in isolation
For Compliance & Debugging
- Context Accountability: Full lineage tracking — every AI decision traces back through the data that informed it
- Context Replay: Reproduce exactly what your AI knew at any point in time for debugging and compliance
- Traceability: UUIDv7-based context IDs with complete data provenance
- Freshness SLAs: Ensure data freshness with configurable thresholds and degraded mode
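To make the freshness SLA idea concrete, here is a hedged client-side sketch that reads the `freshness_ms` field shown in the quickstart response; the 60-second threshold and the fallback value are illustrative, and Fabra's own SLA/degraded-mode configuration is not shown here.

```python
# Client-side enforcement of a freshness threshold using the freshness_ms
# field returned by the serving API (see the quickstart response).
# The 60-second SLA and the fallback value are illustrative only.
import requests

FRESHNESS_SLA_MS = 60_000  # example: require data no older than 60 seconds

resp = requests.get(
    "http://localhost:8000/features/user_tier",
    params={"entity_id": "user123"},
).json()

if resp["freshness_ms"] > FRESHNESS_SLA_MS:
    tier = "free"          # degrade to a safe default instead of failing
else:
    tier = resp["value"]
```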
Use Cases
- RAG Chatbot: Build a production RAG application
- Fraud Detection: Real-time feature serving
- Churn Prediction: Point-in-time correct training data
- Real-Time Recommendations: Async feature pipelines
Start Here
| I'm an ML Engineer | I'm an AI Engineer |
|---|---|
| "I need to serve features without Kubernetes" | "I need RAG with traceability" |
| Feature Store Without K8s → | Context Accountability → |
| Feast vs Fabra → | Context Store → |
| Quickstart (ML Track) → | Quickstart (AI Track) → |
Building in a regulated industry? Compliance Guide →
Documentation
Getting Started
- Quickstart — Zero to served features in 30 seconds
- Philosophy — Why we built this and who it's for
- Architecture — Boring technology, properly applied
For ML Engineers
- Feature Store Without Kubernetes — No K8s, no Docker, just Python
- Fabra vs Feast — The lightweight alternative
- Local to Production — Deploy when you're ready
For AI Engineers
- Context Store — RAG infrastructure with full lineage
- Context Accountability — Know what your AI knew
- Compliance Guide — GDPR, SOC2, and regulated industries
Guides
- Comparisons — vs Feast, LangChain, Pinecone, Tecton
Tools
- WebUI — Visual feature store & context explorer
Specifications
- Context Record Spec (CRS-001) — Technical specification for Context Records
Reference
- Glossary — Key terms defined
- FAQ — Common questions
- Troubleshooting — Common issues and fixes
- Changelog — Version history
Blog
- Why We Built a Feast Alternative
- Running a Feature Store Locally Without Docker
- RAG Without LangChain
- The Feature Store for Startups
- Context Assembly: Fitting LLM Prompts in Token Budgets
- Point-in-Time Features: Preventing ML Data Leakage
- pgvector vs Pinecone: When to Self-Host Vector Search
- Token Budget Management for Production RAG
- Python Decorators for ML Feature Engineering
- Deploy ML Features Without Kubernetes
- What Did Your AI Know? Introducing Context Replay
- Traceability for AI Decisions
- Freshness SLAs: When Your AI Needs Fresh Data
- Fabra vs Context Engineering Platforms: Choosing the Right Tool
Quick FAQ
Q: What is Fabra? A: Fabra is context infrastructure for AI applications. It stores, indexes, and serves the data your AI uses — and tracks exactly what was retrieved for every decision. We call this "write path ownership": we manage your context data, not just query it.
Q: How is Fabra different from LangChain? A: LangChain is a framework (orchestration). Fabra is infrastructure (storage + serving). LangChain queries external stores; Fabra owns the write path with freshness tracking, replay, and full lineage. You can use both together.
Q: How is Fabra different from Feast? A: Fabra is a lightweight alternative with Python decorators instead of YAML, plus built-in context/RAG support (vector search, token budgeting, lineage) that Feast doesn't have.
Q: Do I need Kubernetes or Docker? A: No. Fabra runs locally with DuckDB and an in-memory cache. For production, set FABRA_ENV=production with Postgres and Redis.
Q: What vector database does Fabra use? A: pgvector (Postgres extension). Your vectors live alongside your relational data—no separate vector database required.
Contributing
We love contributions! See CONTRIBUTING.md to get started.