Lumyn vs Fine-Tuning: When to Use Each
Fine-tuning adapts LLM weights to improve response quality. Lumyn enforces deterministic policies without touching the model. They address orthogonal concerns: fine-tuning makes your LLM better at its task; Lumyn makes
your AI system governable regardless of what LLM you use.
The Core Difference
| Aspect | Fine-Tuning | Lumyn |
|---|---|---|
| What it changes | Model weights (neural network parameters) | Policy rules (YAML configuration) |
| When it runs | Training time (hours/days) | Decision time (milliseconds) |
| Purpose | Improve LLM output quality | Enforce governance rules |
| Determinism | Still probabilistic (LLMs vary) | Deterministic (same inputs → same verdict) |
| Cost | $100-$10,000 per training run | $0 (local evaluation) |
| Update speed | Days (retrain + deploy) | Seconds (update YAML) |
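The determinism row is the crux, because it drives everything else on this page, and it is directly testable. A minimal sketch, assuming the `decide`/`LumynConfig` API shown later on this page and a request shape like the refund examples below:

```python
from lumyn import decide, LumynConfig

cfg = LumynConfig(policy_path="policy.yml")
request = {
    "action": {"type": "refund", "amount": {"value": 500, "currency": "USD"}},
    "evidence": {"risk_score": 0.42},
}

# A deterministic engine must agree with itself: run the identical
# decision twice and compare verdicts.
first = decide(request, config=cfg)
second = decide(request, config=cfg)
assert first["verdict"] == second["verdict"]
assert first["reason_codes"] == second["reason_codes"]
# No temperature setting gives a fine-tuned LLM this property.
```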
What is Fine-Tuning?
Fine-tuning takes a pre-trained LLM and adjusts its weights on domain-specific data:
```python
from openai import OpenAI

# Fine-tune GPT-4 on customer support examples
client = OpenAI()
client.fine_tuning.jobs.create(
    training_file="file-abc123",  # Your examples
    model="gpt-4-0613",
    hyperparameters={"n_epochs": 3},
)
# Cost: ~$100-$500
# Time: 2-8 hours
```

Fine-tuning is great for:
- Domain-specific language (medical, legal, technical)
- Consistent tone/style
- Task-specific formats (structured outputs)
- Reducing prompt engineering
Fine-tuning does NOT provide:
- Deterministic decisions
- Audit trails
- Governance rules
- Replay guarantees
What is Lumyn?
Lumyn evaluates structured policy rules to gate actions, without any model training:
```yaml
# policy.yml - update in seconds, no retraining
rules:
  - id: R_HIGH_AMOUNT
    if:
      amount_usd: { gt: 1000 }
    then:
      verdict: ESCALATE
      reason_codes: ["AMOUNT_OVER_THRESHOLD"]
```

```python
from lumyn import decide, LumynConfig

record = decide(request, config=LumynConfig(policy_path="policy.yml"))
# Decision in milliseconds, deterministic, auditable
```

Lumyn is great for:
- Compliance-driven gating
- Fraud prevention
- Access control
- Audit requirements
Lumyn does NOT:
- Improve LLM output quality
- Change model behavior
- Require training data
When Fine-Tuning Fails for Governance
Problem 1: Still Non-Deterministic
Even after fine-tuning, LLMs are probabilistic:
```python
# Fine-tuned model for refund approvals
response1 = fine_tuned_llm("Approve $500 refund for user_123?")
# "Approved" (temperature=0.7)

response2 = fine_tuned_llm("Approve $500 refund for user_123?")
# "I recommend escalating due to amount" (same temp, different output!)
```

You cannot:
- Replay the exact decision for audits
- Alert on specific policy violations
- Guarantee same verdict for same inputs
With Lumyn:
```python
record = decide(refund_request, config=cfg)
# Always: {"verdict": "ESCALATE", "reason_codes": ["REFUND_OVER_ESCALATION_LIMIT"]}
# Deterministic replay with inputs_digest
```
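The `inputs_digest` in that comment is what anchors replay: a content hash over everything the decision saw. A minimal sketch of the idea, assuming a canonical-JSON-plus-SHA-256 scheme (Lumyn's actual digest format may differ):

```python
import hashlib
import json

def inputs_digest(request: dict, policy_text: str) -> str:
    # Canonicalize first: sorted keys and compact separators ensure the
    # same logical inputs always serialize to the same bytes.
    canonical = json.dumps(
        {"request": request, "policy": policy_text},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same request + same policy → same digest, so an auditor can verify that
# a replayed verdict was produced from exactly the recorded inputs.
```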
Problem 2: Slow Policy Updates
Fine-tuning requires a full retraining cycle:
1. Collect new training examples (days)
2. Fine-tune model ($100-$500, 2-8 hours)
3. Evaluate on test set (hours)
4. Deploy new model (CI/CD pipeline)
5. Monitor for regressions
Total: 1-2 weeks minimum
With Lumyn:
```yaml
# Edit policy.yml
rules:
  - id: R_NEW_FRAUD_BLOCK
    if:
      evidence.risk_score: { gt: 0.9 }
    then:
      verdict: DENY
      reason_codes: ["HIGH_FRAUD_RISK"]
```

```console
$ git commit -m "Add fraud rule" policy.yml
$ git push
# Live in production in seconds
```
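"Seconds" assumes the running service notices the new file without a redeploy. A minimal hot-reload sketch, assuming you rebuild `LumynConfig` whenever `policy.yml`'s mtime changes (the reload mechanism here is an assumption, not Lumyn's documented behavior):

```python
import os
from lumyn import decide, LumynConfig

POLICY_PATH = "policy.yml"
_cached = {"mtime": None, "config": None}

def current_config() -> LumynConfig:
    # Rebuild the config only when policy.yml changes on disk.
    mtime = os.path.getmtime(POLICY_PATH)
    if mtime != _cached["mtime"]:
        _cached["config"] = LumynConfig(policy_path=POLICY_PATH)
        _cached["mtime"] = mtime
    return _cached["config"]

def gate(request: dict) -> dict:
    # Every decision evaluates against the freshest policy on disk.
    return decide(request, config=current_config())
```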
Problem 3: No Structured Governance
Fine-tuned models still return natural language, not machine-stable codes:
```python
fine_tuned_output = """This request seems risky due to high chargeback probability,
but the customer has been loyal for 3 years, so I'd suggest
manual review before deciding."""
```

Problems:
- Can't query SQL for "top deny reasons"
- Can't alert on "chargeback risk" threshold
- No compliance dashboard
With Lumyn:
```json
{
  "verdict": "ESCALATE",
  "reason_codes": ["CHARGEBACK_RISK_HIGH", "MANUAL_REVIEW_REQUIRED"],
  "risk_signals": {"chargeback_probability": 0.87}
}
```

Now the reason codes are queryable:

```sql
SELECT COUNT(*) FROM decision_records
WHERE 'CHARGEBACK_RISK_HIGH' = ANY(reason_codes);
```
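That count query generalizes to alerting. A minimal sketch, assuming decisions land in a Postgres `decision_records` table with a `created_at` column and that you query it with the `psycopg` driver; `notify_oncall` is a hypothetical pager hook:

```python
import psycopg  # assumes decision records are stored in Postgres

ALERT_THRESHOLD = 50  # max CHARGEBACK_RISK_HIGH verdicts per hour

def notify_oncall(message: str) -> None:
    # Hypothetical pager hook; wire this to your alerting system.
    print(message)

def check_chargeback_alert(conn: psycopg.Connection) -> None:
    # Count recent decisions carrying the reason code (created_at is an
    # assumed column on decision_records).
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT COUNT(*) FROM decision_records
            WHERE 'CHARGEBACK_RISK_HIGH' = ANY(reason_codes)
              AND created_at > now() - interval '1 hour'
            """
        )
        (count,) = cur.fetchone()
    if count > ALERT_THRESHOLD:
        notify_oncall(f"CHARGEBACK_RISK_HIGH spiked: {count} in the last hour")
```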
When to Use Both Together
Fine-tuning and Lumyn are complementary in production AI systems:
Pattern 1: Fine-Tuned LLM → Lumyn Gate
```python
# 1. Fine-tuned LLM generates risk assessment
risk_assessment = fine_tuned_llm(
    f"Analyze refund risk for customer {user_id}"
)
# Returns: {"risk_score": 0.87, "factors": ["high_chargeback_history"]}

# 2. Lumyn enforces policy on LLM output
record = decide(
    request={
        "action": {"type": "refund", "amount": {"value": 500, "currency": "USD"}},
        "evidence": {"risk_score": risk_assessment["risk_score"]},
        "context": {"mode": "inline", "inline": risk_assessment, "digest": "sha256:..."},
    },
    config=LumynConfig(policy_path="policy.yml"),
)

if record["verdict"] == "DENY":
    # Policy overrides LLM's "maybe approve" tendency
    return {"status": "BLOCKED", "reason_codes": record["reason_codes"]}
```

Result: the LLM provides nuanced risk analysis (via fine-tuning); Lumyn enforces hard rules (governance).
Pattern 2: Lumyn Memory Informs Fine-Tuning Data
```console
# 1. Lumyn collects labeled decisions over time
$ lumyn label 01JBQX... --label failure --summary "Fraudulent refund"
$ lumyn label 01JBQY... --label success --summary "Legitimate refund"
```

```python
import json

# 2. Export Lumyn memory as fine-tuning examples
decisions = export_decision_records(label="failure", limit=1000)
training_data = []
for decision in decisions:
    training_data.append({
        "messages": [
            {"role": "user", "content": f"Analyze: {decision['request']}"},
            {"role": "assistant", "content": f"DENY - {decision['reason_codes']}"},
        ]
    })

# 3. Fine-tune on Lumyn's curated dataset: write JSONL, upload it,
# then reference the uploaded file's ID (client is the OpenAI() client
# from the fine-tuning example above)
with open("training.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")

training_file = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-4")
```

Result: Lumyn's governance decisions become training signal for the LLM.
Cost Comparison
Fine-Tuning Costs (OpenAI GPT-4)
Training: $0.0080 per 1K tokens
- 100K training examples × 500 tokens avg = 50M tokens
- Cost: $0.0080 × 50,000 K-tokens = $400 per training run
Inference: $0.03 per 1K tokens (same as base model)
- 1M decisions × 200 tokens avg = 200M tokens
- Cost: $0.03 × 200,000 K-tokens = $6,000/month
Total first month: $6,400
Lumyn Costs
Policy evaluation: In-memory (free)
Memory search: Local vector DB (free)
Decision storage: ~10 KB per record, ~$0.000001 per record (S3)
1M decisions: ~$10/month (640x cheaper)
Key insight: Fine-tuning adds training cost BUT doesn't reduce inference cost. Lumyn avoids LLM calls entirely for governance.
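The arithmetic behind these figures, spelled out in Python using the numbers above:

```python
# Fine-tuning path (OpenAI GPT-4 pricing quoted above)
training_tokens = 100_000 * 500    # 100K examples × 500 tokens avg = 50M
training_cost = 0.008 * training_tokens / 1_000    # $400 per training run

inference_tokens = 1_000_000 * 200  # 1M decisions × 200 tokens avg = 200M
inference_cost = 0.03 * inference_tokens / 1_000   # $6,000/month

print(training_cost + inference_cost)         # 6400.0 — first-month total
print((training_cost + inference_cost) / 10)  # 640.0 — the "640x cheaper"
```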
Speed Comparison
Fine-Tuning Latency
```text
Request → LLM API → Wait for response → Parse output
Latency: 500ms - 3 seconds (depending on model size)
```

Lumyn Latency

```text
Request → Evaluate policy rules → Return verdict
Latency: 1ms - 50ms (in-process, no API calls)
```

10-3000x faster for gated decisions.
When Fine-Tuning Makes Sense
Use fine-tuning when you need to:
Improve LLM output quality

```python
# Fine-tune for domain-specific language
base_llm("Explain hemoptysis")
# → "Coughing up blood"
fine_tuned_llm("Explain hemoptysis")
# → "Expectoration of blood from respiratory tract, often indicating pulmonary hemorrhage..."
```

Consistent formatting

```python
# Fine-tune for structured JSON output
base_llm("Extract entities")
# → "The person is John, age 30..."
fine_tuned_llm("Extract entities")
# → {"name": "John", "age": 30}
```

Reduce prompt tokens

```python
# Base model needs a 500-token prompt
base_llm(long_prompt + query)
# Fine-tuned model internalizes the instructions
fine_tuned_llm(query)  # Shorter prompt = cheaper
```
When Lumyn Makes Sense
Use Lumyn when you need to:
Enforce compliance rules

```yaml
rules:
  - id: GDPR_RETENTION
    if:
      data_age_days: { gt: 365 }
    then:
      verdict: DENY
      reason_codes: ["GDPR_RETENTION_VIOLATION"]
```

Deterministic replay for audits

```console
$ lumyn replay decision_6months_ago.zip
verdict: DENY
reason_codes: REFUND_OVER_ESCALATION_LIMIT
policy_hash: sha256:a4f2c8...
# Exact reproduction for regulators
```

Fast policy updates

```yaml
# Update policy in seconds
rules:
  - id: NEW_FRAUD_RULE
    if:
      evidence.device_fingerprint: { in: ["banned_device_1", "banned_device_2"] }
    then:
      verdict: DENY
```

Machine-stable reason codes

```sql
-- Dashboard query
SELECT reason_code, COUNT(*)
FROM decision_records
WHERE verdict = 'DENY'
GROUP BY reason_code;
```
Hybrid Architecture
Production systems often use both:
```text
┌─────────────────────────────────────────────┐
│       Request (e.g., refund approval)       │
└──────────────────┬──────────────────────────┘
                   ↓
        ┌──────────────────────────┐
        │  Fine-Tuned Risk Model   │
        │  (GPT-4 fine-tuned on    │
        │   past fraud cases)      │
        └──────────┬───────────────┘
                   ↓
            risk_score: 0.87
                   ↓
        ┌──────────────────────────┐
        │   Lumyn Policy Engine    │
        │   RULE: if risk > 0.8,   │
        │   verdict: DENY          │
        └──────────┬───────────────┘
                   ↓
           Decision Record
           {verdict: "DENY",
            reason_codes: ["FRAUD_RISK_HIGH"]}
```

Why this works:
- Fine-tuning makes the risk model better (domain-specific learning)
- Lumyn enforces governance rules on top (deterministic gating)
Real-World Example: Content Moderation
Fine-Tuning Approach
```python
# Fine-tune GPT-4 on moderation examples
moderation_llm = fine_tune(
    model="gpt-4",
    examples=[
        {"input": "hate speech example", "output": "BLOCK: violates policy 3.2"},
        {"input": "borderline case", "output": "ALLOW: within guidelines"},
    ],
)

# Use in production
decision = moderation_llm(user_content)
# Problem: Still probabilistic, no replay, no structured codes
```

Lumyn Approach
```yaml
# policy.yml
rules:
  - id: HATE_SPEECH_BLOCK
    if:
      evidence.hate_speech_score: { gt: 0.9 }
    then:
      verdict: DENY
      reason_codes: ["HATE_SPEECH_DETECTED"]

  - id: BORDERLINE_ESCALATE
    if:
      evidence.hate_speech_score: { between: [0.7, 0.9] }
    then:
      verdict: ESCALATE
      reason_codes: ["BORDERLINE_CONTENT_MANUAL_REVIEW"]
```

```python
# In production
hate_score = fine_tuned_classifier(user_content)  # Fine-tuned for better accuracy
record = decide(
    request={"action": {"type": "publish"}, "evidence": {"hate_speech_score": hate_score}},
    config=LumynConfig(policy_path="policy.yml"),
)
# Deterministic gate based on the LLM's score
```

Hybrid: Best of Both
- Fine-tuning improves hate speech classifier accuracy
- Lumyn enforces consistent thresholds with audit trail
Frequently Asked Questions
Can fine-tuning replace Lumyn for compliance?
No. Fine-tuning improves LLM quality but doesn't provide:
- Deterministic replay
- Machine-stable reason codes
- Audit trails
- Fast policy updates (seconds vs days)
Can I fine-tune on Lumyn's decision data?
Yes! Export Lumyn's labeled decisions as training examples:
```python
failures = get_decisions(label="failure", limit=1000)
training_data = format_for_finetuning(failures)
fine_tune(model="gpt-4", data=training_data)
```

This creates a feedback loop: Lumyn governance → LLM improvement.
Does Lumyn work with fine-tuned models?
Yes. Lumyn is model-agnostic. Use fine-tuned LLMs to generate risk scores/classifications, then gate them with Lumyn policies.
What if I can't afford fine-tuning?
Use Lumyn alone. Policy rules don't require training:
```yaml
rules:
  - if: {amount_usd: {gt: 1000}}
    then: {verdict: ESCALATE}
```

No LLM needed, no training cost.
Can I use fine-tuning for memory instead of Lumyn's similarity search?
No. Fine-tuning bakes patterns into weights (slow updates). Lumyn's memory is append-only and queryable in real-time:
```console
$ lumyn label 01JBQX... --label failure
# Immediately affects next decision (no retraining)
```

Next Steps
- Lumyn Memory - Real-time learning without fine-tuning
- Lumyn vs RAG - Another model enhancement technique vs governance
- Replay Guarantees - Why determinism matters for audits
- Quickstart - Implement policy-driven gates in 5 minutes