Durable Jobs
Use this page when your agent work can run for minutes to hours and you need deterministic lifecycle control with verifiable evidence.
What A Durable Job Is
A durable job is a checkpointed execution record managed locally by Gait with explicit lifecycle commands:
- submit
- status
- checkpoint add|list|show
- pause
- approve
- resume
- cancel
- inspect
The job surface is for runtime control and evidence, not prompt orchestration.
When To Use This
- multi-step agent workflows can fail mid-run and must resume deterministically
- human approvals are required before continuation
- operators need inspectable state transitions and stable stop reasons
- CI or incident workflows need portable evidence from job state
When Not To Use This
- tasks are short-lived and retries are trivial
- no Gait CLI/artifact path is available in the runtime
- you only need hosted traces and dashboards without local enforcement or artifact verification
Minimal Lifecycle
gait job submit --id job_1 --json
gait job checkpoint add --id job_1 --type progress --summary "step 1 complete" --json
gait job pause --id job_1 --json
gait job approve --id job_1 --actor reviewer_1 --reason "validated input" --json
gait job resume --id job_1 --actor worker_1 --reason "continue after approval" --json
gait job inspect --id job_1 --json
gait job status --id job_1 --json

Artifact And Verification Path
Durable jobs produce state under the job root (default ./gait-out/jobs) and can be promoted to a pack:
gait pack build --type job --from job_1 --json
gait pack verify ./gait-out/pack_job_1.zip --json
gait pack inspect ./gait-out/pack_job_1.zip --json

Portable evidence outputs:
- job lifecycle state/events under ./gait-out/jobs
- the pack archive (PackSpec v1 envelope), for example ./gait-out/pack_job_1.zip
- deterministic verify/inspect JSON for CI, incident handoff, and audits
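The pack commands above can be chained into a simple gate for CI. The following is a minimal sketch, not part of Gait: the helper names (`run_cli`, `pack_commands`, `build_and_verify`) are hypothetical, and it assumes that a nonzero exit code signals failure and that `--json` prints a single JSON document on stdout. Neither assumption is confirmed on this page, so check the CLI's actual behavior before relying on it.

```python
import json
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner takes an argv list and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_cli(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command and capture its exit code and stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout


def pack_commands(job_id: str, pack_path: str) -> List[List[str]]:
    """Build the three pack commands shown on this page, in order."""
    return [
        ["gait", "pack", "build", "--type", "job", "--from", job_id, "--json"],
        ["gait", "pack", "verify", pack_path, "--json"],
        ["gait", "pack", "inspect", pack_path, "--json"],
    ]


def build_and_verify(job_id: str, pack_path: str, run: Runner = run_cli) -> List[object]:
    """Run build, verify, and inspect; stop at the first failing step.

    Assumption (not confirmed by this page): a nonzero exit code means
    the step failed, and --json prints one JSON document on stdout.
    """
    reports: List[object] = []
    for argv in pack_commands(job_id, pack_path):
        code, out = run(argv)
        if code != 0:
            raise RuntimeError(f"{' '.join(argv)} exited with {code}")
        # Keep the parsed JSON so it can be archived as evidence.
        reports.append(json.loads(out) if out.strip() else {})
    return reports
```

Calling `build_and_verify("job_1", "./gait-out/pack_job_1.zip")` would run the three commands in sequence. The `run` parameter is injectable so the gate can be exercised in tests without the Gait binary installed.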
How This Differs From Checkpoint/Observability Tools
| Dimension | Gait durable jobs | LangChain/LangFuse-style checkpoint and observability stacks |
|---|---|---|
| Primary role | runtime control + evidence contract | orchestration tracing, hosted observability, debugging UX |
| Enforcement boundary | tool boundary with fail-closed non-execute rule | usually orchestration-time controls, not a portable side-effect enforcement contract |
| Artifact portability | signed/offline-verifiable packs and traces | service-backed trace state, often not cryptographically portable by default |
| CI regression loop | first-class regress fixture and stable exit semantics | typically custom harnesses around exported traces |
| Offline operation | core verify/diff/regress/job operations run locally | hosted components commonly required for full feature set |
This is a complementary model: teams can keep hosted observability while using Gait for enforceable runtime boundaries and deterministic evidence.
Better Fit Vs Not Necessary
Better fit:
- regulated or high-risk tool execution
- production incident-to-regression loops
- multi-team workflows requiring independently verifiable evidence artifacts
Not necessary:
- experimentation with no external side effects
- prototypes where deterministic replay and auditability are out of scope
Integration Anchors
- CLI entrypoint: cmd/gait/job.go
- runtime implementation: core/jobruntime/
- pack conversion/verification: core/pack/
- representative adapter path: examples/integrations/openai_agents/quickstart.py
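For orientation, an adapter that shells out to the CLI might assemble the Minimal Lifecycle commands as shown below. This is a hedged sketch and does not reproduce the contents of quickstart.py: `job_cmd` and `minimal_lifecycle` are hypothetical helpers, and only subcommands and flags that appear on this page are used.

```python
import shlex
from typing import List


def job_cmd(action: str, job_id: str, **flags: str) -> List[str]:
    """Build argv for `gait job <action> --id <job_id> [--flag value ...] --json`."""
    argv = ["gait", "job", *action.split(), "--id", job_id]
    for name, value in flags.items():
        argv += [f"--{name}", value]
    return argv + ["--json"]


def minimal_lifecycle(job_id: str, reviewer: str, worker: str) -> List[List[str]]:
    """Return the Minimal Lifecycle commands from this page, in order."""
    return [
        job_cmd("submit", job_id),
        job_cmd("checkpoint add", job_id, type="progress", summary="step 1 complete"),
        job_cmd("pause", job_id),
        job_cmd("approve", job_id, actor=reviewer, reason="validated input"),
        job_cmd("resume", job_id, actor=worker, reason="continue after approval"),
        job_cmd("inspect", job_id),
        job_cmd("status", job_id),
    ]


if __name__ == "__main__":
    # Print shell-quoted commands instead of executing them.
    for argv in minimal_lifecycle("job_1", "reviewer_1", "worker_1"):
        print(shlex.join(argv))
```

Building argument lists rather than shell strings avoids quoting problems when a summary or reason contains spaces. The lists can be passed directly to `subprocess.run` once the Gait binary is on the path.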