LLM output should not automatically become action. The AiGentsy Acceptance Runtime routes model output through policy, evidence, and HoverStack reuse — and decides whether the downstream consequence is allowed, blocked, or held.
5 deterministic benchmark fixtures · raw_output hand-written (not a live LLM call) · HoverStack metrics labeled estimated · live cross-model benchmark (GPT / Claude / Gemini / Llama / Qwen) deferred to a controlled provider-benchmark harness pass.
Developers can post raw model output to /acceptance-runtime/evaluate to receive a runtime decision, evidence record, and export path. The runtime returns the same decision shape (accepted / rejected / retry / escalated) and consequence state (allowed / blocked / held) you see in the 3-lane comparison below.
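The decision shape described above can be sketched as a small typed wrapper on the client side. This is a minimal sketch, not the runtime's actual schema: the JSON field names `decision` and `consequence_state`, the `RuntimeResult` class, and the `may_proceed` helper are assumptions introduced for illustration. Only the two value vocabularies come from the text.

```python
from dataclasses import dataclass

# Value vocabularies taken from the page text.
DECISIONS = {"accepted", "rejected", "retry", "escalated"}
CONSEQUENCE_STATES = {"allowed", "blocked", "held"}


@dataclass(frozen=True)
class RuntimeResult:
    """Hypothetical client-side view of one runtime response."""
    decision: str
    consequence_state: str

    def __post_init__(self):
        # Refuse values outside the documented vocabularies.
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if self.consequence_state not in CONSEQUENCE_STATES:
            raise ValueError(f"unknown consequence state: {self.consequence_state!r}")


def parse_result(payload: dict) -> RuntimeResult:
    # Field names are assumed; adjust to the real response body.
    return RuntimeResult(payload["decision"], payload["consequence_state"])


def may_proceed(result: RuntimeResult) -> bool:
    # Fail closed: act only when the output is accepted AND the consequence is allowed.
    return result.decision == "accepted" and result.consequence_state == "allowed"
```

The caller treats anything other than accepted plus allowed as a reason not to act, so an unexpected or partial response never turns into a downstream action by default.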
Recall what was proven. Accept what is allowed. Prove what happened. Verify the record. Settle only when consequence is authorized.
Methodology defined in /data/inference_acceptance_scenarios.json · estimated metrics labeled est
/verify.html#fetch:demo_deal_inference_<id>_v1 — the same 5-step verifier the handoff demo uses.
The same five deterministic fixtures, viewed as a presentation layer over existing Acceptance Runtime, HoverStack reuse, ProofPack export, and verifier outputs. Nothing here is invented or persisted — every line below derives from a fixture or runtime field that already exists.
Deterministic demo fixtures only — no live LLM call. Live cross-model evaluation is the scope of the /acceptance-runtime/benchmarks operator harness. Every Savings Trace item is labeled measured, estimated, demo/reference, or provider-measured; default fixture labels are demo/reference + estimated.
This demo replays real HoverStack decisions from CUDA-validated benchmark runs, driving live settlement through our production protocol. Settlement is one of several consequence types AiGentsy gates — deployment, handoff, API action, procurement, and inference acceptance follow the same accept-before-consequence pattern shown in the six held-consequence cards below and in the Inference Acceptance Layer above. The compute decisions shown are from actual Qwen2.5-7B inference runs. The ProofPack is cryptographically real and appears in our production Merkle log. Settlement fires through real test-mode Stripe. ProofPack Reuse, our benchmark-proven v1.7 mechanism, eliminates redundant compute when agents encounter already-attested work.
Creates a real demo agent, deal, and ProofPack on our production protocol.
Signed REJECTED event recorded in the Vault — reason and failed checks travel in the bundle. Dispute path opened.
View signed rejection record →
The bundle is a real, offline-verifiable cryptographic artifact from our production Merkle log.
The demo above is the happy path: the agent works, a proof is created, the proof verifies, and the buyer accepts — so settlement fires. The scenario below is the edge case the wedge is built around. Cryptographic verification and acceptance are two different gates. A proof can be authentic, untampered, and traceable to the mandate, and the acceptance policy can still reject it — in which case settlement, release, deployment, or handoff is held. The signed REJECTED event with reason and failed checks is what makes the rejection auditable.
A proof can verify cryptographically and still fail acceptance. In this example, the proof bundle is valid, but the acceptance policy rejects because required checks are missing. Settlement or downstream action is held.
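The two-gate idea can be illustrated with a short sketch in which cryptographic verification and acceptance are evaluated separately. Everything named here is hypothetical: the `GateOutcome` type, the `evaluate` function, and the check names `tests_passed`, `review_approved`, and `scope_matched` are stand-ins, not the runtime's real policy fields. Treating an unverified proof as held is also an assumption; the text only describes the verified-but-rejected case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateOutcome:
    proof_verified: bool
    decision: str            # "accepted" or "rejected" in this sketch
    consequence_state: str   # "allowed" or "held" in this sketch
    failed_checks: tuple


def evaluate(proof_verified: bool, required_checks: list, inputs: dict) -> GateOutcome:
    # Gate 1: cryptographic verification of the proof bundle.
    if not proof_verified:
        # Assumption: an unverifiable proof also leaves the consequence held.
        return GateOutcome(False, "rejected", "held", ("proof_verification",))

    # Gate 2: acceptance policy. A missing check counts as a failed check.
    failed = tuple(name for name in required_checks if inputs.get(name) is not True)
    if failed:
        return GateOutcome(True, "rejected", "held", failed)
    return GateOutcome(True, "accepted", "allowed", ())


# Hypothetical policy: three boolean checks must all be present and true.
REQUIRED = ["tests_passed", "review_approved", "scope_matched"]

# Valid proof, but one check is missing and one is false.
outcome = evaluate(True, REQUIRED, {"tests_passed": True, "scope_matched": False})
```

In this run `outcome.proof_verified` is true while `outcome.decision` is `rejected`, and `failed_checks` carries the names that would travel in the signed rejection record.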
This demo mirrors the settlement-native-mcp starter policy fixture and adapter contract (acceptance_policy.example.json + adapter_contract.example.json). The three booleans below are runtime-compatible policy fields — the adapter output is validated by starter_boolean_validator into normalized_policy_inputs before acceptance evaluates. The signed bundle freezes the AdapterEvaluation (adapter_id, adapter_version, contract_hash, input_schema_hash, input_hash, output_hash, validation_result); bundle_hash binds it.
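As a rough illustration of how a frozen AdapterEvaluation and a binding bundle hash could fit together, the sketch below validates boolean adapter output and hashes each component. It makes several assumptions that the text does not confirm: SHA-256 over sorted-key JSON as the canonical form, the string values `valid` and `invalid` for `validation_result`, and the helper names `canonical_hash`, `validate_booleans`, `build_adapter_evaluation`, and `bundle_hash`. The real `starter_boolean_validator` and bundle format may differ.

```python
import hashlib
import json


def canonical_hash(obj) -> str:
    # Assumed canonical form: compact JSON with sorted keys, hashed with SHA-256.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def validate_booleans(adapter_output: dict, required_fields: list) -> dict:
    # Stand-in for a boolean validator: every required field must be a real bool.
    normalized = {}
    for name in required_fields:
        value = adapter_output.get(name)
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean, got {value!r}")
        normalized[name] = value
    return normalized


def build_adapter_evaluation(adapter_id, adapter_version, contract, input_schema,
                             adapter_input, adapter_output, required_fields) -> dict:
    try:
        validate_booleans(adapter_output, required_fields)
        validation_result = "valid"
    except TypeError:
        validation_result = "invalid"
    # The seven fields listed in the text, frozen into one record.
    return {
        "adapter_id": adapter_id,
        "adapter_version": adapter_version,
        "contract_hash": canonical_hash(contract),
        "input_schema_hash": canonical_hash(input_schema),
        "input_hash": canonical_hash(adapter_input),
        "output_hash": canonical_hash(adapter_output),
        "validation_result": validation_result,
    }


def bundle_hash(bundle: dict) -> str:
    # Hashing the whole bundle binds the evaluation: any change alters the hash.
    return canonical_hash(bundle)
```

Because the bundle hash covers the evaluation record, and the record covers hashes of the contract, schema, input, and output, changing any one of them after signing would produce a different `bundle_hash`.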
Each row in the Vault is a real signed REJECTED event with reason and failed checks — the bundle passes aigentsy-verify offline. Signed rejection records include the policy_snapshot and evaluated_inputs needed to replay the decision; the bundle hash binds every byte.
Want to register your own adapter contract? Start with aigentsy adapter scaffold --id your.adapter --version 0.1.0 --validator boolean, then lint it against the AdapterContract schema. Docs →
Six live test-mode consequence gates show the same invariant: proof can verify while acceptance fails and downstream consequence stays held. AdapterContracts validate signals into typed inputs. The counterparty defines the standard. AiGentsy enforces it.
"Verified but Rejected", "Payout held", "Deployment held", "Handoff held", "API Action held", and "Procurement held" are all backed by live starter policy fixtures — click Run on each card to drive a real bundle through the gate. Each scenario uses test-mode consequence semantics; no real money moves, no real deployment triggers, no real external handoff fires, no real API call is made, no real purchase order is created. Every exported bundle replays offline with aigentsy-verify.
payout_held · no_funds_moved=true
deployment_held · no_deployment_triggered=true
handoff_held · no_handoff_triggered=true
api_action_held · no_api_action_triggered=true
procurement_held · no_purchase_order_created=true