Control D — HITL Bridge

Motivation

Some tools are irreversible: refund, delete_account, send_email. No amount of argument validation makes “the model decided to wire the money” acceptable without a human in the loop. Control D intercepts destructive tool calls and parks them for human approval instead of executing them — the real action happens later, only when a person approves.

Theory

A tool is destructive if its name matches the configured set under a match policy:

$$
\text{destructive}(t) =
\begin{cases}
t \in D & \text{exact} \\
\exists\, d \in D : d \subseteq t & \text{substring}
\end{cases}
$$

where $D$ is the set configured in hitl.destructive_tools and $d \subseteq t$ means that $d$ occurs as a substring of $t$. A destructive call is routed, not run: the bridge issues a PendingApproval through laravel-flow’s approvalGate() and returns a non-secret run reference to the model, never the approval token. On approval, a flow step runs the tool with the original arguments and the recorded principal.
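The match policy can be sketched as a small predicate. This is an illustration of the definition above, not the package's implementation: the function name isDestructive is hypothetical, and skipping empty entries in substring mode is an assumption added here so that an empty string cannot match every tool.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: decides whether a tool name counts as destructive.
 *
 * @param string   $tool   tool name, e.g. "delete_account"
 * @param string[] $set    the configured destructive set D
 * @param string   $policy "exact" or "substring"
 */
function isDestructive(string $tool, array $set, string $policy = 'exact'): bool
{
    if ($policy === 'exact') {
        // exact: the tool name must be a member of D
        return in_array($tool, $set, true);
    }

    if ($policy === 'substring') {
        // substring: some non-empty entry d of D occurs inside the tool name
        foreach ($set as $d) {
            if ($d !== '' && str_contains($tool, $d)) {
                return true;
            }
        }

        return false;
    }

    throw new InvalidArgumentException("Unknown match policy: {$policy}");
}

$set = ['refund', 'delete', 'send_email'];

var_dump(isDestructive('delete_account', $set, 'exact'));     // bool(false)
var_dump(isDestructive('delete_account', $set, 'substring')); // bool(true)
var_dump(isDestructive('refund', $set, 'exact'));             // bool(true)
```

The first two calls show the practical difference: with the entry 'delete' in the set, a tool named delete_account is gated only under the substring policy.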

Design

sequenceDiagram
    participant Model
    participant Gate as ApprovalGatedTool
    participant Flow as laravel-flow
    participant Human
    Model->>Gate: handle(destructive args)
    Gate->>Flow: route() → PendingApproval(runId)
    Gate-->>Model: "requires human approval. Ref: runId"
    Note over Gate,Flow: tool has NOT run
    Human->>Flow: approve(token)
    Flow->>Flow: ToolApprovalHandler runs the tool

Data model

| Concept | Shape |
|---|---|
| Destructive set | hitl.destructive_tools: ['refund','delete','send_email'] |
| Match policy | tool_authorization.destructive_match: exact or substring |
| Unavailable fallback | hitl.fallback: deny (refuse) or pass (execute) |
| Execution allowlist | hitl.allowed_tool_classes: FQCNs the handler may run (empty = no restriction) |
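As a rough sketch, the four keys above could sit in a Laravel config file like the one below. The file name config/ai-guardrails.php, the nesting derived from the dotted key names, and the tool class names are assumptions made for illustration; check the published config for the real layout and defaults.

```php
<?php

// config/ai-guardrails.php (assumed file name, shown for illustration only)

return [
    'tool_authorization' => [
        // 'exact' or 'substring'
        'destructive_match' => 'exact',
    ],

    'hitl' => [
        // Tool names that must be parked for human approval.
        'destructive_tools' => ['refund', 'delete', 'send_email'],

        // 'deny' refuses when approval is unavailable; 'pass' executes anyway.
        'fallback' => 'deny',

        // FQCNs the approval handler may run. An empty array means no restriction.
        // The class names below are hypothetical placeholders.
        'allowed_tool_classes' => [
            App\Ai\Tools\RefundTool::class,
            App\Ai\Tools\SendEmailTool::class,
        ],
    ],
];
```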

Decision records

ADR-D1 · Never return the approval token to the model

Problem. The model needs something to relay to the user, but the approval token is a credential.

Decision. Return only the non-secret runId; the plain-text token never leaves the flow/DB layer.

Consequences. Neither a conversation log nor a model that relays the tool response can leak the approval credential, because the credential is never part of that response.
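The separation can be sketched as follows. PendingApprovalSketch and messageForModel are hypothetical names, not laravel-flow or package classes, and the message text only mirrors the wording of the worked example up to the reference. The point is that the string handed to the model is built from the run reference alone.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for what the approval layer hands back to the bridge. */
final class PendingApprovalSketch
{
    public function __construct(
        public readonly string $runId, // non-secret reference, safe to show
        public readonly string $token, // credential, must stay server-side
    ) {
    }
}

/** Builds the model-facing message from the run reference only. */
function messageForModel(string $toolName, PendingApprovalSketch $pending): string
{
    // The token is deliberately never read in this function.
    return sprintf(
        'This destructive action [%s] requires human approval. Reference: %s',
        $toolName,
        $pending->runId
    );
}

$pending = new PendingApprovalSketch('run-77', bin2hex(random_bytes(32)));
$message = messageForModel('refund', $pending);

echo $message, PHP_EOL;
```

Because the token never enters the message, anything downstream of the model (transcripts, logs, a user-facing reply) can repeat the message verbatim without exposing the credential.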

ADR-D2 · Fail-closed when approval is unavailable

Problem. What if laravel-flow is absent or routing throws?

Decision. Default hitl.fallback=deny: refuse the destructive action. Any router exception is caught and also denies. pass, which executes the tool without approval, is available for non-critical setups.

Consequences. A misconfigured approval system blocks destructive actions rather than letting them through.
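A minimal sketch of the fail-closed decision, under stated assumptions: gate, its parameters, and the returned strings are all hypothetical, and the sketch reads the decision as "a router exception denies regardless of the fallback value", which the text above does not spell out.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative fail-closed gate (not the package's code).
 *
 * @param bool              $approvalAvailable whether the approval backend is usable
 * @param string            $fallback          'deny' or 'pass'
 * @param callable():string $route             parks the call, returns a run reference, may throw
 * @param callable():string $execute           runs the real tool
 */
function gate(bool $approvalAvailable, string $fallback, callable $route, callable $execute): string
{
    if (!$approvalAvailable) {
        // Approval backend missing: only an explicit 'pass' lets the call through.
        return $fallback === 'pass' ? $execute() : 'denied: approval unavailable';
    }

    try {
        return 'pending approval: ' . $route();
    } catch (\Throwable $e) {
        // Routing failed: deny rather than execute (assumed for any fallback value).
        return 'denied: approval routing failed';
    }
}

$run  = static fn (): string => 'refund executed';
$ok   = static fn (): string => 'run-77';
$boom = static function (): string {
    throw new \RuntimeException('flow unavailable');
};

echo gate(true, 'deny', $ok, $run), PHP_EOL;   // pending approval: run-77
echo gate(false, 'deny', $ok, $run), PHP_EOL;  // denied: approval unavailable
echo gate(false, 'pass', $ok, $run), PHP_EOL;  // refund executed
echo gate(true, 'deny', $boom, $run), PHP_EOL; // denied: approval routing failed
```

Only the third call executes the tool, and only because pass was chosen explicitly; every failure path otherwise ends in a refusal.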

ADR-D3 · Post-approval execution is allowlisted

Problem. If the flow DB row (which stores tool_class) is writable by an attacker, arbitrary tool invocation becomes possible.

Decision. hitl.allowed_tool_classes restricts which FQCNs ToolApprovalHandler may execute (empty = unrestricted; recommended: enumerate the destructive classes).

Consequences. Limits blast radius if the persistence layer is compromised.
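The allowlist check itself is small. The sketch below assumes a hypothetical helper mayExecute and hypothetical class names; trimming a leading backslash before comparing is an addition made here for robustness, not documented package behaviour.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative allowlist check run before a stored tool_class is executed.
 *
 * @param string   $toolClass          class name read from the persisted flow row
 * @param string[] $allowedToolClasses value of hitl.allowed_tool_classes
 */
function mayExecute(string $toolClass, array $allowedToolClasses): bool
{
    if ($allowedToolClasses === []) {
        // Empty list means no restriction.
        return true;
    }

    // Compare without a leading backslash so "\App\X" and "App\X" are treated alike.
    $normalize = static fn (string $class): string => ltrim($class, '\\');

    return in_array(
        $normalize($toolClass),
        array_map($normalize, $allowedToolClasses),
        true
    );
}

$allowed = ['App\\Ai\\Tools\\RefundTool']; // hypothetical class name

var_dump(mayExecute('App\\Ai\\Tools\\RefundTool', $allowed));   // bool(true)
var_dump(mayExecute('App\\Ai\\Tools\\DropTablesTool', $allowed)); // bool(false)
var_dump(mayExecute('App\\Ai\\Tools\\DropTablesTool', []));       // bool(true)
```

The last call shows why enumerating the destructive classes is the recommended setting: with an empty list, a tampered tool_class value would still be accepted.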

Worked example

use Padosoft\AiGuardrails\Facades\AiGuardrails;

$gated = AiGuardrails::routeForApproval($refundTool, 'refund');
$result = (string) $gated->handle(new Request(['order_id' => 'A1']));
// → "This destructive action [refund] requires human approval. Reference: run-77 …"
// the tool has NOT executed.

Setting it up is turnkey — see the HITL guide and the ai-guardrails:hitl-install / ai-guardrails:hitl-status commands.

Gotchas

  • Control D needs laravel-flow installed and migrated. Run ai-guardrails:hitl-install, then verify with ai-guardrails:hitl-status, which exits non-zero until HITL can actually gate a call.
  • In monitor mode the destructive call runs directly (with an observability log) — monitor is for shadow rollout, not protection.
  • Flow persistence (tokens, resume) is the host’s setup. The package provides the bridge; the host owns the database/flow configuration.