
Resolvers: Routing an Agent's Intelligence

Garry Tan (@garrytan), April 16, 2026




In "Thin Harness, Fat Skills", I introduced five definitions for building agent systems that actually work. Skills got all the attention. People bookmarked the skill-as-method-call pattern, the diarization concept, the thin harness architecture. Good. Those matter. But the one that got almost no attention is the one that matters most. Resolvers. And the reason they got ignored is the same reason they're so important: they're invisible when they work, and catastrophic when they don't.

A resolver is a routing table for context. When task type X appears, load document Y first. That's it. One sentence. But that one sentence is the difference between an agent that compounds intelligence and an agent that slowly forgets what it knows.

This is the story of how I learned that the hard way.

## The 20,000-line confession

My CLAUDE.md was 20,000 lines. I'm not proud of this. Every quirk, every pattern, every lesson I'd ever encountered with Claude Code, every convention for my codebase, every edge case I'd been burned by. I kept adding. The file kept growing. It felt productive. It felt like I was making the model smarter.

I wasn't. I was drowning it. The model's attention degraded. Responses got slower and less precise. Claude Code literally told me to cut it back. That's when you know you've gone too far: the AI is telling you to stop talking.

The instinct is natural. You want the model to know everything. So you cram everything into the system prompt, the instructions file, the context window. You're trying to make the model omniscient by proximity. It doesn't work. You can't make someone smarter by shouting louder. You make them smarter by giving them the right book at the right moment.

The fix was about 200 lines. A numbered decision tree. Pointers to documents. When the model needs to file something, it walks the tree:

- Is it a person? → /people/ directory
- A company? → /companies/ directory
- A policy analysis? → /civic/ directory

Twenty thousand lines of knowledge, accessible on demand, without polluting the context window. That 200-line file is the resolver. It replaced 20,000 lines of instructions. And the system immediately got better: faster responses, more accurate filing, fewer hallucinations. Not because the model got smarter. Because I stopped blinding it with noise.

## The misfiling that revealed everything

I asked my agent to ingest Will Manidis's essay "No New Deal for OpenAI", a devastating policy analysis of OpenAI's industrial policy brief. It's the kind of piece that breaks down a company's regulatory strategy, maps the political implications, names the institutional actors. Sharp civic analysis.

The agent filed it in `sources/`.

Wrong. `sources/` is for raw data dumps and bulk imports. CSV files. API exports. Scraped datasets. This was political analysis; it belongs in `civic/`, where policy pieces, political actors, and institutional dynamics live.

Why did it happen? The idea-ingest skill had hardcoded `brain/sources/` as the default directory. It didn't consult the resolver. It had its own half-assed filing logic baked into the skill itself. When no explicit path was given, it fell back to `sources/` the way a lazy intern throws everything in the "misc" folder.

One misfiled article. I could have fixed it and moved on. Instead I pulled the thread.

## The audit

When I caught the Manidis misfiling, I audited every skill that writes to the brain. I have 13 of them. Skills for ingesting articles, PDFs, meeting transcripts, videos, investor updates, voice notes, tweets. Each one writes pages to the brain repo.

**Only 3 out of 13 referenced the resolver.**

The other 10 had hardcoded paths. Idea-ingest defaulted to `sources/`. PDF-ingest defaulted to `originals/`. Meeting-ingest wrote to `meetings/`. Each skill had internalized its own filing assumptions. Each one was a potential misfiling waiting to happen. This is the pattern that kills agent systems.
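
The audit boils down to one contrast: a skill with a baked-in default directory versus a skill that asks a shared filing resolver. The Python below is a minimal sketch of that contrast under stated assumptions, not GBrain's code. `FILING_RULES`, `Document`, `resolve_filing_dir`, and the two `idea_ingest_*` functions are hypothetical names, and each document is assumed to arrive already classified by primary subject (in a real agent, the model does that while walking the decision tree in RESOLVER.md).

```python
from dataclasses import dataclass

# Hypothetical filing resolver: an ordered decision tree of
# (question, primary subject, directory). In the article this lives in
# RESOLVER.md as markdown; a Python tuple stands in for it here.
FILING_RULES = (
    ("Is it a person?", "person", "people/"),
    ("A company?", "company", "companies/"),
    ("A policy analysis?", "policy_analysis", "civic/"),
    ("A raw data dump or bulk import?", "raw_data", "sources/"),
)


@dataclass
class Document:
    title: str
    source_format: str    # e.g. "article", "pdf", "csv"; deliberately ignored
    primary_subject: str  # assumed to be set by an upstream classifier


def resolve_filing_dir(doc: Document) -> str:
    """Walk the rules in order and file by primary subject, not by format."""
    for _question, subject, directory in FILING_RULES:
        if doc.primary_subject == subject:
            return directory
    # No silent fallback: an unmatched subject is surfaced, not dumped somewhere.
    raise LookupError(f"no filing rule for subject {doc.primary_subject!r}")


def idea_ingest_hardcoded(doc: Document) -> str:
    """Pre-fix behavior: a baked-in default that never consults the resolver."""
    return "sources/"


def idea_ingest_with_resolver(doc: Document) -> str:
    """Post-fix behavior: every brain-writing skill asks the shared resolver."""
    return resolve_filing_dir(doc)


essay = Document("No New Deal for OpenAI", "article", "policy_analysis")
print(idea_ingest_hardcoded(essay))      # sources/  (the misfiling)
print(idea_ingest_with_resolver(essay))  # civic/
```

Raising `LookupError` on an unmatched subject, instead of quietly returning a default, is a deliberate choice in this sketch: the Manidis misfiling happened precisely because idea-ingest had a silent fallback.
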

The failure isn't dramatic. Not a hallucination that produces nonsense. A slow, silent drift where information goes to the wrong place, connections don't form, and the knowledge base gradually becomes a junk drawer with 14,700 files in it instead of a structured intelligence layer.

The fix wasn't fixing 10 skills individually. That's whack-a-mole. You fix one, another drifts. The fix was a shared filing rules document, `_brain-filing-rules.md`, and a mandate that every brain-writing skill reads `RESOLVER.md` before creating any page. One rule. Ten skills fixed.

The filing rules doc also catalogs common misfiling patterns. Sources vs. originals. People vs. companies (when someone IS a company). Civic vs. sources (the Manidis case). Every mistake, documented, so the same mistake can't happen a different way.

Zero misfilings since. Every new skill that writes to the brain now has a two-line mandate at the top: *Before creating any new brain page, read `brain/RESOLVER.md` and `skills/_brain-filing-rules.md`. File by primary subject, not by source format or skill name.*

## The invisible skill problem

The above example talks about where to put files in your memory repo, but it applies to skill files (fat skills) and code to call (fat code) as well. A resolver routes tasks to skills. But what happens when a skill exists and the resolver doesn't know about it?

For my OpenClaw, we built a signature-tracking system inside the executive assistant skill. It worked perfectly. Tracked DocuSign deadlines, surfaced unsigned documents, drafted reminders. Beautiful piece of engineering. Completely invisible.

When someone asked "check my signatures" or "what do I need to sign," the system shrugged. The resolver didn't have a trigger for signatures. The skill existed. The capability existed. The system couldn't reach it. It's like having a surgeon on staff but not listing them in the hospital directory.

This is worse than not having the skill at all.
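
The signature case can be reproduced in a few lines. This is an illustrative sketch, not OpenClaw's resolver: it assumes a hypothetical resolver table of `Route` rows (skill, optional section, trigger phrases) and uses naive substring matching in place of the model matching intent to skill descriptions. The signature-tracking section exists inside executive-assistant, but no row points to it, so the request falls through.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Route:
    """One row of a hypothetical skill resolver table."""
    skill: str
    section: str | None = None  # sub-routing inside the skill, if any
    triggers: list[str] = field(default_factory=list)


# The resolver only knows what someone registered in it. The signature
# tracker exists inside executive-assistant, but has no row here.
RESOLVER = [
    Route("executive-assistant", "email", ["triage my inbox", "urgent email"]),
    Route("brain-ops", None, ["who is"]),
]


def route(request: str, resolver: list[Route]) -> Route | None:
    """Return the first row whose trigger phrase appears in the request.

    Substring matching stands in for the model's intent matching.
    """
    text = request.lower()
    for row in resolver:
        if any(trigger in text for trigger in row.triggers):
            return row
    return None


print(route("check my signatures", RESOLVER))  # None: the capability is dark

# The fix is an edit to the routing table, not to the skill's code.
RESOLVER.append(
    Route("executive-assistant", "signature", ["signature", "need to sign"])
)
print(route("what do I need to sign", RESOLVER).section)  # signature
```

Substring matching is much cruder than description matching, but it fails the same way: the request reaches the resolver, and nothing in the resolver leads to the capability. The fix is a new row in the table; the skill itself is untouched.
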

A missing skill is honest: the system says "I can't do that" and you know to build it. A skill that exists but isn't reachable creates the illusion of capability. You think the system handles signatures. It doesn't. And you don't find out until the moment it matters.

After a month of building, we had 40+ skills. Some created in response to specific incidents, others spawned by sub-agents running crons. Nobody was maintaining the resolver table. Skills were being born but not registered. The system had capabilities it didn't know it had.

So I built resolver trigger evals. A test suite of 50 sample inputs with expected outputs:

- Input: "check my signatures" → Expected: executive-assistant (signature section)
- Input: "who is Pedro Franceschi" → Expected: brain-ops → gbrain search
- Input: "save this article to brain" → Expected: idea-ingest + RESOLVER.md

Two failure modes. False negative: the skill should fire but doesn't, because the trigger description is wrong or missing. False positive: the wrong skill fires, because two triggers overlap. Both fixable by editing markdown. No code changes. The resolver is a document, and documents are cheap to fix.

I told my Claw: "Make sure the resolver is tested and also there are proper eval LLM tests for all the prompts and skills that use the resolver."

This isn't optional. If you can't prove the right skill fires for the right input, you don't have a system. You have a collection of skills and a prayer.

## The meta-skill

The trigger evals catch routing failures. But there's a deeper problem: skills that exist but have no path from the resolver at all. Not a wrong path. No path.

I was debugging a skill that should have fired and didn't. The usual drill: check the trigger description, check the resolver table, trace the chain. And I realized there was no systematic way to verify that a skill was reachable. You could check one skill at a time. You couldn't check all of them.

So I invented `check-resolvable`.
A meta-skill that walks the entire chain (AGENTS.md → skill file → code) and finds dead links. I told my agent: "Check if there is a direct line between the agents.md resolver all the way to this running. And then remember this as a 'check-resolvable' skill. The skill should actually check if this skill or codepath is either directly called out in the resolver or callable via something in the resolver. And if it isn't, figure out what resolvable skill should call it."

First run found 6 unreachable skills. Six capabilities the system had built but couldn't access. A flight tracker that nobody could invoke by asking about flights. A content-ideas generator that only ran on cron but couldn't be triggered manually. A citation fixer that existed in the skills directory but wasn't listed in the resolver at all.

Six. Out of 40+. Roughly fifteen percent of the system's capabilities were dark.

Fixed in an hour. Just added triggers to AGENTS.md. Now check-resolvable runs weekly. It's the resolver equivalent of a linter: it tells you what's broken before a user discovers it the hard way.

## Context rot

Here's the thing nobody tells you about resolvers: they decay.

Day 1, the routing table is perfect. Every skill is registered. Every trigger is accurate. Every path resolves. You feel like a genius.

Day 30, three new skills exist that nobody added to the resolver. They were built in response to real needs, by sub-agents running at 3 AM, and nobody updated the table.

Day 60, two trigger descriptions don't match how users actually phrase things. The skill handles "track this flight" but users say "is my flight delayed?" The description says one thing. The user says another. The skill doesn't fire.

Day 90, the resolver is a historical document. An artifact of what the system *used to* be able to do. Not what it can do now.

I noticed the system was drifting.
Skills were being invoked by direct instruction ("read skills/flight-tracker/SKILL.md") instead of through the resolver, because the resolver didn't have the right triggers. The system worked because I knew which skill to call. That's not a system. That's a person with a filing cabinet.

Yesterday, in office hours with a YC company, a CTO asked me: "Could an RLM be used to solve context rot particularly around resolvers?"

The idea: a reinforcement learning loop where the system observes every task dispatch. Which skill fired. Which didn't. Which tasks had no match. Which tasks matched the wrong skill. And periodically (maybe nightly, maybe weekly) it rewrites the resolver based on observed evidence. Not a human maintaining a table. The table maintaining itself.

Eight hundred task dispatches over a month. The system sees that "is my flight on time" never triggers flight-tracker but "check my flight" does. It rewrites the trigger description. The system sees that pdf-ingest fires for investor update emails, but investor-update-ingest should have caught them first. It adjusts priority.

This is forward-looking. We haven't fully built it. Claude Code's AutoDream system (memory consolidation during idle time) is a primitive version. It reviews accumulated context and compresses it. Apply that principle to the resolver specifically, and you get a routing table that improves with use. A resolver that learns from its own traffic. That's the endgame for agent governance.

## Resolvers are fractal

One more principle, and it's the one that makes everything click. Resolvers compose. They exist at every layer of the system, not just the top.

**The skill resolver** lives in AGENTS.md. It maps task types to skill files. "Who is this person?" → brain-ops. "Ingest this PDF" → pdf-ingest. "Check my calendar" → google-calendar. This is the one everyone thinks of.

**The filing resolver** lives in RESOLVER.md. It maps content types to directories. Person → `people/`. Company → `companies/`.
Policy analysis → `civic/`. This is the one that caught the Manidis misfiling.

**The context resolver** lives inside each skill. When the executive assistant skill fires, it has its own internal routing: email triage goes one way, scheduling goes another, signature tracking goes a third. Sub-routing within the skill.

Claude Code already has this pattern. Every skill has a description field. The model matches user intent to skill descriptions automatically. You never have to remember that `/ship` exists. The description *is* the resolver.

It's resolvers all the way down. The same architecture, at every layer. That's what makes it scale from 5 skills to 50, from 1,000 files to 25,000, from a toy demo to a production system that processes 200 inputs a day.

## The shape of the thing

Let me pull this together.

A resolver is 200 lines of markdown that replaced 20,000 lines of crammed context. When it's missing, skills invent their own filing logic and everything slowly degrades. When it's present but untested, capabilities go dark: you have a surgeon the hospital can't find. When it's tested but static, it rots within 90 days. When it's tested and self-healing, the system compounds.

The pattern:

1. Load the right context at the right moment. Don't cram.
2. Mandate that every skill consults the resolver. Don't trust individual filing logic.
3. Test the routing, not just the output. Trigger evals.
4. Audit reachability. Check-resolvable. Weekly.
5. Make the resolver learn from its own traffic. The endgame.

The resolver is the governance layer of an agent system. The traffic cop, the filing clerk, the org chart, and the institutional memory, all in one document that a model can read in 200 milliseconds.

Almost nobody is building them explicitly. Everyone is cramming 20,000 lines into the system prompt and wondering why the model seems dumber than it should be. The model isn't dumb. It's drowning. Give it a routing table and watch what happens.
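
The third and fourth items of the pattern, trigger evals and the reachability audit, are mechanical enough to sketch together. The Python below is a hypothetical harness, not the author's eval suite or the real check-resolvable skill: the `RESOLVER` trigger table, the `SKILLS_ON_DISK` set, and the `CALLS` graph (standing in for the AGENTS.md → skill file → code chain) are invented for illustration, and substring matching stands in for model routing. It reports false negatives, false positives, and skills that no path from the resolver can reach.

```python
from __future__ import annotations

from collections import deque

# Hypothetical skill resolver: skill name -> trigger phrases, in priority order.
RESOLVER = {
    "flight-tracker": ["track this flight"],
    "pdf-ingest": ["pdf", "ingest this"],
    "investor-update-ingest": ["investor update"],
    "brain-ops": ["who is"],
}

# Everything that exists in the skills directory (hypothetical).
SKILLS_ON_DISK = {
    "flight-tracker", "pdf-ingest", "investor-update-ingest", "brain-ops",
    "gbrain-search", "content-ideas", "citation-fixer",
}

# Hypothetical call graph: which skills or code paths each skill can invoke.
CALLS = {"brain-ops": ["gbrain-search"]}


def route(request: str) -> str | None:
    """Return the first skill, in table order, with a trigger in the request."""
    text = request.lower()
    for skill, triggers in RESOLVER.items():
        if any(trigger in text for trigger in triggers):
            return skill
    return None


def run_trigger_evals(cases: list[tuple[str, str]]) -> list[str]:
    """Run (request, expected skill) cases and describe each failure."""
    failures = []
    for request, expected in cases:
        got = route(request)
        if got is None:
            failures.append(f"false negative: {request!r} should fire {expected}")
        elif got != expected:
            failures.append(
                f"false positive: {request!r} fired {got}, expected {expected}"
            )
    return failures


def unreachable_skills() -> set[str]:
    """check-resolvable in miniature: breadth-first walk from the resolver."""
    seen = set(RESOLVER)
    queue = deque(RESOLVER)
    while queue:
        for callee in CALLS.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return SKILLS_ON_DISK - seen


cases = [
    ("is my flight delayed?", "flight-tracker"),
    ("ingest this investor update", "investor-update-ingest"),
    ("who is Pedro Franceschi", "brain-ops"),
]
for failure in run_trigger_evals(cases):
    print(failure)
print(sorted(unreachable_skills()))  # ['citation-fixer', 'content-ideas']
```

Each reported problem is fixed by editing the table rather than the skills: add a trigger such as "flight delayed" for the false negative, give investor-update-ingest priority over pdf-ingest for the false positive, and add resolver entries (or callers) for the unreachable skills.
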

---

## The thing I didn't realize I was building

Up to this point, I've been describing resolvers as a technical pattern. A way to make agents work better. Route tasks. Load the right context. Avoid drowning the model. That framing is true. It's also too small. What I actually built is closer to management.

Think about what's happening in a real system with 40+ skills and 25,000 files. You don't just have code. You have an organization. Skills are employees. Each one has a capability. Some are specialists. Some are generalists. Some only run on cron. Some are user-facing.

The resolver is the org chart. It defines who handles what, how tasks get routed, and what happens when something doesn't match. It's also escalation logic: when one path fails, where does it go next?

The filing rules are internal process. Where information lives. How decisions get recorded. What counts as a "person" vs a "company" vs a "policy analysis." Without that, you don't have a knowledge base. You have a junk drawer.

check-resolvable is audit and compliance. It doesn't care if the code is beautiful. It asks a simpler question: can the system actually do what it claims? Are there capabilities that exist but can't be reached?

Trigger evals are performance reviews. Given a real input, does the right part of the organization respond? If not, you don't retrain the model. You fix the description. You update the routing. You make the org legible.

Once you see it this way, a lot of the confusion around agents disappears. The problem isn't that models aren't smart enough. The problem is that we've been building organizations with no management layer. Just a pile of talented employees and a vague hope they'll coordinate. Resolvers are that missing layer.

And once you treat them that way, the goal changes. You're not just wiring up tools. You're designing an organization that can grow, adapt, and stay coherent over time. That's a different problem. And a much bigger one.

---

## I want you to build your own brain

Everything in this article (the resolver pattern, the trigger evals, check-resolvable, the filing rules, the self-healing loop) runs in production, every day, on my personal agent. It processes 200 inputs daily. It has 25,000 files. It compounds.

I open-sourced the entire system. My open source project GBrain ships with the resolver pattern built in. `gbrain init` creates RESOLVER.md, the decision tree, and the disambiguation rules. Your agent starts filing correctly from day one. The check-resolvable skill comes built-in. You don't have to discover these patterns by breaking things; the system embodies them.

GStack is the coding layer. Fat skills in markdown. 72,000+ stars on GitHub. The skills in GStack call the knowledge in GBrain. Together they're the full architecture: intelligence on tap.

OpenClaw or Hermes Agent is the conductor: the thin harness that runs the agent loop, manages sessions, and executes crons. GBrain and GStack are skills that plug into it. Your agent reads GBrain's compiled truth before answering. Your crons run the rollup pipelines while you sleep.

This isn't a SaaS product. It's an architecture. The source code is open. The skills are markdown. The brain is a git repo you own. If any piece disappeared tomorrow, your knowledge survives as plain text files.

This is the new dawn of personal software. This is not packaged software. This is software you build for yourself, and with fat skills, fat code, and a thin harness, it becomes your own personal mini-AGI. The future is already here, and I want you to have it in your pocket.

The architecture fits on an index card. The knowledge fits in a git repo. The only thing missing is you starting.

---

GBrain, to build your personal mini-AGI in OpenClaw or Hermes Agent: github.com/garrytan/gbrain

GStack, to help you build faster in Claude Code: github.com/garrytan/gstack

