長期アプリ開発のためのハーネス設計

長期アプリ開発のためのハーネス設計 Anthropicが発表した長期アプリ開発のためのハーネス設計ガイドです。主要コンポーネント： 1. セッション管理長期プロジェクトでは、セッションをまたいだコンテキスト保持が重要です。会話履歴の要約と優先度付けを実装することで、コンテキストウィンドウを効率的に活用できます。 2. チェックポイントシステム長時間の作業では定期的にチェックポイントを保存します。エラーが発生しても最後のチェックポイントから再開できます。 3. ツールの設計原則・単一責任：各ツールは一つのことだけを行う・冪等性：同じ操作を繰り返しても結果が変わらない・エラーハンドリング：失敗時に明確なエラーメッセージを返す 4. モニタリングとロギング・全てのツール呼び出しをログに記録・エラーは構造化フォーマットで保存・コスト追跡（トークン使用量） 5. ヒューマンインザループ重要な判断ポイントで人間の確認を求める仕組みを組み込む。詳細ガイド:

原文を表示 / Show original

Harness design for long-running application development By Prithvi Rajasekaran (Anthropic Labs team) Two interconnected problems: getting Claude to produce high-quality frontend designs, and getting it to build complete applications without human intervention. Key insights: 1. Multi-agent architecture inspired by GANs: generator + evaluator agents. The evaluator grades outputs reliably with concrete criteria turning subjective judgments into gradable terms. 2. Three-agent architecture for autonomous coding: planner → generator → evaluator. Produces rich full-stack applications over multi-hour autonomous sessions. 3. Why naive implementations fail: - Models lose coherence on lengthy tasks as context fills - "Context anxiety" — models wrap up prematurely near context limits - Context resets (not compaction) are essential. Clear context + structured handoff > summarized history - Self-evaluation problem: agents confidently praise their own work 4. Solution: Separate evaluator agent (like GAN discriminator) grades generator's output with concrete criteria, not the generator evaluating itself. → https://www.anthropic.com/engineering/harness-design-long-running-apps

AIFCC — AI Fluent CxO Club

読み書きそろばん、AI。経営者が AI を自分で動かせるようになるコミュニティ。

他の記事を見る AIFCC について