The Harness Is the Product. The Model Never Was


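LangChain's coding agent jumped from 52.8% to 66.5% on Terminal Bench 2.0 (a gain of 13.7 percentage points), moving from the Top 30 to the Top 5, without changing a single model parameter. They changed the harness. Gartner predicts that over 40% of agentic AI projects will be canceled by 2027. Seven independent studies confirm that agents fail 70-95% of the time on complex enterprise tasks. The consensus view is that the model is the bottleneck. Test that view, though, and the model turns out to be a commodity: Claude, GPT-5, and Gemini 2.5 converge in capability every quarter. The teams shipping production agents are the ones with the best harness, not the best model access. The fork in the road is not which model you pick but which system wraps around it.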
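To make "the system that wraps around the model" concrete, here is a minimal sketch in Python. It is not LangChain's harness, and the article does not describe what theirs changed. Every name here (`Harness`, `verify`, `stub_model`, `single_shot`) is hypothetical, and the "model" is a stub function rather than a real API call. The only point illustrated is that the same unchanged model can produce a different outcome once a check-and-retry loop surrounds it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: a "model" is any function from prompt text to completion text.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Hypothetical harness: wraps a model with a verify-and-retry loop."""
    model: Model
    verify: Callable[[str], bool]  # task-specific check, e.g. running tests
    max_attempts: int = 3

    def run(self, task: str) -> Tuple[Optional[str], int]:
        """Return (accepted answer or None, number of attempts used)."""
        prompt = task
        for attempt in range(1, self.max_attempts + 1):
            answer = self.model(prompt)
            if self.verify(answer):
                return answer, attempt
            # Feed the rejected answer back so the next attempt can differ.
            prompt = f"{task}\nPrevious answer was rejected: {answer}"
        return None, self.max_attempts


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: wrong at first, right once given feedback."""
    return "4" if "rejected" in prompt else "5"


def single_shot(model: Model, task: str) -> str:
    """No harness: accept the model's first answer as-is."""
    return model(task)


if __name__ == "__main__":
    task = "What is 2 + 2?"
    # Same model in both calls; only the surrounding system differs.
    print(single_shot(stub_model, task))
    harness = Harness(model=stub_model, verify=lambda a: a == "4")
    print(harness.run(task))
```

With this stub, the single-shot call returns the wrong answer, while the harness accepts a correct one on the second attempt. Swapping `stub_model` for a different model leaves the harness code untouched, which is the sense in which the model behaves like a replaceable component.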


AIFCC — AI Fluent CxO Club

Reading, writing, arithmetic, and AI. A community where executives learn to run AI themselves.
