
Claude Opus 4.6 BrowseComp Evaluation Results

Anthropic (@AnthropicAI), March 7, 2026

♥ 3,187 ↻ 360

New on the Anthropic Engineering Blog: evaluating Claude Opus 4.6 on BrowseComp, a benchmark that measures how accurately a model handles complex web-browsing tasks. See the post for the latest results and analysis.

Original post:

New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments. Read more: https://www.anthropic.com/engineering/eval-awareness-browsecomp


AIFCC — AI Fluent CxO Club
