AIFCC
Tags: AI agents, Claude Code, orchestration, agent-ops

10 requirements for building autonomous AI coworkers: environment, context, skills, and feedback loops

Shiv (@shivsakhuja)
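What does it take to build autonomous AI coworkers? Ever since discovering OpenClaw, I have been obsessed with building AI agent coworkers for my startup. Over the past few months I have tried various setups combining OpenClaw / Claude Code / Claude Agent SDK with our own product, @GooseworksAI. The goal is a team of agents that does useful work autonomously around the clock without constant supervision.

We have not reached the dream state yet, but we are getting close, and I have learned a lot about what is required. Agents like Claude Code are already good enough to do real work, and it is clear that the bottleneck is entirely orchestration.

The elements needed to orchestrate an AI coworker are summarized below.

0. Environment
AI coworkers need an environment where they can act freely. They need their own computer. They cannot run around the clock on a local machine, and giving them full access to a personal machine is risky. With OpenClaw this means a Mac Mini or an EC2 instance; GooseworksAI uses cloud sandboxes provided by e2b.

1. Context
An agent's quality depends on the quality of its context. Context has to be a continuous stream, not a one-time dump. Concretely: regular 1:1s with the agent; giving it an email address, a phone number, and a Slack account; syncing meeting notes; a personal CRM (updated automatically from calendar, email, meeting notes, and LinkedIn); granting access to Notion and Linear; subscribing it to newsletters; and setting up a content diet.

2. Persistence
We use a filesystem. We dump / stream large amounts of context to each agent, and in most cases this works well. The agent has a memory folder where it automatically saves important information, and a tool for searching past chat history. A SQLite DB is also useful for structured, queryable persistence; for example, the SEO agent uses a SQLite DB to track past work and deduplicate. Adding a knowledge graph or a dedicated memory layer is also under consideration.

3. Skills: atomic units of execution
A skill is a repeatable task packaged as a prompt + script that the agent can execute consistently. Some are public at http://skills.gooseworks.ai/, but there are many more in an internal GitHub repo. Above all, turning everything into skills makes a big difference. Orchestrator skills (parent skills that coordinate related sub-skills) are also very effective.

4. Automations and heartbeat: when to run
Clearly repeatable tasks → scheduled automation (cron): pull metrics every Monday, scrape inspiration accounts every Sunday, and so on. When the agent needs to decide dynamically what to do → heartbeat: it wakes up on an interval, checks what has changed, and decides what to act on.

5. Tools and access
Agents need to do real things: hit APIs, send messages, scrape the web. We use a mix of MCP servers and direct API calls wrapped in skills.

6. Communication channels: two-way
You need a channel where you can message the agent and the agent can message you. We mainly use Slack, but email and text messages also work. The advantage of Slack is team visibility (the agents are shared). Email matters too, because many web services work over email.

7. Feedback loop
Many people skip this, but it is very important. The agent does work → you review it → feedback is fed back into the system (skill updates, prompt adjustments, new rules). The agent also self-reflects and proposes learnings and updates. If this works, the agent's usefulness should, in theory, compound.

8. View layer
There are two challenges: viewing files in the filesystem (GooseworksAI solves this with a built-in file viewer) and an agent activity dashboard. Custom dashboards for specific agents are also needed, and a solution is still being explored.

9. One agent or multiple?
One agent if you do not mind mixing context and tool access; multiple agents if you need strict separation. The challenge with multiple agents is communication between siloed agents.

10. Tying it together with systems thinking
OpenClaw / Gooseworks provide all of these pieces, but the system still has to be designed properly with an engineering mindset. For example, finding leads by scraping LinkedIn turns out to be a series of engineering problems: deduplication, timestamp filtering, CRM integration, and so on.

What I am curious about: I think many people are working on the same thing right now, so I would like to trade notes. What do you want your agent coworker to do? How do you structure skills and feedback loops? What is your context layer? Has anyone gotten a heartbeat-based system working?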
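The split in section 4 between a scheduled automation and a heartbeat can be made concrete with a small sketch. This is only an illustration under stated assumptions: `scheduled_task_due`, `Heartbeat`, and `fetch_state` are hypothetical names, not part of OpenClaw, Gooseworks, or Claude Code, and the sketch only reports which sources changed. Deciding what to act on would still be left to the agent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List


def scheduled_task_due(now: datetime, weekday: int, hour: int) -> bool:
    """Cron-style rule: the task fires at a fixed weekday and hour (Monday is 0)."""
    return now.weekday() == weekday and now.hour == hour


@dataclass
class Heartbeat:
    """Hypothetical heartbeat: on each wake-up, report which sources changed."""

    # Assumed callback returning a version marker per source,
    # e.g. {"inbox": "<hash>", "linear": "<hash>"}.
    fetch_state: Callable[[], Dict[str, str]]
    last_seen: Dict[str, str] = field(default_factory=dict)

    def tick(self) -> List[str]:
        current = self.fetch_state()
        # A source counts as changed if it is new or its marker differs.
        changed = sorted(
            name for name, marker in current.items()
            if self.last_seen.get(name) != marker
        )
        # Remember what was seen so the next wake-up only reports new changes.
        self.last_seen = dict(current)
        return changed
```

In this sketch the cron rule needs no state at all, while the heartbeat has to remember what it saw last time. That remembered state is what lets the agent wake up and ask "what changed?" instead of redoing the same work.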
Original text
What does it really take to create autonomous AI coworkers?

Ever since I discovered OpenClaw, I've been obsessed with building AI agent coworkers for my startup. I've tried tons of setups over the past few months using a mix of OpenClaw / Claude Code / Claude Agent SDK and our own product @GooseworksAI.

The dream is to have a team of agents that do useful work autonomously around the clock without constant supervision. I wouldn't say we're in the dream state yet, but it's getting close and I've learned a lot about what it takes.

I think the reason that I (and many others like me) are obsessed with this idea is that it's so clear that agents like Claude Code are already good enough to do all the actual work. The bottleneck is all just orchestration.

Here's what I've learned about the pieces you need to orchestrate a coworker:

0. Environment

AI coworkers need an environment to run free. They need their own computer. You can't run them locally: they can't work when your computer is off, and it's too risky to give them full access to your personal machine.

If you're using OpenClaw, this could be a Mac Mini or an EC2 instance. For @GooseworksAI, we use cloud sandboxes powered by @e2b.

1. Context

Your agent is only as good as the context it has. The thing with context is that it requires a continuous stream, not a one-time dump.

I do a few things:

- I do frequent 1:1s with my agent: https://x.com/shivsakhuja/status/2038711292073320734
- My agent has an email address, a phone number and a Slack account, so I'm always texting it stuff: links I find interesting, notes, ideas, etc.
- I have a cron job to sync meeting notes.
- I have a personal CRM for my agent (updates automatically from calendar + email + meeting notes + LinkedIn) so it knows about people I'm connected to.
- I have given it accounts so it can search through Notion + Linear.
- I have signed up my agent to some newsletters (using @agentmail).
- I have an agent consume a content diet similar to what I consume (Twitter mostly, but I will probably add some podcasts soon via @ListenNotes).
- I have a heartbeat for the agent to read through recent notes and ask me questions about important context in Slack. It's pretty easy for me to respond when I'm prompted with a message.

I haven't tried it yet, but from what I hear @garrytan's gbrain is a pretty good system to stream context too.

2. Persistence

We’re using a filesystem. We dump / stream tons of context in there for each agent. This does very well for the most part, but there is a limit beyond which it's not sufficient.

- The agent has a memory folder where it stores important info automatically.
- The agent also has a tool to search through its chat history (past sessions).
- I’ve also found a SQLite DB to be super helpful for structured, queryable persistence. For example, we have an SEO agent that needs to keep track of past work, de-dupe, etc. before we run a strategy, push to our CMS, email for backlinks, etc. So currently that’s using a SQLite DB in the filesystem for persistence.
- There’s probably a very good case to be made for adding a knowledge graph / dedicated memory layer, though we haven’t tried this yet.
- Our agents also read / write from Linear to manage tasks, which is very helpful.

3. Skills: atomic units of execution

A skill is a repeatable task packaged into a prompt + script so your agent can execute it consistently. We made some of these skills public at http://skills.gooseworks.ai/ but we have tons more in an internal GitHub repo.

If nothing else, just turning everything into skills makes a massive difference. I'm making skills for everything now. This makes my agent good at the things I'm good at.

An orchestrator skill has also been pretty useful: this is a parent skill that knows how to orchestrate related sub-skills.

4. Automations and/or a heartbeat: when does it run?

I'm still very much figuring this out, but my current mental framework is:

- If it's a clear repeatable task → scheduled automation (cron). Pull metrics every Monday, scrape inspiration accounts every Sunday, etc.
- If the agent needs to figure out what to do dynamically → heartbeat. The agent wakes up on an interval, checks what's changed, and decides what to act on.

I’m still tinkering with heartbeats, but getting the agent to make good prioritization decisions when it wakes up is harder than it sounds. For many things, I’m just running the skills / orchestrator manually because I don’t trust the agent enough yet.

5. Tools and access

The agent needs to actually do things: hit APIs, send messages, scrape the web, etc. I use a mix of MCP servers and direct API calls wrapped in skills. I haven't found a "one system to rule them all"; it's whatever works for the task at hand.

6. Communication channels: two-way

You need a channel where you can message the agent and it can message you. We use Slack primarily, but I can also email or text the agent. The key is two-way: not just notifications, but actual back-and-forth.

We also integrated WhatsApp, iMessage and Telegram, but mostly I just use Slack. An advantage of Slack is team visibility (the agents are shared, not personal). Email is important too because a lot of the web works over email (notifications, newsletters, etc.).

7. Feedback loop

This is the piece that most people skip, but it seems super important to me.

- The agent does work → you review it → your feedback gets fed back into the system (updated skills, adjusted prompts, new rules).
- The agent self-reflects → proposes learnings / updates to its own system.

If you have this and it works, the agent’s usefulness / success should (theoretically) compound over time. Without it, the agent only improves when you do dev work.

8. View layer

There are 2 challenges here.

- Viewing files in the filesystem. We solved this with our own product (it has a baked-in filesystem and file viewer), but others may solve this with Obsidian remote vaults or something similar.
- Agent activity dashboard. We have this in Gooseworks, but I found this to be pretty tough in OpenClaw.

Custom dashboards: I need custom dashboards for specific agents. For example, I need a custom dashboard for my outbound sales agent. I don’t have a great solution here right now, though we're experimenting with one. We use Slack alerts, but that can get very chaotic if you’re relying on the agent to do a bigger scope of work. I'm not sure what the best option is. I suspect it's to give each agent a DB + dashboard and allow it to customize.

9. One agent or multiple?

I wrote a post about this: https://x.com/shivsakhuja/status/2035176670286786744

The TLDR is: 1 agent if you don't mind mixing context + tool access, multi-agent if you need strict separation. The challenge with multi-agent is how these agents communicate if they are siloed. I have not yet solved this, but I'd love to talk to anyone who has.

10. Tying it all together: the systems mindset

In theory, OpenClaw / Gooseworks exposes all these pieces, but you still need to have an engineering mindset, stitch them all together, and engineer the system in the right way. This is often the hardest part.

For example, let's say I want my agent to find me leads by scraping LinkedIn posts for some keyword. Sounds like an easy problem to solve. I wire up my agent to an Apify actor and run an automation, right?

But not really, because the agent will just send me the same leads every day. No deduping is happening. If I have the LLM dedupe, it's incredibly inefficient. So I need to make sure that my Apify scraper is ONLY checking the last 24 hours. But the Apify scraper doesn't have a way to filter by timestamp, so now what? Also, the same people tend to post about the same topics a lot.

This can be solved by sending the leads to a CRM and making sure there's a way to track the outreach status there, but the point I'm making is that these are actually engineering problems and require a systems mindset to solve. The example above is probably easier than most real systems.

What I'm curious about:

I imagine a lot of people are figuring this stuff out right now, so I'd love to trade notes:

- What are you trying to get your agent coworker to do? What's the dream scenario 6 months out?
- How do you structure your skills and feedback loops?
- What's your context layer? Just a filesystem, or anything else?
- Has anyone gotten a good heartbeat-based system working? What does your agent check for when it "wakes up"?

Feel free to comment and / or DM me. You can also find me on LinkedIn at https://linkedin.com/in/shivsakhuja
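The lead-deduplication problem from section 10 can be sketched using the SQLite persistence described in section 2. This is a minimal sketch under stated assumptions, not the author's actual setup: the `leads` table, its columns, and the `open_lead_store` / `new_leads` helpers are hypothetical, and it assumes each scraped post arrives as a dict with a `profile_url` and a timezone-aware `posted_at`. No scraper API is called; the 24-hour filter is applied to the results after scraping.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def open_lead_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical lead table keyed by profile URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS leads (
               profile_url     TEXT PRIMARY KEY,
               first_seen      TEXT NOT NULL,
               outreach_status TEXT NOT NULL DEFAULT 'new'
           )"""
    )
    return conn


def new_leads(
    conn: sqlite3.Connection,
    scraped: List[Dict],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return only profile URLs that are recent and not already stored."""
    cutoff = now - window
    fresh = []
    for post in scraped:
        if post["posted_at"] < cutoff:
            continue  # timestamp filter applied on our side, after scraping
        cur = conn.execute(
            "INSERT OR IGNORE INTO leads (profile_url, first_seen) VALUES (?, ?)",
            (post["profile_url"], now.isoformat()),
        )
        if cur.rowcount == 1:  # 0 means the primary key already existed
            fresh.append(post["profile_url"])
    conn.commit()
    return fresh
```

The primary-key constraint does the deduplication, so the LLM never has to compare today's leads against yesterday's. The `outreach_status` column stands in for the CRM-side tracking the post mentions.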

AIFCC — AI Fluent CxO Club

Reading, writing, arithmetic, and AI. A community where executives learn to run AI themselves.
