docs: add look_at tool and multimodal-looker agent documentation

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
YeonGyu-Kim
2025-12-13 15:26:44 +09:00
parent a3938e8c25
commit 96886f18ac
2 changed files with 14 additions and 0 deletions

@@ -215,6 +215,7 @@ I believe in the right tool for the job. For your wallet's sake, use CLIProxyAPI
- **explore** (`opencode/grok-code`): Fast exploration and pattern matching. Claude Code uses Haiku; we use Grok. It is currently free, blazing fast, and intelligent enough for file traversal. Inspired by Claude Code.
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): A designer turned developer. Creates stunning UIs. Uses Gemini because its creativity and UI code generation are superior.
- **document-writer** (`google/gemini-3-pro-preview`): A technical writing expert. Gemini is a wordsmith; it writes prose that flows naturally.
- **multimodal-looker** (`google/gemini-2.5-flash`): Specialized agent for visual content interpretation. Analyzes PDFs, images, and diagrams to extract information.
Each agent is automatically invoked by the main agent, but you can also explicitly request them:
@@ -269,6 +270,12 @@ The features you use in your editor—other agents cannot access them. Oh My Ope
- The default `glob` lacks a timeout. If ripgrep hangs, the call waits indefinitely.
- This tool enforces timeouts and kills the process on expiration.
#### Built-in Multimodal Tools
- **look_at**: Uses Gemini 2.5 Flash to analyze media files (PDFs, images, diagrams) that require visual interpretation. Inspired by Sourcegraph Ampcode's `look_at` tool.
- Parameters: `file_path` (absolute path), `goal` (what to extract)
- Use cases: PDF text extraction, image description, diagram analysis
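As a rough illustration of the two parameters above, here is a minimal TypeScript sketch of what a `look_at` argument object might look like, with a small pre-call check. Only the names `file_path` and `goal` come from this README; the `LookAtArgs` type, the `validateLookAtArgs` helper, and the example path are hypothetical and are not part of Oh My OpenCode's actual API.

```typescript
// Hypothetical argument shape for a look_at call. Only `file_path` and `goal`
// are documented; the type name itself is an assumption for illustration.
interface LookAtArgs {
  file_path: string; // absolute path to the media file
  goal: string;      // what to extract from the file
}

// Illustrative helper (not part of the tool): returns a list of problems
// found in the arguments, or an empty list if they look usable.
function validateLookAtArgs(args: LookAtArgs): string[] {
  const problems: string[] = [];
  // Treat POSIX paths ("/...") and Windows drive paths ("C:\..." or "C:/...")
  // as absolute; anything else is considered relative.
  const isAbsolute = /^(\/|[A-Za-z]:[\\/])/.test(args.file_path);
  if (!isAbsolute) {
    problems.push("file_path must be an absolute path");
  }
  if (args.goal.trim().length === 0) {
    problems.push("goal must not be empty");
  }
  return problems;
}

// Made-up example values for a diagram-analysis request.
const example: LookAtArgs = {
  file_path: "/home/user/docs/architecture.pdf",
  goal: "List the services shown in the diagram on page 2",
};

console.log(validateLookAtArgs(example));
```

A specific `goal` (which page, which element, what to list) is likely to be more useful than a generic "describe this file", since the tool is documented as extracting information toward a stated goal.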
#### Built-in MCPs
- **websearch_exa**: Exa AI web search. Performs real-time web searches and can scrape content from specific URLs. Returns LLM-optimized context from relevant websites.