docs: add look_at tool and multimodal-looker agent documentation

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 15:26:44 +09:00
parent a3938e8c25
commit 96886f18ac
2 changed files with 14 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -215,6 +215,7 @@ I believe in the right tool for the job. For your wallet's sake, use CLIProxyAPI
 - **explore** (`opencode/grok-code`): Fast exploration and pattern matching. Claude Code uses Haiku; we use Grok. It is currently free, blazing fast, and intelligent enough for file traversal. Inspired by Claude Code.
 - **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): A designer turned developer. Creates stunning UIs. Uses Gemini because its creativity and UI code generation are superior.
 - **document-writer** (`google/gemini-3-pro-preview`): A technical writing expert. Gemini is a wordsmith; it writes prose that flows naturally.
+- **multimodal-looker** (`google/gemini-2.5-flash`): Specialized agent for visual content interpretation. Analyzes PDFs, images, and diagrams to extract information.

 Each agent is automatically invoked by the main agent, but you can also explicitly request them:

@@ -269,6 +270,12 @@ The features you use in your editor—other agents cannot access them. Oh My Ope
  - The default `glob` lacks timeout. If ripgrep hangs, it waits indefinitely.
  - This tool enforces timeouts and kills the process on expiration.

+#### Built-in Multimodal Tools
+
+- **look_at**: Uses Gemini 2.5 Flash to analyze media files (PDFs, images, diagrams) that require visual interpretation. Inspired by Sourcegraph Ampcode's `look_at` tool.
+  - Parameters: `file_path` (absolute path), `goal` (what to extract)
+  - Use cases: PDF text extraction, image description, diagram analysis
+
 #### Built-in MCPs

 - **websearch_exa**: Exa AI web search. Performs real-time web searches and can scrape content from specific URLs. Returns LLM-optimized context from relevant websites.
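
Reviewer note: the `look_at` parameters documented in the hunk above (`file_path` as an absolute path, `goal` as what to extract) can be sketched as a small TypeScript shape with a pre-flight check. This is only an illustration under stated assumptions: the `LookAtArgs` interface, the `validateLookAtArgs` helper, and the example path are hypothetical names, not taken from the oh-my-opencode source, and the tool's real schema may differ.

```typescript
import { isAbsolute } from "node:path";

// Hypothetical argument shape, based only on the two parameters the README lists.
interface LookAtArgs {
  file_path: string; // absolute path to the media file
  goal: string; // what to extract from the file
}

// Hypothetical helper: returns a list of problems; an empty list means the
// arguments match what the README describes.
function validateLookAtArgs(args: LookAtArgs): string[] {
  const problems: string[] = [];
  if (!isAbsolute(args.file_path)) {
    problems.push(`file_path must be absolute: ${args.file_path}`);
  }
  if (args.goal.trim().length === 0) {
    problems.push("goal must describe what to extract");
  }
  return problems;
}

// Placeholder path and goal, for illustration only.
const example: LookAtArgs = {
  file_path: "/tmp/architecture.pdf",
  goal: "List the services shown in the diagram and how they connect",
};
console.log(validateLookAtArgs(example));
```

A relative path such as `docs/architecture.pdf` would be reported as a problem here, which mirrors the README's "absolute path" requirement for `file_path`.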