fix(look-at): use direct file passthrough instead of Read tool (#173)
- Embed files directly in message parts using file:// URL format - Remove dependency on Read tool for multimodal-looker agent - Add inferMimeType helper for proper MIME type detection - Disable read tool in agent tools config (no longer needed) - Upgrade multimodal-looker model to gemini-3-flash - Update all README docs to reflect gemini-3-flash change Fixes #126 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
This commit is contained in:
@@ -87,7 +87,7 @@ oh-my-opencode/
|
||||
| explore | opencode/grok-code | Fast codebase exploration, file patterns |
|
||||
| frontend-ui-ux-engineer | google/gemini-3-pro-preview | UI generation, design-focused |
|
||||
| document-writer | google/gemini-3-pro-preview | Technical documentation |
|
||||
| multimodal-looker | google/gemini-2.5-flash | PDF/image/diagram analysis |
|
||||
| multimodal-looker | google/gemini-3-flash | PDF/image/diagram analysis |
|
||||
|
||||
## COMMANDS
|
||||
|
||||
|
||||
@@ -317,12 +317,12 @@ opencode auth login
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**利用可能なモデル名**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-2.5-flash`, `google/gemini-2.5-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
**利用可能なモデル名**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-3-flash`, `google/gemini-3-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
|
||||
その後、認証を行います:
|
||||
|
||||
@@ -432,7 +432,7 @@ gh repo star code-yeongyu/oh-my-opencode
|
||||
- **explore** (`opencode/grok-code`): 高速なコードベース探索、ファイルパターンマッチング。Claude Code は Haiku を使用しますが、私たちは Grok を使います。現在無料であり、極めて高速で、ファイル探索タスクには十分な知能を備えているからです。Claude Code からインスピレーションを得ました。
|
||||
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): 開発者に転身したデザイナーという設定です。素晴らしい UI を作ります。美しく独創的な UI コードを生成することに長けた Gemini を使用します。
|
||||
- **document-writer** (`google/gemini-3-pro-preview`): テクニカルライティングの専門家という設定です。Gemini は文筆家であり、流れるような文章を書きます。
|
||||
- **multimodal-looker** (`google/gemini-2.5-flash`): 視覚コンテンツ解釈のための専門エージェント。PDF、画像、図表を分析して情報を抽出します。
|
||||
- **multimodal-looker** (`google/gemini-3-flash`): 視覚コンテンツ解釈のための専門エージェント。PDF、画像、図表を分析して情報を抽出します。
|
||||
|
||||
メインエージェントはこれらを自動的に呼び出しますが、明示的に呼び出すことも可能です:
|
||||
|
||||
@@ -675,7 +675,7 @@ Oh My OpenCode は以下の場所からフックを読み込んで実行しま
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
@@ -314,12 +314,12 @@ opencode auth login
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**사용 가능한 모델 이름**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-2.5-flash`, `google/gemini-2.5-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
**사용 가능한 모델 이름**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-3-flash`, `google/gemini-3-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
|
||||
그 후 인증:
|
||||
|
||||
@@ -429,7 +429,7 @@ gh repo star code-yeongyu/oh-my-opencode
|
||||
- **explore** (`opencode/grok-code`): 빠른 코드베이스 탐색, 파일 패턴 매칭. Claude Code는 Haiku를 쓰지만, 우리는 Grok을 씁니다. 현재 무료이고, 극도로 빠르며, 파일 탐색 작업에 충분한 지능을 갖췄기 때문입니다. Claude Code 에서 영감을 받았습니다.
|
||||
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): 개발자로 전향한 디자이너라는 설정을 갖고 있습니다. 멋진 UI를 만듭니다. 아름답고 창의적인 UI 코드를 생성하는 데 탁월한 Gemini를 사용합니다.
|
||||
- **document-writer** (`google/gemini-3-pro-preview`): 기술 문서 전문가라는 설정을 갖고 있습니다. Gemini 는 문학가입니다. 글을 기가막히게 씁니다.
|
||||
- **multimodal-looker** (`google/gemini-2.5-flash`): 시각적 콘텐츠 해석을 위한 전문 에이전트. PDF, 이미지, 다이어그램을 분석하여 정보를 추출합니다.
|
||||
- **multimodal-looker** (`google/gemini-3-flash`): 시각적 콘텐츠 해석을 위한 전문 에이전트. PDF, 이미지, 다이어그램을 분석하여 정보를 추출합니다.
|
||||
|
||||
각 에이전트는 메인 에이전트가 알아서 호출하지만, 명시적으로 요청할 수도 있습니다:
|
||||
|
||||
@@ -669,7 +669,7 @@ Schema 자동 완성이 지원됩니다:
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
@@ -346,12 +346,12 @@ The `opencode-antigravity-auth` plugin uses different model names than the built
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Available model names**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-2.5-flash`, `google/gemini-2.5-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
**Available model names**: `google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-3-flash`, `google/gemini-3-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
|
||||
Then authenticate:
|
||||
|
||||
@@ -493,7 +493,7 @@ To remove oh-my-opencode:
|
||||
- **explore** (`opencode/grok-code`): Fast codebase exploration and pattern matching. Claude Code uses Haiku; we use Grok—it's free, blazing fast, and plenty smart for file traversal. Inspired by Claude Code.
|
||||
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): A designer turned developer. Builds gorgeous UIs. Gemini excels at creative, beautiful UI code.
|
||||
- **document-writer** (`google/gemini-3-pro-preview`): Technical writing expert. Gemini is a wordsmith—writes prose that flows.
|
||||
- **multimodal-looker** (`google/gemini-2.5-flash`): Visual content specialist. Analyzes PDFs, images, diagrams to extract information.
|
||||
- **multimodal-looker** (`google/gemini-3-flash`): Visual content specialist. Analyzes PDFs, images, diagrams to extract information.
|
||||
|
||||
The main agent invokes these automatically, but you can call them explicitly:
|
||||
|
||||
@@ -733,7 +733,7 @@ When using `opencode-antigravity-auth`, disable the built-in auth and override a
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
@@ -325,12 +325,12 @@ opencode auth login
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**可用模型名**:`google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-2.5-flash`, `google/gemini-2.5-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
**可用模型名**:`google/gemini-3-pro-high`, `google/gemini-3-pro-medium`, `google/gemini-3-pro-low`, `google/gemini-3-flash`, `google/gemini-3-flash`, `google/gemini-3-flash-lite`, `google/claude-sonnet-4-5`, `google/claude-sonnet-4-5-thinking`, `google/claude-opus-4-5-thinking`, `google/gpt-oss-120b-medium`
|
||||
|
||||
然后认证:
|
||||
|
||||
@@ -440,7 +440,7 @@ gh repo star code-yeongyu/oh-my-opencode
|
||||
- **explore** (`opencode/grok-code`):极速代码库扫描、模式匹配。Claude Code 用 Haiku,我们用 Grok——免费、飞快、扫文件够用了。致敬 Claude Code。
|
||||
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`):设计师出身的程序员。UI 做得那是真漂亮。Gemini 写这种创意美观的代码是一绝。
|
||||
- **document-writer** (`google/gemini-3-pro-preview`):技术写作专家。Gemini 文笔好,写出来的东西读着顺畅。
|
||||
- **multimodal-looker** (`google/gemini-2.5-flash`):视觉内容专家。PDF、图片、图表,看一眼就知道里头有啥。
|
||||
- **multimodal-looker** (`google/gemini-3-flash`):视觉内容专家。PDF、图片、图表,看一眼就知道里头有啥。
|
||||
|
||||
主 Agent 会自动调遣它们,你也可以亲自点名:
|
||||
|
||||
@@ -675,7 +675,7 @@ Agent 爽了,你自然也爽。但我还想直接让你爽。
|
||||
"agents": {
|
||||
"frontend-ui-ux-engineer": { "model": "google/gemini-3-pro-high" },
|
||||
"document-writer": { "model": "google/gemini-3-flash" },
|
||||
"multimodal-looker": { "model": "google/gemini-2.5-flash" }
|
||||
"multimodal-looker": { "model": "google/gemini-3-flash" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
@@ -4,7 +4,7 @@ export const multimodalLookerAgent: AgentConfig = {
|
||||
description:
|
||||
"Analyze media files (PDFs, images, diagrams) that require interpretation beyond raw text. Extracts specific information or summaries from documents, describes visual content. Use when you need analyzed/extracted data rather than literal file contents.",
|
||||
mode: "subagent",
|
||||
model: "google/gemini-2.5-flash",
|
||||
model: "google/gemini-3-flash",
|
||||
temperature: 0.1,
|
||||
tools: { write: false, edit: false, bash: false, background_task: false },
|
||||
prompt: `You interpret media files that cannot be read as plain text.
|
||||
|
||||
@@ -1,8 +1,33 @@
|
||||
import { extname, basename } from "node:path"
|
||||
import { tool, type PluginInput } from "@opencode-ai/plugin"
|
||||
import { LOOK_AT_DESCRIPTION, MULTIMODAL_LOOKER_AGENT } from "./constants"
|
||||
import type { LookAtArgs } from "./types"
|
||||
import { log } from "../../shared/logger"
|
||||
|
||||
function inferMimeType(filePath: string): string {
|
||||
const ext = extname(filePath).toLowerCase()
|
||||
const mimeTypes: Record<string, string> = {
|
||||
".jpg": "image/jpeg",
|
||||
".jpeg": "image/jpeg",
|
||||
".png": "image/png",
|
||||
".gif": "image/gif",
|
||||
".webp": "image/webp",
|
||||
".svg": "image/svg+xml",
|
||||
".bmp": "image/bmp",
|
||||
".ico": "image/x-icon",
|
||||
".pdf": "application/pdf",
|
||||
".txt": "text/plain",
|
||||
".md": "text/markdown",
|
||||
".json": "application/json",
|
||||
".xml": "application/xml",
|
||||
".html": "text/html",
|
||||
".css": "text/css",
|
||||
".js": "text/javascript",
|
||||
".ts": "text/typescript",
|
||||
}
|
||||
return mimeTypes[ext] || "application/octet-stream"
|
||||
}
|
||||
|
||||
export function createLookAt(ctx: PluginInput) {
|
||||
return tool({
|
||||
description: LOOK_AT_DESCRIPTION,
|
||||
@@ -13,12 +38,14 @@ export function createLookAt(ctx: PluginInput) {
|
||||
async execute(args: LookAtArgs, toolContext) {
|
||||
log(`[look_at] Analyzing file: ${args.file_path}, goal: ${args.goal}`)
|
||||
|
||||
const mimeType = inferMimeType(args.file_path)
|
||||
const filename = basename(args.file_path)
|
||||
|
||||
const prompt = `Analyze this file and extract the requested information.
|
||||
|
||||
File path: ${args.file_path}
|
||||
Goal: ${args.goal}
|
||||
|
||||
Read the file using the Read tool, then provide ONLY the extracted information that matches the goal.
|
||||
Provide ONLY the extracted information that matches the goal.
|
||||
Be thorough on what was requested, concise on everything else.
|
||||
If the requested information is not found, clearly state what is missing.`
|
||||
|
||||
@@ -38,7 +65,7 @@ If the requested information is not found, clearly state what is missing.`
|
||||
const sessionID = createResult.data.id
|
||||
log(`[look_at] Created session: ${sessionID}`)
|
||||
|
||||
log(`[look_at] Sending prompt to session ${sessionID}`)
|
||||
log(`[look_at] Sending prompt with file passthrough to session ${sessionID}`)
|
||||
await ctx.client.session.prompt({
|
||||
path: { id: sessionID },
|
||||
body: {
|
||||
@@ -47,8 +74,12 @@ If the requested information is not found, clearly state what is missing.`
|
||||
task: false,
|
||||
call_omo_agent: false,
|
||||
look_at: false,
|
||||
read: false,
|
||||
},
|
||||
parts: [{ type: "text", text: prompt }],
|
||||
parts: [
|
||||
{ type: "text", text: prompt },
|
||||
{ type: "file", mime: mimeType, url: `file://${args.file_path}`, filename },
|
||||
],
|
||||
},
|
||||
})
|
||||
|
||||
|
||||
Reference in New Issue
Block a user