Sandbox (Think in Code)
Run atomic image operations locally to extract only the context the LLM needs.
The Think in Code paradigm lets your agent execute atomic image operations locally, before any pixels reach the LLM. Instead of sending a full 12MP screenshot to read one error log, crop to just that region and cut token usage by up to 99.9%.
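As a rough sanity check on that figure, the sketch below computes how much of an image a crop discards. It assumes token cost scales roughly with pixel count, which is only an approximation (actual image tokenization varies by model), and the helper name `pixel_reduction` is hypothetical, not part of the tool.

```python
def pixel_reduction(full_w: int, full_h: int, crop_w: int, crop_h: int) -> float:
    """Return the fraction of pixels removed by a crop (0.0 to 1.0)."""
    full = full_w * full_h
    kept = crop_w * crop_h
    if kept > full:
        raise ValueError("crop region is larger than the source image")
    return 1 - kept / full


# A 4000x3000 frame is 12MP. A 1920x300 strip keeps 4.8% of the pixels.
print(f"{pixel_reduction(4000, 3000, 1920, 300):.1%}")  # 95.2%

# Reaching 99.9% needs a much tighter crop, e.g. a 400x30 log line.
print(f"{pixel_reduction(4000, 3000, 400, 30):.1%}")  # 99.9%
```

Under this assumption, the headline saving applies to small regions of very large images; wider crops still save most of the budget, just less.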
Operations
The sandbox supports a chain of atomic ops:
| Operation | Purpose |
|---|---|
| crop | Extract a sub-region |
| grayscale | Drop color channels |
| binarize | Threshold to black/white (great for OCR) |
| resize | Scale dimensions |
| contrast | Adjust contrast |
| brightness | Adjust brightness |
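To make the idea of a chain concrete, here is a minimal pure-Python sketch of three of the operations applied in order to a tiny in-memory image. It is an illustration only, not the tool's implementation: the `run_chain` dispatcher, the BT.601 luma weights, and the "at or above threshold becomes white" rule are all assumptions.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]  # rows of (r, g, b) pixels


def _luma(p: Pixel) -> int:
    """Perceived brightness using ITU-R BT.601 weights (an assumed choice)."""
    r, g, b = p
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def crop(img: Image, x: int, y: int, width: int, height: int) -> Image:
    return [row[x:x + width] for row in img[y:y + height]]


def grayscale(img: Image) -> Image:
    return [[(_luma(p),) * 3 for p in row] for row in img]


def binarize(img: Image, threshold: int = 128) -> Image:
    # Assumption: pixels at or above the threshold become white.
    white, black = (255, 255, 255), (0, 0, 0)
    return [[white if _luma(p) >= threshold else black for p in row] for row in img]


OPS = {"crop": crop, "grayscale": grayscale, "binarize": binarize}


def run_chain(img: Image, ops: list[dict]) -> Image:
    """Apply each op dict in order; every key except "op" is a parameter."""
    for step in ops:
        params = {k: v for k, v in step.items() if k != "op"}
        img = OPS[step["op"]](img, **params)
    return img


image = [
    [(255, 255, 255), (200, 0, 0)],
    [(0, 255, 0), (0, 0, 0)],
]
result = run_chain(image, [
    {"op": "crop", "x": 0, "y": 0, "width": 2, "height": 1},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
])
print(result)  # [[(255, 255, 255), (0, 0, 0)]]
```

Each step receives the output of the previous one, which is why cropping first keeps every later operation cheap.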
CLI
Pass operations as JSON via --ops:
```bash
vision-squeezer screenshot.png --ops '[
  {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
  {"op": "grayscale"},
  {"op": "binarize", "threshold": 128}
]'
```
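When an agent or script assembles the command above, building the `--ops` value with a JSON serializer avoids hand-quoting mistakes. The sketch below only constructs the argument list; `build_command` is a hypothetical helper, and the only CLI details it relies on are the ones shown above (the image path as a positional argument, followed by `--ops`).

```python
import json
import shlex


def build_command(image_path: str, ops: list[dict]) -> list[str]:
    """Return an argv list for the CLI invocation shown above."""
    return ["vision-squeezer", image_path, "--ops", json.dumps(ops)]


ops = [
    {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
]
argv = build_command("screenshot.png", ops)

# Print a shell-safe rendering for logging or copy-paste.
print(shlex.join(argv))

# To actually run it (requires the CLI to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` sidesteps shell quoting entirely, so the JSON reaches the CLI unchanged.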
MCP
Via the MCP server, call sandbox_execute with an operations[] array. This is the recommended path when an agent only needs to see a specific part of a high-resolution image.
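For orientation, a tool call over MCP is a JSON-RPC `tools/call` request. The sketch below builds such a request for `sandbox_execute`. The `operations` key comes from the description above; the `image_path` argument name is an assumption made for illustration, so check the server's published tool schema for the real input field.

```python
import json


def sandbox_execute_request(request_id: int, image_path: str, operations: list[dict]) -> dict:
    """Build a JSON-RPC 2.0 request that calls the sandbox_execute tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "sandbox_execute",
            "arguments": {
                "image_path": image_path,  # hypothetical parameter name
                "operations": operations,
            },
        },
    }


request = sandbox_execute_request(
    1,
    "screenshot.png",
    [
        {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
        {"op": "binarize", "threshold": 128},
    ],
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds this envelope for you; the agent only supplies the tool name and arguments.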
Best practices for agents
- Crop first, send second. If you only need one panel of a dashboard screenshot, `crop` to it before sending.
- Binarize for text. OCR-style reads are far cheaper and more reliable on binarized images.
- Report ROI. Use `get_savings_stats` to surface cumulative token/USD savings to the user.
