Sandbox (Think in Code)

Run atomic image operations locally to extract only the context the LLM needs.

The Think in Code paradigm lets your agent run atomic image operations locally, before any pixels reach the LLM. Instead of sending a full 12 MP screenshot to read one error log, crop to just that region. Depending on how small the region is relative to the full image, this can save up to 99.9% of the image tokens.
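To make the savings figure concrete, here is a rough back-of-the-envelope sketch. It assumes token cost scales roughly with pixel count, which is a simplification: real image tokenization varies by model. The function name `pixel_reduction` and the example dimensions are illustrative, not part of the tool.

```python
def pixel_reduction(full_w, full_h, crop_w, crop_h):
    """Return the fraction of pixels removed by cropping (0.0 to 1.0)."""
    full = full_w * full_h
    kept = crop_w * crop_h
    return 1 - kept / full


# A 4000x3000 (12 MP) screenshot cropped to a 1920x300 log panel.
print(f"{pixel_reduction(4000, 3000, 1920, 300):.1%}")  # 95.2%

# The same screenshot cropped to a 200x60 status badge.
print(f"{pixel_reduction(4000, 3000, 200, 60):.1%}")  # 99.9%
```

Under this assumption, the upper end of the savings range corresponds to very small regions of very large images; a wide log panel still removes the large majority of pixels.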

Operations

The sandbox supports a chain of atomic ops:

Operation	Purpose
`crop`	Extract a sub-region
`grayscale`	Drop color channels
`binarize`	Threshold to black/white (great for OCR)
`resize`	Scale dimensions
`contrast`	Adjust contrast
`brightness`	Adjust brightness
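To illustrate what "a chain of atomic ops" means, the following is a minimal pure-Python sketch of three of the operations applied in sequence to a tiny in-memory image. It is not the sandbox's implementation: the nested-list image format, the luma weights, and the `>=` threshold comparison are assumptions made for the example. Only `crop`, `grayscale`, and `binarize` are shown because their parameters appear in this guide.

```python
def crop(pixels, x, y, width, height):
    """Keep only the rectangle starting at (x, y) with the given size."""
    return [row[x:x + width] for row in pixels[y:y + height]]


def grayscale(pixels):
    """Collapse (r, g, b) tuples to one luma value (Rec. 601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def binarize(pixels, threshold):
    """Map each gray value to 255 (white) or 0 (black)."""
    return [[255 if v >= threshold else 0 for v in row] for row in pixels]


HANDLERS = {"crop": crop, "grayscale": grayscale, "binarize": binarize}


def run_ops(pixels, ops):
    """Apply each op in order, feeding the output of one into the next."""
    for step in ops:
        params = {k: v for k, v in step.items() if k != "op"}
        pixels = HANDLERS[step["op"]](pixels, **params)
    return pixels


# A 4x2 RGB image as rows of (r, g, b) tuples.
image = [
    [(255, 255, 255), (0, 0, 0), (200, 200, 200), (10, 10, 10)],
    [(30, 30, 30), (250, 250, 250), (90, 90, 90), (180, 180, 180)],
]
ops = [
    {"op": "crop", "x": 1, "y": 0, "width": 2, "height": 2},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
]
print(run_ops(image, ops))  # [[0, 255], [255, 0]]
```

Order matters in a chain like this: cropping first means every later op touches fewer pixels.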

CLI

Pass operations as a JSON array via `--ops`:

vision-squeezer screenshot.png --ops '[
  {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
  {"op": "grayscale"},
  {"op": "binarize", "threshold": 128}
]'
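When an agent assembles the operation list programmatically, serializing it with a JSON library avoids shell-quoting mistakes. The sketch below builds the same invocation from Python using only the standard library. It assumes `vision-squeezer` is on `PATH` and uses only the `--ops` flag shown above; the helper names `build_command` and `squeeze` are hypothetical.

```python
import json
import subprocess


def build_command(image_path, ops):
    """Return the argv list for a vision-squeezer call with --ops."""
    return ["vision-squeezer", image_path, "--ops", json.dumps(ops)]


def squeeze(image_path, ops):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(image_path, ops), check=True)


ops = [
    {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
]
cmd = build_command("screenshot.png", ops)
print(cmd[:3])  # ['vision-squeezer', 'screenshot.png', '--ops']
```

Passing the arguments as a list (rather than a single shell string) means the JSON reaches the CLI as one argument without extra escaping. Call `squeeze("screenshot.png", ops)` to actually run it.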

MCP

Via the MCP server, call the `sandbox_execute` tool with an `operations[]` array. This is the recommended path when an agent only needs to see a specific part of a high-resolution image.
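As a rough sketch of what such a call could look like on the wire, the snippet below builds a JSON-RPC 2.0 `tools/call` request, the standard MCP envelope for invoking a tool. The tool name and the `operations` argument come from this guide; the `image_path` argument name is an assumption, so check the server's tool schema for the actual input fields.

```python
import json


def sandbox_execute_request(request_id, image_path, operations):
    """Build an MCP tools/call request for the sandbox_execute tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "sandbox_execute",
            "arguments": {
                # Hypothetical argument name; not confirmed by this guide.
                "image_path": image_path,
                "operations": operations,
            },
        },
    }


request = sandbox_execute_request(
    1,
    "screenshot.png",
    [
        {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
        {"op": "binarize", "threshold": 128},
    ],
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would construct this envelope for you; the sketch only shows how the operations array sits inside the tool arguments.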

Best practices for agents

Crop first, send second. If you only need one panel of a dashboard screenshot, crop to it before sending.
Binarize for text. OCR-style reads are far cheaper and more reliable on binarized images.
Report ROI. Use `get_savings_stats` to surface cumulative token/USD savings to the user.
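The first two practices can be combined into one small helper: given the bounding box of a text region, produce a crop that stays inside the image, followed by a binarize step. This is a sketch under assumptions; the helper name `text_read_ops`, the padding default, and the threshold default are illustrative and not part of the tool.

```python
def text_read_ops(box, image_size, padding=8, threshold=128):
    """Build a crop-then-binarize op list for reading text in `box`.

    box is (x, y, width, height); image_size is (width, height).
    The crop is padded and then clamped to the image bounds.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    left = max(0, x - padding)
    top = max(0, y - padding)
    right = min(img_w, x + w + padding)
    bottom = min(img_h, y + h + padding)
    return [
        {"op": "crop", "x": left, "y": top,
         "width": right - left, "height": bottom - top},
        {"op": "grayscale"},
        {"op": "binarize", "threshold": threshold},
    ]


# An error log occupying the bottom strip of a 1920x1500 screenshot.
ops = text_read_ops((0, 1200, 1920, 300), (1920, 1500))
print(ops[0])
# {'op': 'crop', 'x': 0, 'y': 1192, 'width': 1920, 'height': 308}
```

A little padding helps avoid clipping glyphs at the edge of the region, while clamping keeps the crop rectangle valid when the region touches the image border.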

Python Bindings

Use VisionSqueezer from Python via native PyO3 wheels.

Crawler Integration

Automate token optimization for high-scale web scraping with Firecrawl, Crawl4AI, and Playwright.