Sandbox (Think in Code)

Run atomic image operations locally to extract only the context the LLM needs.

The Think in Code paradigm lets your agent run atomic image operations locally, before any pixels reach the LLM. Instead of sending a full 12MP screenshot to read one error log, crop to just that region and save up to 99.9% of the tokens.
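As a rough illustration of where the savings come from, the sketch below compares pixel counts before and after a crop. Pixel area is only a proxy: how pixels map to tokens depends on the model, and the 4000x3000 dimensions are an assumed example of a 12MP image, not something VisionSqueezer prescribes. The helper name `pixel_reduction` is hypothetical.

```python
def pixel_reduction(full_w: int, full_h: int, crop_w: int, crop_h: int) -> float:
    """Return the fraction of pixels removed by cropping (0.0 to 1.0)."""
    if full_w <= 0 or full_h <= 0:
        raise ValueError("image dimensions must be positive")
    if not (0 <= crop_w <= full_w and 0 <= crop_h <= full_h):
        raise ValueError("crop must fit inside the image")
    return 1 - (crop_w * crop_h) / (full_w * full_h)


# Assumed example: a 4000x3000 (12MP) screenshot cropped to a 1920x300 strip.
reduction = pixel_reduction(4000, 3000, 1920, 300)
print(f"{reduction:.1%} of pixels never reach the LLM")  # prints "95.2% ..."
```

Tighter crops, or a resize after the crop, push this figure higher.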

Operations

The sandbox supports chaining the following atomic operations:

Operation    Purpose
crop         Extract a sub-region
grayscale    Drop color channels
binarize     Threshold to black/white (great for OCR)
resize       Scale dimensions
contrast     Adjust contrast
brightness   Adjust brightness
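To make the semantics of a chain concrete, here is a minimal pure-Python sketch of three of the operations applied in order to a tiny in-memory image. It is not VisionSqueezer's implementation: the nested-list image representation, the function names, the BT.601 luma weights used for grayscale, and the `>=` threshold rule are all assumptions chosen for illustration.

```python
def crop(img, x, y, width, height):
    """Keep the width x height region whose top-left corner is (x, y)."""
    return [row[x:x + width] for row in img[y:y + height]]


def grayscale(img):
    """Collapse (r, g, b) pixels to one luma value (assumed BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in img]


def binarize(img, threshold=128):
    """Map each gray value to 255 (>= threshold) or 0 (below it)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


# Hypothetical dispatch table keyed by the "op" names from the table above.
OPS = {"crop": crop, "grayscale": grayscale, "binarize": binarize}


def run_chain(img, ops):
    """Apply each step in order; keys other than "op" become keyword arguments."""
    for step in ops:
        params = {k: v for k, v in step.items() if k != "op"}
        img = OPS[step["op"]](img, **params)
    return img


WHITE, BLACK, RED, BLUE = (255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 0, 255)
image = [
    [WHITE, BLACK, RED, WHITE],
    [BLACK, WHITE, BLACK, BLUE],
]
result = run_chain(image, [
    {"op": "crop", "x": 1, "y": 0, "width": 2, "height": 2},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
])
print(result)  # [[0, 0], [255, 0]]
```

Each step consumes the previous step's output, which is why cropping first keeps every later operation cheap.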

CLI

Pass operations as JSON via --ops:

vision-squeezer screenshot.png --ops '[
  {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
  {"op": "grayscale"},
  {"op": "binarize", "threshold": 128}
]'
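When an agent assembles the chain programmatically, building the JSON with a serializer avoids shell-quoting mistakes. The sketch below relies only on what is shown above (the `vision-squeezer` binary and its `--ops` flag); the helper names `build_command` and `squeeze` are hypothetical, and how the CLI reports its result is not covered here.

```python
import json
import subprocess


def build_command(image_path: str, ops: list) -> list:
    """Return the argv list for one vision-squeezer invocation."""
    return ["vision-squeezer", image_path, "--ops", json.dumps(ops)]


def squeeze(image_path: str, ops: list) -> subprocess.CompletedProcess:
    """Run the CLI. Passing argv as a list (no shell) needs no extra quoting."""
    return subprocess.run(build_command(image_path, ops), check=True)


ops = [
    {"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300},
    {"op": "grayscale"},
    {"op": "binarize", "threshold": 128},
]
cmd = build_command("screenshot.png", ops)
print(cmd)
```

Calling `squeeze("screenshot.png", ops)` would then execute the same chain as the terminal example, assuming the CLI is installed and on `PATH`.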

MCP

Via the MCP server, call sandbox_execute with an operations[] array. This is the recommended path when an agent only needs to see a specific part of a high-resolution image.
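For reference, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `sandbox_execute`. The `operations` array comes from the description above; the `image_path` argument name is a guess, so check the server's tool schema (for example via `tools/list`) for the actual input fields. The helper name `sandbox_execute_request` is hypothetical.

```python
import json


def sandbox_execute_request(request_id: int, image_path: str, operations: list) -> dict:
    """Build a JSON-RPC 2.0 tools/call payload for the sandbox_execute tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "sandbox_execute",
            "arguments": {
                "image_path": image_path,  # hypothetical argument name
                "operations": operations,  # array named in the text above
            },
        },
    }


request = sandbox_execute_request(
    1,
    "screenshot.png",
    [{"op": "crop", "x": 0, "y": 1200, "width": 1920, "height": 300}],
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds this envelope for you; the agent only supplies the tool name and arguments.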

Best practices for agents

  • Crop first, send second. If you only need one panel of a dashboard screenshot, crop to it before sending.
  • Binarize for text. OCR-style reads are far cheaper and more reliable on binarized images.
  • Report ROI. Use get_savings_stats to surface cumulative token/USD savings to the user.