Getting Started
Introduction
LLM-native image optimization that mathematically snaps images to provider grid boundaries to cut vision token usage.
VisionSqueezer is middleware that bridges the gap between human-centric images and LLM-native vision tokenomics. It pre-processes images so that they land on the minimum number of billable tiles (or, for Claude, the minimum billable pixel area) across the major AI providers: Claude, GPT-4o/GPT-5, and Gemini.
Vision models do not "see" pixels the way you do. Each provider tokenizes images through its own internal grid math, and a handful of stray pixels can push an image into an extra tile row that doubles its cost. VisionSqueezer simulates that math and snaps your images to the cheapest valid boundary.
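The spill-over effect can be illustrated with a few lines of Python. This is a minimal sketch of generic tile arithmetic, not VisionSqueezer's actual implementation: the names `tiles_needed` and `snap_down` and the 5% `tolerance` default are hypothetical, and real providers may rescale an image before tiling it.

```python
import math

def tiles_needed(width, height, tile):
    """Count the tile x tile cells needed to cover a width x height image."""
    return math.ceil(width / tile) * math.ceil(height / tile)

def snap_down(size, tile, tolerance=0.05):
    """Return the previous multiple of `tile` when `size` overshoots it slightly.

    `tolerance` (an assumed default) is the largest overshoot, as a fraction
    of the tile size, that is considered worth trimming.
    """
    overshoot = size % tile
    if size > tile and 0 < overshoot <= tolerance * tile:
        return size - overshoot
    return size

width, height, tile = 769, 768, 768
before = tiles_needed(width, height, tile)                  # 2 tiles
after = tiles_needed(snap_down(width, tile), height, tile)  # 1 tile
print(before, after)
```

Trimming a single pixel of width halves the tile count here, which is the kind of boundary the snapping step looks for.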
Why it matters
- Padding is expensive. Claude bills by pixel area (W × H / 750). Every pixel of solid border costs tokens.
- Tile spill-over is silent. A 769px-wide image costs the same as a 1536px one on a 768px-tile grid.
- Format ≠ tokens. Token cost is dimensional. AVIF/WebP/JPEG only affect file size and upload latency — not API token count.
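To make the padding cost concrete, here is a rough estimate built on the W × H / 750 rule quoted above. The helper name `claude_tokens` and the example dimensions are illustrative assumptions, and the estimate ignores any server-side downscaling a provider may apply to large images.

```python
def claude_tokens(width, height):
    """Approximate Claude image tokens from the W x H / 750 area rule."""
    return round(width * height / 750)

# Hypothetical image: 800 x 800 px of content inside a 100 px solid border.
padded = claude_tokens(1000, 1000)   # 1333 tokens with the border
cropped = claude_tokens(800, 800)    # 853 tokens once the border is stripped
print(padded - cropped)              # 480 tokens saved
```

The function takes only dimensions as input, which mirrors the last point: re-encoding the same pixels as AVIF, WebP, or JPEG leaves the estimate unchanged.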
Key Features
- Provider-aware resizing — exact internal grid math for Claude (area-based), GPT-4o/GPT-5 (512px tiles), and Gemini (768px tiles).
- Smart crop — saliency-based crop (Sobel-lite gradient energy) via `--smart-crop`, plus default corner-tolerance padding strip.
- AVIF / WebP / JPEG output — AVIF is typically 20–50% smaller than WebP, ~3× smaller than JPEG at equal quality.
- Auto-quality — `--auto-quality 0.95` binary-searches quality to hit a perceptual SSIM target.
- Batch / recursive mode — squeeze a whole directory tree, mirror structure with `--output-dir`.
- Machine-readable output — `--json` and `--dry-run` for pipelines.
- Think in Code (Sandbox) — execute atomic image ops locally before sending to the LLM.
- Persistent analytics — cumulative token & USD savings in a local SQLite database.
- Universal MCP — one-liner integration for Claude Code, Cursor, Zed, VS Code, Windsurf, JetBrains.
- Python bindings — `pip install vision-squeezer` via pyo3 wheels for Linux/macOS/Windows.
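The auto-quality feature is described as a binary search toward an SSIM target. The sketch below shows that general technique only. It is not the tool's code: `search_quality` and `score_at` are hypothetical names, and `score_at` stands in for "encode at this quality and measure SSIM against the original", which is assumed to be non-decreasing in quality.

```python
def search_quality(score_at, target, lo=1, hi=100):
    """Return the lowest integer quality in [lo, hi] whose score meets `target`.

    `score_at(q)` is assumed to be non-decreasing in q. If even the highest
    quality misses the target, `hi` is returned as the best available setting.
    """
    if score_at(hi) < target:
        return hi
    while lo < hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= target:
            hi = mid        # mid is good enough; look for something lower
        else:
            lo = mid + 1    # mid is too lossy; search the upper half
    return lo

# Stand-in scoring curve for demonstration only (not a real SSIM measurement).
best = search_quality(lambda q: q / 100, target=0.95)
print(best)
```

Because each probe halves the search range, a 1 to 100 quality scale needs only about seven trial encodes plus the initial check at the top of the range.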
