
GPT-4o & GPT-5 (Tiling)

How OpenAI tiles images and how VisionSqueezer avoids spill-over tiles.

GPT-4o / GPT-4.5 (Tiling)

OpenAI processes images in three steps:

  1. Fit the image within 2048×2048.
  2. Rescale the short side to 768px.
  3. Chop the result into 512px tiles — each tile is billed.

The trap is spill-over: a few extra pixels on the long side can push the image into an entire new row of tiles, and every tile in that row is billed in full for almost no extra information.
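The three steps and the spill-over effect can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's or VisionSqueezer's actual code: the names `scaled_size` and `tile_count` are hypothetical, and the sketch assumes that images are only ever scaled down (never up) and that scaled dimensions are rounded to whole pixels.

```python
import math

TILE = 512        # tile edge in pixels
MAX_SIDE = 2048   # step 1: bounding square
SHORT_SIDE = 768  # step 2: target for the short side


def scaled_size(width, height):
    """Apply steps 1 and 2 and return the scaled (width, height)."""
    # Step 1: fit within 2048x2048 (assumption: downscale only).
    fit = min(1.0, MAX_SIDE / max(width, height))
    w, h = width * fit, height * fit
    # Step 2: bring the short side down to 768px (assumption: downscale only).
    shrink = min(1.0, SHORT_SIDE / min(w, h))
    # Assumption: dimensions are rounded to whole pixels.
    return round(w * shrink), round(h * shrink)


def tile_count(width, height):
    """Step 3: count the 512px tiles needed to cover the scaled image."""
    w, h = scaled_size(width, height)
    return math.ceil(w / TILE) * math.ceil(h / TILE)


print(tile_count(1060, 800))  # scales to 1018x768 -> 2x2 = 4 tiles
print(tile_count(1100, 800))  # scales to 1056x768 -> 3x2 = 6 tiles
```

Under these assumptions, 40 extra source pixels of width are enough to add a full column of two tiles.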

Optimization strategy

Work backwards from the 768px rescale: compute the scale factor OpenAI will apply internally, then snap the long side so that its length after that scaling is a multiple of 512px. The image then ends exactly on a tile boundary, with no spill-over.
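One way to picture the reverse calculation is the sketch below. It is an illustration under stated assumptions, not VisionSqueezer's implementation: `scale_factor` and `snap_long_side` are hypothetical names, the scaling is assumed to be downscale-only with pixel rounding, and the sketch only computes the target dimensions without saying how the extra pixels are removed.

```python
import math

TILE = 512


def scale_factor(width, height):
    """Combined factor of the 2048px fit and the 768px short-side rescale.

    Assumption: both steps only ever scale down.
    """
    fit = min(1.0, 2048 / max(width, height))
    shrink = min(1.0, 768 / (min(width, height) * fit))
    return fit * shrink


def snap_long_side(width, height):
    """Return source dimensions whose scaled long side ends on a tile boundary."""
    s = scale_factor(width, height)
    long_src, short_src = max(width, height), min(width, height)
    long_px, short_px = round(long_src * s), round(short_src * s)
    # Nearest tile boundary at or below the scaled long side.
    target = (long_px // TILE) * TILE
    # Already on a boundary, or snapping would make the long side the short one.
    if long_px % TILE == 0 or target < short_px:
        return width, height
    # Map the boundary back to source pixels by undoing the scale factor.
    new_long = math.floor(target / s)
    return (new_long, height) if width >= height else (width, new_long)


print(snap_long_side(1100, 800))   # (1066, 800)
print(snap_long_side(4096, 3072))  # (4096, 3072), already on a boundary
```

In the first case the long side shrinks by 34 source pixels, which under these assumptions is the difference between a 3×2 and a 2×2 grid. Whether that trim is acceptable depends on what those pixels contain.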

Terminal
vision-squeezer image.png --model gpt4o

Example

A 4096×3072 photo (12MP) targeted at GPT-4o snaps perfectly into a contained grid:

Metric       Before     After
Tokens       16,777     11,182
Savings      5,595 tokens (-33.3%)
File size    2.2 MB     1.2 MB

OpenAI Aspect-Ratio Anomaly: stripping padding can make an image relatively "wider", so that after the short side is rescaled to 768px the long side spills into a new grid row. Always pass --model gpt4o so VisionSqueezer accounts for the tiling math instead of optimizing model-agnostically.
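The anomaly is easy to reproduce with a small tile counter. The sketch below is illustrative only: `tile_count` is a hypothetical helper, the image sizes are made up, and the scaling is assumed to be downscale-only with pixel rounding.

```python
import math


def tile_count(width, height):
    """Tiles after the 2048px fit, the 768px short-side rescale, and 512px tiling."""
    fit = min(1.0, 2048 / max(width, height))
    shrink = min(1.0, 768 / (min(width, height) * fit))
    s = fit * shrink
    return math.ceil(round(width * s) / 512) * math.ceil(round(height * s) / 512)


padded = tile_count(2000, 1600)    # scales to 960x768  -> 2x2 tiles
stripped = tile_count(2000, 1400)  # scales to 1097x768 -> 3x2 tiles
print(padded, stripped)            # 4 6
```

Removing 200px of vertical padding shrinks the file, yet the short side is now scaled by a larger factor, so the long side crosses the 1024px boundary and two more tiles are billed.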

GPT-5 / GPT-5.5 (Tiling, Capped)

GPT-5 raises the limits dramatically:

  • 6000px max dimension
  • 10.24M total pixel cap
  • 512×512 tiles
  • 1536 token hard cap

Because the token count is capped, a 1536-token image and an image that would otherwise cost 5000 tokens are billed identically, so grid-tiling optimization rarely pays off.
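The effect of the cap can be written down directly. This is a minimal sketch using only the limits listed above; `within_limits` and `billed_tokens` are hypothetical names, and what happens to an image that exceeds the limits is not covered here.

```python
MAX_DIMENSION = 6000     # px, per side
MAX_PIXELS = 10_240_000  # total pixel cap (10.24M)
TOKEN_CAP = 1536         # hard cap on billed tokens


def within_limits(width, height):
    """True if the image respects both the per-side and total-pixel limits."""
    return max(width, height) <= MAX_DIMENSION and width * height <= MAX_PIXELS


def billed_tokens(uncapped_tokens):
    """Tokens actually billed once the hard cap is applied."""
    return min(uncapped_tokens, TOKEN_CAP)


print(billed_tokens(5000))        # 1536
print(billed_tokens(1536))        # 1536
print(within_limits(4096, 3072))  # False: about 12.58M pixels
```

Trimming an image from 5000 to 2000 uncapped tokens changes nothing on the bill; only reductions that go below 1536 do.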

Optimization strategy

Snap to 512px boundaries where it helps, but the real win for GPT-5 is stripping heavy padding and compressing file size (MBs → KBs) for faster uploads and lower latency — not token reduction.

Terminal
vision-squeezer image.png --model gpt5