Providers

GPT-4o & GPT-5 (Tiling)

How OpenAI tiles images and how VisionSqueezer avoids spill-over tiles.

GPT-4o / GPT-4.5 (Tiling)

OpenAI processes images in three steps:

1. Fit the image within 2048×2048.
2. Rescale the short side to 768px.
3. Chop the result into 512px tiles; each tile is billed.

The trap is spill-over: a few extra pixels on the long side can push the image into an entire new row or column of tiles, each billed in full, for almost no extra information. In the worst case this doubles the cost.
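The three steps and the spill-over effect can be sketched in a few lines of Python. This is an illustrative sketch, not VisionSqueezer's or OpenAI's actual code: the function name `openai_tile_grid` is hypothetical, and it assumes that images are only ever scaled down (never up) and that tile counts are rounded up on each axis.

```python
from fractions import Fraction
import math

def openai_tile_grid(width, height, fit=2048, short_side=768, tile=512):
    """Return (columns, rows) of tiles for an image, per the three steps above.

    Assumption: both rescaling steps only shrink the image, never enlarge it.
    Fractions keep the arithmetic exact so boundary cases are not misrounded.
    """
    # Step 1: fit within a fit x fit square.
    s1 = min(Fraction(1), Fraction(fit, max(width, height)))
    w, h = width * s1, height * s1
    # Step 2: rescale so the short side is at most short_side pixels.
    s2 = min(Fraction(1), Fraction(short_side) / min(w, h))
    w, h = w * s2, h * s2
    # Step 3: count the tiles needed to cover the result.
    return math.ceil(w / tile), math.ceil(h / tile)

print(openai_tile_grid(1024, 768))  # (2, 2): 4 tiles
print(openai_tile_grid(1030, 768))  # (3, 2): 6 tiles for 6 extra pixels of width
```

In this example, six extra pixels on the long side add a whole column of tiles, which is the spill-over the optimizer is meant to avoid.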

Optimization strategy

Work backwards from the 768px rescale: compute the scale factor OpenAI will apply internally, then choose a long side that lands on a 512px multiple once that factor is applied. The image then ends exactly on a tile boundary, with no spill-over.
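A minimal sketch of that reverse calculation follows. The helper names `internal_scale` and `snapped_long_side` are hypothetical and do not come from VisionSqueezer. The sketch assumes the image is only ever scaled down, and that the long side is trimmed (for example by removing padding) while the short side is left unchanged.

```python
from fractions import Fraction
import math

def internal_scale(width, height, fit=2048, short_side=768):
    """Combined scale factor of the fit-to-2048 and short-side-to-768 steps.

    Assumption: neither step ever enlarges the image.
    """
    s1 = min(Fraction(1), Fraction(fit, max(width, height)))
    s2 = min(Fraction(1), Fraction(short_side) / (min(width, height) * s1))
    return s1 * s2

def snapped_long_side(width, height, tile=512):
    """Largest long side (in source pixels) that ends on a tile boundary."""
    s = internal_scale(width, height)
    scaled_long = max(width, height) * s
    # Whole tiles the scaled long side already fills (at least one).
    tiles = max(1, math.floor(scaled_long / tile))
    # Map the boundary back into source pixels by undoing the scale factor.
    return math.floor(tiles * tile / s)

print(snapped_long_side(2060, 1536))  # 2048
```

Here a 2060×1536 image scales internally to 1030×768, which needs three tile columns. Trimming the long side to 2048 gives 1024×768 after scaling, which fits in two.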

Terminal

vision-squeezer image.png --model gpt4o

Example

A 4096×3072 photo (12MP) targeted at GPT-4o snaps perfectly into a contained grid:

Metric	Before	After
Tokens	16,777	11,182
Savings	—	5,595 tokens (-33.3%)
File size	2.2 MB	1.2 MB

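OpenAI Aspect-Ratio Anomaly: stripping padding can make an image "wider". Because the short side is rescaled to 768px, a smaller short side means a larger internal scale factor, which can push the long side across a 512px boundary into a new row or column of tiles. Always pass --model gpt4o so VisionSqueezer accounts for the tiling math instead of optimizing agnostically.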

GPT-5 / GPT-5.5 (Tiling, Capped)

GPT-5 raises the limits dramatically:

6000px max dimension
10.24M total pixel cap
512×512 tiles
1536 token hard cap

Because billing stops at the cap, an image that would otherwise cost 5000 tokens is billed the same 1536 tokens as one that costs exactly 1536. Grid-tiling optimization is therefore rarely needed.
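The effect of the cap and the size limits can be illustrated with a short sketch. The function names are hypothetical, the constants are the limits listed above, and the per-image token count before capping is taken as an input because the per-tile cost is not stated here.

```python
MAX_DIM = 6000           # maximum length of either side, in pixels
MAX_PIXELS = 10_240_000  # total pixel cap
TOKEN_CAP = 1536         # hard cap on billed tokens per image

def within_limits(width, height):
    """True if the image satisfies both the dimension and the pixel limits."""
    return max(width, height) <= MAX_DIM and width * height <= MAX_PIXELS

def billed_tokens(uncapped_tokens):
    """Tokens billed once the hard cap is applied."""
    return min(uncapped_tokens, TOKEN_CAP)

print(billed_tokens(1536), billed_tokens(5000))  # 1536 1536
print(within_limits(4096, 3072))  # False: 12,582,912 pixels exceeds the cap
```

Once an image reaches the cap, trimming tiles does not lower the bill unless the trimmed image falls below 1536 tokens.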

Optimization strategy

Snap to 512px boundaries where it helps, but the real win for GPT-5 is stripping heavy padding and compressing file size (MBs → KBs) for faster uploads and lower latency — not token reduction.

Terminal

vision-squeezer image.png --model gpt5

Claude (Area-Based)

How Anthropic Claude bills image tokens and how VisionSqueezer minimizes area cost.

Gemini (Large Tiles)

How Google Gemini tiles images and why snapping to 768px boundaries halves the cost.