DeepSeek-VL (Open Weights)

How DeepSeek-VL2 tiles images at 384px and why the win is local-inference context, not API billing.
Open weights, not a billed API. At the time of writing, DeepSeek's public API (deepseek-chat / deepseek-reasoner) is text-only, and DeepSeek-VL2 ships as open weights. Optimizing here therefore saves local-inference context budget and latency, not API dollars. Re-check this if a vision endpoint ships.

How DeepSeek-VL2 processes images

DeepSeek-VL2 uses a 384×384 global view plus dynamic local tiles on an anyres canvas of (m·384, n·384) with m·n ≤ 9. The encoder is SigLIP-SO400M-384 (14px patch) with a 2× downsample, giving a per-view grid side of h = ⌈(384/14)/2⌉ = 14.
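As a worked check of the grid-side arithmetic, the short Python sketch below evaluates the formula exactly as written above. The function name `grid_side` and its parameter names are illustrative and are not taken from the DeepSeek-VL2 code.

```python
import math

def grid_side(view_px: int = 384, patch_px: int = 14, downsample: int = 2) -> int:
    """Per-view grid side h = ceil((view_px / patch_px) / downsample)."""
    return math.ceil((view_px / patch_px) / downsample)

h = grid_side()
print(h)  # 384/14 is about 27.4 patches per side; halved and rounded up gives 14
```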

The exact token count (from tokenize_with_images):

h           = 14
global view = h·(h+1) = 210          # +1 per row = line separator
separator   = 1
local tiles = (nh·h)·(nw·h + 1)      # nw × nh tile grid, nw·nh ≤ 9
tokens      = global view + separator + local tiles = 211 + local tiles
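The formula can be sketched as a small Python function. This is a transcription of the lines above rather than the reference implementation; the name `vl2_image_tokens` and the explicit range check are assumptions added for illustration.

```python
H = 14  # per-view grid side h, as derived above

def vl2_image_tokens(nw: int, nh: int, h: int = H) -> int:
    """Token count for one image split into an nw x nh grid of local tiles."""
    if nw < 1 or nh < 1 or nw * nh > 9:
        raise ValueError("tile grid must satisfy nw, nh >= 1 and nw*nh <= 9")
    global_view = h * (h + 1)              # h rows of h tokens, +1 line separator per row
    separator = 1                          # one separator token, per the formula above
    local_tiles = (nh * h) * (nw * h + 1)  # nh*h rows of nw*h tokens, +1 per row
    return global_view + separator + local_tiles

for nw, nh in [(1, 1), (2, 2), (3, 3)]:
    print(f"{nw}x{nh}: {vl2_image_tokens(nw, nh)} tokens")
```

Only the tile grid enters the count, so two images that land on the same nw × nh grid cost the same number of tokens regardless of their exact pixel size.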

The 384px boundary

| Image | nw × nh | Tokens |
|---|---|---|
| ≤ 384×384 | 1×1 | 211 + 210 = 421 |
| 768×768 | 2×2 | 211 + 28·29 = 1,023 |
| 1152×1152 | 3×3 | 211 + 42·43 = 2,017 |

Snapping each side down to the 384px grid keeps nw·nh (and the token bill) minimal.
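The boundary behaviour can be illustrated with a minimal Python sketch that maps pixel dimensions to a tile grid. It assumes one tile per started 384px span on each side (a ceiling division), which matches the figures on this page. The reference implementation picks from a list of candidate resolutions, so treat `tile_grid` as a hypothetical helper, not the model's actual selection logic.

```python
import math

TILE_PX = 384
MAX_TILES = 9  # nw*nh <= 9, as stated above

def tile_grid(width: int, height: int) -> tuple[int, int]:
    """Return (nw, nh), assuming each started 384px span costs one tile."""
    nw = max(1, math.ceil(width / TILE_PX))
    nh = max(1, math.ceil(height / TILE_PX))
    if nw * nh > MAX_TILES:
        # Larger images are outside what this sketch covers.
        raise ValueError(f"{nw}x{nh} grid exceeds {MAX_TILES} tiles")
    return nw, nh

for size in [(384, 384), (385, 384), (768, 768), (800, 768), (1152, 1152)]:
    print(size, "->", tile_grid(*size))
```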

Optimization strategy

An image whose sides are both ≤ 384px stays a single tile; otherwise, snap each side down to the 384px grid.
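A minimal Python sketch of the snapping rule follows. It assumes that a side of 384px or less is left untouched and that a longer side is floored to a multiple of 384. The names `snap_side` and `snap_size` are hypothetical, and whether the tool resizes or crops to reach the target size is not specified here.

```python
TILE_PX = 384

def snap_side(px: int) -> int:
    """Leave sides up to 384px alone; floor longer sides to a multiple of 384."""
    if px <= TILE_PX:
        return px
    return (px // TILE_PX) * TILE_PX

def snap_size(width: int, height: int) -> tuple[int, int]:
    """Target size after snapping each side independently (an assumption)."""
    return snap_side(width), snap_side(height)

print(snap_size(800, 768))
print(snap_size(385, 384))
```

Snapping each side independently can change the aspect ratio slightly (800×768 becomes 768×768), which is the trade-off for dropping a whole tile column.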

Terminal
vision-squeezer image.png --model deepseek

CLI aliases: deepseek, deepseek-vl. MCP target_model: "deepseek".

Token savings

Crossing a 384px boundary adds a whole tile row or column — snapping back undoes it.

| Scenario | Before | After | Saved |
|---|---|---|---|
| 800×768 → snap to 768×768 | 3×2 tiles · 1,415 tok | 2×2 tiles · 1,023 tok | −28% |
| 385×384 (1px over a tile) → 384×384 | 617 tok | 421 tok | −32% |

A single pixel past a 384px edge can raise the token count by almost half (421 → 617 tokens, about +47%). Snapping each side back down to the grid recovers that, a saving of about 32%.
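The savings figures can be reproduced with the short Python sketch below. It reuses the token formula from this page and assumes the tile grid is the ceiling of each side over 384px; `tokens_for` and `percent_saved` are illustrative names, not part of the tool.

```python
import math

TILE_PX = 384
H = 14  # per-view grid side

def tokens_for(width: int, height: int) -> int:
    """Token count, assuming one tile per started 384px span on each side."""
    nw = max(1, math.ceil(width / TILE_PX))
    nh = max(1, math.ceil(height / TILE_PX))
    return H * (H + 1) + 1 + (nh * H) * (nw * H + 1)

def percent_saved(before: int, after: int) -> float:
    """Saving relative to the original (before) token count."""
    return 100 * (before - after) / before

scenarios = [((800, 768), (768, 768)), ((385, 384), (384, 384))]
for before_size, after_size in scenarios:
    before, after = tokens_for(*before_size), tokens_for(*after_size)
    print(f"{before_size} -> {after_size}: {before} -> {after} tok, "
          f"saved {percent_saved(before, after):.0f}%")
```

The saving is measured against the larger, pre-snap count. Measured against the snapped count instead, the same 196-token difference in the second scenario is an increase of about 47%.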

Source

Formula taken verbatim from the DeepSeek-VL2 technical report, §2 Model Architecture (arXiv:2412.10302, submitted 13 Dec 2024) and the reference implementation processing_deepseek_vl_v2.py. Grid constants (patch_size 14, siglip_so400m_patch14_384, candidate_resolutions up to 1152×1152) cross-checked against the model config.json. Verified 2026-06-11.