Crawler Integration
Automate token optimization for high-scale web scraping with Firecrawl, Crawl4AI, and Playwright.
Squeeze screenshots in your scraping pipeline before they hit the LLM. Integrate via post-processing or request interception.
Firecrawl / Crawl4AI
Post-process screenshots before sending them to the model. Use the CLI as a bridge from Node.js or Python.
bridge.js
// Execute CLI as a bridge
const { execSync } = require('child_process')
const output = execSync('vision-squeezer screenshot.png --output opt.jpg --model gpt4o')
console.log(output.toString())
bridge.py
import subprocess

# check=True raises CalledProcessError if the CLI exits non-zero
subprocess.run([
    "vision-squeezer", "screenshot.png",
    "--output", "opt.jpg", "--model", "gpt4o",
], check=True)
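For a crawl that writes many screenshots to disk, the same bridge can be wrapped in a loop. The following is a minimal sketch, assuming the CLI accepts exactly the arguments shown above; `build_command` and `squeeze_directory` are hypothetical helper names for illustration, not part of vision-squeezer.

```python
import subprocess
from pathlib import Path


def build_command(src, dst, model="gpt4o"):
    """Return the argv list for one CLI call, using only the flags shown above."""
    return ["vision-squeezer", str(src), "--output", str(dst), "--model", model]


def squeeze_directory(src_dir, dst_dir, model="gpt4o", run=subprocess.run):
    """Optimize every PNG in src_dir into dst_dir and return the output paths.

    `run` is injectable so the loop can be exercised without the CLI installed.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(src_dir).glob("*.png")):
        dst = dst_dir / (src.stem + ".jpg")
        # check=True raises CalledProcessError if the CLI exits non-zero
        run(build_command(src, dst, model), check=True)
        outputs.append(dst)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems when screenshot paths contain spaces.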
Playwright interceptor
Intercept image requests during a crawl, fetch the original response, and return an optimized body to the page on the fly.
interceptor.js
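await page.route('**/*.{png,jpg}', async (route) => {
  // Fetch the original image from the network
  const response = await route.fetch()
  const body = await response.body()
  const optimized = await squeeze(body) // pipe through vision-squeezer
  // Keep the original status and headers; replace only the body
  await route.fulfill({ response, body: optimized })
})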
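The `squeeze` call in the interceptor is left undefined. One way to back it with the CLI is sketched below in Python, assuming the CLI reads an input file and writes to `--output` as in the bridge examples. The names `squeeze` and `make_handler` are hypothetical, and the handler expects a route object from Playwright's sync API (`route.fetch()`, `response.body()`, `route.fulfill()`).

```python
import subprocess
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def squeeze(body, suffix=".png", model="gpt4o", run=subprocess.run):
    """Round-trip image bytes through the CLI using temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / ("in" + suffix)
        dst = Path(tmp) / "out.jpg"
        src.write_bytes(body)
        run(["vision-squeezer", str(src), "--output", str(dst), "--model", model],
            check=True)
        return dst.read_bytes()


def make_handler(squeeze_fn=squeeze):
    """Build a route handler that swaps image bodies for optimized ones."""
    def handle(route):
        response = route.fetch()
        # Preserve the original extension so the input file is named honestly
        suffix = Path(urlparse(route.request.url).path).suffix or ".png"
        try:
            optimized = squeeze_fn(response.body(), suffix)
        except (OSError, subprocess.CalledProcessError):
            # Optimization failed: serve the untouched response instead
            route.fulfill(response=response)
            return
        # Original status and headers are reused; only the body changes
        route.fulfill(response=response, body=optimized)
    return handle


# Usage (assumes `page` is a Playwright sync-API Page):
# page.route("**/*.{png,jpg}", make_handler())
```

Falling back to the original response keeps a missing or failing CLI from breaking the crawl. Spawning one process per image adds latency, so this fits low-volume crawls better than hot paths.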
At scale
For high-volume pipelines:
- Prefer --json --dry-run to estimate token impact before committing writes.
- Use --format avif to minimize storage and transfer (token cost is unchanged).
- Always pass an explicit --model so the tiling math matches your target provider.
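
The dry-run estimate can be scripted as a pre-flight pass over a batch. This is a sketch under two assumptions: that --json --dry-run prints a single JSON document to stdout, and that the two flags can be combined with an input path and --model without --output. The JSON schema is not described here, so the parsed report is returned as-is without assuming any field names. `dry_run_command` and `estimate` are hypothetical helpers.

```python
import json
import subprocess


def dry_run_command(src, model):
    """Return argv for a no-write estimate, using only flags named in the list above."""
    return ["vision-squeezer", str(src), "--model", model, "--json", "--dry-run"]


def estimate(paths, model, run=subprocess.run):
    """Map each input path to the parsed JSON report from a dry run."""
    reports = {}
    for src in paths:
        result = run(dry_run_command(src, model),
                     check=True, capture_output=True, text=True)
        # The report structure is CLI-defined; keep it opaque here
        reports[str(src)] = json.loads(result.stdout)
    return reports
```

The model argument has no default on purpose, so every estimate names its target provider explicitly.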
