Claude Vision API Tester

Upload images and generate production-ready code for Claude's vision capabilities. Preview base64 encoding, estimate token costs, and get code in Python, JavaScript, and curl with automatic media type detection.

By Michael Lip · May 25, 2026

Sample API Response

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Upload an image to see a simulated response..." }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 0, "output_tokens": 0 }
}

What This Tool Does

The Claude Vision API Tester lets you upload images and instantly generate the API request code needed to send those images to Claude for analysis. Instead of manually base64-encoding files, working out media types, and constructing the content array, you let the tool handle all of it in the browser. Drop an image, write your prompt, and get code in Python, JavaScript, or curl that you can copy into your application and adapt.

The tester runs entirely client-side. Your images never leave your browser. The base64 encoding happens in JavaScript using the FileReader API, and the generated code includes the encoded data ready to send to the Anthropic API. The token estimator calculates approximate image token usage based on the image dimensions, helping you understand the cost impact before making actual API calls.

Understanding Claude's Vision System

Claude's vision capabilities allow it to understand and reason about image content. The model can describe scenes, read text in images (OCR), analyze charts and graphs, compare multiple images, identify objects and describe people in context, debug UI screenshots, and extract structured data from documents. Vision works with all current Claude model families (Opus, Sonnet, and Haiku), with varying levels of detail and accuracy. Sonnet is usually a good balance of quality and cost for most vision tasks.

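Images are processed as tokens, similar to text, and the token count depends on the image dimensions. Claude resizes large images so that the longest side is at most 1568 pixels. The tester then estimates the cost with a tile heuristic: it divides the resized image into tiles of approximately 384x384 pixels and counts about 170 tokens per tile. Under that heuristic a small thumbnail uses 170 tokens, while a large screenshot can use 1,600 or more. Anthropic's documentation gives a different approximation, roughly (width * height) / 750 tokens for images that need no resizing, so the two figures can differ, especially for very small images. The tester displays its estimate automatically when you upload an image, so you can gauge the cost before writing any code.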

Sending Images via Base64

Base64 encoding converts binary image data into a text string that can be embedded directly in JSON requests. The encoding increases the data size by approximately 33%, so a 1MB image becomes about 1.33MB of base64 text. Despite this overhead, base64 is the most reliable method for sending images because it does not depend on external URL availability. The image data travels with the request and is guaranteed to be received by the API exactly as you encoded it.
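The 33% figure follows from base64 emitting 4 output characters for every 3 input bytes. A quick check using only the Python standard library is sketched below; the helper name base64_length is illustrative, and the zero-filled buffer simply stands in for real image data.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Length of padded base64 text produced from num_bytes of binary input."""
    return 4 * math.ceil(num_bytes / 3)


# 1 MB of placeholder bytes standing in for an image file.
raw = bytes(1_000_000)
encoded = base64.b64encode(raw)

# Fractional size increase caused by the encoding (about one third).
overhead = len(encoded) / len(raw) - 1
print(f"{len(raw)} bytes -> {len(encoded)} base64 characters ({overhead:.1%} larger)")
```

The same overhead applies to the request body, so any payload size limit is reached sooner than the raw file size suggests.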

To use base64 in the API, include an image content block with source.type set to "base64", the media_type matching the image format (e.g., "image/png"), and the data field containing the base64 string without any data URI prefix. The tester generates this structure automatically and detects the correct media type from the uploaded file. For comparing how different providers handle image APIs, LockML provides cross-provider comparison tools.
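As a rough illustration of that structure, the sketch below builds an image content block from a local file and wraps it in a request payload. The function names, the extension-to-media-type table, and the default max_tokens are assumptions made for this example, not the tester's actual output; the model name is the one shown in the sample response above.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to the media types the article lists.
EXTENSION_TO_MEDIA_TYPE = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_file(path: str) -> dict:
    """Return an image content block with raw base64 data (no data URI prefix)."""
    suffix = Path(path).suffix.lower()
    media_type = EXTENSION_TO_MEDIA_TYPE.get(suffix)
    if media_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def build_payload(image_block: dict, prompt: str,
                  model: str = "claude-sonnet-4-20250514",
                  max_tokens: int = 1024) -> dict:
    """Assemble a Messages API request body with the image before the text."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [image_block, {"type": "text", "text": prompt}],
            }
        ],
    }
```

With the official Python SDK installed, a payload shaped like this can typically be passed as keyword arguments to client.messages.create(...). Guessing the media type from the extension is a shortcut that trusts the file name.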

Multi-Image Requests

Claude can process multiple images in a single request by including multiple image blocks in the content array. This is powerful for comparison tasks ("which design looks better?"), document processing (multiple pages of a PDF as images), visual workflows (before/after screenshots), and any task that requires reasoning across multiple images. Each image adds its own token cost, so the total cost is the sum of all image tokens plus the text prompt tokens.

When sending multiple images, order matters. Place images before your text prompt so Claude has the visual context before reading your instructions. You can also interleave images and text to reference specific images: "Image 1 shows the homepage. Image 2 shows the checkout. Compare the design consistency." The tester supports uploading multiple images and generates the correct multi-image content array in the output code.
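One way to produce the interleaved layout described above is sketched here. The helper name and the "Image N:" label format are illustrative choices, and the image blocks are assumed to be already-built content blocks (base64 or URL).

```python
from typing import Dict, List


def build_multi_image_content(image_blocks: List[Dict], prompt: str) -> List[Dict]:
    """Label each image, keep the images in order, and put the prompt last."""
    content: List[Dict] = []
    for index, block in enumerate(image_blocks, start=1):
        # A short text label lets the prompt refer to "Image 1", "Image 2", ...
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append(block)
    content.append({"type": "text", "text": prompt})
    return content


# Placeholder blocks; real ones would carry actual base64 image data.
homepage = {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
checkout = {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": "..."}}

content = build_multi_image_content(
    [homepage, checkout],
    "Image 1 shows the homepage. Image 2 shows the checkout. "
    "Compare the design consistency.",
)
```

The resulting list goes in the content field of a single user message.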

Token Cost Estimation

Understanding image token costs is essential for budgeting API usage. The tester estimates tokens with a tile heuristic: images up to 384x384 pixels count as a single tile of approximately 170 tokens. Larger images are scaled to fit within 1568 pixels on the longest side, then divided into tiles. The formula is approximately ceil(width/384) * ceil(height/384) * 170 after resizing. This is an estimate, not the API's own accounting: Anthropic documents an area-based approximation of (width * height) / 750 tokens, and the authoritative number is the usage.input_tokens value returned with each response.
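The tile formula is easy to reproduce. The sketch below implements it as described in this section, alongside the area-based approximation from Anthropic's documentation for comparison. Function names are illustrative, the rounding used when resizing is an assumption, and neither function is guaranteed to match the token count the API reports.

```python
import math

MAX_SIDE = 1568        # longest side after resizing, per the description above
TILE_SIDE = 384        # tile edge used by the tester's heuristic
TOKENS_PER_TILE = 170  # tokens counted per tile by the heuristic


def resize_to_limit(width: int, height: int, max_side: int = MAX_SIDE):
    """Scale dimensions down so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


def estimate_tile_tokens(width: int, height: int) -> int:
    """Tile heuristic: ceil(w/384) * ceil(h/384) * 170 after resizing."""
    w, h = resize_to_limit(width, height)
    return math.ceil(w / TILE_SIDE) * math.ceil(h / TILE_SIDE) * TOKENS_PER_TILE


def estimate_area_tokens(width: int, height: int) -> int:
    """Area approximation: (w * h) / 750 after resizing, rounded up."""
    w, h = resize_to_limit(width, height)
    return math.ceil(w * h / 750)


for size in [(384, 384), (1024, 1024), (3136, 1568)]:
    print(size, estimate_tile_tokens(*size), estimate_area_tokens(*size))
```

Comparing either estimate with usage.input_tokens from a few real responses is the simplest way to see how far off it is for your images.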

The cost estimate uses Anthropic pricing for the selected model. Sonnet bills image tokens at the same rate as input text tokens ($3 per million input tokens at the time of writing). For high-volume vision applications, consider using Haiku for initial classification and only sending complex images to Sonnet or Opus. Image blocks can be included in a cached prompt prefix, so repeated analysis of the same image may qualify for cached-input pricing; without caching, each request pays the full image token cost again. For tracking vision API costs at scale, KickLLM provides cost monitoring dashboards.
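The arithmetic behind the estimate is a single multiplication. A minimal sketch, assuming the $3 per million input token Sonnet rate quoted above (the function name and default are illustrative, and prices change):

```python
def estimate_image_cost_usd(image_tokens: int,
                            price_per_million_usd: float = 3.00) -> float:
    """Input cost in USD for a given number of image tokens."""
    return image_tokens * price_per_million_usd / 1_000_000


# A 1024x1024 image under the tester's tile heuristic: 3 x 3 tiles x 170 tokens.
tokens_per_image = 3 * 3 * 170
cost_per_image = estimate_image_cost_usd(tokens_per_image)
cost_per_10k_images = cost_per_image * 10_000

print(f"{tokens_per_image} tokens -> ${cost_per_image:.5f} per image, "
      f"${cost_per_10k_images:.2f} per 10,000 images")
```

This covers only the image's input tokens; the text prompt and the model's output tokens are billed on top.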

Best Practices for Vision Requests

Write clear, specific prompts that tell Claude exactly what to look for in the image. "Describe this image" works but produces generic descriptions. "Extract all text visible in this screenshot and return it as a JSON object with field labels as keys" produces structured, actionable output. For OCR tasks, mention that you want exact text extraction. For analysis tasks, specify the format you want (bullet points, JSON, table). For comparison tasks, list the specific dimensions you want compared.

Optimize image size for your use case. If you need Claude to read small text in a screenshot, send the full resolution. If you just need object identification or scene description, resize to 768x768 or smaller to save tokens. For documents, crop to the relevant section rather than sending the entire page. Remove unnecessary whitespace and borders. These optimizations can reduce token costs substantially, often by half or more, with little effect on output quality for tasks that do not depend on fine detail.
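To get a feel for the savings, the sketch below computes the dimensions of an image scaled to fit a bounding square and compares token estimates before and after. It uses the tile heuristic described earlier on this page, so the percentage is an estimate. The function names are illustrative, and the resizing itself would be done with an image library of your choice.

```python
import math


def fit_within(width: int, height: int, max_side: int):
    """Dimensions after scaling down to fit max_side, preserving aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    return (max(1, round(width * max_side / longest)),
            max(1, round(height * max_side / longest)))


def tile_tokens(width: int, height: int) -> int:
    """Tile heuristic from this page: ceil(w/384) * ceil(h/384) * 170."""
    return math.ceil(width / 384) * math.ceil(height / 384) * 170


original = (1568, 1568)
resized = fit_within(*original, max_side=768)

before = tile_tokens(*original)
after = tile_tokens(*resized)
saving = 1 - after / before

print(f"{original} -> {resized}: {before} -> {after} tokens ({saving:.0%} saved)")
```

The saving shrinks quickly for images that are already small, so it is worth running the numbers on your own typical inputs.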

Test with representative images before deploying to production. Vision performance varies with image quality, lighting, text size, and complexity. Screenshots of code or text are processed very accurately. Photographs in low light or with heavy compression may produce less reliable results. Charts and graphs are well understood, but highly stylized infographics may need more detailed prompts. Use the generated code from this tester as your starting point, then iterate on the prompt based on real results from the API. For building complete API request flows including vision, use the ClaudKit API Request Builder.

Frequently Asked Questions

How does Claude's vision API process images?

Claude processes images by converting them into tokens. Images are sent in the content array as base64-encoded data or URLs. Claude analyzes the image content and can describe it, extract text, compare images, and reason about visual information. Image tokens count toward the request's input tokens alongside text and vary with image dimensions. Larger images use more tokens but provide more detail for analysis.

What image formats does the Claude API support?

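The Claude API supports JPEG, PNG, GIF (including animated), and WebP image formats. The tester accepts files up to 20MB each, but the API enforces its own per-image size limit, which Anthropic's documentation lists as 5MB for API requests, so check the current limit before sending large files. For best results, use JPEG for photographs and PNG for screenshots or images with text. The media_type field in the API request must match the actual format: image/jpeg, image/png, image/gif, or image/webp.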
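Because media_type has to match the real format rather than the file name, one option is to inspect the file's leading bytes. The sketch below checks the standard signatures for the four supported formats; the function name is illustrative, and this is not necessarily how the tester performs its detection.

```python
from typing import Optional


def sniff_media_type(header: bytes) -> Optional[str]:
    """Guess the media type from the first bytes of a file (12 are enough)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the leading bytes of a file and sniff its media type."""
    with open(path, "rb") as handle:
        return sniff_media_type(handle.read(12))
```

A None result means the file is not one of the four supported formats, whatever its extension says.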

How many tokens does an image use in the Claude API?

Image token usage depends on the dimensions. Under the tester's tile heuristic, a small image (up to 384x384) uses approximately 170 tokens, and larger images are scaled and tiled, with each tile using approximately 170 tokens. A 1024x1024 image covers a 3x3 grid of tiles, or about 1,530 tokens by this estimate; Anthropic's documented approximation of (width * height) / 750 gives roughly 1,400 for the same image. Very large images are automatically resized to fit within the maximum dimension of 1568 pixels on the longest side. Use smaller images when detail is not critical to reduce costs.

Can Claude analyze multiple images in a single request?

Yes, you can include multiple image blocks in the content array of a single message. Claude can compare, contrast, and reason across all images. Place image blocks before or between text blocks to provide context. The API does limit the number of images per request (Anthropic's documentation has listed 100 for API requests, so check the current figure), and the total token count of all images plus text must also fit within the model's context window.

Should I use base64 or URL for sending images to Claude?

Use base64 when you have local files, need guaranteed availability, or are processing sensitive images that should not be hosted publicly. Use URLs when images are already hosted publicly and you want to avoid the base64 encoding overhead. Base64 increases the request payload size by approximately 33% compared to the original file. URLs require that the image be accessible at request time.
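For comparison with the base64 form, a URL-based image block carries only the address. A minimal sketch, assuming the image is publicly reachable over HTTP or HTTPS (the function name and the scheme check are illustrative additions):

```python
from urllib.parse import urlparse


def image_block_from_url(url: str) -> dict:
    """Return an image content block that references a hosted image by URL."""
    parsed = urlparse(url)
    # Local paths and other schemes cannot be fetched at request time.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a public http(s) URL, got: {url!r}")
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block_from_url("https://example.com/cat.jpg")
```

No media_type is supplied in this form, and the request body stays small, but the request fails if the URL cannot be fetched when the API processes it.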

Michael Lip

Developer and creator of the Zovo Tools network. Building free, privacy-first developer tools that run entirely in the browser. No tracking, no sign-ups, no server-side processing. Open source on GitHub.
