Claude Vision API Playground
Upload an image or paste a URL, write an analysis prompt, and see a simulated Claude Vision response. Explore examples for charts, documents, screenshots, and photos. Generate ready-to-use Python, Node.js, and curl code.
By Michael Lip · May 28, 2026
A bar chart showing quarterly revenue across four product lines. Simulated response demonstrates Claude's chart data extraction capability.
- Extracts numeric data values from bar and line charts
- Reads axis labels, legends, and chart titles
- Returns data as structured JSON when requested
- Works best with high-contrast, legible chart text
Production-ready code for the current prompt and model. Switch between base64 upload and URL modes.
How Claude Vision Works
Claude Vision allows you to include images alongside text in your API requests. The model processes images natively, without a separate OCR step — it understands the visual content directly and responds to questions about it in the same conversational format as text-only requests. Images are included as content blocks in the messages array, with a type: "image" block followed by (or preceding) the text prompt.
Images can be provided in two ways. For locally generated or user-uploaded images, encode the raw bytes as base64 and include the media_type (e.g., image/jpeg) and the encoded string. For publicly accessible images, provide the URL directly using source.type: "url". The URL approach is simpler in code but requires the image to be publicly reachable by Anthropic's servers at request time. For sensitive images (user data, internal documents), always use base64 to avoid transmitting URLs to external services.
Image Format and Size Requirements
Claude Vision supports JPEG, PNG, GIF, and WebP. The maximum file size per image is 5MB. There is no hard pixel dimension limit, but images are resized internally to fit within a 1568x1568 bounding box before processing. Higher resolution images provide better OCR accuracy for small text but use more input tokens. For chart and document analysis, 1024x1024 or higher is recommended. For general scene description, smaller images work well.
Token billing for images is based on the processed dimensions: approximately (width * height) / 750 tokens, with a minimum of 1,600 tokens for a 1024x1024 image. A 512x512 image uses approximately 400 tokens. Include this in cost calculations for image-heavy pipelines. For high-volume document processing, batch multiple images in a single request (up to 20 images) to amortize per-request overhead. For batch document analysis at scale, the Batch API Guide explains how to process thousands of images at 50% cost.
Chart and Data Extraction
Claude Vision excels at extracting structured data from charts and visualizations. Provide a specific extraction prompt: "List all data points in this bar chart as JSON with keys for category and value." The model will identify axis labels, data series, legend entries, and approximate numeric values. Accuracy is high for charts with legible labels and clear color differentiation between series.
For line charts with many data points, ask Claude to describe the trend and identify key inflection points rather than requesting every individual value — the model will produce more accurate results when the task matches its visual reasoning capability. For pie charts, Claude returns approximate percentages that sum to 100. For scatter plots, Claude identifies clusters and outliers rather than reading individual coordinates, unless the axes are clearly labeled with numeric scales.
Document and OCR Use Cases
Claude Vision performs high-quality OCR on typed documents, forms, invoices, and receipts. It understands document structure: headings, paragraphs, tables, and form fields are recognized and can be extracted with layout context. For invoices, ask Claude to extract line items, amounts, dates, and vendor information as JSON. For forms, ask it to return all field labels and their filled values as a key-value object.
Handwritten text is supported with reasonable accuracy for clear, printed-style handwriting. Cursive handwriting has lower reliability. For multilingual documents, Claude Vision handles Latin-script languages well; results for non-Latin scripts vary by the proportion of that script in Claude's training data. For medical documents, legal forms, and financial records, always validate extracted data against the original image before using it in automated pipelines.
UI Screenshot Analysis
Claude Vision can describe and analyze user interface screenshots with high accuracy. This is useful for automated QA testing (describe what is visible on screen), accessibility analysis (identify missing alt text or contrast issues), documentation generation (describe UI components for user manuals), and bug reporting (describe the visual state when a bug occurs). Provide screenshots at 1x or 2x (Retina) resolution for best text legibility.
For automated screenshot analysis in CI/CD pipelines, combine Claude Vision with the tool use API: define a tool that returns structured QA results (pass/fail per test assertion) and force Claude to call it for each screenshot. This gives you machine-readable test output that integrates with existing test frameworks. For multi-step UI testing workflows, ClaudFlow provides a visual pipeline designer for chaining screenshot captures with Vision analysis steps.
Multi-Image Requests
A single Claude API request can include up to 20 images. Include multiple type: "image" blocks in the content array, separated by text blocks that reference each image. For example: "Image 1 shows the before state. Image 2 shows the after state. Describe what changed between them." Claude maintains context across all images in the request and can reason about relationships between them.
Multi-image requests are useful for: change detection (before/after comparisons), slide deck analysis (present multiple slides and ask for a summary), product comparison (show multiple product images side by side), and document series (multiple pages of a multi-page document). Token costs add up for large image sets, so calculate total image tokens plus text tokens before estimating request cost. For cost analysis across vision workloads, KickLLM offers per-request cost tracking with image token breakdowns.
Frequently Asked Questions
How do I send images to the Claude API?
Include a content block with type: "image", source.type: "base64", the media_type, and the base64-encoded image data. Alternatively, use source.type: "url" with a direct image URL. Both approaches are shown in the code snippets above.
What image formats does Claude Vision support?
Claude Vision supports JPEG, PNG, GIF, and WebP. Maximum 5MB per image. Up to 20 images per request. Images are resized internally to fit within a 1568x1568 bounding box for processing.
What can Claude Vision analyze in images?
Claude Vision can read and transcribe text (OCR), describe scenes and objects, extract data from charts and graphs, analyze documents and forms, describe UI screenshots, identify products, and answer specific questions about any visible content in the image.
How much does Claude Vision cost?
Images are billed as input tokens. A 1024x1024 image costs approximately 1,600 input tokens with Claude Sonnet. Smaller images use fewer tokens. Calculate cost as approximately (width * height) / 750 tokens for images within size limits.
Can Claude Vision extract data from charts?
Yes. Claude Vision extracts axis labels, data series, legend entries, and numeric values from bar charts, line charts, pie charts, and scatter plots. For best results, prompt Claude to return extracted data as JSON. Accuracy is highest for charts with legible text and clear color differentiation.