Claude API Quickstart Guide — Python, JavaScript & cURL Examples

Go from zero to your first API call in under 10 minutes. This guide covers everything you need to start building with the Anthropic Claude API: authentication, model selection, request formatting, streaming, tool use, vision, error handling, and production best practices.

By Michael Lip · May 16, 2026 · 15 min read

1. Getting Your API Key

Before you can make any Claude API call, you need an API key from Anthropic. The key authenticates your requests and tracks your usage against your account's billing and rate limits. Here is the step-by-step process to get one.

1. Create an Anthropic account. Go to console.anthropic.com and sign up with your email address or Google account. Verify your email if prompted.

2. Navigate to API Keys. Once logged in, click Settings in the left sidebar, then select API Keys. This page lists all your active keys and lets you create new ones.

3. Create a new key. Click the Create Key button. Give your key a descriptive name (e.g., "quickstart-dev" or "production-backend"). Click Create.

4. Copy and store your key securely. The key will be displayed exactly once. Copy it immediately. It starts with sk-ant-. Store it in a password manager or secrets vault, and never commit it to version control.

5. Set it as an environment variable. This keeps your key out of source code. Add this to your shell profile (~/.bashrc, ~/.zshrc, or equivalent):

export ANTHROPIC_API_KEY="sk-ant-api03-your-key-here"

Reload your shell with source ~/.zshrc (or ~/.bashrc) for the change to take effect. On Windows, use setx ANTHROPIC_API_KEY "sk-ant-api03-your-key-here" in a Command Prompt or set it through System Properties.
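
As a minimal sketch of how an application might read the key at startup, the helper below fails fast when the variable is missing or does not look like an Anthropic key. The function name load_api_key is hypothetical; the sk-ant- prefix check simply follows the key format described above.

```python
import os


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    if not key.startswith("sk-ant-"):
        raise RuntimeError("ANTHROPIC_API_KEY does not start with sk-ant-")
    return key
```

Calling load_api_key() once at startup surfaces a configuration problem immediately, rather than as a 401 on the first request.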

Tip: New Anthropic accounts receive free credits to experiment with the API. Check your usage and billing tier at console.anthropic.com under the Plans & Billing section.

2. Your First API Call

The Claude API uses the Messages API endpoint. Every request requires three things: a model identifier, a max_tokens limit for the response, and a messages array containing the conversation. Let us walk through your first call in three languages.

Install the SDK

Python (requires Python 3.8+):

pip install anthropic

JavaScript/TypeScript (requires Node.js 18+):

npm install @anthropic-ai/sdk

For cURL, no installation is needed — it comes pre-installed on macOS and most Linux distributions.

Make the request

Python example:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the Claude API and how does it work?"}
    ]
)

print(message.content[0].text)

JavaScript/TypeScript example:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();  // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
    model: "claude-sonnet-4-6-20250516",
    max_tokens: 1024,
    messages: [
        { role: "user", content: "What is the Claude API and how does it work?" }
    ],
});

console.log(message.content[0].text);

cURL example:

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6-20250516",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the Claude API and how does it work?"}
    ]
  }'

Understanding the response

A successful API call returns a JSON object with this structure:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The Claude API is a RESTful interface..."
    }
  ],
  "model": "claude-sonnet-4-6-20250516",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 15,
    "output_tokens": 247
  }
}

Key fields in the response:

content — An array of content blocks. For text responses, each block has type: "text" and the actual response in text.
stop_reason — Why the model stopped generating. "end_turn" means the model finished naturally. "max_tokens" means it hit your token limit. "stop_sequence" means it encountered a custom stop sequence. "tool_use" means the model is requesting a tool call (see Section 5).
usage — Token counts for billing. input_tokens is what you sent; output_tokens is what Claude generated.
model — The exact model version that processed your request.
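
The fields above can be read straight from the parsed JSON. The sketch below works on a trimmed copy of the sample response; summarize_response is a hypothetical helper name, not part of any SDK.

```python
import json

# Trimmed copy of the sample response shown above.
raw = """
{
  "content": [{"type": "text", "text": "The Claude API is a RESTful interface..."}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 15, "output_tokens": 247}
}
"""


def summarize_response(body):
    """Join all text blocks and flag output cut off by max_tokens."""
    text = "".join(b["text"] for b in body["content"] if b["type"] == "text")
    usage = body["usage"]
    return {
        "text": text,
        "truncated": body["stop_reason"] == "max_tokens",
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


summary = summarize_response(json.loads(raw))
print(summary["truncated"], summary["total_tokens"])  # prints: False 262
```

Checking for "max_tokens" before using the text is a cheap way to catch responses that were cut off mid-sentence.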

Tip: Try the ClaudKit Playground to experiment with different parameters and see responses in real time before writing any code.

3. Interactive API Request Builder

Configure your API request below and get formatted code in cURL, Python, JavaScript, and Go. Adjust the model, parameters, system prompt, and user message to see the request update in real time.


4. Model Selection Guide — Opus vs Sonnet vs Haiku

Anthropic offers three model tiers. Choosing the right one depends on your task complexity, latency requirements, and budget. Here is a detailed comparison to help you decide.

Claude Opus 4.6

$15 / $75 per 1M tokens (in/out)

Most capable model
Best for complex reasoning
Research & analysis
Multi-step agentic tasks
200K context window

Claude Sonnet 4.6

$3 / $15 per 1M tokens (in/out)

Best balance of speed & quality
Production workhorse
Code generation
Content creation
200K context window

Claude Haiku 4.5

$0.80 / $4 per 1M tokens (in/out)

Fastest response time
Lowest cost per token
Classification & extraction
Real-time chat
200K context window

When to use each model

Choose Opus when you need maximum intelligence and accuracy matters more than speed or cost. Ideal use cases: legal document analysis, scientific reasoning, complex code architecture decisions, writing that requires nuance, and agentic workflows where the model needs to plan multiple steps autonomously. If a wrong answer costs more than the API call, use Opus.

Choose Sonnet for the vast majority of production workloads. It delivers excellent quality at a fraction of Opus pricing and responds significantly faster. Sonnet is the default choice for: chat applications, code generation and review, content writing, summarization, data analysis, and customer support automation. Most developers start and stay with Sonnet.

Choose Haiku when speed and cost are your primary constraints. Haiku excels at high-volume, lower-complexity tasks: text classification, entity extraction, sentiment analysis, simple Q&A, content moderation, and routing decisions. At roughly 4x cheaper than Sonnet, Haiku can process millions of requests economically. It is also the best choice for real-time applications where low latency is critical.

Pro tip: Many production systems use a model cascade — route simple queries to Haiku for speed, medium queries to Sonnet, and only escalate complex queries to Opus. This optimizes both cost and user experience.
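
A cascade can start as a very small routing function. The sketch below is a toy heuristic under stated assumptions: the 30-word threshold and the needs_planning flag are invented for illustration, and a real router would more likely use a classifier or task metadata.

```python
def pick_tier(prompt, needs_planning=False):
    """Return "haiku", "sonnet", or "opus" for a request (toy heuristic)."""
    if needs_planning:
        return "opus"    # multi-step or high-stakes work
    if len(prompt.split()) <= 30:
        return "haiku"   # short, simple queries
    return "sonnet"      # everything else


print(pick_tier("Is this review positive or negative?"))  # prints: haiku
```

The returned tier name would then be mapped to whichever model identifier your account uses for that tier.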

5. Advanced Features

Streaming responses

Streaming delivers response tokens as they are generated rather than waiting for the complete response. This dramatically reduces perceived latency — your users see text appearing in real time instead of waiting seconds for a full response to arrive.

Python streaming:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

JavaScript streaming:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = await client.messages.stream({
    model: "claude-sonnet-4-6-20250516",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Write a haiku about APIs." }],
});

for await (const event of stream) {
    if (event.type === "content_block_delta" &&
        event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
    }
}

With cURL, add "stream": true to the request body. The response will be Server-Sent Events (SSE) with each event containing a delta of the response content.
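
If you consume the raw SSE stream yourself instead of using an SDK, the text arrives in content_block_delta events. The sketch below parses a few hand-written sample lines in that shape; the sample text and the collect_text helper are illustrative, not captured output.

```python
import json

# Hand-written sample lines in the SSE wire format.
sample_stream = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
]


def collect_text(lines):
    """Concatenate text deltas from an iterable of SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta["text"])
    return "".join(parts)


print(collect_text(sample_stream))  # prints: Hello, world
```

In a real client the lines would come from the HTTP response body as it streams in, and each delta would be displayed as soon as it is parsed.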

System prompts

System prompts set the behavior, tone, and constraints for Claude's responses. They are provided as a top-level system parameter, not inside the messages array. System prompts are powerful for controlling output format, enforcing guardrails, and establishing persona.

message = client.messages.create(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    system="You are a senior Python developer. Respond only with code. No explanations unless asked.",
    messages=[
        {"role": "user", "content": "Write a function to retry HTTP requests with exponential backoff."}
    ]
)

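Reusing the same system prompt across many requests is a good fit for Anthropic's prompt caching, which bills a cached prefix at a reduced input rate. Caching is typically opt-in: you mark the end of the static prefix with a cache_control breakpoint, and the prefix must be identical across requests and long enough to meet the model's minimum cacheable length.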
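
As a sketch of what a cache-friendly request might look like, the helper below passes the system prompt as a list of text blocks and marks it with a cache_control breakpoint, which is how Anthropic's prompt caching feature is enabled. build_cached_request is a hypothetical helper, and the model ID is the one used elsewhere in this guide.

```python
def build_cached_request(system_text, user_text):
    """Build Messages API arguments with a cache breakpoint on the system prompt."""
    return {
        "model": "claude-sonnet-4-6-20250516",  # model ID used elsewhere in this guide
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Marks the end of the cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


request = build_cached_request("You are a senior Python developer.", "Review this function.")
# With a configured client: client.messages.create(**request)
```

Only the user message changes between calls, so every request shares the same prefix up to the breakpoint.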

Tool use (function calling)

Tool use lets Claude call external functions you define. You describe tools with JSON schemas, Claude decides when to call them, and you execute the function and return results. This enables Claude to interact with databases, APIs, file systems, and any external service.

message = client.messages.create(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., 'San Francisco, CA'"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather like in London?"}
    ]
)

When Claude wants to use a tool, the response has a stop_reason of "tool_use" and contains a tool_use content block with an id, the tool name, and the input arguments. You then execute the function and send the result back as a tool_result content block in a new user message, referencing that id through tool_use_id, to continue the conversation.
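
The second half of that round trip can be sketched as follows. The get_weather body, the TOOLS registry, and the toolu_example id are all hypothetical stand-ins; content blocks are shown as plain dicts, whereas the Python SDK returns objects with the same field names as attributes.

```python
def get_weather(location):
    """Stand-in for a real weather lookup (hypothetical data)."""
    return f"15 C and cloudy in {location}"


TOOLS = {"get_weather": get_weather}


def build_tool_results(content_blocks):
    """Run each requested tool and return the follow-up user message."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # ignore text blocks
        output = TOOLS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use block
            "content": output,
        })
    return {"role": "user", "content": results}


# Example assistant content, shaped like a tool_use response.
assistant_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"location": "London"}},
]
follow_up = build_tool_results(assistant_content)
```

To continue the conversation, append the assistant's content and then follow_up to the messages list and call the API again with the same tools.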

Vision (image understanding)

Claude can analyze images passed as base64-encoded data or URLs. This enables document analysis, chart reading, screenshot understanding, UI review, and visual question answering. Supported formats: JPEG, PNG, GIF, and WebP.

import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

with open("chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe the trends shown in this chart."
                }
            ]
        }
    ]
)

Images are tokenized based on their dimensions. A typical screenshot uses around 1,000-2,000 tokens. You can send multiple images in a single request by including multiple image content blocks in the same message's content array.
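
A multi-image message is just a longer content list. The sketch below builds one from raw bytes; image_block and compare_images_message are hypothetical helper names, and the placeholder bytes stand in for real files.

```python
import base64


def image_block(data, media_type):
    """Wrap raw image bytes as a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }


def compare_images_message(images, question):
    """Build one user message with several images followed by a question."""
    content = [image_block(data, media_type) for data, media_type in images]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


# Placeholder bytes stand in for real files read with open(path, "rb").
msg = compare_images_message(
    [(b"first-image-bytes", "image/png"), (b"second-image-bytes", "image/jpeg")],
    "What changed between these two charts?",
)
```

The resulting msg can be passed as the single entry in the messages list of a normal request.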

6. Error Handling

The Claude API returns standard HTTP status codes with detailed error messages. Proper error handling is essential for building reliable applications. Here are the most common errors and how to fix them.

Common error codes

400 — Bad Request: Your request body is malformed. Common causes: messages array does not start with a user role, max_tokens exceeds the model limit, invalid model name, or empty content blocks. Read the error message carefully — Anthropic provides field-level validation details.
401 — Unauthorized: Invalid or missing API key. Verify your key starts with sk-ant-, has not been revoked, and is passed in the x-api-key header (not Authorization: Bearer).
403 — Forbidden: Your API key does not have permission for the requested resource or model. Check your account's access level in the Anthropic Console.
429 — Rate Limited: You have exceeded your requests-per-minute or tokens-per-minute limit. Implement exponential backoff and check the Retry-After header for the required wait time.
500 — Internal Server Error: A server-side issue at Anthropic. Retry with exponential backoff. If persistent, check status.anthropic.com.
529 — Overloaded: Anthropic's servers are temporarily overloaded. Different from 429 — this means the service itself is under heavy load. Retry with longer backoff intervals.
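
The list above splits into errors worth retrying and errors that need a code or configuration fix. A one-line predicate captures the split; is_retryable is a hypothetical helper name.

```python
def is_retryable(status_code):
    """Return True for errors that may succeed on a later attempt."""
    return status_code == 429 or status_code >= 500  # 5xx includes 529


for code in (400, 401, 403, 429, 500, 529):
    print(code, "retry" if is_retryable(code) else "fix the request")
```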

Error handling in Python

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6-20250516",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except anthropic.RateLimitError as e:
    print(f"Rate limited — retry after backoff: {e}")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")

The Python SDK has built-in retry logic. By default, it retries 429 and 5xx errors with exponential backoff up to 2 times. You can configure this with the max_retries parameter when creating the client:

# Retry up to 5 times with exponential backoff
client = anthropic.Anthropic(max_retries=5)

Deep dive: For a complete reference of every error code with community-reported causes and fixes, see the Claude API Error Guide.

7. Rate Limits & Best Practices

Anthropic enforces rate limits to ensure fair usage and service stability. Understanding these limits and designing around them is critical for production applications.

Rate limit tiers

Your rate limits depend on your account's usage tier, which increases automatically as you spend more on the API:

Tier 1 (new accounts): 50 requests/min, 40,000 input tokens/min, 8,000 output tokens/min
Tier 2 ($40+ spend): 1,000 requests/min, 80,000 input tokens/min, 16,000 output tokens/min
Tier 3 ($200+ spend): 2,000 requests/min, 160,000 input tokens/min, 32,000 output tokens/min
Tier 4 ($2,000+ spend): 4,000 requests/min, 400,000 input tokens/min, 80,000 output tokens/min

Check your current tier and limits in the Anthropic Console under Settings > Limits.

Reading rate limit headers

Every API response includes rate limit headers that tell you exactly where you stand:

anthropic-ratelimit-requests-limit: 50        # Your max requests per minute
anthropic-ratelimit-tokens-limit: 40000       # Your max tokens per minute
anthropic-ratelimit-requests-remaining: 47    # Requests remaining in this window
anthropic-ratelimit-tokens-remaining: 38500   # Tokens remaining in this window
anthropic-ratelimit-requests-reset: 2026-05-16T12:00:30Z  # When requests reset
anthropic-ratelimit-tokens-reset: 2026-05-16T12:00:15Z    # When tokens reset
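
Once you have the header values, acting on them takes little code. The sketch below works on the values only, so it does not depend on the exact header names; seconds_until and should_pause are hypothetical helpers, and the 10% floor is an arbitrary choice.

```python
from datetime import datetime, timezone


def seconds_until(reset_value, now=None):
    """Seconds from now until an RFC 3339 reset timestamp (never negative)."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() rejects a trailing "Z" before Python 3.11.
    reset = datetime.fromisoformat(reset_value.replace("Z", "+00:00"))
    return max(0.0, (reset - now).total_seconds())


def should_pause(remaining, limit, floor=0.1):
    """True when less than `floor` (10% by default) of the budget is left."""
    return remaining < limit * floor


now = datetime(2026, 5, 16, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until("2026-05-16T12:00:30Z", now))  # prints: 30.0
print(should_pause(remaining=47, limit=50))        # prints: False
print(should_pause(remaining=3, limit=50))         # prints: True
```

A client could sleep for seconds_until(...) whenever should_pause(...) returns True, instead of waiting for a 429.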

Best practices for rate limit management

Implement exponential backoff — When you receive a 429, wait 1 second, then 2, then 4, up to a maximum of 60 seconds. Add random jitter (0-1 second) to prevent thundering herd effects.
Monitor remaining capacity — Read the anthropic-ratelimit-*-remaining headers proactively. Throttle requests before hitting limits rather than reacting to 429s.
Use token-aware batching — Group small requests together and spread large requests over time. A single 100K-token request draws the same token budget as ten 10K-token requests but spends it all at once, so large requests are more likely to exhaust a per-minute token limit.
Cache identical requests — If multiple users send the same prompt, cache the response and reuse it. This is especially effective for common questions and system prompt outputs.
Use the Batch API for non-urgent work — Anthropic's Message Batches API processes requests asynchronously at a 50% discount, and batch work is limited separately from the standard Messages API rate limits. Ideal for data processing, evaluations, and offline analysis.
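
The backoff schedule from the first practice above (1, 2, 4 seconds and so on, capped at 60, plus up to one second of jitter) can be written as a small function. backoff_delay is a hypothetical name, and the base and cap defaults mirror the numbers in the text.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Delay before retry `attempt` (0-based): base * 2^attempt, capped, plus jitter."""
    return min(base * (2 ** attempt), cap) + rng.uniform(0, 1)


# Delays without jitter: doubling from 1 second, then capped at 60.
schedule = [min(2 ** n, 60) for n in range(8)]
print(schedule)  # prints: [1, 2, 4, 8, 16, 32, 60, 60]
```

The jitter term spreads out clients that were rate limited at the same moment, so they do not all retry in the same instant.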

8. Building Production Apps

Moving from a prototype to a production Claude integration requires careful attention to reliability, cost management, and observability. Here are the key patterns every production app should implement.

Retry logic with exponential backoff

Production applications must handle transient errors gracefully. Here is a robust retry implementation:

import anthropic
import time
import random

def call_claude_with_retry(prompt, max_retries=5):
    # Disable the SDK's built-in retries so attempts are not multiplied.
    client = anthropic.Anthropic(max_retries=0)

    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6-20250516",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except (anthropic.APIConnectionError, anthropic.APIStatusError) as e:
            # Connection errors carry no status code; treat them as transient.
            status = getattr(e, "status_code", None)
            retryable = status is None or status == 429 or status >= 500
            if not retryable or attempt == max_retries - 1:
                raise  # Don't retry other client errors (4xx) or the last attempt

        # Exponential backoff capped at 60 seconds, plus random jitter.
        wait = min(2 ** attempt, 60) + random.uniform(0, 1)
        time.sleep(wait)

Note: The official Python SDK already has built-in retry logic for 429 and 5xx errors. Use anthropic.Anthropic(max_retries=5) instead of writing your own retry loop in most cases. The example above is useful when you need custom retry behavior or logging.

Token budgeting and cost control

Uncontrolled token usage is the most common source of unexpected API bills. Implement these safeguards:

Set max_tokens appropriately — Do not use the maximum value unless you need it. A classification task might only need 10-50 tokens. A summary might need 500. Match the limit to the expected output size.
Track usage per request — Log usage.input_tokens and usage.output_tokens from every response. Aggregate by user, feature, or endpoint.
Implement spending alerts — Set up daily and monthly cost thresholds. Alert when spending exceeds expected levels.
Truncate long inputs — If users can submit arbitrary text, truncate or summarize inputs before sending them to the API. A 200K-token input on Opus costs $3 per request.
Use prompt caching — Cached prompt prefixes are billed at a reduced input rate. Structure your prompts so the static parts (system prompt, few-shot examples) come first and stay identical across requests, and mark them for caching where the API requires it.
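
Tracking cost per request only needs the usage counts and a price table. The sketch below copies the per-million-token prices quoted in this guide, so update them if pricing changes; PRICES and estimate_cost are hypothetical names.

```python
# USD per million tokens (input, output), copied from the pricing in this guide.
PRICES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}


def estimate_cost(tier, input_tokens, output_tokens):
    """Estimated cost in USD for one request."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


print(round(estimate_cost("sonnet", 1_000, 500), 4))  # prints: 0.0105
print(round(estimate_cost("opus", 200_000, 0), 2))    # prints: 3.0
```

The second line reproduces the $3 figure for a 200K-token Opus input mentioned above.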

Structured output and validation

For applications that need structured data (JSON, CSV, specific formats), use system prompts and output validation:

import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6-20250516",
    max_tokens=1024,
    system="You are a data extraction API. Always respond with valid JSON matching the requested schema. Never include explanations outside the JSON.",
    messages=[
        {"role": "user", "content": "Extract the name, email, and company from this text: 'Contact Jane Smith at jane@acme.com, she works at Acme Corp.'"}
    ]
)

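# Parse and validate
try:
    data = json.loads(message.content[0].text)
except json.JSONDecodeError:
    data = None  # Malformed output: retry or fall back

# Check fields explicitly; assert statements are skipped under python -O
if not isinstance(data, dict) or not {"name", "email"} <= data.keys():
    data = None  # Missing fields: retry or fall back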

Observability and logging

In production, you need visibility into every API call. Log these fields for each request:

Request timestamp, model, and parameters
Response latency (time to first token and total time)
Token usage (input and output)
Stop reason (end_turn, max_tokens, stop_sequence, tool_use)
Error codes and retry counts
Cost per request (calculated from model pricing and token usage)

Many teams use structured logging with tools like Datadog, Grafana, or simple JSON log files. The key is to have enough data to debug issues and optimize costs after deployment.
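
One way to capture most of these fields is a single JSON record per request. The sketch below is an assumption about shape, not a standard schema: build_log_record and its field names are hypothetical, and the usage values are taken from the sample response in Section 2.

```python
import json
import time


def build_log_record(model, usage, stop_reason, started_at, finished_at, retries=0):
    """Return one JSON-serializable log entry for a completed request."""
    return {
        "timestamp": started_at,
        "model": model,
        "latency_s": round(finished_at - started_at, 3),
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
        "stop_reason": stop_reason,
        "retries": retries,
    }


start = time.time()
record = build_log_record(
    model="claude-sonnet-4-6-20250516",
    usage={"input_tokens": 15, "output_tokens": 247},
    stop_reason="end_turn",
    started_at=start,
    finished_at=start + 1.25,
)
print(json.dumps(record))
```

Emitting one line of JSON per request keeps the logs easy to aggregate by model, user, or endpoint later.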

Security considerations

Never expose your API key to the client — Always proxy API calls through your backend. Embedding the key in frontend JavaScript exposes it to anyone who inspects your page source.
Sanitize user inputs — Claude has some resistance to prompt injection, but no model is immune, so validate and sanitize user inputs before including them in API calls.
Implement output filtering — For user-facing applications, check Claude's outputs against your content policies before displaying them.
Use separate API keys per environment — Create distinct keys for development, staging, and production. If a key is compromised, you can revoke it without affecting other environments.

9. Frequently Asked Questions

How much does the Claude API cost?

Claude API pricing varies by model. Claude Haiku 4.5 costs $0.80 per million input tokens and $4 per million output tokens — the most economical option. Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens — the best value for most applications. Claude Opus 4.6 costs $15 per million input tokens and $75 per million output tokens — for when maximum quality matters. A typical request with 1,000 input tokens and 500 output tokens on Sonnet costs about $0.01. Use the request builder above to estimate costs for your specific use case.

What is the difference between Claude Opus, Sonnet, and Haiku?

They represent three tiers of capability and cost. Opus 4.6 is the most intelligent model — best for complex reasoning, research, and multi-step agentic tasks. Sonnet 4.6 balances capability and speed, making it the default choice for most production workloads including code generation, content creation, and chat. Haiku 4.5 is the fastest and cheapest model, optimized for high-volume tasks like classification, extraction, and real-time applications. All three share a 200K-token context window.

How do I get a Claude API key?

Sign up at console.anthropic.com, go to Settings > API Keys, click Create Key, name it, and copy it immediately (it is shown only once). Store the key as an environment variable called ANTHROPIC_API_KEY. New accounts receive free credits to experiment. See Section 1 for the full step-by-step process.

Does the Claude API support streaming responses?

Yes. Add "stream": true to your request body (cURL) or use the SDK's streaming method, client.messages.stream(), which is available in both the Python and JavaScript SDKs. Streaming delivers tokens via Server-Sent Events as they are generated, dramatically reducing time-to-first-token. See Section 5 for full streaming examples in Python and JavaScript.

What are Claude API rate limits?

Rate limits depend on your account tier. Tier 1 (new accounts) allows 50 requests per minute and 40,000 input tokens per minute. Limits increase automatically as you spend more, up to 4,000 requests/min and 400,000 input tokens/min at Tier 4. Check your current limits under Settings > Limits in the Anthropic Console. Every response includes anthropic-ratelimit-*-remaining headers so you can monitor usage proactively. See Section 7 for detailed tier breakdowns.

Can the Claude API process images and PDFs?

Yes. Claude supports vision by accepting base64-encoded images or image URLs in the messages content array with type: "image". Supported formats include JPEG, PNG, GIF, and WebP. Claude can also process PDF documents. Use cases include document analysis, chart reading, screenshot understanding, UI review, and visual QA. Images are tokenized based on their dimensions, typically using 1,000-2,000 tokens per image. See Section 5 for a complete vision code example.