Claude API Quickstart Guide — Python, JavaScript & cURL Examples
Go from zero to your first API call in under 10 minutes. This guide covers everything you need to start building with the Anthropic Claude API: authentication, model selection, request formatting, streaming, tool use, vision, error handling, and production best practices.
1. Getting Your API Key
Before you can make any Claude API call, you need an API key from Anthropic. The key authenticates your requests and tracks your usage against your account's billing and rate limits. Here is the step-by-step process to get one.
1 Create an Anthropic account. Go to console.anthropic.com and sign up with your email address or Google account. Verify your email if prompted.
2 Navigate to API Keys. Once logged in, click Settings in the left sidebar, then select API Keys. This page lists all your active keys and lets you create new ones.
3 Create a new key. Click the Create Key button. Give your key a descriptive name (e.g., "quickstart-dev" or "production-backend"). Click Create.
4 Copy and store your key securely. The key will be displayed exactly once. Copy it immediately. It starts with sk-ant-. Store it in a password manager or secrets vault — never commit it to version control.
5 Set it as an environment variable. This keeps your key out of source code. Add this to your shell profile (~/.bashrc, ~/.zshrc, or equivalent):
export ANTHROPIC_API_KEY="sk-ant-api03-your-key-here"
Reload your shell with source ~/.zshrc (or ~/.bashrc) for the change to take effect. On Windows, use setx ANTHROPIC_API_KEY "sk-ant-api03-your-key-here" in a Command Prompt or set it through System Properties.
2. Your First API Call
The Claude API uses the Messages API endpoint. Every request requires three things: a model identifier, a max_tokens limit for the response, and a messages array containing the conversation. Let us walk through your first call in three languages.
Install the SDK
Python (requires Python 3.8+):
pip install anthropic
JavaScript/TypeScript (requires Node.js 18+):
npm install @anthropic-ai/sdk
For cURL, no installation is needed — it comes pre-installed on macOS and most Linux distributions.
Make the request
Python example:
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from env
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
messages=[
{"role": "user", "content": "What is the Claude API and how does it work?"}
]
)
print(message.content[0].text)
JavaScript/TypeScript example:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env
const message = await client.messages.create({
model: "claude-sonnet-4-6-20250516",
max_tokens: 1024,
messages: [
{ role: "user", content: "What is the Claude API and how does it work?" }
],
});
console.log(message.content[0].text);
cURL example:
curl https://api.anthropic.com/v1/messages \
-H "content-type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-6-20250516",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "What is the Claude API and how does it work?"}
]
}'
Understanding the response
A successful API call returns a JSON object with this structure:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "The Claude API is a RESTful interface..."
}
],
"model": "claude-sonnet-4-6-20250516",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 15,
"output_tokens": 247
}
}
Key fields in the response:
content— An array of content blocks. For text responses, each block hastype: "text"and the actual response intext.stop_reason— Why the model stopped generating."end_turn"means the model finished naturally."max_tokens"means it hit your token limit."stop_sequence"means it encountered a custom stop sequence.usage— Token counts for billing.input_tokensis what you sent;output_tokensis what Claude generated.model— The exact model version that processed your request.
3. Interactive API Request Builder
Configure your API request below and get formatted code in cURL, Python, JavaScript, and Go. Adjust the model, parameters, system prompt, and user message to see the request update in real time.
Configure Request
Generated Code
4. Model Selection Guide — Opus vs Sonnet vs Haiku
Anthropic offers three model tiers. Choosing the right one depends on your task complexity, latency requirements, and budget. Here is a detailed comparison to help you decide.
Claude Opus 4.6
- Most capable model
- Best for complex reasoning
- Research & analysis
- Multi-step agentic tasks
- 200K context window
Claude Sonnet 4.6
- Best balance of speed & quality
- Production workhorse
- Code generation
- Content creation
- 200K context window
Claude Haiku 4.5
- Fastest response time
- Lowest cost per token
- Classification & extraction
- Real-time chat
- 200K context window
When to use each model
Choose Opus when you need maximum intelligence and accuracy matters more than speed or cost. Ideal use cases: legal document analysis, scientific reasoning, complex code architecture decisions, writing that requires nuance, and agentic workflows where the model needs to plan multiple steps autonomously. If a wrong answer costs more than the API call, use Opus.
Choose Sonnet for the vast majority of production workloads. It delivers excellent quality at a fraction of Opus pricing and responds significantly faster. Sonnet is the default choice for: chat applications, code generation and review, content writing, summarization, data analysis, and customer support automation. Most developers start and stay with Sonnet.
Choose Haiku when speed and cost are your primary constraints. Haiku excels at high-volume, lower-complexity tasks: text classification, entity extraction, sentiment analysis, simple Q&A, content moderation, and routing decisions. At roughly 4x cheaper than Sonnet, Haiku can process millions of requests economically. It is also the best choice for real-time applications where low latency is critical.
5. Advanced Features
Streaming responses
Streaming delivers response tokens as they are generated rather than waiting for the complete response. This dramatically reduces perceived latency — your users see text appearing in real time instead of waiting seconds for a full response to arrive.
Python streaming:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
messages=[{"role": "user", "content": "Write a haiku about APIs."}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
JavaScript streaming:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const stream = await client.messages.stream({
model: "claude-sonnet-4-6-20250516",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a haiku about APIs." }],
});
for await (const event of stream) {
if (event.type === "content_block_delta" &&
event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
With cURL, add "stream": true to the request body. The response will be Server-Sent Events (SSE) with each event containing a delta of the response content.
System prompts
System prompts set the behavior, tone, and constraints for Claude's responses. They are provided as a top-level system parameter, not inside the messages array. System prompts are powerful for controlling output format, enforcing guardrails, and establishing persona.
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
system="You are a senior Python developer. Respond only with code. No explanations unless asked.",
messages=[
{"role": "user", "content": "Write a function to retry HTTP requests with exponential backoff."}
]
)
System prompts are cached separately from messages, so reusing the same system prompt across multiple requests can benefit from Anthropic's prompt caching and reduce input token costs.
Tool use (function calling)
Tool use lets Claude call external functions you define. You describe tools with JSON schemas, Claude decides when to call them, and you execute the function and return results. This enables Claude to interact with databases, APIs, file systems, and any external service.
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
tools=[
{
"name": "get_weather",
"description": "Get the current weather for a location.",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g., 'San Francisco, CA'"
}
},
"required": ["location"]
}
}
],
messages=[
{"role": "user", "content": "What's the weather like in London?"}
]
)
When Claude wants to use a tool, the response will contain a tool_use content block with the tool name and input arguments. You then execute the function, and send the result back in a tool_result message to continue the conversation.
Vision (image understanding)
Claude can analyze images passed as base64-encoded data or URLs. This enables document analysis, chart reading, screenshot understanding, UI review, and visual question answering. Supported formats: JPEG, PNG, GIF, and WebP.
import base64
with open("chart.png", "rb") as f:
image_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data
}
},
{
"type": "text",
"text": "Describe the trends shown in this chart."
}
]
}
]
)
Images are tokenized based on their dimensions. A typical screenshot uses around 1,000-2,000 tokens. You can send multiple images in a single request by including multiple image content blocks in the messages array.
6. Error Handling
The Claude API returns standard HTTP status codes with detailed error messages. Proper error handling is essential for building reliable applications. Here are the most common errors and how to fix them.
Common error codes
- 400 — Bad Request: Your request body is malformed. Common causes: messages array does not start with a
userrole,max_tokensexceeds the model limit, invalid model name, or empty content blocks. Read the error message carefully — Anthropic provides field-level validation details. - 401 — Unauthorized: Invalid or missing API key. Verify your key starts with
sk-ant-, has not been revoked, and is passed in thex-api-keyheader (notAuthorization: Bearer). - 403 — Forbidden: Your API key does not have permission for the requested resource or model. Check your account's access level in the Anthropic Console.
- 429 — Rate Limited: You have exceeded your requests-per-minute or tokens-per-minute limit. Implement exponential backoff and check the
Retry-Afterheader for the required wait time. - 500 — Internal Server Error: A server-side issue at Anthropic. Retry with exponential backoff. If persistent, check status.anthropic.com.
- 529 — Overloaded: Anthropic's servers are temporarily overloaded. Different from 429 — this means the service itself is under heavy load. Retry with longer backoff intervals.
Error handling in Python
import anthropic
client = anthropic.Anthropic()
try:
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}]
)
except anthropic.AuthenticationError as e:
print(f"Invalid API key: {e}")
except anthropic.RateLimitError as e:
print(f"Rate limited — retry after backoff: {e}")
except anthropic.APIStatusError as e:
print(f"API error {e.status_code}: {e.message}")
except anthropic.APIConnectionError as e:
print(f"Connection error: {e}")
The Python SDK has built-in retry logic. By default, it retries 429 and 5xx errors with exponential backoff up to 2 times. You can configure this with the max_retries parameter when creating the client:
# Retry up to 5 times with exponential backoff
client = anthropic.Anthropic(max_retries=5)
7. Rate Limits & Best Practices
Anthropic enforces rate limits to ensure fair usage and service stability. Understanding these limits and designing around them is critical for production applications.
Rate limit tiers
Your rate limits depend on your account's usage tier, which increases automatically as you spend more on the API:
- Tier 1 (new accounts): 50 requests/min, 40,000 input tokens/min, 8,000 output tokens/min
- Tier 2 ($40+ spend): 1,000 requests/min, 80,000 input tokens/min, 16,000 output tokens/min
- Tier 3 ($200+ spend): 2,000 requests/min, 160,000 input tokens/min, 32,000 output tokens/min
- Tier 4 ($2,000+ spend): 4,000 requests/min, 400,000 input tokens/min, 80,000 output tokens/min
Check your current tier and limits in the Anthropic Console under Settings > Limits.
Reading rate limit headers
Every API response includes rate limit headers that tell you exactly where you stand:
x-ratelimit-limit-requests: 50 # Your max requests per minute
x-ratelimit-limit-tokens: 40000 # Your max tokens per minute
x-ratelimit-remaining-requests: 47 # Requests remaining in this window
x-ratelimit-remaining-tokens: 38500 # Tokens remaining in this window
x-ratelimit-reset-requests: 2026-05-16T12:00:30Z # When requests reset
x-ratelimit-reset-tokens: 2026-05-16T12:00:15Z # When tokens reset
Best practices for rate limit management
- Implement exponential backoff — When you receive a 429, wait 1 second, then 2, then 4, up to a maximum of 60 seconds. Add random jitter (0-1 second) to prevent thundering herd effects.
- Monitor remaining capacity — Read the
x-ratelimit-remaining-*headers proactively. Throttle requests before hitting limits rather than reacting to 429s. - Use token-aware batching — Group small requests together and spread large requests over time. A single 100K-token request consumes more rate limit capacity than ten 10K-token requests.
- Cache identical requests — If multiple users send the same prompt, cache the response and reuse it. This is especially effective for common questions and system prompt outputs.
- Use the Batch API for non-urgent work — Anthropic's Message Batches API processes requests asynchronously at 50% cost with no rate limits. Ideal for data processing, evaluations, and offline analysis.
8. Building Production Apps
Moving from a prototype to a production Claude integration requires careful attention to reliability, cost management, and observability. Here are the key patterns every production app should implement.
Retry logic with exponential backoff
Production applications must handle transient errors gracefully. Here is a robust retry implementation:
import anthropic
import time
import random
def call_claude_with_retry(prompt, max_retries=5):
client = anthropic.Anthropic()
for attempt in range(max_retries):
try:
return client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
except anthropic.RateLimitError:
if attempt == max_retries - 1:
raise
wait = (2 ** attempt) + random.uniform(0, 1)
time.sleep(wait)
except anthropic.APIStatusError as e:
if e.status_code >= 500:
if attempt == max_retries - 1:
raise
wait = (2 ** attempt) + random.uniform(0, 1)
time.sleep(wait)
else:
raise # Don't retry client errors (4xx)
anthropic.Anthropic(max_retries=5) instead of writing your own retry loop in most cases. The example above is useful when you need custom retry behavior or logging.
Token budgeting and cost control
Uncontrolled token usage is the most common source of unexpected API bills. Implement these safeguards:
- Set
max_tokensappropriately — Do not use the maximum value unless you need it. A classification task might only need 10-50 tokens. A summary might need 500. Match the limit to the expected output size. - Track usage per request — Log
usage.input_tokensandusage.output_tokensfrom every response. Aggregate by user, feature, or endpoint. - Implement spending alerts — Set up daily and monthly cost thresholds. Alert when spending exceeds expected levels.
- Truncate long inputs — If users can submit arbitrary text, truncate or summarize inputs before sending them to the API. A 200K-token input on Opus costs $3 per request.
- Use prompt caching — Anthropic caches repeated system prompts automatically in many cases. Structure your prompts so the static parts (system prompt, few-shot examples) are consistent across requests.
Structured output and validation
For applications that need structured data (JSON, CSV, specific formats), use system prompts and output validation:
import json
message = client.messages.create(
model="claude-sonnet-4-6-20250516",
max_tokens=1024,
system="You are a data extraction API. Always respond with valid JSON matching the requested schema. Never include explanations outside the JSON.",
messages=[
{"role": "user", "content": "Extract the name, email, and company from this text: 'Contact Jane Smith at jane@acme.com, she works at Acme Corp.'"}
]
)
# Parse and validate
try:
data = json.loads(message.content[0].text)
assert "name" in data and "email" in data
except (json.JSONDecodeError, AssertionError):
# Handle malformed output — retry or fall back
pass
Observability and logging
In production, you need visibility into every API call. Log these fields for each request:
- Request timestamp, model, and parameters
- Response latency (time to first token and total time)
- Token usage (input and output)
- Stop reason (end_turn, max_tokens, stop_sequence)
- Error codes and retry counts
- Cost per request (calculated from model pricing and token usage)
Many teams use structured logging with tools like Datadog, Grafana, or simple JSON log files. The key is to have enough data to debug issues and optimize costs after deployment.
Security considerations
- Never expose your API key to the client — Always proxy API calls through your backend. Embedding the key in frontend JavaScript exposes it to anyone who inspects your page source.
- Sanitize user inputs — While Claude is resistant to prompt injection, you should still validate and sanitize user inputs before including them in API calls.
- Implement output filtering — For user-facing applications, check Claude's outputs against your content policies before displaying them.
- Use separate API keys per environment — Create distinct keys for development, staging, and production. If a key is compromised, you can revoke it without affecting other environments.
9. Frequently Asked Questions
ANTHROPIC_API_KEY. New accounts receive free credits to experiment. See Section 1 for the full step-by-step process."stream": true to your request body (cURL) or use the SDK's streaming methods: client.messages.stream() in Python and client.messages.stream() in JavaScript. Streaming delivers tokens via Server-Sent Events as they are generated, dramatically reducing time-to-first-token. See Section 5 for full streaming examples in Python and JavaScript.x-ratelimit-remaining-* headers so you can monitor usage proactively. See Section 7 for detailed tier breakdowns.type: "image". Supported formats include JPEG, PNG, GIF, and WebP. Claude can also process PDF documents. Use cases include document analysis, chart reading, screenshot understanding, UI review, and visual QA. Images are tokenized based on their dimensions, typically using 1,000-2,000 tokens per image. See Section 5 for a complete vision code example.