Claude API Request Builder

Configure every parameter visually and generate production-ready code in curl, Python, and JavaScript. Includes real-time token counting and per-request cost estimation.

By Michael Lip · May 25, 2026

Request Configuration

0.3
0.0 = deterministic, 1.0 = creative
1.0
1.0 = consider all tokens, 0.1 = only top 10% probability mass
Input Tokens
~25
Max Output
1,024
Est. Cost
$0.0154

Generated Code

JSON Request Body

What This Tool Does

The Claude API Request Builder is an interactive code generator for the Anthropic Messages API. It takes the guesswork out of constructing API requests by providing a visual form for every configurable parameter. As you adjust the model, system prompt, user message, temperature, max tokens, and top_p values, the tool generates syntactically correct code in three languages simultaneously: curl for terminal testing, Python using the official anthropic SDK, and JavaScript using the @anthropic-ai/sdk package. Every code snippet updates in real time as you modify any field.

The builder also provides a formatted JSON request body preview that shows exactly what will be sent to the /v1/messages endpoint. This is useful for developers who prefer to see the raw payload structure or who are integrating with the API using a language or HTTP client that is not covered by the three built-in output formats. You can copy the JSON body and use it with any HTTP library in any programming language.

Understanding Each Parameter

The model parameter determines which Claude model processes your request. Claude Opus 4 is the most capable model, designed for complex reasoning, research synthesis, and multi-step analysis tasks. Claude Sonnet 4 balances capability and speed, making it the default choice for most production applications including code generation, content creation, and conversational AI. Claude Haiku 3.5 is the fastest and most economical model, optimized for high-volume tasks such as text classification, entity extraction, and real-time chat applications where latency matters more than depth of reasoning.

The system prompt establishes Claude's behavior and personality before the conversation begins. Think of it as background instructions that frame every response. Effective system prompts are specific about output format, tone, and constraints. For example, a system prompt like "You are a senior Python developer. Always include type hints and docstrings. Explain your reasoning before writing code." produces noticeably different results than a generic "You are a helpful assistant." The system prompt is counted as input tokens and affects pricing accordingly.

The temperature parameter controls randomness in token selection. At 0.0, Claude always selects the highest-probability token, producing deterministic and focused output. At 1.0, the selection is more random, producing creative and varied responses. For factual queries, code generation, and data extraction, use 0.0 to 0.3. For creative writing, brainstorming, and conversational applications, use 0.7 to 1.0. The default of 1.0 works for general-purpose use, but most production applications benefit from explicitly setting this value. If you need to compare how different Claude models respond to the same prompt and temperature, LockML provides side-by-side model comparison tools.

The max_tokens parameter sets an upper bound on the number of tokens Claude can generate in its response. One token is approximately four characters of English text, so 1,024 tokens is roughly 750 to 800 words. If Claude reaches the max_tokens limit, the response is truncated and the stop_reason in the response will be "max_tokens" instead of "end_turn". Set this high enough to avoid truncation but low enough to control costs, since you pay for actual tokens generated, not the maximum.

The top_p parameter (nucleus sampling) provides an alternative way to control output diversity. It limits token selection to the smallest set of tokens whose cumulative probability exceeds the threshold. A top_p of 0.9 means Claude considers only the tokens that collectively make up 90% of the probability distribution, discarding the long tail of unlikely tokens. In practice, most developers use temperature alone and leave top_p at the default of 1.0. If you adjust both simultaneously, the effects compound, so it is best to tune one at a time.

How Token Counting Works

The token estimator in this tool uses a rough approximation of one token per four characters of English text. This is a reasonable estimate for typical English prose, but actual token counts can vary based on language, code syntax, whitespace, and special characters. For precise token counting before sending a request, Anthropic provides a token counting endpoint and the Python SDK includes a count_tokens method. The estimator here is designed for quick cost projections, not exact billing calculations.

Input tokens include everything you send to the API: the system prompt, all messages in the conversation, and structural JSON overhead (role labels, array brackets, and so on). The overhead is typically 10 to 20 tokens for a simple single-turn request. Output tokens are the tokens Claude actually generates, bounded by your max_tokens parameter. You are billed for actual output tokens, not the max_tokens value. So if you set max_tokens to 4096 but Claude finishes its response in 200 tokens, you only pay for 200 output tokens. For tracking costs across multiple API calls and models, KickLLM offers comprehensive LLM cost monitoring dashboards.

Cost Estimation Methodology

The cost estimate shown in the builder calculates the worst-case scenario: the full input token count plus the full max_tokens value as output. This gives you a ceiling on what a single request could cost. In practice, most responses use fewer output tokens than the maximum, so actual costs are usually lower than the estimate. For production budgeting, track your actual token usage over a sample period and calculate the average output-to-max ratio for your specific use case.

Current pricing as of May 2026 is: Claude Haiku 3.5 at $0.80 per million input tokens and $4.00 per million output tokens; Claude Sonnet 4 at $3.00 per million input and $15.00 per million output; Claude Opus 4 at $15.00 per million input and $75.00 per million output. Anthropic also offers prompt caching for repeated system prompts at reduced rates, and a Batch API for non-time-sensitive workloads at 50% of standard pricing.

Using the Generated Code

The curl output is designed for terminal testing and CI/CD pipelines. It assumes your API key is stored in the ANTHROPIC_API_KEY environment variable. Copy the command, paste it into your terminal, and you will receive a JSON response from Claude within seconds. The curl format is also useful for sharing reproducible API calls in bug reports, documentation, and team communication.

The Python output uses the official anthropic Python SDK, which handles authentication, retries, and response parsing automatically. Install it with pip install anthropic. The SDK reads the ANTHROPIC_API_KEY environment variable by default, so you do not need to pass the key explicitly. The generated code includes proper method calls with named parameters and produces clean, idiomatic Python that you can drop directly into your project.

The JavaScript output uses the official @anthropic-ai/sdk package. Install it with npm install @anthropic-ai/sdk. The generated code uses ES module imports and async/await syntax compatible with Node.js 18+ and modern bundlers. Like the Python SDK, it reads the ANTHROPIC_API_KEY environment variable automatically. For webhook-driven architectures where API calls are triggered by events, InvokeBot provides webhook management tools that complement this request builder.

Best Practices for API Requests

Start with a specific system prompt. Generic system prompts like "You are a helpful assistant" waste tokens and produce generic responses. Instead, specify the exact role, output format, constraints, and tone. A well-crafted system prompt of 50 to 100 tokens can dramatically improve response quality and reduce the need for follow-up requests, saving both time and money.

Set temperature intentionally. For any task where consistency matters (code generation, data extraction, classification), use temperature 0.0 to 0.2. For tasks where variety is valuable (creative writing, brainstorming, conversation), use 0.7 to 1.0. Avoid the trap of using the default temperature for all tasks. The difference in output quality between a well-tuned temperature and the default is often significant.

Right-size your max_tokens. Setting max_tokens to the model maximum for every request is wasteful if most responses are short. Analyze your typical response length and set max_tokens to 1.5 times that value. This provides headroom for longer responses while keeping cost estimates realistic. For the Claude API playground with more presets and live testing, try the ClaudKit homepage tool.

Use the JSON body preview to debug issues. When you receive unexpected errors from the API, compare the raw JSON body from this tool against the error message. Common issues include misspelled model names, max_tokens exceeding the model limit, and malformed message arrays. The JSON preview makes these issues visible before you send the request. For a comprehensive reference of every API error code and how to fix it, see the ClaudKit API Error Guide.

Frequently Asked Questions

How do I build a Claude API request?

A Claude API request requires a POST to api.anthropic.com/v1/messages with headers for content-type, x-api-key, and anthropic-version. The JSON body needs a model name (e.g., claude-opus-4-20250514), a max_tokens integer, and a messages array with user/assistant role objects. Optional parameters include system prompt, temperature (0-1), and top_p (0-1). Use this visual builder to configure all parameters and generate ready-to-use code in curl, Python, or JavaScript.

What is the difference between temperature and top_p in the Claude API?

Temperature controls randomness in Claude's output: 0.0 produces deterministic responses while 1.0 produces creative, varied responses. Top_p (nucleus sampling) limits token selection to a cumulative probability threshold. A top_p of 0.9 means Claude only considers tokens whose cumulative probability reaches 90%. For most use cases, adjust temperature alone and leave top_p at the default of 1.0. Setting both low produces very predictable output; both high produces very creative output.

How much does a single Claude API request cost?

Cost depends on the model and token count. Claude Haiku 3.5 costs $0.80 per million input tokens and $4 per million output tokens. Claude Sonnet 4 costs $3 per million input and $15 per million output. Claude Opus 4 costs $15 per million input and $75 per million output. A typical 500-token input and 1,024-token output on Sonnet costs about $0.017. Use the built-in cost estimator above to calculate exact costs for your configuration.

How do I estimate token count for a Claude API request?

A rough estimate is 1 token per 4 characters of English text. For precise counts, use Anthropic's tokenizer endpoint or the Python SDK's count_tokens method. This request builder provides real-time token estimates as you type your system prompt and user message, factoring in JSON overhead from the request structure. Input tokens include both the system prompt and user message; output tokens are bounded by your max_tokens parameter.

Can I use the generated code in production?

Yes. The generated code is production-ready and uses the official Anthropic API format and SDK methods. The curl output works in any terminal or CI/CD pipeline. The Python output uses the official anthropic SDK. The JavaScript output uses the @anthropic-ai/sdk package. Replace the placeholder API key with your actual key from console.anthropic.com. Always store your API key in environment variables, never in source code.

Developer and creator of the Zovo Tools network. Building free, privacy-first developer tools that run entirely in the browser. No tracking, no sign-ups, no server-side processing. Open source on GitHub.