How does prompt caching reduce costs?

The system prompt is cached after the first turn. Subsequent turns reuse the cache, reducing input tokens billed. Only the first turn pays full system prompt cost.

What is the cache TTL?

Cache entries expire after 5 minutes of inactivity. Each new request resets the TTL.

What is the minimum cacheable size?

The system prompt must be at least 1024 tokens to be eligible for caching.

Claude Prompt Caching Calculator — Estimate Cache Savings

System prompt tokens

Conversation turns

Avg user message tokens

Avg assistant response tokens

Model

Total tokens (no cache)

—

Total tokens (with cache)

—

Cost without cache

—

Cost with cache

—

Savings

—

Savings %

—

📋 Cost breakdown per turn

Turn	Input cached?	Input tokens	Output tokens	Cost (USD)

💻 Enable prompt caching (API example)

// Claude API request with cache control (system prompt)
const response = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: { "x-api-key": "sk-...", "anthropic-version": "2023-06-01" },
  body: JSON.stringify({
    model: "claude-sonnet-4-20250514",
    system: [
      { "type": "text", "text": "You are a helpful assistant.", "cache_control": { "type": "ephemeral" } }
    ],
    messages: [{ "role": "user", "content": "Hello" }]
  })
});
      

⏳ Cache TTL 5 minutes inactivity

📏 Min. cacheable size 1024 tokens (system prompt)

⚡ First turn full system prompt cost

🔁 Subsequent turns cached system prompt (reduced cost)

* Caching only applies to system prompt. User & assistant tokens are billed normally each turn.