Claude Code's Extended Thinking Output Raises Authenticity Questions

An investigation reveals potential reliability concerns with Claude Code's Extended Thinking feature, questioning whether its output accurately represents the model's reasoning process.

Claude Code's Extended Thinking feature does not show you the model's actual reasoning process. It shows you something that looks like reasoning, which is a different thing entirely.

What Patrick McCanna found in the thinking stream

"The text in Claude Code's Extended Thinking output is not authentic... it's generated after the fact, shaped by the same output pressures as the final response." - Patrick McCanna, patrickmccanna.net

McCanna's investigation into Extended Thinking focuses on a specific claim Anthropic makes about the feature: that the expanded, visible output represents the model reasoning through a problem before answering. His argument is that this framing is misleading. The thinking text is generated as part of the same forward pass, subject to the same token prediction pressures as the final response. There is no separate, unconstrained reasoning phase happening before the answer. What you see is not a window into a prior process. It is the process, shaped by the expectation that a final answer follows.

This matters more for Claude Code than for a chatbot. When a developer watches Extended Thinking output scroll by while the model works through a complex refactoring task, there is a strong implicit signal: the model is showing its work, and the work is real. That signal drives trust in the output. If the thinking stream is not an authentic prior step but rather a co-generated narrative, the trust is built on a shaky foundation.

Visualization of an AI model's token prediction process — How language models generate text, including visible "reasoning" traces

The number that anchors the problem

32K

Extended Thinking token budget available in Claude's API - tokens that generate reasoning traces, not verified internal states

Anthropic allows up to 32,000 tokens for the Extended Thinking budget in the Claude API. That is a large surface area for text that users are likely to treat as diagnostic. If even a fraction of developers are using those traces to debug model behavior, audit decisions, or build confidence before accepting a large code change, then the interpretation gap matters at scale. Cut that budget in half and the thinking text is shorter, but no more authentic. Double it and you get more words, not more insight into what the model is actually doing.

The issue is not the budget size. The issue is what the budget is buying. Developers paying attention to the token cost of Extended Thinking - it adds latency and API spend - deserve a clear answer to the question: what am I paying for? What you are paying for is a plausible-looking account of reasoning, not a verified transcript of it.

Why this happened and why the framing stuck

There is a version of the Extended Thinking problem that is purely a marketing framing issue, and there is a version that is a deeper architectural one. McCanna's post is mostly about the framing, but the architectural point is what makes the framing sticky and hard to correct.

Language models do not have a two-stage process where thinking happens and then output is generated. The entire sequence, including anything labeled "thinking," is token prediction. When Anthropic introduced Extended Thinking, the product frame they chose - visible reasoning, chain-of-thought you can inspect - drew on a long tradition in AI research where scratchpad reasoning demonstrably improves final outputs. That research is real. Chain-of-thought prompting works. The performance improvement from giving models space to "think" before answering is documented and replicable.

But there is a gap between "giving the model space to generate intermediate tokens improves output quality" and "what you are seeing is the model's authentic internal reasoning." The first is a technical result. The second is an interpretability claim. Anthropic slid between those two framings in how Extended Thinking is described, probably because the second framing is what makes the feature feel useful to a developer audience. Nobody gets excited about "we increased the intermediate token budget." People get excited about "you can watch the model think."

The result is that a real performance improvement - Extended Thinking does produce better outputs on complex tasks in many benchmarks - got packaged with an interpretability claim it cannot fully support. That combination is difficult to walk back, because pulling the interpretability framing means admitting the feature is less useful for debugging and auditing than it appeared.

How the thinking trace fails when approving production code changes

The failure mode is specific: a developer uses the thinking trace to decide whether to accept a large, automated code change. They read through the reasoning, see the model correctly identify the risk surface, acknowledge edge cases, and propose a solution. They approve the change. The code ships. The bug the model appeared to reason about is present anyway.

This is not hypothetical. It is the natural outcome of treating a co-generated narrative as an audit trail. The thinking text and the final code share the same optimization target. If the model is going to produce a subtly wrong implementation, it will also produce a subtly wrong justification for that implementation. The reasoning trace does not act as a check on the output. It is downstream of the same process that produced the output.

The class of developers most exposed to this are those running Claude Code in agentic loops with minimal human review, specifically because they adopted Extended Thinking as a substitute for that review. The thinking trace gave them something to point to. It looked like oversight. It was not oversight. Compare this with how Cursor and GitHub Copilot handle code suggestions: neither pretends to show you reasoning, which means users calibrate their trust differently - they read the code, not a story about the code.

What "thinking" means in a transformer architecture

Here is the plain version. A transformer model like Claude generates text one token at a time. Each token is predicted based on everything that came before it - the prompt, the conversation history, and every token the model has already generated in the current response. There is no separate module that thinks and then hands off to a module that speaks. It is one process, start to finish.

Chain-of-thought prompting improves results because generating intermediate tokens gives the model more context for later tokens. If the model writes "let me check whether the input could be null" before writing the code that handles null input, the null-handling code is generated with that prior token in context. That is the mechanism. It is real and useful.

But here is what it is not: it is not the model having a thought and then reporting that thought to you. The text "let me check whether the input could be null" is generated by the same prediction process as the code. It can be wrong. It can be inconsistent with the code. It can confidently describe a reasoning step the model did not "take" in any meaningful sense. The model is not introspecting on its own process and writing it down. It is predicting what a model's written reasoning would look like, and then predicting what code follows from that written reasoning.

This distinction matters for anyone using Extended Thinking as a debugging tool. You are not reading a log file. You are reading generated text that describes a process that does not exist in the form described. Anthropic's own interpretability research on tracing model internals makes this clear: what models do internally and what they say they do are not the same representation. Extended Thinking lives in the "what they say" column, not the "what they do" column.

Diagram representing transformer token generation sequence — Language models generate tokens sequentially - there is no separate reasoning phase

How to calibrate your use of Extended Thinking

Use case	Extended Thinking helps	Extended Thinking does not help	Better alternative
Complex multi-step coding problems	Yes - intermediate tokens improve output quality	Tracing the "real" logic path	Test the output directly
Auditing an agentic code change	No	The trace is not an audit trail	Human code review or static analysis
Debugging unexpected model behavior	Marginally - may surface edge cases in the text	Diagnosing why the model went wrong	Prompt iteration, smaller task scope
Building trust before accepting large refactors	No	The reasoning and the code share the same failure modes	Incremental changes, test coverage
Getting better answers on hard logical questions	Yes - measurably on benchmarks	Understanding how the answer was reached	Verify the answer independently
Using local model alternatives with similar features	Same caveats apply to all chain-of-thought implementations	Any model's visible reasoning is not a verified internal trace	Output validation over reasoning inspection

Extended Thinking produces better code on hard problems. That is real. Treat it as a performance feature, not a transparency feature, and it earns its cost. Treat it as a window into what the model is doing, and you will make worse decisions than if you had read the output directly.