CodeBurn: Monitor Claude Code Token Spending by Task

A new open-source tool gives developers granular visibility into token consumption across Claude Code agents, solving cost tracking problems for teams spending $1,400+ weekly on AI-powered coding.

April 20, 2026

Six months ago, running Claude agents cost a few hundred dollars a month for teams doing serious work. Now teams routinely spend $1,400 or more per week, and most of them can't tell you which tasks are driving that number. The spend scaled. The visibility didn't.

CodeBurn is an open-source tool that solves exactly this: it tracks token consumption at the task level instead of the session or API-call level, giving developers the granularity to understand where the money is going and whether it's being spent on work that's worth it.

Why API dashboards aren't enough

Anthropic gives you data. Total tokens this month. Daily usage over time. Cumulative spend. What it doesn't give you is the breakdown you actually need for cost management decisions.

Running Claude as a coding agent is like hiring a contractor who sends one weekly invoice with no line items. You see the total. You have no idea whether the "code review" agent used 8,000 tokens or 80,000. You can't tell whether a batch of 10 tasks was expensive because each was legitimately complex or because something went wrong and an agent looped endlessly before failing.

Without task-level visibility, cost optimization is guesswork. You can try to optimize prompts without knowing which prompts are the expensive ones. You can try to reduce agent usage without knowing which agents are consuming disproportionate resources.

What CodeBurn actually tracks

The tool instruments your Claude Code calls and surfaces four categories of metrics that matter for operational cost management:

  • Token cost per individual task - not per API call, but per logical unit of work you define
  • Breakdown by agent, workflow, or any custom grouping you attach to your requests
  • Input versus output token ratios, which help identify tasks that are ballooning unexpectedly
  • Historical patterns across runs so you can spot cost drift before it becomes a budget problem

The output is comparative data. Task A used 8,000 tokens. Task B used 24,000 tokens for similar work. You can now ask why Task B cost three times as much and investigate with actual numbers rather than assumptions.
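To make the comparison concrete, here is a minimal sketch of the kind of aggregation described above: summing tokens per task and computing an input/output ratio. The record layout and field names are assumptions made for illustration; they are not CodeBurn's actual schema.

```python
from collections import defaultdict

# Hypothetical per-call records; field names are assumptions, not CodeBurn's schema.
records = [
    {"task": "code-review", "agent": "reviewer", "input_tokens": 6000, "output_tokens": 2000},
    {"task": "refactor", "agent": "builder", "input_tokens": 15000, "output_tokens": 3000},
    {"task": "refactor", "agent": "builder", "input_tokens": 5000, "output_tokens": 1000},
]


def summarize(records, key="task"):
    """Sum tokens per group and compute each group's input/output ratio."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for rec in records:
        group = totals[rec[key]]
        group["input_tokens"] += rec["input_tokens"]
        group["output_tokens"] += rec["output_tokens"]

    summary = {}
    for name, t in totals.items():
        total = t["input_tokens"] + t["output_tokens"]
        # Guard against division by zero for groups with no output tokens.
        ratio = t["input_tokens"] / t["output_tokens"] if t["output_tokens"] else None
        summary[name] = {**t, "total_tokens": total, "input_output_ratio": ratio}
    return summary


by_task = summarize(records)
for name, stats in by_task.items():
    print(name, stats["total_tokens"], stats["input_output_ratio"])
```

Passing key="agent" groups the same records by agent instead, which is one way to get the per-agent breakdown from a single set of labeled records.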

# Query token usage broken down by task type
codeburn --group-by task --time-range last-7-days --format json

You can filter by date range, agent name, task type, or any metadata you attached when the request was logged, so reports follow the way your team already structures its Claude usage.
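The same kind of filtering can be sketched in a few lines of Python if you are working with your own logs. This is an illustrative sketch only: the `filter_records` helper, the `timestamp` field, and the label names are hypothetical and do not come from CodeBurn.

```python
from datetime import datetime, timedelta, timezone


def filter_records(records, since=None, **labels):
    """Keep records at or after `since` whose metadata matches every label."""
    kept = []
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        if since is not None and ts < since:
            continue
        if all(rec.get(k) == v for k, v in labels.items()):
            kept.append(rec)
    return kept


# Fixed reference time so the example is reproducible.
now = datetime(2026, 4, 20, tzinfo=timezone.utc)

# Hypothetical logged records with ISO 8601 timestamps.
records = [
    {"timestamp": "2026-04-19T10:00:00+00:00", "agent": "reviewer", "task": "code-review", "total_tokens": 8000},
    {"timestamp": "2026-04-01T10:00:00+00:00", "agent": "reviewer", "task": "code-review", "total_tokens": 9000},
    {"timestamp": "2026-04-18T10:00:00+00:00", "agent": "builder", "task": "refactor", "total_tokens": 24000},
]

# Last seven days, reviewer agent only.
last_week = filter_records(records, since=now - timedelta(days=7), agent="reviewer")
print(last_week)
```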

Why this tool had to be built by the community

The Hacker News discussion around CodeBurn included multiple developers describing internal tracking scripts they'd already built because Anthropic's dashboards weren't sufficient. These weren't people who wanted a polished analytics platform. They were people solving a painful operational problem with whatever they had available.

"We're using Claude Code heavily and our bill is $2000/week. I have no idea if that's reasonable for what we're doing. This solves that problem."

That comment captures the actual value proposition. CodeBurn isn't a competitor to Claude. It's boring observability infrastructure - the kind that should have existed from day one and didn't. The pattern is familiar from other platform maturation stories: database monitoring, infrastructure metrics, API cost tracking. When first-party platforms don't provide enough observability, communities fill the gap. Then the platform eventually builds it in-house, or the community tool becomes essential enough that everyone standardizes on it anyway.

Technical integration

CodeBurn works by injecting itself into your Claude Code workflow. You point it at your API interactions. It captures token counts, request metadata, task identifiers, and any custom labels you've added. Then it aggregates everything into reports segmented by whatever dimensions you care about.

The integration is additive. You don't restructure your agents or rewrite workflows. You add logging. The tool reads the logs. Claude's API already exposes the token usage data - CodeBurn just organizes it into a format that supports actual cost decisions rather than just producing a total.
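As a rough illustration of what "you add logging" can look like, the sketch below writes one JSON line per request. It relies on the `usage` object that Anthropic's Messages API returns with `input_tokens` and `output_tokens`; everything else (the `log_usage` helper, the log layout, the label names) is an assumption for illustration and not CodeBurn's documented format.

```python
import io
import json
from datetime import datetime, timezone


def log_usage(stream, task, usage, **labels):
    """Append one JSON line describing a single request's token usage."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
        **labels,  # custom grouping such as agent or workflow
    }
    stream.write(json.dumps(record) + "\n")
    return record


# An in-memory stream stands in for a log file opened in append mode.
log = io.StringIO()

# `usage` mirrors the usage fields of an API response; values are made up.
log_usage(log, "code-review", {"input_tokens": 6000, "output_tokens": 2000}, agent="reviewer")
print(log.getvalue())
```

One line per request keeps the logging step independent of any particular reporting tool: anything that can read JSON lines can aggregate the records later.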

If you're already running monitoring infrastructure like CloudWatch or Datadog, CodeBurn integrates with those rather than requiring a separate dashboard. That matters for teams that want cost data in the same place as the rest of their operational metrics.

When to implement this and when to skip it

The rough threshold: if you're spending over $500 monthly on Claude agents, task-level cost tracking is likely to pay back its setup time quickly. Below that, the overhead probably isn't worth it, and your intuition about where costs are coming from is probably close enough.

Above $500 monthly - and certainly at $1,400 weekly - not knowing your cost breakdown is expensive in itself. You're optimizing blind. Every prompt refinement, every architectural decision about agent structure, every choice about which tasks to route through Claude versus a cheaper model - all of those decisions get worse without granular cost data.
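For a sense of scale, the two figures above can be put on the same footing with a quick calculation. This sketch assumes a simple 52-weeks-over-12-months average; the helper name is illustrative.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12
THRESHOLD_MONTHLY = 500  # rough threshold from the article, in dollars


def weekly_to_monthly(weekly_spend):
    """Convert a weekly spend into an average monthly figure."""
    return weekly_spend * WEEKS_PER_YEAR / MONTHS_PER_YEAR


monthly = weekly_to_monthly(1400)
multiple = monthly / THRESHOLD_MONTHLY
print(f"${monthly:,.0f} per month, {multiple:.1f}x the threshold")
```

At $1,400 a week the monthly figure works out to roughly $6,067, about twelve times the $500 threshold.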

On ROI

CodeBurn is free and open source. The cost is the setup time. For teams spending serious money on Claude agents, the first week of optimization decisions it enables typically recovers that time immediately.

Verification checklist before implementing

  • Confirm your monthly Claude agent spend exceeds $500 - below that threshold the setup overhead may not be justified
  • Identify the specific agents or workflows you want to track first, so you start with meaningful segmentation rather than one large undifferentiated bucket
  • Decide upfront how you want to structure task labels - consistent labeling is what makes the data actually comparable across runs
  • Check whether your existing monitoring infrastructure (CloudWatch, Datadog, etc.) can receive CodeBurn output so cost data lives alongside your other operational metrics
  • Run CodeBurn for at least two weeks before making optimization decisions - you need enough data to distinguish normal variance from actual cost drivers
  • Set a baseline measurement before any changes, so you can verify whether prompt refinements or architectural adjustments actually reduce costs or just shift them
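The last two checklist items can be sketched as a simple comparison: measure a baseline, then ask whether the post-change average falls outside the baseline's normal variance. The two-standard-deviation cutoff and the sample numbers below are assumptions chosen for illustration, not a recommendation from CodeBurn.

```python
from statistics import mean, stdev


def compare_to_baseline(baseline, after, threshold=2.0):
    """Report whether the post-change mean sits outside normal baseline variance."""
    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # sample standard deviation; needs 2+ data points
    after_mean = mean(after)
    change = after_mean - base_mean
    if base_sd == 0:
        significant = change != 0
    else:
        significant = abs(change) > threshold * base_sd
    return {
        "baseline_mean": base_mean,
        "after_mean": after_mean,
        "change": change,
        "significant": significant,
    }


# Hypothetical daily token totals for one task label.
baseline_days = [24000, 25000, 23000, 24500, 23500]
after_days = [18000, 18500, 17500]

result = compare_to_baseline(baseline_days, after_days)
print(result)
```

Running the same comparison for every task label, not just the one you changed, is a cheap way to check whether a saving in one place simply reappeared as extra spend in another.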

Some links in this article are affiliate links.