Semble: Code Search for Agents Uses 98% Fewer Tokens

Open-source tool optimizes code search in large codebases for Claude Code and other agents, dramatically reducing token consumption compared to traditional grep methods.

Are you running an AI coding agent on a large codebase and watching your token bill climb every time it tries to find anything? That is the problem Semble was built for. The tool, which appeared on Hacker News this week as a Show HN from MinishLab, is an open-source code search layer designed specifically for agents like Claude Code. The core claim: semantic code search that uses 98% fewer tokens than grep. On a large monorepo, that number is not a rounding error. It is the difference between an agent session that costs a dollar and one that costs fifty.

How grep breaks agent workflows at scale

The failure mode is easy to recognize once you have seen it. An agent needs to find where a function is defined, or locate every place a particular pattern appears across a large codebase. It calls grep, or an equivalent file-reading loop, and gets back thousands of lines of context. All of that goes into the context window. The model processes it, finds the two relevant lines, and moves on. The other 2,800 or so lines were noise that you paid for.

This is not a bug in the agent. It is a structural problem with how search tools were designed. Grep was built for humans reading a terminal. When a human runs grep, they skim the output in seconds. When an agent runs grep, every character of the output has to pass through the model. The token cost of the search is proportional to the verbosity of the results, not their relevance.

On small projects this barely matters. On a codebase with 500,000 lines of code across hundreds of files, it becomes the dominant cost center in any agentic workflow. Teams using Claude Code or Cursor on enterprise-scale repos have reported this pattern consistently: the agent spends more tokens navigating than it does writing.
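To make the proportionality concrete, here is a minimal sketch that compares a verbose grep-style result with a one-line pointer. It assumes a crude heuristic of about four characters per token rather than a real tokenizer, and the file names, the `get_session` function, and the 2,800-line output are all made up for illustration.

```python
# Rough illustration: token cost follows the size of what a search returns.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: about one token per four characters."""
    return max(1, len(text) // 4)

# Hypothetical grep-style output: 2,800 matching lines, almost all irrelevant.
grep_output = "\n".join(
    f"src/module_{i % 40}.py:{i}:    session = get_session(request)  # context line"
    for i in range(2800)
)

# Hypothetical structured reference: only where the definition lives.
pointer = "auth/session.py:47 def get_session(request)"

grep_tokens = estimate_tokens(grep_output)
pointer_tokens = estimate_tokens(pointer)
reduction = 1 - pointer_tokens / grep_tokens

print(f"grep-style output:  ~{grep_tokens} tokens")
print(f"structured pointer: ~{pointer_tokens} tokens")
print(f"reduction: {reduction:.2%}")
```

The exact percentage here means nothing on its own, since the inputs are invented. The point is only that the cost tracks output length, so a search that returns a pointer costs a small fraction of one that returns every matching line.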

What the HN thread said

The core insight is that agents don't need grep output - they need grep's answer. There's a massive difference between returning 3,000 lines of context and returning "the function is in auth/session.py, line 47."

That comment from the HN thread captures the design philosophy more cleanly than the README does. Semble uses semantic embeddings to find relevant code, then returns structured references instead of raw file contents. The agent gets a pointer, not a dump.

Several commenters pushed back on the 98% figure, asking how it was measured and whether it held across different query types. The MinishLab team responded with specifics: the benchmark ran grep-style searches against a mid-sized Python codebase and measured tokens consumed per search operation. Exact-match queries, structural queries, and semantic queries were all tested. The 98% figure held for semantic queries. Exact string matches came in closer to an 85-90% reduction, which is still significant.

One commenter noted that the real compounding effect happens over an agent session, not a single query. An agent doing 40-60 search operations during a complex task will blow through context limits or hit cost ceilings almost entirely because of search overhead. Semble targets that accumulation.
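The "pointer, not a dump" idea can be sketched in a few lines. This is not Semble's API or implementation: the toy below swaps real semantic embeddings for plain word-count vectors so it runs with the standard library only, and the index contents, `Reference` type, and `search` function are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Reference:
    """What the agent receives: a location, not file contents."""
    path: str
    line: int
    score: float

def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical pre-built index of (path, line, snippet) entries.
INDEX = [
    ("auth/session.py", 47, "def create_session(user): create a login session for the user"),
    ("billing/invoice.py", 12, "def render_invoice(order): format an invoice as pdf"),
    ("auth/tokens.py", 8, "def refresh_token(session): rotate the session token"),
]

def search(query: str, top_k: int = 1) -> list[Reference]:
    """Rank indexed snippets against the query and return only references."""
    q = vectorize(query)
    scored = [
        Reference(path, line, cosine(q, vectorize(snippet)))
        for path, line, snippet in INDEX
    ]
    scored.sort(key=lambda r: r.score, reverse=True)
    return scored[:top_k]

results = search("where is the login session created")
for r in results:
    print(f"{r.path}:{r.line} (score {r.score:.2f})")
```

The design choice worth noticing is in the return type: the snippets are used for ranking but never sent back, so the response stays a few tokens long no matter how large the indexed files are.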

Two perspectives on whether this is worth integrating

Developer A (skeptic): My agents don't search that aggressively. I just structure my prompts to give the relevant files upfront. Why add another dependency?

Developer B: That works when you know which files are relevant. What happens when the agent is exploring? When it's doing a refactor and needs to find all call sites for a function it's never seen before?

Developer A: I pass in the whole file tree and let it reason.

Developer B: On how large a codebase?

Developer A: About 30,000 lines.

Developer B: Try it on 300,000. File tree alone costs you tokens before the agent does anything.

Developer A: Fair. But does semantic search actually return the right results when I need exact matches?

Developer B: That's the right question. Semble isn't replacing grep for exact string matches. It's adding a layer on top so the agent can decide which tool to reach for.

The 98% number and what it scales to

98% reduction in tokens per search operation vs. grep, on semantic queries in MinishLab's benchmarks

Take that figure and run it through a realistic agent session. A developer using Claude Code on a 200,000-line codebase, asking it to complete a non-trivial feature, might trigger 50 search operations during the session. If each grep call returns an average of 800 tokens of context, that is 40,000 tokens of search output. At Claude 3.5 Sonnet pricing (roughly $3 per million input tokens), that is about $0.12 per session in search overhead alone. Marginal, but multiply by a team of 10 each running 5 sessions a day, and you are at $6 a day, or about $150 over a 25-working-day month, most of it spent on grep output the model did not need. At a 98% reduction, that drops to roughly $3 a month.

The savings are real at that scale, but the more important number is context window headroom. Tokens saved on search are tokens available for the actual task. An agent that spends less context on navigation can hold more of the relevant code in context at once, which tends to produce better output. That second-order effect is harder to measure but probably more valuable than the direct cost reduction.
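The session arithmetic is easy to re-run with your own numbers. The sketch below uses the inputs from the scenario above (50 searches, 800 tokens each, $3 per million input tokens, 10 developers, 5 sessions a day); the 25 working days per month is an assumption, and the function names are illustrative.

```python
# Inputs taken from the scenario in the text; replace with your own.
SEARCHES_PER_SESSION = 50
TOKENS_PER_SEARCH = 800
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, approximate
DEVELOPERS = 10
SESSIONS_PER_DEVELOPER_PER_DAY = 5
WORKING_DAYS_PER_MONTH = 25  # ASSUMPTION: not stated in the scenario
REDUCTION = 0.98  # claimed reduction for semantic queries

def search_cost_per_session(searches: int, tokens_per_search: int,
                            price_per_million: float) -> float:
    """Cost in USD of the search output one session feeds to the model."""
    return searches * tokens_per_search * price_per_million / 1_000_000

def team_cost(per_session: float, developers: int,
              sessions_per_day: int, days: int = 1) -> float:
    """Scale a per-session cost to a team over a number of days."""
    return per_session * developers * sessions_per_day * days

per_session = search_cost_per_session(
    SEARCHES_PER_SESSION, TOKENS_PER_SEARCH, PRICE_PER_MILLION_INPUT_TOKENS
)
daily = team_cost(per_session, DEVELOPERS, SESSIONS_PER_DEVELOPER_PER_DAY)
monthly = team_cost(per_session, DEVELOPERS, SESSIONS_PER_DEVELOPER_PER_DAY,
                    WORKING_DAYS_PER_MONTH)
monthly_reduced = monthly * (1 - REDUCTION)

print(f"per session:       ${per_session:.2f}")
print(f"team per day:      ${daily:.2f}")
print(f"team per month:    ${monthly:.2f}")
print(f"after reduction:   ${monthly_reduced:.2f}")
```

This counts each search result once. In practice an agent may resend earlier search output on later turns, which this sketch does not model.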

How to decide if Semble fits your setup

If your codebase is under 50,000 lines and your agents are task-specific with a narrow scope, skip it. The overhead of adding another tool probably outweighs the savings.

If your codebase is 50,000-200,000 lines and you are running agents that do exploratory tasks (refactoring, codebase Q&A, dependency mapping), install it and benchmark one session with and without it. The GitHub repository includes a straightforward integration path for Claude Code via MCP.

If your codebase exceeds 200,000 lines and your agents are doing anything involving cross-file search, this is not optional infrastructure. Grep-based search at that scale will hit context limits, produce degraded outputs, or cost you meaningfully more than semantic search alternatives. Semble is currently the most token-efficient open-source option in this category. The closest alternatives are custom embedding pipelines built on top of tools like Chroma or Qdrant, which require significantly more setup.

If you are running a multi-agent system where several agents search the same codebase in parallel, the compounding effect is even more pronounced. Each agent's search calls stack independently. Semble's index is shared, so the token savings multiply across every agent in the session.

If you are choosing between Cursor and GitHub Copilot based on which handles large codebases better, note that both have their own internal retrieval mechanisms. Semble is most relevant for teams building on raw API access to Claude or similar models, where you control the tooling layer directly.
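The size thresholds above can be written down as a small decision helper. This is only a sketch of the article's rule of thumb: the function name and return labels are hypothetical, and the fallback to "benchmark" for combinations the guidance does not cover is an assumption.

```python
def recommend(lines_of_code: int, exploratory: bool, cross_file_search: bool) -> str:
    """Map codebase size and agent behavior to a rough recommendation."""
    # Over 200,000 lines with cross-file search: treat as required.
    if lines_of_code > 200_000 and cross_file_search:
        return "adopt"
    # Under 50,000 lines with narrow, task-specific agents: not worth it.
    if lines_of_code < 50_000 and not exploratory:
        return "skip"
    # 50,000-200,000 lines with exploratory agents: measure before deciding.
    # ASSUMPTION: every case not covered above also falls back to this.
    return "benchmark"

for loc, exploring, cross_file in [
    (30_000, False, False),
    (120_000, True, True),
    (300_000, False, True),
]:
    print(f"{loc:>7} lines -> {recommend(loc, exploring, cross_file)}")
```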

Back to the token bill question

The developer at the start of this post, watching their token costs climb on a large repo, has a specific answer now. The cost is coming from search verbosity. Grep returns context by the screenful. Agents process context by the token. That mismatch is what Semble addresses, and the 98% figure, while it will vary by codebase and query type, is directionally correct for the use cases where the pain is worst. If you are hitting this problem on a project over 100,000 lines, Semble is currently the most direct path to fixing it without rebuilding your tooling from scratch.