Semble: Code Search for Agents Uses 98% Fewer Tokens
Open-source tool optimizes code search in large codebases for Claude Code and other agents, dramatically reducing token consumption compared to traditional grep methods.
May 19, 2026

How grep breaks agent workflows at scale
The failure mode is predictable once you have seen it. An agent needs to find where a function is defined, or locate every place a particular pattern appears across a large codebase. It calls grep, or an equivalent file-reading loop, and gets back thousands of lines of context. All of that goes into the context window. The model processes it, finds the two relevant lines, and moves on. The other 2,800 lines were noise that you paid for.
This is not a bug in the agent. It is a structural problem with how search tools were designed. Grep was built for humans reading a terminal: a human skims the output in seconds, while an agent has to process every character of it through the model. The token cost of a search is proportional to the verbosity of its results, not their relevance.
On small projects this barely matters. On a codebase with 500,000 lines of code across hundreds of files, it becomes the dominant cost center in any agentic workflow. Teams using Claude Code or Cursor on enterprise-scale repos have reported this pattern consistently: the agent spends more tokens navigating than it does writing.
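To make that proportionality concrete, here is a back-of-the-envelope sketch in Python. It assumes roughly 4 characters per token (a common rule of thumb, not how any particular model actually tokenizes), and the helpers `estimate_tokens`, `grep_style_output`, and `pointer_output` are hypothetical illustrations, not part of Semble or grep. It compares a fake 2,800-line grep dump against a single `path:line` answer.

```python
# Rough token accounting: a grep-style dump vs. a structured pointer.
# Assumption: ~4 characters per token, a rule of thumb only; real tokenizers
# (especially on code) will give different counts.

CHARS_PER_TOKEN = 4  # hypothetical average, not a measured value


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def grep_style_output(n_lines: int, line_len: int = 60) -> str:
    """Fake grep output: n_lines of 'path:lineno: ...' padded to line_len chars."""
    lines = []
    for i in range(n_lines):
        prefix = f"src/module_{i % 50}.py:{i + 1}: "
        lines.append(prefix + "x" * max(0, line_len - len(prefix)))
    return "\n".join(lines)


def pointer_output(path: str, line: int) -> str:
    """Structured answer: only where the thing is, no surrounding code."""
    return f"{path}:{line}"


if __name__ == "__main__":
    dump = grep_style_output(2_800)
    pointer = pointer_output("auth/session.py", 47)
    dump_tokens = estimate_tokens(dump)
    pointer_tokens = estimate_tokens(pointer)
    print(f"grep-style dump: ~{dump_tokens} tokens")
    print(f"pointer:         ~{pointer_tokens} tokens")
    print(f"reduction:       {1 - pointer_tokens / dump_tokens:.2%}")
```

The percentage this prints depends entirely on the made-up inputs and is not a reproduction of MinishLab's benchmark. The point is only that the dump's cost grows with every line returned, while the pointer's cost stays flat.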
What the HN thread said
"The core insight is that agents don't need grep output - they need grep's answer. There's a massive difference between returning 3,000 lines of context and returning 'the function is in auth/session.py, line 47.'"
That comment from the HN thread captures the design philosophy more cleanly than the README does. Semble uses semantic embeddings to find relevant code, then returns structured references instead of raw file contents. The agent gets a pointer, not a dump.
Several commenters pushed back on the 98% figure, asking how it was measured and whether it held across different query types. The MinishLab team responded with specifics: the benchmark ran grep-style searches against a mid-sized Python codebase and measured the tokens consumed per search operation. Exact-match, structural, and semantic queries were all tested. The 98% figure held for semantic queries; exact string matches came in closer to an 85-90% reduction, which is still significant.
One commenter noted that the real compounding effect happens over an agent session, not a single query. An agent doing 40-60 search operations during a complex task will blow through context limits or hit cost ceilings almost entirely because of search overhead. Semble targets that accumulation.
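The pointer-not-dump idea can be sketched in a few dozen lines. This is not Semble's implementation or API: it swaps in a bag-of-words vector with cosine similarity as a stand-in for learned semantic embeddings so the example stays self-contained, and `Chunk`, `PointerIndex`, and `search` are hypothetical names. What it does show is the shape of the output: ranked `path:line` references instead of file contents.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    line: int
    text: str


def tokenize(text: str) -> list[str]:
    # Split camelCase and snake_case identifiers into lowercase word pieces.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: a sparse bag-of-words vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PointerIndex:
    def __init__(self, chunks: list[Chunk]):
        # Embed every chunk once, up front, so queries only embed the query.
        self._entries = [(c, embed(c.text)) for c in chunks]

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), c) for c, vec in self._entries),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Return compact pointers only, never the chunk bodies.
        return [f"{c.path}:{c.line}" for score, c in scored[:k] if score > 0]


if __name__ == "__main__":
    index = PointerIndex([
        Chunk("auth/session.py", 47, "def create_session(user): return Session(user)"),
        Chunk("billing/invoice.py", 12, "def render_invoice(order): ..."),
        Chunk("auth/tokens.py", 8, "def refresh_token(session): ..."),
    ])
    print(index.search("create a session for a user"))
```

A real embedding model would typically also match paraphrases (for example "created" against `create_session`), which this toy tokenizer cannot; the return type is the part that carries over.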
Two perspectives on whether this is worth integrating
Developer A (skeptic): My agents don't search that aggressively. I just structure my prompts to give the relevant files upfront. Why add another dependency?
Developer B: That works when you know which files are relevant. What happens when the agent is exploring? When it's doing a refactor and needs to find all call sites for a function it's never seen before?
Developer A: I pass in the whole file tree and let it reason.
Developer B: On how large a codebase?
Developer A: About 30,000 lines.
Developer B: Try it on 300,000. File tree alone costs you tokens before the agent does anything.
Developer A: Fair. But does semantic search actually return the right results when I need exact matches?
Developer B: That's the right question. Semble isn't replacing grep for exact string matches. It's adding a layer on top so the agent can decide which tool to reach for.
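Developer B's point about letting the agent choose its tool can be illustrated with a simple router. The rules below (quoted strings, bare identifiers, and regex metacharacters go to exact search; everything else goes to semantic search) are a hypothetical heuristic for illustration, not anything Semble ships, and `route` is an invented name.

```python
import re

# A bare symbol such as create_session, MAX_RETRIES, or auth.session.create.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")
# Characters that suggest a regex. "?" and "+" are deliberately left out so
# ordinary questions ("where is X?") still route to semantic search.
REGEX_CHARS = set("^$*[]{}|\\")


def route(query: str) -> str:
    """Pick a search tool for an agent query (illustrative heuristic only)."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'`":
        return "exact"  # explicitly quoted literal
    if IDENTIFIER.match(q):
        return "exact"  # a single symbol name
    if any(ch in REGEX_CHARS for ch in q):
        return "exact"  # looks like a regex pattern
    return "semantic"  # natural-language question


if __name__ == "__main__":
    for q in ['"TODO: remove"', "create_session", r"def \w+_session",
              "where do we validate user sessions?"]:
        print(f"{route(q):<8} <- {q}")
```

An agent with both tools available could instead make this choice itself from the tool descriptions it is given; a hard-coded router is just the simplest way to show the split.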
The 98% number and what it scales to
98% reduction in tokens per search operation vs. grep, on semantic queries in MinishLab's benchmarks.
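To see how the per-search figure scales over a session, the arithmetic below multiplies it out. The 40-60 searches per task and the 85-98% reductions come from the discussion above; the 8,000 tokens per grep-style search and the four-agent case are hypothetical inputs chosen only for illustration, and `session_search_tokens` is an invented helper.

```python
def session_search_tokens(tokens_per_search: int, searches: int,
                          reduction: float = 0.0, agents: int = 1) -> int:
    """Total tokens spent on search across a session.

    reduction is the fractional saving per search (0.98 means 98% fewer tokens).
    agents multiplies the total, assuming every agent runs the same workload.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return round(tokens_per_search * (1.0 - reduction) * searches * agents)


if __name__ == "__main__":
    PER_SEARCH = 8_000  # hypothetical grep-style cost per search, in tokens
    SEARCHES = 50       # within the 40-60 range cited above
    for label, reduction in [("grep baseline", 0.0),
                             ("exact match, 85%", 0.85),
                             ("exact match, 90%", 0.90),
                             ("semantic, 98%", 0.98)]:
        one = session_search_tokens(PER_SEARCH, SEARCHES, reduction)
        four = session_search_tokens(PER_SEARCH, SEARCHES, reduction, agents=4)
        print(f"{label:<18} 1 agent: {one:>9,}   4 agents: {four:>9,}")
```

The gap between the rows widens linearly with both the number of searches and the number of agents, which is the accumulation the HN commenter described.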
How to decide if Semble fits your setup
If your codebase is under 50,000 lines and your agents are task-specific with a narrow scope, skip it. The overhead of adding another tool probably outweighs the savings.
If your codebase is 50,000-200,000 lines and you are running agents that do exploratory tasks (refactoring, codebase Q&A, dependency mapping), install it and benchmark one session with and without it. The GitHub repository includes a straightforward integration path for Claude Code via MCP.
If your codebase exceeds 200,000 lines and your agents do anything involving cross-file search, this is not optional infrastructure. Grep-based search at that scale will hit context limits, degrade output quality, or cost meaningfully more than semantic search alternatives. Semble is currently the most token-efficient open-source option in this category. The closest alternatives are custom embedding pipelines built on tools like Chroma or Qdrant, which require significantly more setup.
If you are running a multi-agent system where several agents search the same codebase in parallel, the compounding effect is even more pronounced. Each agent's search calls stack independently. Semble's index is shared, so the token savings multiply across every agent in the session.
If you are comparing Cursor and GitHub Copilot on how well they handle large codebases, note that both have their own internal retrieval mechanisms. Semble is most relevant for teams building on top of raw API access to Claude or similar models, where you control the tooling layer directly.
Back to the token bill question
The developer at the start of this post, watching their token costs climb on a large repo, has a specific answer now. The cost is coming from search verbosity. Grep returns context by the screenful. Agents process context by the token. That mismatch is what Semble addresses, and the 98% figure, while it will vary by codebase and query type, is directionally correct for the use cases where the pain is worst. If you are hitting this problem on a project over 100,000 lines, Semble is currently the most direct path to fixing it without rebuilding your tooling from scratch.
Some links in this article are affiliate links.