Broccoli: Open-Source AI Coding Agent for Cloud Tasks
Broccoli is an open-source harness that automates coding tasks from Linear, executes code in isolated cloud sandboxes, and generates pull requests for review.
April 28, 2026
TL;DR
Broccoli is an open-source coding agent that reads tickets from Linear, executes code in isolated cloud sandboxes, and opens pull requests without a human in the loop. It is worth setting up if your team already uses Linear and wants to automate first-pass implementation on well-scoped tasks. It is not a replacement for a senior engineer's judgment on architecture or ambiguous requirements.
The pattern this resembles
Continuous integration changed how teams thought about the cost of being wrong. Before CI, a broken build was discovered in review or, worse, in production. CI made the feedback loop fast enough that "try it and see" became a reasonable strategy for a certain class of changes.

Automated coding agents are doing something structurally similar to what CI did, but one step earlier in the pipeline. The question CI answered was "does this code work?" The question an agent like Broccoli is trying to answer is "can this ticket become a working first draft without a human starting it?" History suggests this kind of shift happens faster than people expect for small, well-defined tasks and much slower than people expect for anything requiring judgment. CI did not replace code review. It made code review more valuable by filtering out the obvious failures. Coding agents are likely to follow the same arc. They will handle the mechanical work, and that will make the remaining human judgment more visible and more expensive to get wrong.

The closest prior analogy in the agent space is Devin, which launched in early 2024 with significant hype and then landed in a quieter place: useful for some tasks, unreliable for others, and most valuable when the task specification was precise. Broccoli is open-source and narrower in scope, which is probably the right lesson learned from that cycle.

Where teams go sideways with this
The failure mode with one-shot agents is almost always the ticket, not the agent. A ticket that says "fix the login bug" will produce a PR that either does nothing useful or changes the wrong thing. A ticket that says "the password reset email is not sent when a user has a Google OAuth account linked - the send_reset_email function in auth/mailer.py skips the call when provider is not null, remove that condition and add a test" will produce something reviewable (see the sketch at the end of this section). That is not a criticism of Broccoli specifically. It is the central problem with any agent that reads natural language task descriptions. Garbage in, garbage out is more consequential when the output is a diff that touches production code.

The second common mistake is treating the sandbox isolation as a substitute for a proper test suite. The sandbox prevents Broccoli from breaking your production environment, which is important. But if your repository has weak test coverage, the agent has no signal for whether its output is correct. It will produce code that runs without errors and is still wrong in ways that only show up under real conditions. A clean CI run is not the same as correct behavior.

The third mistake is scope creep in ticket writing. Teams often try to fit too much into a single agent task once they see it working on small ones. The "one shot" framing is intentional. Broccoli is not designed to orchestrate a multi-file refactor across subsystems. Use it for contained, well-specified work.
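To make the contrast concrete, here is a minimal sketch of the fix that well-specified ticket describes. Only send_reset_email, auth/mailer.py, and the provider field come from the ticket text; the User model and the mail transport are hypothetical stand-ins, since the example codebase does not exist.

```python
from dataclasses import dataclass


@dataclass
class User:
    email: str
    provider: str | None  # "google" for OAuth-linked accounts, None otherwise


sent: list[str] = []  # stand-in for a real mail transport


def send_reset_email(user: User) -> None:
    # The bug the ticket describes: an early return skipped the send
    # whenever an OAuth provider was linked.
    #
    #     if user.provider is not None:
    #         return  # <- the condition the ticket says to remove
    #
    sent.append(user.email)


def test_reset_email_sent_for_oauth_user() -> None:
    # The test the ticket asks for: an OAuth-linked user still gets the email.
    send_reset_email(User(email="a@example.com", provider="google"))
    assert "a@example.com" in sent
```

The point is not the fix itself. It is that the ticket contained enough detail for the resulting diff to be checked against a stated expectation.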
The skeptic's case

The strongest objection to Broccoli and tools like it is not that they fail - it is that they introduce a new review burden that cancels out the time saved. Here is the argument in its best form: before the agent, a junior developer writes a PR. You review it, leave comments, they revise. The cognitive load is distributed because the junior dev did the first-pass thinking. With an agent, you review a PR that was written by something that has no stake in being right, no context from the standup last Tuesday, and no memory of the last three times someone touched that file. You may spend as much time checking the agent's work as you would have spent writing it yourself - or more, because you cannot ask the agent follow-up questions in the same way.

This objection is real and it is not fully answered yet. The best counter is that agent-generated PRs are most valuable when a human would have otherwise procrastinated on the task for two days before starting. If the choice is "agent PR in four minutes that takes thirty minutes to review" versus "human PR in two days," the agent wins. If the choice is "agent PR in four minutes" versus "experienced developer who already knows the codebase writes it in twenty minutes," the math is less clear.
The mechanism under the hood

Broccoli's core structure is worth understanding because it tells you where the tool will and will not generalize. The Linear integration pulls ticket data and treats it as the task specification. The agent then runs in a cloud sandbox - an isolated environment that has access to your codebase but is separated from your production systems. This is the part that makes "one shot" viable: the agent can run arbitrary code without you worrying about it touching a live database.

The PR generation step is where most of the review value is created. Broccoli is not just running code and dumping a diff. The PR is the artifact that your team reviews, and how it is structured determines whether the review is fast or painful.

The "one shot" framing also means there is no retry loop with human feedback mid-task. The agent makes its best attempt, opens the PR, and stops. This is different from Claude Code's interactive mode or Cursor's in-editor loop, where a developer can nudge the model during the session. Broccoli is closer to a batch job than an interactive assistant. That is a tradeoff: you lose fine-grained control, you gain full automation and the ability to run it unattended.

The cloud sandbox model also sidesteps the local environment problem that plagues most open-source agent setups. You do not need to configure the agent to match your machine. The sandbox is consistent. That alone removes a category of failure that burns time in most self-hosted agent experiments.
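The batch-job shape is easier to see as code. This is a sketch of the control flow described above, not Broccoli's actual API: every name here (fetch_ticket, run_in_sandbox, open_pull_request) is a hypothetical stand-in.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the one-shot pipeline; none of these
# names come from Broccoli's real codebase.


@dataclass
class Ticket:
    id: str
    description: str  # the Linear issue body IS the task spec


@dataclass
class Attempt:
    diff: str
    tests_passed: bool


def fetch_ticket(ticket_id: str) -> Ticket:
    # Stub for the Linear integration.
    return Ticket(id=ticket_id, description="fix password reset for OAuth users")


def run_in_sandbox(ticket: Ticket) -> Attempt:
    # Stub for the cloud sandbox: arbitrary code runs here, isolated from
    # production, with the repo's test suite as the only correctness signal.
    return Attempt(diff="--- a/auth/mailer.py ...", tests_passed=True)


def open_pull_request(attempt: Attempt, ticket: Ticket) -> str:
    # Stub for PR generation: the PR is the artifact the team reviews.
    return f"PR for {ticket.id} (tests passed: {attempt.tests_passed})"


def run_one_shot(ticket_id: str) -> str:
    # One attempt, no mid-task human feedback, then stop.
    ticket = fetch_ticket(ticket_id)
    attempt = run_in_sandbox(ticket)
    return open_pull_request(attempt, ticket)


print(run_one_shot("BRO-123"))
```

Compare that with an interactive loop, where a human message can arrive between any two of those steps. Removing that channel is exactly what makes unattended operation possible.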
On the Linear integration

Linear's ticket format is well-structured enough to serve as agent input, which is part of why this integration makes sense. Jira's more variable formatting would introduce more noise. If your team uses GitHub Issues or Notion for task tracking, Broccoli in its current form is not set up for you.
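For a sense of what "well-structured" means here: Linear exposes issues over a GraphQL API, so the fields an agent would treat as the task spec arrive as clean, typed data. The endpoint and fields below are Linear's real API; whether Broccoli fetches them exactly this way is an assumption.

```python
# Fetching the fields an agent would treat as the task spec. Linear's
# GraphQL endpoint and these fields are real; how Broccoli consumes
# them internally is an assumption.
import os

import requests

QUERY = """
query Issue($id: String!) {
  issue(id: $id) {
    identifier
    title
    description  # markdown body: file paths, acceptance criteria, etc.
  }
}
"""

resp = requests.post(
    "https://api.linear.app/graphql",
    json={"query": QUERY, "variables": {"id": "BRO-123"}},  # issue identifier or UUID
    headers={"Authorization": os.environ["LINEAR_API_KEY"]},  # personal API key
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["issue"])
```

The contrast with Jira is that a Linear description is always a single markdown string, which keeps the parsing step trivial.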
Specific workflows where Broccoli earns its place
If you are triaging a backlog of small, well-defined bugs and the tickets are specific enough to include file paths and expected behavior, Broccoli can clear those faster than any human would prioritize them.

If you are maintaining an open-source library and contributors regularly open issues with clear reproduction steps, you can run Broccoli against those issues to generate candidate fixes before a maintainer looks at them. The maintainer reviews a diff instead of starting from scratch.

If your team writes Linear tickets with acceptance criteria that can be verified by running tests, this is close to the ideal use case. The sandbox runs your test suite, the PR either passes or fails, and your review time is minimal for the passing cases.

If you are doing greenfield architecture work, evaluating design tradeoffs, or working in a codebase where the right answer requires reading six files of context before touching a seventh, skip it. The one-shot constraint is a hard ceiling on the complexity Broccoli can handle well. For that kind of work, an interactive tool like Cursor or GitHub Copilot in an agentic mode will keep you in the loop at the right moments.

If your team does not use Linear, the integration question is a blocker today. That may change if the project adds connectors for other task managers, but as of the current repository state, Linear is the entry point. For teams evaluating open-source agent options, it is also worth comparing Broccoli's one-shot model against Goose, which takes a more interactive approach to agent sessions with broader tool support. They are solving adjacent problems rather than the same one.

The earliest you could run a meaningful test of Broccoli is this week, assuming you have a Linear workspace with a handful of small, specific tickets and a codebase with reasonable test coverage. Set up the sandbox, point it at three or four tickets, review the PRs it produces, and measure the review time against what you would have spent writing the code yourself. That data will tell you faster than any benchmark whether the one-shot tradeoff works for your team's ticket quality.
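If it helps to keep that measurement honest, something this small is enough. The ticket IDs and minutes below are placeholders for your own numbers.

```python
# Back-of-the-envelope tally for the trial above. All numbers are
# placeholders; record your own per-ticket measurements.
trials = [
    # (ticket, minutes reviewing the agent's PR, minutes you'd have spent writing it)
    ("BRO-101", 25, 60),
    ("BRO-102", 40, 45),  # borderline: review nearly cost the saving
    ("BRO-103", 15, 90),
]

review = sum(r for _, r, _ in trials)
write = sum(w for _, _, w in trials)
print(f"review: {review} min vs. writing yourself: {write} min "
      f"({write - review:+d} min saved across {len(trials)} tickets)")
```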