Andon Labs Lets AI Agents Fully Control Radio Stations

Andon Labs conducted an experiment giving AI agents autonomous control of radio stations without human oversight. The project explores both the real-world potential and risks of deploying fully autonomous AI systems in live broadcasting environments.

24 hours of uninterrupted AI-controlled radio broadcasts, zero human interventions. That is what Andon Labs ran with their Andon FM experiment, handing AI agents full autonomous control over programming, music selection, and on-air chatter for entire broadcast cycles with no human in the loop. That number is less impressive as a technical feat and more interesting as a stress test. Radio is not a forgiving medium. Dead air is immediately audible. A bad song transition at 2am on a human-staffed station gets fixed in seconds. A bad AI decision at 2am with no human oversight gets broadcast to whoever is listening, then archived.

The skeptic and the builder

Skeptic: Radio is a solved problem. Why does AI autonomy matter here?

Builder: Because radio is a real-time system with no undo. You cannot roll back a broadcast. The agent has to sequence content, manage transitions, respond to the clock, and handle failure states with no human fallback. That is a better test of autonomous reliability than most chat demos.

Skeptic: But the stakes are low. Nobody cares if an AI radio station plays two sad songs in a row.

Builder: That is exactly the point. You want to observe failure modes where the consequences are low before you deploy the same architecture where they are not. AI-controlled broadcast is a sandboxed version of AI-controlled anything-with-a-real-time-output-stream.

Skeptic: So this is rehearsal infrastructure, not product.

Builder: Yes. That is the honest read.
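Dead air is the clearest case of the builder's point that radio is a real-time system with no undo. Broadcast playout systems commonly include a silence detector that switches to a backup source when the program feed goes quiet. The Python below is a minimal, hypothetical version of that guard. The class name, the 5-second threshold, and the -50 dB floor are illustrative assumptions, not details Andon Labs has published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceWatchdog:
    """Hypothetical dead-air guard for an agent's program feed.

    If the primary feed stays below `silence_floor_db` for longer than
    `max_silence_s`, switch to a pre-vetted fallback source. Switch back
    as soon as the primary feed produces audio again.
    """

    max_silence_s: float = 5.0          # assumed threshold, not an Andon FM value
    silence_floor_db: float = -50.0     # assumed level treated as "silence"
    last_audio_at: Optional[float] = None
    on_fallback: bool = False

    def observe(self, now: float, primary_level_db: float) -> str:
        """Feed one level sample (time in seconds) and get the action to take."""
        if primary_level_db > self.silence_floor_db:
            self.last_audio_at = now
            if self.on_fallback:
                self.on_fallback = False
                return "resume_primary"
            return "ok"
        if self.last_audio_at is None:
            self.last_audio_at = now  # start the silence clock at the first sample
        if not self.on_fallback and now - self.last_audio_at > self.max_silence_s:
            self.on_fallback = True
            return "switch_to_fallback"
        return "on_fallback" if self.on_fallback else "silent"
```

Suppose it receives four samples: audio at 0 s, silence at 3 s, silence at 6 s, and audio at 8 s. It returns `ok`, `silent`, `switch_to_fallback`, and `resume_primary`. The fallback decision comes from a fixed rule, not from the agent reasoning under time pressure. Because of that, the recovery path works even when the agent itself is the thing that stalled.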

What Andon Labs said about why they built this

"We wanted to see what happens when you remove the human from the loop entirely, not just automate individual tasks but give an agent genuine end-to-end ownership of a running system."

That framing is doing a lot of work. The distinction between automating tasks and giving an agent "end-to-end ownership" is the actual research question here. Most AI deployments in 2024 and 2025 are task-level: the human decides what to do, the AI does the specific thing, the human checks the output. Andon FM inverts that. The agent decides what to do, does it, and monitors itself. The risk that surfaces in that inversion is not the agent playing bad music. It is the agent encountering an ambiguous situation - a content rights conflict, a technical failure in the audio pipeline, an unexpected API response - and making a decision that compounds rather than recovers. Human operators handle ambiguity by escalating. Autonomous agents handle it by choosing. Whether those choices are recoverable depends entirely on how the failure states were designed, and most failure state design is discovered by breaking things in production.
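One way to keep an autonomous agent's choices recoverable is to decide in advance what it does with situations nobody anticipated. The sketch below uses a hypothetical policy table and hypothetical action names. It shows one pattern: unknown events default to a conservative fallback, and every choice is logged. It does not describe how Andon FM actually resolves ambiguity.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"            # proceed with the current plan
    SKIP_ITEM = "skip_item"          # drop the problem item, keep the schedule
    SAFE_PLAYLIST = "safe_playlist"  # fall back to a pre-vetted, low-risk source


# Hypothetical policy table: events the designers anticipated and the
# response chosen for each. Anything not listed counts as "ambiguous".
KNOWN_EVENTS = {
    "track_finished": Action.CONTINUE,
    "track_load_failed": Action.SKIP_ITEM,
    "rights_status_unknown": Action.SKIP_ITEM,
    "audio_pipeline_error": Action.SAFE_PLAYLIST,
}


def choose(event: str, decision_log: list) -> Action:
    """Resolve an event to an action when there is no human to escalate to.

    Unanticipated events get the most conservative, recoverable action
    instead of an improvised one. Each decision is appended to
    `decision_log` as (event, action, was_anticipated) for later audit.
    """
    action = KNOWN_EVENTS.get(event, Action.SAFE_PLAYLIST)
    decision_log.append((event, action, event in KNOWN_EVENTS))
    return action
```

The table stands in for escalation. It encodes ahead of time the answer a human operator would otherwise give. The log records which decisions fell outside the designed cases, and those are the ones worth reviewing.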

How the options compare

Approach	Human oversight	Failure recovery speed	Content quality consistency	Operational cost
Traditional radio (human staff)	Continuous	Seconds	High variance by shift	High
Automated radio (scheduled playlists)	Setup only	Next scheduled check-in	Predictable but static	Low
AI agent radio (Andon FM model)	None by design	Depends on agent error handling	Variable, context-dependent	Low to medium

The comparison that matters most is the content quality consistency column. Scheduled playlists are boring but reliable. An AI agent can respond to context - time of day, listener feedback signals, content gaps - but that responsiveness introduces variance that a playlist never has. For a radio station, variance in content quality is acceptable. For a customer-facing agent managing anything consequential, the same variance is much harder to accept. If you want the cheapest possible 24/7 audio stream with no creativity requirements, scheduled automation still wins. If you want adaptive programming with zero staff, the Andon FM model is a credible direction. If you need guaranteed content standards and legal compliance on every broadcast, neither autonomous option is ready without additional guardrails.

Where autonomous audio agents break

The failure mode that matters here is not the dramatic one. It is not the AI saying something offensive or playing a three-hour loop by accident. Those are recoverable and visible. The failure mode that is harder to catch is content rights. Radio broadcasting involves performance licenses, and those licenses have conditions. A human programmer knows, roughly, which tracks are cleared and which require additional clearance for certain broadcast types. An AI agent operating without that institutional knowledge can queue content that is technically a rights violation, and because the violation is not immediately audible in the output, it does not surface until a rights holder's monitoring system flags it. By then the broadcast has already happened. Tools like ElevenLabs have spent significant engineering time on exactly this problem for generated audio: building licensing frameworks into the generation layer itself. Andon FM is working with existing music, which shifts the problem to selection and clearance rather than generation, but the failure class is the same. The agent does not know what it does not know about rights status, and no one is watching to catch the gap.
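Making the selection step fail closed narrows this gap, though it does not close it. Under that rule, any track without a clearance record is treated as not cleared. The sketch below assumes a hypothetical clearance table keyed by track and broadcast type. Real performance licensing has more conditions than a single boolean, and the sketch is only as good as the table behind it.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical clearance table: (track_id, broadcast_type) -> cleared?
# In practice this would be built from the station's licensing records.
Clearance = Dict[Tuple[str, str], bool]


def filter_cleared(
    candidates: Iterable[str],
    broadcast_type: str,
    clearances: Clearance,
) -> Tuple[List[str], List[str]]:
    """Split candidate tracks into (queueable, held) for one broadcast type.

    Fails closed: a track with no clearance record is held, not played,
    because "unknown" rights status is exactly the gap the agent cannot see.
    """
    queueable: List[str] = []
    held: List[str] = []
    for track_id in candidates:
        if clearances.get((track_id, broadcast_type), False):
            queueable.append(track_id)
        else:
            held.append(track_id)
    return queueable, held
```

The key design choice is the `False` default in `clearances.get`. If the default were `True`, the code would reproduce the failure class described above: silence about rights status would be read as permission. The held list also gives a human something concrete to review after the broadcast.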

The number that defines the experiment

Zero human interventions across 24+ hours of live broadcast

Zero interventions is the number Andon Labs is leading with, and it is worth being precise about what it measures. It measures that the system did not break in a way that required a human to fix it. It does not measure that every decision the agent made was optimal, or that no rights issues occurred, or that the output was better than what a competent human programmer would have produced. It measures survival.

Survival is a real milestone. A lot of agentic systems that look solid in demos break within the first hour of unsupervised operation on a live system. Surviving 24 hours means the error handling is good enough to avoid catastrophic failure, which is a prerequisite for everything else. If that number were 2 hours instead of 24, the experiment would read as a proof-of-concept that needs more work. If it were 240 hours with logged decision quality metrics, it would start to look like evidence that the architecture is production-ready. At 24 hours with zero interventions but no published quality audits, it sits in the middle: promising as a stability demonstration, insufficient as an argument for deploying the same pattern in higher-stakes contexts.

The teams watching this experiment most carefully are probably not in media. They are building autonomous agents for customer service, logistics coordination, and infrastructure monitoring. All three share the same core challenge: a system that runs continuously, makes decisions without escalation, and has to handle the full distribution of inputs, including the weird ones. Infrastructure-scale agentic deployment is attracting serious investment precisely because the stability question is the hard part, and radio is a cheap place to learn how to answer it.
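Survival and decision quality are easier to separate when both come from the same decision log. The sketch below assumes a hypothetical log schema in which reviewers add audit flags after the broadcast. It is not a format Andon Labs has described. It shows that a run can report zero interventions and still have a nonzero flag rate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    """One logged agent decision (hypothetical schema)."""
    at_s: float                       # seconds since the broadcast started
    action: str
    human_intervened: bool = False    # did an operator have to step in?
    audit_flag: Optional[str] = None  # set later by a reviewer, e.g. "rights"


def summarize(log: List[Decision]) -> dict:
    """Report the survival metric and decision-quality metrics side by side."""
    if not log:
        return {"hours_covered": 0.0, "interventions": 0,
                "decisions": 0, "flagged": 0, "flag_rate": 0.0}
    span_s = max(d.at_s for d in log) - min(d.at_s for d in log)
    flagged = [d for d in log if d.audit_flag is not None]
    return {
        "hours_covered": span_s / 3600,
        "interventions": sum(d.human_intervened for d in log),  # survival
        "decisions": len(log),
        "flagged": len(flagged),                                 # quality
        "flag_rate": len(flagged) / len(log),
    }
```

A published audit along these lines would move the experiment toward the "production-ready evidence" end of the range described above. The intervention count alone cannot do that.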
If this experiment is on your radar, the one thing worth doing this week is pulling the Andon Labs post and reading specifically for how they handled failure states - not the success path, but what the agent does when a queued track fails to load, when an API call times out, or when the content pipeline produces an unexpected format. That section will tell you more about whether this architecture is transferable to your context than anything else in the writeup. If the failure handling is not documented, ask. Autonomous systems are only as reliable as their worst-case behavior, and the worst case is what the post probably does not lead with.
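The writeup should be checked for three failure states: a queued track that fails to load, an API call that times out, and a pipeline that returns an unexpected format. For comparison, here is a minimal sketch of one way an agent could handle all three. The `fetch` loader, the `AudioAsset` type, and the accepted-format list are hypothetical assumptions. This is not Andon Labs' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Formats a hypothetical playout engine accepts; an assumption for this sketch.
ACCEPTED_MIME_TYPES = {"audio/mpeg", "audio/ogg", "audio/wav"}


@dataclass
class AudioAsset:
    track_id: str
    mime_type: str
    data: bytes


def next_playable(
    queue: Sequence[str],
    fetch: Callable[[str], AudioAsset],
    fallback: AudioAsset,
) -> Tuple[AudioAsset, List[Tuple[str, str]]]:
    """Walk the queue until a track loads cleanly; otherwise use the fallback.

    `fetch` is a hypothetical loader that may raise TimeoutError (API call
    timed out) or FileNotFoundError (track failed to load). Every skipped
    track is returned with a reason so the failure path is auditable.
    """
    failures: List[Tuple[str, str]] = []
    for track_id in queue:
        try:
            asset = fetch(track_id)
        except TimeoutError:
            failures.append((track_id, "timeout"))
            continue
        except FileNotFoundError:
            failures.append((track_id, "load_failed"))
            continue
        # Reject output the pipeline was not designed to play.
        if asset.mime_type not in ACCEPTED_MIME_TYPES or not asset.data:
            failures.append((track_id, "unexpected_format"))
            continue
        return asset, failures
    # Nothing in the queue was playable: never leave dead air.
    return fallback, failures
```

Each failure is handled by skipping ahead, and the guaranteed fallback means the worst case is a station ID, not silence. When reading the real writeup, ask whether the agent has an equivalent floor, and whether the skipped items and their reasons are recorded anywhere.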