A voice agent that speaks still needs to listen.
That sounds obvious until the first live test.
The agent starts answering. A person interrupts. Another person adds context. Someone says “stop.” Someone else changes the question while the agent is halfway through its response.
Now the system has a harder problem than speech generation. It needs a duplex policy: how to listen while speaking, when to yield, when to stop, when to ignore noise, and when to hand the floor back to a person.
Without that policy, voice agents become socially awkward at best and operationally risky at worst.
The Demo Hides The Duplex Problem
Most demos are polite. The user asks. The agent answers. The user waits.
Meetings are not that tidy.
People interrupt because the answer is wrong, because they already got what they needed, because a more urgent point arrived, or because the agent should not be answering at all.
If the system treats speaking as a locked state, it misses those signals. If it treats every interruption as a new command, it becomes chaotic.
The system needs to decide what kind of interruption it is hearing.
Interruption Is Not One Behavior
Interruption can mean several different things:
- stop talking
- correct the previous context
- ask a follow-up
- change topic
- revoke consent
- hand the question to a human
- add a note without speaking
Those cases should not route through the same handler.
A business voice agent needs a policy layer that maps interruption types to allowed actions.
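```python
from typing import Literal

from pydantic import BaseModel


class InterruptionPolicy(BaseModel):
    interruption_type: Literal["stop", "correction", "follow_up", "topic_shift", "opt_out", "human_handoff"]
    stop_speech: bool      # cut audio output immediately
    update_artifact: bool  # record the interruption in the meeting artifact
    answer_allowed: bool   # whether the agent may respond at all
    requires_human: bool   # whether a person must take over
```
Again, the important part is not the specific schema. It is the decision to make interruption behavior explicit.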
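One way to make that mapping concrete is a lookup table from interruption type to allowed actions. The sketch below is illustrative only: it uses a plain dataclass instead of the pydantic model so it runs without dependencies, and every flag value in `POLICIES`, along with the `FAIL_CLOSED` default and the `policy_for` helper, is an assumption rather than a recommended setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    stop_speech: bool
    update_artifact: bool
    answer_allowed: bool
    requires_human: bool


# Hypothetical policy table: the flag values are illustrative, not prescriptive.
POLICIES = {
    "stop": Policy(stop_speech=True, update_artifact=False, answer_allowed=False, requires_human=False),
    "correction": Policy(stop_speech=True, update_artifact=True, answer_allowed=True, requires_human=False),
    "follow_up": Policy(stop_speech=True, update_artifact=False, answer_allowed=True, requires_human=False),
    "topic_shift": Policy(stop_speech=True, update_artifact=True, answer_allowed=True, requires_human=False),
    "opt_out": Policy(stop_speech=True, update_artifact=True, answer_allowed=False, requires_human=True),
    "human_handoff": Policy(stop_speech=True, update_artifact=True, answer_allowed=False, requires_human=True),
}

# Assumed default for anything the classifier cannot name:
# stop talking, record it, do not answer, and involve a person.
FAIL_CLOSED = Policy(stop_speech=True, update_artifact=True, answer_allowed=False, requires_human=True)


def policy_for(interruption_type: str) -> Policy:
    """Return the policy for a classified interruption, failing closed on unknown types."""
    return POLICIES.get(interruption_type, FAIL_CLOSED)
```

The design choice worth noting is the fail-closed default: an interruption the system cannot classify is treated like a handoff rather than like a new command.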
Yield Rules Create Trust
The agent should not compete for the floor.
In human meetings, yielding is a social signal. It shows that the speaker understands the room. For voice agents, yielding is also a safety signal.
The system should yield when:
- a participant starts speaking over it
- the request touches a human-owned decision
- context is incomplete
- the agent is asked to stop
- the conversation moves away from the agent’s task
A voice agent that yields cleanly feels controlled. An agent that keeps talking feels like a liability.
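The yield conditions above can be sketched as a single check that also reports why the agent yielded. The `TurnSignals` fields and both function names are hypothetical, and the sketch assumes upstream components (voice activity detection, intent classification) have already set each flag.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    # Each flag is assumed to be set by an upstream detector or classifier.
    participant_speaking_over: bool = False
    touches_human_owned_decision: bool = False
    context_incomplete: bool = False
    stop_requested: bool = False
    off_task: bool = False


def yield_reasons(signals: TurnSignals) -> list[str]:
    """Return the name of every active yield condition, in field order."""
    return [name for name, active in vars(signals).items() if active]


def should_yield(signals: TurnSignals) -> bool:
    """The agent gives up the floor if any single condition holds."""
    return bool(yield_reasons(signals))
```

Returning the reasons, not just a boolean, means the same call can feed both the turn-state logic and the trace log.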
Human-Owned Decisions Need Harder Stops
The duplex problem becomes more serious when the interruption is about authority.
If someone asks the agent to confirm a price, approve scope, interpret contract language, or commit to delivery timing, the system should not improvise.
The correct response may be:
- stop speaking
- write the question into the artifact
- identify the human owner
- ask that owner to answer
- mark the point for follow-up
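As a rough sketch of that sequence, the handler below records the question, looks up an owner, and returns the ordered actions for the runtime to execute. The `DECISION_OWNERS` map, the `Artifact` shape, and the action strings are all hypothetical, and the sketch assumes the request has already been classified as a human-owned decision.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from decision topic to the role that owns it.
DECISION_OWNERS = {
    "pricing": "account_executive",
    "scope": "project_lead",
    "contract": "legal_counsel",
    "delivery_timing": "delivery_manager",
}


@dataclass
class Artifact:
    open_questions: list[dict] = field(default_factory=list)


def hard_stop(topic: str, question: str, artifact: Artifact) -> list[str]:
    """Record a human-owned question and return the actions to run, in order."""
    # Unknown topics still stop the agent; they are just routed to "unassigned".
    owner = DECISION_OWNERS.get(topic, "unassigned")
    artifact.open_questions.append(
        {"topic": topic, "question": question, "owner": owner, "status": "needs_follow_up"}
    )
    return [
        "stop_speech",
        "write_question_to_artifact",
        f"ask_owner:{owner}",
        "mark_for_follow_up",
    ]
```

Note that nothing in the returned list lets the agent answer the question itself; improvisation is simply not one of the available actions.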
This is why voice-agent readiness belongs next to AI agent permission design, not just next to speech model selection.
Duplex Behavior Needs Observability
If the agent behaves poorly, the team needs to know why.
Was it speaking? Did it detect interruption? Did it classify the interruption correctly? Did it stop output? Did it update the artifact? Did it miss an opt-out command?
Those events should be logged as first-class system events, not discovered by watching a recording and guessing.
The same logic applies to broader agent observability for production audits: if the system can affect a workflow, its control decisions need traces.
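A minimal sketch of what first-class duplex events could look like, assuming one JSON line per control decision. The event names, the `DuplexEvent` fields, and the in-memory `TraceLog` are illustrative; a real system would write to its existing logging or tracing backend.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class DuplexEvent:
    call_id: str
    event: str  # e.g. "speech_started", "interruption_detected", "output_stopped"
    detail: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)


class TraceLog:
    """In-memory stand-in for a log sink; stores one JSON line per event."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def emit(self, event: DuplexEvent) -> None:
        self.lines.append(json.dumps(asdict(event), sort_keys=True))

    def events(self, call_id: str) -> list[str]:
        """Return the event names recorded for one call, in emission order."""
        parsed = (json.loads(line) for line in self.lines)
        return [e["event"] for e in parsed if e["call_id"] == call_id]


log = TraceLog()
log.emit(DuplexEvent("call-1", "speech_started"))
log.emit(DuplexEvent("call-1", "interruption_detected"))
log.emit(DuplexEvent("call-1", "interruption_classified", {"type": "opt_out"}))
log.emit(DuplexEvent("call-1", "output_stopped"))
```

With a trace like this, "did it miss an opt-out command?" becomes a query over events rather than a review of the recording.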
What To Test
A real duplex test should include:
- interrupting the agent mid-answer
- correcting a fact while it speaks
- asking an unrelated follow-up
- asking it to stop
- asking it to leave the call
- changing from a safe topic to a human-owned decision
- resuming after interruption without losing the artifact thread
One smooth demo call does not prove this behavior. The system needs repeatable tests that create pressure on the turn-state layer.
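To show what a repeatable test could look like, the sketch below drives a toy turn-state machine through scripted event sequences. The states, event names, and `TRANSITIONS` table are assumptions made for illustration; a real harness would replay audio or transcripts against the actual turn-state layer.

```python
from enum import Enum


class TurnState(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"
    YIELDED = "yielded"
    LEFT_CALL = "left_call"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (TurnState.LISTENING, "agent_starts_answer"): TurnState.SPEAKING,
    (TurnState.SPEAKING, "answer_finished"): TurnState.LISTENING,
    (TurnState.SPEAKING, "interrupted"): TurnState.YIELDED,
    (TurnState.SPEAKING, "stop_requested"): TurnState.YIELDED,
    (TurnState.YIELDED, "floor_returned"): TurnState.LISTENING,
}


def step(state: TurnState, event: str) -> TurnState:
    # A request to leave is honored from any state, and leaving is final.
    if state is TurnState.LEFT_CALL or event == "leave_requested":
        return TurnState.LEFT_CALL
    # Events with no defined transition leave the state unchanged.
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], start: TurnState = TurnState.LISTENING) -> TurnState:
    state = start
    for event in events:
        state = step(state, event)
    return state


# Each scenario pairs a scripted event sequence with the expected final state.
SCENARIOS = [
    (["agent_starts_answer", "interrupted"], TurnState.YIELDED),
    # After a stop, the agent may not resume until the floor is returned.
    (["agent_starts_answer", "stop_requested", "agent_starts_answer"], TurnState.YIELDED),
    (["agent_starts_answer", "leave_requested", "agent_starts_answer"], TurnState.LEFT_CALL),
    (["agent_starts_answer", "interrupted", "floor_returned", "agent_starts_answer"], TurnState.SPEAKING),
]


def failed_scenarios() -> list[list[str]]:
    """Return the event sequences whose final state did not match expectations."""
    return [events for events, expected in SCENARIOS if run(events) is not expected]
```

Because the scenarios are data, adding a new pressure case is one more entry in the list and the whole set can run on every build.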
The Architecture Smell
The warning sign is simple: if duplex behavior is described only in the prompt, the architecture is probably too soft.
The agent needs policy outside the prompt:
- stop conditions
- interruption classes
- allowed response modes
- opt-out handling
- human-handoff rules
- artifact update rules
That is what makes the system governable.
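One hedged way to enforce this is to keep the policy in a config file and refuse to start when a section is missing. The section names in `REQUIRED_SECTIONS` mirror the list above but are otherwise hypothetical, as are the JSON format and the two helper functions.

```python
import json

# Assumed section names, one per policy area listed above.
REQUIRED_SECTIONS = {
    "stop_conditions",
    "interruption_classes",
    "response_modes",
    "opt_out",
    "human_handoff",
    "artifact_updates",
}


def missing_sections(config: dict) -> list[str]:
    """Return the required policy sections absent from the config, sorted by name."""
    return sorted(REQUIRED_SECTIONS - set(config))


def load_policy(raw: str) -> dict:
    """Parse a JSON policy document and reject it if any section is missing."""
    config = json.loads(raw)
    missing = missing_sections(config)
    if missing:
        raise ValueError(f"duplex policy incomplete, missing: {', '.join(missing)}")
    return config
```

A check like this only proves the policy exists outside the prompt; whether its contents are sensible still has to be covered by the duplex tests.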
The Decision Rule
Do not test a real-time voice agent only on clean turn-taking. Test interruption policy, yield rules, opt-out handling, and reviewable traces before real calls. Duplex behavior is where a smooth demo becomes an operational system or a liability.