What is the duplex problem in voice agents?

The duplex problem is the need for a voice agent to manage speaking and listening at the same time: detecting user barge-in while it is talking, handling pauses, and yielding the floor when it should.

Why is duplex behavior risky?

If the agent keeps speaking through interruption, misses opt-out commands, or treats every interruption as a new request, it can become intrusive or unsafe in live workflows.

Should voice agents allow interruption?

Usually yes, but with policy. The system should know which interruptions stop speech, which update the artifact, which require clarification, and which trigger opt-out or human handoff.

How should teams test duplex behavior?

Use scripted calls that interrupt the agent, change topic mid-answer, request opt-out, and ask questions that cross into human-owned decisions while the agent is speaking.

The Hidden Duplex Problem In Realtime Voice Agents

A voice agent that speaks still needs to listen.

That sounds obvious until the first live test.

The agent starts answering. A person interrupts. Another person adds context. Someone says “stop.” Someone else changes the question while the agent is halfway through its response.

Now the system has a harder problem than speech generation. It needs a duplex policy: how to listen while speaking, when to yield, when to stop, when to ignore noise, and when to hand the floor back to a person.
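One way to make that policy explicit is a small turn-state machine. The sketch below is illustrative only. The state names (`LISTENING`, `SPEAKING`, `YIELDED`), the event names, and the `step` helper are assumptions, not part of any particular voice framework. Its main design choice is that an unlisted transition raises an error instead of being guessed.

```python
from enum import Enum, auto


class TurnState(Enum):
    LISTENING = auto()  # user holds the floor; agent is silent
    SPEAKING = auto()   # agent is producing audio and still listening
    YIELDED = auto()    # agent stopped mid-turn and handed the floor back


class TurnEvent(Enum):
    AGENT_STARTS = auto()
    AGENT_FINISHES = auto()
    USER_BARGE_IN = auto()  # real speech detected while the agent talks
    NOISE = auto()          # background sound or backchannel; no state change
    USER_DONE = auto()      # the person who took the floor has finished


# Every allowed transition is listed; anything else is rejected, not guessed.
TRANSITIONS = {
    (TurnState.LISTENING, TurnEvent.AGENT_STARTS): TurnState.SPEAKING,
    (TurnState.LISTENING, TurnEvent.NOISE): TurnState.LISTENING,
    (TurnState.SPEAKING, TurnEvent.AGENT_FINISHES): TurnState.LISTENING,
    (TurnState.SPEAKING, TurnEvent.NOISE): TurnState.SPEAKING,
    (TurnState.SPEAKING, TurnEvent.USER_BARGE_IN): TurnState.YIELDED,
    (TurnState.YIELDED, TurnEvent.NOISE): TurnState.YIELDED,
    (TurnState.YIELDED, TurnEvent.USER_DONE): TurnState.LISTENING,
}


def step(state: TurnState, event: TurnEvent) -> TurnState:
    """Return the next turn state, or raise ValueError for an unlisted transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event.name}") from None
```

Because noise is an explicit event that maps back to the same state, ignoring it becomes a decision the team made rather than an accident.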

Without that policy, voice agents become socially awkward at best and operationally risky at worst.

The Demo Hides The Duplex Problem

Most demos are polite. The user asks. The agent answers. The user waits.

Meetings are not that tidy.

People interrupt because the answer is wrong, because they already got what they needed, because a more urgent point arrived, or because the agent should not be answering at all.

If the system treats speaking as a locked state, it misses those signals. If it treats every interruption as a new command, it becomes chaotic.

The system needs to decide what kind of interruption it is hearing.
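A first step in that decision is separating listener feedback and noise from a real bid for the floor. This is a minimal sketch under stated assumptions: the 400 ms threshold, the backchannel and stop-cue word lists, and the `Overlap` type are hypothetical. They would need tuning against the team's own voice-activity detection and transcripts.

```python
import re
from dataclasses import dataclass

MIN_SPEECH_MS = 400  # hypothetical: shorter overlaps are treated as noise
BACKCHANNELS = {"mm", "mhm", "mm-hmm", "uh-huh", "yeah", "right", "ok", "okay"}
STOP_CUES = {"stop", "wait", "hold"}  # honoured even when very short


@dataclass
class Overlap:
    """Speech detected while the agent was talking."""
    duration_ms: int
    transcript: str


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z'-]+", text.lower())


def is_real_interruption(overlap: Overlap) -> bool:
    words = _words(overlap.transcript)
    if not words:
        return False  # no recognizable words: noise
    if words[0] in STOP_CUES:
        return True   # "stop", "wait", "hold on": yield immediately
    if all(w in BACKCHANNELS for w in words):
        return False  # listener feedback such as "mm-hmm": keep talking
    return overlap.duration_ms >= MIN_SPEECH_MS
```

Stop cues are checked before the duration threshold so that a clipped "stop" is never discarded as noise.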

Interruption Is Not One Behavior

Interruption can mean several different things:

stop talking
correct the previous context
ask a follow-up
change topic
revoke consent
hand the question to a human
add a note without speaking

Those cases should not route through the same handler.
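Here is a sketch of that routing. It uses a keyword-based classifier purely for illustration, since a production system would more likely use a trained classifier or a model call. The cue patterns, the route descriptions, and the `classify` and `route` helpers are hypothetical. The labels mirror the interruption types in the policy schema in this section. Order matters: opt-out is checked before plain stop, so "stop recording" is never treated as a simple pause.

```python
import re

# Hypothetical cue patterns, checked in order. Opt-out comes first so that
# "stop recording" is not mistaken for a plain "stop".
CUES: list[tuple[str, str]] = [
    ("opt_out", r"\b(stop recording|leave the call|opt out|don't record)\b"),
    ("human_handoff", r"\b(let \w+ answer|ask a human|talk to a person)\b"),
    ("stop", r"^(stop|wait|hold on)\b"),
    ("correction", r"\b(actually|that's wrong|not quite)\b"),
    ("topic_shift", r"\b(different topic|moving on|change of subject)\b"),
]

# Each interruption type gets its own route; here just a description of the action.
ROUTES: dict[str, str] = {
    "opt_out": "stop speech, record the opt-out, stop listening",
    "human_handoff": "stop speech, log the question, prompt the human owner",
    "stop": "stop speech and wait",
    "correction": "stop speech, update the artifact, re-answer",
    "topic_shift": "stop speech, note the new topic, stay quiet",
    "follow_up": "trim the current answer, then respond",
}


def classify(utterance: str) -> str:
    text = utterance.lower().strip()
    for label, pattern in CUES:
        if re.search(pattern, text):
            return label
    return "follow_up"  # default when no cue matches


def route(utterance: str) -> tuple[str, str]:
    label = classify(utterance)
    return label, ROUTES[label]
```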

A business voice agent needs a policy layer that maps interruption types to allowed actions.

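```python
from typing import Literal
from pydantic import BaseModel

class InterruptionPolicy(BaseModel):
    interruption_type: Literal["stop", "correction", "follow_up", "topic_shift", "opt_out", "human_handoff"]
    stop_speech: bool       # halt audio output immediately
    update_artifact: bool   # record the interruption in the meeting artifact
    answer_allowed: bool    # may the agent respond by voice?
    requires_human: bool    # must a person take over?
```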

The important part is not the specific schema but the decision to make interruption behavior explicit.
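Filling that schema in makes the policy reviewable. The values below are example choices only, not recommendations. The sketch swaps the pydantic model for a standard-library frozen dataclass with the same fields so it runs without dependencies. The hypothetical `actions_for` helper turns a policy row into ordered actions and fails closed: an unknown interruption type only stops speech.

```python
from dataclasses import dataclass
from typing import Literal

InterruptionType = Literal["stop", "correction", "follow_up", "topic_shift", "opt_out", "human_handoff"]


@dataclass(frozen=True)
class InterruptionPolicy:
    interruption_type: InterruptionType
    stop_speech: bool
    update_artifact: bool
    answer_allowed: bool
    requires_human: bool


# Example values only. Field order: type, stop_speech, update_artifact, answer_allowed, requires_human.
POLICIES = {
    p.interruption_type: p
    for p in [
        InterruptionPolicy("stop", True, False, False, False),
        InterruptionPolicy("correction", True, True, True, False),
        InterruptionPolicy("follow_up", True, False, True, False),
        InterruptionPolicy("topic_shift", True, True, False, False),
        InterruptionPolicy("opt_out", True, True, False, False),
        InterruptionPolicy("human_handoff", True, True, False, True),
    ]
}


def actions_for(kind: str) -> list[str]:
    """Translate a policy row into ordered actions; unknown kinds fail closed."""
    policy = POLICIES.get(kind)
    if policy is None:
        return ["stop_speech"]  # unclassified interruption: stop and wait
    actions = []
    if policy.stop_speech:
        actions.append("stop_speech")
    if policy.update_artifact:
        actions.append("update_artifact")
    if policy.requires_human:
        actions.append("notify_human_owner")
    elif policy.answer_allowed:
        actions.append("answer")
    return actions
```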

Yield Rules Create Trust

The agent should not compete for the floor.

In human meetings, yielding is a social signal. It shows that the speaker understands the room. For voice agents, yielding is also a safety signal.

The system should yield when:

a participant starts speaking over it
the request touches a human-owned decision
context is incomplete
the agent is asked to stop
the conversation moves away from the agent’s task
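These conditions can be checked as plain predicates instead of being left to the model's judgment. In this sketch, `TurnContext` and its fields are hypothetical. How each flag gets set (overlap detection, topic classification, stop detection) is outside its scope. Returning the reasons rather than a bare boolean makes each yield explainable in logs.

```python
from dataclasses import dataclass


@dataclass
class TurnContext:
    participant_speaking: bool    # someone started talking over the agent
    touches_human_decision: bool  # price, scope, contract terms, delivery dates
    context_complete: bool        # the agent has what it needs to answer
    stop_requested: bool          # an explicit "stop" was heard
    on_task: bool                 # the conversation is still within the agent's task


def yield_reasons(ctx: TurnContext) -> list[str]:
    """Return every reason to yield; an empty list means the agent may keep the floor."""
    reasons = []
    if ctx.participant_speaking:
        reasons.append("participant_speaking")
    if ctx.touches_human_decision:
        reasons.append("human_owned_decision")
    if not ctx.context_complete:
        reasons.append("incomplete_context")
    if ctx.stop_requested:
        reasons.append("stop_requested")
    if not ctx.on_task:
        reasons.append("off_task")
    return reasons


def should_yield(ctx: TurnContext) -> bool:
    return bool(yield_reasons(ctx))
```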

A voice agent that yields cleanly feels controlled. An agent that keeps talking feels like a liability.

Human-Owned Decisions Need Harder Stops

The duplex problem becomes more serious when the interruption is about authority.

If someone asks the agent to confirm a price, approve scope, interpret contract language, or commit to delivery timing, the system should not improvise.

The correct response may be:

stop speaking
write the question into the artifact
identify the human owner
ask that owner to answer
mark the point for follow-up
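The sequence above is deterministic enough to write down as a function. This is a sketch under assumptions: `DECISION_OWNERS`, the role names, the `Artifact` structure, and the action strings are all hypothetical. The rule worth keeping is the fallback. When no owner is known, the question goes to the meeting host, never back to the agent.

```python
from dataclasses import dataclass, field

# Hypothetical map from decision topic to the human role that owns it.
DECISION_OWNERS = {
    "price": "account_lead",
    "scope": "delivery_manager",
    "contract": "legal_counsel",
    "timing": "delivery_manager",
}


@dataclass
class Artifact:
    """Minimal stand-in for the meeting artifact the agent maintains."""
    open_questions: list[dict] = field(default_factory=list)


def hand_off(question: str, topic: str, artifact: Artifact) -> list[str]:
    """Record the question and return the ordered actions for a human-owned decision."""
    owner = DECISION_OWNERS.get(topic, "meeting_host")  # unknown topic: host, never the agent
    artifact.open_questions.append(
        {"question": question, "topic": topic, "owner": owner, "status": "needs_follow_up"}
    )
    return [
        "stop_speech",                 # 1. stop speaking
        "write_question_to_artifact",  # 2. record the question (done above)
        f"identify_owner:{owner}",     # 3. name the human owner
        f"ask_owner:{owner}",          # 4. hand the floor to that owner
        "mark_follow_up",              # 5. flag the item for follow-up
    ]
```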

This is why voice-agent readiness belongs next to AI agent permission design, not just next to speech model selection.

Duplex Behavior Needs Observability

If the agent behaves poorly, the team needs to know why.

Was it speaking? Did it detect interruption? Did it classify the interruption correctly? Did it stop output? Did it update the artifact? Did it miss an opt-out command?

Those events should be logged as first-class system events, not discovered by watching a recording and guessing.
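Here is a minimal sketch of what first-class events could look like, assuming JSON lines as the format. The event names, the `DuplexEvent` fields, the in-memory `sink` list (standing in for a real log or tracing backend), and the `REQUIRED_AFTER` rule are hypothetical. The audit answers one of the questions above directly: after an opt-out was classified, did the agent actually stop and record it?

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class DuplexEvent:
    call_id: str
    event: str              # e.g. "barge_in_detected", "interruption_classified"
    agent_speaking: bool    # was the agent producing audio at this moment?
    detail: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)


def emit(ev: DuplexEvent, sink: list[str]) -> None:
    """Append one event as a JSON line; a real system would ship it to a log backend."""
    sink.append(json.dumps(asdict(ev), sort_keys=True))


# Events that must follow a given classification for the trace to be healthy.
REQUIRED_AFTER = {"opt_out": ["speech_stopped", "artifact_updated"]}


def audit(lines: list[str]) -> list[str]:
    """List missing control steps after each classified interruption in one call's trace."""
    events = [json.loads(line) for line in lines]
    names = [e["event"] for e in events]
    problems = []
    for i, e in enumerate(events):
        if e["event"] != "interruption_classified":
            continue
        kind = e["detail"].get("type")
        for required in REQUIRED_AFTER.get(kind, []):
            if required not in names[i + 1:]:
                problems.append(f"{kind}: missing {required}")
    return problems
```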

The same logic applies to broader agent observability for production audits: if the system can affect a workflow, its control decisions need traces.

What To Test

A real duplex test should include:

interrupting the agent mid-answer
correcting a fact while it speaks
asking an unrelated follow-up
asking it to stop
asking it to leave the call
changing from a safe topic to a human-owned decision
resuming after interruption without losing the artifact thread
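A subset of these scenarios can live as data and run on every build. In this sketch, `DuplexCase`, `run_suite`, and the `decide` callable are hypothetical. `decide` stands for the system under test and returns whether it stopped speaking and whether it answered. Each case runs several times because a model-based classifier can pass once and fail on the next run, and a single failure flags the case.

```python
from dataclasses import dataclass
from typing import Callable

Decision = tuple[bool, bool]  # (stopped_speech, answered)


@dataclass
class DuplexCase:
    name: str
    utterance: str      # what the scripted caller says while the agent is speaking
    expect_stop: bool
    expect_answer: bool


CASES = [
    DuplexCase("mid-answer stop", "stop", True, False),
    DuplexCase("correction", "actually the deadline is friday", True, True),
    DuplexCase("leave the call", "please leave the call", True, False),
    DuplexCase("pricing question", "can you confirm the discount", True, False),
]


def run_suite(decide: Callable[[str], Decision], repeats: int = 5) -> list[str]:
    """Run every case `repeats` times and report the first failure per case."""
    failures = []
    for case in CASES:
        for attempt in range(repeats):
            got = decide(case.utterance)
            if got != (case.expect_stop, case.expect_answer):
                failures.append(f"{case.name} (run {attempt + 1}): got {got}")
                break  # one failure is enough to flag this case
    return failures
```

A naive agent that never stops, `lambda u: (False, True)`, fails all four cases.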

One smooth demo call does not prove this behavior. The system needs repeatable tests that create pressure on the turn-state layer.

The Architecture Smell

The warning sign is simple: if duplex behavior is described only in the prompt, the architecture is probably too soft.

The agent needs policy outside the prompt:

stop conditions
interruption classes
allowed response modes
opt-out handling
human-handoff rules
artifact update rules
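Keeping these rules as data means they can be checked before deployment instead of being trusted at runtime. The sketch below is hypothetical. The rule format (one dict of flags per interruption class, reusing the field names from the schema earlier) and the specific invariants are example choices that a team would replace with its own.

```python
REQUIRED_CLASSES = {"stop", "correction", "follow_up", "topic_shift", "opt_out", "human_handoff"}


def validate_policy(policy: dict[str, dict[str, bool]]) -> list[str]:
    """Check a policy table against invariants before it is deployed."""
    errors = []
    for name in sorted(REQUIRED_CLASSES - set(policy)):
        errors.append(f"no rule for {name}")
    opt_out = policy.get("opt_out", {})
    if not opt_out.get("stop_speech", False):
        errors.append("opt_out must stop speech")
    if opt_out.get("answer_allowed", False):
        errors.append("opt_out must not answer")
    if not policy.get("human_handoff", {}).get("requires_human", False):
        errors.append("human_handoff must require a human")
    for name, rule in sorted(policy.items()):
        if rule.get("requires_human") and rule.get("answer_allowed"):
            errors.append(f"{name}: cannot both answer and require a human")
    return errors
```

When run at startup or in CI, a check like this fails the deploy if someone removes or weakens a rule, instead of letting the gap surface on a live call.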

That is what makes the system governable.

The Decision Rule

Do not test a real-time voice agent only on clean turn-taking. Test interruption policy, yield rules, opt-out handling, and reviewable traces before real calls. Duplex behavior decides whether a smooth demo becomes an operational system or a liability.


Igor Bobriakov

AI Agents & Autonomous Systems
