Case study

Hope 3.0: Evaluating AdventHealth's Next-Generation Care Chatbot

Hope is AdventHealth's care advocacy chatbot — designed to handle patient questions, navigate support requests, and reduce the need for human escalation. Hope 3.0 introduced generative capabilities meant to handle more nuanced conversations and resolve more issues without a handoff. This study evaluated whether it delivered on that promise: testing real task scenarios, measuring containment and satisfaction, and identifying exactly where the experience broke down.

Participants
24–26
Study type
Mixed-methods usability evaluation
Methods
Moderated scenario-based usability (think-aloud) · Unmoderated survey · Conversation quality rubric scoring
Decision area
Chatbot containment, escalation timing, task completion quality, and conversation experience across high-volume patient tasks.
Questions explored
Does Hope 3.0 improve containment and reduce escalations compared to its predecessor? How well does it handle nuanced, multi-step, or vaguely phrased requests? Where does the experience break trust, and what would restore it?

tl;dr

Hope 3.0 was built to do more — handle more nuanced patient requests, resolve more issues in-bot, and reduce the volume of conversations that required a human care advocate. This study put that premise to the test across the top patient-facing tasks by volume: account access, billing questions, price estimation, care navigation, and more. It combined moderated think-aloud sessions, in which participants worked through realistic task scenarios, with an unmoderated survey measuring satisfaction and task completion at scale.

The findings were mixed, but leaned in the right direction, with clear gaps. The conversational tone worked. Directory lookups and labeled contact information were genuine wins. But the high-value flows patients cared about most — Price Estimator, billing guidance, escalation to a human — fell short in consistent and specific ways. Hope was pointing patients outward instead of resolving their needs inward. That distinction, between a bot that helps and a bot that redirects, is what the research made actionable.

  • Overall task completion rate was 57.6% and CSAT came in at 45% — numbers that point to a bot that's moving in the right direction but hasn't yet closed the gap between capability and patient experience.
  • The Price Estimator was high-value but fragile. Patients didn't know what information they'd need going in, and results didn't feel personalized to their specific plan or location.
  • Billing guidance needed plain language and a single recommended action — not multiple parallel links that left patients deciding where to go next.
  • Escalation to a human arrived too late and too generically. By the time patients saw a human option, they'd already lost confidence in the bot.
  • Containment reached 63% — a meaningful baseline, with a defined path to improvement across a small number of high-impact flow changes.

What We Learned

Hope Was Pointing Outward Instead of Resolving Inward

The most consistent pattern across tasks was that Hope responded to patient questions by directing them somewhere else — a link, a page, a menu of options — rather than answering the question directly. For patients who came to the chatbot specifically to avoid having to navigate elsewhere, this felt like the bot was adding work rather than reducing it. Open-text feedback called it out repeatedly: generic link-outs without synthesis felt like "more work," not help.

This pattern was most pronounced in billing and financial assistance tasks. Patients wanted Hope to explain what their bill meant and tell them one clear thing to do next. Instead, they received multiple similar links pointing toward different parts of the same problem. The intent to be helpful was there — but the execution created choice paralysis at the moment patients were most likely to disengage.

Outcome metrics
Where Hope 3.0 stands today
Task completion: 57.6% · CSAT: 45% · Containment: 63%
A representative pattern surfaced across testing: patients who asked specific questions often received multiple parallel options rather than a direct answer, which consistently reduced satisfaction and confidence scores.

The fix isn't complex, but it requires a deliberate design shift: treat responses as outcomes, not menus. When a patient asks about their bill, Hope should explain it in plain language and present a single recommended action. When a patient needs a phone number, it should come with labeled hours and facility context — not just a link to a directory page.
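
As a concrete illustration of that shift, here is a minimal sketch of an outcome-style response. This is a hypothetical shape written in TypeScript for clarity, not Hope's actual schema; every type name, field, and example value below is invented for the illustration.

    // Hypothetical sketch: an outcome-style response carries one plain-language
    // answer and exactly one recommended action, instead of a menu of links.
    interface RecommendedAction {
      label: string;   // e.g. "Set up a payment plan"
      url?: string;    // deep link to the task itself, not a landing page
      phone?: string;  // labeled number, when calling is the right action
      hours?: string;  // hours and facility context travel with the number
    }

    interface OutcomeResponse {
      answer: string;                 // the explanation, stated in plain language
      action: RecommendedAction;      // a single recommended next step
      escalation?: RecommendedAction; // optional human path, always reachable
    }

    // Example: a billing question resolved in-bot rather than linked out.
    const billingReply: OutcomeResponse = {
      answer:
        "Your $182 balance is the part of your March visit your plan didn't " +
        "cover. Nothing is due until April 30.",
      action: { label: "Set up a payment plan", url: "/billing/payment-plan" },
    };

Forcing exactly one action in the shape itself is the point: it makes "a menu of links" impossible to emit, and the optional escalation field keeps a human path present in every response.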

The Price Estimator Had High Value and Low Success

The Price Estimator was one of the most sought-after capabilities in the study. Patients actively wanted to use it, tried it early, and expressed genuine appreciation that the feature existed. That made its failure modes more consequential. Participants consistently ran into the same wall: they didn't know what information they'd need before starting, and when they reached results, they weren't confident those results reflected their specific insurance plan or facility.

Two fixes would close most of the gap. A pre-flow primer — a brief "here's what you'll need" card before the estimator flow begins — would set expectations and reduce mid-flow abandonment. And a personalization callout — explicitly stating which plan and location Hope is using to generate an estimate — would address the confidence problem directly. Patients aren't asking for guarantees. They're asking for transparency.
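
To make the two fixes tangible, here is a sketch of what a pre-flow primer and a personalization callout could look like in data terms. All names, fields, and values are assumptions for illustration, not taken from the Price Estimator's implementation.

    // Hypothetical pre-flow primer: shown before the estimator starts, so
    // patients know what they'll need and don't abandon mid-flow.
    const estimatorPrimer = {
      title: "Before we estimate your cost, have these ready:",
      requiredInputs: [
        "Your insurance member ID",
        "The procedure or service name",
        "Your preferred facility or ZIP code",
      ],
    };

    // Hypothetical personalization callout: the result names the plan and
    // facility it was computed against, addressing the confidence gap.
    interface PriceEstimate {
      amountLow: number;
      amountHigh: number;
      basedOnPlan: string;     // e.g. "ExamplePlan PPO"
      basedOnFacility: string; // e.g. "AdventHealth Orlando"
    }

    function formatEstimate(e: PriceEstimate): string {
      return (
        `Estimated cost: $${e.amountLow}-$${e.amountHigh}. ` +
        `This estimate uses your ${e.basedOnPlan} plan at ${e.basedOnFacility}.`
      );
    }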

Three Patterns That Surfaced Across the Study

Escalation Was Arriving Too Late

When patients hit dead ends — when rephrasing didn't work, when the bot returned low-confidence responses, when the same question produced the same unhelpful answer — they wanted a clear path to a human. That path existed in Hope 3.0, but it arrived too late in the conversation and too generically. A "Talk to a person" option that appears automatically after two rephrases, a low-confidence response, or a drop in sentiment — and that routes to the right department rather than a general queue — would meaningfully improve the experience at its most frustrating moments.
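
A proactive version of that trigger is straightforward to sketch. The thresholds, signal names, and queue mapping below are illustrative assumptions, not values from Hope's implementation:

    // Hypothetical escalation trigger: offer a human as soon as frustration
    // signals accumulate, and route by intent instead of a general queue.
    interface TurnSignals {
      rephraseCount: number;    // times the user has restated the same intent
      intentConfidence: number; // 0..1 from the language-understanding layer
      sentiment: number;        // -1..1; negative suggests frustration
      detectedIntent: string;   // e.g. "billing", "price_estimate"
    }

    const QUEUE_BY_INTENT: Record<string, string> = {
      billing: "billing-advocates",
      price_estimate: "financial-counseling",
      account_access: "patient-support",
    };

    // Returns the department queue to offer, or null if no trigger fired.
    function shouldOfferHuman(s: TurnSignals): string | null {
      const triggered =
        s.rephraseCount >= 2 || s.intentConfidence < 0.4 || s.sentiment < -0.3;
      if (!triggered) return null;
      return QUEUE_BY_INTENT[s.detectedIntent] ?? "general-care-advocates";
    }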

Directory and Contact Lookups Were a Genuine Win

Not everything was a gap. When Hope surfaced labeled phone numbers with hours and location context, participants responded positively and moved quickly toward resolution. This was the experience at its best: specific, actionable, and complete in a single exchange. The pattern that made directory lookups work — labeled, contextualized, one clear next step — is also the pattern that needs to be applied to every other task type.

Chat Discoverability and Reliability Undercut Trust Early

Several participants had difficulty finding the chat entry point, particularly on mobile. In some sessions, the icon was hidden or disappeared during navigation. Occasional reliability issues — "we're having some trouble" states with no recovery path — reduced trust in the bot before the conversation had a chance to succeed. These aren't chatbot problems. They're experience infrastructure problems. But they affect satisfaction scores and perception of the bot's overall quality.

Outcome of the Research

The study produced a prioritized set of recommendations organized around the flows with the highest volume and the widest gap between patient expectation and actual experience. Price Estimator, billing guidance, and escalation were the three areas where targeted changes would have the most immediate impact on containment, satisfaction, and confidence. The research also identified a set of instrumentation gaps — places where logging and analytics weren't yet capturing the signal needed to monitor improvement over time.

The broader framing the research offered was this: Hope 3.0 has the conversational capability to be genuinely useful. The barrier isn't the technology. It's the design decisions around how responses are constructed, when escalation is offered, and whether the experience treats patients as people who need answers or as navigators who need directions. Those decisions are correctable — and the study gave the team specific, evidence-backed direction on where to start.

Implications
  • Turn responses into outcomes. Every task should end with a plain-language answer and one clear recommended action — not a menu of links that asks patients to keep deciding.
  • Make escalation proactive, not reactive. Show a human handoff option automatically when intent confidence is low, when rephrases accumulate, or when sentiment drops — and route it to the right queue.
  • Instrument before the next release. Log fallback events, rephrase count, turns to resolution, and intent confidence so that containment and satisfaction improvements can be measured, not just estimated. A minimal event sketch follows below.
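
For instance, those four signals could be captured as a small tagged-event schema. This is a sketch under assumed names; the transport, fields, and example values are all hypothetical.

    // Hypothetical instrumentation events covering the gaps named above.
    type BotEvent =
      | { kind: "fallback"; intent: string; confidence: number }
      | { kind: "rephrase"; intent: string; rephraseCount: number }
      | { kind: "resolution"; intent: string; turnsToResolution: number; contained: boolean }
      | { kind: "escalation"; intent: string; queue: string };

    function emit(sessionId: string, event: BotEvent): void {
      // Stand-in for the real analytics pipeline; logs a timestamped record.
      console.log(JSON.stringify({ sessionId, ts: Date.now(), ...event }));
    }

    // Example: the events needed to compute containment and turns to resolution.
    emit("session-123", { kind: "rephrase", intent: "billing", rephraseCount: 2 });
    emit("session-123", {
      kind: "resolution", intent: "billing", turnsToResolution: 6, contained: true,
    });

With events like these in place, containment stops being an estimate: it is simply the share of sessions whose resolution event has contained set to true.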

Contact

Want to talk through the Hope 3.0 study?

The containment numbers, the Price Estimator gap, or the escalation timing finding — happy to get into any of it.

A good conversation is usually the best start.