The Conversational AI Technology Landscape: Version 5.0
This market architecture map originated from research I did on chatbot development frameworks, and it is my own original work.
Over time, the chatbot market has undergone five significant disruptions…
The architecture map has been updated to cover a broader array of technologies, such as LLMs, search, Voicebots, testing, NLU tooling, and beyond.
Should there be any other pertinent products that have inadvertently been omitted, please do not hesitate to bring them to my attention so that I may update the chart.
Four Waves Of Disruption
As I have alluded to in previous articles, chatbot IDEs were largely settled in their basic architecture in terms of the following (see the sketch after this list):
NLU (intents & entities),
Flow,
Message abstraction &
In some instances a knowledge base search option for fallback.
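To make that settled architecture concrete, here is a minimal sketch in Python. The regex-based "NLU", the flow steps and the knowledge_base_search fallback are hypothetical placeholders, not any particular framework's API.

```python
# A minimal sketch of the "settled" chatbot architecture described above.
# Intent patterns, flow steps and the knowledge-base lookup are hypothetical.
import re

INTENTS = {  # NLU: intents detected here with toy regex patterns
    "check_balance": re.compile(r"\b(balance)\b", re.I),
    "reset_password": re.compile(r"\b(reset|forgot).*(password)\b", re.I),
}

FLOW = {  # flow: ordered prompts per intent
    "check_balance": ["Which account?", "Here is your balance."],
    "reset_password": ["What is your username?", "A reset link has been sent."],
}

MESSAGES = {  # message abstraction: channel-agnostic templates
    "fallback": "I found this in our knowledge base: {snippet}",
}

def knowledge_base_search(utterance: str) -> str:
    # Placeholder for a document/FAQ search used as fallback.
    return "See our help centre for more details."

def handle_turn(utterance: str) -> str:
    for intent, pattern in INTENTS.items():
        if pattern.search(utterance):
            return FLOW[intent][0]                   # follow the flow for this intent
    snippet = knowledge_base_search(utterance)       # fallback: knowledge base search
    return MESSAGES["fallback"].format(snippet=snippet)

print(handle_turn("I forgot my password"))
```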
This settled architecture was disrupted by four waves, driven by technology innovation and customer demand:
Voicebots
Agent Desktops
Cognitive Search (RAG)
LLMs
Voicebots
A natural progression from chatbots was to voice-enable them and introduce voicebots. Voicebots can be app-based, but the holy grail of customer experience automation is a voicebot which front-ends a contact centre.
Customers can then phone in and have a natural conversation, in voice, with a voicebot as they would with a live agent.
Creating a voicebot is seen by some as a simple equation:
Voicebot = ASR + chatbot + TTS
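Read naively, that equation suggests simply chaining the three components. A minimal sketch of that naive reading, with hypothetical asr, chatbot and tts stand-ins rather than real services, might look like this:

```python
# A naive reading of "Voicebot = ASR + chatbot + TTS": chain the three steps.
# asr(), chatbot() and tts() are hypothetical stubs, not real engines.
def asr(audio: bytes) -> str:
    return "what is my balance"    # stub: a real ASR engine would transcribe the audio

def chatbot(text: str) -> str:
    return f"You asked: {text}"    # stub: the existing chatbot logic

def tts(text: str) -> bytes:
    return text.encode()           # stub: a real TTS engine would synthesise speech

def voicebot_turn(audio_in: bytes) -> bytes:
    text_in = asr(audio_in)        # speech -> text
    text_out = chatbot(text_in)    # text -> text
    return tts(text_out)           # text -> speech

print(voicebot_turn(b"<audio frames>"))
```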
However, this is not the case, and voicebots introduce a whole host of challenges, which include:
Latency: voicebots are much more susceptible to latency than chatbots, and silence on a voice call often leads to a disconnect. Silence on a phone call cannot be longer than 500 milliseconds. Users are also less tolerant because a voice call is synchronous, as opposed to a chatbot session, which is asynchronous.
Added Complexity: voicebots are inherently more complex than chatbots, with additional technology like ASR and TTS added, and flows which demand a resilience and complexity not required for chatbots.
Added Translation Layer: even though some voicebot technology providers claim to have solved for this extra step, in most cases and on a practical level the user's voice input needs to be converted from voice to text using ASR technology. This conversion introduces errors, which can be managed by measuring Word Error Rate (WER) at regular intervals (see the sketch after this list).
Digression: users are more prone to digress in a voice conversation as opposed to a text conversation.
Interruption: During a voice session, users are more prone to interrupt themselves in order to rephrase their statement or question, self-correct or be interrupted by a third party.
Background Noise: background noise plays a big part in accurately translating sound to text. Dedicated devices like Google Home or Alexa have advantages which a telephone call does not have. A normal phone call does not have the luxury of a dedicated array of microphones waiting to capture the user's utterances in a quiet home or office setting.
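As one way to monitor the translation layer mentioned above, WER can be computed with a standard word-level edit distance. This is a generic sketch of that calculation, not tied to any particular ASR product:

```python
# Word Error Rate (WER) = (substitutions + deletions + insertions) / reference words,
# computed here with a standard word-level edit distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("check my account balance", "check my count balance"))  # 0.25
```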
Agent Desktop
The agent desktop needs to be integrated with the chatbot for a seamless transition from the user's perspective, and agent experience (AX) has become as important as customer experience (CX).
Agent Desktops should provide an AI-powered hub for agents to manage customer interactions across multiple digital channels, offering real-time help to agents and integrating with virtual assistants for better service.
Elements of AX should include the following (illustrated in the sketch after this list):
Centralised information access
Suggestions on the best actions and responses based on the customer’s history, mood, and goals
Conversation suggestions, sentiment tracking, adherence checks, and actionable insights empower agents to deliver outstanding customer experiences.
Playbooks: pre-defined strategies and workflows that navigate agents through complex scenarios, ensuring consistency, accuracy, and superior customer interactions.
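Purely as an illustration of how these AX elements might surface in an agent desktop, here is a hypothetical payload shape; the field names are illustrative and not any vendor's API:

```python
# A hypothetical shape for a real-time agent-assist payload, mirroring the AX
# elements listed above; all field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class AgentAssistPayload:
    customer_id: str
    sentiment: str                        # e.g. "negative", tracked per turn
    suggested_responses: list[str]        # conversation suggestions for the agent
    next_best_action: str                 # based on history, mood and goals
    playbook_step: str                    # current step in a pre-defined workflow
    adherence_flags: list[str] = field(default_factory=list)  # adherence checks

payload = AgentAssistPayload(
    customer_id="C-1029",
    sentiment="negative",
    suggested_responses=["Apologise for the delay", "Offer a callback"],
    next_best_action="Escalate to billing specialist",
    playbook_step="verify_identity",
)
print(payload.next_best_action)
```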
Cognitive Search (RAG)
One of the challenges of chatbots has been the fact that chatbots cover a finite and definite domain. Added to this is the challenge that users often first choose to explore chatbot functionality with rather random and diverse questions and conversations.
One way of broadening a chatbot’s ambit is finding ways to leverage existing documents and other organised sources of data in a fast and efficient way.
RAG is becoming the common approach to cognitive search and to imbuing conversational UIs with data. However, more than three years ago I wrote a few articles on how to add search skills to chatbots by uploading documents.
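To illustrate the RAG pattern in its simplest form, here is a sketch in which embed, vector_search and llm are hypothetical stand-ins for an embedding model, a vector store and an LLM respectively:

```python
# A minimal RAG-style sketch: retrieve relevant passages, then ground the
# bot's answer in them. All three helpers below are hypothetical stubs.
def embed(text: str) -> list[float]:
    return [float(ord(c)) for c in text[:8]]          # toy embedding for illustration

def vector_search(query_vec: list[float], top_k: int = 3) -> list[str]:
    # A real implementation would query a vector store of document chunks.
    return ["Refunds are processed within 5 working days."]

def llm(prompt: str) -> str:
    return "Refunds take up to 5 working days."       # stub for an LLM call

def rag_answer(question: str) -> str:
    chunks = vector_search(embed(question))           # retrieval step
    context = "\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                                 # generation step

print(rag_answer("How long do refunds take?"))
```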
LLMs
LLMs have disrupted the chatbot IDE ecosystem from design time all the way through to run time, changing the way we develop and run conversational UIs in production.
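As one illustration of that shift (a hypothetical sketch, not any specific vendor's implementation), intent and entity detection that previously required trained NLU models can be delegated to an LLM prompt at run time:

```python
# A sketch of design-time-to-run-time disruption: instead of training an NLU
# model on intents and entities, classification is delegated to an LLM at run
# time. call_llm() is a hypothetical stand-in for any LLM API.
import json

def call_llm(prompt: str) -> str:
    # Stub: a real call would go to a hosted or local LLM.
    return json.dumps({"intent": "reset_password", "entities": {"channel": "email"}})

def classify(utterance: str) -> dict:
    prompt = (
        "Classify the utterance into one of the intents "
        "[check_balance, reset_password, other] and extract entities as JSON.\n"
        f"Utterance: {utterance}"
    )
    return json.loads(call_llm(prompt))

print(classify("I forgot my password, can you email me a reset link?"))
```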
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.