Five Stages Of LLM Implementation [Updated]
Large language models have significantly improved the capabilities of conversational AI systems, enabling more natural and context-aware interactions. This has led to increased adoption of AI-powered
Introduction
What is clear from market adoption in terms of LLMs, the LLM tooling development is far ahead of LLM implementation; with real-world customer facing execution lagging…for obvious reasons.
From a tooling perspective most of the focus and consideration is given to LLM Stage 4 but soon organisations will learn that any successful AI implementation requires a successful data strategy. And Hence LLM Stage 5 will receive much more focus.
When LLM implementations moved from Stage 1 to Stage 2, from design-time use to run-time use, there was a realisation that data needs to be delivered to the LLM at inference.
The importance of In-Context Learning (ICL) has been highlighted by numerous studies, and hence the importance of injecting prompts with highly succinct, concise and contextually relevant data.
Taking LLMs To Production
I recently asked on LinkedIn what are the challenges in taking LLMs to production, below are the top five concerns raised. These concerns all exist due to the fact LLMs are primarily hosted by LLM providers and made available via an API.
Making use of commercially available APIs introduces an operational component which is near impossible to manage.
The ideal would be for an organisation to have a local installation of an LLM they can make use of. But this comes with challenges most organisations cannot address, like hosting, processing power and other technical demands.
Yes, there are “raw” open-sourced models available, but again the impediment here is hosting, fine-tuning, technical expertise etc.
These problems can be solved for, by making use of a Small Language Model, which in most cases are more than sufficient for Conversational AI implementations.
LLM Disruption: Stage One
AI Assisted & Accelerated NLU Development
Stage 1 of LLM implementations were focussed on the bot development process, and in specific accelerating NLU development.
What really enabled the LLM Stage 1 disruption was the fact that LLM functionality was introduced at design time as opposed to run time.
This meant that elements like inference latency, high volume use, cost and LLM response aberrations could be confined to development and not exposed to customers in a production setting.
In the example above, a complex sentence is submitted to the LLM, the LLM is able to extract all the relevant entities from the sentence and calculate the total number of people traveling.
LLMs, were introduced to assist with the development of NLU in the form of clustering existing customer utterances in semantically similar groupings for intent detection. Once intent labels and descriptions were defined, intent training utterances could be defined.
LLMs could also be used for entity detection and more.
Considering the image below, Conversational AI really only requires the five elements shown below. And a traditional NLU engine can be used in conjunction with a SLM.
Since the advent of chatbots, the dream was to have reliable, succinct, coherent and affordable NLG functionality. Together with basic built-in logic and common-sense reasoning ability.
Add to this a flexible avenue to manage dialog context and state, and a more knowledge intensive solution than NLU, and SLMs seem like the perfect fit.
Small Language Models (SLMs) serves as a good companion technology for NLU. The SLM can run locally, open-source models can be used to perform tasks like Natural Language Generation (NLG), dialog & context management, common-sense reasoning, small talk, etc.
It is highly feasible to use NLU in conjunction with a SLM to underpin a chatbot development framework.
Running a SLM locally and using an augmented generation approach with in-context learning can solve for impediments like inference latency, token cost, model drift, data privacy, data governance and more.
AI Assisted & Accelerated Chatbot Development
Copy Writing & Personas
The next phase of LLM disruption was to use LLMs / Generative AI for chatbot & Voicebot copy writing and to improve bot response messages.
This approach again was introduced at design time as opposed to run time, acting as an assistant to bot developers in crafting and improving their bot response copy.
Designers could also describe to the LLM a persona, tone and other personality traits of the bot in order to craft a consistent and succinct UI.
This is the tipping point where LLM assistance extended from design time to run time.
The LLM was used to generate responses on the fly and present it to the user. The first implementations used LLMs to answer out-of-domain questions, or craft succinct responses from document search and QnA.
LLMs were leveraged for the first time for:
Data & context augmented responses.
Natural Language Generation (NLG)
Dialog management; even though only for one or two dialog turns.
Stage 1 was very much focused on leveraging LLMs and Gen-AI at design time which has a number of advantages in terms of mitigating bad UX, cost, latency and any aberrations at inference.
The introduction of LLMs at design time was a safe avenue in terms of the risk of customer facing aberrations or UX failures. It was also a way to mitigate cost and not face the challenges of customer and PII data being sent into the cloud.
Flow Generation
What followed was a more advanced implementation of LLMs and Generative AI (Gen-AI) with a developer describing to the bot how to develop a UI and what the requirements are for a specific piece of functionality.
And subsequently the development UI went off, leveraging LLMs and Generative AI, it generated the flow, with API place holders, variables required and NLU components.
LLM Disruption: Stage Two
Text Editing
Stage two saw LLMs being used to edit text prior to sending the bot response to the user. For instance, on different chatbot mediums the appropriate message size differs. Hence bot responses could be easily controlled by asking the LLM to summarise, extract key points and change the tone of the response based on user sentiment.
This meant that the hard requirement for a message abstraction layer was deprecated to some degree. In any chatbot / Conversational AI development framework the job of the message abstraction layer is to hold a whole array of bot response messages.
These bot response messages had placeholders which needed to be filled with context specific data to respond to the user with.
Different sets of responses had to be defined for each modality and medium. LLMs made the crafting of responses on the fly easier. This was the NLG (Natural Language Generation) tool we were all waiting for.
Document Search & Document Chat
Chatbots can be given a document, piece of information at inference, this allowed for the LLM to have a frame of reference for the conversation.
Scaling this approach had two impediments, the first is the impediment of limited LLM context windows, and also scaling this approach.
RAG
Rag served as a solution to the problems mentioned above. Read more about RAG here.
Prompt Chaining
Prompt Chaining found its way into Conversational AI development UIs, with the ability to create flow nodes consisting of one or more prompts being passed to a LLM.
Longer dialog turns could be strung together with a sequence of prompts, where the output of one prompt serves as the input for another prompt.
Between these prompt nodes are decision and data processing nodes…so prompt nodes are very much analogous to traditional dialog flow creation, but with the flexibility so long yearned for.
LLM Disruption: Stage Three
Custom Playgrounds
Technology suppliers started creating their own custom playgrounds with extra features and acting as an IDE and collaboration space.
This moved users beyond using LLM-based playgrounds only. Custom playgrounds offered access to multiple models for experimentation, collaboration and various starter code generation options.
Prompt Hubs
Both Haystack and LangChain have launched open community-based prompt hubs.
Prompt hubs help to encode and aggregate best practices for different approaches to Prompt Engineering. The vision is for Gen-Apps to become LLM agnostic where different models are to be used at different stages in the application.
No-Code Fine-Tuning
While fine-tuning changes the behaviour of the LLM and RAG provides a contextual reference for inference, fine-tuning has not received the attention it should have in the recent past. One can argue that this is due to a few reasons…read more here…
LLM Disruption: Stage Four
Prompt Pipelines
In Machine Learning a pipeline can be described as an end-to-end construct, which orchestrates a flow of events and data.
The pipeline is kicked-off or initiated by a trigger; and based on certain events and parameters, a flow is followed which results in an output.
In the case of a prompt pipeline, the flow is in most cases initiated by a user request. The request is directed to a specific prompt template.
Read more here.
Autonomous Agents
Agents make use of pre-assigned tools in an autonomous fashion to perform one or more actions. Agents follow a chain-of-thought reasoning approach.
The concept of autonomous agents can be daunting at first, read more here…
Orchestration
From this point onwards, the market has not really caught up…orchestration refers to orchestrating multiple LLMs for an application.
LLM Hosting
Most of the ailments plaguing LLM implementations are related to LLMs not being self-hosted, or hosted in a private data centre / cloud.
Delayed responses at inference, model drift, data governance and more are all factors which are solved if LLMs are self-hosted and managed.
LLM Disruption: Stage Five
Data Discovery
Data Discovery is the process of identifying any data within an enterprise which can be used for LLM fine-tuning. The best place to start is with existing customer conversations from the contact centre which can be voice or text based. Other good sources of data to discover are customer emails, previous chats and more.
This example from Kore XO platform shows how different sources of information can be imported and clustered according to semantic similarity.
This data should be discovered via an AI accelerated data productivity tool (latent space) where customer utterances are grouped according to semantic similarity, These clusters can be visually represented as seen below, which are really intents or classification; and classifications are still important for LLMs.
Data Design
Data design is the next step where the discovered data is transformed into the format required for LLM fine-tuning. The data needs to be structured and formatted in a specific way to serve as optional training data. The design phase compliments the discovery phase, at this state we know what data is important and will have the most significant impact on the users and customers.
Hence data design has two sides, the actual technical formatting of data and also the actual content and semantics of the training data.
Data Development
This step entails the operational side of continuous monitoring and observing of customer behaviour and data performance. Data can be developed by augmenting training data with observed vulnerabilities in the model.
Data Delivery
Data Delivery can be best described as the process of imbuing one or more models with data relevant to the use-case, industry and specific user context at inference.
The contextual chunk of data injected into the prompt, is referenced by the LLM to deliver accurate responses in each and every instance.
Often the various methods of data delivery are considered as mutually exclusive with one approach being considered as the ultimate solution.
This point of view is often driven by ignorance, a lack of understanding, organisation searching for a stop-gap solution or a vendor pushing their specific product as the silver bullet.
The truth is that for an enterprise implementation flexibility and manageability will necessitate complexity.
This holds true for any LLM implementation and the approach followed to deliver data to the LLM. The answer is not one specific approach, for instance RAG, or Prompt Chaining; but rather a balanced multi-pronged approach.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.