AI Agents: Exploring Agentic Applications
Applications based on LLMs are evolving into Agentic Applications. Agentic applications still have a Foundation Model as their backbone, but have more agency.
Introduction
Agentic applications are AI-driven systems designed to autonomously perform tasks and make decisions based on user inputs and environmental context.
These applications leverage advanced models and tools to plan, execute, and adapt their actions dynamically.
By integrating capabilities like tool access, multi-step reasoning, and real-time adjustments, agentic applications can generate and complete complex workflows and provide intelligent solutions.
I must add that while many theories and future projections are based on speculation, I prioritise prototyping and creating working examples. This approach grounds commentary in practical experience, leading to more accurate future projections.
Some Background
Generative and language-related AI is moving at a tremendous pace. As recently as 2018, the first notion of prompt engineering was introduced, combining NLP tasks and casting them as a single question-answering problem within a specific context.
As recently as April 2021, the term RAG was coined in the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
Only in January 2022 was the chain-of-thought prompting technique proposed by Google researchers.
In September 2022, OpenAI introduced Whisper, an open-source acoustic model which approaches human-level robustness and accuracy on speech recognition.
In 2023 we saw Large Language Models progress beyond a text-only interface with the introduction of image and audio processing.
The term Foundation Model was an apt new reference to Large Language Models which, apart from generating compelling text, can also generate images, videos, speech, music, and more.
The term Foundation Model had already been coined in August 2021 by Stanford University's Institute for Human-Centered Artificial Intelligence.
Also in 2023 we saw the rise of Small Language Models (SLMs). Even though SLMs have a small footprint, they have advanced capabilities in reasoning, Natural Language Generation (NLG), context and dialog management, and more.
In 2023 we also saw the rise of Agents. Agents have as their backbone an LLM, while agents also have access to one or more tools to perform specific tasks.
Agents are able to answer highly ambiguous and complex questions…
Agents leverage LLMs to make a decision on which Action to take. After an Action is completed, the Agent enters the Observation step.
From the Observation step, the Agent shares a Thought; if a final answer is not reached, the Agent cycles back to another Action in order to move closer to a Final Answer.
Agents are empowered by tools, which can include math libraries, web search, weather APIs, and other integration points.
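To ground the Action, Observation and Thought cycle described above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation of any particular agent framework: `Step`, `scripted_model`, `calculator` and `run_agent` are all hypothetical names, and the "model" is a hard-coded stand-in where a real agent would call an LLM.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model decision: a thought plus either a tool call or a final answer."""
    thought: str
    action: Optional[str] = None        # name of the tool to call, if any
    action_input: Optional[str] = None  # argument passed to that tool
    final_answer: Optional[str] = None


# Hypothetical tool; real agents would also wrap web search, weather APIs, etc.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate a simple 'number operator number' expression."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


def scripted_model(question: str, observations: List[str]) -> Step:
    """Stand-in for an LLM (assumption): picks the next step from what was observed."""
    if not observations:
        return Step(thought="I need to compute this.",
                    action="calculator", action_input="12 * 7")
    return Step(thought="I now have the result.", final_answer=observations[-1])


def run_agent(question: str,
              model: Callable[[str, List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        step = model(question, observations)           # Thought and decision
        if step.final_answer is not None:
            return step.final_answer                   # Final Answer reached
        tool = tools[step.action]                      # Action
        observations.append(tool(step.action_input))   # Observation
    raise RuntimeError("No final answer within the step budget")


answer = run_agent("What is 12 times 7?", scripted_model, {"calculator": calculator})
```

The step budget (`max_steps`) is a design choice on my part: it stops the agent from cycling indefinitely when no final answer is reached.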
Agentic Applications can be seen as the next step in this progression, where the application has more agency because it can browse and interpret the web, understand mobile interfaces, and access multiple modalities.
Contextual Reference
For applications to truly have agency within a given ecosystem, integration and communication are required. Take, for instance, Apple's research on Ferret-UI, where the phone screen, shown below on the left, is described by bounding boxes with names and descriptions.
These descriptions of the screen, with the coordinates, can be used to guide the user regarding a specific question. For instance, the user can ask, "How do I create a new shortcut?", and the agentic application will be able to highlight and guide the user to the appropriate place within the GUI.
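As a rough illustration of how named bounding boxes can be used to guide a user, the sketch below matches a question against element descriptions and returns the point to highlight. This is not how Ferret-UI works internally; the `UIElement` type, the `SCREEN` data, the pixel coordinates and the naive word-overlap matching are all assumptions made for the example, and a real system would rely on the model itself to ground the question.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    name: str
    description: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


# Hypothetical screen description; names and coordinates are invented.
SCREEN = [
    UIElement("Add button", "create a new shortcut", (320, 40, 360, 80)),
    UIElement("Search field", "search existing shortcuts", (20, 100, 360, 140)),
    UIElement("Gallery tab", "browse the gallery", (200, 700, 360, 760)),
]


def find_element(question: str, elements: List[UIElement]) -> Optional[UIElement]:
    """Return the element sharing the most words with the question, if any."""
    words = {w.strip("?.,!").lower() for w in question.split()}

    def score(element: UIElement) -> int:
        text = f"{element.name} {element.description}".lower().split()
        return len(words & set(text))

    best = max(elements, key=score, default=None)
    return best if best is not None and score(best) > 0 else None


def center(box: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Midpoint of a bounding box, e.g. where a highlight could be drawn."""
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)


target = find_element("How do I create a new shortcut?", SCREEN)
highlight_at = center(target.box) if target else None
```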
This type of natural language integration allows for a deeper understanding of user intent, supplemented with relevant information such as the location of UI elements and how they can be interacted with.
Increased Agency
A recent study focussed on how Large Language Models can be utilised more extensively by transitioning to a more dynamic, interactive system in broader domain implementations.
Current language agent frameworks focus on facilitating the construction of proof-of-concept language agents, but they often overlook accessibility for non-expert users and pay little attention to application-level design.
The study introduces OpenAgents, an open platform designed for using and hosting language agents in everyday life.
OpenAgents includes three main agents:
Data Agent
Handles data analysis using Python/SQL and various data tools.
Plugins Agent
Integrates with over 200 daily API tools.
Web Agent
Facilitates autonomous web browsing.
OpenAgents allows general users to interact with these agents through a web interface optimised for swift responses and graceful handling of common failures. It also provides developers and researchers with a seamless deployment experience on local setups, laying the groundwork for creating innovative language agents and enabling real-world evaluations.
The OpenAgents platform caters to general users, developers, and researchers:
General Users
Can interact with the agents through an online web interface, eliminating the need for programmer-oriented consoles or packages.
Developers
Can effortlessly deploy the frontend and backend for further developments using the provided codebase.
Researchers
Can build new language agents or agent-related methods using the examples and shared components, and evaluate their performance with the web UI.
Three Essential Components
Language Model
Tool Interface
Environment
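One way to picture how these three components relate is the toy sketch below. It is my own simplification, not the OpenAgents design: the `Counter` environment, the `RuleModel` stand-in for a language model and the `TOOLS` mapping are all hypothetical. The point is only the separation of concerns: the model decides, the tool interface is the sole way to act, and the environment is what gets observed and changed.

```python
from typing import Callable, Dict, Protocol


class LanguageModel(Protocol):
    def next_action(self, goal: str, observation: str) -> str: ...


class Environment(Protocol):
    def observe(self) -> str: ...


class Counter:
    """Toy environment: a single integer the agent can change."""

    def __init__(self) -> None:
        self.value = 0

    def observe(self) -> str:
        return str(self.value)


class RuleModel:
    """Stand-in for a language model: increments until the goal is observed."""

    def next_action(self, goal: str, observation: str) -> str:
        return "stop" if observation == goal else "increment"


def increment(env: Counter) -> None:
    env.value += 1


# Tool interface: the named operations through which the environment may be changed.
TOOLS: Dict[str, Callable[[Counter], None]] = {"increment": increment}


def run(model: LanguageModel, env: Counter, goal: str, max_steps: int = 10) -> int:
    """Return how many tool calls were needed to reach the goal."""
    for steps in range(max_steps):
        action = model.next_action(goal, env.observe())
        if action == "stop":
            return steps
        TOOLS[action](env)
    raise RuntimeError("Goal not reached within the step budget")


env = Counter()
calls_needed = run(RuleModel(), env, goal="3")
```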
Challenges
For User Interface Implementation:
Ensuring intuitive and user-friendly interaction for non-expert users.
Optimising response times to provide swift feedback.
Handling common errors gracefully to enhance user experience.
For Language Agents:
Seamless integration of diverse tools and APIs.
Efficient and reliable execution of complex tasks.
Real-world applicability and robustness in varied environments.
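Two of the challenges above, swift feedback and graceful handling of common errors, can be illustrated with a small wrapper around a tool call. This is a sketch under my own assumptions rather than anything taken from OpenAgents: `ToolResult`, `call_tool_safely` and `make_flaky` are hypothetical names, and the retry count and user-facing message are arbitrary choices.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolResult:
    ok: bool
    output: Optional[str]
    message: str            # text that is safe to show to a non-expert user
    elapsed_seconds: float  # measured so slow tools can be spotted


def call_tool_safely(tool: Callable[[str], str], argument: str,
                     retries: int = 1) -> ToolResult:
    """Call a tool, retry on failure, and never let an exception reach the UI."""
    start = time.perf_counter()
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):
        try:
            output = tool(argument)
            return ToolResult(True, output, output, time.perf_counter() - start)
        except Exception as error:  # broad on purpose: any tool failure is handled
            last_error = error
    message = f"Sorry, that step failed ({type(last_error).__name__}). Please try again."
    return ToolResult(False, None, message, time.perf_counter() - start)


def make_flaky(failures: int) -> Callable[[str], str]:
    """Hypothetical tool that fails a fixed number of times before succeeding."""
    remaining = {"count": failures}

    def tool(argument: str) -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TimeoutError("upstream timed out")
        return f"result for {argument}"

    return tool


recovered = call_tool_safely(make_flaky(1), "weather", retries=1)
failed = call_tool_safely(make_flaky(3), "weather", retries=1)
```

Returning a result object instead of raising keeps the decision about what to show the user in one place, which matters most when the audience is non-technical.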
In Conclusion
Some aspects of agents as we know them are carried forward into the concept of agentic applications.
These include a backbone language or foundation model integrated with defined tools that can be accessed.
The user interface is crucial for broad adoption by non-technical users.
Additionally, a web browser plays a vital role in executing tasks, providing the agentic application with a level of autonomy.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.