What’s Your Definition Of An AI Agent?
About 18 months ago I wrote my first article on AI Agents, based on the AI Agent frameworks from LangChain. Fast forward to the last few weeks, and AI Agents are in the news the way RAG was a few months ago.
And this prompts a question for me: what defines an AI Agent, and what is required to deliver enterprise-ready Agentic Implementations?
AI Agents 101
The Basics
Considering the graphic above, an AI Agent can be defined as a piece of software, with one or more Language Models as its backbone.
For the AI Agent to have visual capabilities, a Language Model / Foundation Model with vision capabilities is a requirement.
Task Decomposition
An agent at this stage primarily has a conversational input approach, hence unstructured data is used for user input.
And the response from the AI Agent is also most often in natural language, leveraging the Natural Language Generation (NLG) capabilities of Language Models.
I often use this example. You should be able to ask an AI Agent the following question:
What is the square root of the year of birth of the man commonly regarded as the father of the iPhone?
This is a very hard question for any traditional Conversational UI to answer, but for an AI Agent it is easy.
Way Of Work
The AI Agent starts off by decomposing this compound and slightly ambiguous question into sub-steps, and then sets off solving each of these sub-steps.
Each of these steps can be seen or considered as an action.
Agents leverage LLMs to make a decision on which Action to take next.
After an Action is completed, the Agent enters the Observation step.
From the Observation step, the AI Agent shares a Thought; if a final answer is not reached, the AI Agent cycles back to another Action in order to move closer to a Final Answer.
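The Thought, Action, Observation cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework implements it. The function `scripted_model` is a hard-coded stand-in for the LLM, the tool names are hypothetical, and the sketch assumes the person in the example question is Steve Jobs (born 1955). No iteration limit is applied here.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical tools. A real agent would call a search API and a maths library.
def lookup_birth_year(person: str) -> str:
    # Hard-coded stand-in for a web search (assumption: Steve Jobs, born 1955).
    return {"Steve Jobs": "1955"}.get(person, "unknown")

def square_root(number: str) -> str:
    return f"{math.sqrt(float(number)):.2f}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_birth_year": lookup_birth_year,
    "square_root": square_root,
}

# Each history entry is (action, action_input, observation).
History = List[Tuple[str, str, str]]

def scripted_model(history: History) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not history:
        return ("I need the birth year first.", "lookup_birth_year", "Steve Jobs")
    if len(history) == 1:
        year = history[-1][2]  # observation from the previous action
        return ("Now I can take the square root.", "square_root", year)
    return ("I have the answer.", "final_answer", history[-1][2])

def run_agent() -> str:
    history: History = []
    while True:
        thought, action, action_input = scripted_model(history)
        print(f"Thought: {thought}")
        if action == "final_answer":
            return action_input
        # Execute the chosen tool and feed the result back as an observation.
        observation = TOOLS[action](action_input)
        print(f"Action: {action}({action_input!r})")
        print(f"Observation: {observation}")
        history.append((action, action_input, observation))

if __name__ == "__main__":
    print("Final Answer:", run_agent())
```

In a real agent the scripted function would be replaced by a model call that reads the history and the tool descriptions, which is where the decision on the next Action is made.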
Level of Autonomy
The level of autonomy of an AI Agent is determined by the number of iterations the AI Agent can go through. This matters for cost, overhead and latency.
Secondly, if the AI Agent is unable to reach a conclusion or solve a task, one of the tools (we’ll look at tools in a bit) at the AI Agent’s disposal can be a human who can be pinged for guidance.
The number of tools at the disposal of the AI Agent is another determining factor in the AI Agent’s autonomy.
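As a rough sketch of how an iteration limit and a human fallback might fit together, consider the following. All names here (`run_with_budget`, `ask_human`, `slow_step`) are hypothetical and chosen for illustration. The `step` callable stands in for one pass of the agent loop.

```python
from typing import Callable, Optional

def ask_human(question: str) -> str:
    # Hypothetical stand-in for pinging a person (chat message, ticket, etc.).
    return f"[human guidance requested for: {question}]"

def run_with_budget(
    step: Callable[[int], Optional[str]],
    question: str,
    max_iterations: int = 3,
    human: Callable[[str], str] = ask_human,
) -> str:
    """Call `step` up to `max_iterations` times, then escalate to a human.

    `step` returns a final answer, or None if another iteration is needed.
    """
    for iteration in range(max_iterations):
        answer = step(iteration)
        if answer is not None:
            return answer
    # Budget exhausted without a final answer: hand over to a person.
    return human(question)

# A step that only succeeds on its third iteration (index 2).
def slow_step(iteration: int) -> Optional[str]:
    return "done" if iteration >= 2 else None

if __name__ == "__main__":
    print(run_with_budget(slow_step, "example task", max_iterations=3))
    print(run_with_budget(slow_step, "example task", max_iterations=2))
```

A higher `max_iterations` gives the agent more room to work on its own, at the price of more model calls and more latency. A lower value hands control to the human sooner.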
Tools
Tools can be considered as integration points or touch points to external systems or APIs. The number and nature of tools at the disposal of the AI Agent really determine what the AI Agent is capable of.
Tools are described in natural language, and can range from a web search API, OS GUI navigation, a maths library or a weather API to a CRM integration, and so on.
As the AI Agent decomposes a problem into sub-steps or actions, solving for each of these actions or steps will most probably involve the use of a tool.
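One way to picture tools described in natural language is a small registry that pairs each callable with a description the model can read. The sketch below is an assumption about how this could look, not the API of any specific framework. `Tool`, `build_registry` and `describe_tools` are hypothetical names, and the two example tools are deliberately trivial.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # natural-language text the model reads when choosing a tool
    run: Callable[[str], str]

def build_registry(*tools: Tool) -> Dict[str, Tool]:
    return {tool.name: tool for tool in tools}

def describe_tools(registry: Dict[str, Tool]) -> str:
    """Render the tool list as text that could be placed in a prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in registry.values())

REGISTRY = build_registry(
    Tool(
        "sqrt",
        "Returns the square root of a non-negative number.",
        lambda x: str(math.sqrt(float(x))),
    ),
    Tool("upper", "Converts text to upper case.", str.upper),
)

if __name__ == "__main__":
    print(describe_tools(REGISTRY))
    print(REGISTRY["sqrt"].run("16"))
```

The model only ever sees the names and descriptions. The agent runtime maps the chosen name back to the callable and executes it.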
Observability
Considering the image below, it is evident what level of observability can be achieved with regard to the internal workings of AI Agents. Notice how the AI Agent steps through the Thought, Action, Observation, and so on.
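A trace of this kind can be captured by recording each step as structured data. The following is a minimal sketch under that assumption. `Step` and `Trace` are hypothetical names, and the recorded contents are placeholders, not output from a real agent.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class Step:
    kind: str     # "Thought", "Action" or "Observation"
    content: str

@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)

    def record(self, kind: str, content: str) -> None:
        self.steps.append(Step(kind, content))

    def render(self) -> str:
        """Human-readable view, one line per step."""
        return "\n".join(f"{s.kind}: {s.content}" for s in self.steps)

    def to_json(self) -> str:
        """Machine-readable view, suitable for shipping to a log store."""
        return json.dumps([asdict(s) for s in self.steps])

if __name__ == "__main__":
    trace = Trace()
    trace.record("Thought", "I need to look something up.")
    trace.record("Action", "search('placeholder query')")
    trace.record("Observation", "placeholder result")
    print(trace.render())
```

Keeping both a readable and a JSON view means the same trace can serve debugging during development and monitoring in production.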
Symbolic Reasoning
An element which is easily overlooked is Symbolic Reasoning…
Symbolic reasoning is crucial within language models, particularly for AI Agents with vision capabilities, as it enables these systems to understand and manipulate abstract concepts alongside visual inputs.
By integrating symbolic reasoning, AI Agents can interpret and link symbols, rules and logical relationships, helping them to perform complex tasks that require more than simple pattern recognition.
For instance, in visual environments, symbolic reasoning allows an AI Agent to deduce spatial relationships, understand object properties, and make informed decisions based on both images and inferred knowledge.
This becomes essential for tasks such as scene understanding or problem-solving, where the agent must reason about objects and their potential interactions rather than just identify them.
Ultimately, symbolic reasoning enhances the model’s ability to deliver reliable, interpretable outcomes, making AI agents more versatile and effective in dynamic, real-world contexts.
Ecosystem
Considering the image below, AI Agents require an ecosystem, a place to live.
Ideally this place to live is an AI Productivity Suite, where Language Models can be deployed into a private instance.
Elements like assertions and guardrails can be hosted in this suite / ecosystem, and model fine-tuning is also accommodated here.
Orchestration
A level of orchestration is required, where tasks, models, other technologies like legacy conversational flows and data for automation can be managed.
Agentic X
Even though AI Agents are often viewed as standalone entities, Agentic X represents an “AI Inside” approach, where AI capabilities are embedded seamlessly within a broader system.
Rather than functioning independently, Agentic X empowers existing environments, devices, or platforms with intelligent functionalities, allowing the AI to operate in the background while enhancing user interactions.
This approach shifts the focus from interacting directly with an AI Agent to engaging naturally with an AI-enhanced environment.
By embedding intelligence within familiar interfaces, Agentic X provides users with an intuitive, augmented experience, bridging the gap between traditional tools and advanced AI capabilities.
The graphic below shows the advent of ACI — AI Agent as a Computer Interface.
Finally
A crucial component will be the AI Agent builder UI, which currently supports agent creation in a pro-code format.
However, to truly scale the development and management of AI agents, a flexible, intuitive no-code to low-code solution would be transformative.
Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.