Agent AI: Agentic Applications Are Software Systems With A Foundation Model AI Backbone & Defined Autonomy via Tools
MindSearch is a framework designed to mimic the human mind by breaking down complex questions, planning a sequence of steps, and solving problems through web search.
Introduction
Since the introduction of Large Language Models (LLMs) there have been two big progressions: the first shift is model-related, and the second relates to flow engineering.
Model
From an LLM-only environment, we have moved to the introduction of Small Language Models (SLMs): models with exceptional capabilities in reasoning and in managing dialog turns and conversation history, but without the burden of being knowledge-intensive.
Quantisation software for local/edge/offline inference is easily accessible, many highly capable models are open-sourced, and these are readily available via no-code model deployment and hosting options.
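As an illustration, below is a minimal sketch of loading a quantised open model for local inference. It assumes the Hugging Face transformers and bitsandbytes libraries and a CUDA GPU; the model ID is just an example of a capable open small model.

```python
# A minimal sketch of loading a quantised open model for local inference,
# assuming transformers + bitsandbytes are installed and a CUDA GPU is present.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # example small language model

# 4-bit quantisation keeps the memory footprint small enough for consumer hardware.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

inputs = tokenizer(
    "Summarise the idea of flow engineering in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```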
Models are also becoming multi-modal, with image ingestion and processing playing a big role in making agents (agentic applications) more autonomous in terms of navigating screens.
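To make this concrete, here is a sketch of passing a screenshot to a multi-modal model via the OpenAI Python SDK. The screenshot URL is a placeholder, and any vision-capable model could stand in for gpt-4o.

```python
# A minimal sketch of image ingestion: asking a multi-modal model to reason
# over a screenshot. The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which element on this screen submits the form?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```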
Flow Engineering
Prompt Engineering alone was not enough, and we had to find a way of re-using prompts; hence templates were introduced, where key data fields could be populated at inference time. This was followed by prompts being chained together to create longer flows and more complex applications.
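A minimal sketch of such a template, using only the Python standard library; the field names and wording are illustrative.

```python
# A reusable prompt template whose data fields are populated at inference time.
from string import Template

PROMPT = Template(
    "You are a helpful assistant.\n"
    "Answer the question using only the context below.\n\n"
    "Context: $context\n"
    "Question: $question\n"
    "Answer:"
)

# At inference, only the data fields change; the template is re-used across requests.
prompt = PROMPT.substitute(
    context="MindSearch decomposes a query into sub-questions.",
    question="What does MindSearch do with a complex query?",
)
print(prompt)
```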
Chaining was supplemented with highly contextual information at inference, giving rise to an approach that leverages In-Context Learning (ICL) via Retrieval-Augmented Generation (RAG).
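In essence, RAG retrieves the most relevant snippets and places them in-context before the model answers. The sketch below illustrates that loop with a naive keyword-overlap retriever and a hypothetical llm() call; a real system would use vector search and a live model.

```python
# A minimal RAG sketch: retrieve relevant snippets, place them in-context.
# The retriever here is a naive keyword-overlap stand-in for vector search.
from typing import List

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # Score documents by word overlap with the query (illustrative only).
    scored = sorted(
        corpus,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

corpus = [
    "MindSearch uses a WebPlanner and multiple WebSearchers.",
    "RAG grounds answers in retrieved documents.",
    "Quantisation shrinks models for edge inference.",
]

query = "How does MindSearch organise its agents?"
context = "\n".join(retrieve(query, corpus))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# response = llm(prompt)  # hypothetical model call
print(prompt)
```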
The next step in this evolution is Agentic Applications (AI Agents), where a certain level of agency (autonomy) is given to the application. LlamaIndex combined advanced RAG capabilities with an agent approach to coin Agentic RAG.
Autonomy
For Agentic Applications to have an increased level of agency, more modalities need to be introduced. MindSearch can explore the web via a text interface, while OmniParser, Ferret-UI and WebVoyager enable agentic applications to define a graphical interface and navigate the GUI.
The image above is from Microsoft's OmniParser; a similar approach is followed by Apple with Ferret-UI, and by WebVoyager. Screen elements are detected, mapped with bounding boxes and named. From here a natural-language layer can be created between a UI and any conversational AI system.
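The sketch below shows what such a natural-language layer might look like: detected elements are serialised into text an LLM can reason over. The labels and coordinates are invented for illustration, not real OmniParser output.

```python
# A minimal sketch of a natural-language layer over a GUI: detected screen
# elements (illustrative labels and bounding boxes) serialised into prompt text.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UIElement:
    name: str                         # semantic label assigned to the element
    bbox: Tuple[int, int, int, int]   # x1, y1, x2, y2 in screen pixels
    interactable: bool

def describe_screen(elements: List[UIElement]) -> str:
    lines = [
        f"[{i}] {e.name} at {e.bbox}" + (" (clickable)" if e.interactable else "")
        for i, e in enumerate(elements)
    ]
    return "Visible UI elements:\n" + "\n".join(lines)

screen = [
    UIElement("Search box", (40, 20, 600, 60), True),
    UIElement("Submit button", (620, 20, 700, 60), True),
    UIElement("Results list", (40, 80, 700, 500), False),
]
print(describe_screen(screen))  # this text can be placed in the agent's prompt
```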
MindSearch
MindSearch is premised on the problem that complex requests often cannot be accurately and completely answered by a search engine in a single query.
The information that needs to be integrated to solve a problem or answer a question is spread over multiple web pages, along with significant noise.
Also, a large number of web pages with long content can quickly exceed the maximum context length of LLMs.
The WebPlanner models the human mind's process of multi-step information-seeking as a dynamic graph-construction process.
It decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search results from WebSearcher, using either GPT-4o or InternLM2.5-7B as the underlying model.
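Below is a minimal sketch of what such a dynamic graph could look like in code. The class and method names are assumptions loosely modelled on the paper's description, not MindSearch's actual API.

```python
# A minimal sketch of the planner's dynamic graph. Names are assumptions
# modelled on the paper's description, not MindSearch's real interface.
class SearchGraph:
    def __init__(self):
        self.nodes = {}   # name -> sub-question and, once searched, its answer
        self.edges = []   # (parent, child) dependencies between sub-questions

    def add_node(self, name: str, question: str):
        self.nodes[name] = {"question": question, "answer": None}

    def add_edge(self, parent: str, child: str):
        self.edges.append((parent, child))

    def answer(self, name: str, text: str):
        self.nodes[name]["answer"] = text  # filled in by a WebSearcher

graph = SearchGraph()
graph.add_node("q1", "Who founded the company?")
graph.add_node("q2", "What did the founder study?")
graph.add_edge("q1", "q2")  # q2 depends on q1's answer
```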
MindSearch Framework
MindSearch consists of two main ingredients: WebPlanner and WebSearcher.
WebPlanner acts as a high-level planner, orchestrating the reasoning steps and multiple WebSearchers.
WebSearcher conducts fine-grained web searches and summarises valuable information back to the planner, formalising a simple yet effective multi-agent framework.
Consider a concrete example of how WebPlanner addresses a question step by step via planning-as-coding. During each turn, WebPlanner outputs a series of thoughts along with generated code. The code is executed and yields the search results back to the planner. At the final turn, WebPlanner directly provides the final response.
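An illustrative single turn, continuing the SearchGraph sketch from above; the thought/code split mirrors the description here, not MindSearch's exact output format.

```python
# --- planner thought (natural language) ---
# "To compare the two items I first need each one's release date."

# --- planner-generated code, executed by the host process ---
graph.add_node("q3", "When was item A released?")
graph.add_node("q4", "When was item B released?")
graph.add_edge("q1", "q3")  # q3 builds on the earlier sub-question q1

# The host dispatches unanswered leaf nodes to WebSearchers and feeds
# the summaries back, after which the planner emits its next turn.
for name, node in graph.nodes.items():
    if node["answer"] is None:
        graph.answer(name, f"<searcher summary for: {node['question']}>")
```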
WebSearcher acts as a sophisticated Retrieval-Augmented Generation (RAG) agent with internet access, summarising valuable responses based on search results.
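A hedged sketch of a WebSearcher-style loop follows: rewrite the sub-question into queries, collect candidate pages, read only a few, and summarise for the planner. The llm(), search() and fetch() functions are stand-in stubs so the sketch runs, not a real API.

```python
# A minimal WebSearcher-style loop. llm(), search() and fetch() are stubs;
# a real system would call a model, a search engine and an HTTP fetcher.
from typing import List

def llm(prompt: str) -> str:
    return "stubbed model output"      # stand-in for a real model call

def search(query: str) -> List[str]:
    return [f"https://example.com/{query.replace(' ', '-')}"]  # stand-in

def fetch(url: str) -> str:
    return f"page text from {url}"     # stand-in for an HTTP fetch + parse

def web_searcher(sub_question: str) -> str:
    # 1. Rewrite the sub-question into several search queries
    #    (a real system would have the model generate these).
    queries = [f"{sub_question} overview", f"{sub_question} details"]

    # 2. Collect candidate URLs from the search engine.
    hits: List[str] = []
    for q in queries:
        hits.extend(search(q))

    # 3. Read only a few pages, so long content doesn't exceed the context window.
    pages = [fetch(url) for url in hits[:3]]

    # 4. Summarise the evidence into a compact answer for the planner.
    return llm(f"Answer '{sub_question}' using:\n" + "\n---\n".join(pages))

print(web_searcher("When was the framework released?"))
```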
Conclusion
The MindSearch Framework introduces a novel LLM-based multi-agent framework designed for complex web information-seeking and integration tasks.
It utilises effective decomposition of complex queries and hierarchical information retrieval, modelling the problem-solving process as iterative graph construction.
It has a Multi-Agent Design, which distributes cognitive load among specialised agents, enhancing the handling of complex and lengthy contexts.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.