Orchestrating Parallel AI Agents
When implementing AI agents, the nature of the tasks they're designed to handle should drive the architecture. How critical is low latency to your setup?
And would running multiple processes in parallel help speed things up or create complications instead?
Let’s break down the two scenarios clearly, based on the timelines in the two graphs below.
Scenario A (Parallel Execution with Multiple Agents)
In Figure A, four specialised AI Agents — FeaturesAgent, ProsConsAgent, SentimentAgent and RecommendAgent — tackle their tasks simultaneously.
A central Meta Agent (or Orchestration Agent) manages the process by distributing the work to these AI Agents (fanning out) and then pulling their outputs together (fanning in) to form a final answer.
The parallel AI Agents wrap up in just over 6 seconds, and the Meta Agent then spends about 6 more seconds synthesising everything, for a total runtime of around 12 seconds.
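Before involving any models, the fan-out/fan-in timing is easy to reproduce with plain asyncio. The sketch below simulates the four workers and the synthesis step with sleeps; the durations are illustrative placeholders chosen to match the rough numbers above, not measurements.

import asyncio
import time

async def worker(name: str, seconds: float) -> str:
    # Stand-in for a specialised agent; the sleep simulates model latency.
    await asyncio.sleep(seconds)
    return f"{name} done"

async def fan_out_fan_in():
    t0 = time.perf_counter()
    # Fan out: all four "agents" start at the same time.
    results = await asyncio.gather(
        worker("FeaturesAgent", 6),
        worker("ProsConsAgent", 5),
        worker("SentimentAgent", 4),
        worker("RecommendAgent", 6),
    )
    # Fan in: the "meta agent" synthesises the outputs.
    await asyncio.sleep(6)
    print(results, f"total: {time.perf_counter() - t0:.1f}s")  # ~12 s

asyncio.run(fan_out_fan_in())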
Scenario B (Sequential Execution with One Agent)
In Figure B, there’s just a single Meta AI Agent equipped with four tools that handle the same tasks as the specialised AI Agents in Scenario A.
But instead of running in parallel, these tools execute one after another in sequence.
This stretches the total runtime to about 30 seconds, as each tool must wait for the previous one to finish.
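The sequential case reuses the worker coroutine (and imports) from the sketch above, awaiting each task one at a time, so the runtime is the sum of the durations rather than their maximum:

async def sequential():
    t0 = time.perf_counter()
    for name, seconds in [("FeaturesAgent", 6), ("ProsConsAgent", 5),
                          ("SentimentAgent", 4), ("RecommendAgent", 6)]:
        await worker(name, seconds)  # each tool waits for the previous one
    await asyncio.sleep(6)           # final synthesis step
    print(f"total: {time.perf_counter() - t0:.1f}s")  # ~27 s vs ~12 s in parallel

asyncio.run(sequential())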
In short, you'll need to triage your tasks upfront…
Understand them fully and work out whether they're interdependent (meaning they must run in sequence) or independent (allowing parallel execution to save time).
This kind of analysis is what determines whether you prioritise speed through parallelism or stick to a linear flow for reliability.
This reminds me strongly of insights from a Yale study on AI in the workplace, which stresses that truly grasping the workers and their core work processes is vital for any successful AI implementation — otherwise, you risk inefficiency, ethical pitfalls, and uneven outcomes.
Cost & Latency
Implementing AI Agents comes with some key challenges, including high costs, difficulty in clarifying ambiguous tasks, and a tendency to run for extended periods.
AI Agents operate through an enhanced process that breaks down tasks, reasons through them, and executes them step by step.
The ideal approach would involve using multiple smaller AI Agents, each equipped with a specific set of tools, working on sub-tasks in parallel, guided by an orchestrator or meta AI agent to manage the workflow.
That said, while these principles are widely discussed, finding a practical notebook or example that demonstrates their feasibility and brings the concept to life can be tough.
OpenAI Agents SDK
Below is the basic architecture for running the OpenAI Agents SDK. I normally run it in a Colab Notebook (Python), where benchmarking results, logs and other output are all visible in the console.
In general, OpenAI seems to be pushing its SDK, as are other model providers who want users to engage more granularly with their environments.
Finding the right Balance with AI Agent Configuration
Convenience vs. Customisation
For tasks that are not latency-sensitive, a single AI Agent with multiple tools is convenient.
For greater control and customisation, make use of multiple orchestrated AI Agents that branch out and converge across various layers.
Planning vs. Determinism
For scenarios where your Orchestration AI Agent should dynamically plan, select and sequence tasks, AI Agents are the preferred method. On the other hand, predefined, parallel operations are more suitable when you require a fixed, predictable execution order.
Latency Sensitivity
If minimising latency is critical for your application, a fixed, predictable execution path avoids the upfront overhead of the agent planning its tool calls, as well as the added burden of accumulating tool outputs in an ever-expanding context window.
The working Python Notebook
The code below sets up the environment for running multiple AI Agents in parallel: it installs the SDK and imports the libraries used for timing, async execution and plotting.
%pip install openai-agents matplotlib nest_asyncio

import os
import time
import asyncio
import matplotlib.pyplot as plt
import nest_asyncio

from agents import Agent, Runner

# Patch the already-running event loop so asyncio code works inside Colab/Jupyter.
nest_asyncio.apply()
The AI Agents are defined next: four specialised agents, plus the meta-agent that combines their outputs.
# Agent focusing on product features
features_agent = Agent(
    name="FeaturesAgent",
    instructions="Extract the key product features from the review."
)

# Agent focusing on pros & cons
pros_cons_agent = Agent(
    name="ProsConsAgent",
    instructions="List the pros and cons mentioned in the review."
)

# Agent focusing on sentiment analysis
sentiment_agent = Agent(
    name="SentimentAgent",
    instructions="Summarize the overall user sentiment from the review."
)

# Agent focusing on recommendation summary
recommend_agent = Agent(
    name="RecommendAgent",
    instructions="State whether you would recommend this product and why."
)

parallel_agents = [
    features_agent,
    pros_cons_agent,
    sentiment_agent,
    recommend_agent,
]

# Meta-agent to combine the four outputs into one summary
meta_agent = Agent(
    name="MetaAgent",
    instructions="You are given multiple summaries labeled with Features, ProsCons, Sentiment, and a Recommendation."
    " Combine them into a concise executive summary of the product review with a 1-5 star rating for each summary area."
)
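Before orchestrating all four, it can be worth smoke-testing a single agent in isolation. A minimal check using the SDK's synchronous runner (assuming OPENAI_API_KEY is already set in the environment, as shown further down, and nest_asyncio applied as above):

# Quick sanity check of one agent before wiring up the parallel run.
test = Runner.run_sync(features_agent, "Battery lasts 30 hours; the case is bulky.")
print(test.final_output)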
This code defines an asynchronous function, run_agent: it records the agent's name and start time in a starts list, runs the agent via Runner.run, then records the finish time in an ends list before returning the result…
starts, ends = [], []

async def run_agent(agent, review_text: str):
    agent_name = agent.name
    start = time.time()
    starts.append((agent_name, start))  # record when this agent kicked off
    result = await Runner.run(agent, review_text)
    end = time.time()
    ends.append((agent_name, end))      # record when this agent finished
    return result
This code defines an asynchronous function run_agents that fans the review text out to all four specialised agents at once via asyncio.gather, labels each agent's output, fans the labelled summaries back in to the meta-agent for synthesis, and finally plots the execution timeline…

async def run_agents(review_text: str):
    # Fan out: run all four specialised agents concurrently.
    responses = await asyncio.gather(
        *(run_agent(agent, review_text) for agent in parallel_agents)
    )
    # gather preserves input order, so each label lines up with its agent.
    labeled_summaries = [
        f"### {resp.last_agent.name}\n{resp.final_output}"
        for resp in responses
    ]
    collected_summaries = "\n".join(labeled_summaries)
    # Fan in: hand the combined summaries to the meta-agent.
    final_summary = await run_agent(meta_agent, collected_summaries)
    print('Final summary:', final_summary.final_output)
    # Plot the timeline after all agents have finished
    plot_timeline(starts, ends)
    return
Here you define your OpenAI API Key…
os.environ["OPENAI_API_KEY"] = "<Your API Key>"
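Hardcoding a key into a shared notebook is risky; a safer pattern is to prompt for it at runtime with the standard-library getpass:

import os
from getpass import getpass

# Prompt for the key instead of committing it to the notebook.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")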
Running the AI Agents on a sample review and plotting their run-times. Note that plot_timeline must be defined before run_agents is invoked, so the helper sits just ahead of the call…
review_text = """
I recently upgraded to the AuroraSound X2 wireless noise-cancelling headphones, and after two weeks of daily use I have quite a bit to share. First off, the design feels premium without being flashy: the matte‐finish ear cups are softly padded and rotate smoothly for storage, while the headband’s memory‐foam cushion barely presses on my temples even after marathon work calls. Connectivity is seamless—pairing with my laptop and phone took under five seconds each time, and the Bluetooth 5.2 link held rock-solid through walls and down the hallway.
The noise-cancelling performance is genuinely impressive. In a busy café with music and chatter swirling around, flipping on ANC immediately quiets low-level ambient hums, and it even attenuates sudden noises—like the barista’s milk frother—without sounding distorted. The “Transparency” mode is equally well‐tuned: voices come through clearly, but the world outside isn’t overwhelmingly loud. Audio quality in standard mode is rich and balanced, with tight bass, clear mids, and a hint of sparkle in the highs. There’s also a dedicated EQ app, where you can toggle between “Podcast,” “Bass Boost,” and “Concert Hall” presets or craft your own curve.
On the control front, intuitive touch panels let you play/pause, skip tracks, and adjust volume with a simple swipe or tap. One neat trick: holding down on the right ear cup invokes your phone’s voice assistant. Battery life lives up to the hype, too—over 30 hours with ANC on, and the quick‐charge feature delivers 2 hours of playtime from just a 10-minute top-up.
That said, it isn’t perfect. For one, the carrying case is a bit bulky, so it doesn’t slip easily into a slim bag. And while the touch interface is mostly reliable, I occasionally trigger a pause when trying to adjust the cup position. The headphones also come in only two colorways—black or white—which feels limiting given the premium price point.
"""
def plot_timeline(starts, ends):
    # Plot the timeline of the agents, normalizing times to zero
    base = min(t for _, t in starts)
    labels = [n for n, _ in starts]
    start_offsets = [t - base for _, t in starts]
    lengths = [ends[i][1] - starts[i][1] for i in range(len(starts))]
    plt.figure(figsize=(8, 3))
    plt.barh(labels, lengths, left=start_offsets)
    plt.xlabel("Seconds since kickoff")
    plt.title("Agent Execution Timeline")
    plt.show()

asyncio.get_event_loop().run_until_complete(run_agents(review_text))
The result…
And running one AI Agent with four tools…
from agents import ModelSettings

meta_agent_parallel_tools = Agent(
    name="MetaAgent",
    instructions="You are given multiple summaries labeled with Features, ProsCons, Sentiment, and a Recommendation."
    " Combine them into a concise executive summary of the product review with a 1-5 star rating for each summary area.",
    model_settings=ModelSettings(
        parallel_tool_calls=True
    ),
    tools=[
        features_agent.as_tool(
            tool_name="features",
            tool_description="Extract the key product features from the review.",
        ),
        pros_cons_agent.as_tool(
            tool_name="pros_cons",
            tool_description="List the pros and cons mentioned in the review.",
        ),
        sentiment_agent.as_tool(
            tool_name="sentiment",
            tool_description="Summarize the overall user sentiment from the review.",
        ),
        recommend_agent.as_tool(
            tool_name="recommend",
            tool_description="State whether you would recommend this product and why.",
        ),
    ],
)
# Reset the timing lists, then run the single tool-equipped agent.
starts, ends = [], []
result = await run_agent(meta_agent_parallel_tools, review_text)
print('Final summary:', result.final_output)
plot_timeline(starts, ends)
And the result…
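One caveat: the cell above relies on top-level await, which Colab and Jupyter support but a plain Python script does not. Outside a notebook, a minimal wrapper (reusing the objects defined above) would look like this:

import asyncio

async def main():
    # Reset the timing lists, then run the single tool-equipped agent.
    starts.clear()
    ends.clear()
    result = await run_agent(meta_agent_parallel_tools, review_text)
    print('Final summary:', result.final_output)
    plot_timeline(starts, ends)

asyncio.run(main())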
So…
Understanding the specific use-case and operational environment is paramount when implementing AI Agents, as it directly influences their design, efficiency and overall success.
Without a deep grasp of the tasks at hand — such as whether they require low latency, parallel processing, or sequential execution — developers risk creating systems that are overly complex, cost-inefficient, or prone to errors like misinterpreting ambiguous instructions.
Also, factoring in environmental constraints, including computational resources, integration with existing tools and potential interdependencies between subtasks, ensures that the agents are not only functional but also scalable and adaptable, ultimately leading to more reliable and impactful AI solutions.