From Inference Dominance to Developer-Centric Ecosystems
The graph from Menlo Ventures’ 2025 report illustrates a clear trend:
Across both enterprises and startups, compute spend is shifting away from pure inference (running models for predictions) toward a more balanced or development-heavy focus by 2025.
This aligns with broader industry observations where AI is maturing from experimental inference workloads to production-grade application building, including software development as a key use case.
With deep research and chain-of-thought (CoT) integrations, model providers are expanding into coding tools.
Providers have indeed anticipated and acted on this shift, positioning themselves to capture the growing market for developer-centric AI infrastructure.
The Observed Shift in Focus
The data in the graph shows enterprises moving from almost entirely inference (90–100% in 2024) toward more balanced (40–60%) or mostly development (60–90%) allocations by 2025, with year-over-year changes like a 2.1x increase in development spend among enterprises in the “almost entirely inference” category.
Startups follow a similar pattern, though with steeper drops in pure inference.
This reflects a broader pivot: as LLMs become commoditised, value migrates upstream to tools that enable developers to build, orchestrate, and integrate AI into real-world apps efficiently.
Industry analyses confirm this.
For instance, by 2025, over 750 million apps are projected to rely heavily on LLMs, automating ~50% of digital work, with a focus on chaining models and tools rather than isolated inference.
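The contrast between isolated inference and chained workloads is worth making concrete. The sketch below is a minimal, illustrative chain in which each model call’s output feeds the next step’s prompt; `call_model` is a stub standing in for any inference API, and the step names are invented for the example.

```python
# Minimal sketch of chaining model calls rather than isolated inference.
# `call_model` is a stand-in for any LLM inference API; it is stubbed
# here so the chain's plumbing can run locally.
from typing import Callable

def call_model(prompt: str) -> str:
    # Stub: a real implementation would call an inference endpoint.
    if prompt.startswith("Summarise:"):
        return "summary of input"
    if prompt.startswith("Translate:"):
        return "translated " + prompt.removeprefix("Translate: ")
    return prompt

def chain(steps: list[Callable[[str], str]], text: str) -> str:
    # Each step's output becomes the next step's input.
    for step in steps:
        text = step(text)
    return text

summarise = lambda t: call_model(f"Summarise: {t}")
translate = lambda t: call_model(f"Translate: {t}")

result = chain([summarise, translate], "long source document")
print(result)
```

The point of the pattern is that value sits in the orchestration (ordering, prompt shaping, error handling between steps), not in any single inference call.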
LLMOps (operations for large language models) has emerged as a distinct field, emphasising pipeline optimisation over novel model creation, with tools for deployment, monitoring, and governance.
Model Providers Capturing the Development Market
Major LLM providers have ramped up investments in developer tools, SDKs, CLIs, and frameworks to facilitate this shift.
They’re not just offering inference APIs anymore; they’re building ecosystems that make it easier to embed AI into codebases, reduce orchestration overhead and tie models closely to development workflows.
This includes stronger support for chaining operations, agentic behaviours, and interoperability.
OpenAI’s Advancements
OpenAI has been at the forefront, evolving from basic function calling (introduced in 2023) to more sophisticated chain-of-thought function calling in 2025.
This allows models to reason step-by-step while invoking tools, collapsing complex orchestration into the model itself.
For example, their Responses API now supports multiple tool calls with reasoning, boosting performance on benchmarks by enabling models to loop over tools autonomously.
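The loop that the Responses API internalises can be sketched as follows. This is a stand-alone illustration with a stubbed model, not OpenAI’s actual API surface: the message shapes, the `get_weather` tool, and the stub’s behaviour are all invented for the sketch. A real agent loop would parse tool-call objects from the provider’s response instead.

```python
# Hedged sketch of an agentic tool-calling loop: the model either
# requests a tool call or returns a final answer, and the runtime
# loops until an answer arrives. The model is a local stub.
import json

TOOLS = {"get_weather": lambda city: f"18C and clear in {city}"}

def stub_model(messages):
    # Stub reasoning: with no tool result yet, request the weather tool;
    # once a tool result is in context, produce a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "London"})}}
    return {"content": "It is 18C and clear in London."}

def run_agent(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = stub_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer, loop ends
        args = json.loads(call["arguments"])
        tool_result = TOOLS[call["name"]](**args)
        messages.append({"role": "tool", "content": tool_result})

print(run_agent("What's the weather in London?"))
```

When the provider runs this loop inside the model or API, the developer’s orchestration code shrinks to declaring tools, which is exactly the shift the article describes.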
New reasoning models like o3 and o4-mini (launched April 2025) enhance agentic tool use, making it simpler for developers to build apps without heavy external logic.
Their SDKs integrate seamlessly with frameworks like Vercel AI SDK and LangChain, emphasising app-building over raw inference.
This is part of a broader trend where OpenAI-compatible APIs reduce vendor lock-in, allowing easy switching while prioritising development velocity.
Anthropic & Google DeepMind
Anthropic partnered with Microsoft in April 2025 to release a C# SDK for the Model Context Protocol (MCP), an open-source standard that enhances AI app development by integrating data more flexibly into models.
MCP gives models a standard way to discover and call external data sources and tools, reducing the need for custom orchestration glue and tying models closely to developer tools.
Similarly, Google DeepMind updated its Gemini SDKs (Python and TypeScript) in May 2025 to natively support MCP, with automatic tool calling and looping — streamlining multi-turn interactions for app builders.
These moves show providers branching into frameworks that support agent-to-agent communication and composable agents, like NANDA and MCP, which make interoperability easier and close the window on generic wrappers.
Other Providers & Open-Source Trends
Hugging Face emphasises deploying open-source models on custom infra with optimised, scalable tools — urging developers to move beyond API calls to providers for security and control.
Meta’s Llama ecosystem, along with models like Gemma 2 and Command R+, includes SDKs for integration into mobile and enterprise apps.
Emerging Ecosystems
Tools like OpenCoder’s open-sourced pipeline (November 2024) provide datasets, models, and eval frameworks for code generation, reinforcing software development as a key use case.

Providers are also accruing developer productivity data, with predictions that by mid-2025 they’ll offer enterprise analytics based on it.

Innovations like AI model brokers (e.g., OpenRouter) route and optimise models as services, with APIs for task-specific selection and SLAs for inference — further enabling development at scale.
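The broker idea can be sketched as a routing policy: pick the cheapest model that can handle the task within a budget. This is an invented toy, not OpenRouter’s real API; the model names, capabilities, and cost figures are all illustrative.

```python
# Hedged sketch of a model broker's routing policy: choose the
# cheapest model supporting the requested task within a cost budget.
# Names and costs are illustrative, not real provider quotes.
MODELS = [
    {"name": "small-fast", "tasks": {"chat"}, "cost": 1},
    {"name": "code-tuned", "tasks": {"code"}, "cost": 3},
    {"name": "frontier", "tasks": {"chat", "code", "reasoning"}, "cost": 10},
]

def route(task: str, max_cost: int = 10) -> str:
    # Filter to capable, affordable models, then take the cheapest.
    candidates = [m for m in MODELS
                  if task in m["tasks"] and m["cost"] <= max_cost]
    if not candidates:
        raise ValueError(f"no model available for task {task!r}")
    return min(candidates, key=lambda m: m["cost"])["name"]

print(route("chat"))       # cheapest chat-capable model wins
print(route("reasoning"))  # only one model supports reasoning here
```

Real brokers layer latency, availability, and SLA constraints on top of this, but the core value proposition is the same: developers express a task, not a model.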
Lastly
Model providers have proactively leaned into this shift, using SDKs, CLIs, and frameworks to make LLMs a seamless part of the software development stack.
This not only captures the growing app-dev market but also reduces barriers like orchestration complexity, as seen in CoT tool integrations.
Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. Language Models, AI Agents, Agentic Apps, Dev Frameworks & Data-Driven Tools shaping tomorrow.