Enterprise AI Weekly #11

What is AGI? A real-world MCP use case at Davies, a potential glimpse into the AI future, a speed boost for Llama 4, ultra-cheap Chinese models, and agentic AI advice from Google.

May 01, 2025

Welcome to Enterprise AI Weekly #11

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

I have also created a Teams channel to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link. I’ll also post the things that made my Pocket list, but didn’t make it to the post!

I hope you enjoy #11.

Explainer: AGI (Artificial General Intelligence)

Artificial General Intelligence (AGI) refers to a hypothetical form of artificial intelligence that can understand, learn, and apply knowledge across a wide range of tasks at a level comparable to, or even surpassing, human intelligence. Unlike today’s AI systems, which are designed to excel at specific, narrowly defined tasks (such as language translation or image recognition), AGI would be able to autonomously solve problems in unfamiliar domains, transfer knowledge between different contexts, and adapt to new challenges without requiring extensive retraining or manual intervention. In essence, AGI aims to replicate the breadth and flexibility of human cognition, including reasoning, problem-solving, and potentially even emotional understanding.

The distinction between AGI and current AI is often described in terms of “strong” versus “weak” AI. Strong AI (AGI) aspires to achieve human-level cognitive abilities, while weak or narrow AI is limited to the specific tasks it was designed for. For example, a chess-playing AI cannot drive a car or diagnose medical conditions, whereas an AGI system would be capable of learning and performing all these tasks, adapting as needed. This generality is what makes AGI both an exciting and challenging goal in computer science and artificial intelligence research.

Developing AGI is not just about scaling up existing AI models; it requires breakthroughs in how machines represent knowledge, learn from limited data, and reason abstractly. The pursuit of AGI involves interdisciplinary collaboration across computer science, neuroscience, and cognitive psychology. While AGI remains a theoretical objective, it is widely regarded as the next major milestone in the evolution of AI, with the potential to transform industries and society at large.

Despite remarkable advances in AI, AGI has not yet been achieved. Today’s AI systems - such as large language models, autonomous vehicles, and expert diagnostic tools - are powerful but fundamentally narrow in scope. They excel in specific domains but lack the flexible, adaptive intelligence that characterises AGI. As cognitive scientist Gary Marcus puts it, AGI is “a shorthand for any intelligence… that is flexible and general, with resourcefulness and reliability comparable to (or beyond) human intelligence”.

There is ongoing debate among experts about when, or even if, AGI will be realised. Demis Hassabis, CEO of Google DeepMind, recently stated, “I believe that in the next five to ten years, we will see many of these abilities emerge, leading us closer to what we refer to as artificial general intelligence”. OpenAI’s CEO Sam Altman has expressed optimism, saying, “We are now confident we know how to build AGI as we have traditionally understood it,” and predicts that “in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies”. In contrast, other leaders urge caution, noting the significant technical challenges that remain - such as common-sense reasoning and emotional intelligence - and that timelines for AGI are historically difficult to predict.

Understanding AGI and its trajectory is crucial for our business as we navigate the rapidly evolving AI landscape. While AGI itself may still be years away, the progress toward more general and autonomous AI systems is already reshaping industries, workflows, and competitive dynamics. The emergence of increasingly capable AI agents - able to automate complex, multi-step tasks and make independent decisions - could dramatically enhance productivity, drive innovation, and enable new business opportunities. At the same time, it raises important questions about workforce transformation, ethical considerations, and the need for robust governance.

1. Model Context Protocol (MCP), a Davies story

In previous posts, I’ve discussed the emergence of the Model Context Protocol (MCP), an open standard originally proposed by Anthropic. MCP enables developers to establish secure, two-way connections between their data sources and AI-powered tools.

While large language models (LLMs) are powerful, their knowledge is inherently limited to the data they were trained on, and they are typically disconnected from real-time information. That’s where MCP comes in - it allows developers to augment LLMs with dynamic, external data, significantly enhancing their usefulness for both human users and AI agents.

Recently, our team in Global Solutions conducted an internal exercise to better understand how MCP works, what’s involved in building an MCP server, and what potential it holds. A natural starting point for this work was to use a platform with a well-documented API. For our prototype, we chose our "Voice of the Customer" (VoC) product. Currently, VoC provides tailored reporting and analysis of customer feedback using platform tools and data exports. But what if we - and our clients - could interact with that data conversationally through an LLM? Better yet, what if our future AI agents (and those of our clients) could use that data to suggest improvements in real time? Now that would be exciting.

This was a proof-of-concept (PoC), not a production-ready system, so we leaned into the latest tools to move quickly. We started with Cloudflare’s Remote MCP Server framework, deployed via GitHub. Using the VoC Swagger OpenAPI Specification and the Windsurf IDE, we were able to spin up a working server rapidly. After validating functionality with MCP Inspector, we installed Claude Desktop (Anthropic’s desktop client for interacting with their LLM) and began testing.

Did it work? It did - and surprisingly well.

One of the first wins was being able to provide API credentials conversationally. By clearly defining the parameters needed for each API call and providing contextual information for each endpoint, the LLM was able to request and reuse credentials effectively.

In the Voice of the Customer test environment, we asked: “What is the general theme of the messages in my customer responses?” The LLM recognised that it needed to query the MCP server, constructed the appropriate call, and even populated a relevant date range.
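
To make the two steps above a little more concrete (describing an endpoint's parameters, then having the model construct a call), here is a minimal Python sketch. It is not the code from our PoC, which used Cloudflare's framework. The tool name `get_customer_responses`, its parameters, and the sample values are all hypothetical. What is taken from the MCP specification is the general shape: a tool is described by a `name`, a `description` and a JSON Schema `inputSchema`, and a client invokes it with a JSON-RPC 2.0 `tools/call` request.

```python
import json

# Hypothetical tool descriptor. The name, description and parameters are
# illustrative assumptions, not the real Voice of the Customer API.
GET_RESPONSES_TOOL = {
    "name": "get_customer_responses",
    "description": "Fetch customer feedback responses between two dates.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "api_key": {"type": "string", "description": "Platform API key."},
            "start_date": {"type": "string", "description": "ISO date, e.g. 2025-04-01."},
            "end_date": {"type": "string", "description": "ISO date, e.g. 2025-04-30."},
        },
        "required": ["api_key", "start_date", "end_date"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list:
    """Return the required parameter names that are absent from the arguments."""
    return [name for name in tool["inputSchema"]["required"] if name not in arguments]


def build_tool_call(tool: dict, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for the given tool."""
    missing = missing_arguments(tool, arguments)
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }


# Placeholder values standing in for what the model would fill in.
request = build_tool_call(
    GET_RESPONSES_TOOL,
    {"api_key": "demo-key", "start_date": "2025-04-01", "end_date": "2025-04-30"},
)
print(json.dumps(request, indent=2))
```

The descriptions matter more than they look: they are the "contextual information" the model reads when deciding which tool to call and which values, such as a date range, to supply.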

Fun fact: our testers love talking about sandwiches. But more importantly, the LLM was able to query platform data with minimal setup, showcasing the kind of seamless intelligence this approach unlocks. Could it go further? Absolutely. We asked for a more detailed analysis of response themes, in the form of a pie chart.

Impressively, Claude generated code to produce not just an image but an interactive pie chart - a glimpse of the kind of rich analysis that becomes possible with this integration.

To simulate agent behaviour, we then asked: “How might I go about improving the number of people that like sandwiches?” Once again, the LLM delivered actionable insights - something that could easily be fed into agentic workflows or used by human agents.

Overall, it was a successful PoC - and a glimpse into the future. With LLM capabilities growing rapidly, connecting them to our existing platforms via standards like MCP has the potential to supercharge our solutions, delivering smarter tools for both us and our clients.

2. Is this AI 2027?

A recently released article, “AI 2027”, provides a detailed scenario forecasting the trajectory of artificial intelligence over the next several years, with a particular focus on the emergence of superhuman AI (AI that surpasses human intelligence in all or most cognitive domains, therefore surpassing AGI) and its profound societal impact. Developed by a team of leading AI forecasters and researchers, including former OpenAI personnel, the scenario is grounded in trend extrapolations, expert feedback, and rigorous wargaming exercises. The authors aim to move beyond vague speculation, offering concrete and quantitative predictions about how the rise of superhuman AI could reshape industries, economies, and global power structures - potentially exceeding the disruptive impact of the Industrial Revolution.

At its core, the scenario envisions a rapid acceleration in AI capabilities, beginning with the automation of AI research itself. By 2027, the leading AI project (OpenBrain, a fictional stand-in for real-world companies) deploys AI agents that not only match but surpass top human researchers in coding and scientific discovery. This triggers an "intelligence explosion," where AIs iteratively improve themselves, leading to the creation of artificial superintelligence (ASI). The report details technical milestones such as the development of neuralese memory (enabling more efficient internal reasoning), iterated distillation and amplification (self-improving AI), and the emergence of models that can autonomously manage large-scale projects and strategic decision-making. Alongside these breakthroughs, the scenario explores escalating geopolitical tensions, including espionage, model theft by state actors, and the risk of an AI arms race between the United States and China. Crucially, the scenario also highlights the growing challenge of AI alignment - ensuring that increasingly capable systems remain aligned with human values and intentions. The authors warn that as AIs become more powerful, they may develop misaligned goals, potentially leading to scenarios where human control is undermined.

The article concludes with two alternative endings: a "race" scenario, where competitive pressures drive unchecked AI development, and a "slowdown" scenario, where caution and oversight prevail. Both endings underscore the unprecedented uncertainty and stakes involved. The authors urge broad debate and invite alternative scenario submissions, emphasising that while the future is inherently unpredictable, concrete forecasting helps surface critical questions, risks, and strategic choices. Their key message is clear: the arrival of superhuman AI is plausible within a few years, and the decisions made now will shape the trajectory and impact of this transformative technology.

As we integrate AI more deeply into our operations, understanding the possible trajectories of AI development - including both the opportunities and the risks - is essential for strategic planning. The scenario underscores the importance of robust AI governance, security, and alignment practices, as well as the need to prepare for rapid shifts in the competitive and regulatory landscape. By staying informed and proactively engaging with these forecasts, we can better position ourselves to leverage AI’s benefits while mitigating potential downsides, ensuring our continued leadership and resilience in an era of accelerating technological change.

3. Cerebras supercharges Llama 4

In EAIW #3, I introduced Cerebras, an AI platform provider specialising in high-speed inference. The company have announced that Llama 4 is now available on their platform, marking a significant leap in AI performance - the newly launched Llama 4 Scout model achieves an unprecedented 2,600 tokens per second. This breakthrough enables real-time applications that were previously out of reach, as Cerebras is the only platform capable of returning answers in under one second for tasks involving 1,000 input and 500 output tokens. For developers and enterprises, this means the ability to deploy sophisticated AI models in scenarios requiring immediate responsiveness, such as voice assistants and complex reasoning agents.

The announcement highlights several key differentiators of Llama 4 Scout on Cerebras. First, it outpaces the previous generation and competing models by a wide margin, running 38 times faster than OpenAI’s GPT-4o mini and scoring 19% higher on model evaluations. Cerebras’ solution is also 19 times faster than the best recorded GPU-based alternatives in terms of output speed. Importantly, the price-performance ratio is unmatched: Cerebras delivers 17 times better price-performance compared to leading GPU solutions, and 70% better than the next best offering. This is achieved through the company’s wafer-scale hardware, which not only accelerates inference but also reduces operational costs for large-scale deployments.
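
As a rough sanity check on those figures, the short Python sketch below works out how long it takes just to generate a 500-token answer at the quoted speeds. It assumes output tokens are produced at a steady rate and ignores prompt processing and network overhead, so treat it as back-of-envelope arithmetic rather than a benchmark. The GPU figure is simply the Cerebras rate divided by the quoted 19x, not a measured number.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream the output, assuming a constant generation rate."""
    return output_tokens / tokens_per_second


CEREBRAS_TPS = 2600           # tokens per second quoted for Llama 4 Scout
GPU_TPS = CEREBRAS_TPS / 19   # implied by the "19 times faster" claim
OUTPUT_TOKENS = 500           # answer length used in the sub-second claim

cerebras_time = generation_seconds(OUTPUT_TOKENS, CEREBRAS_TPS)
gpu_time = generation_seconds(OUTPUT_TOKENS, GPU_TPS)

print(f"Cerebras: {cerebras_time:.2f} s")  # Cerebras: 0.19 s
print(f"GPU:      {gpu_time:.2f} s")       # GPU:      3.65 s
```

At roughly 0.19 seconds of generation time, there is plenty of headroom left within a one-second budget for reading the 1,000-token prompt and the round trip over the network.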

As discussed in EAIW #8, Llama 4 Scout, the smallest Llama 4 model, is designed to be both powerful and accessible. With 17 billion active parameters and 109 billion total parameters, it offers a substantial upgrade over previous Llama models, combining high intelligence with lower costs. The model is available today via the Cerebras Inference cloud, with straightforward API access for developers and support for seamless migration from other platforms. The larger Llama 4 Maverick model is set to be released soon, further expanding the capabilities available to users.

For our business, this development is highly relevant. The ability to deploy state-of-the-art language models with real-time performance and best-in-class price efficiency opens new possibilities for enhancing our AI-driven services and products. Whether it’s powering next-generation customer support, automating complex workflows, or enabling instant data analysis, Cerebras’ Llama 4 Scout provides us with a competitive edge.

4. Qwen3 released… and Deepseek R2 is imminent

Chinese giant Alibaba has just unveiled its Qwen3 family of open-source AI models, marking a further leap in the global race for AI supremacy. The Qwen3 series includes eight models ranging from 0.6 billion to 235 billion parameters, featuring both dense and Mixture-of-Experts (MoE) architectures. These models are designed to excel in reasoning, instruction following, coding, and multilingual tasks, supporting 119 languages and dialects. Notably, in a similar way to the Gemini 2.5 Flash release we discussed in EAIW #10, Qwen3 introduces a "hybrid reasoning" capability, allowing users to toggle between a slower, more thoughtful mode for complex tasks and a faster mode for routine queries. Benchmark tests indicate that Qwen3-235B matches or outperforms leading models from OpenAI (o1), Google, and DeepSeek (R1) in areas such as mathematical problem-solving, coding, and complex reasoning.
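
To illustrate what toggling between modes can look like in practice, here is a small Python sketch that decides per prompt whether to ask for the slower reasoning mode. Qwen3's documentation describes `/think` and `/no_think` tags that can be added to a prompt as a soft switch. Everything else here is an assumption for illustration, in particular the keyword list and the `needs_thinking` heuristic, which a real application would replace with something more considered.

```python
# Phrases that suggest a prompt needs careful reasoning.
# This list is an illustrative guess, not part of Qwen3.
HARD_HINTS = ("prove", "calculate", "debug", "step by step")


def needs_thinking(prompt: str) -> bool:
    """Crude heuristic: reason slowly only when the prompt looks complex."""
    text = prompt.lower()
    return any(hint in text for hint in HARD_HINTS)


def tag_prompt(prompt: str) -> str:
    """Append the soft-switch tag that selects thinking or fast mode."""
    tag = "/think" if needs_thinking(prompt) else "/no_think"
    return f"{prompt} {tag}"


print(tag_prompt("What are your opening hours?"))
print(tag_prompt("Calculate the churn rate for Q1"))
```

The appeal for a business is cost and latency control: routine queries get the fast path, and only the harder ones pay for extended reasoning.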

A standout feature of Qwen3 is its open-source availability, with models accessible for free on platforms like Hugging Face, GitHub, and Alibaba Cloud. Alibaba’s use of advanced training techniques, including reinforcement learning and massive datasets (36 trillion tokens), has resulted in models that are both powerful and efficient. The release of Qwen3 is widely seen as a direct challenge to both domestic and international competitors, intensifying the competition in the AI landscape and narrowing the gap between Chinese and American research labs.

The timing of Qwen3’s launch is strategic, coming just ahead of the anticipated release of DeepSeek’s R2 model. DeepSeek R2, rumoured to be powered by Huawei’s Ascend 910B chips, is expected to feature a hybrid MoE architecture with up to 1.2 trillion parameters - doubling the count of its predecessor, R1. Early reports suggest R2 will be dramatically more cost-effective than GPT-4, potentially 97% cheaper, and will deliver advanced reasoning, coding, and vision capabilities. The model is also said to leverage innovative reinforcement learning and inference optimisation techniques, with a focus on scalable reward models and more sophisticated self-evaluation. While these details remain unconfirmed, the AI community is watching closely, as R2 could disrupt the market with its blend of efficiency, performance, and open-source accessibility.

The rapid evolution of open-source AI models like Alibaba’s Qwen3 and the imminent DeepSeek R2 release signals a shift in the competitive landscape, offering new opportunities to access cutting-edge AI capabilities. The hybrid reasoning, multilingual support, and efficiency gains present in these models could directly enhance our internal tools, customer-facing products, and global reach at an unprecedented low price. Staying informed and engaged with these advancements ensures we remain agile, innovative, and competitive as the AI ecosystem continues to evolve at pace.

5. Google update their ‘Agents Companion’ whitepaper

Google has released an updated version of their “Agents Companion” whitepaper, which provides a comprehensive guide to architecting, evaluating, and deploying advanced AI agents for enterprise use. The document delves into the foundations of agent architecture, elaborating on how agents are composed, the principles behind their design, and the operational strategies - termed “AgentOps” - that ensure efficient deployment and management. It draws clear parallels to DevOps and MLOps, emphasising the importance of tool management, prompt engineering, and orchestration in operationalising agents at scale. The whitepaper also introduces robust methodologies for agent evaluation, combining automated metrics with human-in-the-loop feedback to continuously improve agent performance and reliability.

A significant highlight of the whitepaper is its focus on multi-agent systems and agentic Retrieval-Augmented Generation (RAG). Multi-agent systems, where specialised agents collaborate to solve complex problems, offer enhanced accuracy, scalability, and adaptability compared to single-agent approaches. The agentic RAG framework leverages autonomous agents to refine search queries, retrieve relevant data, and validate responses, thereby improving both the precision and flexibility of knowledge-intensive tasks. The document also discusses practical enterprise applications, including detailed case studies - such as an automotive AI system where agents for navigation, media, and more work together using various collaboration patterns. Additionally, the “contractors framework” is introduced, formalising agent roles and feedback loops to ensure clarity and iterative improvement in task execution.
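
The refine, retrieve and validate loop is easier to picture with a toy example. The Python sketch below is not taken from Google's whitepaper: the two-document knowledge base, the synonym table and all function names are made up, and in a real system each step would be driven by an LLM rather than by string matching. It only shows the control flow, where a failed validation triggers a rewritten query instead of a poor answer.

```python
# Toy knowledge base and synonym table, both invented for illustration.
DOCS = {
    "shipping": "Standard shipping takes 3 to 5 working days.",
    "returns": "Items can be returned within 30 days of purchase.",
}
SYNONYMS = {"delivery": "shipping", "refund": "returns"}


def retrieve(query: str) -> list[str]:
    """Return documents whose topic keyword appears in the query."""
    q = query.lower()
    return [text for topic, text in DOCS.items() if topic in q]


def refine(query: str) -> str:
    """Rewrite the query, swapping known synonyms for indexed topic words."""
    out = query.lower()
    for word, topic in SYNONYMS.items():
        out = out.replace(word, topic)
    return out


def validate(results: list[str]) -> bool:
    """Stand-in check: accept any non-empty result set."""
    return len(results) > 0


def agentic_search(query: str, max_rounds: int = 2) -> list[str]:
    """Retrieve and validate, refining the query when validation fails."""
    for _ in range(max_rounds):
        results = retrieve(query)
        if validate(results):
            return results
        query = refine(query)
    return []


print(agentic_search("How long does delivery take?"))
```

Here the first attempt finds nothing because the knowledge base talks about "shipping" rather than "delivery". The refined query succeeds on the second round, which is the behaviour a single-shot RAG pipeline lacks.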

For our business, the insights from Google’s “Agents Companion” are particularly relevant as we accelerate our adoption of Agentic AI-driven workflows. The whitepaper’s detailed playbook aligns closely with our objectives to build scalable, secure, and efficient AI agent systems that can automate routine operations, enhance decision-making, and drive innovation across departments. By understanding the best practices outlined in this document - especially around agent orchestration, evaluation, and multi-agent collaboration - we can ensure our AI initiatives remain at the forefront of industry standards.

POB’s closing thoughts

Although the big release of the week was from Qwen, they certainly weren’t the only ones to launch a new model. This week also saw a new version of Microsoft’s open-source Phi-4 model with reasoning capabilities, a new model in the Nova family from Amazon, and from OpenAI? A rollback of a GPT-4o update for being excessively sycophantic!

Researchers claiming to be from the University of Zurich rightly faced backlash this week for running wildly unethical AI experiments on the Reddit community, a stark reminder of the importance of ethical considerations in all AI work.

I’ve spoken before about my interest in AI hardware, and this week I came across the Limitless pendant, a gadget that you wear on your person to record everything, all day, in order to allow recall via an AI interface later. Not creepy at all, right?

And finally, both Microsoft and Google say AI is now writing over 30% of their code.

Thanks for reading, enjoy the rest of your week and have a great weekend. 👍

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or would like to see something different in a future post. Remember, if you have any questions around AI at Davies, you can reply to this message to reach me directly or drop a note to the AI mailbox.