Enterprise AI Weekly #14

It's the Microsoft and Google show, with Build and I/O taking place this week, plus an explainer on Explainable AI (XAI), the start of the agentic coding wars, whether vibe coding is real, and views on the future of AI.

May 22, 2025

Welcome to Enterprise AI Weekly #14

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

What a week it’s been. Somehow, either accidentally or deliberately, Microsoft Build and Google I/O, the respective companies’ developer conferences, got scheduled for the same week. This has led to an onslaught of announcements from both companies, elevating what is already an incredibly fast pace of change to a whole new level. How are we mere mortals meant to keep up? It’s a genuine challenge!

This issue is therefore something of a “Microsoft and Google” special. I’ll pull out some key announcements, as well as providing links to the announcement streams. Enjoy!

Explainerception: What is Explainable AI (XAI)?

Explainable AI (XAI) refers to artificial intelligence systems designed so that humans can understand their decision-making processes and outputs. Unlike “black box” models where the internal workings are opaque even to their creators, XAI aims to provide transparency and insight into how an AI arrives at a specific conclusion or prediction. The core goal is to build trust and confidence in AI systems, enabling users to understand, appropriately manage, and effectively utilise these powerful technologies. By making AI less inscrutable, XAI facilitates debugging, helps identify potential biases, and supports accountability for AI-driven outcomes (for example, in decision making).

The application of XAI principles to Large Language Models (LLMs) presents a significant challenge due to their immense complexity, often involving billions of parameters and intricate neural network architectures. Traditional XAI techniques, which might involve visualising decision trees or identifying key features in simpler models, are harder to apply directly to these vast systems. For LLMs, researchers are exploring methods such as analysing attention mechanisms (to see which parts of the input text most influenced the output), identifying influential training data points, or prompting the LLM itself to provide a rationale for its responses. However, these approaches often provide only partial or surface-level explanations.
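To make the idea of "analysing attention" a little more concrete, here is a minimal sketch of scaled dot-product attention for a toy three-token input, ranking tokens by how much weight a single query gives them. Everything here is an assumption for illustration: the tokens, the 4-dimensional vectors and the helper names (`softmax`, `attention_weights`) are made up, whereas a real LLM learns these vectors and spreads attention across many heads and layers. As the paragraph above notes, weights like these offer only a partial, surface-level view of why a model produced its output.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product attention: score_i = (q . k_i) / sqrt(d),
    # normalised with softmax so the weights sum to 1.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Hypothetical toy vectors; real models use thousands of learned dimensions.
tokens = ["refund", "the", "customer"]
keys = [
    [1.0, 0.5, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.0],
    [0.6, 0.7, 0.1, 0.0],
]
query = [1.0, 1.0, 0.0, 0.0]

weights = attention_weights(query, keys)
# Rank input tokens from most to least attended.
ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
for token, w in ranked:
    print(f"{token:>10}: {w:.3f}")
```

In this toy case the filler word "the" receives the least weight, which is the kind of signal researchers look for, but a high weight is a correlation with the output rather than a proof of causation.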

Achieving comprehensive XAI for LLMs remains an open question and a highly active area of research. The core difficulty lies in the way LLMs generate responses: they learn complex statistical patterns from enormous datasets and generate text that is probabilistically likely, rather than reasoning in a human-like, step-by-step logical manner. While an LLM can be prompted to generate an explanation for its output, this explanation is itself a generated piece of text and may not accurately reflect the true computational drivers behind its decision. It could be a plausible-sounding justification created after the fact, rather than a genuine insight into its internal state. Thus, distinguishing between a generated rationalisation and a true, mechanistic explanation is a key hurdle in the ongoing "explainability challenge" for current LLMs.

Anthropic are shaping up to be a leader in XAI for LLMs, with their recently published papers (previously mentioned in EAIW) “Tracing the thoughts of a large language model” and “Mapping the mind of a large language model”. I also found “The Explainability Challenge of Generative AI and LLMs” by Lee Ditmar of OCEO an informative read.

For businesses, the pursuit of XAI, especially in the context of LLMs, is highly relevant for several reasons. Firstly, transparency fosters trust among employees using AI tools and customers interacting with AI-driven services, which is crucial for adoption and acceptance. Secondly, in regulated industries, being able to explain AI decisions is increasingly important for compliance, risk management, and demonstrating fairness, helping to avoid biases and meet legal obligations such as the “right to explanation” commonly associated with Article 22 of the GDPR. Understanding why an AI system produces a certain output also allows businesses to debug errors more effectively, improve model performance, and ensure that AI tools are aligned with strategic objectives and ethical guidelines, leading to more responsible and effective AI integration.

1. The agentic coding wars start here

You know how you wait ages for a coding agent, then they all come along at once?

This week has marked a further step change in AI-driven software development, with the launch of several advanced agentic coding solutions. Agentic coding refers to AI systems built on large language models that can autonomously understand requirements, then write, test, debug, and even deploy software with minimal human guidance. Following Anthropic’s release of Claude Code (as covered in EAIW #3), we've seen several key releases in this domain in the space of seven days. OpenAI introduced Codex on May 16, an AI coding agent within ChatGPT designed to handle multiple software engineering tasks concurrently. GitHub then unveiled its new Copilot Coding Agent on May 19, an AI assistant that can take on specific programming tasks, operating like an additional team member within the GitHub environment. Finally, on May 20, Google announced the global public beta of Jules, its asynchronous AI coding assistant, designed to manage a broad suite of time-consuming coding tasks for developers.

The primary promise of these agentic coding tools is to fundamentally reshape the software development lifecycle. These AI agents go beyond mere code completion; they can interpret natural language prompts, analyse existing codebases, and produce high-quality, context-aware code tailored to specific needs. Their capabilities extend to managing tasks of low to medium complexity, including adding new features, fixing bugs, remediating security issues, expanding automated tests, refactoring existing code, and enhancing documentation. By automating these often labour-intensive processes, agentic coding aims to significantly accelerate development timelines, potentially reduce development costs, and improve the overall quality and consistency of code. This allows human developers to shift their focus towards more creative, complex problem-solving and architectural design.

Dashboard asking ‘What should we code next?’ with a prompt box, repo/branch selectors, and a task list on a pastel code-themed backdrop.

For our business, the emergence of these sophisticated agentic coding solutions presents a compelling opportunity to enhance our operational efficiency and innovation capacity. By strategically integrating these AI-powered tools into our development workflows, we can expect to see a notable increase in productivity among our engineering teams. This acceleration in software creation can directly translate to a faster time-to-market for new products, features, and services, thereby strengthening our competitive position. Moreover, by automating more routine and repetitive coding tasks, our skilled developers will have greater bandwidth to dedicate to pioneering solutions, tackling more intricate technical challenges, and delivering greater value to our organisation and clients.

2. Insights from Google Cloud’s “Future of AI” report

Google has posted a report entitled “Future of AI: Perspectives for Startups”, and there are some valuable insights in there for both startups and established businesses. Featuring twenty-three leading voices in AI, the document details their perspectives on the most critical AI and Cloud AI innovations, trends, opportunities, and challenges that leaders should consider when building and scaling AI in 2025.

Several key findings are highlighted as shaping the AI landscape. First, multimodal AI models - capable of processing text, images, audio, and video - are expected to revolutionise how we interact with technology, making digital experiences more seamless and reducing dependence on traditional devices like computers and smartphones. These advanced models, as we’ve discussed previously, also incorporate long context windows and reasoning capabilities, enabling more complex problem-solving and real-time tool use, such as web browsing.

It is also noted that widespread enterprise adoption will take longer than anticipated due to “last mile” challenges such as integration, reliability, and user experience. The emergence of “ambient agents” - AI systems running continuously in the background and proactively assisting users - is a major trend, and it’s noted that there are market opportunities to build specialised, ROI-driven AI solutions that address real-world problems effectively.

The document offers substantial advice, primarily for startups, but with broad applicability. A core recommendation is to view AI not merely as a cost-cutting tool but as a driver for top-line revenue growth and innovation. Businesses are encouraged to focus on developing specific, high-value AI features rather than attempting to build all-encompassing products from the outset, and to move quickly in the rapidly evolving AI market. Crucially, the advice emphasises the importance of establishing clear evaluation metrics for AI systems early on and aligning pricing models with the value delivered to customers. A strong data strategy, prioritising the collection of diverse, high-quality data while ensuring security and privacy, is presented as fundamental. Furthermore, building with “agnostic” infrastructure is suggested to allow flexibility in adopting the best available models and databases as they change.

The perspectives collated in the document point towards significant future shifts, particularly in AI infrastructure, which is expected to become faster, cheaper, and more reliable to meet the demands of generative AI. AI agents are predicted to become more integrated into daily life and business workflows, performing tasks and interacting in a more human-like manner, though the optimal balance between autonomous agents and human-in-the-loop systems remains a point of discussion. As the costs of infrastructure and foundational models decrease, the focus of innovation and value creation is expected to shift towards the application layer, where AI-powered products and services can solve real-world problems. This necessitates a move from simple “prompt and pray” approaches to developing robust AI systems that orchestrate multiple models and tools to deliver reliable and valuable outcomes for enterprises, emphasising human-AI collaboration to enhance productivity.

Understanding the strategies and technological trajectories being pursued by agile startups provides crucial foresight into potential market disruptions and new competitive pressures. Many principles advocated for startups, such as leveraging AI for revenue growth, focusing on specific high-value applications, iterating rapidly, and developing strong data strategies, are directly transferable and vital for large enterprises aiming to innovate, enhance efficiency, and maintain a competitive edge. The discussions on AI infrastructure evolution, the rise of agentic AI, and the importance of "product-algo fit" can inform an established company's AI adoption roadmap, helping to avoid common pitfalls and ensure that investments are future-proofed. Moreover, the emphasis on fostering AI fluency across teams and the exploration of new AI-driven business and monetisation models are critical considerations for any business looking to thrive in the evolving technological landscape.

3. Scott Hanselman on “Is Vibe Coding real?”

Scott Hanselman, Vice President of Developer Community at Microsoft and highly respected host of the “Hanselminutes Technology Podcast”, recently explored the concept of "vibe coding" in an episode featuring developer James Montemagno. The discussion centres on whether using AI agents to "vibe entire applications", as Montemagno did by generating a 17,000-line application with only twenty lines of his own code, is a valid approach for production-ready software or merely suitable for prototyping. Hanselman adopts a sceptical stance, questioning whether this represents the future of programming or whether Montemagno's success was an isolated incident.

Montemagno describes his interpretation of “vibe coding” not as blindly accepting AI-generated code or “being one with the AI”, but as an interactive, “in the flow” process where the developer works closely with the AI, actively reviewing and modifying code in real-time. This contrasts with AI researcher Simon Willison's view that if a developer reviews and understands all AI-generated code, the AI is merely a “typing assistant”. Montemagno elaborates on his method, which involves using detailed prompt files that can include decision trees to guide the AI according to specific best practices, emphasising that the AI learns from the context of the existing project. While he found the process “different faster” and enabled more functionality than he could have achieved alone, both he and Hanselman stress the critical importance of maintaining human oversight, with the developer always having a “finger on the steering wheel”.

The conversation further delves into the practicalities and nuances of this AI-assisted development. Montemagno underscores the significance of providing rich context to the AI, including existing codebases and preferred architectural patterns, to ensure the generated output aligns with project standards. He shares examples like using AI to build a prototype React application despite having no prior React experience, or to handle tedious tasks such as comprehensive CSS restyling through iterative prompting, thereby removing “toil”. Both speakers acknowledge that the landscape of AI coding tools is rapidly evolving, with Montemagno advocating for developers to continually experiment with new features and updates, as capabilities can change significantly even week to week. He also emphasises that developers retain control, customising their interaction with AI tools much like they would their development environment.

For our business, these insights into "vibe coding" and AI-assisted development help us shape our future use of the technology. While we may not be "vibing entire applications" into production imminently, the underlying principles can significantly enhance our development processes. By leveraging AI tools as sophisticated assistants, our developers can potentially accelerate coding tasks, overcome creative blocks, and automate the generation of boilerplate or repetitive code, freeing them to focus on more complex problem-solving and innovation. Montemagno's approach to embedding best practices into AI prompts and providing project context also offers a pathway to improving code consistency and quality across teams. Embracing these evolving AI capabilities, with a continued emphasis on "human in the loop" governance, can lead to increased productivity, faster project cycles, and the ability to explore new technological avenues, all while maintaining the standards and reliability crucial for an established enterprise.

4. Microsoft @ Build - top ten

The veritable onslaught of news this week means a slightly different approach to my coverage. I’m going to list out my top ten most interesting announcements, each with a short narrative, in the style of EAIW Extra (which you can read every week in our Teams group). You can view the full list of announcements from Build in Microsoft’s “Book of News”.

Azure AI Foundry Agent Service - this new service simplifies the operation of agents across development, deployment, and production via a single control plane.
Azure AI Foundry Observability - continuous evaluation capabilities for Agents have been added to further enhance the Foundry Observability dashboard, with visibility into additional critical quality and safety metrics.
Microsoft Entra Agent ID - this expansion to Entra ID will help tackle the AI agent sprawl problem by assigning a unique identifier to every agent in an environment, providing the ability to see all AI agents in one place and to know what those agents can access inside the organisation.
Model Router - this is a deployable AI chat model that is trained to select the best LLM to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model, delivering high performance while saving on compute costs where possible.
Agent Store - this is a store to make agents available to Microsoft 365 Copilot and Copilot Chat customers, and to ease consumption of agents from third parties. The store already features agents from Microsoft, partners, and customer organisations.
Microsoft Dynamics 365 data in Microsoft 365 Copilot - this will enable Microsoft 365 Copilot users to find Dynamics 365 CRM insights across sales, service, supply chain and marketing to drive their business. Initially, this will be for a scoped set of Dynamics 365 entities, including Contact, Opportunity, Lead and Account. This is in private preview.
Microsoft 365 Copilot Tuning - this is a new, low-code capability in Microsoft Copilot Studio to allow every organisation to tune AI models using their own company data, workflows, and processes, without needing a team of data scientists or weeks of work.
Multi-agent orchestration in Copilot Studio - multi-agent orchestration in Copilot Studio enables agents to exchange data, collaborate on tasks, and divide their work based on each agent’s expertise. For example, multiple agents can collaborate across HR, IT, and marketing to help onboard a new employee.
Microsoft is indeed hosting Grok - as rumoured, Grok is now on Azure AI Foundry.
Manus powered by Azure AI Foundry - we’ve talked about Manus a lot in previous posts, including the data sovereignty and compliance concerns, but it’s now been announced Manus is going to be powered by AI Foundry.
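To illustrate the general idea behind model routing, here is a minimal sketch of a router that picks the cheapest model capable of handling a prompt's estimated complexity. This is a deliberately simple, hand-written heuristic, not how Microsoft's Model Router works (theirs is described as a trained model), and the model names, costs, complexity thresholds and keyword list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class ModelOption:
    name: str             # hypothetical model label
    cost_per_call: float  # hypothetical relative cost
    max_complexity: int   # highest complexity score this model handles well

def complexity_score(prompt: str) -> int:
    # Crude heuristic: longer prompts and "reasoning" keywords score higher.
    score = len(prompt.split()) // 50
    keywords = ("analyse", "compare", "step by step", "prove", "plan")
    score += sum(2 for kw in keywords if kw in prompt.lower())
    return score

def route(prompt: str, options: Sequence[ModelOption]) -> ModelOption:
    # Choose the cheapest model whose capability covers the prompt's
    # complexity; fall back to the most capable model if none qualifies.
    needed = complexity_score(prompt)
    suitable = [m for m in options if m.max_complexity >= needed]
    if suitable:
        return min(suitable, key=lambda m: m.cost_per_call)
    return max(options, key=lambda m: m.max_complexity)

OPTIONS = [
    ModelOption("small-fast", 1.0, 1),
    ModelOption("mid-general", 4.0, 3),
    ModelOption("large-reasoning", 15.0, 10),
]

print(route("What is the capital of France?", OPTIONS).name)
print(route("Compare these two contracts and analyse the risks step by step", OPTIONS).name)
```

The design choice here is "cheapest model that is good enough", which is where the compute savings come from; a trained router replaces the keyword heuristic with a learned estimate of which model will answer best.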

5. Google @ I/O - top ten

And now onto Google’s announcements. You can view the full list of announcements from I/O on the Google Blog. I think you’ll agree it’s a super-strong showing by Google this year.

Gemini Diffusion - In EAIW #3 I talked about fast inference with Cerebras, and how text diffusion models also look to up the ante. Google now have their own diffusion model, albeit behind a waitlist.
Gemini 2.5 updates - the latest Gemini model has been updated with even higher performance. 2.5 Pro and 2.5 Flash now have native audio output, and 2.5 Pro has gained Deep Think capabilities. Both will move to general availability in June.
Imagen 4 and Veo 3 - Google’s already impressive media models are being upgraded. The new models push the frontier of image and video generation with their groundbreaking new capabilities. Google are also expanding access to Lyria 2, giving musicians more tools to create music. Finally, they’re inviting visual storytellers to try Flow, a new AI filmmaking tool.
LearnLM - last year, Google introduced LearnLM, a family of models and capabilities fine-tuned for learning. They have now announced that they are infusing LearnLM directly into Gemini 2.5, which they say makes it the world’s leading model for learning.
Android XR - I love my Quest 3 and my Meta Smart Glasses, so I’m excited to see what Google can do with Android XR. Partnerships with Gentle Monster and Warby Parker will make the products “stylish”. Apparently.
Gemini Code Assist - following on from the public previews of the free AI-coding assistant, Google has announced that Gemini Code Assist for Individuals and Gemini Code Assist for GitHub are Generally Available, powered by Gemini 2.5.
Gemma 3n - this is a new model optimised for on-device (mobile) performance and efficiency. Gemma 3n starts responding approximately 1.5x faster on mobile with significantly better quality (compared to Gemma 3 4B) and a reduced memory footprint. I’m excited to see what local LLMs can bring to the market.
Stitch - this is a new experiment from Google Labs that allows you to turn simple prompt and image inputs into complex UI designs and frontend code in minutes, with direct image, Figma and code exports from the tool.
Computer Use API - this is a new feature for developers to build applications that can browse the web or use other software tools. It’s available today in the Gemini API to Trusted Testers and will roll out to more developers later this year.
NotebookLM mobile - you can now use audio overviews, explore content, and share to NotebookLM directly from your Android or iOS device.

POB’s closing thoughts

Well done for getting to my closing thoughts, there’s a lot to digest this week!

It hasn’t gone unnoticed that this week’s post is quite “developer-y”. It’s what happens when you have two developer events in a week!

It’s been a quiet one for OpenAI this week, although they’ve released some significant API updates, including MCP compatibility! Mistral chose the wrong week to release their new “Devstral” model, an open-source model that outperforms Gemma 3 27B and DeepSeek V3 according to initial benchmarks. Finally, Meta engineer Saranyan Vigraham shared his own personal experiments with vibe coding, identifying a sweet spot of 40–55% AI involvement in the development process.

We’ll do a “not development” special next week. I’ll leave you with this TechCrunch article, following on from our Klarna mention in EAIW #13, which speaks to how their efficiency push has led to revenue per employee soaring to $1m.

Thanks for reading, enjoy the rest of your week and have a great weekend. 👍

Regular readers will know I have created a Teams channel to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link.

I also post the things that made my Pocket list but didn’t make it into the post, “EAIW Extra”, to Teams.

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around AI at Davies, you can reply to this message to reach me directly or drop a note to the AI mailbox.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
