Enterprise AI Weekly #22
We explain Mixture of Experts (MoE), Kimi K2 rocks the AI world, the Windsurf debacle is resolved, AWS launches Bedrock AgentCore and Kiro, Claude + MCP + Canva, and is AI slowing some devs down?
Welcome to Enterprise AI Weekly #22
You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.
Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.
If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.
Last week we had our first “Ask POB” (that’s me), where I answer questions sent in by you, dear readers, relating to AI. Alongside this, I will continue to write explainers. I mentioned last week that we’re all going to work on a collaborative “pet project” together. I’m going to build an app, Vibe Coding style, to demonstrate the process and to give us a test bed for technologies we talk about in future issues. I’m still looking for input from you folks on what the app should do! I have a few ideas, but I’d like to hear some suggestions from you, too. I want to build something that is interesting, useful, and provides a suitable platform for us to iterate on and build AI into together. Hit reply, and get your suggestions in now.
Enjoy EAIW #22!
Explainer: Mixture of Experts (MoE) in LLMs
The Mixture of Experts (MoE) architecture is a clever design used in some of the latest and largest Large Language Models (LLMs), including Kimi K2, which I’ll tell you about in a moment! Instead of relying on a single, densely connected neural network, MoE divides a model into multiple specialist subnetworks called “experts”. For each piece of input, a gating mechanism dynamically selects which experts are most relevant and activates only that subset, meaning the model does not have to use all its computational muscle every time. In practice, the gate scores every expert for every token and routes each token to a small number of top-scoring experts (typically somewhere between two and eight, depending on the model). This targeted approach delivers higher efficiency and scalability, as you can add more experts - and therefore capacity - without a proportional increase in the compute needed per token.
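To make the routing idea concrete, here’s a minimal, illustrative sketch of top-k expert routing in PyTorch. It isn’t the architecture of Kimi K2 or any other specific model - real MoE layers add load-balancing losses, capacity limits, and expert parallelism - it just shows a gate scoring the experts for each token and combining the outputs of the top two.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative sparse MoE layer: route each token to its top-k experts."""

    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # keep only the k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalise the kept weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)        # 16 tokens, model width 64
layer = ToyMoELayer(d_model=64)
print(layer(tokens).shape)          # torch.Size([16, 64]) - only 2 of the 8 experts ran per token
```

Only the chosen experts do any work for a given token, which is exactly where the compute savings come from at the scale of hundreds of billions of parameters.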
The primary benefits of MoE in LLMs are threefold. First, efficiency: MoE allows models with hundreds of billions, or even trillions, of parameters to be deployed more efficiently, as only a fraction of the overall network is used for any given task. Second, specialisation: different experts can excel at different types of language or tasks (think of a tax expert vs a poet!), enhancing the model’s versatility and quality. Third, scalability: MoE architectures make it feasible to train and serve enormous models, bringing advanced capabilities within reach of enterprise deployments where resource efficiency and throughput matter.
However, MoE does not come without its issues. The dynamic routing of queries to different experts can make consistent quality harder to guarantee (what if the wrong expert is picked, or some experts are rarely used?). Training MoE models can also be more complex, sometimes resulting in “load imbalance”, where certain experts are underutilised. There are also practical deployment headaches: maintaining many experts, each with their own weights, can make model serving and updates trickier compared to dense models.
MoE has proven successful in several of the latest LLMs. Alongside Kimi’s K2, Meta’s flagship Llama 4 models - including Scout, Maverick, and the in-training Behemoth - are all built on MoE backbones and have set new records for context length, efficiency, and multilingual support. MiniMax-M1, a fast-rising contender in open-source AI, also leverages a hybrid architecture combining MoE with novel attention mechanisms to achieve record-breaking context windows and industry-leading efficiency. Alibaba’s Qwen3-235B and DeepSeek’s R1 are further examples, with all these models consistently ranking among the world’s top performers for reasoning, coding, and document analysis.
For enterprise use, MoE models offer the chance to unlock truly large-scale, high-quality AI without breaking the bank on hardware or incurring runaway cloud costs. With models like Llama 4 and MiniMax-M1 now deployable on efficient hardware and offering performance on par with (or even superior to) closed-source rivals, we can consider self-hosted options and maintain greater control over data and compliance. The efficiency gains can directly translate to faster agentic processing, real-time analytics on huge data sets, and more cost-effective digital transformation. The caveat is that, as with all cutting-edge tech, there will be bumps in the road regarding stability and integration. That said, staying ahead of the curve on trends like MoE is exactly what keeps us competitive in the ever-evolving AI landscape.
1. Kimi K2: China pushes the frontier forward again
China is pushing the AI frontier forward again, but not with a new DeepSeek version! Instead, Moonshot AI's release of Kimi K2 brings another entrant to the world of open-source large language models. Hot on the heels of DeepSeek's R1 debut earlier this year, Kimi K2 pushes the boundaries further, boasting one trillion total parameters and a sparse Mixture-of-Experts (MoE) design with thirty-two billion activated parameters. This hefty architecture rivals - if not surpasses - proprietary offerings from Western tech giants, while maintaining a razor-sharp focus on affordability and openness.
Early reactions in both English and Chinese tech circles have been resoundingly positive. Researchers and developers are hailing Kimi K2 as another true “DeepSeek moment” - a reference to the paradigm shift when DeepSeek first made state-of-the-art AI freely accessible to all. Within 24 hours of release, Kimi K2 rocketed to the top of download charts on AI platforms. Industry commentary highlights outstanding performance in code generation and agentic tasks, placing it in line with, or even ahead of, leading models like Anthropic’s Claude Sonnet and OpenAI’s GPT-4.5, all at a fraction of the cost. The open-weight licence empowers organisations to experiment and build upon the technology, fuelling a flurry of adaptation, quantisation, and toolchain porting efforts worldwide.
Just as DeepSeek R1 changed expectations for open-source AI, Kimi K2 demonstrates that China’s Moonshot AI can compete - if not lead - at the international frontier. Rather than chasing mere novelty, the Kimi team has built on DeepSeek’s proven architecture, delivering tangible improvements in both efficiency and usability. Its agentic features are purpose-built, enabling it to orchestrate complex workflows, use tools, and deliver “artifact-first” outputs like presentations and diagrams - all aligned with the future direction of enterprise automation.
For those eager to dive in, Kimi K2 is refreshingly accessible. You can interact with it for free via Kimi’s web and app portals, or integrate it into coding tools and assistants using OpenRouter, as discussed in EAIW #17. Setup is straightforward: sign up for an OpenRouter account, generate an API key, and configure your preferred environment - be that Cursor, Claude Code Router, or your own in-house application - to point at the moonshotai/kimi-k2 model. Routing access through an API layer you control like this helps keep usage within your operational and compliance guardrails, while letting you gain the benefits of cutting-edge AI at a fraction of the price of closed alternatives.
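As a concrete example, here is roughly what that looks like from Python, since OpenRouter exposes an OpenAI-compatible endpoint. Treat the model slug, pricing, and behaviour as things to verify on OpenRouter itself, and keep the API key out of source control.

```python
# pip install openai  (OpenRouter speaks the OpenAI API)
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",    # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],   # the key generated in your OpenRouter account
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",  # the Kimi K2 slug discussed above
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses the words in a sentence."},
    ],
)
print(response.choices[0].message.content)
```

The same base URL and key work in most tools that accept an OpenAI-compatible provider, which is what makes swapping models in and out of your existing assistants so painless.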
For enterprise AI teams, Kimi K2 offers a model that’s affordable, adaptable, and built for real-world agentic use. Whether it’s automating code production, enhancing internal research, or exploring advanced interaction models beyond chatbots, this release drastically lowers the barrier to adopting state-of-the-art AI within strict compliance and security frameworks. With the precedent now firmly set for major open-source breakthroughs coming from new corners of the globe, it’s essential we stay attuned and ready to integrate these advancements - securing our competitive edge for tomorrow’s intelligent enterprise.
2. The Windsurf ownership drama is finally resolved
The world of AI coding startups has rarely seen as much drama as it has this month, when Windsurf - the much-hyped AI coding tool - became the football in a high-stakes tussle between OpenAI, Google, and the makers of Devin (Cognition AI).
As we first suggested in EAIW #9, OpenAI was deep in negotiations to acquire Windsurf for a rumoured $3 billion. However, complications over IP access - primarily Microsoft’s refusal to carve out Windsurf technology from OpenAI’s own IP-sharing agreement - derailed the deal at the eleventh hour. Microsoft, as OpenAI’s largest backer and partner on Azure, insisted on maintaining its right to any technology acquired by OpenAI, effectively killing the mega-deal and laying bare the friction that sometimes lurks beneath even the most public technology alliances.
With OpenAI out of the game, Google swooped in with a “reverse-acquihire” manoeuvre. Rather than buying Windsurf outright, Google DeepMind secured a non-exclusive license to some of Windsurf’s technology for $2.4 billion, while hiring the startup’s CEO Varun Mohan, co-founder Douglas Chen, and top R&D talent. This controversial play allowed Google to supercharge its agentic coding ambitions, particularly for its Gemini unit, whilst sidestepping regulatory complexities of a full acquisition. However, Google did not take any controlling interest or stake in the company, leaving Windsurf’s legal entity - and its remaining workforce - independent.
Enter Devin, or more formally, Cognition AI: Within days of the Google shake-up, Cognition (makers of the Devin AI agent) acquired Windsurf’s remaining assets, IP, and the bulk of its staff. For Windsurf, this marked the end of a rollercoaster 72 hours, as its leadership departed for Google and its business found a new home with a direct competitor. Staff who remained with Windsurf, now under interim CEO Jeff Wang, expressed excitement about joining Cognition, a team they “respected the most”. The consensus is that Cognition’s acquisition ensures continuity for Windsurf’s customers and investors - and users have already seen a benefit through the return of Anthropic Claude models to the tool - though the AI world will be watching to see how integration with Devin unfolds and how market dynamics realign.
This saga should give every enterprise thinking about AI partnerships or acquisitions pause for thought. Major platform dependencies (like OpenAI’s ties to Microsoft) can introduce sovereignty risks even at late negotiation stages, blocking what would seem like straightforward strategic deals. Reverse-acquihire strategies - where a company nabs talent and licensed tech, but leaves the legal entity independent - are becoming more common as Big Tech tries to dodge regulatory glare while accelerating innovation. This underlines the importance of due diligence in partner relationships, a strong focus on IP structure, and an appreciation that even the most celebrated AI brands face immense flux.
3. AWS unveils Bedrock AgentCore and Kiro
Amazon has just launched two new AI offerings - Bedrock AgentCore and Kiro - aimed squarely at the enterprise sector, and both look set to shake up how large organisations, particularly those that already work closely with AWS, build and run intelligent systems. These tools signal a maturing approach to generative AI, shifting the focus from showy demos to robust, scalable, and secure production applications.
Bedrock AgentCore: Enterprise AI Agents, ready for lift-off
Bedrock AgentCore is built to enable companies to deploy, scale, and manage advanced AI agents safely and efficiently. Think of it as an all-in-one kit for moving from AI prototypes to battle-ready, production-grade solutions. It supports any underlying model, integrates with existing agent frameworks, and prioritises security and observability at every step - absolutely vital for regulated industries or mission-critical workloads.
Key features include (see the usage sketch after this list):
AgentCore Runtime: Session isolation and industry-leading support for long-running jobs (up to 8 hours), making it ideal for complex, multi-step processes.
AgentCore Memory: Manages both short-term and long-term context, so your agents can hold a conversation across multiple interactions.
AgentCore Identity: Seamlessly links up with your current identity providers for secure, delegated access to internal resources.
Built-in observability: Real-time dashboards and OpenTelemetry compatibility mean you’ll always know what your agents are up to.
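To give a feel for what calling a deployed agent might look like, here is a rough sketch based on the launch documentation. Treat every name here as an assumption - the boto3 client name, the operation, and the parameters may well differ from the shipped SDK, so check the current AWS docs before using any of it.

```python
# Hypothetical call shape only - verify the client name ("bedrock-agentcore"),
# the operation (invoke_agent_runtime), and the parameters against the live AWS SDK docs.
import json
import uuid
import boto3

# Assumption: the AgentCore data plane is exposed as a "bedrock-agentcore" boto3 client.
client = boto3.client("bedrock-agentcore", region_name="us-east-1")

response = client.invoke_agent_runtime(                       # assumed operation name
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-agent",  # placeholder ARN
    runtimeSessionId=str(uuid.uuid4()),                       # sessions are isolated per AgentCore Runtime
    payload=json.dumps({"prompt": "Summarise this week's open claims"}).encode("utf-8"),
)

# The response body may be returned as a streaming object; handle both cases defensively.
body = response.get("response")
print(body.read().decode("utf-8") if hasattr(body, "read") else body)
```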
Kiro: Bringing order to “Vibe Coding” mayhem
Kiro is Amazon’s new specification-driven, agentic integrated development environment (IDE) designed to take you from a “vibe-coded” prototype to robust, maintainable production software. The IDE leverages structured “spec coding,” automatically generating specs, technical blueprints, and actionable test criteria from your prompts - so you get more than just hastily cut-and-pasted snippets.
Highlights include:
Structured agentic loop: Kiro uses planning, reasoning, and action-evaluation cycles to handle complex multi-step tasks and integrates directly with your local environment.
Specification management: Prompts aren’t just transcribed into code; they’re translated into specs, user stories, data diagrams, and sequenced tasks, helping move AI-generated ideas into the properly documented and tested realm.
Automation hooks: Developers can create automated tasks for QA, ensuring the code is production-ready from day one.
Transparent and private execution: Code runs locally by default, with full control over what gets sent to the cloud.
Following the announcement, Geoffrey Huntley has conducted a detailed independent analysis of Amazon Kiro’s source code, offering valuable behind-the-scenes insight into one of the year’s most ambitious AI development tools. Readers will find an expert breakdown of Kiro’s architecture, Visual Studio Code integration, bundled extensions, and multi-provider model support. The analysis is particularly useful for technical leaders and developers keen to understand how Kiro tackles the very real challenges of plugin ecosystem fragmentation, language support (including C++, .NET, and Python), and specification-driven development flows. (Thanks Scott for the heads up!)
The analysis offers a candid look at design tradeoffs, system prompts, configuration options, and practical developer experience. Whether you are evaluating Kiro for your organisation or just want a real-world sense of what “AI-first” coding assistants look like in practice, it’s a standout resource for informed decision-making.
Bedrock AgentCore means we can safely accelerate deployment of smart, autonomous agents without the traditional headaches of bespoke infrastructure, patchwork security, or endless integration woes. Kiro, meanwhile, could help rein in the chaos of fast-moving AI development, making code easier to review, govern, and hand over as teams grow or projects pivot. The combination should empower us to innovate faster while keeping risk and technical debt firmly in check.
4. Claude and Canva bring MCP-enabled design to chat
If you’ve ever wished you could whip up a pitch deck or tweak a presentation simply by chatting to your AI assistant, the new integration between Anthropic’s Claude AI and Canva is set to brighten your workday. Thanks to the Model Context Protocol (MCP), Claude now allows users to create, edit, and manage Canva designs using plain English prompts without ever leaving the chat window. Whether you need to resize images, autofill branded templates, or summarise content from Canva Docs, it all happens effortlessly with a few words in Claude’s interface.
Pictured: The same MCP connectivity being used in ChatGPT.
What sets this partnership apart isn’t just the power to design at the speed of conversation. MCP, often described as the “USB-C port of AI applications” (I’m not a fan of that comparison, mind), takes centre stage here. It acts as an open-standard bridge, securely connecting Claude not only to Canva but to a growing array of third-party platforms including Figma, Notion, and Slack. All you need is a paid subscription to both services, and suddenly your AI assistant is a full-stack design collaborator as well as a researcher and strategist. Toggling the integration on takes just a moment in settings, helping to cement a future where chat-driven automation blurs the line between creative thinking and execution.
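If you’re curious what sits behind an integration like this, the MCP side is surprisingly small. Below is a minimal, illustrative server using the official Python MCP SDK that exposes a single made-up “brand colour” tool; the Canva connector is obviously a far richer, hosted affair, but the shape - a server declaring tools that a client like Claude can discover and call - is the same.

```python
# pip install mcp  -  a toy Model Context Protocol server exposing one tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("brand-assets")  # hypothetical server name, purely for illustration

@mcp.tool()
def get_brand_colour(product: str) -> str:
    """Return the approved hex colour for a product line (stub data for this sketch)."""
    colours = {"enterprise": "#1A3C6E", "consumer": "#E8553A"}
    return colours.get(product.lower(), "#000000")

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport, which a desktop MCP client can launch directly
```

Once a client is pointed at a server like this, “what’s the enterprise brand colour?” becomes a tool call rather than a guess, which is the whole point of MCP as a bridge between the assistant and your systems.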
For enterprise teams looking to enhance productivity without the faff of switching tabs or tools, this integration marks a leap in AI-powered workflow. Imagine briefing Claude to generate a branded slide deck for a client, filling it with the latest sales figures, then exporting to PDF - all in the same chat. It’s not just about speed; it’s also about reducing context-switching fatigue and empowering teams to produce higher quality outputs with less friction. As MCP is open and secure, it offers us an appealing pathway for future integrations, making it a strategic consideration as we continue to invest in smart, scalable digital tools for our organisation.
5. Do AI coding tools slow experienced developers?
A recent study by METR has thrown a curveball at expectations for AI-assisted coding, especially for enterprises eyeing “vibe coding” or large-scale adoption of tools like Cursor paired with state-of-the-art models. The results: experienced open-source developers actually took 19% longer to complete issues when using AI tools compared to coding unaided. This finding is particularly striking, given that the developers themselves predicted they’d be 24% faster with AI and continued to believe they were more productive even after the measured slowdown.
Why did AI - heralded as a productivity game-changer - slow down some of the most skilled developers? The study recruited highly experienced contributors (averaging over 22,000 stars and one million lines of code per repo), allowing them to use AI (“Cursor Pro” with Claude 3.5/3.7 Sonnet) for half of their routine bug, feature, or refactor tasks. The researchers carefully controlled for variables such as AI experience and task difficulty, finding that the slowdown wasn’t due to teething issues with the tools or lack of skill. Interestingly, despite the decrease in speed, developers believed their experience had improved, hinting that perceived “vibe” or workflow enjoyment might at times mask a dip in actual productivity.
Enterprises should not assume this result applies across the board. The study’s implications are nuanced. The effects observed in elite, context-rich teams won’t automatically translate to less experienced developers, greenfield projects, or teams onboarding new team members. In fact, earlier research suggests AI can offer significant speed-ups for less experienced coders, those learning unfamiliar codebases, or when large-scale refactoring is required.
For enterprises considering the roll-out of AI-assisted or “vibe coding” approaches, there are valuable lessons. It’s now clear that positive benchmarks and anecdotal evidence don’t always reflect the real-world impact on code velocity, particularly in high-quality, complex environments with exacting demands on documentation, testing, and style. While AI can absolutely shine for onboarding, skilling-up novices, or when tackling unfamiliar codebases, experienced hands working on established, mission-critical repositories may not see the time savings promised - or could be at risk of a hidden productivity drag.
With AI coding tools surging in popularity, forward-thinking enterprises like ours need robust evidence before adopting “vibe coding” or wholesale AI integration across engineering teams. This study is a timely reminder to prioritise measured pilots over hype, focus on specific business contexts, and remain curious but sceptical of blanket claims. If we do experiment with AI-assisted coding, close monitoring of productivity and quality outcomes - not just sentiment - will ensure we make the right calls for our unique challenges and goals.
POB’s closing thoughts
If you’re wondering whether there’s money to be made in AI (OK, I know you’re not), you’ll be interested to hear that Lovable took a $200m Series A round to make it a unicorn (that is to say, a valuation of over $1bn) only 8 months after launch. Impressive. I’ve read a lot about when we’ll see the first one-person unicorn, and I feel like it’s going to happen.
French AI startup Mistral introduced Voxtral this week, its first voice model, continuing its steady drumbeat of progress. “When evaluated across languages in FLEURS, Voxtral Small outperforms Whisper on every task, achieving state-of-the-art performance in a number of European languages.” Impressive.
Finally, I noted an article this week about on-device LLM support arriving in React Native. While the initial focus is Apple Intelligence on iOS devices, I think a standardised way to access on-device AI capability across platforms will be incredibly important going forward, and I look forward to seeing how this initiative evolves.
Thanks for reading, I hope you have a great weekend! 👍
I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.
Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.
Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.