Enterprise AI Weekly #37

GitHub, Windsurf and Cursor level up AIED, Mistral's AI Studio aims to be a production enabler, Airbnb stands behind Qwen open source, the AI bubble, AI in Windows 11 and MiniMax M2 arrives

Oct 31, 2025

Welcome to Enterprise AI Weekly #37

You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

Alongside the main newsletter, I’m also exploring the capabilities of AI-Enhanced Development (AIED) and Vibe Coding. I set aside a bit of time in my week for keeping up with tech and doing this sort of thing - usually a Sunday morning - so I’m reserving an hour each week to create some interesting things with AI. Our first project was Boring Expenses, created to demonstrate the Vibe / AIED process and to show what can be achieved in only one virtual working day. If you haven’t seen the finished product yet, head on over to our “Boring demo”. 😊

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.

Enterprise AI Weekly Mini - fail fast!

Acknowledging the newsletter has grown quite long and quite technical, for the last few weeks I have published an (AI-generated) “Mini” version - a quicker, more accessible read. Although around 9% of you clicked through to the mini version in the first week, readership dropped off a cliff in week two. In today’s world it’s important to ‘fail fast’ when experimenting, and on that basis EAIW Mini won’t be returning this week. Don’t cry because it’s over. Smile because it happened. Not a Dr Seuss quote, as it turns out! 😊

Now, onto the rest of the news. Enjoy EAIW #37!

1. GitHub, Windsurf and Cursor level up their AIED

Yes, this item is long. AND, I’ve been quite selective in a busy week for AI-Enhanced Development! 😁

This week’s GitHub Universe event saw a host of major announcements that are set to shape the way enterprises approach software development, collaboration, and code quality. From the introduction of a unified mission control for agents, to deeper integrations and metrics for Copilot, and a marked shift in how codebases are governed and improved, GitHub is staking its claim as the home for developer productivity, agentic workflows, and enterprise assurance. I’ve never been excited for a GitHub event before, yet here we are!

Heading image with the words 'Agent HQ' across the center, surrounded by 'Mission control,' 'GitHub Copilot CLI,' '3rd-party agents,' 'Plan mode,' 'AI controls,' and 'Copilot integrations.'

Front and centre was Agent HQ, which promises developers a single place to orchestrate and manage coding agents from across the ecosystem - including OpenAI, Anthropic, Google, xAI, Cognition and GitHub’s own Copilot. This “mission control” allows teams to assign, steer, and keep tabs on agents and their tasks, whether working via the GitHub website, mobile app, VS Code or the command line. For enterprise teams, it means coding agents run within GitHub’s identity controls, branch permissions, and audit logs, rather than a fragmented array of standalone tools with patchy governance. Developers can now tailor agents through custom prompts and sharing, ensuring AI agents fit the specific context and standards of the business.

UI of GitHub Copilot code review showing Copilot suggesting changes

Copilot continues to strengthen its role in enterprise workflows. You can now assign tasks to Copilot from tools like Slack, Microsoft Teams, Linear, and Azure Boards, bringing greater automation and flow to collaborative environments. Agentic code reviews take a leap forward by combining Copilot with CodeQL - flagging security issues, applying fixes, and ensuring PRs are up to scratch before crossing the finish line. Enterprises get a new metrics dashboard and API for tracking Copilot adoption, feature usage, user activity, and model distribution across organisations.

Of particular interest for tech teams is the MCP Registry’s integration within VS Code - developers can discover and install Model Context Protocol servers (used for agentic and AI tool integration) in just a few clicks, with improved security and transparent metadata. This further accelerates adoption of agentic workflows and makes AI integration smoother and safer.

Companies now have access to GitHub Code Quality - a dashboard and governance toolkit for code maintainability, reliability and coverage across every repository. Organisations can audit, set security policies, manage agent workflows, and create allowlists for MCP servers, all from a central location. The platform now makes it easier for enterprises to set what agents can access, locking down key routes and providing audit trails for compliance.

A graphic showing key Octoverse 2025 metrics: 630 million total projects on GitHub, over 180 million developers, 1.12 billion total contributions, 4.3 million AI projects, 43.2 million pull requests merged per month (up 23% year over year), and TypeScript and Python as the top two languages used in 2025.

This year’s Octoverse report revealed some interesting trends too: more than 180 million developers now using GitHub, and TypeScript overtaking Python and JavaScript as the most popular language on the platform - a signal that typed languages and agent-powered coding are coming to dominate modern stacks. Rapid growth in Jupyter Notebooks and performance-oriented languages mirrors the surge in data-driven, exploratory AI work.

Diagram titled “AARDVARK — Vulnerability Discovery Agent Workflow” showing a process flow from Git repository to threat modeling, vulnerability discovery, validation sandbox, patching with Codex, and human review leading to a pull request.

GitHub isn’t the only company making waves this week. OpenAI has introduced Aardvark, an agentic security researcher powered by GPT-5, designed to help developers and security teams discover, validate, and fix vulnerabilities at scale. Unlike traditional security tools, Aardvark uses large language model reasoning and tool use to analyse code behaviour much like a human security researcher would - reading code, running tests, and using available tools to identify risks. It continuously monitors commits and changes in repositories, building a threat model, scanning history and new commits, validating exploits in sandbox environments, and generating precise patch fixes through OpenAI Codex for easy human review and one-click application.

Having already found dozens of vulnerabilities in both internal and open-source codebases, Aardvark aims to shift the balance in favour of defenders by offering proactive, integrated, and scalable security insights directly within development workflows - strengthening software security without slowing innovation. The tool is currently in private beta with plans to expand access as it matures.

Cognition, makers of Devin and recent acquirers of Windsurf, have announced the release of SWE-1.5 - a new model for software development that blends cutting-edge scale with remarkable speed. Built as a frontier-size model with hundreds of billions of parameters, SWE-1.5 achieves near state-of-the-art coding performance while delivering speeds up to 950 tokens per second - about 13 times faster than comparable models like Sonnet 4.5. This performance leap is enabled by a close partnership with Cerebras, whose ultra-high-performance inference hardware and custom optimisations allow SWE-1.5 to operate at unprecedented speeds in real-world developer workflows, exemplified by its integration within Cognition’s Windsurf platform. SWE-1.5 is not merely a model but a tightly integrated system where model, inference, and agent harness co-evolve, optimised for seamless developer experience, rapid context engineering, and robust coding results. Early adoption by senior engineers attests to its ability to keep tasks within developers’ flow windows, significantly accelerating routine coding tasks without compromising quality.
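To put those throughput figures in perspective, here is a quick back-of-the-envelope sketch in Python. The 950 tokens per second and 13x figures are the ones reported above; the 2,000-token response size is purely an illustrative assumption, and real-world latency also depends on things like time to first token.

```python
# Figures reported above for SWE-1.5 (peak throughput and claimed speed-up).
SWE_TOKENS_PER_SECOND = 950
CLAIMED_SPEEDUP = 13

# Assumption for illustration only: a 2,000-token response.
RESPONSE_TOKENS = 2_000


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a steady rate, ignoring time to first token."""
    return tokens / tokens_per_second


# Baseline throughput implied by the "13 times faster" claim.
implied_baseline_tps = SWE_TOKENS_PER_SECOND / CLAIMED_SPEEDUP

fast = generation_seconds(RESPONSE_TOKENS, SWE_TOKENS_PER_SECOND)
slow = generation_seconds(RESPONSE_TOKENS, implied_baseline_tps)

print(f"Implied baseline: {implied_baseline_tps:.0f} tokens/s")
print(f"{RESPONSE_TOKENS} tokens at 950 tokens/s: {fast:.1f}s")
print(f"{RESPONSE_TOKENS} tokens at the implied baseline: {slow:.1f}s")
```

On those assumptions, the response streams in about two seconds rather than nearly half a minute, which is roughly the difference between staying in flow and switching to something else while you wait.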

Not wanting to be left out, Cursor has announced major updates to its stack too. With Cursor 2.0, the introduction of Composer - another fast, frontier model optimised for low latency - allows most coding task turns to complete within 30 seconds, enabling rapid iteration and confident multi-step task execution. The new interface focuses on managing multiple agents in parallel, improving productivity by allowing several models to tackle the same problem simultaneously and selecting the best output. Cursor’s Cloud Agents extend this further by enabling agents to work autonomously in the cloud, freeing developers from device constraints and speeding up workflows such as bug fixes, task execution, and feature implementation directly from popular tools like Slack and GitHub. On the enterprise side, Cursor has introduced Hooks for observability and control over agent actions, Team Rules to enforce consistent coding practices across organisations, enhanced analytics for detailed AI usage insights, audit logs for full platform event visibility, and a Sandbox Mode for safer command execution. Together, these features help Cursor compete with GitHub, and empower enterprise teams to harness AI agents more securely, transparently, and effectively throughout the software lifecycle, driving faster delivery and reducing friction in code review and testing cycles.

A desktop code review and task management application beside its mobile counterpart.

The enterprise impact of these developments in AI-Enhanced Development (AIED) is considerable, as the field is quickly becoming a fiercely competitive battleground. With GitHub Universe setting new standards for agent orchestration and governance, alongside OpenAI’s Aardvark redefining security automation, and Cognition’s SWE-1.5 and Cursor 2.0 advancing fast, intelligent coding agents, enterprises are equipped with unprecedented tools to boost developer productivity and security assurance.

These innovations enable smoother, faster, and more secure workflows, integrating AI agents deeply into established development ecosystems while providing strong governance and compliance controls. As TypeScript and agent-powered coding rise alongside data-driven AI workflows, businesses must adopt and align with these new capabilities to maintain competitive advantage, accelerate delivery, and safeguard their software supply chains in an increasingly complex digital landscape. AI-enhanced development is no longer a futuristic concept but a vital, rapidly expanding core pillar of enterprise software engineering.

2. Mistral AI Studio: From AI prototypes to production

Many enterprise AI teams have successfully built multiple prototypes - whether copilots, chatbots, summarisation tools, or internal Q&A systems. Despite capable models and clear business use cases, most projects stall before reaching reliable production deployment. This bottleneck does not stem from model performance but from the lack of robust operational infrastructure to track, evaluate, and govern AI outputs at scale. Teams struggle to monitor prompt and model changes, reproduce results, collect structured feedback, and deploy workflows under strict security and compliance constraints. Often, AI components are hardcoded into applications without proper evaluation or version control, leading to opaque improvements and stalled adoption.

Mistral AI Studio looks to address these challenges by providing a production-oriented platform designed to operationalise AI workflows with reliability, observability, and governance at its core. The platform integrates three core pillars: Observability, Agent Runtime, and AI Registry. The observability module provides full transparency into AI output quality with tools to inspect traffic, build evaluation datasets from real usage, and detect regressions through automated scoring mechanisms. The Agent Runtime offers a durable and transparent execution environment that can run complex multi-step AI tasks with fault tolerance and replayability across hybrid and self-hosted environments. Meanwhile, the AI Registry tracks every artefact across the AI lifecycle - agents, models, datasets, and prompts - maintaining lineage, ownership, versioning, and access controls, enabling oversight and reuse across enterprise systems.
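As a rough illustration of what detecting regressions through automated scoring can look like in practice, the sketch below compares the average evaluation score of two prompt versions over the same dataset. Everything in it is an assumption for illustration (the scores, the 0.05 tolerance and the function name); it is not Mistral AI Studio's API.

```python
from statistics import mean


def has_regressed(baseline_scores: list[float],
                  candidate_scores: list[float],
                  tolerance: float = 0.05) -> bool:
    """Flag a regression when the candidate's mean score drops by more than `tolerance`."""
    return mean(baseline_scores) - mean(candidate_scores) > tolerance


# Hypothetical scores (0 to 1) from an automated scorer on the same evaluation set.
prompt_v1 = [0.90, 0.85, 0.80, 0.95]
prompt_v2 = [0.70, 0.75, 0.80, 0.85]

# The new prompt version scores noticeably lower on average, so it is flagged.
v2_regressed = has_regressed(prompt_v1, prompt_v2)
# Comparing a version with itself should never be flagged.
v1_regressed = has_regressed(prompt_v1, prompt_v1)

print("v2 regressed:", v2_regressed)
print("v1 regressed:", v1_regressed)
```

A production platform would likely use richer statistics and per-example comparisons, but the principle is the same: score every change against real usage before it ships.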

As enterprises move beyond initial AI experimentation, the critical requirement is a platform that enables continuous improvement, safety, and control at the speed AI workloads demand. By combining detailed observability, reproducible execution, and unified asset control into one closed feedback loop, Mistral AI Studio transforms AI from a series of experiments into a dependable system. This platform is particularly relevant for businesses eager to operationalise AI with the same level of rigour applied to core software systems, ensuring secure, observable, and accountable AI deployments that align with enterprise security and compliance standards.

For businesses navigating the complex AI landscape, the transition from promising prototypes to scalable production systems remains a universal challenge. Mistral AI Studio exemplifies the trend towards build-operate capabilities that bring discipline and control to AI initiatives. Its focus on enterprise-grade governance, traceable feedback, and hybrid deployment aligns directly with the demands of large organisations balancing agility with security and regulatory compliance. Deploying AI at scale without losing control requires platforms like AI Studio, which encapsulate best practices around observability and operational resilience. As a Chief AI Officer, I see understanding and leveraging such technologies as key to ensuring we not only innovate but embed AI responsibly and sustainably within our core business operations.

3. Airbnb reveals use of Qwen open source models

Airbnb’s CEO Brian Chesky has revealed that the company’s AI-driven customer service agent now relies primarily on Alibaba’s Qwen open-source models, a surprising admission given how many businesses default to closed-source models. Speaking to Bloomberg, Chesky explained that while Airbnb employs 13 AI models from suppliers including OpenAI and Google, it chose Qwen as its backbone due to superior integration capability, cost efficiency, and speed. The decision underscores the growing credibility of Chinese open-source AI, which is increasingly competing with Western closed-source systems.

Chesky, a long-time associate of OpenAI’s Sam Altman, said ChatGPT’s integration was “not quite ready” for Airbnb’s complex, multi-intent customer interactions. Meanwhile, Alibaba’s Qwen models have matured rapidly and are being recognised for their adaptability, especially in multilingual and structured data tasks. This move follows Alibaba’s broader push to promote open adoption over proprietary strength, with chairman Joe Tsai remarking that the real contest in AI will be won by “who can adopt it faster”, not necessarily who creates the most capable foundation model.

Open-source models such as Qwen and DeepSeek are eroding the dominance of closed systems by offering cheaper, more flexible development pipelines - particularly valuable in customer-facing roles that require domain-specific tuning. It’s another indicator that AI strategy in global firms is moving beyond brand allegiance toward cost-performance balance and deployability.

In our own enterprises, there is a parallel: as AI deployments scale, model plurality is proving essential. Just as Airbnb uses a blend of proprietary and open technologies, many enterprises are now combining models to achieve resilience, control costs, and tailor AI behaviour to their context. The Qwen story is less about East vs West, and more about a pragmatic AI ecosystem, one that rewards integration speed and adaptability - the very qualities shaping enterprise AI delivery today.
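One way to picture model plurality is a simple router that tries the cheapest suitable model first and falls back to an alternative if it fails. The sketch below is illustrative only: the model names, per-token costs and `call` functions are hypothetical stand-ins, not Airbnb's setup or any vendor's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ModelOption:
    name: str                    # hypothetical label, not a real endpoint
    cost_per_1k_tokens: float    # illustrative figure only
    call: Callable[[str], str]   # stand-in for a real client call


def route(prompt: str, options: Sequence[ModelOption]) -> tuple[str, str]:
    """Try models from cheapest to most expensive; return (model name, reply)."""
    errors = []
    for option in sorted(options, key=lambda o: o.cost_per_1k_tokens):
        try:
            return option.name, option.call(prompt)
        except Exception as exc:  # in practice, catch the client's specific errors
            errors.append(f"{option.name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def flaky_open_model(prompt: str) -> str:
    # Simulates the cheaper model being unavailable.
    raise TimeoutError("simulated outage")


def hosted_model(prompt: str) -> str:
    return f"[hosted] answer to: {prompt}"


options = [
    ModelOption("hosted-proprietary", 3.00, hosted_model),
    ModelOption("self-hosted-open", 0.40, flaky_open_model),
]

name, reply = route("Can I change my booking dates?", options)
print(name, "->", reply)
```

Here the cheaper model is tried first, its simulated outage is recorded, and the request falls through to the second option. Real deployments would add quality checks and task-specific routing, but the resilience and cost logic follows the same shape.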

4. Is the AI bubble really a bigger threat than tariffs?

Alan Beattie’s recent Financial Times piece (paywalled), “The AI bubble is a bigger global economic threat than US tariffs”, argues that while President Trump’s aggressive new tariffs have dominated headlines, a far greater risk to economic stability may be brewing inside the tech sector. The artificial intelligence boom, Beattie suggests, has inflated into a speculative bubble that carries systemic dangers eerily reminiscent of the dot-com era. The sheer concentration of wealth, capital spending, and investor optimism funnelled into a handful of AI firms poses a risk that extends far beyond technology markets, potentially distorting currency flows, fiscal behaviour, and global trade alike.

Underneath the rhetorical contrast between tariffs and technology lies a serious macroeconomic warning. Economies from the US to China are increasingly reliant on sustained AI investment to maintain growth. Yet several institutions - ranging from the IMF and Bank of England to private wealth funds - now warn that AI valuations have detached from plausible earnings. Capital expenditure on hyperscale data centres is growing faster than underlying demand, while chipmakers such as Nvidia and, increasingly, AMD have come to dominate index performance. Should revenue growth falter or access to cheap credit tighten, the consequences could reverberate through job markets, pension funds and even sovereign debt exposure, particularly in economies tied to semiconductor exports.

Other commentators have echoed these concerns with data to match. The BBC reports that AI-driven firms now account for about 80% of US stock market gains in 2025, while research from Gartner suggests global AI investment could hit £1.1 trillion by year-end. OpenAI’s vast capital plans and deals with Nvidia and AMD illustrate the sector’s scale, with projects like Stargate and Titan requiring hundreds of billions in infrastructure spending. Yet academics such as Stanford’s Anat Admati remain cautious, noting that recognising bubbles in real time is nearly impossible - confirmation only comes after collapse. Reuters also cites the Bank of England’s Financial Policy Committee, which recently described the risk of an AI-driven correction as “material” for the UK financial system.

Many corporates are betting heavily on AI-driven productivity gains - budgeting for model integration, cloud capacity, and data infrastructure in ways that may mirror the telecoms boom of the late 1990s. The pragmatic takeaway isn’t to pull back from AI adoption, but to ensure that investment cases are weighted toward measurable efficiency benefits rather than strategic ambition alone. The eventual correction, Beattie notes, wouldn’t invalidate AI’s long-term value, but it could reset expectations - and balance sheets - with frightening speed.

In the context of our enterprises, this analysis underscores the need for disciplined risk leadership. We don’t operate in isolation from markets that may be over-leveraged on AI optimism. Ensuring that our use of AI demonstrably improves efficiency, resilience, and capability, rather than simply tracking hype curves, will be central to weathering any eventual market correction.

5. Every Windows 11 PC is becoming an AI PC

Microsoft’s latest updates for Windows 11 position every Windows 11 PC as an “AI PC” through deep Copilot integration and agentic capabilities. Copilot, previously an extra feature, now sits at the heart of the Windows experience - ready to assist users by voice or text, actively interpret on-screen content, and even carry out tasks across desktop and local files.

The most striking element is Copilot’s new natural language interface, activated by the simple wake word “Hey Copilot”. Users can ask for help with almost anything - whether that’s navigating new software, troubleshooting, or getting creative recommendations - with Copilot now understanding both spoken and written context. Its Vision feature means Copilot can “see” what’s on your screen, providing real-time tips and step-by-step guidance within applications, or even reviewing and improving creative projects or business documents. Support for both voice and text-driven interaction is spreading globally, lowering the barriers for users of all abilities to engage naturally with their device.

The taskbar’s Ask Copilot capability now allows easy transitions between tasks, making Copilot an always-available assistant. The agentic platform goes further by letting Copilot perform real actions on local files - such as sorting photos, extracting data from a PDF, or automating office workflows. This move towards general-purpose agents is supported by integrations across enterprise staples like Word, Excel, PowerPoint, OneDrive, Outlook, and even Google services. Copilot Connectors allow users to connect calendars, contacts, and emails so Copilot can fetch information from across the user’s digital life in a unified, privacy-centric flow.

Security remains foundational: all Copilot Actions are opt-in, fully auditable, and can be paused or overridden at any time. For enterprises, these new features significantly reduce friction for digital adoption and hybrid work, closing the gap between user intent and outcome without risk to corporate data, thanks to robust controls around permissions and activity logs. With Windows 10 support now ended, and Copilot+ PCs setting new standards for battery life, performance, and native AI experiences, the incentive to upgrade is not just about speed or security, but real productivity advantage. New device options across the major OEMs (Surface, Dell, Lenovo, HP, Samsung and more) mean enterprises can provision these experiences at a range of price points, all with the promise of fast, secure, on-device AI that works out of the box.

These enhancements present an opportunity to streamline workflows and boost productivity across our own enterprise. The combination of conversational and visual agentic support will help colleagues spend less time wrestling with applications or searching for information, and more time applying judgement and creativity where it’s needed most. With data controls and integration options spanning both Microsoft and Google’s ecosystems, the transition risk is minimal, but the efficiency gains are significant. As always, IT governance and careful enablement remain paramount, yet the groundwork is in place for a smarter, more responsive business desktop.

POB’s closing thoughts

In this week’s news I was particularly interested to see the adoption of Qwen by Airbnb. Use of models outside of the “big players” can sometimes feel taboo, but the potential benefits mean they definitely deserve consideration. The impressive performance of open-source models was emphasised this week with the release of MiniMax M2, another Chinese model, which “achieves a new all-time-high Intelligence Index score for an open weights model and offers impressive efficiency with only 10B active parameters (200B total)”, per findings from Artificial Analysis.

Regular readers will know I’m a fan of Google’s NotebookLM research assistant, which gained some new features this week. The updates include significant backend improvements powered by the latest Gemini models, resulting in an 8x larger context window (up to 1 million tokens), and a 6x increase in conversation memory. These enhancements allow for more seamless, natural multi-turn conversations and deeper insights by exploring sources from multiple angles to synthesise nuanced, high-quality responses grounded strictly in user-provided documents.

Additionally, conversation histories are now automatically saved and can be resumed later, with privacy maintained in shared notebooks. A major feature expansion is the introduction of customisable chat goals, allowing users to set specific goals, voices, or roles for their AI assistant in each notebook. Examples include acting as a rigorous PhD research advisor, a lead marketing strategist with a focus on immediate action plans, or analysing material from multiple perspectives (academic, creative, sceptical). There is even a playful persona option as a Game Master for text-based simulations. This ability to define explicit goals and personas empowers users to get more targeted, context-aware, and productive interactions.

I’ll leave you this week with the latest podcast episode from Microsoft developer legend Scott Hanselman with fellow icon Mark Russinovich. The episode “The AI Productivity Trap: Senior Boost, Junior Drag” discusses how artificial intelligence is changing the software engineering landscape, particularly affecting developers at different career stages. Senior engineers benefit from an “AI boost” because their experience enables them to use AI tools effectively, increasing productivity significantly. Conversely, early-career engineers face an “AI drag” because they lack the necessary experience to effectively handle complex problems, often slowing down when relying on AI. The hosts stress the importance of structured learning environments, mentorship, and apprenticeship models to ensure early-career developers gain essential skills and knowledge transfer from senior engineers. They argue that companies should prioritise mentoring as a formal role for senior engineers to support the development of juniors, ensuring a sustainable talent pipeline despite current challenges in hiring and retention. The episode highlights the need for a long-term approach to professional growth in the era of AI-enhanced software engineering.

Thanks for reading, I hope you have a great weekend! 👍

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
