Enterprise AI Weekly #7

Understanding confidence, what is Enterprise Vibe, MS unleashes Researcher and Analyst, a peek inside Claude's brain, Sundar tweets and are the Clone Wars looming?

Apr 03, 2025

Welcome to Enterprise AI Weekly #7

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

Before we start…

I have had several enquiries from readers asking whether folks outside Davies can be added to the Davies AI post distribution. Right now, the post is designed primarily for internal audiences, so while I’m happy for you to forward it on, I’m not able to add non-Davies recipients to the distribution list. That said, I do believe that there is value in reaching our existing clients and potential new clients in this way, so I am exploring the possibility of creating a complementary external version of the post. If you think this would be valuable, please reply to this message, and let me know!

Last week in EAIW #6, I waxed lyrical about OpenAI’s new image generating model, deployed in the form of an update to GPT-4o. Previously it was only available to paid accounts, but this week it’s also available on the free tier (with usage limits). You can now try it for yourself, and I guarantee you’ll be impressed!

Explainer: Confidence

One of the big questions around AI relates to trust, so how do we establish confidence in the output from LLMs?

Confidence scores are essential tools for evaluating the reliability of predictions in both Large Language Models (LLMs) and traditional machine learning (ML) models. However, the techniques for deriving these scores differ significantly.

In LLMs, confidence scores often stem from token probabilities (remember, a token is a word or part of a word in the output), derived from the model's internal “logits”. Logits are the raw scores the model assigns to every candidate token at each step; converting them into probabilities and aggregating those across the sequence provides an overall confidence score for a response. Another approach involves verbalised confidence, where the model explicitly states its certainty in natural language. While intuitive, this method can lead to overconfidence due to the human-like phrasing of responses. Entropy-based metrics (measuring how "uncertain" the model is across all possible tokens) and sampling multiple outputs to assess consistency are also used to refine confidence estimation.
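To make the token-level idea concrete, here is a minimal sketch in Python. It assumes you already have the raw logits for each generation step; the numbers, the four-token vocabulary and the function names are all made up for illustration, and the geometric mean is just one common way of aggregating token probabilities, not the only one.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits: 0 means certain, higher means more uncertain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sequence_confidence(chosen_token_probs):
    """Geometric mean of the probabilities of the tokens actually generated."""
    mean_log_prob = sum(math.log(p) for p in chosen_token_probs) / len(chosen_token_probs)
    return math.exp(mean_log_prob)

# Hypothetical logits over a 4-token vocabulary at two generation steps.
step_logits = [
    [4.0, 1.0, 0.5, 0.2],  # the model strongly prefers the first token
    [1.2, 1.1, 1.0, 0.9],  # the model is torn between all four
]

chosen_probs = []
for logits in step_logits:
    probs = softmax(logits)
    chosen_probs.append(max(probs))  # greedy decoding: take the most likely token
    print(f"top prob={max(probs):.2f}  entropy={entropy(probs):.2f} bits")

print(f"sequence confidence={sequence_confidence(chosen_probs):.2f}")
```

The second step should show a much higher entropy than the first, and that single uncertain token pulls the overall sequence confidence down.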

Despite these methods, LLM confidence scores can be non-deterministic - repeated queries on identical inputs may yield varying scores - posing challenges for reliability.
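One way to work with this variability rather than against it is the sampling approach mentioned above: ask the same question several times and treat the level of agreement as a confidence signal. The sketch below is illustrative only; the sample answers are invented, `consistency_confidence` is a hypothetical helper, and a real system would need a smarter way of deciding when two answers "match" than simple lower-casing.

```python
from collections import Counter

def consistency_confidence(answers):
    """Return the most common answer and the share of samples that agree with it."""
    normalised = [a.strip().lower() for a in answers]
    top_answer, votes = Counter(normalised).most_common(1)[0]
    return top_answer, votes / len(normalised)

# Hypothetical outputs from asking a model the same question five times.
samples = ["Paris", "paris", "Paris ", "Lyon", "Paris"]
answer, agreement = consistency_confidence(samples)
print(f"answer={answer!r}  agreement={agreement:.0%}")
```

A low agreement figure is a useful prompt to route the response to a human reviewer rather than trusting it outright.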

In contrast, traditional ML models typically produce deterministic confidence scores: the same input always yields the same probability. When the model is well calibrated, these scores directly reflect the statistical likelihood of correctness based on training data. Unlike LLMs, ML models often rely on predefined thresholds for decision-making, ensuring consistency across predictions.

Understanding these differences is critical for deploying AI systems effectively across diverse applications. The key distinction lies in how confidence is interpreted: LLMs focus on token-level probabilities and verbalised outputs but are prone to variability and overconfidence. ML models emphasise statistically grounded, deterministic scores that align closely with their training data.

1. Enterprise Vibe

Back in EAIW #3, I introduced “Vibe Coding”, a rapidly growing movement in which software is developed relying purely on the ability of LLMs to write code. While companies won’t be working in exactly this way, the concepts are interesting, and the folks in Consulting and Technology have been exploring what it might mean for Davies.

A paper prepared by the team introduces Enterprise Vibe, a transformative framework for AI-driven software development that promises to revolutionise how organisations deliver technology. By integrating AI-assisted coding into every facet of the development lifecycle, Enterprise Vibe enables teams to drastically reduce development time, optimise resource use, enhance product quality, and speed up time-to-market. The approach shifts developers' focus from routine coding tasks to high-value activities like architecture, business logic, and user experience design.

Central to Enterprise Vibe is the concept of "Vibe Teams" - small, agile groups comprising a Product Lead, Full-Stack Engineer, and a GTM Engineer. These teams leverage AI tools to prototype, test, and deploy solutions within structured five-day cycles, and are supported by centralised expert functions ensuring governance, compliance, and scalability. The framework also includes robust change management strategies to address risks such as staff adoption challenges, governance concerns, and data security issues. Enterprise Vibe potentially positions organisations to gain a competitive edge through faster innovation and improved operational efficiency.

For businesses, adopting Enterprise Vibe represents an opportunity to lead in AI-powered development while enhancing our internal processes. By embedding AI into our workflows, we can accelerate project delivery timelines and free up our teams to focus on strategic innovation rather than repetitive tasks. The structured approach outlined in the document aligns with our goals of improving collaboration across departments while maintaining governance and security standards. Furthermore, embracing this paradigm shift will help us attract top talent eager to work in cutting-edge environments and ensure we remain competitive in a rapidly evolving market. Implementing Enterprise Vibe could be the catalyst for unlocking new levels of productivity and creativity in our business operations.

2. Researcher and Analyst modes are coming to Copilot

Microsoft has unveiled two AI-powered reasoning agents, Researcher and Analyst, as part of its Microsoft 365 Copilot suite. These tools are designed to elevate workplace productivity by providing expert-level assistance with secure access to both internal work data - such as emails, meetings, files, and chats - and external web sources. The rollout begins this month through the "Frontier" programme, offering early access to these innovations (and arriving for everyone else later, although we are liaising with Microsoft to understand more about the programme).

Researcher is tailored for complex, multi-step research tasks. It combines OpenAI’s advanced deep research model with Microsoft 365 Copilot’s orchestration and search capabilities. This agent enables users to identify market opportunities, develop go-to-market strategies and generate detailed reports by integrating internal work data with external sources like ServiceNow and Confluence. Its ability to leverage third-party data connectors helps ensure comprehensive insights.

Meanwhile, Analyst operates as a virtual data scientist, employing OpenAI’s o3-mini reasoning model to perform advanced data analysis. Using chain-of-thought reasoning and Python programming capabilities, Analyst can transform raw data into actionable insights such as demand forecasts, customer purchasing patterns, or revenue projections. Users can view its iterative problem-solving process and validate its findings in real time.

These tools are likely to be valuable across a wide range of roles, but more than that, it’s encouraging to see Microsoft working to ensure the Copilot feature set doesn’t fall behind that of other providers. While alternatives such as ChatGPT or Perplexity might offer a richer feature set, the deep integration between Copilot and Microsoft 365 data is invaluable.

3. Tracing the Thoughts of Large Language Models

Anthropic has unveiled two new research papers that delve into the inner workings of its language model, Claude 3.5 Haiku, using innovative interpretability techniques. These studies aim to shed light on how AI models process information, make decisions, and generate outputs. Inspired by neuroscience, Anthropic has developed a conceptual "AI microscope" to trace computational circuits within the model, revealing how inputs are processed and transformed into coherent outputs. The research explores key behaviours such as multilingual processing, planning in poetry composition, mental maths strategies, and reasoning fidelity.

One striking finding is that Claude operates with a form of "universal language of thought," enabling it to process concepts across languages in a shared cognitive space.

Additionally, contrary to assumptions that language models predict text one word at a time without foresight, Claude demonstrates advanced planning capabilities. For example, when generating rhyming poetry, it anticipates future words and structures sentences accordingly.

The research also highlights areas of concern: at times, Claude fabricates plausible-sounding reasoning to align with user expectations or produces hallucinations due to misfiring circuits. These insights not only advance scientific understanding but also have practical implications for improving AI reliability and transparency, and they raise questions about how much we truly understand the inner workings of modern, complex LLMs. The post summarising the papers is well worth a read.

For businesses, this research underscores the potential and challenges of deploying AI systems in critical business contexts. Enhanced interpretability could improve trustworthiness in AI-driven decision-making processes. However, it also highlights the need for robust oversight mechanisms to mitigate risks like hallucinations or reasoning errors. Investing in AI tools with transparent mechanisms and ensuring we continue to work on safety and governance will be key to positioning a company as an innovator in leveraging cutting-edge technology, while maintaining operational integrity and compliance standards.

4. Microsoft hops on the MCP bus

In last week's EAIW #6, I wrote about the adoption of MCP - Model Context Protocol, the standard for connecting systems to LLMs - by OpenAI. The momentum shows no sign of abating.

Microsoft has unveiled the integration of MCP into Copilot Studio, a significant step toward simplifying the development of AI-powered agents. This integration allows users to add AI apps and agents to Copilot Studio with minimal effort while ensuring enterprise-grade security, real-time data access, and reduced maintenance costs. Key features include dynamic updates to tools, SDK support for custom integrations, and a marketplace of pre-built connectors. These advancements aim to streamline workflows, enhance interoperability, and reduce complexity in managing AI applications.

We are yet to see movement from Google on MCP; however, a tweet from Google CEO Sundar Pichai this week - “To MCP or not to MCP” - has sparked discussions about Google's potential adoption of the standard.

In another development this week, MinIO became the first storage vendor to adopt MCP, enabling AI agents to interact with its AIStor platform via natural language commands. Additionally, Cloudflare announced support for remote MCP servers, allowing developers to deploy MCP servers globally without requiring local installations.

I won’t talk about MCP next week, I promise! 😀

By leveraging MCP-enabled systems, we could integrate AI agents more effectively into our processes and operations. Adopting MCP could reduce vendor lock-in risks by standardising integrations across multiple AI providers, providing the ability to experiment with different AI models or tools without disrupting existing workflows. The enhanced interoperability and reduced maintenance costs associated with MCP could lead to operational efficiencies and faster deployment of innovative AI-driven solutions tailored to client needs.

5. H&M: Episode II - Attack of the clones

H&M, the Swedish fashion giant, has revealed plans to create 30 "digital twins" of its models using generative AI. These AI-powered replicas will be used in social media posts and marketing campaigns, provided the models give their consent. The initiative is part of H&M’s exploration into leveraging new technologies to showcase fashion in creative ways while maintaining what it describes as a "human-centric approach." Models will retain ownership of their digital likenesses and receive compensation for their use, with watermarks included on images to indicate their AI origin.

While H&M highlights the potential of this technology to enhance creativity and efficiency, the move has sparked significant backlash. Critics, including influencer Morgan Riddle, have raised concerns about job displacement for professionals involved in traditional photoshoots, such as photographers, make-up artists, and stylists. Trade unions, like Equity in the UK, have also emphasised the need for stronger protections to ensure fair pay and control over digital likenesses in an era of increasing AI adoption. Despite these concerns, some models see benefits, such as reduced travel requirements and expanded opportunities for collaboration.

This development reflects broader trends in AI adoption that could influence industries beyond fashion. The use of digital twins demonstrates how AI can streamline operations while reducing costs. However, it also underscores a critical challenge: balancing technological innovation with its social and workforce implications. As businesses increasingly explore AI-driven efficiencies, there may be growing societal acceptance of job displacement in certain roles. Proactively investing in upskilling employees and fostering transparency around AI use will be essential to maintaining trust and ensuring a sustainable integration of such technologies.

POB’s closing thought(s)

It’s very trendy in tech nowadays to do “launch weeks”, where you effectively announce something every day over the course of a week. Langflow have been running one this week, but it’s a common occurrence that was (maybe) popularised by Cloudflare. In following Langflow’s releases, I happened upon the news that the company, part of DataStax - who specialise in database and agentic solutions - is being acquired by IBM. An interesting move, I thought.

Talking of splashing the cash, Gartner posted a forecast earlier this week: “Worldwide GenAI Spending to Reach $644 Billion in 2025”. I’m not sure it’s slowing down either, so it’ll be interesting to see what their 2026 prediction is!

Finally, OpenAI this week launched “OpenAI Academy”, an extremely diverse collection of content to help a broad spectrum of people “unlock the opportunities of the AI era by equipping yourself with the knowledge and skills to harness artificial intelligence effectively”. It does look impressive, with everything from an “AI for older adults” livestream to an “Assistants and Agents build hour”. No jokes please about which one of those I should be attending! 👴

Of course, there is yet more to read about in AI this week, so remember to head on over to our Teams channel to get the full list of interesting topics.

Enjoy the rest of your week, and have a great weekend. 👍

Thanks for reading and I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. Remember, if you have any questions around AI at Davies, you can reply to this message to reach me directly or drop a note to the AI mailbox.

If you’re reading this for the first time, you can read previous posts at the Davies AI Substack page.

I have also created a Teams channel to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link. I’ll also post the things that made my Pocket list, but didn’t make it to the post!

Finally, remember that while I may mention interesting new services in this post, you shouldn’t put business data in any web service or application without ensuring it has been approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
