Enterprise AI Weekly #10

How LLMs are trained, Gemini Flash gets a clever 2.5 release, Copilot gets an agentic update, Google wants to help you Colab, what it means to be a Frontier Firm and the challenges of swapping LLMs.

Apr 25, 2025

Welcome to Enterprise AI Weekly #10

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to business of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

Last week, I asked for your feedback. Thank you to everyone who responded - if you haven't submitted your feedback yet, you can do so on this page. The feedback so far has been trending a little towards ‘the post is too long’ (fair), and suggests the technical level is about right. In the spirit of responding to feedback, I’ll try and be a little more succinct in this issue, with links to follow for more detail.

And with that, let’s dive into #10!

Explainer: How are LLMs trained?

Large Language Models (LLMs) are built using a combination of advanced training methods that allow them to understand and generate human-like text. The process often begins with unsupervised pre-training, where the model learns from vast amounts of raw, unlabelled text data - think books, articles, and websites. In this phase, the model isn’t given explicit answers or labels; instead, it learns by predicting missing words or the next word in sentences, gradually uncovering the patterns, structure, and nuances of language on its own.

After this foundational stage, the model may undergo supervised learning, where it is further trained on labelled datasets. Here, each example comes with a "correct" answer - such as a question paired with its ideal response. This helps the model refine its abilities for specific tasks, improving its accuracy and usefulness in real-world scenarios.

To make LLMs even more helpful and safe, developers use reinforcement learning, particularly Reinforcement Learning from Human Feedback (RLHF). In this method, the model generates responses which are then rated or ranked by humans based on quality, safety, and usefulness. The feedback is used to adjust the model’s behaviour, teaching it to produce better and more reliable outputs. Additional fine-tuning on specialised datasets can further tailor the model for particular industries or business needs.

Understanding how LLMs are trained helps us make smarter decisions about how and where to deploy AI in our operations. The quality of the training process directly affects the model’s reliability, safety, and suitability for business-critical applications. By staying informed about these methods, we can better evaluate AI solutions, ensure ethical use, and maintain a competitive edge as technology evolves.

1. Another week, another Google model!

Don’t switch off - yet again, this is a cool new release! Google has unveiled Gemini 2.5 Flash, the latest evolution in its family of generative AI models, now available in preview via the Gemini API, Google AI Studio (free!), and Vertex AI. Building on the foundations of the well-received Gemini 2.0 Flash model, this latest version introduces a significant leap in reasoning capabilities while maintaining its hallmark strengths: speed and cost efficiency. What sets Gemini 2.5 Flash apart is its status as Google’s first fully hybrid reasoning model - developers can now toggle the advanced “thinking” processes on or off and even set custom “thinking budgets” to fine-tune the balance between quality, cost, and latency. This means tasks demanding deep analysis, such as complex problem-solving or multi-step research, can benefit from more accurate and thoughtful responses, while simpler queries can be processed at lightning speed and lower cost.

The model’s flexibility is further enhanced by its support for a one million token context window, enabling it to handle long documents, full codebases, and intricate workflows without losing context or coherence (in theory, per last week’s post). Gemini 2.5 Flash is engineered for high-volume, real-time applications - think customer service bots, virtual assistants, and large-scale summarisation tools - where responsiveness and predictable costs are paramount. Its dynamic reasoning controls allow businesses to optimise performance for specific use cases, ensuring that AI-powered solutions remain both scalable and cost-effective.

For businesses, Gemini 2.5 Flash represents a new standard in operational AI: it delivers enterprise-grade reasoning and reliability at a price point and speed suitable for large-scale deployment. The ability to tailor the model’s reasoning depth and control costs directly addresses the demands of modern enterprises, where compliance, and efficiency are non-negotiable.

2. Microsoft Copilot 365 gets upgraded ‘for the era of human–agent collaboration’

Hot on the heels of my complaints about the lagging capabilities of Microsoft Copilot 365, it’s getting an upgrade, rolling out in May.

Catchily known as the ‘Wave 2 spring release of Microsoft 365 Copilot’, the release is said to mark a significant leap toward what Microsoft calls the “era of human-agent collaboration.” The redesigned Copilot app is now positioned as the central interface for interacting with a new generation of AI-powered agents - digital teammates that work alongside users to streamline complex tasks, drive insights, and unlock creativity across the Microsoft 365 suite.

Key updates include advanced AI-powered search for faster information retrieval across the organisation, a new “Create” experience that democratises design and content generation, and most interestingly, Copilot Notebooks for instant data-driven insights and actions (I’m looking forward to getting access to this). The release also brings Microsoft’s specialised “reasoning agents” - such as Researcher and Analyst, as features in EAIW #7 - designed to tackle complex analytical and research tasks that previously required expert human intervention. These agents can synthesise information from meetings, emails, and business data, providing actionable recommendations and surfacing connections that might otherwise go unnoticed. The new “Agent Store” further expands Copilot’s capabilities, allowing users to discover, deploy, and manage both Microsoft and third-party agents directly within their workflow.

An animated Microsoft 365 Copilot UI screen mocked up in a laptop. The screen displays the following prompt with results: What are the latest design tasks for Microsoft 365 Copilot?

By integrating Copilot’s new agents and adaptive memory into our daily workflows, we can automate repetitive tasks, accelerate research, and make smarter, data-driven decisions - freeing up valuable time for higher-level strategic work. The centralised Copilot hub and Agent Store mean we can tailor AI capabilities to our unique needs, while robust governance tools ensure security and compliance. Perhaps the Agent Store will also provide an opportunity to offer AI agents, created by us, outside the organisation.

3. Google Colab could be a game changer for data analysis

Google Colab, short for Colaboratory, is a free, cloud-based platform that allows users to write and execute Python code directly in their web browser - no setup required. Built on the popular Jupyter Notebook framework, Colab provides access to powerful computing resources, including GPUs and TPUs, making it especially valuable for tasks like data analysis, machine learning, and deep learning. Notebooks are stored in Google Drive, enabling seamless sharing, real-time collaboration, and automatic saving - much like Google Docs - so teams can work together on code, data, and visualisations from anywhere. With pre-installed libraries and easy integration with Google services, Colab is an accessible and versatile tool.

Google Colab has taken a significant leap forward in data analysis with the introduction of its new Data Science Agent, powered by Gemini. This tool transforms the traditional workflow by allowing users to generate fully functional, executable Jupyter notebooks simply by describing their data analysis goals in natural language. Instead of spending time on repetitive setup tasks - like importing libraries, loading data, and writing boilerplate code - users can now upload their data, outline their objectives in the Gemini side panel, and watch as the agent produces a complete notebook ready for immediate use.

The Data Science Agent is designed to handle a wide range of tasks, from data cleaning and exploration to advanced visualisations and machine learning model building. It integrates seamlessly with popular Python libraries such as Pandas, NumPy, and Matplotlib, and leverages Google’s cloud infrastructure for scalable, collaborative work. Notably, the agent can answer questions about your data, suggest statistical techniques, and even fix code errors, all through intuitive prompts. Its performance has already been recognised, achieving fourth place in the DABStep benchmark for multi-step reasoning - outperforming some agents based on GPT-4.

For businesses, the Data Science Agent in Google Colab offers a powerful way to accelerate analytics and decision-making. By automating the most time-consuming aspects of data science, teams can focus on extracting insights and driving strategic value rather than getting bogged down in setup and coding. This democratises access to advanced analytics, enabling non-technical staff to contribute meaningfully to data projects and fostering cross-functional collaboration. The tool promises faster turnaround on business intelligence, more agile responses to market changes, and a significant boost in productivity across data-driven teams.

4. Microsoft's 2025 work trend index: meet the Frontier Firm

Microsoft’s 2025 Work Trend Index, a highly recommended read, declares 2025 as “the year the Frontier Firm is born.” This year’s report, based on insights from 31,000 professionals across thirty-one countries and vast data from LinkedIn and Microsoft 365, reveals a seismic shift: organisations are moving beyond AI experimentation and are now fundamentally rebuilding around AI and digital agents. The concept of the “Frontier Firm” is central - these are organisations structured around “intelligence on tap,” where hybrid teams of humans and AI agents can collaborate to drive efficiency, agility, and innovation at scale.

Key findings highlight that AI literacy is now the most in-demand skill for 2025, outpacing even technical capabilities as companies seek employees who can pair deep AI knowledge with human strengths like adaptability, conflict mitigation, and innovative thinking. The report notes that 82% of leaders believe this is a pivotal year to rethink business strategies, and 81% expect AI agents to be moderately or extensively integrated into their company’s operations within the next 12-18 months. While 53% of leaders say productivity must improve, 80% of employees and executives feel they lack the time and energy to meet rising expectations, underscoring the urgent need for scalable, AI-driven solutions. Notably, “Frontier Firms” are already seeing results: 71% of workers at these companies say their organisation is thriving, compared to just 37% globally.

The implications for our business could be significant. As the workplace transforms, the ability to seamlessly integrate human expertise with AI-driven intelligence will define our competitive edge. Embracing the principles of the Frontier Firm means not only investing in AI tools, but also fostering a culture where continuous learning, adaptability, and human-AI collaboration are at the core of our operations. By moving quickly to upskill our teams and reimagine our workflows, we position ourselves to unlock new value, drive innovation, and lead in this new era of work.

5. Swapping LLMs isn’t plug and play

A recent VentureBeat article dives into the often-overlooked complexities of swapping out LLMs in enterprise environments. While it might seem like replacing one LLM with another should be a straightforward, plug-and-play process, the reality is far more nuanced and challenging. The article highlights that each LLM comes with its own set of APIs, data formats, fine-tuning requirements, and performance quirks. This means that migrating to a new model can require significant changes to application logic, retraining of teams, and revalidation of outputs to ensure business continuity and compliance.

Many enterprises are discovering that the costs of model migration go well beyond licensing fees or API integration. There are hidden expenses in retooling workflows, revalidating data pipelines, and adapting user interfaces to accommodate the new model’s strengths and weaknesses. Additionally, issues like latency, output variability, and compatibility with existing infrastructure can introduce unexpected delays and risks. The article underscores the importance of thorough planning, robust testing, and clear communication across teams to mitigate these challenges and avoid business disruption.

As we continue to scale our AI initiatives, understanding the true cost and complexity of LLM migration is critical. By proactively addressing the hidden challenges of model migration - such as integration, validation, and user retraining - we can minimise operational risks and maintain our competitive edge. Staying informed about these industry insights will help us make smarter, more resilient technology decisions as the AI landscape evolves.

POB’s closing thoughts

I was pleased to see this week that Meta has added more features to their Ray Ban Meta smart glasses, which I am a huge fan of. Features migrating from the Beta programme to general release include live translation, the ability to send messages and make calls through Instagram, and conversations with Meta AI based on what you’re currently looking at. I’m looking forward to Live AI, which allows the Meta AI smart assistant to continuously see what you do for more natural conversations.

There have been some more positive changes in the vibe coding space this week. Windsurf has simplified its pricing, Bolt has made major improvements to their design capabilities and Lovable has been upgraded to version 2.0. A key feature in the new Lovable release is what they call ‘multi-player’ - collaboration between multiple users simultaneously - which has been conspicuously absent in these tools up to now.

Finally, I spotted this week that Hertz plans to use AI in their vehicle inspection process. What could possibly go wrong. 😀

Happy Friday, and hope you have a great weekend. 👍

Thanks for reading and I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. Remember, if you have any questions around AI at Davies, you can reply to this message to reach me directly or drop a note to the AI mailbox.

If you’re reading this for the first time, you can read previous posts at the Davies AI Substack page.

I have also created a Teams channel to discuss topics mentioned in this post , and AI in general, with your fellow readers, and of course me too. To join, use this link. I’ll also post the things that made my Pocket list, but didn’t make it to the post!

Finally, remember that while I may mention interesting new services in this post, you shouldn’t put business data in any web service or application without ensuring it has been approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.

Enterprise AI Weekly