Enterprise AI Weekly #17
o3 goes pro (and gets cheaper), people trust AI more than lawyers, Eric Schmidt thinks AI is underhyped, Google Portraits powers personas, self-healing workflows arrive, and is Memvid a hoax?
Welcome to Enterprise AI Weekly #17
Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.
Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to enterprises of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on business.
If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.
You’ll have spotted that I’ve made some changes to the branding of the Substack, and there will be additional tweaks over the coming weeks as I look at how I provide the right AI comms inside and outside of Davies. Stay tuned!
I hope you enjoy the post.
Explainer: What is OpenRouter?
OpenRouter is a platform that acts as a unified gateway to a vast array of Large Language Models (LLMs). Instead of needing separate accounts and API keys for models from OpenAI, Google, Anthropic, and various open-source providers, OpenRouter provides a single, consolidated interface. Developers can use one API key and a pay-as-you-go credit system to access over four hundred different models from more than sixty providers. This simplifies the process of integrating AI into applications, as it uses an OpenAI-compatible format, meaning code written for OpenAI's models can often work with OpenRouter with minimal changes.
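To make the “OpenAI-compatible” point concrete, here is a minimal sketch using the official OpenAI Python SDK pointed at OpenRouter. The base URL is OpenRouter’s documented endpoint; the model slug and prompt are purely illustrative, and you would substitute your own OpenRouter API key.

```python
# Minimal sketch: the OpenAI Python SDK talking to OpenRouter instead of OpenAI.
# The model slug shown is illustrative; OpenRouter names models as "provider/model".
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # point the SDK at OpenRouter
    api_key="<YOUR_OPENROUTER_API_KEY>",       # one key for every model on the platform
)

response = client.chat.completions.create(
    model="openai/gpt-4o",                     # switch providers by changing this string only
    messages=[{"role": "user", "content": "Explain OpenRouter in one sentence."}],
)
print(response.choices[0].message.content)
```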
The primary benefit of a service like OpenRouter is the flexibility and efficiency it offers, particularly in a business environment. It enables developers to easily compare the price and performance of different models, ensuring they can select the most cost-effective option for any given task without being locked into a single provider. Furthermore, OpenRouter is designed for high availability; its distributed infrastructure can automatically fall back to an alternative provider if a primary model experiences an outage, enhancing the reliability of AI-powered services. For businesses concerned with data privacy, it also allows for the creation of custom policies to control which models and providers can process their prompts.
When comparing OpenRouter to enterprise-focused platforms like Microsoft's Azure AI Foundry or Amazon Bedrock, it's helpful to think of them as serving different, albeit related, purposes. OpenRouter acts as an aggregator or “router”, offering the widest possible selection of models from across the entire market, including the very latest open-source releases. Its strength lies in choice, competitive pricing, and rapid experimentation. In contrast, Azure AI Foundry and Amazon Bedrock are integrated cloud platforms that provide a curated selection of first-party and third-party models within a secure, enterprise-grade ecosystem. Their main advantage is deep integration with other cloud services (such as Microsoft 365), unified security, governance, and a single vendor relationship, which is often a critical requirement for large organisations.
1. This week’s model news - OpenAI o3 Pro, o3 price cuts, and welcome, Magistral
OpenAI has just launched o3 Pro, its most advanced reasoning model to date, designed to step up performance in science, education, programming, business, and writing. o3 Pro builds on the o3 model’s step-by-step reasoning capabilities, but with greater clarity, comprehensiveness, and accuracy. It’s now available to ChatGPT Pro and Team users, with Enterprise and Edu customers getting access next week. Developers can tap into o3 Pro via the API, where it’s priced at $20 per million input tokens and $80 per million output tokens. Notably, o3 Pro can search the web, analyse files, reason about visuals, use Python, and personalise responses with memory. The main trade-off aside from the high cost? It’s a bit slower than its predecessor, o1 Pro, and currently doesn’t support image generation or Canvas, OpenAI’s collaborative workspace.
In a move sure to delight budget-conscious developers, OpenAI has also slashed o3’s API prices by a whopping 80%. Input tokens now cost just $2 per million, and output tokens $8 per million, making advanced reasoning more accessible for startups, research teams, and individual developers. This aggressive price drop positions o3 as a formidable competitor to Google’s Gemini 2.5 Pro and Anthropic’s Claude Opus 4. The new pricing strategy underscores a broader industry trend: high-quality AI is becoming more affordable and scalable, lowering the barrier for enterprise adoption and experimentation.
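To put the new pricing in perspective, here is a quick back-of-envelope calculation using the per-million-token rates quoted above; the token counts are just an illustrative workload, not a benchmark.

```python
# Rough per-request cost comparison using the quoted rates:
# o3 at $2 in / $8 out per million tokens, o3 Pro at $20 in / $80 out.
def request_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    """USD cost of one request given per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: a 5,000-token prompt producing a 1,500-token answer.
print(f"o3:     ${request_cost(5_000, 1_500, 2, 8):.3f}")    # ≈ $0.022
print(f"o3 Pro: ${request_cost(5_000, 1_500, 20, 80):.3f}")  # ≈ $0.220
```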
Meanwhile, European AI leader Mistral has entered the reasoning race with its new Magistral family of large language models. Magistral Medium targets enterprises, while Magistral Small, a 24-billion parameter model, is open-sourced under the Apache 2.0 licence. This gives developers and businesses the freedom to use, modify, and commercialise the model (more on that in next week’s post). Magistral’s strengths include transparency (with traceable reasoning chains), multilingual prowess, and impressive speed - Magistral Medium reportedly delivers up to ten times the token throughput of its competitors. Mistral is clearly aiming at high-stakes use cases in finance, legal, and data engineering, while also signalling a renewed commitment to openness in the AI ecosystem.
Adding a dash of suspense, OpenAI CEO Sam Altman has teased that their long-awaited open-weight model - originally slated for early summer - will now arrive “later this summer”, thanks to some “unexpected and quite amazing” research breakthroughs. The open model is expected to rival the reasoning capabilities of the o-series, and Altman promises it will be “very, very worth the wait”.
2. People trust legal advice generated by ChatGPT more than a lawyer
A fascinating new study has revealed that when it comes to legal advice, people without a legal background are more inclined to trust guidance from ChatGPT than from an actual lawyer. This preference is particularly strong when the source of the advice is not disclosed. In a study involving 288 participants, researchers found that when the origin of the legal counsel was hidden, individuals were more willing to act on AI-generated recommendations. Even when participants were explicitly told which advice came from a lawyer and which from AI, they showed a similar level of willingness to follow the guidance from both sources.
One potential reason for this surprising trust in AI is its communication style. The study suggested that large language models (LLMs) like ChatGPT tend to use more complex and sophisticated language, delivering their points concisely. In contrast, the human lawyers involved in the study often used simpler terminology but provided longer, wordier explanations. The research also tested whether people could even distinguish between the two sources. The results showed they could, but only just. On a scale where 0.5 represented random guessing and 1.0 meant perfect accuracy, participants scored an average AUC of 0.59, indicating a very weak ability to tell the difference.
These findings raise significant concerns, primarily because LLMs are known for producing “hallucinations” – outputs that can be completely wrong or nonsensical, despite being delivered with a high degree of confidence. In a legal context, relying on such flawed advice could lead to unnecessary complications or even miscarriages of justice. Experts warn that this trend could disrupt traditional legal services and lead to a rise in vexatious litigation based on misleading AI-generated information. Consequently, there are growing calls for stronger regulation, such as the EU's AI Act which mandates that AI-generated content must be clearly labelled, alongside a greater public focus on improving AI literacy.
For us, this study is more than just a curiosity; it highlights a developing trend that directly impacts our work. We operate in a world of contracts, liability, and legal interpretation, and it is highly probable that claimants, policyholders, or third parties may be using these widely available AI tools to seek advice or formulate their arguments. This presents a risk that our claims handlers may have to navigate arguments based on flawed or “hallucinated” legal reasoning. However, it also underscores the immense value of genuine human expertise. Our ability to provide accurate, nuanced, and context-aware advice is something AI, in its current form, simply cannot replicate. This study serves as a crucial reminder to be confident in our expertise and to communicate it clearly and concisely, ensuring our clients understand the value of a real expert over an algorithm.
3. Eric Schmidt thinks AI is “wildly underhyped”
In a recent wide-ranging interview at TED2025, former Google CEO Eric Schmidt presented a compelling, if slightly alarming, perspective on artificial intelligence. While many of us feel we are at peak AI hype, Schmidt takes the contrarian view that the technology is, in fact, “wildly underhyped”. He argues that the public's perception, shaped by language models like ChatGPT, fails to grasp the true scale of the revolution underway. He traces the pivotal moment back to 2016, when DeepMind's AlphaGo created a novel move in the ancient game of Go, demonstrating that a machine could produce an idea that had eluded brilliant human minds for millennia. For Schmidt, this was the beginning of a new era, signalling the arrival of a “non-human intelligence” that he believes will be the most significant development for society in a thousand years.
Schmidt explains that the field is rapidly advancing beyond language generation into the far more complex domains of planning and strategy. He points to the next generation of AI systems which can perform deep research and are on a path towards becoming autonomous agents capable of running entire business processes. However, this leap forward comes with immense practical challenges. The computational power required is staggering, with Schmidt citing estimates that the US alone will need an additional ninety gigawatts of power to meet demand - the equivalent of ninety new nuclear power plants, each of which takes 6-10 years to build. Furthermore, having exhausted the readily available, high-quality data on the public internet, we are now entering a phase where AI models must learn by generating their own data, creating a new set of hurdles to overcome.
This incredible power brings with it what Schmidt calls “wicked hard problems”, particularly regarding geopolitical stability and safety. He outlines a tense dynamic between the US, which is primarily developing closed, proprietary models, and China, which is emerging as a leader in open-source AI. This creates a risk of dangerous capabilities proliferating rapidly, leading to a precarious situation he compares to nuclear deterrence. Schmidt worries that as one nation gets close to superintelligence, a competitor might feel compelled to take pre-emptive action, such as bombing a data centre, to prevent falling behind irreversibly. Despite these sobering risks, the potential upsides are equally transformative, from AI tutors for every child on the planet to AI-powered drug discovery that could eradicate diseases.
Schmidt's perspective is a crucial strategic guide for businesses. His vision of AI agents managing core processes is not a distant sci-fi concept; it is the next frontier we must prepare for. This requires us to look beyond simple generative AI applications and consider how more sophisticated planning and reasoning systems could revolutionise our operations, supply chain, and product development. The colossal energy requirements he highlights should also inform our infrastructure strategy, underscoring the importance of sustainable and efficient computing. Schmidt's starkest warning is a call to action: those who fail to adopt this technology will not remain relevant. We must treat this as a “marathon, not a sprint”, embedding AI adoption and learning into our daily work to ensure we are not just keeping pace, but leading the way.
4. Google Portraits demonstrates the power of persona adoption
In the ever-evolving landscape of artificial intelligence, a fascinating new experiment from Google Labs called “Portraits” aims to bring the wisdom of trusted experts directly to you through conversational AI. The first iteration of this technology features an AI representation of Kim Scott, the bestselling author of “Radical Candor”. The concept allows users to engage in AI-powered coaching sessions, asking for guidance on navigating difficult workplace situations or practising challenging conversations. The Portrait, powered by Google’s Gemini model, draws directly from Kim Scott's own books and materials to generate insightful responses in her voice, delivered via an illustrated avatar. It’s worth noting that Portraits is an opt-in, Labs-only experiment, and that data shared with it is used for model improvement under Google’s experiment policy.
A core strength of the Large Language Models (LLMs) that underpin this technology is their remarkable ability to adopt specific personas or “role-play”. These models are trained on vast and diverse datasets, which gives them a deep understanding of language, context, and different communication styles. This allows them to generate human-like text that is coherent and appropriate for a specific role they are asked to assume, whether that is a teacher, a technical expert, or a motivational coach. By processing so much information, LLMs can identify patterns and correlations to create detailed and contextually rich character profiles, making their impersonations feel realistic and nuanced.
What makes Google's Portraits experiment distinct is that the LLM is not simply acting out a generic persona of a “leadership coach”. Instead, it is directly linked to and founded upon the authentic content provided by the real-world expert, in this case, Kim Scott. This grounding ensures that the AI's conversations remain focused within the expert's specific domain and reflect their unique principles. Google notes that the project is still in its early stages and has undergone extensive testing, with feedback mechanisms in place to help shape its future development. This curated approach may also help to mitigate a known challenge with AI-generated personas, which can sometimes tend towards stereotypes without careful guidance and verification.
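As a rough illustration of that grounding pattern (and emphatically not Google’s actual Portraits implementation), a persona system prompt can be combined with retrieved passages from the expert’s own material, so the model answers in character but only from that content. The model name and helper function below are assumptions made for the sketch.

```python
# Hypothetical sketch of a grounded persona: the system prompt carries both the persona
# instructions and retrieved excerpts from the expert's own material.
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint and API key are configured

def coach_reply(question: str, expert_passages: list[str]) -> str:
    system_prompt = (
        "You are an AI coaching persona for a named leadership expert. "
        "Answer in the expert's voice, using only the excerpts below. "
        "If the excerpts do not cover the question, say so rather than improvising.\n\n"
        "Excerpts:\n" + "\n---\n".join(expert_passages)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```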
There are many potential applications of this “expert-in-a-box” technology within our business. Imagine being able to provide new claims handlers with an AI Portrait of one of our most experienced technical experts. They could ask it questions about policy interpretation or best practice for handling a complex claim, effectively receiving personalised coaching on demand. This could accelerate learning and build confidence. Furthermore, such a tool could be developed to help all our colleagues practice difficult conversations, whether with clients, customers, or partners, in a safe and repeatable environment. By simulating real-life scenarios, these AI Portraits could become powerful tools for professional development, ensuring a consistent and high standard of expertise across the entire business.
5. Introducing deterministic, self-healing workflows from browser-use
A new alpha-stage project from the creators of Browser Use, called workflow-use, aims to revolutionise how we automate repetitive browser-based tasks. Positioned as “RPA 2.0”, the tool is designed to create deterministic and self-healing workflows, moving beyond the limitations of traditional Robotic Process Automation (RPA). Born out of customer demand for more reliable and predictable automation, workflow-use allows a user to simply demonstrate a task once by recording their actions. The system then automatically generates a structured workflow that can be replayed consistently, addressing a common frustration with AI agents which can require extensive prompting to perform the same task repeatedly.
The core principle of workflow-use is “show, don't prompt”. Users record their browser interactions, and the system intelligently filters out noise to create a structured, executable workflow stored in a .json file. These workflows are designed to be fast, reliable, and can automatically extract variables from elements like web forms. A key feature is its resilience; if a step in the recorded workflow fails, it can fall back to the more flexible, agentic Browser Use to attempt to complete the task. The project's long-term vision includes “self-healing”, where an AI agent would not only handle a failure but also automatically update the workflow file to prevent the error from recurring.
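To illustrate the replay-with-fallback idea in the abstract, here is a hypothetical sketch - not workflow-use’s actual file format or API. A runner steps through the recorded JSON and only hands a step to an agent when deterministic replay fails; the step schema, exception, and helper functions are invented for illustration.

```python
# Hypothetical sketch of deterministic replay with an agentic fallback.
# The workflow JSON schema, StepFailedError and the helpers are illustrative assumptions.
import json

class StepFailedError(Exception):
    """Raised when a recorded step can no longer be replayed as-is."""

def replay_step(step: dict, variables: dict) -> None:
    # Placeholder for a deterministic browser action (click, type, navigate).
    print(f"Replaying '{step['action']}' on '{step['selector']}'")

def agentic_fallback(step: dict, variables: dict) -> None:
    # Placeholder for handing the failed step to a browser agent to complete.
    print(f"Asking an agent to complete: {step['description']}")

def run_workflow(path: str, variables: dict) -> None:
    with open(path) as f:
        workflow = json.load(f)
    for step in workflow["steps"]:
        try:
            replay_step(step, variables)       # fast, deterministic path
        except StepFailedError:
            agentic_fallback(step, variables)  # flexible recovery path
```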
Getting started with workflow-use involves a browser extension for recording and a command-line interface (CLI) for creating, managing, and executing the workflows. Once a workflow is recorded, it can be run in two main ways: either by providing predefined variables for it to use, or by running it “as a tool” where a large language model (LLM) interprets a natural language prompt to fill in the necessary information, such as populating a form with example data. For a more visual approach, the project also includes a graphical user interface (GUI) to manage, view, and execute workflows. While the project is in its early stages and not yet recommended for production use, its features point towards a more robust future for automation.
The emergence of tools like workflow-use is interesting as it represents the next step in process automation. We have previously discussed agentic AI, workflow automation platforms, and the Model Context Protocol (MCP) that allows AI agents to use external tools. This project neatly ties these concepts together, offering a practical way to build the very tools that more advanced AI agents could leverage. The ability to record a repetitive, multi-step process - such as a specific claims data entry task or navigating between several systems - and convert it into a reliable, fast, and reusable automated workflow has clear efficiency benefits. The “self-healing” vision is particularly compelling, as it promises to reduce the maintenance burden often associated with traditional RPA, freeing up our colleagues to focus on more complex, value-adding activities that require human expertise.
Bonus item: Memvid - video-based AI Memory - fab or fake?
The Memvid project recently made waves in the AI community with its bold proposal: to store and retrieve large text datasets by encoding them as QR codes within video files, promising ultra-efficient, offline-accessible, and highly compressed AI memory. On paper, it looked both clever and feasible - a creative twist on Retrieval-Augmented Generation (RAG) and vector search, but with the original text chunks embedded in a video for portability and cost savings.
The core concept is straightforward. Text is split into chunks (typically using established methods like recursive character splitting to keep each piece semantically coherent), each chunk is converted into a QR code, and these codes are stored as frames in an MP4 file. When a user queries the system, it uses standard semantic search (often with FAISS or similar libraries) to find the relevant chunk, then decodes the corresponding QR frame from the video. This sidesteps the need for a traditional database, and the entire knowledge base becomes a single, highly portable file. The approach is certainly inventive, and in theory, it could be practical for scenarios where infrastructure is limited or where offline access is paramount.
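For the curious, here is a heavily simplified sketch of that pipeline: chunk, embed, encode as QR frames, then search and decode at query time. The library choices (sentence-transformers, FAISS, qrcode, OpenCV) are assumptions for illustration rather than Memvid’s actual implementation, and real QR-in-video storage also has to contend with lossy video compression.

```python
# Simplified sketch of the QR-codes-in-a-video idea; not Memvid's actual code.
import faiss, qrcode, cv2
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = ["Policy excess is £250.", "Claims must be notified within 30 days."]

# 1. Embed the chunks and build a vector index for semantic search.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 2. Encode each chunk as a QR code and write it as one frame of an MP4 file.
writer = cv2.VideoWriter("memory.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 1, (512, 512))
for chunk in chunks:
    img = qrcode.make(chunk).get_image().convert("RGB").resize((512, 512))
    writer.write(cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR))
writer.release()

# 3. At query time: nearest-neighbour search gives a frame number, then decode that QR frame.
query = model.encode(["What is the excess?"]).astype("float32")
_, ids = index.search(query, 1)
cap = cv2.VideoCapture("memory.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, int(ids[0][0]))
_, frame = cap.read()
text, _, _ = cv2.QRCodeDetector().detectAndDecode(frame)
print(text)
```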
However, recent scrutiny has cast serious doubt on whether Memvid is a genuine, working solution or simply an elaborate demonstration - or even a tongue-in-cheek commentary on AI hype. Critics have pointed out that the underlying retrieval mechanism is essentially standard FAISS semantic search, with the only novelty being that the text is stored as QR codes in a video rather than as plain text in a database. This extra layer of encoding and decoding adds complexity without obvious functional benefit, and some have gone as far as to call it “insanely convoluted” or even a “troll”. The lack of robust, independent validation or real-world use cases has further fuelled scepticism.
This episode is a timely reminder for all of us of the importance of validating open-source projects before investing time or resources. The AI ecosystem is brimming with fascinating ideas, but not every clever-looking project will stand up to scrutiny or deliver practical value. As we look to the future, with memory and RAG technologies evolving rapidly and promising real step-changes in AI’s usefulness, it’s vital to distinguish between genuine breakthroughs and eye-catching but unproven experiments. For our business, this means maintaining a healthy scepticism, insisting on independent validation, and always testing new tools thoroughly before considering them for integration into our workflows.
And that wraps up this week! I was expecting it to be a quiet one on the news front, but then OpenAI and Mistral dropped their announcements at the last minute! Still no DeepSeek R2, which is likely to shake up reasoning models once again.
Regular readers will know I have created a Teams channel (Davies only) to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link. This is likely to move to the Substack chat in the coming weeks.
I also regularly post the things that made my Pocket list, but didn’t make it to the post - “EAIW Extra” - to Teams. Since Mozilla will soon retire Pocket, I'll be migrating my collections to Raindrop.
I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you want to get in touch, you can reply to this message to reach me directly.
Remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.
Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.