Enterprise AI Weekly #25

We go crazy in Vibe hour 2, GPT-5 arrives with gpt-oss, Genie 3 enables real-time world generation, Claude 4.1 continues to excel in coding, and Ideogram Character brings consistency to AI images.

Aug 08, 2025

Welcome to Enterprise AI Weekly #25

You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

We’re also working on something together. I’m building an app, Boring Expenses, in a Vibe Coding style, to demonstrate the process and to give us a test bed for technologies we talk about in future issues. I previously mentioned that I set aside a bit of time in my week for keeping up with tech and doing this sort of thing - usually a Sunday morning - so I’m setting aside an hour each week to progress our experiment.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.

You probably already know that GPT-5 has arrived from OpenAI since our last issue, but it’s been an exciting week in AI otherwise, too! So, buckle up for Hour 2 of ‘Vibe with POB’ and the latest news to brighten up your Friday.

Enjoy EAIW #25!

Vibe with POB: Hour 2 - how far can we get?

Welcome back! In last week’s ‘hour 1’, we managed to complete a ton of platform setup, create and deploy a marketing site, and really set the foundations for the build-out of our expenses app MVP. For hour two, I’ve set myself an ambitious target - can we get enough of the app working to be able to add and manage expenses, including using AI to extract data from receipt images?

We could approach this in a couple of ways. The first is to build out more extensive documentation than our ‘wish list’ from EAIW #23, either manually, using a generic AI tool or using a specialised AI tool such as CreateMVPs or ChatPRD. To provide an example of what the output from these tools looks like, I uploaded our previous feature ideas into both and saved the output to our repo. In the interest of making rapid progress, factoring in that we’re not trying to build a production system and given this isn’t my first Vibe, I’m going with the second build option - just crack on! 😎

We have a marketing site already, but I see the solution split into two halves: the marketing site and the app itself. The best way to start the build-out of the app is to use AI to implement a new area of the site with basic login, logout, and profile functionality. We’ve already connected Supabase, which has excellent auth capabilities, so I prompt Bolt to create this area using Supabase’s email OTP (one-time password) flow. Why? This means we are validating email addresses by default, we don’t need to include password management in the application, and we make it much less likely our users’ accounts will be compromised. As a bonus, we can use the same flow for user registration too.
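For anyone curious what an email OTP flow amounts to in code, here is a minimal TypeScript sketch. It is not the code Bolt generated. The `OtpAuth` interface is my own cut-down stand-in for the two supabase-js v2 calls involved (`auth.signInWithOtp` and `auth.verifyOtp`), the helper names are hypothetical, and the six-digit check assumes Supabase's default code length.

```typescript
// Cut-down stand-in for the two supabase-js v2 auth calls used below.
// Assumption: the real client (`supabase.auth`) offers these methods with
// compatible shapes; check the current Supabase docs before relying on it.
interface OtpAuth {
  signInWithOtp(args: {
    email: string;
    options?: { shouldCreateUser?: boolean };
  }): Promise<{ error: { message: string } | null }>;
  verifyOtp(args: {
    email: string;
    token: string;
    type: "email";
  }): Promise<{
    data: { session: unknown };
    error: { message: string } | null;
  }>;
}

// Hypothetical helper: tidy the address before sending it anywhere.
function normaliseEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Hypothetical helper: assumes the default six-digit numeric code.
function looksLikeCode(token: string): boolean {
  return /^\d{6}$/.test(token.trim());
}

// Step 1: email a one-time code. With shouldCreateUser set, the same call
// also registers new users, so sign-up and sign-in share one flow.
async function requestCode(auth: OtpAuth, email: string): Promise<void> {
  const { error } = await auth.signInWithOtp({
    email: normaliseEmail(email),
    options: { shouldCreateUser: true },
  });
  if (error) throw new Error(`Could not send code: ${error.message}`);
}

// Step 2: exchange the emailed code for a session.
async function confirmCode(auth: OtpAuth, email: string, token: string): Promise<unknown> {
  if (!looksLikeCode(token)) throw new Error("Enter the 6-digit code from the email");
  const { data, error } = await auth.verifyOtp({
    email: normaliseEmail(email),
    token: token.trim(),
    type: "email",
  });
  if (error || !data.session) throw new Error("That code was not accepted");
  return data.session;
}
```

Because there is no password anywhere in this flow, there is nothing to store, reset or leak, which is the point made above.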

We’re off to a good start! The AI takes this request in its stride, and aside from needing to make some adjustments in Supabase to the email template and the email sending settings, things work correctly. Or at least it appears they do. I notice that sessions aren’t saving properly, so I ask the AI to diagnose and resolve the issue.

We now have a marketing site and an application to iterate on, so we’re making good progress. In the prompt to create the app, I specified that the site should be responsive so that it renders well on mobile devices - quite important for an app of this type. Next up we get to the real meat of the application - I vibe the ‘Add Expenses’, ‘View Expenses’, and ‘Settings’ tabs of the application, specifying which fields I would like to collect. One of the fun things about Vibe Coding is that the AI ‘builds around’ your prompt with additional relevant enhancements. As an example, this is the ‘View Expenses’ tab that Bolt created:

I didn’t ask it to add a search, nor a total expenses count, nor a total amount. It appreciated those were the things I would want in an app of this nature, and included them (I can, of course, remove them if I want to). Again, so far, so good! Naturally, there are some bits I would change / restyle and in the Settings tab in particular there are things yet to be wired up, but that’s fine.

The ‘Add Expense’ functionality works perfectly, but up to this point I haven’t specified that I want to support image upload, nor that I want to include AI image analysis. I now need to apply some of my ‘programmer brain’, and this is where the Vibe Coding premise becomes a little more questionable. I know that I want to upload a file, so I know that I need somewhere to store it (and that a Supabase storage bucket will be ideal). I know that I need to think about security permissions for that bucket, as well as for the work we’ve done already. I also know what we’re going to need to do to safely implement the AI functionality. Would a non-coder be able to provide the same context? Probably not, but perhaps the idea is that AI would lead them down the correct path.

After creating the storage bucket in Supabase and setting appropriate permissions, I ask AI to add the receipt upload functionality. This mostly works, although I need to let the AI know that we’re not using public URLs for the files, for security reasons. We’re running out of time in our 1-hour session now, so I hope the plan to implement the AI analysis works first time!
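To make the "no public URLs" point concrete, here is a rough TypeScript sketch of serving a receipt from a private bucket via a short-lived signed URL. It is illustrative only, not the generated code. The `ReceiptStorage` interface is my cut-down stand-in for the supabase-js v2 `storage.from(...).createSignedUrl(...)` call, and the bucket name, path layout and 60-second expiry are all assumptions.

```typescript
// Cut-down stand-in for the supabase-js v2 storage call used below.
// Assumption: `supabase.storage` matches this shape.
interface ReceiptStorage {
  from(bucket: string): {
    createSignedUrl(
      path: string,
      expiresInSeconds: number,
    ): Promise<{
      data: { signedUrl: string } | null;
      error: { message: string } | null;
    }>;
  };
}

const BUCKET = "receipts"; // hypothetical bucket name

// Keep each user's files under their own folder so that bucket policies
// can be written per user. Unusual characters are replaced with "_".
function receiptPath(userId: string, fileName: string, uploadedAtMs: number): string {
  const safeName = fileName.replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${userId}/${uploadedAtMs}-${safeName}`;
}

// Ask for a link that expires, instead of exposing a permanent public URL.
async function receiptLink(
  storage: ReceiptStorage,
  path: string,
  ttlSeconds = 60,
): Promise<string> {
  const { data, error } = await storage.from(BUCKET).createSignedUrl(path, ttlSeconds);
  if (error || !data) {
    throw new Error(`No link for ${path}: ${error?.message ?? "unknown error"}`);
  }
  return data.signedUrl;
}
```

The design choice is simply that a leaked link stops working after a minute, whereas a public URL would expose someone's receipt indefinitely.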

And so, we’re on to the final stretch for hour two. Can we get AI analysis baked into the app in time? I know that a good service to use for multimodal image analysis in this sort of project is Google Gemini, where we can get a free API key from AI Studio. I know that we should start with the Gemini Flash model for price vs performance and then try Flash Lite later to save cost. I also know - and this is important - that we can’t call the API directly from the front-end app, as it will risk exposing our API key, so we need to implement something server side. Thankfully, the use of Supabase turns out to be helpful again, as it has authenticated edge functions for this exact use case (albeit we might move to Cloudflare Workers later).
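As a hedged illustration of the server-side piece, the TypeScript below builds a request for Gemini's `generateContent` REST endpoint and parses the reply. It is a sketch, not the function Bolt wrote. The model ID `gemini-2.5-flash`, the prompt wording and the `ReceiptFields` shape are my assumptions, and the request body follows the REST format as I understand it from Google's documentation.

```typescript
// The fields we want back; hypothetical, chosen for illustration.
interface ReceiptFields {
  merchant: string;
  date: string; // e.g. 2025-08-03
  total: number;
  currency: string;
}

const MODEL = "gemini-2.5-flash"; // assumption: the text only says "Gemini Flash"

// Build the HTTP request. This belongs server-side (for example in an edge
// function) so the API key never reaches the browser.
function buildGeminiRequest(apiKey: string, imageBase64: string, mimeType: string) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`;
  const body = {
    contents: [
      {
        parts: [
          { inlineData: { mimeType, data: imageBase64 } },
          { text: "Extract merchant, date (YYYY-MM-DD), total and currency from this receipt. Reply with JSON only." },
        ],
      },
    ],
    generationConfig: { responseMimeType: "application/json" },
  };
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Pull the model's JSON text out of the response and check it before
// trusting it to pre-fill a form.
function parseReceipt(response: unknown): ReceiptFields {
  const text = (response as any)?.candidates?.[0]?.content?.parts?.[0]?.text;
  if (typeof text !== "string") throw new Error("No text in model response");
  const raw = JSON.parse(text);
  const total = Number(raw.total);
  if (typeof raw.merchant !== "string" || !Number.isFinite(total)) {
    throw new Error("Unexpected receipt shape");
  }
  return {
    merchant: raw.merchant,
    date: String(raw.date ?? ""),
    total,
    currency: String(raw.currency ?? ""),
  };
}
```

Inside the edge function, the key would come from a server-side secret (say, a hypothetical `GEMINI_API_KEY`), and the two helpers would sit either side of a single `fetch(url, init)` call.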

Bolt can write Supabase functions too - they’re just TypeScript with some domain knowledge after all - and thankfully, it does a decent job on the first try. There are some things we’ll need to put in the backlog, like scaling down huge images before uploading, but as time ticks over our allocated window, it’s all hanging together. An uploaded receipt is automatically processed into the form using AI!
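The backlog item about scaling down huge images mostly comes down to a little arithmetic before the upload. Here is a small TypeScript sketch of that calculation. The function name and the 1600-pixel limit in the example are my own assumptions, and the actual resizing (for instance by drawing onto a canvas in the browser) is left out.

```typescript
interface Size {
  width: number;
  height: number;
}

// Shrink so that the longest side is at most maxSide, keeping the aspect
// ratio. Images that are already small enough are returned unchanged.
function fitWithin(size: Size, maxSide: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) return { ...size };
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

// A 4000 x 3000 phone photo capped at 1600 pixels on its longest side.
const target = fitWithin({ width: 4000, height: 3000 }, 1600);
console.log(target); // { width: 1600, height: 1200 }
```

A smaller image means a faster upload and a smaller payload sent for analysis, which is why this is worth doing before the file leaves the device.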

I know that’s a lot. And it means this email isn’t short any more… but it does show the sheer power of building with a Vibe Coding mindset in the latest AI tools. What next? The latest code is deployed on Boring Expenses for you to try, or you can check it out in GitHub. For hour three, we’re going to do a refinement session - we’ll create a snag list, check the security on what has been created, and start thinking about additional capabilities we would like to add.

1. GPT-5 arrives

Courtesy of an unbelievably dull launch livestream and some very dubious charts that The Verge called “Vibe Graphing”, OpenAI's GPT-5 is finally here, delivering a focus on smarter, more reliable AI capabilities. Key advancements include enhanced reasoning that balances depth and speed by deciding when to apply "deep thinking on demand", a novel feature that allows the model to use more computational resources selectively for complex queries. This reduces superficial responses common in earlier versions. GPT-5 also introduces a verbosity toggle and a “minimal” reasoning mode, allowing developers to tailor interactions by balancing cost, response latency and depth of analysis.
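For developers, those two controls show up as request settings. The TypeScript sketch below builds such a request body. It assumes they are exposed on OpenAI's Responses API as `reasoning.effort` and `text.verbosity`, which is my reading of the launch documentation, so check the current API reference. The function name is hypothetical.

```typescript
type Effort = "minimal" | "low" | "medium" | "high";
type Verbosity = "low" | "medium" | "high";

// Build a request body that trades depth for speed and cost.
// Assumption: field names follow the Responses API as described above.
function buildGpt5Request(input: string, effort: Effort, verbosity: Verbosity) {
  return {
    model: "gpt-5",
    input,
    reasoning: { effort }, // "minimal" asks for as little thinking as possible
    text: { verbosity },   // "low" asks for short answers
  };
}

// A quick, terse answer for a simple classification task.
const quick = buildGpt5Request("Is this email a complaint? Answer yes or no.", "minimal", "low");
console.log(JSON.stringify(quick));
```

The same function with `"high"` effort and `"high"` verbosity would suit a detailed analysis, at the price of more latency and more tokens.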

For coding professionals, GPT-5 shifts from being a pair-programming tool to a full collaborator in software development. It can create front-end interfaces, generate tests, and manage longer sequences of tool calls while maintaining context, supporting more agentic workflows rather than simple code completion. The model also achieves a step up in reliability, with fewer hallucinations and a more composed refusal style when uncertain, reducing risks in generating client-facing AI outputs.

Consumer and business users benefit from a revamped ChatGPT app featuring personalisation options like chat personality and colour schemes, voice customisation, and a new Study mode for guided tutoring. Integration with Gmail and Google Calendar enables the AI to assist with composing emails and preparing meetings, with these improvements rolling out first to ChatGPT Team users, and to Enterprise and Education customers imminently.

The arrival of GPT-5 also means that all previous models are deprecated in ChatGPT with immediate effect (but not in the API), which makes for a much cleaner user interface.

An interface of Microsoft 365 Copilot showcasing GPT-5

GPT-5 is being rapidly integrated across Microsoft’s suite of products, bringing its advanced reasoning and coding capabilities to a wide range of user experiences. Microsoft 365 Copilot now leverages GPT-5 to handle more complex questions, maintain context in longer conversations and reason over emails, documents, and files, helping enterprise users stay on top of their work with greater productivity.

Meanwhile, Microsoft Copilot offers a “Smart mode” powered by GPT-5, providing enhanced answers, writing assistance and creative idea generation available free to consumers across Windows, Mac, Android, and iOS platforms. Developers benefit from GPT-5 through GitHub Copilot and Visual Studio Code, where they can write, test and deploy code more efficiently with the new model’s ability to manage longer and more complex coding tasks. Additionally, GPT-5 models are accessible via Azure AI Foundry, complete with a model router that selects the optimal GPT-5 model based on task complexity, performance needs and cost efficiency, all under enterprise-grade security and compliance. This comprehensive integration ensures that across Microsoft’s consumer, developer, and enterprise environments, GPT-5 delivers sharper intelligence and more capable agentic workflows seamlessly.

Post-release, GPT-5 is quickly popping up in tools such as Bolt, Windsurf, Perplexity etc. - so, we can really start putting the new model through its paces.

For enterprises, GPT-5’s new “deep thinking on demand” feature brings significant value by applying smart reasoning selectively when complex analysis is needed, rather than defaulting to quick but superficial responses. This ability to dial up thoughtful, multi-step reasoning on an as-needed basis enhances the accuracy and reliability of AI outputs, which is especially important for environments where precision matters. Additionally, the simplification of the product set streamlines adoption and user experience, making it easier for colleagues across the business to choose the right tool for their tasks without confusion. Together, these advancements not only boost productivity by providing sharper insights and contextual understanding but also reduce operational friction and risk, paving the way for smoother, more trusted AI integration throughout the enterprise.

2. OpenAI also releases gpt-oss

This week wasn’t only about GPT-5! OpenAI also unveiled gpt-oss-120b and gpt-oss-20b, its long-awaited, state-of-the-art open-weight language models, designed to push the boundaries of reasoning capabilities while enabling cost-effective deployment.

These models are available under the Apache 2.0 license, making them accessible for a broad range of developers and organisations eager to leverage powerful AI without the constraints of proprietary restrictions. The gpt-oss-120b model achieves performance close to OpenAI’s proprietary o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. On the smaller scale, gpt-oss-20b offers capabilities comparable to the o3-mini model but requires just 16 GB of memory, making it particularly suited for edge devices and local inference scenarios. Both models excel in chain-of-thought reasoning, tool use, and structured output generation, supporting complex workflows with adjustable reasoning effort settings to balance latency and performance.

Technically, these models are built on advanced Transformer architectures that incorporate mixture-of-experts (MoE), optimising efficiency by activating only a subset of experts per token. The gpt-oss-120b has 117 billion parameters with 128 experts available per layer, of which four activate for each token. It supports extremely long context lengths of up to 128,000 tokens, ensuring broad applicability in complex tasks such as coding, competition mathematics, and health-related inquiries. Post-training involved supervised fine-tuning and reinforcement learning techniques like those used on OpenAI’s proprietary models, aligning the open models with OpenAI’s rigorous safety framework and demonstrating strong tool use in agentic workflows. This extensive safety training includes adversarial fine-tuning and independent expert reviews, marking an important advancement in open model safety standards.
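To show what "activating only a subset of experts per token" means, here is a toy top-k router in TypeScript. It is a generic mixture-of-experts sketch, not OpenAI's implementation. The function name, the example scores and the choice to apply softmax over only the selected experts are my assumptions.

```typescript
interface RoutedExpert {
  expert: number; // index of the chosen expert
  weight: number; // share of the output given to that expert
}

// Pick the k highest-scoring experts for one token and turn their scores
// into weights that sum to 1 (a softmax over the chosen experts only).
function routeToken(routerLogits: number[], k: number): RoutedExpert[] {
  if (k < 1 || k > routerLogits.length) {
    throw new Error("k must be between 1 and the number of experts");
  }
  const top = routerLogits
    .map((logit, expert) => ({ expert, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const maxLogit = top[0].logit; // subtracted for numerical stability
  const exps = top.map((t) => Math.exp(t.logit - maxLogit));
  const sum = exps.reduce((a, b) => a + b, 0);
  return top.map((t, i) => ({ expert: t.expert, weight: exps[i] / sum }));
}

// 128 experts with made-up router scores; only 4 are used for this token.
const scores = Array.from({ length: 128 }, (_, i) => Math.sin(i));
const chosen = routeToken(scores, 4);
console.log(chosen.map((c) => c.expert));
```

With four of 128 experts active, each token touches 4 / 128 = 3.125% of the experts in a layer. That is how a model with 117 billion parameters in total can run far more cheaply than that headline figure suggests.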

Open-weight models are a key step towards democratising AI technology, giving developers, enterprises, and governments the freedom to deploy, customise, and experiment with large language models entirely on their own infrastructure. Partnerships with leading cloud and hardware providers have helped ensure broad accessibility and optimised performance across numerous platforms, including local Windows devices with GPU-accelerated inference. While OpenAI continues to offer hosted models through their API, the availability of these open-weight models significantly lowers barriers for smaller organisations and those in emerging markets, fostering innovation, transparent AI development, and safer AI research.

Many of the most innovative model hosts, such as Cloudflare and Cerebras, only offer open models, so having a capable OpenAI model in this space is key. Cerebras is offering the top-tier model at a frankly unbelievable 3,000 tokens per second thanks to their custom silicon, which opens up a wealth of real-time use cases.

The emergence of gpt-oss models means that enterprises now have access to cutting-edge, open-weight AI models capable of complex reasoning tasks without the heavy infrastructure costs traditionally associated with such technology. This opens opportunities for internal innovation, such as deploying customised AI solutions on-premises for enhanced data security and experimenting with fine-tuning for domain-specific applications. The flexibility and efficiency of these models align well with enterprise requirements for scalable, cost-effective AI, potentially accelerating digital transformation and competitive advantage in an increasingly AI-driven market. Moreover, the strong safety standards embedded in these models provide greater confidence for responsible AI adoption within regulated industries.

3. Is Genie 3 the future of movies and gaming?

Google DeepMind has unveiled its latest breakthrough in AI world models with the release of Genie 3, a groundbreaking system capable of generating interactive, physically consistent 3D environments in real-time. Unlike previous generative AI models limited to short clips or static worlds, Genie 3 can create several minutes of dynamic simulation at 720p resolution and 24 frames per second from a simple text prompt. Remarkably, the model retains a short-term memory of the environments it generates, allowing for visual and physical consistency over time - meaning objects and settings remain stable even when revisited. This represents a significant leap forward in creating immersive virtual spaces that respond realistically to interactions, something akin to a "Star Trek holodeck" experience but accessible via conventional screens.

What sets Genie 3 apart is its underlying architecture, which does not rely on a hard-coded physics engine. Instead, it learns how the world works - how objects move, fall, and interact - by remembering previously generated frames and reasoning over long sequences. This auto-regressive approach, where each frame depends on the last, leads to emergent consistency and a deep understanding of physics within the simulated worlds. Users can trigger “promptable world events” to alter scenarios dynamically, such as changing weather or inserting new characters, expanding creative possibilities. Beyond entertainment, this capability is expected to be crucial for training embodied AI agents and robots in general-purpose tasks, making Genie 3 a key step toward artificial general intelligence (AGI).

Though currently in a limited research preview available only to select academics and creators, the implications for enterprise and business applications are considerable. Genie 3's generation of richly interactive, physically faithful simulations provides a compelling tool for advanced training and prototyping - whether in robotics, gaming, education, or virtual environment creation. As the model continues to evolve and overcome limitations such as multi-agent interaction and geographic accuracy, it promises to redefine how enterprises approach immersive training, automation, and design. For businesses, this means new avenues to explore AI-driven innovation in experiential learning, digital twin environments, and simulation-based problem-solving, all of which can enhance operational efficiency and creativity in complex projects.

4. Anthropic is still not slowing down

We talked last week about Anthropic not slowing down and, er, they still haven’t! This week the company released Claude Opus 4.1, an incremental yet impactful upgrade to its AI model that significantly enhances real-world coding and agentic tasks. Claude 4.1 builds on the architecture of Opus 4 but introduces focused improvements in reliability, autonomy, and contextual reasoning. It boasts a massive 200,000-token context window, allowing it to process and reason over extensive documents or codebases without fragmentation. This feature is particularly valuable for complex workflows requiring sustained attention and deep context retention.

The model scores an impressive 74.5% on the SWE-bench Verified benchmark, which evaluates software engineering tasks, outperforming OpenAI's GPT-4.1 (GPT-5 comparison TBC!) and demonstrating superior ability in multi-step instruction handling, debugging, and planning complex tasks. Its hybrid reasoning approach balances prompt responses with extended, step-by-step thinking for complicated problems, making it notably effective for long-form coding, AI agents, and research tasks. Available across Claude Pro, Max, Team, Enterprise, and API platforms, Claude 4.1 integrates seamlessly into developer environments such as VS Code, JetBrains, and GitHub Copilot.

A benchmark table comparing Claude Opus 4.1 to prior Claude models and other public models

Alongside the model upgrade, Anthropic has introduced automated security review capabilities within Claude Code, a command-line tool that leverages Claude's AI to assist developers in producing secure code. This new feature enables developers to run on-demand security checks directly from their terminal using the /security-review command, scanning codebases for vulnerabilities like SQL injections, cross-site scripting (XSS), authentication and authorisation flaws, insecure data handling, and dependency risks. After identifying issues, Claude Code can suggest or implement fixes, helping keep security integrated early in the development cycle. Furthermore, the tool supports GitHub Actions for automatic security reviews of pull requests with inline comments on vulnerabilities and remediation advice. This automation standardises security checkpoints across teams and fits smoothly into CI/CD pipelines, ensuring that code rarely reaches production without a thorough security assessment. Anthropic's internal use of these tools has already demonstrated their effectiveness by detecting and resolving critical vulnerabilities such as remote code execution and SSRF before deployment.

For enterprises, these developments with Claude 4.1 and its security review automation help integrate AI into software development workflows safely and efficiently. Claude 4.1's improved autonomy and reasoning capabilities can drive complex engineering projects with less need for constant human oversight, boosting developer productivity and reducing error rates. Meanwhile, enhancing security through automated reviews helps mitigate risks from the increasing complexity and volume of AI-generated code. By adopting these tools, our teams can accelerate innovation velocity while maintaining stringent quality and security standards, helping us maintain leadership in deploying cutting-edge, trustworthy AI technologies within our business environments.

5. And now for something a bit different…

Have you noticed how annoying it is to try and maintain any visual consistency when creating images of people with AI? No? Just me? Well, there’s now a solution!

Ideogram Character is an innovative offering that brings unprecedented character consistency to creative projects using just a single reference image. This new tool allows users to generate infinite variations of characters - whether real or imaginary - with remarkable fidelity, based solely on one input photo. Available on Ideogram's Plus and Pro subscription plans, it offers a simple yet powerful way to keep visual coherence across diverse scenes and styles. To celebrate its launch, the feature is currently free for all users to try, making it accessible to both professional creators and enthusiasts.

These are me, except they aren’t: they’re generated only from my publicly available X profile picture. Impressive.

Where Ideogram Character truly shines is in its versatility and ease of integration. For instance, the tool complements Ideogram’s Magic Fill feature, which lets users effortlessly add consistent characters into existing scenes by simply masking the desired area and adding a prompt referencing the character. This workflow is perfect for visual storytelling, marketing materials, and creative productions where maintaining the identity of a character across different settings or poses is crucial. Additionally, the ability to combine Describe or Remix functions enables style transfer magic, allowing creators to match their character precisely to any artistic style or inspiration image with just a few clicks.

For enterprise businesses, Ideogram Character offers tangible value in enhancing brand storytelling and consistent visual communication. Marketing teams can use it to create personalised, on-brand character representations across campaigns without the cost and complexity of bespoke artwork each time. It accelerates creative workflows by reducing the need for manual adjustments and multiple photo shoots, thereby saving time and resources. Moreover, by allowing easy modification of character traits through mask editing, the tool supports tailored visual narratives that maintain character identity while adapting to different messaging contexts, making it a compelling asset for digital content creation and brand management.

POB’s closing thoughts

If there’s one AI topic that is more controversial than others, it’s the creation of music with AI. The concept has been bubbling under for a while, but courtesy of ElevenLabs, it is now very much here. Touted as “the next step on our mission to build the most comprehensive AI audio platform in the world”, Eleven Music allows businesses, creators, artists, and every single one of their users to generate studio-grade music from natural language prompts. It offers complete control over genre, style, and structure, a choice of vocals or instrumental only, multilingual support (English, Spanish, German and Japanese, among others), and the ability to edit the sound and lyrics of individual sections or the whole song. It’s very impressive. And frightening.

After we’ve spoken so much about the Windsurf acquisition, it feels right to give one final update. Cognition’s recent acquisition of the company has quickly turned sour for the acquired staff, following the tumultuous period we previously covered. Just three weeks post-acquisition, Cognition laid off 30 Windsurf employees and offered buyouts worth nine months’ salary to around two hundred others. Those who choose to stay face gruelling demands: six-day office weeks and 80+ hour workweeks, a stark departure from normal expectations. CEO Scott Wu has been blunt about the company’s culture, stating they do not believe in work-life balance, framing their intense work environment as a mission-driven necessity. This harsh shift undermines Cognition’s initial praise of Windsurf’s “world-class people” and highlights that the acquisition was more about intellectual property than talent retention. The whole situation recalls the turbulence Windsurf experienced with earlier failed deals and key staff poached by Google, making this integration phase another challenging chapter for the startup’s employees.

Finally for this week, I found this tweet (I know, I should call it a post!) on ~~Twitter~~ X fascinating! “The AI infrastructure build-out is so gigantic that in the past 6 months, it contributed more to the growth of the U.S. economy than all of consumer spending. The 'magnificent 7' spent more than $100 billion on data centers and the like in the past three months alone”. Crazy, right?

Chart: quarterly capital expenditures, showing Meta, Google, Microsoft and Amazon collectively spending nearly $100 billion on capex in the past quarter

Thanks for reading, I hope you have a great weekend! 👍

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
