Enterprise AI Weekly #34

Real-time accent levelling, AgentKit and ChatGPT apps, context engineering tips, Gemini 2.5 Computer Use, Bolt v2 and Lovable Cloud, AI’s exponential growth, AI Studio Gemini Live, pricing models and Arduino UNO Q

Oct 10, 2025

Welcome to Enterprise AI Weekly #34

You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

We’re also working on something together. I’m building an app, Boring Expenses, in a Vibe Coding style, to demonstrate the process and to provide a test bed for technologies we discuss in future issues. I’ve mentioned before that I set aside a bit of time each week, usually on a Sunday morning, to keep up with tech and do this sort of thing, and I’m dedicating an hour of that time each week to progressing our experiment.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.

Still don’t trust your own ears

I was on a call this week with some of the team at our Vietnamese software development partner, and one of the people on the call had a particularly strong accent that I had to pay extra attention to in order to catch what they were saying. This reminded me of a cool real-time AI capability that I wanted to share with you all.

Krisp offers a software product focused on real-time noise cancellation (primarily to level up its meeting tools and transcription capability), but it also has features for levelling out regional accents. Its primary use case is contact centres, but if the tech were applied to both ends of a conversation, I think it would have great broader potential. Check out the demo video below - the male-to-female feature still sounds a little synthetic, but it’s clear that this technology is viable and will only continue to improve.

Now, onto the rest of the news. Enjoy EAIW #34!

1. OpenAI launches AgentKit and apps in ChatGPT

OpenAI has launched two new products targeted at both developer and end-user audiences. AgentKit brings much-needed simplification to building and managing agent workflows, while the new Apps SDK for ChatGPT creates a pathway for embedding interactive tools directly inside chat conversations.

AgentKit challenges popular players such as n8n by offering a visual space for composing workflows with drag-and-drop ease, allowing business and engineering teams to collaborate and iterate quickly. Everything from logic design to preview runs and version tracking happens within the same interface. Templates are included for common tasks, helping teams kick off new projects or tailor bespoke solutions without starting from scratch. Timeframes for deploying new agent workflows have the potential to drop significantly, and the approach encourages participation from subject matter experts, not just technical staff.

Interface view of a customer service automation flow in a visual builder tool. The canvas shows connected nodes labeled Start, Jailbreak guardrail, Classification agent, If/else, Return agent, Retention agent, Information agent, Hallucination guardrail, and End. A sidebar on the left lists available node types such as Agent, Note, File search, Guardrails, MCP, and User approval. Top controls include options for Evaluate, Code, Preview, and Publish.
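To make the routing pattern in that screenshot more concrete, here is a minimal sketch in plain Python. It is not AgentKit code or its API: simple keyword rules stand in for the LLM-backed guardrail and classification agents, the hallucination guardrail is omitted for brevity, and every function name is hypothetical.

```python
from typing import Callable

# Hypothetical markers; a real jailbreak guardrail would use a model, not a phrase list.
JAILBREAK_MARKERS = ("ignore previous instructions", "disregard your rules")

def jailbreak_guardrail(message: str) -> bool:
    """Return True if the message looks like a prompt-injection attempt."""
    text = message.lower()
    return any(marker in text for marker in JAILBREAK_MARKERS)

def classify(message: str) -> str:
    """Stand-in for the classification agent: map a message to an intent label."""
    text = message.lower()
    if "refund" in text or "return" in text:
        return "return"
    if "cancel" in text:
        return "retention"
    return "information"

# Placeholder specialist agents; in a real workflow each would be an LLM agent.
def return_agent(message: str) -> str:
    return "Routing you to our returns process."

def retention_agent(message: str) -> str:
    return "Before you go, let me share an offer to stay."

def information_agent(message: str) -> str:
    return "Here is some general information."

AGENTS: dict[str, Callable[[str], str]] = {
    "return": return_agent,
    "retention": retention_agent,
    "information": information_agent,
}

def run_workflow(message: str) -> str:
    """Start -> guardrail -> classification -> if/else routing -> specialist agent -> End."""
    if jailbreak_guardrail(message):
        return "Request blocked by guardrail."
    intent = classify(message)
    return AGENTS[intent](message)

print(run_workflow("I want to cancel my subscription"))
```

The value of a visual builder is that the same structure (guardrail, classifier, branch, specialists) can be assembled and inspected by non-developers rather than written by hand.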

The new Connector Registry streamlines how data from platforms including Microsoft Teams, SharePoint, Google Drive, and Dropbox feeds into agentic workflows. This makes data governance far easier, giving admin teams the clarity they need for compliance and risk management. Open-source guardrails, available for Python and JavaScript, let businesses flag or mask sensitive content and enforce safety checks. Evaluating agents is also much easier thanks to new features that automate grading and prompt improvement, let businesses set custom performance criteria, and benchmark non-OpenAI models. During testing, teams reported shorter development cycles and improved output quality.
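As a rough illustration of what a “mask sensitive content” guardrail does, here is a minimal regex-based sketch. It is a stand-in, not OpenAI’s open-source guardrails package: the patterns and labels are hypothetical, and production PII detection needs far more robust methods than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}

def mask_sensitive(text: str) -> tuple[str, list[str]]:
    """Replace matches with a [LABEL] placeholder and report which labels were flagged."""
    flagged: list[str] = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            flagged.append(label)
    return text, flagged

masked, flags = mask_sensitive("Contact jane.doe@example.com, card 4111 1111 1111 1111.")
print(masked, flags)
```

Returning the flagged labels alongside the masked text lets a workflow both sanitise the input and log or escalate the event, which is the “flag or mask” distinction above.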

On the end-user side, the new Apps SDK improves how third-party tools are accessed in ChatGPT. Developers can now build apps that appear directly in ChatGPT conversations, offering interactive elements like maps and live data views that adapt to user context. The SDK builds on open standards, notably the Model Context Protocol (MCP), so apps can run wherever MCP is supported. Privacy and safety are built in, with granular controls for admins and clear visibility into data flows. ChatGPT can also surface relevant apps at the right moment in a conversation, which keeps the experience smooth and cuts down on friction. App submissions and monetisation will open later this year, expanding opportunities for teams that build reusable tools for the business.

A recent X (formerly Twitter) post from Steven Heidel at OpenAI draws attention to the impact of Codex on product development velocity, emphasising that it has made a considerable difference to his team’s ability to deliver new offerings. The wider implication is that tools built with Codex, like those within AgentKit, aren’t simply theoretical upgrades; they unlock genuine momentum in how teams move from ideas to shipped solutions. For any business working to bring new developments to market, the blend of rapid prototyping, flexible workflows, and robust governance now available means teams can deliver value with far less friction and much greater speed.

I noted this week that current progress in coding automation puts us ahead of the AI 2027 report’s timeline, and more in line with what it predicts for 2026: coding automation goes mainstream, agents work like teammates, and AI R&D is 50% faster from algorithmic advances, not just compute.

Both AgentKit and the ChatGPT Apps SDK are tailored to simplify and speed up automation, while offering new ways to integrate tools. These systems bridge the gap between technical and non-technical teams, support ongoing compliance needs, and embed powerful business processes within day-to-day chat environments. The result is quicker response times, lower technical overhead, and greater adaptability as team requirements shift. New capabilities like custom grading, template-driven workflow design, and in-chat tools open the door for enterprises to build solutions aligned with their needs.

2. Anthropic on effective context engineering for AI agents

Anthropic’s recent article on effective context engineering for AI agents highlights a growing trend in working with large language models (LLMs). Moving beyond classic prompt engineering, context engineering (which we first discussed in EAIW #20) focuses on optimising the complete set of tokens (system prompts, message histories, tools, and external data) that fill an LLM’s finite attention window during inference. This shift recognises that LLMs have architectural limits on context length and attention capacity: every additional token draws on that limited attention budget, and performance can degrade over longer interactions.

The post treats context as a scarce resource, advocating for “the smallest possible set of high-signal tokens” that maximise the likelihood of the desired outcome. Practical guidance includes structuring system prompts clearly and efficiently, designing purposeful, minimal tools for agents to interact with their environments, and providing diverse but canonical few-shot examples to guide behaviour. Anthropic also emphasises dynamic strategies like “just-in-time” context retrieval, where agents load only relevant data at runtime, mirroring how humans rely on indexes and bookmarks rather than memorising everything upfront.
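A minimal sketch of just-in-time retrieval, assuming a hypothetical in-memory document store standing in for files, databases or APIs: the agent always sees a lightweight index of identifiers, but only the documents it decides to load end up in its context. The naive filename-matching selector is a placeholder for the model’s own tool calls, not Anthropic’s implementation.

```python
# Hypothetical store; in practice these would be files, records or API results.
DOCUMENT_STORE = {
    "policies/expenses.md": "Expenses over 500 GBP need manager approval.",
    "policies/travel.md": "Book rail travel in standard class.",
    "faq/payroll.md": "Payroll runs on the 25th of each month.",
}

def select_documents(question: str, store: dict[str, str]) -> list[str]:
    """Placeholder for the agent's decision: pick documents whose file name appears in the question."""
    text = question.lower()
    return [path for path in store if path.rsplit("/", 1)[-1].removesuffix(".md") in text]

def assemble_context(question: str, store: dict[str, str]) -> str:
    """Context = question + cheap index of references + only the documents loaded at runtime."""
    index = "\n".join(sorted(store))
    loaded = [f"--- {path} ---\n{store[path]}" for path in select_documents(question, store)]
    return "\n".join([f"Question: {question}", "Available documents:", index, *loaded])

print(assemble_context("What is the approval limit for expenses?", DOCUMENT_STORE))
```

The index costs a few tokens per document, while full contents are paid for only when needed, which is the trade-off the post describes.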

Prompt engineering vs. context engineering

For tasks spanning long time horizons or complex workflows exceeding the context window size, Anthropic discusses advanced solutions such as compaction (high-fidelity summarisation and reset of context), structured notetaking (agent memory stored externally and recalled as needed), and multi-agent architectures with specialised sub-agents handling deep tasks and summarising their findings. These techniques collectively preserve coherence and enable agents to operate persistently and effectively over extended periods despite inherent LLM limitations.
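Compaction, the first of those techniques, can be sketched as follows. This is a minimal illustration under stated assumptions: word counts stand in for a real tokenizer, and the hypothetical `summarise` function simply truncates messages where a real agent would ask the model for a high-fidelity summary of decisions, open issues and key facts.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

def count_tokens(messages: list[Message]) -> int:
    # Crude stand-in: whitespace-separated words, not a real tokenizer.
    return sum(len(m.content.split()) for m in messages)

def summarise(messages: list[Message]) -> str:
    # Hypothetical placeholder: a real system would request an LLM-written summary.
    return " | ".join(m.content[:30] for m in messages)

def compact(history: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """If the history exceeds the budget, replace older turns with a single summary message."""
    if count_tokens(history) <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = Message("system", "Summary of earlier conversation: " + summarise(older))
    # A production version would re-check the budget and compact again if needed.
    return [summary] + recent
```

Keeping the most recent turns verbatim preserves the immediate working state, while the summary carries forward what matters from earlier in the task.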

This post from Anthropic is relevant to enterprises striving to build capable AI agents because it frames context as a finite, manageable asset essential to reliable AI deployment. Understanding and implementing thoughtful context engineering practices allows businesses to create more efficient, robust autonomous agents that can handle long, complex tasks without losing focus or producing lower-quality outputs. As AI agents become more integrated into enterprise workflows, optimising context use will be a critical factor in realising their value effectively and sustainably.

3. Google releases the Gemini 2.5 Computer Use model

Google’s flood of Gemini 2.5-based model releases continues (ahead of the rumoured imminent release of Gemini 3.0) with Google DeepMind’s new Gemini 2.5 Computer Use model, designed to let AI agents interact directly with user interfaces in a human-like fashion. Built on Gemini 2.5 Pro, the model enables agents to perform on-screen actions such as clicking, typing, scrolling, and manipulating interactive elements in web and mobile applications. This allows developers to build agents that automate complex workflows that still rely heavily on graphical user interfaces (GUIs) rather than structured APIs.

The model operates in a continuous loop via the new “computer_use” tool in the Gemini API. Each cycle takes as input the user request, a screenshot of the current UI, and a history of recent actions. The model then generates structured UI actions, such as mouse clicks and keyboard inputs, which the client application executes. The client returns an updated screenshot and context to the model, letting it adapt iteratively until the task is complete or a safety stop triggers. This design has proven particularly effective in web browsers, shows promise for controlling mobile app interfaces, and could expand to desktop OS interaction in future.
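The loop described above can be sketched in a model-agnostic way. This is not the Gemini API: `propose_action` and `execute` are hypothetical callbacks standing in for the model call and for the client that drives a browser and captures screenshots, and `confirm` illustrates the kind of user-confirmation step Google describes for sensitive actions.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    name: str                                  # e.g. "click", "type", or "done"
    args: dict = field(default_factory=dict)   # e.g. {"x": 10, "y": 20}
    requires_confirmation: bool = False        # e.g. purchases, system-level operations

def run_agent(
    request: str,
    propose_action: Callable[[str, bytes, list[Action]], Action],  # hypothetical model call
    execute: Callable[[Action], bytes],                            # hypothetical client: act, return new screenshot
    confirm: Callable[[Action], bool],                             # asks the user to approve sensitive steps
    first_screenshot: bytes,
    max_steps: int = 20,
) -> list[Action]:
    """Request + screenshot + history in, one action out, repeated until done or stopped."""
    history: list[Action] = []
    screenshot = first_screenshot
    for _ in range(max_steps):
        action = propose_action(request, screenshot, history)
        if action.name == "done":
            break
        if action.requires_confirmation and not confirm(action):
            break  # safety stop: the user declined a sensitive step
        screenshot = execute(action)
        history.append(action)
    return history

# Usage with a scripted fake model that replays a fixed sequence of actions.
script = [
    Action("click", {"x": 10, "y": 20}),
    Action("type", {"text": "hello"}),
    Action("click", {"x": 5, "y": 5}, requires_confirmation=True),
    Action("done"),
]

def fake_model(request: str, screenshot: bytes, history: list[Action]) -> Action:
    return script[len(history)]

steps = run_agent("Fill in the form", fake_model, lambda a: b"png", lambda a: True, b"png")
print([a.name for a in steps])
```

The `max_steps` cap is an extra assumption worth keeping in any real loop, so that a confused agent cannot act indefinitely.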

Safety and responsible design are central to the Gemini 2.5 Computer Use model. Each proposed action undergoes an inference-time safety assessment, and developers can require user confirmation for sensitive tasks such as purchases and system-level operations. Early internal and partner use cases have reported improved task accuracy and efficiency, as well as significant reductions in latency compared with previous methods. Practical applications range from automating data entry and form filling to testing UI flows and managing complex digital workflows, and the model could transform how enterprises automate routine digital tasks and integrate AI agents into their productivity tools.

For enterprises, the Gemini 2.5 Computer Use model opens new possibilities in automating operational tasks that still require graphical UI interaction, often a bottleneck for automation. It enhances efficiency in customer relationship management, data entry, and user interface testing through more intelligent and adaptive AI agents. With its built-in safety controls and developer-friendly API access, this model provides a robust foundation for integrating AI agents that operate safely and reliably within enterprise systems, helping reduce manual workload while maintaining oversight. The ability to interact naturally with web and mobile interfaces aligns perfectly with the increasingly digital and multi-application nature of enterprise workflows.

I’m intrigued by the potential of the computer use models, and we’ll be trying the model out in a future ‘Vibe with POB’ issue!

4. Bolt and Lovable upgrades expand Vibe Coding capabilities

The recent launches of Bolt v2 and Lovable Cloud move two leading “Vibe Coding” tools towards becoming fully integrated, AI-powered full-stack development platforms. Both aim to deliver professional-grade vibe coding directly in the browser, combining powerful AI coding agents with enterprise-level infrastructure, removing integration hassles, and enabling rapid app building without coding experience.

By embedding Supabase as the backbone for databases, authentication, hosting, and real-time capabilities, both platforms simplify what used to be one of the toughest parts of development - backend setup and management. Both Bolt v2 and Lovable Cloud give users secure, scalable backend services seamlessly integrated with AI tooling and automatically create and manage databases as part of the workflow. This level of abstraction allows product teams, designers, and other non-developers to build complex applications without touching infrastructure details or command-line setups.

From a security perspective, this abstraction cuts both ways. On one hand, relying on a trusted, enterprise-grade backend like Supabase reduces the risks of manual misconfiguration or fragmented tools, as authentication and security audits are managed centrally. Bolt, for example, offers built-in security audits with automatic fixes on demand. On the other hand, opening up development to users less familiar with the underlying complexities raises concerns about gaps in security awareness. The platforms’ integrated monitoring, audit trails, and managed hosting help mitigate these concerns, albeit not completely.

The full development cycle - from idea to deployment - is becoming accessible beyond traditional engineering teams. This empowers businesses to accelerate innovation, reduce bottlenecks, and focus on building value rather than infrastructure headaches. As enterprises embrace these tools, the challenge will be to balance ease of use with maintaining rigorous security and governance, a task these platforms are now seeking to address by embedding security and scale into their core architecture from the outset.

Embracing platforms like Bolt v2 and Lovable Cloud allows enterprises to accelerate digital delivery through smarter tooling. By democratising app development and automating backend complexity securely, these tools can unlock new efficiencies across teams and roles. Understanding their capabilities helps us evaluate how to integrate such solutions responsibly, ensuring we maintain strong security while tapping into the speed and flexibility AI-enhanced full-stack platforms offer.

5. Are we ‘failing to understand the exponential, again’?

I recently happened upon an opinion piece by Julian Schrittwieser, which takes a clear stance against the narrative that AI progress is a bubble or nearing a plateau. Drawing a parallel to the early COVID-19 pandemic, Julian argues that just as many failed to grasp the exponential spread of the virus, the same misunderstanding plagues public discourse on AI capabilities. People tend to focus on today’s imperfections - such as AI making mistakes in coding or design - and prematurely conclude that AI will not reach or surpass human-level performance in these tasks. Yet only a few years ago, such capabilities were pure science fiction.

The piece draws heavily on rigorous data from studies like METR’s “Measuring AI Ability to Complete Long Tasks” and OpenAI’s GDPval (as discussed in EAIW #33). METR’s study shows a clear exponential increase in AI ability, measured by the length of software engineering tasks that AI models can complete independently, with recent models like GPT-5 completing multi-hour tasks at significant success rates. Similarly, the GDPval evaluation, which spans forty-four occupations across nine industries and grades AI output against the work of experienced human professionals, shows the latest models approaching, and likely soon to surpass, human expert performance. Notably, Claude Opus 4.1 even outperforms GPT-5 on this OpenAI-created benchmark, which the piece highlights as a positive sign of scientific integrity in open, cross-lab comparisons.

Looking ahead, Julian predicts that 2026 will be a pivotal year for AI’s integration into the economy. By mid-2026, he expects models capable of working autonomously for full 8-hour days (something we’ve seen surpassed in the last few weeks by Claude Sonnet 4.5), with models reaching or exceeding human expert quality across many industries before the end of the year. By 2027, AI could frequently outperform experts on a wide range of tasks, a forecast grounded in straightforward extrapolation of the sustained exponential trends seen in the data.
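The “straightforward extrapolation” is simple arithmetic. If a model’s task horizon doubles every T months from a starting horizon h0, then after t months the horizon is h0 × 2^(t/T), so the time to reach a target horizon H is T × log2(H / h0). The sketch below uses illustrative assumptions only (a 2-hour starting horizon and a 7-month doubling time), not figures taken from METR’s study or Julian’s piece.

```python
import math

def horizon_after(months: float, start_hours: float, doubling_months: float) -> float:
    """Task horizon after `months`, assuming clean exponential growth."""
    return start_hours * 2 ** (months / doubling_months)

def months_to_reach(target_hours: float, start_hours: float, doubling_months: float) -> float:
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return doubling_months * math.log2(target_hours / start_hours)

# Illustrative assumptions, not measured values.
START_HOURS = 2.0
DOUBLING_MONTHS = 7.0

print(months_to_reach(8.0, START_HOURS, DOUBLING_MONTHS))  # 14.0
print(horizon_after(14.0, START_HOURS, DOUBLING_MONTHS))   # 8.0
```

Going from 2 to 8 hours is two doublings, so the answer is simply twice the doubling time; halving the assumed doubling time to 3.5 months brings an 8-hour horizon to 7 months away, which is why estimates of the doubling time matter so much to these forecasts.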

Julian also acknowledges critiques of the COVID comparison and the challenges of extrapolating trends with complex underlying drivers. However, he points out that this mirrors Moore’s Law, which is primarily a statistical trend observed over time, and stresses that even if progress is not guaranteed, the evidence merits far more serious political and policy attention. The current evaluation tasks, while rigorous, still lack the messiness and unpredictability of many real-world jobs, suggesting that in the near term AI is likely to become a highly effective tool that augments, rather than entirely replaces, human workers.

From an enterprise perspective, this analysis is highly relevant. The data-backed case for continued rapid growth in AI capability means strategic planning should no longer treat AI as an ancillary or experimental tool, but as a core driver of productivity and competitive advantage. While disruption and job transformation are inevitable, AI will most likely act increasingly as an enabler, empowering teams with smarter assistants that manage well-defined tasks autonomously while still requiring human oversight for complex decision-making and creative work.

We should therefore recognise AI’s exponential progress for what it is: a paradigm shift demanding thoughtful investment, workforce reskilling, and governance frameworks that balance productivity gains with societal impacts. Preparing for 2026 and beyond means viewing AI not as a bubble about to burst, but as an accelerating economic force that enterprises must integrate proactively to thrive.

This article aligns with our approach encouraging informed, data-driven AI adoption to enhance business outcomes responsibly, while recognising the nuanced realities ahead for workforce and policy adaptation. The scientific integrity and transparency highlighted in the piece exemplify the type of AI discourse enterprises should anchor their strategies around.

POB’s closing thoughts

As always, it’s been a busy week in AI news, so I’m going to leave you with a few other things that caught my eye.

Google AI Studio product lead Logan Kilpatrick announced that ‘You can now Vibe Code voice AI agents and experiences in Google AI Studio, for free, with just a prompt’. He’s right: it works, and it’s cool. It also gives a good insight into the capabilities of the Gemini Live model, which to me feels a little behind what you can achieve with ElevenLabs. ElevenLabs itself introduced ‘ElevenLabs UI’ this week: ‘Open-source components for AI audio & voice agents. Twenty-two components & examples for chat interfaces, transcription, music, and more. Fully customizable. MIT licensed’. Nice!

Also, on my AI travels this week, I came across a company called ‘Paid’. Paid, which has just raised $21m in seed funding, promises to deliver ‘Monetization & cost tracking for AI agents. Stop leaving money on the table. Price with confidence, know your margins and bill for the value your customers get’. This is interesting, as AI looks set to up-end conventional pricing strategies, especially in industries with time-based billing, and enterprises are looking to alternative models.
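To see why time-based billing is exposed, consider a worked example with entirely hypothetical numbers: a job billed at 100 per hour that took 10 hours manually now takes 2 hours with AI assistance plus 20 in AI usage costs, with staff costing 60 per hour. The sketch compares hourly billing with a hypothetical fixed outcome-based price of 800.

```python
def gross_margin(revenue: float, hours: float, hourly_cost: float, ai_cost: float = 0.0) -> float:
    """Gross margin as a fraction of revenue: (revenue - staff cost - AI cost) / revenue."""
    cost = hours * hourly_cost + ai_cost
    return (revenue - cost) / revenue

# Hypothetical figures for illustration only.
RATE, STAFF_COST, AI_COST = 100.0, 60.0, 20.0

manual = gross_margin(revenue=10 * RATE, hours=10, hourly_cost=STAFF_COST)
hourly_with_ai = gross_margin(revenue=2 * RATE, hours=2, hourly_cost=STAFF_COST, ai_cost=AI_COST)
outcome_with_ai = gross_margin(revenue=800.0, hours=2, hourly_cost=STAFF_COST, ai_cost=AI_COST)

print(f"Manual, hourly billing:   revenue 1000, margin {manual:.1%}")
print(f"AI-assisted, hourly:      revenue 200,  margin {hourly_with_ai:.1%}")
print(f"AI-assisted, per outcome: revenue 800,  margin {outcome_with_ai:.1%}")
```

Under these assumptions, hourly billing keeps a respectable margin (30% against 40%) but revenue per job falls by 80%, while outcome-based pricing retains most of the revenue. Knowing the true AI cost per job is what makes that kind of pricing decision defensible.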

Finally (full nerd mode on, you know I love hardware!), Arduino, newly acquired by Qualcomm, revealed the Arduino UNO Q. The UNO Q combines a Linux-capable Qualcomm Arm processor running at 2 GHz, with an integrated Adreno GPU and dual image signal processors, alongside a real-time microcontroller for deterministic control. This hybrid dual-brain architecture supports AI capabilities such as vision and sound recognition, with AI models running directly on the board. It comes with 2 GB of LPDDR4 RAM, 16 GB of eMMC storage, dual-band Wi-Fi 5 and Bluetooth 5.1, and is compatible with classic Arduino UNO shields, plus new connectivity options like Qwiic connectors. The board is paired with an all-in-one development environment, Arduino App Lab, which brings Arduino sketches, Python scripting, and AI model deployment together in one workflow. For enterprises and AI enthusiasts, the UNO Q is a compact, cost-effective prototyping platform that bridges traditional real-time microcontroller control with high-performance Linux computing, making it well suited to edge AI, robotics, IoT, and smart sensor applications within the familiar Arduino ecosystem. Oh, and it’s just $44!

Thanks for reading, I hope you have a great weekend! 👍

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
