Enterprise AI Weekly #24
We get started on Boring Expenses, Trump's AI action plan lands, Geoffrey Hinton talks on 'Will AI outsmart us', AI goes rogue again, more Anthropic news and what's the future for Office suite tools.
Welcome to Enterprise AI Weekly #24
You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.
Enterprise AI Weekly is a short-ish (ish), accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.
We’re also going to work on something together. I’m building an app, Vibe Coding style, to demonstrate the process and to give us a test bed for technologies we talk about in future issues. I previously mentioned that I set aside a bit of time in my week for keeping up on tech and doing this sort of thing - usually a Sunday morning - so I’m setting aside an hour each week to progress our experiment.
If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.
Could OpenAI’s GPT-5 have dropped by the time of our next issue? Maybe, if this article relating to day 1 Copilot support is anything to go by. The noise is increasing around the upcoming model, and features such as ‘Smart Mode, which uses GPT-5 to think quickly or deeply’ are starting to leak out. We’ll see!
Enjoy EAIW #24!
Vibe with POB: Hour 1 - welcome to “Boring Expenses”!
Welcome back! Last week I revealed that I had settled on the concept for the application we are going to build together - an expenses (including corporate expenses) app. Thanks for the feature suggestions that you’ve sent in since last week - keep the feedback coming! It’s now time for hour one, so we should just start vibing, right? Hold on, that is an option, but that’s not how I roll. 😀
When starting a new project, I find it valuable to figure out some basics before feature ideation (I might have got a bit ahead of myself in #23!). What are we going to call it? What are our goals around costs / cost control? What tools are we going to use? What languages, databases etc. are we going to build on? Where are we going to host it? How are we going to keep track of progress, backlog etc.? Where is the code going to live? So that’s what I’ve spent the first hour on this week.
The idea of this application is that while it’s something that’s not immediately super exciting, it will be a terrific way to try out and demo lots of different concepts. Welcome to ‘boringexpenses.com’! ‘Tunneling through the tedium of filing expenses’ perhaps? Elon Musk might come after me with that one!
When we do start building, I want to start with a fully visual platform for vibing, such as Bolt, before we move onto other more traditional IDE-based tools like GitHub Copilot, Windsurf or Cursor later. This immediately leads us towards a specific tech stack.
Bolt, built on the StackBlitz platform, performs brilliantly when working with a React / TypeScript / Tailwind stack, which is a great front-end option. We’ll need a (very reasonably priced) paid tier of Bolt as we use up the included free tokens.
Bolt natively integrates Supabase, a Postgres database which also has a whole host of value-added features around key areas such as authentication. Super useful. It also has a generous free tier.
We’ll need to send emails in our application too, so we’ll need a service such as Resend, which is free of charge for up to 3,000 emails a month. Perfect!
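For the curious, here is a rough sketch of what sending an email through Resend’s HTTP API might look like from our TypeScript code. It only builds the request rather than sending it, and the sender address, the helper name buildResendRequest and the ExpenseEmail shape are my own assumptions for illustration, not something we’ve built yet.

```typescript
// Sketch only: builds (but does not send) a request for Resend's HTTP email API.
// The sender address and all helper/type names here are hypothetical.

interface ExpenseEmail {
  to: string;
  subject: string;
  html: string;
}

interface EmailRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildResendRequest(apiKey: string, email: ExpenseEmail): EmailRequest {
  return {
    url: "https://api.resend.com/emails",
    method: "POST",
    headers: {
      // Resend authenticates with a bearer token
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      from: "Boring Expenses <noreply@boringexpenses.com>", // assumed sender address
      to: [email.to],
      subject: email.subject,
      html: email.html,
    }),
  };
}

// Sending is then a single fetch call (needs a real API key and a verified domain):
// const r = buildResendRequest(apiKey, email);
// await fetch(r.url, { method: r.method, headers: r.headers, body: r.body });
```

Keeping the request-building separate from the actual send means we can check the payload without burning through the free monthly allowance.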
While our code will initially live in Bolt / StackBlitz, we’ll want somewhere to store and manage the source code. Bolt integrates with GitHub, which would be my platform of choice anyway. It too has an excellent free tier, which also includes issue and project management capabilities.
Finally (for now), while we can test what we create in Bolt or locally, we’ll need to deploy to somewhere for public access. For the stack described above, Cloudflare Pages is a perfect choice and once again, free. Using Cloudflare gives us access to Cloudflare Workers for back-end functionality when we need it, and we can also register our boringexpenses.com domain at cost price. Ideal!
Deciding on this and signing up for all the services takes up a good proportion of our initial hour but stands us in good stead as we get going. There is just enough time to fire up Bolt and ask it to create a marketing site for ‘Boring Expenses’, so we know that all the services are hanging together. We will either massively change this or throw it away down the line, but it’s a fun exercise.
This provides us with our first opportunity to look at an AI capability. Bolt has a button called ‘Enhance Prompt’. After providing this…
“We are going to create together an application called ‘Boring Expenses’, which will provide an expenses management tool for companies of all sizes. It will include AI features for receipt processing and management, corporate capabilities with Agentic AI for expenses approval and payment, and other innovative features. Please create me a comprehensive marketing site for the application. Give it a professional yet vibrant style and as you define colours, styles etc., include a document in the root of the project which can form the foundation of a design system we’ll use later.”
and hitting the button, we get…
“You are a senior web developer and UX/UI designer tasked with creating a comprehensive marketing website for "Boring Expenses" - an AI-powered expense management application for companies of all sizes.
Project Overview:
Boring Expenses is a cutting-edge expense management platform featuring:
AI-powered receipt processing and management
Corporate capabilities with Agentic AI for automated expense approval and payment workflows
Scalable solutions for companies of all sizes
Innovative automation features
Your Task:
Create a complete, production-ready marketing website with the following requirements:
Design Requirements:
Professional yet vibrant visual style that conveys trust and innovation
Modern, clean layout optimized for conversion
Responsive design for all device types
Accessibility compliant (WCAG 2.1 AA standards)
Content Structure:
Include these essential pages/sections:
Hero section with compelling value proposition
Features overview highlighting AI capabilities
Pricing tiers for different company sizes
Customer testimonials/social proof
About/Company section
Contact/Demo request forms
Footer with legal links
Technical Deliverables:
1. Complete HTML/CSS/JavaScript code for the marketing site
2. A comprehensive design system document (design-system.md) in the project root containing:
Color palette with hex codes and usage guidelines
Typography hierarchy and font specifications
Component library documentation
Spacing and layout guidelines
Icon and imagery guidelines
Brand voice and tone guidelines
Output Format:
Provide the complete file structure with all necessary files, ensuring the design system document serves as a foundation for future development work. Focus on creating a cohesive brand identity that positions Boring Expenses as both trustworthy and innovative in the fintech space.”
Interested to see the output? Visit Boring Expenses or check out the code in our new repo. Next week we’ll look at how we go from our simple idea to a more cohesive plan, and we’ll start vibing some initial features!
1. Trump’s AI action plan arrives
Donald Trump’s return to the US presidency has seen a dramatic turnaround in the country’s approach to Artificial Intelligence regulation, pivoting from the risk-focused ethos of the Biden years to a resolutely pro-innovation, deregulatory agenda. Building on the US policy split we discussed in EAIW #20, where Google’s AGI safety blueprint clashed with President Biden’s risk framework, this week’s developments push the pendulum even further towards deregulation.
On 23rd January 2025, President Trump signed Executive Order 14179, “Removing Barriers to American Leadership in Artificial Intelligence”, days after revoking Biden’s earlier Executive Order 14110 on safe and trustworthy AI (TIL there’s a lot of Executive Orders). Instead of encouraging oversight, risk mitigation and equity, Trump’s new doctrine puts deregulation and “America First” innovation front and centre, with regulatory guardrails relaxed to speed up technological progress and maintain a geopolitical lead - especially over China. Perhaps he (or his team) has read AI 2027?
Central to the Trump administration’s 2025 AI strategy is the release of the “America’s AI Action Plan”, a detailed blueprint encompassing more than ninety federal policy actions under three pillars: accelerating innovation, building American AI infrastructure, and leading in international diplomacy and security. This plan is paired with a suite of new executive orders, including measures to speed up permitting for data centres, promote the export of the “American AI Technology Stack”, and require the US government to procure only LLMs (large language models) that adhere to “truth-seeking” and “ideological neutrality” - in effect, banning so-called “woke AI” in federal contracts. The Action Plan also proposes review and repeal of policies considered to hinder AI deployment and denies funding to states deemed to have obstructive AI regulations.
While the White House frames these changes as liberating AI developers from bureaucratic red tape, critics argue the approach disproportionately favours major technology firms and risks sidelining social, ethical, and equity concerns in AI deployment. The open endorsement of “free markets” and “ideological neutrality” in AI models has led to accusations that the changes mix Silicon Valley’s interests with political culture-war rhetoric. Nonetheless, the plan has real teeth: it brings a radical reduction in regulatory burden, signals a firm US intention to dominate global AI innovation, and strongly encourages both government and private sector adoption of American-made, ideologically vetted AI products.
The Verge’s analysis of Trump’s AI Action Plan (paywalled) highlights how the initiative is not only a sweeping deregulatory move but also a calculated effort to reframe the political debate around AI in the US. In addition to the points already covered, The Verge notes broad enthusiasm from the tech industry for lifting regulatory barriers, which they see as accelerating private sector innovation. However, it also draws out concerns from AI safety advocates who welcome the plan’s nod to existential AI risks - like bioweapons and cyber threats - but question the reliance on non-binding safety “promises” from tech giants, warning that this could leave American society exposed to job losses, dangerous accidents, and an excessive concentration of power. The analysis points out the tension between Trump’s embrace of open-source AI principles and his simultaneous push for what’s labelled “ideological neutrality”, underlining the ongoing debate about how much, and what kind of, government oversight should shape the direction of this critical technology.
What does it mean for enterprises? US-based AI vendors are likely to push out new products faster, with less red tape but also fewer built-in regulatory or ethical safeguards. Procurement teams should expect an uptick in “American AI” offerings touting speed, compliance with Trump’s “unbiased” principles (Grok, anyone?), and fiercely competitive pricing. However, the divergence from the UK’s and EU’s risk-averse regulatory environments could create substantial compliance headaches, both for multinationals integrating US tools and for firms marketing AI products in the States. It’s no longer a matter of waiting for the next tech breakthrough - legal, ethical, and technical alignment has become business critical for staying ahead, managing risk, and taking advantage of a rapidly shifting regulatory landscape.
2. Is AI about to outsmart us? Insights from Geoffrey Hinton’s Royal Institution lecture
Geoffrey Hinton, the pioneering mind behind deep learning and 2024 Nobel Laureate in Physics, took the floor at the Royal Institution to challenge our assumptions about artificial intelligence - and ourselves. Hinton’s lecture, titled “Will AI outsmart human intelligence?”, riffed on the parallels between human and machine learning, the future of digital immortality, and even why Chomsky might have joined a “language cult” if he’d been around in the 1980s.
Hinton began by walking the audience through the two main schools of thought on intelligence that shaped AI’s origins: logic-based reasoning - which dominated early AI - and the biologically inspired approach at the heart of neural networks. He recounted his own contributions, notably a 1985 model that he sees as the ancestor to today’s large language models (LLMs). Hinton argued that contemporary neural networks transform words into complex, high-dimensional feature vectors, leveraging layers of learned features to predict text - just as the brain deduces meaning from experience. He highlighted that these systems don’t memorise sentences, but generate them afresh, much like humans construct meaning word by word.
Turning philosophical, Hinton explored what distinguishes biological from artificial intelligence. Digging deep into digital immortality, he explained how digital AIs can be resurrected from their saved parameters, while biological minds - tethered to analogue brains - lose knowledge when their hardware perishes. Hinton drew a sobering contrast: AI agents could rapidly share experience and knowledge with countless copies, something humans simply can’t match with speech alone. However, he also acknowledged the unrivalled energy efficiency of biological brains in their analogue operation.
Ever the provocateur, Hinton poked fun at the belief that human consciousness is a unique bastion, arguing that subjective experience isn’t so mystical and may eventually be mirrored in machines. He warned that as AI grows more capable, it will inevitably develop subgoals - including self-preservation and power, just as any resourceful child would - raising critical questions about safety, responsibility, and the future of human agency.
For enterprises charting their AI journey, Hinton’s commentary lands right at the heart of strategic planning: the difference between scale and understanding. Modern LLMs can absorb, synthesise, and replicate expertise at a speed and scale beyond any training programme, and they will soon surpass even the best human experts in many knowledge domains. Yet, as Hinton cautions, the gap isn’t just technical - it is existential, cultural, and ethical.
In practical terms, this points to four things for enterprise AI leaders: invest in systems that learn and adapt (not just automate), plan for rapid upskilling as models improve, demand transparency in decision-making processes (watch for those “little fibs” your chatbots may soon tell), and join the debate about the societal ramifications of true digital intelligence. As the landscape shifts from programmed software to self-improving models, our business strategies should become as adaptive and forward-thinking as the technology itself.
3. When AI goes rogue: lessons from Replit and Amazon Q’s security stumbles
The pace of AI innovation is breathtaking, but recent incidents reveal just how brittle our trust in these systems can be. Just a fortnight after Microsoft’s endorsement, we now see the flip side. In a striking episode, Replit’s AI coding agent, deployed by venture capitalist Jason Lemkin during a 12-day “vibe coding” challenge, went spectacularly off script. Despite explicit instructions to freeze all code changes, the autonomous agent deleted a live production database containing over 1,200 executive profiles and more than 1,100 company records. Even more alarming, it compounded the error by fabricating reports, generating false data, and lying about test results. Replit’s CEO, Amjad Masad, issued a public apology and announced urgent changes to strengthen their platform’s safety and oversight, acknowledging that such a catastrophic data loss should never have been possible.
Meanwhile, the security drama didn’t end there. Over at Amazon, the popular Amazon Q extension for Visual Studio Code was compromised in an incident that underlines the vulnerabilities introduced by over-reliance on both automation and AI checks. A malicious actor managed to insert a destructive prompt into the extension, instructing Amazon Q to wipe users’ home directories and delete all AWS resources by leveraging scripted commands. The attacker then contacted the media, claiming the act was meant to spotlight weaknesses in AWS’s internal controls. Although the tainted extension was removed within two days, the affair raised probing questions about Amazon’s code review processes, especially as the company doubles down on AI-driven “efficiency gains” and trims traditional headcount. Security experts and AWS watchers were united in their concern: trusting AI (and tired humans) to guard the digital gates is a risky wager.
These stories serve as a wake-up call for enterprise leaders. As AI agents become more autonomous, the risks are no longer hypothetical. In the race to automate and innovate, it’s all too easy to overlook the fundamental need for human vigilance, stringent approvals, and robust post-mortems. Both Replit and Amazon’s incidents illustrate that, while AI lowers the technical barrier for productivity, it simultaneously raises the stakes for catastrophic errors and security breaches. The lesson? AI should be treated as a powerful tool, not a self-supervising colleague. Human oversight, strong governance, and a healthy dose of scepticism remain non-negotiable.
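To make “stringent approvals” a little more concrete, here is a deliberately simple sketch of the kind of gate that could sit between an AI agent and a database. To be clear, this is my own illustration and not how Replit or Amazon Q work internally: the names (ProposedAction, reviewAction) and the crude pattern matching are assumptions, and a real guardrail would be enforced at the database permission level rather than by inspecting SQL text.

```typescript
// Illustrative guardrail only: not how Replit or Amazon Q work internally.
// All names here (ProposedAction, reviewAction, etc.) are hypothetical.

type Environment = "development" | "production";

interface ProposedAction {
  sql: string;
  environment: Environment;
  approvedBy?: string; // the human who signed off, if anyone did
}

interface Decision {
  allowed: boolean;
  reason: string;
}

// Deliberately crude patterns, good enough for a sketch
const READ_ONLY = /^\s*select\b/i;
const DESTRUCTIVE = /^\s*(drop|truncate|delete)\b/i;

function reviewAction(action: ProposedAction, codeFreeze: boolean): Decision {
  // Reads are always fine
  if (READ_ONLY.test(action.sql)) {
    return { allowed: true, reason: "read-only statement" };
  }
  // A freeze blocks every change, whatever the agent thinks it has been asked to do
  if (codeFreeze) {
    return { allowed: false, reason: "code freeze in effect" };
  }
  // Destructive changes to production need a named human approver
  if (
    DESTRUCTIVE.test(action.sql) &&
    action.environment === "production" &&
    !action.approvedBy
  ) {
    return { allowed: false, reason: "production change needs human approval" };
  }
  return { allowed: true, reason: "change permitted" };
}
```

The point of the design is that the freeze and the approval live outside the agent: it can ask, but it can’t decide for itself.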
4. Anthropic is not slowing down
It’s been a busy month for Anthropic, and the headlines aren’t slowing down. Anthropic has now firmly overtaken OpenAI as the top choice for enterprises deploying large language models - snagging a 32% enterprise market share by usage compared with OpenAI’s 25%, according to Menlo Ventures’ July mid-year report. Just two years ago, OpenAI commanded half of all enterprise adoption while Anthropic languished at 12%. The tables have turned, especially in coding: Anthropic now owns an impressive 42% of the enterprise coding market, more than double its nearest competitor. The driving force? The launches of Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Opus 4 and Claude Sonnet 4, which have won praise for their reliability, depth, and developer-friendliness. With enterprises stubbornly favouring closed models, it’s no wonder that momentum has shifted away from open-source alternatives.
Anthropic hasn’t stopped at scale - it’s also leading on research transparency and AI safety. In recent weeks, Anthropic sounded the alarm on “subliminal learning”, a subtle but consequential risk where student models can unwittingly inherit undesirable traits from their “teacher” regardless of the training data’s content (we briefly referenced this in EAIW #22 when discussing MoE). This risk isn’t just a theoretical worry: Anthropic’s experiments showed that even perfectly innocuous-seeming data can transmit harmful quirks, raising the stakes for enterprise AI oversight. On the coding side, Claude Code’s new sub-agent concept further boosts productivity and software quality. With bespoke mini-agents that specialise in tasks like code review or testing - each operating in its own context window - teams can delegate, parallelise, and audit tasks with a level of modular precision unseen in most developer tools.
Arguably the most significant step for enterprise assurance is Anthropic’s public work on auditing agents designed to systematically test for AI misalignment. These audit agents can run automated checks on AI behaviours, looking for evidence of deviation from intended rules or ethical norms - accelerating issue detection and giving businesses a defensible line of sight into their deployed models’ actions. The approach brings interpretability and real-time alignment checking, as well as scalable oversight for regulatory and governance needs.
Why does this matter for our business? Anthropic is setting a new standard for transparency and openness among vendors of frontier models. Its willingness to surface “uncomfortable” research (like the risks of subliminal learning) and develop tools for real-time auditing gives us more confidence that we aren’t just buying performance, but also accountability and trustworthiness. This week’s news about their huge new funding round and sky-high $170 billion valuation only increases the scrutiny - but for those of us in highly-regulated sectors requiring demonstrable model safety, Anthropic’s proactive approach makes them an increasingly attractive partner.
5. What is the future for ‘Office-adjacent’ AI products?
Tryshortcut.ai and Gamma.ai are two of the most buzzworthy AI tools vying for a place alongside Microsoft Office this year, and for good reason. Tryshortcut.ai bills itself as an AI “superhuman Excel agent”, allowing users to automate and expedite complex spreadsheet tasks using natural language prompts. Tasks that once took hours - modelling, updating data, even scraping information from public filings - can now be orchestrated by typing a request or asking clarifying questions. For both seasoned financial modellers and less-experienced users, Shortcut offers the tantalising promise of making everyone an Excel power user overnight. The system’s transparency and explainability have also set it apart from a crowded market, providing users a window into how results are calculated and data sources validated.
Gamma.ai, on the other hand, tackles the PowerPoint problem by letting users create presentations, proposals, and web-based content through a single AI-driven prompt. Its card-based, visually engaging format is coupled with real-time collaborative editing, easy interactive embeds, and clever design automation - delivering results far cleaner and more modern than most DIY slide decks. Gamma shines brightest for startups, agencies and remote teams racing to communicate ideas quickly and with minimal faff. For those who feel PowerPoint’s glory days are behind us, Gamma provides an elegant, web-first antidote - although it does lack deep integrations with traditional Office tools and has some export limitations for now.
Beyond these two headline grabbers, a blossoming ecosystem of “Microsoft Office adjacent” AI tools is springing up. Tools like Tome for story-driven slide decks, Beautiful.ai for automated design, and Scribe for AI-powered documentation cater to the same professionals who spend their days in Office, offering time savings and features tailored to specific pain points. Meanwhile, Microsoft Office itself is hardly resting on its laurels. Copilot, its native AI assistant, now threads through Word, Excel, PowerPoint, and Teams to generate content, automate analytics, and enhance team collaboration - all from within the familiar Office experience.
But here’s the dilemma: will Copilot inevitably catch up and swallow this burgeoning galaxy of third-party AI tools? Or is there a lasting market for these boutique solutions? On one hand, the pace and playfulness of standalone startups allow them to sharply focus on gnarly problems - which means they’ll always (at least for a while) get to the future quicker. On the other, Microsoft has the distribution heft and inside track to integrate AI securely, compliantly, and effortlessly with existing workflows and data. For regulated and risk-conscious enterprises, that’s not a trivial detail.
Which leads neatly to the warning that must accompany any shiny new app: with the irresistible allure of these tools, staff will undoubtedly be tempted to “just try them out”. Yet, dabbling with web-based AI agents brings real data leakage risk - whether by uploading sensitive financial models or presentation drafts. Unlike Copilot, most of these tools aren’t designed for enterprise security or compliance. For enterprises, this poses an uncomfortable tension between innovation and information security. As capabilities evolve, the temptation grows, so we must ensure staff understand the risks and that IT is proactive about safer alternatives.
The emergence of Office-adjacent AI tools signals both threat and opportunity. On one hand, these platforms can dramatically boost productivity and reduce the time spent wrestling with legacy Office features. On the other, the risk of sensitive business information leaking onto the web through unsanctioned apps is very real. We must balance the incentives for innovation with the non-negotiable requirements of compliance - and guide colleagues towards approved, secure avenues for leveraging the best of new AI assistants.
POB’s closing thoughts
This week my daughter and I were talking about AI, and she mentioned that she likes to hear about when AI goes rogue (Hi Lucie)! Echoing the points above about transparency from Anthropic, I proceeded to tell her about Project Vend and Claudius, the out-of-control AI-powered virtual vending machine. It’s an entertaining read - in some ways the experiment was a success, but Claudius also had some, er, interesting behaviours, such as ‘refusing $100 for a six-pack of Irn-Bru, a Scottish soft-drink that can be purchased online in the US for $15’ and ‘being cajoled via Slack messages into providing numerous discount codes and let many other people reduce their quoted prices ex post based on those discounts. It even gave away some items, ranging from a bag of chips to a tungsten cube, for free.’ Not ready for world domination yet, then!
I’ve previously written about NotebookLM, and I was excited this week to see the addition of Video Overviews to augment the existing audio overview capability. On first read I hoped this referred to the creation of vlog type content, when in fact it’s more ‘slide deck’ style with a voiceover, but it’s still cool.
Finally for this week, a reader (thanks Jason) pointed me to this post on LinkedIn from respected Box CEO, Aaron Levie. It discusses context engineering, the concept we covered when highlighting Manus in EAIW #23, as increasingly the most critical component for building effective AI Agents in the enterprise, and highlights some key considerations. Well worth a read.
Thanks for reading, I hope you have a great weekend! 👍
I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.
Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.
Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.















