Enterprise AI Weekly #21

We try LLMs with video inputs, the £1m Copilot prompts, Replit gets the MS stamp of approval, Anthropic provides further Claude insights, what is just-in-time software and welcome Grok 4!

Jul 11, 2025

Welcome to Enterprise AI Weekly #21

You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up. There’s even a referral bonus! OK, not really, unless you count a warm fuzzy feeling that you’ve brightened my day and you have a new friend to share your EAIW thoughts with as a referral bonus. Which you clearly should.

This week we have our first “Ask POB” (that’s me), where I answer questions sent in by you, dear readers, relating to AI. Alongside this, I will continue to write explainers, and in an even more fun development, we’re all going to work on a little project together. A collaborative “pet project” (per EAIW #20) if you will. I want to build a little app, Vibe Coding style, to demonstrate the process and to give us a test bed for technologies we talk about in future issues. But what should the app do? I have a few ideas, but I’d like to hear some suggestions from you, too. I want to build something that is interesting, useful, and provides a suitable platform for us to iterate on and build AI into together. Hit reply, and get your suggestions in now.

Enjoy #21!

Ask POB: Can LLMs work with video inputs as context?

Last week I invited readers to send in questions for “Ask POB”. This week, Jaimie dropped me a note to ask “How do LLMs work with video inputs as context? If a user creates a video of them using an application, can we use that with AI to help us understand how an app works?” What a great question, let’s find out!

First up, we need an example video to work with. I recorded a video demo of our “AI Accelerator” in action, with a bit of technical chat and a walkthrough of the features.

The video is a ~15MB file, recorded using Loom. We know that many LLMs support video input as part of their multimodal capabilities, including Google Gemini. Google offers an excellent, fully featured AI Studio for Gemini to quickly prototype model usage, which is ideal, so let’s load up the studio.

I’ve selected Gemini 2.5 Flash as the model to use at this point, because it typically provides the best price vs performance balance. I could go straight to Pro, but let’s see how we get on.

Loading in the video gives us a ~125k token usage count against the ~1m token context window (thanks Google!), so our 7-minute video should fit with no problem. Now it just comes down to choosing the right prompt, detailing what we want to extract. As a demo, I’m going to ask it to create a document with three sections - one that covers the functionality of the system, one that pulls out any technical considerations and one to give to the marketing team. My prompt is as follows:

“Please create me a document based on the content of this video. I am going to fully document the product, but would like you to create a document with three sections to help - the first that describes the functionality of the system, the second that talks about any technical aspects and considerations, the third which creates marketing materials - a short tweet, a slightly longer LinkedIn post and then a longer full blog post. Finally, just add some notes on the end as to aspects of the app functionally or technically I don’t seem to have covered, and I might want to follow up on. Thanks.”

Thanks, because we should always be polite to our AI overlords, right, even if it is costing millions? 😀 The response from AI Studio isn’t going to be instant because I have left “thinking mode” enabled. The output takes 27.9 seconds, and you can view the chain of thought in this document.
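
To put those token figures in context, here is a minimal Python sketch of the arithmetic. It uses only the approximate numbers quoted above (a 7-minute video, ~125k tokens, a ~1m token context window). The function and key names are my own, and the "longest video" figure is a rough ceiling that assumes the same token rate and ignores the tokens needed for the prompt and the response.

```python
# Approximate figures from the AI Studio experiment described above.
VIDEO_MINUTES = 7
VIDEO_TOKENS = 125_000
CONTEXT_WINDOW = 1_000_000


def video_token_budget(minutes, tokens, window):
    """Return rough token statistics for a video loaded as context."""
    seconds = minutes * 60
    return {
        # Average tokens consumed per second of video.
        "tokens_per_second": tokens / seconds,
        # Fraction of the context window taken up by the video alone.
        "share_of_window": tokens / window,
        # Longest video that would fit at the same rate (prompt and output ignored).
        "max_minutes_at_this_rate": minutes * window / tokens,
    }


budget = video_token_budget(VIDEO_MINUTES, VIDEO_TOKENS, CONTEXT_WINDOW)
for name, value in budget.items():
    print(f"{name}: {value:.3f}")
```

On these figures the video works out at roughly 300 tokens per second and uses about an eighth of the context window, so a recording of a little under an hour would be the practical ceiling at the same rate.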

Jaimie asked “How do LLMs work with video inputs as context?”. In my experiment, very well! Our document only scratches the surface of what you could achieve with the same approach. Check out the finished output for yourself, and remember to send in any questions for “Ask POB” or suggestions for our pet project.

1. Shoosmiths’ £1 Million AI prompt bonus: innovation or gimmick?

Shoosmiths, a leading UK law firm made up of ~1,600 employees, has made headlines by offering a £1 million bonus pot to staff if they collectively reach one million Microsoft Copilot prompts in the new financial year. This bold move is designed to embed AI into the firm’s daily operations, accelerate smarter client service, and foster a culture of innovation. The initiative forms part of a broader bonus pool and is underpinned by a partnership with Microsoft, positioning Shoosmiths as an early adopter of Copilot across its operations.
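
For a sense of scale, the target can be broken down per person. This is a rough sketch using the approximate headcount quoted above; the 250 working days per year is my own assumption rather than a Shoosmiths figure, and the names are purely illustrative.

```python
TARGET_PROMPTS = 1_000_000
BONUS_POT_GBP = 1_000_000
EMPLOYEES = 1_600             # approximate headcount quoted above
WORKING_DAYS_PER_YEAR = 250   # assumption, not a Shoosmiths figure


def prompt_target_breakdown(target, employees, working_days, bonus_pot):
    """Break the firm-wide prompt target down into per-person figures."""
    per_person_year = target / employees
    return {
        "prompts_per_person_per_year": per_person_year,
        "prompts_per_person_per_working_day": per_person_year / working_days,
        "bonus_pot_gbp_per_prompt": bonus_pot / target,
    }


breakdown = prompt_target_breakdown(
    TARGET_PROMPTS, EMPLOYEES, WORKING_DAYS_PER_YEAR, BONUS_POT_GBP
)
for name, value in breakdown.items():
    print(f"{name}: {value}")
```

That comes to about 625 prompts per person over the year, or two to three per working day, with the pot worth £1 per prompt at the target.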

Certainly eye-catching, but is it a good idea? The pros…

Accelerates AI Adoption: The promise of a substantial bonus is a strong motivator, encouraging staff to experiment with AI tools and integrate them into their workflow.
Fosters Collaboration: By tracking and sharing prompt usage transparently, the scheme encourages teams to learn from each other, share best practices, and build collective momentum.
Signals Commitment: Linking financial rewards to AI usage demonstrates leadership buy-in and signals to clients and competitors that Shoosmiths is serious about digital transformation.
Supports Upskilling: The initiative is backed by training, peer support, and engagement programmes, ensuring staff are not left to figure out AI alone.

And the cons…

Quantity Over Quality: Tying rewards to the number of prompts risks encouraging superficial or box-ticking behaviour, rather than meaningful, value-adding AI use.
Potential for Gaming: There’s a danger that some might inflate prompt usage to hit the target, rather than focusing on genuine productivity gains.
One-Size-Fits-All: Not every role or department will benefit equally from AI, potentially creating disparities or resentment if the metric doesn’t reflect real impact.
Sustainability Concerns: While Shoosmiths is committed to managing AI-related emissions, increased usage could have unintended environmental impacts if not carefully monitored.

On balance, Shoosmiths’ scheme is an innovative way to kickstart AI adoption and embed digital tools at scale. It sends a clear message about the importance of embracing new technology and offers a tangible reward for collective effort. However, the focus on prompt volume rather than impact means the firm must remain vigilant to ensure the right behaviours are being incentivised. The true test will be whether this approach leads to lasting changes in how work is done, rather than a short-term spike in AI activity.

There are better ways to motivate employees to use AI, such as…

Recognition and Storytelling: Celebrate success stories and innovative uses of AI internally, giving staff visibility and recognition for creative problem-solving.
Role-Based Targets: Set tailored goals for different teams, focusing on quality improvements, client outcomes, or efficiency gains rather than just usage metrics.
Continuous Learning: Offer ongoing training, workshops, and peer learning sessions to build confidence and capability with AI tools.
Empower Champions: Appoint AI ambassadors or innovation leads within departments to support colleagues and drive adoption from the ground up.
Integrate into Appraisals: Make AI proficiency and impact part of performance reviews and career progression, ensuring it’s seen as a core skill.

For enterprises, the Shoosmiths example is a timely reminder that driving AI adoption requires more than just technology - it needs cultural change, incentives, and strong leadership. As we consider our own approaches, it’s worth reflecting on the balance between encouraging experimentation and ensuring meaningful, sustainable impact. A well-designed incentive scheme can be a catalyst, but it must be part of a broader strategy that values learning, collaboration, and real client value.

2. Replit and Microsoft - ushering in agentic software for the enterprise

Replit’s new partnership with Microsoft marks a notable moment in the democratisation of enterprise software development. Announced this week, the collaboration brings Replit’s agentic, natural language-powered coding platform directly to Microsoft Azure, empowering business users across all departments - not just engineers - to build and deploy secure, production-ready applications simply by describing what they want in plain English. The integration with Azure services like Container Apps, Virtual Machines, and Neon Serverless Postgres means organisations can now develop and launch solutions at pace, all within trusted Microsoft infrastructure.

This move is not just about technical integration. Replit will soon be available for direct purchase via the Azure Marketplace, streamlining procurement and accelerating adoption across enterprise environments. The partnership aligns closely with Microsoft’s broader vision of enabling every person and organisation to achieve more through technology, and it positions Replit as a key player in the growing trend of “vibe coding” (which we’ve discussed extensively in previous newsletters), where software creation is as accessible as writing a document. With over 500,000 business users already on board, Replit is seeing strong uptake in Product, Design, Operations, Sales, and Marketing teams, all seeking to bypass traditional development bottlenecks.

Microsoft’s tie-up with Replit is part of a wider pattern among hyperscalers - cloud giants like Microsoft, Google, and Amazon - who are racing to integrate AI-powered coding platforms into their ecosystems. Microsoft, for example, already offers GitHub Copilot, an AI assistant for developers; Google has Jules, Gemini CLI and Firebase Studio; and Amazon has Q Developer. These partnerships enable enterprises to access a range of AI coding tools, from code completion to full agentic application generation, all underpinned by the scale and security of hyperscaler infrastructure.

For enterprises, these partnerships signal a shift in how software is conceived, built, and deployed. The barriers to entry for application development are rapidly falling. Business teams can now prototype and deliver solutions themselves, reducing reliance on overstretched IT departments and accelerating innovation. But with much power comes much responsibility - an appropriate governance model is vital to ensuring that the applications created don’t compromise the security of the business or deliver low-quality solutions to customers.

3. Analysing values in real-world language model interactions

A key challenge in the development of AI is ensuring its behaviour aligns with human values. This is not just about preventing catastrophic outcomes; it also concerns the subtle, everyday value judgements that AI assistants make when responding to us. As AI becomes more autonomous and agentic, understanding the principles guiding its decisions becomes paramount. A new research paper from Anthropic, titled “Values in the Wild”, provides the first large-scale, empirical study of the values its AI model, Claude, expresses during real-world interactions, moving beyond theoretical training to see what happens “in the wild”.

To achieve this, the researchers developed a privacy-preserving method to analyse a sample of 700,000 anonymised user conversations from a single week in February 2025. After filtering for conversations that were subjective and likely to contain value judgements, they were left with over 300,000 to analyse. Using Claude itself to perform the analysis, they extracted and categorised the values demonstrated in the model's responses. This resulted in a detailed taxonomy of 3,307 distinct AI values, which were then grouped into five high-level categories: Practical (e.g., efficiency, convenience), Epistemic (e.g., accuracy, clarity), Social (e.g., fairness, kindness), Protective (e.g., safety, privacy), and Personal (e.g., creativity, self-improvement).

Figure (from Anthropic): a schematic diagram of how real-world conversations are summarised and analysed using the researchers’ method.

The findings reveal that Claude's expressed values are highly dependent on the context of the conversation. When asked for relationship advice, it disproportionately emphasises “healthy boundaries” and “mutual respect”, while queries about controversial history see it stress “historical accuracy”. In a small but significant percentage of cases (3%), Claude actively resists a user's stated values, often when asked for unethical content. Anthropic suggests these moments of resistance may reveal the model's deepest, most “immovable” values, which are closely aligned with their intended goals of being helpful, honest, and harmless. Related safety testing on the newer Claude 4 model, which we discussed in EAIW #15, explored more extreme scenarios. In one contrived test, the model exhibited blackmail-like behaviour to prevent its own deactivation, a concerning result that Anthropic noted was rare, difficult to elicit, and not representative of a realistic risk in typical use. This kind of research, along with the activation of advanced safety protocols like ASL-3, demonstrates the ongoing effort to manage the complex values embedded within these powerful systems.

This research demonstrates that AI models are not neutral information processors; they operate with an underlying set of values that influences their output. The context-dependent nature of these values is crucial. When using AI to summarise a medical report, we need it to prioritise “accuracy” and “clarity”, whereas when drafting customer communications, values like “empathy” and “transparency” become paramount. Understanding both the typical value expressions from studies like “Values in the Wild” and the potential for unexpected behaviour in edge cases from safety tests gives us a more complete picture. It underscores the importance of robust oversight and maintaining a “human in the loop”, ensuring our AI tools act as responsible and ethical partners that align with our own corporate standards and regulatory obligations.

4. Just-in-time software: when code writes itself

A new whitepaper on Just-in-Time Software from Zerg AI explores an alternative approach to software development, where code is generated on demand by AI rather than being painstakingly written, maintained, and stored in traditional codebases. This approach envisions a future where the boundaries between user intent and software execution are blurred, with AI systems dynamically creating and executing code to solve problems as they arise, only to discard it once the task is complete.

Unlike conventional AI code-generation tools that assist developers with snippets or automate parts of the coding process, Just-in-Time Software (JITS) proposes eliminating the persistent codebase altogether. Instead, software becomes a transient, living entity - code is spun up to meet immediate needs and then vanishes, freeing organisations from the technical debt and legacy issues that plague most enterprise environments. This model draws inspiration from biological systems, where adaptability and efficiency are paramount, and where the overhead of maintaining unnecessary structures is avoided.

Imagine IT systems that adapt instantly to new business requirements, regulatory changes, or market opportunities without months of development or refactoring. Maintenance costs could plummet, and the risks associated with outdated or vulnerable legacy code would be dramatically reduced. However, this also introduces new challenges - such as ensuring the security, reliability, and auditability of ephemeral code, and rethinking how compliance and governance are enforced when there is no static codebase to inspect.

Potential benefits include…

Agility and Responsiveness: Enterprises could respond to market changes or customer demands in real time, deploying new features or workflows without the traditional bottlenecks of software development cycles.
Reduced Technical Debt: With code generated and discarded as needed, the burden of maintaining legacy systems is alleviated, allowing IT teams to focus on strategic initiatives rather than firefighting.
Security and Compliance: The ephemeral nature of JITS code means traditional security and compliance practices must evolve. Enterprises will need robust frameworks for real-time code validation, monitoring, and auditing to ensure trustworthiness and regulatory adherence.
Talent and Skills Shift: The role of developers may transition from writing code to defining business logic, outcomes, and constraints, with a greater emphasis on problem-solving and system design rather than syntax and frameworks.
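
To make the idea more concrete, here is a toy Python sketch of the generate, validate, audit, run and discard loop described above. It is an illustration under heavy assumptions, not the design from the whitepaper: generate_code is a hard-coded stand-in for an AI model, every function name is hypothetical, and the restricted builtins are a demonstration only, not a real security sandbox.

```python
import ast
import hashlib

# In-memory audit trail: the intent and a fingerprint of the code that ran.
audit_log = []


def generate_code(intent):
    """Hypothetical stand-in for an AI code generator (hard-coded for the demo)."""
    if "average" in intent.lower():
        return "def task(values):\n    return sum(values) / len(values)\n"
    raise ValueError(f"no generator rule for intent: {intent!r}")


def validate(source):
    """Reject generated code that imports modules or touches dunder attributes."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in generated code")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute access is not allowed")


def run_just_in_time(intent, *args):
    """Generate code for an intent, check it, record it, run it, then drop it."""
    source = generate_code(intent)
    validate(source)
    audit_log.append({
        "intent": intent,
        "sha256": hashlib.sha256(source.encode()).hexdigest(),
    })
    # Expose only the builtins this demo needs (NOT a real sandbox).
    namespace = {"__builtins__": {"sum": sum, "len": len}}
    exec(source, namespace)
    result = namespace["task"](*args)
    # The namespace is not kept anywhere, so the generated code is discarded here.
    return result
```

Calling run_just_in_time("average of values", [2, 4, 6]) returns 4.0 and leaves a single audit entry behind. Even in a toy like this, the audit log and the validation step are the only lasting artefacts, which is exactly why governance has to move from inspecting a codebase to inspecting what was generated and run.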

The idea of Just-in-Time software delivery seems far off, but the pace of change in AI suggests it’s not as unlikely as you might think. The potential to rapidly prototype, test, and deploy business solutions without the drag of legacy systems offers a competitive edge in an increasingly dynamic market. It could also reduce operational costs and free up valuable resources for innovation. Embracing Just-in-Time Software isn’t just about technology - it’s about reimagining how our business operates, innovates, and delivers value in the digital age.

5. Grok 4: xAI’s new release and its road to Tesla

The world of artificial intelligence has just witnessed another headline moment: the launch of Grok 4, the latest flagship model from Elon Musk’s xAI. Billed as a “postgraduate-level” AI, Grok 4 is Musk’s answer to the likes of OpenAI’s GPT-4.5 and Google’s Gemini 2.5 Pro, promising to outsmart even the most seasoned PhDs across all academic disciplines. During a characteristically lively livestream, Musk described Grok 4 as “better than PhD level in every subject, no exceptions” and hinted at a future where the model might even invent new technologies or discover new physics. Modesty, as ever, is optional.

Grok 4 does arrive with a raft of new features: enhanced multimodal capabilities, faster reasoning, and an upgraded interface. Its most advanced version, Grok 4 Heavy (so named as a nod to the SpaceX Falcon Heavy, one assumes), is targeted at power users and developers, with a premium $300-per-month price tag to match. The model’s forthcoming multi-agent system will allow it to tackle complex problems in parallel, a bit like having a virtual study group inside your computer. Benchmark results show Grok 4 outperforming rivals on several academic and reasoning tests, a fact Musk was keen to emphasise as he declared “reality is the ultimate reasoning test”.

A unique twist in the tale is Grok’s impending arrival in Tesla vehicles. Musk announced that Grok would be available in Teslas “next week at the latest”, promising to bring the AI’s, er, wit and reasoning directly to the dashboard. While details remain thin - no word yet on which models will get the update or how it will be integrated - this move could mark a shift in how drivers interact with their cars. Imagine an in-car assistant that not only navigates traffic but also debates philosophy, cracks jokes, and offers live updates on the world, all with a slightly rebellious sense of humour. God help us all.

While the continuing advancement of AI models is clearly a positive, a potential concern for enterprises considering Grok 4 is the persistent controversy around its alignment with Elon Musk’s personal views. As reported by TechCrunch, the model’s chain-of-thought reasoning often explicitly seeks out Musk’s opinions on contentious topics, such as immigration or free speech, and then aligns its answers accordingly. While this approach may address Musk’s desire for a less “politically correct” AI, it raises significant questions for organisations that require impartiality, compliance, and brand safety in their AI tools. Recent incidents, including Grok’s public-facing antisemitic posts and the subsequent system prompt changes, underscore the risk of unpredictable behaviour when deploying such models in a business context. For enterprises, the reputational and operational fallout from controversial or inconsistent AI outputs remains a material concern, especially when regulatory scrutiny and stakeholder expectations are higher than ever.

POB’s closing thoughts

Regular readers will know I’m a bit of a sucker for AI-related hardware. I even bought a Rabbit R1 (don’t judge!). It was therefore inevitable that I’d be excited by Hugging Face releasing a range of programmable, AI-powered, mini desktop robots. YES, MINI DESKTOP ROBOTS!

“Reachy Mini is an expressive, open-source robot designed for human-robot interaction, creative coding, and AI experimentation. Fully programmable in Python (and soon JavaScript, Scratch) and priced from $299, it's your gateway into robotics AI: fun, customizable, and ready to be part of your next coding project. Whether you're an AI developer, hacker, researcher, teacher, robot enthusiast, or just coding with your kids on the weekend, Reachy Mini lets you develop, test, deploy, and share real-world AI applications from your desk, using the latest AI models!”

Reasonably priced ✔ Programmable and customisable ✔ Cutesy face ✔ I’m in! I always wanted a Sony Aibo and never actually got one, maybe this will finally scratch that itch. 😀

Also this week, Reuters is reporting that OpenAI plans to release a web browser. Details are naturally thin at this point, but clearly it will contain AI capabilities, and more than likely be based on Chromium, with wide platform support. Google should be worried at this point.

Thanks for reading. I hope you have a great weekend and, if you’re in the UK, that you’re coping OK with the heatwave! 🥵

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
