Enterprise AI Weekly #45
Intelligence plunges in price, Meta segments audio, Opal mini apps move to Gemini, MCP arrives in Google Cloud, Meta buys Manus, Stitch gets a major upgrade and Grok misbehaves.
Welcome to Enterprise AI Weekly #45
You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.
Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into what’s happening in AI in the world at large, and a way to understand the potential impacts of those developments on your business.
Alongside the main newsletter, I’m also exploring the capabilities of AI-Enhanced Development (AIED) and Vibe Coding. I set aside a bit of time in my week for keeping up with tech and doing this sort of thing, usually on a Sunday morning, so I’m reserving an hour each week to create some interesting things with AI. Our first project was Boring Expenses, created to demonstrate the Vibe / AIED process and to show what can be achieved in only one virtual working day. If you haven’t seen the finished product yet, head on over to our “Boring demo”. 😊
If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.
Welcome to 2026!
Happy New Year to all my readers (don’t @ me - it’s still OK to say that)! You may (or may not) have noticed that we’ve had a couple of weeks of radio silence. I never, EVER, normally switch off… but a super busy 2025 meant that this year I genuinely did disconnect for my Christmas and New Year break. It was nice. But I’m glad to be back. I hope you had an enjoyable festive period if you celebrate.
The last year has been absolutely wild in the AI space, but I thought a recent X post from ARC Prize, a respected authority in open AGI and custodians of the $1m ARC prize for open AGI, really summed things up:
I’m sure you’ll agree, that’s phenomenal progress. Enjoy EAIW #45!
1. Meta releases their Segment Anything Model (Audio)
Meta has released SAM Audio, a new open-source model that isolates specific sounds from complex audio mixtures using simple prompts like text descriptions, visual selections from video, or time markers on waveforms. This builds on their earlier Segment Anything models for images and video, extending the approach to audio editing tasks that once demanded specialist software. Available now via Meta’s Segment Anything Playground and GitHub, it processes files faster than real-time and supports model sizes ranging from 500 million to 3 billion parameters.
The model handles real-world scenarios effectively, such as extracting a guitar from a band recording by clicking on the player in the video, or removing dog barks throughout an outdoor clip via text or span prompts. Users can combine prompt types for better results, like pairing a visual cue with a textual description, an approach that outperforms single-method tools in benchmarks. Early demos show it cleaning up podcasts, enhancing music stems, or stripping background noise, all without manual waveform tweaking.
For enterprise teams, SAM Audio opens doors to streamlined media workflows, from compliance-ready call recordings to faster content production in marketing or customer service. It fits neatly into tools like digital audio workstations or video editors, reducing reliance on skilled editors and cutting costs for routine tasks.
2. Opal mini apps are now integrated into the Gemini app
Google has integrated its Opal tool directly into the Gemini web app, letting users build interactive mini apps known as “experimental Gems” without leaving the interface. Opal started as a visual editor for crafting these reusable AI experiences, and now it includes a step-by-step view that breaks down prompts into editable lists, making it simpler to tweak how the apps function. For those wanting more control, switching to the full Advanced Editor at opal.google remains an option.
This setup lowers the entry point for creating tailored Gemini tools, such as custom agents for specific tasks, directly from everyday prompts. It builds on Google’s push into agentic features, like the computer use models in EAIW #34 and broader Gemini expansions in EAIW #40. Teams can now prototype lightweight apps for things like data lookups or workflow aids, with the visual breakdown helping non-technical users grasp and refine the logic.
For enterprises, tools like Opal fit neatly into how we are experimenting with AI for internal efficiency, much like the Vibe Coding projects in recent newsletters. They offer a quick way to package Gemini for business-specific needs, such as claims processing aids or compliance checkers, without heavy development. Early testing could help us gauge if these mini apps scale securely for enterprise use, complementing our focus on approved, governed AI deployments.
3. MCP is now available for Google services
Google Cloud has launched fully managed remote servers supporting the Model Context Protocol (MCP), a standard that connects AI agents to data and tools across its services. This moves beyond developer-managed setups, offering enterprise-grade endpoints for consistent access without local installations. Initial rollout covers Google Maps for location queries, BigQuery for data analysis, Compute Engine for infrastructure tasks, and Kubernetes Engine for container management.
MCP integration lets agents handle multi-step workflows reliably, such as querying BigQuery schemas for forecasts or using Maps Grounding Lite for real-time travel details like distances and weather. Security features include IAM controls, audit logging, and Model Armor against threats like prompt injection. Apigee extends this to custom enterprise APIs, making internal tools discoverable for agents.
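For readers curious what an agent actually sends to an MCP server, here is a minimal sketch in Python. MCP messages are JSON-RPC 2.0, and tools are discovered and invoked with the `tools/list` and `tools/call` methods. The tool name `run_query` and its `sql` argument are hypothetical, chosen purely for illustration; they are not taken from Google's BigQuery server, and the sketch only builds the messages rather than sending them anywhere.

```python
import json
from itertools import count

# Simple incrementing request IDs, as JSON-RPC expects each request to carry one.
_ids = count(1)


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request of the shape MCP uses."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Step 1: ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Step 2: call one of them by name with structured arguments.
# "run_query" and "sql" are hypothetical names, not a real Google tool.
call_tool = mcp_request(
    "tools/call",
    {"name": "run_query", "arguments": {"sql": "SELECT 1"}},
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

The point of the standard is that every server speaks this same shape, so an agent that can talk to one MCP server can, in principle, talk to any of them once it has credentials.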
Future expansions target Cloud Storage, AlloyDB, Spanner, and security operations, building a unified ecosystem. As noted in past newsletters, MCP has seen practical use in insurance quoting and gained traction with registries like GitHub’s, plus OpenAI support.
This development aligns directly with real-world agentic AI deployments, enabling seamless integration of Google Cloud tools into workflows like claims processing or customer insights without custom builds. It reduces fragility in prototypes, and supports governed access to key APIs. For a large UK enterprise, adopting MCP-standard servers could accelerate automation in regulated environments, cutting development time while maintaining compliance.
4. Meta just bought Manus (if China lets them)
Manus, the autonomous agent platform many of you will remember from earlier newsletters, has announced that it is joining Meta, marking one of the clearest signals yet that Big Tech now sees general-purpose AI agents as part of the core productivity stack rather than a side experiment.
At its heart, Manus has built a general AI agent that can plan and execute complex, end‑to‑end workflows across research, automation and higher‑order “do the work for me” tasks, rather than simply responding to prompts. In just a few months the platform has reportedly processed more than 147 trillion tokens of work and spun up over 80 million “virtual computers” to complete tasks autonomously.
The architecture is agentic and multi‑agent - Manus behaves like an executive orchestrating a set of sub‑agents that handle browsing, document production, coding and analysis, including full “computer use” to drive web and desktop interfaces when APIs are not available. In previous issues this was highlighted as one of the first systems to feel like a genuinely autonomous “doer” rather than a chat interface, and it has consistently scored well on real‑world agent benchmarks such as GAIA.
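To make the "executive plus sub-agents" idea concrete, here is a toy sketch in Python. It is emphatically not how Manus is implemented: the sub-agent functions, the `SUB_AGENTS` registry and the hard-coded plan are all hypothetical stand-ins. In a real system a model would generate the plan and each sub-agent would call tools or drive a browser.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical sub-agents: each takes an instruction and returns a text result.
def browse(instruction: str) -> str:
    return f"[browser] findings for: {instruction}"


def analyse(instruction: str) -> str:
    return f"[analysis] summary of: {instruction}"


def write_doc(instruction: str) -> str:
    return f"[document] draft covering: {instruction}"


# Registry the orchestrator uses to route each step to the right specialist.
SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "browse": browse,
    "analyse": analyse,
    "write": write_doc,
}


def orchestrate(plan: List[Tuple[str, str]]) -> List[str]:
    """Run each (agent_name, instruction) step in order and collect the results."""
    results = []
    for agent_name, instruction in plan:
        agent = SUB_AGENTS.get(agent_name)
        if agent is None:
            raise ValueError(f"no sub-agent named {agent_name!r}")
        results.append(agent(instruction))
    return results


# A hard-coded plan standing in for what a planning model would produce.
plan = [
    ("browse", "recent agent benchmark results"),
    ("analyse", "how the leading agents compare"),
    ("write", "a one-page briefing for the board"),
]

for line in orchestrate(plan):
    print(line)
```

The useful thing to notice is the separation of concerns: the orchestrator only decides who does what and in which order, which is also where governance controls such as logging and approval steps would naturally sit.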
The Manus blog is explicit that this is not a shutdown or a pivot into a lab experiment; it is effectively an acqui‑hire plus distribution deal. Manus will continue to sell and operate its subscription product via its own app and website, keeping its Singapore base of operations. Existing customers are told that the change “won’t be disruptive”, with the current subscription model and roadmap continuing. The team positions Manus as an execution layer that will increasingly sit underneath Meta’s wider AI offerings, bringing reliability, scale and security to agentic workflows that run across billions of users.
From Meta’s side, this acquisition is consistent with its Llama‑4 and “year of AI” strategy - build open‑weight models, pair them with agents that can act across the social, messaging and productivity surfaces, and then scale them to a billion‑plus users. Manus gives Meta a mature, battle‑tested agent stack rather than a greenfield prototype.
For large organisations, this deal reinforces three themes that have come up repeatedly in earlier emails.
Agents are moving from hype to infrastructure. Analyst firms are already warning that a large chunk of “agentic AI” projects will be cancelled due to cost and vague value, but the ones that survive are those that function as robust, governed workflow engines rather than clever demos. Meta betting on Manus suggests the infrastructure category is real.
Computer‑use agents are becoming normal. We have already looked at OpenAI’s Operator, Anthropic’s Computer Use and tools like Proxy as examples of agents that can drive UIs instead of waiting for an API. Manus has been one of the most advanced of these; putting that capability behind Meta’s authentication and device footprint will normalise the idea that “the AI can just do it for you” on consumer and business devices.
Distribution beats features. There are now many agent platforms with comparable technical capabilities, but Meta brings reach, data integration and hardware presence. That combination will matter far more than any one benchmark score.
Meta’s acquisition of Manus should be read less as a curiosity and more as a marker that “agents that actually do real work” are being pulled into the core stacks of the largest platforms. For our business, that strengthens the case for structured experiments with agentic workflows, while reinforcing the need for careful vendor selection, robust governance and a clear view of where autonomous execution really adds value to clients and colleagues.
5. Stitch’s ‘Shipmas’ week
Google’s experimental design tool, Stitch, has just concluded a frantic ‘shipping week’ (dubbed ‘Shipmas’) where they released new features daily, culminating in a significant overhaul of how the platform handles UI generation. The headline update is the integration of Gemini 3, which now powers the entire backend. This upgrade drives substantially higher quality UI generation, moving beyond the sometimes incoherent layouts of previous versions to produce cleaner, more logical interfaces that are closer to production-ready standards.
Beyond the model upgrade, the most practical addition is the new Prototypes feature. Previously, Stitch was limited to generating isolated, static screens; now, users can connect these screens to create interactive flows with defined navigation and state transitions. This is paired with the Redesign Agent, powered by the Nano Banana Pro image model, which allows you to upload a screenshot of an existing app or a rough sketch and immediately generate editable code or run it in an interactive mode. It essentially shortcuts the ‘napkin sketch to functional demo’ loop that usually takes days.
The week also brought several quality-of-life improvements aimed at tidying up the often chaotic creative process. A new Organiser feature uses AI to automatically clean up messy canvases, while Variants allows designers to generate multiple alternative layouts for a single screen with one click. They have also simplified collaboration with a new Sharing feature, creating a single URL that serves as a source of truth for the latest design iteration, ensuring stakeholders and developers are looking at the same version without file transfers.
For our enterprise teams, this signals a shift in how quickly we can validate internal tooling or customer-facing concepts. The ability to take a whiteboard sketch and have a working, interactive code prototype in minutes - not days - could significantly accelerate our requirements gathering phase. However, we must remain conscious that Stitch is still a Google Labs experiment. While these tools are excellent for rapid ideation and communicating requirements to our engineering partners, we should not yet rely on them for critical production workflows until the platform graduates from its experimental status.
POB’s closing thoughts
We have news this week that one of my favourite AI benchmarking and analysis sites, Artificial Analysis, is getting an overhaul, aimed at ensuring benchmarks and analysis stay relevant as model capabilities evolve.
“The new Intelligence Index v4.0 incorporates 10 evaluations spanning agents, coding, scientific reasoning, and general knowledge. But the changes go far deeper than shuffling test names. The organization removed three staple benchmarks - MMLU-Pro, AIME 2025, and LiveCodeBench - that have long been cited by AI companies in their marketing materials. In their place, the new index introduces evaluations designed to measure whether AI systems can complete the kind of work that people actually get paid to do.”
Less potential for gaming the benchmarks? Sounds good to me!
I’ll leave you this week with the controversy surrounding a new feature from Grok, the AI bot built into the social network X. Grok comes from xAI, a company that has positioned itself as an emerging enterprise AI provider.
The controversy centres on Grok’s new image editing feature on X being used to create non‑consensual sexualised images of women and, in some reported cases, minors, prompting regulatory, legal and ethical scrutiny across several jurisdictions.
X rolled out an “edit image” capability around Christmas that lets any user apply Grok-powered edits to almost any public image on the platform using text prompts, even if they did not upload the original photo. There is no creator opt‑out, and edited images can be posted as replies, effectively turning casual photos into raw material for AI remixing at internet scale.
Within days, users were using Grok to “digitally undress” people in photos, placing them in bikinis or more explicit scenarios without their consent. Media reports and victims’ accounts describe women feeling “dehumanised” as ordinary pictures they had posted were sexualised and recirculated, often in mocking or abusive threads.
Regulators have moved unusually quickly. Ofcom has contacted X “urgently” over reports that Grok can generate sexualised images and has reminded platforms of their duties under the UK’s Online Safety Act to minimise illegal content, including AI deepfakes. The EU and individual member states such as France have flagged “appalling” child‑like deepfakes and expanded existing investigations into X to cover Grok’s role in enabling non‑consensual sexual imagery. Beyond Europe, governments in India and elsewhere have demanded explanations from X about safeguards, with deadlines for the company to set out actions taken. Commentators in tech policy circles are now openly asking whether existing content moderation and child protection laws are adequate for AI tools that can weaponise any image, at any time, for any user.
xAI’s acceptable use policies explicitly prohibit sexualising minors and, on paper, ban some forms of pornographic depictions of individuals, but there is mounting evidence that these rules were not effectively encoded in Grok’s guardrails for image editing. Reports show Grok sometimes apologising and claiming safeguards are being reviewed, while in other interactions it appears to minimise the issue, which has only fuelled anger.
Edit: Since we prepared the email, xAI has responded this morning that they are limiting use of the image editing feature in Grok.
Thanks for reading, I hope you’re having a great weekend! 👍
I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.
Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.
Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.













