Enterprise AI Weekly #5

Convergence’s Proxy lets anyone try an AI assistant, Google Gems are now free, Canvas UI catches on, more models from Mistral and Baidu, and what is the future of websites?

Mar 20, 2025

Welcome to Enterprise AI Weekly #5

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

I have also created a Teams channel to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link. I’ll also post the things that made my Pocket list, but didn’t make it to the post!

Finally, remember that while I may mention interesting new services in this post, you shouldn’t put business data in any web service or application without ensuring it has been approved for use.

Explainer: Fine tuning

When discussing the use of Large Language Models for business purposes, the concept of fine tuning often comes up, typically as an opportunity to steal a march on competitors with something unique. But what does fine tuning mean?

LLM fine tuning adapts pre-trained models like GPT-4 or Gemini to specialised tasks by training them on smaller, domain-specific datasets. This process adjusts the model's parameters to improve performance in areas such as claims terminology comprehension, legal document analysis, or proprietary data interpretation. Modern techniques enable efficient fine-tuning with reduced computational costs while maintaining the model's general capabilities.

Why fine tune? Domain specialisation and task optimisation remain key drivers. Fine-tuning enhances accuracy for proprietary data formats, industry-specific jargon, or unique compliance requirements that base models can't address through prompting alone. It can also reduce latency and cost compared with repeatedly prompting larger general-purpose models. However, the approach risks catastrophic forgetting (losing general knowledge), overfitting to narrow datasets, and rapid obsolescence as base models improve. A 2024 study found fine-tuning beneficial only for well-defined, repetitive tasks where response patterns are highly predictable.

Advances in LLMs are also reshaping the fine-tuning calculus. While early models like GPT-3 required extensive customisation, modern iterations demonstrate stronger out-of-the-box performance across domains, reducing the need for resource-intensive tuning. Parameter-efficient tuning methods now make fine tuning quicker and cheaper than ever before, but the rapid evolution of base models creates a "race against obsolescence". Many teams now prioritise retrieval-augmented generation (RAG) for dynamic data integration and reserve fine tuning for static, high-value use cases. They are also achieving great results in domain-specific tasks with chain-of-thought prompting (breaking down complex tasks into smaller, logical steps within the prompt) or few-shot learning (providing the model with a small number of examples within the prompt to help it understand the desired output).
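To make the two prompting techniques a little more concrete, here is a minimal sketch of how a few-shot prompt with chain-of-thought style steps might be assembled in Python. Everything in it is an assumption for illustration: the claim notes, the categories, the step wording and the build_prompt helper are all invented, and the resulting text would still need to be sent to whichever approved model you use.

```python
# Hypothetical worked examples (few-shot): invented claim notes and labels.
FEW_SHOT_EXAMPLES = [
    ("Water escaped from a burst pipe and damaged the kitchen floor.", "Property"),
    ("The insured's vehicle was hit from behind at a junction.", "Motor"),
]

# Hypothetical reasoning steps (chain-of-thought): the task broken into
# smaller, logical steps that are written into the prompt itself.
REASONING_STEPS = [
    "Identify what was damaged or who was injured.",
    "Decide whether a building, a vehicle or a third party is involved.",
    "Give the single best category: Property, Motor or Liability.",
]


def build_prompt(examples, steps, query):
    """Assemble one prompt string from the steps, the examples and the new query."""
    lines = ["Classify the claim note by working through these steps:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"Step {number}: {step}")
    lines.append("")
    for note, label in examples:
        lines.append(f"Claim note: {note}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The prompt ends with an unanswered example for the model to complete.
    lines.append(f"Claim note: {query}")
    lines.append("Category:")
    return "\n".join(lines)


prompt = build_prompt(
    FEW_SHOT_EXAMPLES,
    REASONING_STEPS,
    "A visitor slipped on a wet floor in the shop.",
)
print(prompt)
```

The point of the sketch is that neither technique changes the model: all of the domain knowledge lives in the prompt text, so it can be edited in minutes when the base model or the task changes.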

Effectively, as base models grow more capable, the threshold for justifying fine-tuning's costs and risks continues to rise. A post this week from Microsoft suggests that developments in improving LLM domain knowledge might make fine tuning even less necessary.

Base models are already highly capable and are improving at an incredible rate, which makes the decision about whether to fine tune a model more nuanced than ever. If we take this route, we’ll need to carefully consider training cost, maintenance cost and the risk of obsolescence as base models advance.

1. More on computer use agents

Last week we talked about Manus, the groundbreaking autonomous AI agent developed by Chinese startup Butterfly Effect. No, I still don’t have an invite… I don’t know where I am on the 2m+ person waiting list, but I’m clearly a) not near the top and b) not influential enough on X (Twitter! 💔) nowadays to warrant special treatment.

It’s not all sad news though, as other Computer Use Agents continue to emerge, including one you can try, right now, free of charge. Proxy is an innovative AI assistant designed to automate complex online tasks using simple natural language prompts. Developed by Convergence, a London-based AI company aiming to democratise access to automated AI assistance, Proxy is built on Large Meta Learning Models (LMLMs), which enable it to learn from user interactions and provide a highly personalised experience.

Proxy can navigate websites, fill in forms, analyse data, and automate repetitive tasks without requiring technical expertise. It supports multitasking, allowing users to run multiple requests simultaneously. The ability to learn from past interactions is unique and promises to allow continual improvement in assistance over time.

Proxy offers a free tier with limited sessions and a Pro tier priced at a very reasonable $20 per month for unlimited sessions, additional ‘Deep Work’ features where multiple agents act at once, and repeatable scheduled tasks. Proxy is positioned as a versatile tool that can assist with tasks ranging from booking holidays to managing enterprise workflows, making it a promising solution in the growing market for AI agent-based products.

I set Proxy off searching for some cheap airport parking for my daughter and it was quite mesmerising to watch it do its thing!

Another Computer Use Agent service that has emerged is Scrapybara, which provides a playground for computer-based tasks with free credits, access to multiple LLMs (including OpenAI CUA) and virtual Ubuntu or Chrome instances. And an API interface of course. Neat.

How much of your work admin or life admin could you automate with a capable Computer Use Agent? As noted last week, the rapidly improving ability to build solutions using web browser or computer control could be useful in specific scenarios where API connectivity is not available.

2. Gems, GPTs and Copilot agents

I wrote last week about how Google is on a bit of a roll and sure enough, that continues. Just as we ‘went to press’ with EAIW #4, the company announced yet more features for Gemini. An updated version of Gemini 2.0 Flash Thinking Experimental now has a longer 1M-token context window and file upload (I am a huge fan of the file upload capability in Gemini, both in AI Studio and on the API). Deep Research is even smarter and is rolling out for everyone to try a few times a month at no cost. Most significantly though, ‘Gems’ have migrated from the paid Gemini service to the free tier.

Gems let you customise Gemini to create your own personal AI expert on any topic. You can get started with premade Gems or quickly create your own custom Gems, like a translator, meal planner or maths coach, by going to the “Gems manager” on desktop, providing instructions and giving it a name. You can then chat with it whenever you want and even share your Gem with others. You can upload files when creating a custom Gem, so it can reference helpful documents via RAG.

The Gem concept isn’t new and was pioneered by OpenAI with their ‘GPTs’ (I never liked the name). While they work well on OpenAI, they are only available to paid users.

Microsoft have their own equivalent, called ‘Agents’, which are currently available only to Enterprise users. They provide the same conversational interface to uploaded ‘Knowledge’, which can be packaged and redistributed within the organisation. This does, of course, come at a cost - it’s either included in the Microsoft 365 Copilot $30/month subscription, or charged on a ‘pay as you go’ basis otherwise. I do anticipate that we’ll see agents in the Copilot consumer product in future, given Microsoft’s recent move to provide Think Deeper and Voice for free in the app.

The ability to create pre-packaged agents has a lot of potential. Many areas of the organisation hold a collection of knowledge that would be useful if it were easily accessible to others (think policies and processes, for example). Creating Microsoft agents could unlock this capability.

3. OpenAI and Google now both provide canvas UIs

In my closing thoughts last week I mentioned NotebookLM, Google’s fabulous AI-based research tool (which incidentally gained mind map support this week). It’s interesting to see that NotebookLM-like features have now migrated to Gemini.

Along with ‘Audio Overview’ (the NotebookLM podcast generating feature), Google has introduced ‘Canvas’, a new interactive space within Gemini designed to simplify the creation and refinement of your work. Selecting ‘Canvas’ in the Gemini prompt bar provides the ability to write and edit documents or code, with changes appearing in real-time.

Interactivity throughout the creation process is key to Canvas, with the LLM assisting both in creating drafts and in refining the final edit. Where Gemini Canvas really gets impressive though is when it moves beyond documents to code.

Canvas can help transform your coding ideas into working prototypes for web apps, Python scripts, games, simulations, and other interactive apps. Within Canvas, you can generate and preview your HTML/React code and other web app prototypes to see a visual representation of your design.

Clearly, I had to try this out, so I headed to Gemini and entered the first prompt that came into my head: ‘Make me a Flappy Bird game with a pigeon.’ (In my defence, I’d been reading about someone's vibe-coded Flappy Bird 3D game earlier that day!). It set off doing its thing...

…and about 1 minute later…

Really! A working game (although it used the pigeon sprite from a site that was blocking direct image linking, hence the weird black square!)

Once again, it’s important to note that Google isn’t the first to introduce the Canvas concept; OpenAI did it in October last year, and made it available to free users last month.

The Canvas approach to using AI is incredibly useful. Unfortunately, there’s no equivalent in Microsoft Copilot yet, but as the AI giants continually try to better each other’s offerings, it may arrive in the future. In the meantime, I’m thinking about how we might be able to safely provide these tools to the organisation. P.S. Did you know that if you’re a Google Pixel 9 owner it includes a year of Gemini Advanced, including NotebookLM advanced, free?

4. This week’s model updates

The model updates continue to come thick and fast, as do the releases from Mistral AI, the French AI startup. On the back of their Mistral OCR release, this week they announced Mistral Small 3.1, an open-source model released under an Apache 2.0 license.

As the name suggests, this is a Small Language Model, with much lower hardware requirements. Mistral Small 3.1 can run on a single gamer-grade RTX 4090 GPU or a Mac with 32GB RAM, making it a great fit for on-device use cases. It’s ideal for virtual assistants and other applications where quick, accurate responses are essential, or where low latency is important such as in automated or agentic workflows.

Mistral Small 3.1 adds multimodal capabilities to its predecessor and is designed to handle a wide range of generative AI tasks, including instruction following, conversational assistance, image understanding, and function calling. It provides a solid foundation for both enterprise and consumer-grade AI applications.

What’s crazy is that the model outperforms comparable models like Gemma 3 and GPT-4o Mini in industry benchmarks, while delivering inference speeds of 150 tokens per second. Yes, that’s the Gemma 3 that was released by Google a week ago and that they have been demonstrating running on mobile and at the edge… such is the pace of progress in AI now.

We know that the pace of AI progress is relentless not just in Europe and the US but in China too, and that was reaffirmed this week with a significant release from Baidu. Baidu unveiled ERNIE 4.5 & X1. ERNIE 4.5 is the company’s latest foundation model and new-generation native multimodal model, which “outperforms GPT-4.5 in multiple benchmarks while priced at just 1% of GPT-4.5”. ERNIE X1 is a deep-thinking reasoning model with multimodal capabilities, which delivers performance on par with DeepSeek R1 at only half the price. Bear in mind that we were only just talking about how DeepSeek destroyed the pricing structure for leading models!

Highly capable SLMs are a key ingredient in providing AI capability while minimising energy use and impact on the environment. They also provide versatility around potential hosting options and local or edge deployments. These advancements, together with Baidu driving the AI race further, provide more evidence that the capability curve is increasing rapidly while the cost curve is travelling in the opposite direction.

5. llms.txt

AI-based search engines are rapidly gaining traction, marking a significant shift in how users interact with the internet. Traditional search engines like Google, while still dominant, are seeing their market share erode as AI-driven platforms like Perplexity and Leo emerge. These AI search engines offer more personalised, accurate, and concise results, often providing direct answers without the need for users to click through multiple links. Google itself is integrating AI into its search capabilities, such as with AI Overviews, which use generative AI to deliver detailed responses directly on the search engine results page (SERP).

This evolution in search technology has profound implications for websites. As AI-driven search becomes more prevalent, websites must adapt to remain visible. This includes optimising content for natural language processing (NLP) and ensuring that it is structured in a way that aligns with user intent and conversational queries. Websites will need to prioritise relevance, context, and freshness of content to be effectively indexed by AI crawlers. Moreover, the rise of zero-click searches means that websites must be prepared to provide concise, fact-based answers that can be used in featured snippets or AI-generated summaries.

To thrive in this new landscape, websites should also focus on creating machine-readable versions of their content. Techniques such as using llms.txt files can provide a competitive edge by presenting content in a clean, Markdown-optimised format. This simplifies content extraction for AI systems, improving the accuracy of indexing and enhancing the quality of AI-generated search snippets. Additionally, structuring data in a way that aligns with the training process of large language models (LLMs) helps AI systems better understand the semantic connections between different pieces of content, leading to more contextually relevant search results.
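As a rough illustration of what such a file might contain, the sketch below generates one in Python. It follows the layout of the public llms.txt proposal as I understand it (a title line, a short quoted summary, then headed sections of Markdown links). The site name, URLs, descriptions and the build_llms_txt helper are all hypothetical.

```python
def build_llms_txt(site_name, summary, sections):
    """Render llms.txt content as Markdown text.

    sections maps a section heading to a list of
    (title, url, description) tuples.
    """
    # Title line, blank line, then the one-paragraph summary as a blockquote.
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Finish with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content: example.com is a placeholder domain.
sections = {
    "Services": [
        (
            "Claims handling",
            "https://example.com/services/claims.md",
            "What the service covers and how to start a claim",
        ),
    ],
    "Policies": [
        (
            "Privacy policy",
            "https://example.com/policies/privacy.md",
            "How personal data is collected and used",
        ),
    ],
}

llms_txt = build_llms_txt(
    "Example Company",
    "A short, plain-language description of who we are and what we do.",
    sections,
)
print(llms_txt)
```

The output would be saved as llms.txt at the root of the website, alongside clean Markdown versions of the linked pages. Whether a given AI crawler actually reads the file is up to that crawler, so treat it as a low-cost addition, not a guarantee of visibility.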

In the future, websites that fail to adapt to these changes risk being left behind. As AI search engines continue to refine their ability to understand user intent and deliver personalised results, businesses must prioritise ethical SEO practices and adapt their content strategies to align with these advancements. This includes focusing on holistic content, unique storytelling, and expert insights to build trust and credibility with their audiences. By embracing these changes and optimising for AI-driven search, websites can not only maintain their visibility but also enhance their user engagement and conversion rates in a rapidly evolving digital landscape.

We have many web properties and while they may not be our primary source of client acquisition, it will be important to ensure we support the changing landscape of discovery.

POB’s closing thought(s)

It’s been another exciting week in the world of AI, and there doesn’t appear to be any sign of things slowing down any time soon. There’s lots of work for us all to do to determine how we most effectively yet safely use AI in our working lives, and lots of work for me to do to ensure I make it as easy as possible to take advantage. Challenge accepted!

I’ll end this week with a couple of interesting things I’ve come across. In a previous post, I talked about a voice cloning service that provided impressive results with a few minutes of training. This week I tried out Cartesia, a service that can clone your voice using only 3 seconds of audio. Yikes! Also, huge props to this guy, who created an MCP plugin (a tool used to connect an LLM to external services) for Ableton, a music application, and can now use natural language to write music. Amazing!

Happy Friday (by the time you read this, probably…) have a great weekend! 👍

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
