Enterprise AI Weekly #8

Prompt engineering explained, Google announces all the things at Next '25, Amazon gets arty, I finally have Manus access, is the Llama really whipped? - and consumer Copilot gets a power up.

Apr 10, 2025

Welcome to Enterprise AI Weekly #8

Welcome to the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page.

Explainer: Prompt engineering

I had another topic planned for the explainer this week (which will now feature in the next issue), but I changed my plans after Google dropped a fantastic whitepaper on prompt engineering.

Prompt engineering is the process of crafting high-quality input prompts to guide large language models (LLMs) in generating accurate and relevant outputs. A prompt is the input text (or other modalities like images) provided to an LLM, which it uses to predict the next sequence of tokens based on its training data. Prompt engineering is iterative: it involves optimising various aspects of the prompt, such as word choice, structure, style, and tone, as well as configuring model settings like temperature, token limits, and sampling methods. Effective prompt engineering enables LLMs to perform tasks such as text summarisation, question answering, code generation, and more.
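To make the temperature setting a little more concrete, here is a minimal sketch of how temperature reshapes the probabilities a model samples from. The tokens and scores are invented for illustration, and real models apply this over a vocabulary of many thousands of tokens, often alongside other sampling controls.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Pick one token at random using the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidate next tokens and scores, made up for this example.
tokens = ["claim", "policy", "banana"]
logits = [2.0, 1.5, -1.0]

low = softmax_with_temperature(logits, 0.2)   # low temperature: near-deterministic
high = softmax_with_temperature(logits, 2.0)  # high temperature: more varied
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
print(sample_token(tokens, logits, 0.2, random.Random(0)))
```

A low temperature concentrates probability on the top-scoring token, which suits factual or structured tasks; a higher temperature spreads it out, which suits more creative ones.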

The whitepaper on prompt engineering was authored by Lee Boonstra, Software Engineer Tech Lead in the Office of the CTO at Google. It was created to provide a comprehensive guide for users working with LLMs, particularly Google's Gemini, but with a wealth of useful information applicable across other models too. The document aims to equip readers with techniques and best practices for designing prompts that maximise the capabilities of generative AI models while addressing common challenges like ambiguous or inaccurate responses.

The whitepaper explores a variety of prompting techniques - zero-shot prompting (providing no examples and relying solely on task instructions), few-shot prompting (including one or more examples to guide the model), system prompting (defining the overarching purpose of the model), role prompting (assigning specific roles or personas to the model), contextual prompting (adding task-specific details for nuanced responses) and a number of more advanced methods including chain of thought (CoT), self-consistency, tree of thoughts (ToT), and ReAct (Reason & Act), which improve reasoning and task-solving capabilities.
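The difference between zero-shot and few-shot prompting is easiest to see side by side. This is a small sketch of my own, not taken from the whitepaper: the `build_prompt` helper, the sentiment task and the example texts are all assumptions for illustration.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt: no examples makes it zero-shot, one or more makes it few-shot."""
    parts = [instruction.strip()]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    # Leave the final label blank so the model completes it.
    parts.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(parts)


instruction = "Classify each text as POSITIVE or NEGATIVE."
examples = [
    ("The claim was settled in two days.", "POSITIVE"),
    ("Nobody returned my calls.", "NEGATIVE"),
]
query = "The adjuster was very helpful."

zero_shot = build_prompt(instruction, [], query)
few_shot = build_prompt(instruction, examples, query)
print(few_shot)
```

The resulting string would be sent to whichever model you are using; the examples show the model the expected format and labels rather than relying on the instruction alone.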

Additionally, it discusses automatic prompt engineering for generating prompts programmatically, multimodal prompting using inputs beyond text, and best practices such as providing clear instructions, experimenting with formats, and documenting attempts.

You can view the whitepaper at Kaggle - or download the whitepaper directly in PDF format.

Prompt engineering is highly relevant for businesses leveraging AI tools for automation, decision-making, or customer engagement. By mastering this skill, teams can optimise AI-driven processes such as content generation, data analysis, or customer support. For example, businesses can use tailored prompts to generate structured outputs like JSON for applications or automate workflows using advanced techniques like ReAct for external tool integration. As generative AI becomes increasingly central to business operations, understanding prompt engineering ensures effective utilisation of these technologies while minimising costs and errors.
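If you do ask a model for JSON, it is worth checking the reply before anything downstream relies on it. Below is a minimal sketch assuming a hypothetical ticket-triage schema; the field names and the sample reply are invented, and a production system would more likely use a schema validation library.

```python
import json

# Hypothetical schema for a support-ticket triage reply.
REQUIRED_FIELDS = {"category": str, "urgency": int, "summary": str}


def parse_model_reply(raw_text):
    """Parse a model reply that should be a JSON object and check its fields."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        # Exact type check so that, for example, true is not accepted as an integer.
        if type(data[name]) is not expected_type:
            raise ValueError(f"field {name} should be {expected_type.__name__}")
    return data


# Invented example of what a model might return.
reply = '{"category": "billing", "urgency": 2, "summary": "Customer was charged twice."}'
ticket = parse_model_reply(reply)
print(ticket["category"], ticket["urgency"])
```

Rejecting malformed replies early means a retry or a human review can kick in, rather than bad data flowing into the next system.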

1. Google Cloud Next 2025: Key announcements

This week at Google Cloud Next 2025, Google released an onslaught of updates, highlighting their commitment to maintaining their breathtaking pace of AI progress. Gemini Live and Gemini 2.5 Flash are two of the latest models at the forefront of this innovation. Gemini Live introduces real-time interaction capabilities through the Live API, enabling dynamic applications that adapt instantly to user inputs. Meanwhile, Gemini 2.5 Flash offers a cost-efficient, low-latency AI model designed for high-volume tasks like customer support and real-time summarisation, with enhanced reasoning capabilities and a one-million-token context window for deeper insights. Following on from recent advancements in capability, the focus on reduced latency increases the number of potential scenarios where LLMs can be deployed, particularly in the realm of customer service.

For developers, Gemini Code Assist has been upgraded with agents capable of automating complex coding workflows, matching the evolution of competing products such as GitHub Copilot, Cursor and Windsurf. These agents can translate natural language into multi-file solutions, migrate codebases across languages, and autonomously resolve issues in repositories. An upgraded Google Developer Programme also accompanies the release.

Firebase Studio also received a significant update, squarely taking aim at the likes of Bolt and Lovable in the vibe-coding / enterprise-vibe space. This integrated development environment now includes over sixty pre-built templates and AI-powered tools for prototyping, coding, and deployment. Developers can leverage the App Prototyping Agent to generate full-stack applications in minutes, further streamlining the development process.

The introduction of Google's Agent2Agent (A2A) protocol and Agent Development Kit (ADK) at Cloud Next 2025 marks a significant leap in AI agent interoperability and development. The A2A protocol enables seamless communication between autonomous AI agents across diverse ecosystems, solving the long-standing issue of siloed systems. By standardising interactions, agents can securely exchange information, coordinate tasks, and collaborate effectively, regardless of vendor or platform differences. Key features include capability discovery, task management, and user experience negotiation, which allow agents to dynamically identify the best collaborators for specific objectives. This modular approach promotes scalability and efficiency, enabling businesses to deploy multi-agent systems that automate complex workflows across departments while maintaining security and adaptability.
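To illustrate the three ideas named above (capability discovery, task management and user experience negotiation), here is a deliberately simplified sketch. It is not the A2A schema or the ADK API: the `AgentCard` and `Task` classes, their fields, the agent names and the fallback to plain text are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class AgentCard:
    """Hypothetical description an agent publishes about itself."""
    name: str
    skills: set
    output_formats: set = field(default_factory=lambda: {"text"})


@dataclass
class Task:
    """Hypothetical unit of work handed from one agent to another."""
    skill: str
    payload: str
    preferred_format: str = "text"
    status: str = "submitted"


def discover(cards, task):
    """Capability discovery: find the agents that advertise the needed skill."""
    return [card for card in cards if task.skill in card.skills]


def negotiate_format(card, task):
    """UX negotiation: use the preferred format if supported, else fall back to text."""
    if task.preferred_format in card.output_formats:
        return task.preferred_format
    return "text"


def assign(cards, task):
    """Task management: pick the first capable agent and update the task status."""
    candidates = discover(cards, task)
    if not candidates:
        task.status = "rejected"
        return None, None
    chosen = candidates[0]
    task.status = "working"
    return chosen, negotiate_format(chosen, task)


cards = [
    AgentCard("hr-agent", {"screen_cv"}),
    AgentCard("finance-agent", {"fx_quote", "invoice_check"}, {"text", "table"}),
]
task = Task(skill="invoice_check", payload="INV-001", preferred_format="table")
agent, output_format = assign(cards, task)
print(agent.name, output_format, task.status)
```

The point is the shape of the interaction: agents describe what they can do, a client matches work to those descriptions, and both sides agree how results will be presented.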

Finally, to inspire AI-powered innovation, Google released a "101 Use Cases" document highlighting how these tools can transform industries ranging from healthcare to retail.

These announcements have significant implications for businesses, particularly in streamlining operations, enhancing scalability, and improving client service. For example, A2A facilitates seamless communication between AI agents, enabling integration between disparate systems. This interoperability ensures that agents can collaborate to handle tasks with minimal manual intervention. The ADK further empowers businesses to develop custom agents tailored to their unique business requirements, enabling automation of repetitive processes while maintaining flexibility to adapt to evolving client needs.

Gemini Live’s ability to interact dynamically with user inputs can enhance customer service by providing instant responses to inquiries or requests for information. Meanwhile, Gemini 2.5 Flash’s low-latency model is ideal for handling high-volume tasks like triage or fraud detection at scale, while its extended context window allows deeper analysis of complex cases.

2. Amazon introduces new capabilities to Bedrock

This week, Amazon unveiled two advancements in media-related artificial intelligence - Amazon Nova Sonic and Amazon Nova Reel 1.1, both designed to redefine AI-driven voice and video applications.

Amazon Nova Sonic is a state-of-the-art generative AI voice model that offers real-time speech-to-speech capabilities. Built on Amazon Bedrock, it enables developers to create human-like voice interactions with low latency and high cost efficiency. Nova Sonic supports expressive voices across various English accents, dynamically adapting speech responses to match the context of user input. Amazon reports that it handles multilingual speech recognition with low error rates, outperforming competing models from OpenAI and Google in its published benchmarks. The model is already integrated into Amazon's Alexa+ assistant and is accessible via a bidirectional streaming API on Bedrock (Amazon’s AWS AI platform), making it ideal for applications such as customer service automation, interactive education, and voice-enabled personal assistants.

Meanwhile, Amazon Nova Reel 1.1 marks a significant leap in AI video generation technology. This updated model can produce multi-shot videos up to two minutes long while maintaining consistent style across segments. Users can choose between automated mode, which generates videos based on a single prompt, or manual mode for granular control over individual shots using text and images. Nova Reel’s enhancements promise faster production speeds and improved quality compared to its predecessor, making it ideal for marketing campaigns, product design visuals, and social media content creation. The tool is currently accessible through Amazon Bedrock in the US East region, offering businesses a scalable solution for high-quality video production at reduced costs.

The introduction of Nova Sonic and Nova Reel highlights the growing importance of generative AI in streamlining operations and enhancing creativity across industries. For businesses, these tools offer transformative potential: Nova Sonic can revolutionise customer service workflows with automated yet natural voice interactions, while Nova Reel simplifies video production for marketing and branding efforts. By leveraging these technologies, companies can reduce costs, accelerate content creation timelines, and deliver more engaging user experiences - key advantages in today’s competitive landscape.

3. Hands on with Manus AI

Back in EAIW #4, I spoke about Manus AI, which has emerged as a leading autonomous AI agent, redefining automation and productivity across a wide range of use cases.

Unlike traditional AI tools that require constant human input, Manus operates independently, executing complex workflows asynchronously in the cloud. Its multi-agent architecture allows it to break down tasks into manageable components, delegate them to specialised sub-agents, and deliver results efficiently. Key features include real-time data analysis, multilingual content creation, and integration with enterprise systems via APIs. Manus excels in areas such as recruitment (e.g., analysing CVs / resumes), finance (e.g., generating stock analyses), and marketing (e.g., campaign optimisation), offering users a versatile solution for operational efficiency.
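The pattern described above (break a goal into parts, hand each part to a specialised sub-agent, then collect the results) can be sketched in a few lines. Manus has not published its internals, so everything here is hypothetical: the fixed plan, the three stand-in sub-agents and the use of a thread pool are my own assumptions, and real sub-agents would call models and tools rather than return canned strings.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in sub-agents; each would normally wrap a model or tool call.
def research_agent(subject):
    return f"research notes on {subject}"


def analysis_agent(subject):
    return f"analysis of {subject}"


def writing_agent(subject):
    return f"draft report on {subject}"


SUB_AGENTS = {
    "research": research_agent,
    "analysis": analysis_agent,
    "writing": writing_agent,
}


def plan(goal):
    """Hypothetical planner: a fixed breakdown into (role, subject) steps."""
    return [("research", goal), ("analysis", goal), ("writing", goal)]


def run(goal):
    """Delegate each step to its sub-agent concurrently; return results in plan order."""
    steps = plan(goal)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(SUB_AGENTS[role], subject) for role, subject in steps]
        return [future.result() for future in futures]


results = run("AI opportunities in third party administrators")
for line in results:
    print(line)
```

In a real system the steps would usually depend on one another (the writer needs the research), so the orchestrator would also track dependencies rather than run everything at once.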

The platform is currently invite-only, but I was granted access this week. One of the neat things about Manus is that when a task has been completed, you can share it for others to see in action. I asked Manus to complete an analysis of ‘AI Opportunities in Third Party Administrators’ - head on over to this replay to see both the process and the output. Fascinating!

Manus’s competitors include OpenAI’s Operator, which is powered by their Computer-Using Agent (CUA) and excels in interacting with digital interfaces like GUIs. Like Manus, Operator can autonomously perform web-based tasks such as filling forms or managing online orders while leveraging advanced reasoning to self-correct errors. Although it is less autonomous than Manus and requires occasional user intervention, its speed and reliability make it a strong contender. Similarly, Anthropic’s Claude offers a "Computer Use" feature that automates desktop tasks by mimicking human interactions like mouse clicks and text inputs. Its focus on safety and stability has made it more refined in controlled environments compared to Manus, though its scope is narrower.

The rise of autonomous AI agents like Manus signals a shift in how businesses approach automation and decision-making. By leveraging tools capable of independent operation, organisations can reduce costs, enhance productivity, and make data-driven decisions faster than ever before. For example, Manus’s ability to automate workflows such as customer support or supply chain management not only saves time but also improves accuracy. As businesses increasingly adopt these technologies, staying ahead of the curve by integrating autonomous agents can provide a significant competitive edge while freeing employees to focus on strategic tasks.

4. Llama 4 - a game changer for AI?

Meta's recent release of the Llama 4 series of AI models has set a new benchmark in AI capabilities, introducing groundbreaking features that could reshape how businesses and individuals interact with artificial intelligence.

The Llama 4 family includes two key models, Scout and Maverick, both designed with a Mixture-of-Experts (MoE) architecture and native multimodal capabilities. MoE is a neural network architecture that divides the model into specialised sub-networks, or "experts," each trained to handle specific parts of the input space. A gating mechanism dynamically routes inputs to the most relevant experts, enabling efficient computation by activating only a subset of the network for each task, which improves scalability and performance.
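For the curious, here is a toy sketch of the gating idea: score the experts, keep only the top few, and blend their outputs. It is a generic top-k routing illustration, not Meta's implementation. The four "experts" are simple functions and the gate scores are hard-coded, whereas in a real model both are learned and the scores are computed from the input itself.

```python
import math


def softmax(scores):
    """Convert gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) for the k best experts, weights renormalised to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    routes = top_k_route(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routes)


# Toy experts standing in for sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
# Hard-coded here; a learned gating network would produce these from the input.
gate_scores = [0.1, 2.0, 1.0, -3.0]

routes = top_k_route(gate_scores, 2)
print(routes)
print(moe_forward(3.0, experts, gate_scores, k=2))
```

Only two of the four experts run for this input, which is where the efficiency comes from: the total parameter count can be very large while the compute per token stays modest.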

Both new models can process text, images, and even video data seamlessly, making them ideal for complex tasks such as document analysis and multimodal reasoning. Notably, Scout supports an unprecedented ten-million-token context window, enabling it to handle vast amounts of data in a single query, while Maverick offers enhanced reasoning capabilities with its four hundred billion parameters. A further model, Behemoth, is still in training for future release; it is a two-trillion-parameter model with sixteen experts, and is Meta’s most powerful yet.

Llama 4 also emphasises multilingualism, supporting over 200 languages, and boasts significant efficiency improvements through techniques like FP8 precision. These advancements reduce computational costs while maintaining top-tier performance. For businesses, this means access to cutting-edge AI that can be self-hosted for greater control and compliance, particularly crucial for industries handling sensitive data.

Despite its technical achievements, the Llama 4 release has not been without controversy. Meta has faced allegations of "benchmark contamination," where critics claim the model was trained on datasets that included benchmark test questions. This practice could artificially inflate performance metrics. Additionally, Meta submitted a customised, non-public version of the model to LMArena - a popular benchmarking platform - leading to accusations of gaming the system to achieve higher rankings. While Meta has denied these claims, citing implementation stabilisation issues as the cause of discrepancies, the incident has sparked debates about transparency and fairness in AI benchmarking practices.

One of the stand-out features of Meta's Llama 4 is its claimed ten-million-token context window, available in the Scout model. This unprecedented capacity redefines what large language models (LLMs) can achieve, enabling them to process vast amounts of information in a single input. For perspective, a typical novel contains around 80,000 tokens - Scout’s context window could theoretically handle around 125 novels simultaneously. In practical terms, this means the model can analyse entire books, hours of video, or massive datasets without breaking them into smaller chunks, preserving context and coherence throughout the process.
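The novel comparison is simple division, and it is worth doing explicitly. Using the figures quoted in this issue (a ten-million-token window, roughly 80,000 tokens per novel, and the one-million-token window mentioned earlier for Google's Flash model), the sum works out as follows. The per-novel figure is a rough rule of thumb, so treat the results as order-of-magnitude estimates.

```python
TOKENS_PER_NOVEL = 80_000  # rough figure used in this issue


def novels_that_fit(context_window_tokens, tokens_per_novel=TOKENS_PER_NOVEL):
    """Whole novels that fit in a context window, ignoring prompt and output overhead."""
    return context_window_tokens // tokens_per_novel


scout_novels = novels_that_fit(10_000_000)  # Llama 4 Scout's claimed window
flash_novels = novels_that_fit(1_000_000)   # one-million-token window
print(scout_novels)  # 125
print(flash_novels)  # 12
```

In practice the prompt instructions and the model's own output also consume tokens, so the usable figure would be slightly lower.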

While the context window is impressive on paper, it has sparked some scepticism within the AI community. Critics have pointed out that Scout was not trained on prompts exceeding 256K tokens, raising questions about the real-world quality of outputs at such extreme lengths. Some experts argue that while the declared capacity is technically feasible, outputs beyond certain thresholds may suffer from reduced accuracy and relevance.

For enterprises, Llama 4’s extended context window, if proven effective, has the potential to be transformational. Businesses can leverage this capability to process large-scale datasets in one go - whether it’s synthesising insights across thousands of pages of reports or analysing entire codebases for software development. Moreover, Scout’s ability to maintain coherence across massive inputs could enhance customer service through long-term conversational agents that retain user history and preferences over time.

By adopting Llama 4 early, organisations can reduce infrastructure complexity by minimising reliance on retrieval-augmented generation (RAG) pipelines while benefiting from improved efficiency and personalisation. This technology positions businesses to stay ahead in an increasingly data-driven economy, unlocking new levels of productivity and innovation.

5. Consumer Copilot streaks ahead

As part of its 50th anniversary celebration, Microsoft has unveiled significant updates to its consumer-focused Copilot, positioning it as a truly personal AI companion. These enhancements aim to redefine the way users interact with technology, emphasising personalisation, proactive assistance, and seamless integration into daily life.

One of the standout updates is Memory and Personalisation, which allows Copilot to remember user-specific details such as preferences, important dates, and interests. This feature builds a dynamic user profile over time, enabling tailored solutions and proactive suggestions while maintaining strict privacy controls. Users can manage what Copilot remembers through a dedicated dashboard or opt out entirely. Additionally, Microsoft is exploring ways for users to personalise Copilot’s appearance and interaction style, making it a more engaging and unique experience.

[Image: the Copilot Deep Research screen, with the words "Expert level research."]

Another exciting development is Copilot Actions, which empowers users to delegate tasks such as booking reservations or purchasing gifts through simple chat prompts. This functionality integrates with major platforms like Booking.com, OpenTable, and Skyscanner, streamlining everyday tasks. Coupled with Copilot Vision, which brings real-time visual analysis to mobile and Windows devices, users can leverage their phone cameras or desktop environments for interactive experiences like plant care advice or office decoration tips. Other features include Pages for organising scattered thoughts into structured content, NotebookLM-like AI-generated Podcasts for personalised audio learning, and Deep Research tools that simplify complex tasks by synthesising information from multiple sources.

[Image: the Copilot podcast screen, with the words "Turn hours of scrolling into minutes of listening."]

You can install Microsoft Copilot (for personal use) from your device’s app store.

While these updates are tailored for consumers, they highlight the rapid pace at which Microsoft is advancing its AI capabilities. For businesses, this raises both opportunities and concerns. On one hand, the innovations in personalisation and task automation could inspire similar advancements in enterprise-focused solutions, potentially enhancing productivity and decision-making. However, there is growing concern that the consumer Copilot is outpacing its business counterpart in terms of features and functionality. Competitors in the enterprise AI space are already offering robust tools that rival Microsoft's business Copilot offerings.

POB’s closing thought(s)

I had a bit of driving to do this week, so I was able to catch up on some podcasts. I quite enjoy Intelligence Squared, and there’s an interesting episode where Reid Hoffman (LinkedIn co-founder, investor, AI expert and general serial entrepreneur) talks about his new book ‘Superagency: What Could Possibly Go Right with Our AI Future’. It’s worth a listen.

Last week in EAIW #7 I mentioned that Google CEO Sundar Pichai was soliciting feedback on whether or not Google should implement MCP (Model Context Protocol), the tool connectivity protocol for LLMs. It looks like they made their decision!

In other news, I noticed that one of my favourite tech companies, Cloudflare, acquired Outerbase, an ‘AI-powered platform [where] engineers, researchers, and analysts alike can work with any database - with safety and security guaranteed.’ I’m interested to see what they do with it.

Finally, xAI has announced an API for their Grok 3 language models. It’s not particularly competitively priced, and I am not expecting there to be huge uptake from businesses. They are clearly in the game now though.

Enjoy the rest of your week, and have a great weekend. 👍

Thanks for reading and I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. Remember, if you have any questions around AI at Davies, you can reply to this message to reach me directly or drop a note to the AI mailbox.

If you’re reading this for the first time, you can read previous posts at the Davies AI Substack page.

I have also created a Teams channel to discuss topics mentioned in this post, and AI in general, with your fellow readers, and of course me too. To join, use this link. I’ll also post the things that made my Pocket list, but didn’t make it to the post!

Finally, remember that while I may mention interesting new services in this post, you shouldn’t put business data in any web service or application without ensuring it has been approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
