Enterprise AI Weekly #26

Claude gains a 1m token context window, billionaires beef, AWS reduces hallucinations, AI is overtaking traditional search and Gemini finally gets memory

Aug 15, 2025

Welcome to Enterprise AI Weekly #26

You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.

Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.

We’re also working on something together. I’m building an app, Boring Expenses, in a Vibe Coding style, to demonstrate the process and to give us a test bed for technologies we talk about in future issues. I previously mentioned that I set aside a bit of time in my week for keeping up with tech and doing this sort of thing - usually a Sunday morning - so I’m setting aside an hour each week to progress our experiment.

If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.

Last week we talked about the much-anticipated release of GPT-5. The announcement was met with a mixed bag of reactions, starting with a bit of disappointment from many in the AI community and beyond. While the hype had been building for months, the actual announcement felt somewhat muted, with a livestream that some found underwhelming and the absence of flashy, breakthrough features. The excitement quickly gave way to a more critical tone as users and developers digested the news. Much of the early chatter centred around the decision to deprecate older ChatGPT models within the app, a move that sparked controversy (and forced OpenAI to reinstate GPT-4o). Many users felt caught off guard as familiar versions were removed, raising concerns about flexibility, usability, and cost implications. Sam Altman, CEO of OpenAI, reflected on the launch day challenges in his post on X.

Despite the initial letdown and debates, the broader appreciation of GPT-5’s capabilities soon took hold. Its ability to engage in deeper, on-demand reasoning and deliver more reliable, thoughtful responses without slowing down impressed many who took the time to explore the upgrade thoroughly. The new controls for verbosity and minimal reasoning proved useful for tailoring the AI’s behaviour, balancing efficiency with quality. However, the change also highlighted how disruptive AI progress can be, especially when familiar tools are suddenly phased out, affecting business workflows and personal usage alike. I guess we now start wondering when GPT-6 will arrive… 😀

You’ll notice that the newsletter doesn’t open with “Vibe with POB” any more. Don’t panic! I’ve broken it out separately, to avoid overwhelming the main newsletter. Enjoy EAIW #26!

1. Is context (window) King?

Anthropic has just announced a significant advancement in the capability of one of their Large Language Models: Claude Sonnet 4 can now process up to 1 million tokens of context on the Anthropic API. That’s roughly the size of a full complex codebase, the complete Harry Potter series, or dozens of lengthy research papers in a single prompt. This fivefold increase smashes through the previous 200,000 token ceiling, enabling whole-project analysis, mass document synthesis, and agents that maintain coherence over hundreds of interactions. API customers at Tier 4 or custom enterprise rates are the first to get access, with Amazon Bedrock already supporting the feature and Google Cloud’s Vertex AI soon to follow.

The expanded context window isn’t just about bigger prompts - it’s about transformative workflows. For developers, it means no more slicing and dicing codebases or contract bundles; entire systems can be reviewed at once, making cross-file dependencies and architecture-level recommendations possible in real time. Bolt.new (our current Vibe Coding platform) confirms that their code generation accuracy has improved on production workloads, and agentic use cases (which I’ve covered previously when discussing agentic coding and AI agent orchestration) are now supercharged. Context-aware agents can conduct uninterrupted, multi-day engineering sessions, synthesising vast tool documentation, workflow histories, and previously fragmented knowledge with ease.

This breakthrough puts Anthropic toe-to-toe with other AI giants. Google’s Gemini 2.5 Pro and Gemini 2.5 Flash already offer 1M+ context windows, and OpenAI has signalled million-token support in recent releases. However, early benchmarks show Claude Sonnet 4 outpaces Gemini on both response speed and reduction of hallucinations at scale. Pricing is competitive: prompts under 200,000 tokens cost $3 per million input tokens, rising to $6 for the full 1M window, with output tokens billed at $15 and $22.50 per million respectively. Enterprise customers can slash costs further with batch processing and prompt caching.
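To put those prices in context, here is a minimal Python sketch of a per-request cost estimate. It assumes the higher rates apply to the whole request once the prompt exceeds 200,000 tokens (my reading of the tiers, not something the figures above spell out), and it ignores batch and prompt-caching discounts. The function and constant names are made up for illustration.

```python
# Rates in USD per million tokens, taken from the figures quoted above.
STANDARD_RATES = {"input": 3.00, "output": 15.00}
LONG_CONTEXT_RATES = {"input": 6.00, "output": 22.50}

# Assumption: prompts above this size are billed entirely at the long-context rates.
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Example: a mid-sized prompt versus a near-full-window prompt.
for prompt_size in (150_000, 800_000):
    cost = estimate_request_cost(prompt_size, 2_000)
    print(f"{prompt_size:>9,} input tokens -> ${cost:.3f}")
```

The point of the comparison is that filling the window is not free: under these assumptions a single large prompt costs dollars rather than cents, so it is worth checking whether a task genuinely needs the whole corpus in context.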

Significantly, Claude Sonnet 4’s performance on the “needle in a haystack” test - designed to assess whether a model can accurately find rare, specific information buried within up to 1 million tokens of irrelevant data - is nothing short of exceptional. Anthropic reports an internal score of 100 percent on this evaluation, which is highly impressive given the test’s real-world relevance for enterprise workflows, such as locating key clauses in voluminous contracts or identifying specific code dependencies across enormous repositories. This marks a substantial advance over earlier generations, where models typically showed significant performance drop-off as context length increased.

Compared to leading rivals, Claude Sonnet 4’s needle-in-a-haystack results put it firmly in the top tier for long-context reasoning. Models like OpenAI’s GPT-4.1 and Google’s Gemini 2.5 Pro also tout million-token context windows, but recent benchmarks highlight degradation at ultra-long contexts for many competitors, with accuracy often dipping below optimal levels as input size balloons. Meta’s Llama 4 Maverick supports the same or larger context windows, yet the community has noted challenges with inference speed and precise retrieval when it comes to deeply buried information. In contrast, Claude Sonnet 4 maintains both speed and fidelity, consistently surfacing the target data in a vast sea of tokens without losing coherence or reliability.

This update is a key advancement for anyone managing vast data estates. Large context windows mean we can ingest, review, and automate analysis across full codebases, contract sets, and technical repositories in one shot, reducing both operational overhead and cognitive bottlenecks. It directly supports ambitions for agentic AI, allowing us to orchestrate highly capable, context-aware agents for compliance, engineering, and knowledge management. As these capabilities roll out across Anthropic, Google, and OpenAI platforms, the pressure is on to rethink how we structure, process, and secure our largest datasets - while unlocking true breadth and depth in enterprise AI applications.

2. The Altman / Musk beef continues

This week, the rivalry between Elon Musk and Sam Altman hit fresh heights on social media and beyond, putting the tech world’s most tempestuous duo back in the headlines. It all flared up after Musk took to his platform, X, at around 1 AM, to accuse Apple of “unequivocal antitrust violation” over its alleged favouritism for OpenAI’s ChatGPT in the App Store. Musk claimed Apple was making it impossible for any AI company other than OpenAI to reach number one - a move he said warranted immediate legal action. Enter Sam Altman, who responded within hours, calling the accusation “remarkable”, and cheekily referencing media reports that Musk himself had tweaked X’s algorithms to boost his own posts and diminish the reach of his competition.

To add a dash of digital drama, both ChatGPT and Grok (the AI chatbots championed by Altman and Musk, respectively) weighed in. In a twist, when users asked Grok “Who’s right, Sam Altman or Elon Musk?”, the bot backed Altman, citing evidence of Musk’s history of manipulating X’s ranking algorithms to his benefit. ChatGPT, meanwhile, sided with Musk when asked who’s more trustworthy, and Musk gleefully publicised the result to his many followers. Naturally, Musk was none too pleased to see his own bot back his rival - calling Grok’s response “defamatory” and promising to “fix” it. Altman’s response was breezy: suggesting Musk sign an affidavit swearing he’d never tampered with X’s algorithms. If this feels more soap opera than Silicon Valley, you’re not wrong.

Why don’t Musk and Altman get along? The source of the tension runs deep. Musk and Altman famously co-founded OpenAI in 2015, joined by other tech luminaries, with a mission to make artificial intelligence benefit humanity. OpenAI began as a nonprofit with a focus on research and safety. The pair initially appeared aligned, united by a desire to counteract what they saw as profit-driven approaches from giants like Google. But by 2018, the cracks showed. Musk wanted to run OpenAI himself after disagreements with the team, particularly Altman; when rebuffed, he pulled both his financial backing and his seat on the board. Notably, Musk later launched his own AI company, xAI, and has remained publicly critical of OpenAI’s subsequent commercial successes. Since then, their relationship has oscillated between professional rivalry and bitter public sparring. Musk has accused OpenAI - and Altman - of abandoning their founding principles, especially as OpenAI pivoted towards a “capped-profit” model and received significant investment from Microsoft. Meanwhile, Altman has made light of Musk’s complaints, poking fun at his rhetoric and even joking on X that OpenAI would buy Twitter if Musk fancied a swap.

For enterprise leaders, this spat is more than celebrity gossip. With OpenAI deeply embedded in Microsoft platforms - think Copilot, Azure and data analytics - Musk’s challenge to antitrust practices at Apple and elsewhere signals potential scrutiny and disruption for firms investing in AI solutions. More broadly, the rivalry underscores the importance of governance and transparency in AI. The accusations of algorithmic manipulation, commercial bias and legal drama illuminate why enterprises need robust, ethical frameworks for AI deployment. As these two tech titans battle it out, the rest of us should keep a close eye on the fallout, ensuring our own strategies benefit from the competition, not just the theatrics.

3. AWS: Fewer AI hallucinations and 99% verified accuracy

AWS has recently launched Automated Reasoning checks as a new feature within its Amazon Bedrock Guardrails, aiming to significantly reduce AI hallucinations by validating AI-generated content against established domain knowledge with up to 99% verification accuracy. Unlike traditional probabilistic approaches that assign likelihoods to outcomes, this technology employs mathematical logic and formal verification techniques to provide concrete assurance that the AI’s responses are accurate and align with strict rules encoded from domain-specific policies. This breakthrough capability comes with support for processing very large documents - up to 80,000 tokens, or approximately 100 pages - enhancing its applicability to complex, text-heavy environments.

The setup process involves translating natural language policies into formal logic through variables, rules, and types that capture the essential concepts and constraints of a given domain. Users can create, test, and refine these policies within the Amazon Bedrock console, benefiting from features like automated test scenario generation and policy validation tools to ensure the AI outputs stay consistent with the defined rules. This modular system integrates smoothly with diverse foundation models, including those from third parties like OpenAI and Google Gemini, and can be combined with other safeguard mechanisms such as content filtering and contextual grounding checks.
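To make the idea of variables, types, and rules more concrete, here is a toy Python sketch. It is not the Amazon Bedrock API and it is far simpler than real formal verification, which reasons over all possible cases rather than checking one set of values. The leave policy, variable names, and rules are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the rule is satisfied


# Hypothetical policy variables and their declared types.
VARIABLE_TYPES = {
    "tenure_months": int,
    "is_full_time": bool,
    "leave_approved": bool,
}

# Hypothetical rules, each written as "approval implies condition".
RULES = [
    Rule(
        "approval requires full-time status",
        lambda v: (not v["leave_approved"]) or v["is_full_time"],
    ),
    Rule(
        "approval requires 12+ months tenure",
        lambda v: (not v["leave_approved"]) or v["tenure_months"] >= 12,
    ),
]


def validate(claims: dict) -> list[str]:
    """Return the names of any rules that the claims violate."""
    for name, expected in VARIABLE_TYPES.items():
        # Strict type check so that a bool is not silently accepted as an int.
        if type(claims.get(name)) is not expected:
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return [rule.name for rule in RULES if not rule.check(claims)]


# Values that might be extracted from an AI-generated answer.
answer_claims = {"tenure_months": 6, "is_full_time": True, "leave_approved": True}
print(validate(answer_claims))
```

In this sketch an answer that approves leave for someone with six months of tenure is flagged against the tenure rule, which is the general shape of the check: the model's output is reduced to structured claims, and those claims either satisfy the encoded policy or they do not.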

A striking example of its practical value is seen in utility outage management, where timely and accurate responses are critical. Collaborating with PwC, AWS has demonstrated how Automated Reasoning checks can enforce compliance with regulatory protocols, validate real-time operational plans, and structure workflows with clearly defined targets. This not only enhances operational efficiency but also boosts the reliability and legal compliance of AI-assisted decision-making in regulated industries. For enterprises, deploying this technology offers a new standard of trust and control over AI systems, particularly essential in sectors where accuracy is non-negotiable.

This development is highly relevant to enterprises as it addresses a key challenge in AI adoption: ensuring that AI outputs are trustworthy and verifiable, thereby reducing risk while enabling automation at scale. With AWS offering this technology as part of their managed services in multiple regions, it positions us to enhance our AI systems’ governance and reliability, directly supporting more informed, compliant, and confident use of AI across our operations and client engagements. It is worth following AWS developments closely: incorporating verification tools like this could substantially improve how we deploy AI in critical, rule-heavy contexts where accuracy is paramount.

4. Will AI search overtake traditional search in 31 months?

Recent data from Siege Media indicates a significant shift is underway in how online traffic is sourced, highlighting AI-driven referral traffic - particularly from ChatGPT - as an emerging force to rival traditional organic search.

Over a two-month period ending June 2025, ChatGPT referral traffic surged by an impressive 25.6%, significantly outpacing the 5.2% growth seen in organic search traffic. While AI-driven traffic still accounts for a modest share, roughly 0.5% of organic search volume, its rapid growth trajectory suggests it is more than a fleeting trend. Projections indicate that, if current rates persist, ChatGPT referral traffic could surpass organic search in just 31 months, positioning AI as a primary channel for digital discovery much sooner than many marketers anticipate.
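The crossover arithmetic is easy to reproduce. The sketch below is my own back-of-the-envelope version, not Siege Media's method: it assumes both growth rates compound once per month and stay constant, and it starts from the rounded 0.5% share. The function name is hypothetical.

```python
import math


def months_to_crossover(current_share: float, ai_growth: float, organic_growth: float) -> float:
    """Periods until AI referral traffic equals organic search traffic.

    current_share  -- AI traffic as a fraction of organic traffic today
    ai_growth      -- AI traffic growth per period (0.256 means 25.6%)
    organic_growth -- organic traffic growth per period
    """
    if not 0 < current_share < 1:
        raise ValueError("current_share must be between 0 and 1")
    if ai_growth <= organic_growth:
        raise ValueError("AI traffic never catches up at these rates")
    # Each period the AI-to-organic ratio is multiplied by this factor.
    relative_growth = (1 + ai_growth) / (1 + organic_growth)
    # Solve current_share * relative_growth ** n == 1 for n.
    return math.log(1 / current_share) / math.log(relative_growth)


periods = months_to_crossover(0.005, 0.256, 0.052)
print(f"Crossover after roughly {periods:.1f} periods")
```

With these inputs the answer comes out at around 30 periods, in the same ballpark as the 31-month projection if each period is a month. If the growth figures actually describe a two-month window, the same maths would roughly double the timeline, so the headline number is very sensitive to how the rates are read and to whether they hold at all.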

This shift challenges the established SEO-centric marketing paradigm and suggests a hybrid strategy is essential for sustained success. Traditional SEO remains invaluable, delivering the lion’s share of traffic and retaining a foundational role in digital marketing strategies. However, with the rise of conversational AI, businesses must adapt by tailoring content not only for search result listings but also for direct answers within AI conversations. Crafting clear, authoritative content that responds naturally to user queries - and embedding schema markup to enhance AI understanding - will be key to capturing AI-driven traffic. This dual approach ensures brands remain visible across evolving digital touchpoints while capitalising on the increasing adoption of AI as a discovery tool.
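For anyone unfamiliar with schema markup, here is a minimal Python sketch that builds a schema.org FAQPage block as JSON-LD, one common structured-data format. The question and answer text are placeholders, the helper name is hypothetical, and how much weight any particular AI assistant gives to this markup is an open question rather than something this example demonstrates.

```python
import json


def build_faq_jsonld(faqs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage structure from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


# Placeholder content for illustration only.
markup = build_faq_jsonld(
    [("What does the service cover?", "A short, direct answer goes here.")]
)

# The JSON output would sit inside a <script type="application/ld+json"> tag on the page.
print(json.dumps(markup, indent=2))
```

The design idea is the same one the paragraph above describes: state the question and its answer explicitly, in a form a machine can parse without guessing at page layout.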

Moreover, the data-driven insights themselves present valuable opportunities beyond traffic acquisition. Using original research to inform digital PR and content marketing initiatives can elevate brand authority and attract high-quality backlinks and media attention. Siege Media’s approach exemplifies how turning quantitative data into compelling narratives can strengthen a business’s position in competitive markets. For enterprise organisations, staying ahead of this evolution means not only embracing AI as a referral medium but also leveraging data strategically to enhance brand reputation and market influence.

The rapid adoption of AI referral traffic poses both a challenge and an opportunity for enterprises. We need to evolve our content strategies to maintain visibility within AI-driven discovery while continuing to invest in traditional SEO. Embracing this hybrid approach will ensure our brand remains prominent in search and AI ecosystems, driving sustained, diversified traffic growth and reinforcing our industry leadership as conversational AI reshapes digital engagement. Additionally, leveraging data as a digital PR asset aligns with our strategic priorities to build authoritative, insight-driven narratives that enhance our corporate reputation in an increasingly data-conscious market.

5. Google (finally) adds memory to Gemini

Google's recent update to its Gemini app introduces foundational personalisation features, such as “personal context”, which allow the AI to learn from past conversations to provide more tailored and relevant responses over time. This enhancement means Gemini can remember user preferences and build on previous chats, making interactions feel more like a seamless collaboration with an assistant that is familiar with the user’s interests. New privacy controls, including Temporary Chats that do not influence future conversations or training models, accompany these features, giving users a greater say in their data usage.

Image: a side-by-side comparison of the Gemini "Personal context" settings on a mobile phone and a desktop browser, showing customisation options.

However, this move towards memory and personalisation arrives noticeably later than it did at some competitors, which have already integrated persistent memory capabilities into their large language models (LLMs). Many rival AI assistants have been building on memory features for some time, recognising that such capabilities are critical for deeper, more nuanced, and context-aware interactions. The delay in Gemini's rollout of these functions may impact user experience and adoption in the short term, as personalised AI assistance increasingly becomes a baseline expectation rather than a premium feature.

The importance of memory in the evolution of LLMs cannot be overstated. It enables AI systems to go beyond one-off task execution to truly adaptive, ongoing dialogue that can personalise workflows, anticipate user needs, and provide continuity across sessions. For enterprise businesses, this evolution suggests a future where AI acts not just as a tool but as an insightful partner in decision-making, creative processes, and customer engagement. As memory features mature, the next frontier will likely involve more sophisticated understanding of user context, preferences, and business environments - delivering proactive, tailored insights while maintaining rigorous privacy controls.

For enterprises, embracing AI platforms that integrate robust memory capabilities will be vital to harnessing the full potential of AI assistants. These tools will transform routine interactions into strategic advantage, enhancing productivity, innovation, and customer experience. Monitoring how Google’s Gemini progresses with these personalisation features - and how its approach to privacy balances trust and utility - will be key to selecting the most effective AI solutions for our enterprise in the coming years.

POB’s closing thoughts

As always, a couple of other topics caught my eye this week. This article at VentureBeat talks about “How a vibe working approach at Genspark tripled ARR growth and supported a barrage of new products and features in just weeks”. Interesting, and the Genspark product itself looks like it’s worth checking out too.

We’ve talked a fair bit in this newsletter about Mistral, the French AI startup that has some very capable open source models. I like them, I’m surprised they don’t get more airtime, and their Le Chat app is actually very good. Reuters is reporting this week that the company is looking to raise $1bn at a valuation of $10bn. AI is expensive, right?

Finally, we talked a few weeks back about the growing ecosystem of “Office 365 adjacent” AI companies. This week I came across Endex, who offer “An Excel-native AI Agent that accelerates financial modeling and data analysis”, and are backed with $14m of investment from OpenAI. One to check out: there’s a waitlist currently, but it looks super interesting.

Thanks for reading, I hope you have a great weekend! 👍

I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.

Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.

Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.
