Enterprise AI Weekly #33
Building an LLM in Minecraft, model updates from Claude, DeepSeek and Gemini, the Comet AI browser is available, Vibe Working arrives, Accenture retrain staff, OpenAI plot GDPval and workslop is here
Welcome to Enterprise AI Weekly #33
You’re reading the Enterprise AI Weekly Substack, published by me, Paul O'Brien, Group Chief AI Officer and Global Solutions CTO at Davies.
Enterprise AI Weekly is a short-ish, accessible read, covering AI topics relevant to businesses of all sizes. It aims to be an AI explainer, a route into goings-on in AI in the world at large, and a way to understand the potential impacts of those developments on your business.
We’re also working on something together. I’m building an app, Boring Expenses, in a Vibe Coding style, to demonstrate the process and to provide a test bed for technologies we talk about in future issues. I previously mentioned that I set aside a bit of time in my week for keeping up with tech and doing this sort of thing - usually a Sunday morning - so I’m setting aside an hour each week to progress our experiment.
If you’re reading this for the first time, you can read previous posts at the Enterprise AI Weekly Substack page. Enterprise AI Weekly is now available for anyone to sign up at https://enterpriseaiweekly.com! Please share the link and encourage others who might find it interesting to sign up.
That’s crafty!
I started last week’s issue with a fun little item, and here’s another. Someone has recreated a miniature ChatGPT-style language model entirely inside Minecraft, using only Redstone circuits. The build has over 5 million parameters, trained on simple English conversations, with six layers and a vocabulary of 1,920 tokens. It’s vast - spanning 1,020×260×1,656 blocks - and produces responses, though at a glacial pace of about two hours per answer unless tick rates are accelerated.
With a context window size of sixty-four tokens, the model supports very brief dialogues. The project’s creation relied on external mods like the Distant Horizons mod for visualisation and the Minecraft High Performance Redstone Server to boost computation speed. Accompanying materials include world download links and detailed documentation for enthusiasts eager to explore or replicate the build. This unusual convergence of gaming mechanics and AI computation challenges traditional ideas about AI hardware and provides a fascinating educational tool for understanding neural network operation from first principles.
On to the rest of the news. Enjoy EAIW #33!
1.0. Anthropic launches Claude 4.5 Sonnet
Anthropic has officially launched Claude 4.5 Sonnet, touting it as the most advanced iteration in its Claude family, aimed at powering legions of autonomous AI agents and excelling in coding tasks. This release combines extended autonomous operation - up to 30 hours! - with a suite of developer-focused features such as access to virtual machines, memory management, and durable multi-agent support, giving enterprise teams the ingredients to construct powerful, persistent AI agents. However, the price remains notably higher than many peer models, which may influence adoption strategy for cost-sensitive organisations.
Claude 4.5 Sonnet is engineered to outpace its predecessors in several areas. Benchmarks reveal significant improvements in navigation of software environments, browser automation, and prolonged multi-step workflow management. The model also brings state-of-the-art coding abilities, achieving a leading 77.2% on SWE-bench Verified and dominating OSWorld results for AI computer use. Anthropic has introduced advanced debugging tools, context editing, improved memory handling, and a native Visual Studio Code extension - all driven by enterprise customer feedback. Crucially, the 1M-token context window and new checkpoint function mean agents can reliably operate across far larger datasets and more complex workflows than before.
Independent benchmarking by Artificial Analysis places Claude 4.5 Sonnet as the #4 most intelligent model on the Artificial Analysis Intelligence Index in both non-reasoning (scoring 49) and reasoning (scoring 61) configurations. The model makes impressive domain-specific improvements, especially in agentic tool use, telecom, and graduate-level reasoning. Notably, reasoning mode brings significant intelligence gains over older Claude models. Despite these advancements, Artificial Analysis notes that the model’s output speed is slightly slower than average, though its 1.68-second first token latency and immense context window are industry-leading. Importantly, Sonnet 4.5 maintains the same pricing as its predecessor - $3 per million input tokens and $15 per million output tokens - but remains more expensive than many rivals, including Gemini 2.5 Pro and Grok 4 Fast.
For enterprise teams, the arrival of Claude 4.5 Sonnet signals a step up in building and operating agentic AI workloads, particularly in domains demanding deep reasoning, complex workflows, and reliable coding at scale. The expanded context, task management, and enhanced code tooling can unlock new automation and research use cases that were previously infeasible. However, the relatively high cost per token must be weighed against the competitive landscape, particularly as other providers drive down prices and offer increasingly comparable reasoning power. As the pressure on budgets continues, decision-makers will need to evaluate whether the performance gains of Claude 4.5 Sonnet justify the premium, or whether a hybrid approach using less expensive models for lighter workloads might maximise value.
1.1. DeepSeek’s V3.2 Experimental model slashes costs
DeepSeek AI’s latest release, V3.2-Exp, has flown in like a budget-friendly breath of fresh air for enterprise LLM deployments, slashing API costs while introducing technological advances that once again look set to shake up the sector. Where competitors such as Claude 4.5 Sonnet remain rightly celebrated for innovation, it’s cost structure and efficiency where DeepSeek steals the limelight.
Central to DeepSeek’s magic is its adoption of Sparse Attention - a clever reimagining of how transformers process token relationships. Rather than slavishly comparing each token to every other, Sparse Attention picks out pockets of relevance, forming direct connections only where the context truly matters. Imagine a conference call where only the right people are invited to each discussion (crazy, right), making everything faster, cheaper, and much less convoluted. That’s Sparse Attention’s premise. The result is vastly better handling of both long-form and multi-document contexts, delivered with efficiency that would give most CFOs something to smile about.
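To make the idea concrete, here is a minimal sketch of one common flavour of sparse attention, where each query attends only to its top-k highest-scoring keys. It is a generic illustration in plain Python, not DeepSeek's implementation: the function names and the choice of top-k selection are assumptions for the example. Note that this toy version still scores every key before choosing, so it shows the selection idea rather than the compute saving, which in practice depends on a cheaper selection step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scores.

    Hypothetical helper for illustration; inputs are lists of vectors.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Score every key, then keep only the k most relevant positions.
        scores = [dot(q, key) * scale for key in keys]
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        # Weighted sum over the selected values only.
        out = [0.0] * len(values[0])
        for w, i in zip(weights, keep):
            for d, v in enumerate(values[i]):
                out[d] += w * v
        outputs.append(out)
    return outputs
```

Setting k to the number of keys turns this back into ordinary dense attention, which is a handy way to compare the two behaviours on the same inputs.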
Pricing for the new model currently stands at $0.07 per million input tokens (using cached prompts) and $0.16 per million output tokens. Even off-cache, input tokens are billed at $0.56/million and output at $0.42/million. For comparison, Anthropic’s Claude 4.5 Sonnet clocks in at $3 per million input tokens and $15 per million output tokens - around 20 times higher for typical workflows. For those planning high-volume summarisation, document review, or automated reporting, these new DeepSeek rates can change the whole economic equation.
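The size of the gap depends on the mix of input and output tokens, so it is worth running the numbers for your own workload. The sketch below uses only the per-million-token prices quoted above (the off-cache DeepSeek figures); the dictionary keys, function names and the example workload are assumptions for illustration.

```python
# Prices in US dollars per million tokens, as quoted in the paragraph above.
PRICES = {
    "claude-4.5-sonnet": {"input": 3.00, "output": 15.00},
    "deepseek-v3.2-exp-off-cache": {"input": 0.56, "output": 0.42},
}


def job_cost(model, input_tokens, output_tokens):
    """Cost in dollars of one job for the given model label."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cost_ratio(input_tokens, output_tokens):
    """How many times more the Claude job costs than the DeepSeek job."""
    claude = job_cost("claude-4.5-sonnet", input_tokens, output_tokens)
    deepseek = job_cost("deepseek-v3.2-exp-off-cache", input_tokens, output_tokens)
    return claude / deepseek


# Hypothetical workload: one million tokens in, one million tokens out.
ratio = cost_ratio(1_000_000, 1_000_000)
```

With one million tokens each way, the ratio comes out at roughly 18 times. Input-heavy jobs narrow the gap and output-heavy jobs widen it, because the output prices differ far more than the input prices.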
DeepSeek’s relentless quest to lower costs and open up its infrastructural advances is likely to ripple across the AI sector. The company’s model releases routinely inspire both admiration and some nervous arm-crossing from competitors and cloud service providers alike. Technologically, the adoption and open-sourcing of Sparse Attention isn’t just a cost-cutting move - it’s a community offer that may inform future approaches across both open and proprietary models. As more enterprises see these options as viable, the sector is forced to respond with greater transparency, smarter pricing, and, ideally, more sustainable AI operations worldwide.
With high-volume language models continuing to permeate workflows, cost remains pivotal in scaling deployments. DeepSeek’s leadership in efficient model design and aggressive price points could let enterprise teams trial, pilot or operationalise advanced AI tools without triggering alarm bells from finance.
1.2. Google updates its Gemini 2.5 Flash models
Google has updated its Gemini 2.5 Flash and Flash-Lite models, bringing several worthwhile advancements despite retaining the same version number. Gemini 2.5 Flash-Lite now excels at understanding and following complex instructions, delivering more concise output, and improving multimodal capabilities such as audio transcription, image understanding, and translation. Output token counts have been cut in half, resulting in better cost efficiency and throughput for high-volume tasks like summarisation and real-time transcription. This makes Flash-Lite especially useful for batch processing and cost-sensitive pipelines, as it delivers much faster and less verbose results than before.
Gemini 2.5 Flash, meanwhile, has improved massively in agentic and tool-based workflows, which is crucial for orchestrating multi-step business processes. Notably, the model’s “thinking” capability now exposes intermediate reasoning steps, allowing for more transparent outputs and better multi-stage reasoning. Flash now handles larger context windows - with support for up to 32,768 tokens in some preview variants - and accepts a broad range of media including text, code, images, audio, and video. The agentic upgrades mean a 5% gain on benchmark tasks such as SWE-Bench Verified, and Flash’s structured outputs reduce operational friction in developer and enterprise flows.
Both models now feature a ‘-latest’ alias, so teams can always access the newest version with minimal code changes and proactive two-week update notices. For production workflows, the recommendation remains to stick with the stable model IDs until preview versions are validated. The model suite’s audio capabilities in the preview are also notably improved: higher-quality voice output, better affective dialogue (emotional nuance in responses), and reliable function calling for custom business logic. These upgrades transform Gemini into a better fit for scalable, intelligent enterprise automation, allowing end users to effortlessly multitask between languages, media, and conversational contexts.
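One simple way to follow that recommendation is to pin production to a stable model ID and let other environments track the alias. The sketch below is an illustration only: the model identifiers are assumed examples that should be checked against Google's current model list, and the `APP_ENV` variable and `choose_model` function are hypothetical names.

```python
import os

# Assumed identifiers for illustration; verify against Google's model list.
STABLE_MODEL = "gemini-2.5-flash"
LATEST_ALIAS = "gemini-flash-latest"


def choose_model(environment):
    """Pin production to the stable ID; let other environments track the alias."""
    if environment == "production":
        return STABLE_MODEL
    return LATEST_ALIAS


# APP_ENV is a hypothetical environment variable; it defaults to development.
model_id = choose_model(os.environ.get("APP_ENV", "development"))
```

Keeping the choice in one place means a newly validated preview can be promoted to production by changing a single constant.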
In even more exciting news, Gemini 3.0 has started making appearances on LM Arena, identified under the codename “Oceanstone” in some leaderboard rankings, where it’s already outpacing many competitors in preference and capability. Additionally, an A/B test system for Gemini 3.0 Pro and Flash has been rolled out within Google AI Studio. This feature allows randomly selected users to compare model output between Gemini 2.5 and early 3.0 checkpoints, often showing significant improvements in reasoning, code generation, and SVG/image output - all accessible with just a few reruns or clever prompt engineering.
As Gemini 3.0 reaches these external evaluation platforms and quietly enters limited public preview, its advanced capacity for real-time video understanding, 3D object handling, and multi-million-token context windows is closer to general release. Not only does it bring a unified approach to multimodal processing, but early evidence points to enhanced autonomous planning and deeper integration with device and system control. Businesses and developers can anticipate an expanded toolkit for document analysis, creative workflows, and agentic automation.
Gemini’s ongoing progress is hard to ignore, with each release being adopted faster and more widely across the enterprise landscape. Gemini’s robust feature set makes it increasingly appealing for organisations seeking dependable and scalable AI. Adoption figures back this up: nearly half of US enterprises now have Gemini powering parts of their productivity workflows, reporting tangible efficiency gains and reductions in workflow bottlenecks.
2. Perplexity’s Comet browser is now available to everyone
Perplexity AI has released its AI-enhanced web browser, Comet, for free to all users worldwide. Initially launched in limited release in July 2025 to Perplexity Max subscribers paying $200 per month, Comet proved highly popular, with millions joining the waitlist. The browser features a unique AI assistant integrated into every new tab, designed to act as a personal aide that helps with tasks such as web search, tab management, email drafting, online shopping, and more. Unlike traditional browsers, Comet’s assistant travels with the user, seamlessly blending AI capabilities into the browsing experience to increase productivity and reduce the need to juggle multiple tabs or tools.
The Comet browser is built on Chromium and incorporates Perplexity’s AI search engine as its default search option, providing direct answers with web source links, enhanced by AI-driven summarisation and task automation features. Key highlights include deep integration with Gmail and Google Calendar, multi-LLM (Large Language Model) support including GPT, Claude, Gemini, and others, AI-driven tab organisation, and smart shortcuts for automating multi-step commands by voice or text. The assistant is accessible via a sidebar in each new tab and is designed to navigate websites, perform research, and manage tasks like a virtual personal assistant.
With the recent move to make Comet free, Perplexity aims to attract a larger user base and compete more directly with major players such as Google Chrome (which has integrated its own Gemini AI), Anthropic’s AI browser initiatives, and OpenAI’s Operator. Perplexity also offers Comet Plus, a news and content service, and is planning further features including a mobile version and a Background Assistant for asynchronous multitasking. This release marks a step forward in AI-driven browsing by embedding intelligent assistance naturally into the browsing workflow, promising enterprises and individual users alike a streamlined, AI-powered internet experience.
This development holds relevance for businesses looking to enhance digital workflows and productivity through integrated AI tools, while also presenting competition and innovation pressures to established browser and search engine providers.
While the Comet browser offers compelling AI-driven productivity gains, organisations must carefully weigh privacy implications. The browser’s embedded AI assistant interacts extensively with personal and business data, including browsing activity, emails, calendars, and documents, to deliver tailored assistance. This raises potential risks concerning sensitive data exposure or inadvertent sharing of confidential information with external servers, even if encrypted and processed under privacy policies. Enterprises need to ensure that using Comet aligns with their data governance frameworks and compliance requirements.
Prompt injection attacks and other AI-specific vulnerabilities add further complexity to securing browser-based AI tools. Malicious actors can craft webpage elements or embedded metadata to trick the AI assistant into performing harmful actions or leaking data. While Perplexity and other AI browser developers implement layered protections such as action confirmations and suspicious pattern detection, no defence is foolproof. Consequently, IT and security teams must remain vigilant, enforce strict access permissions, and conduct regular threat assessments to mitigate risks.
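As a rough illustration of the two protections mentioned above, the sketch below combines a naive suspicious-pattern check with a confirmation gate for sensitive actions. It is not how Perplexity or any other vendor implements its defences: the patterns, action names and function names are all hypothetical, and a simple pattern list like this is easy to evade.

```python
import re

# Hypothetical phrases that often signal an injected instruction; not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) (system prompt|password|credentials)", re.IGNORECASE),
    re.compile(r"send .* to https?://", re.IGNORECASE),
]

# Hypothetical action names that should never run without the user's approval.
SENSITIVE_ACTIONS = {"send_email", "submit_form", "make_purchase"}


def looks_suspicious(page_text):
    """Return True if the page text matches any known injection pattern."""
    return any(pattern.search(page_text) for pattern in SUSPICIOUS_PATTERNS)


def needs_confirmation(action, page_text):
    """Require explicit user confirmation for sensitive actions or flagged pages."""
    return action in SENSITIVE_ACTIONS or looks_suspicious(page_text)
```

The design choice worth noting is that the gate errs towards asking: a sensitive action always needs confirmation, even when the page itself looks clean.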
In summary, while Comet presents an innovative browsing experience with AI assistance, organisations must carefully manage privacy, security, and compliance considerations to safely integrate such tools within their environments. This ensures that efficiency gains do not come at the cost of data risks or regulatory breaches.
3. Forget Vibe Coding, welcome to Vibe Working!
Microsoft has introduced a new paradigm in productivity called “Vibe Working” with the rollout of Agent Mode and Office Agent features in Microsoft 365 Copilot. Vibe Working aims to transform how people collaborate with AI, making interaction with Office apps more conversational, iterative, and intelligent. The aim is to shift from static task execution to a dynamic, ongoing dialogue where users and AI agents work in sync to craft, analyse and perfect work products like spreadsheets, documents, and presentations.
Agent Mode, debuting in Excel and Word, exemplifies Vibe Working by enabling an AI assistant that “speaks Excel” and Word natively. In Excel, it goes beyond formula generation to perform multi-step reasoning, evaluation, and visualisation - effectively acting as an expert analyst guided by natural language prompts. This democratises the power of Excel, allowing users of varying expertise to produce sophisticated financial models, loan calculators, or budgeting tools without needing deep Excel skills. Meanwhile, Agent Mode in Word transforms document drafting into an interactive, vibe writing process - users engage Copilot as a writing partner, iterating on content and style seamlessly.
Office Agent extends this concept into the chat experience, enabling users to create polished PowerPoint presentations and Word reports purely from conversation. By clarifying intent, conducting deep research, and delivering structured, high-quality output, Office Agent fosters a fluid handoff from chat to traditional Office applications. Together, these capabilities encapsulate Vibe Working - a new pattern of productivity that leverages advanced reasoning models to enhance human-agent collaboration, accelerating workflows while making complexity approachable.
For enterprise organisations, Vibe Working represents an opportunity to unlock AI’s value across everyday tasks, reducing dependence on experts for data analysis or content creation and empowering more employees to contribute efficiently. As these features continue to roll out across Microsoft 365, they affirm a vision of AI as an embedded productivity partner, reshaping the way work gets done, and strengthen Microsoft’s case for issuing Copilot licences within an organisation.
4. Accenture to exit staff unable to retrain for AI era
Global professional services giant Accenture has announced a major workforce restructuring tied to the adoption of artificial intelligence. The company is parting ways with employees who cannot be reskilled for AI-related roles under an $865 million programme focused on upskilling and business optimisation. This move comes as AI becomes central to the company’s operations and client services.
Accenture CEO Julie Sweet explained in a recent earnings call that the firm’s strategy prioritises “reinventors” - staff who can retrain and apply AI skills. Those deemed unable to reskill “on a compressed timeline” face exit from the organisation. Over the last three months, the company has laid off more than 11,000 employees globally, reducing its headcount from 791,000 to 779,000 at the end of August 2025. Despite these reductions, Accenture plans to grow its pool of AI and data professionals, who now number 77,000, nearly double the 2023 figure. Sweet emphasised that the restructuring aims to create investment capacity through efficiency gains and reinvest in AI talent and the business.
The company has actively retrained some 550,000 employees in generative AI basics as part of its broader strategy to embrace AI across all operations. Financially, this shift has driven a tripling of revenue related to generative AI, reaching $2.7 billion in FY 2025, with bookings nearly doubling to $5.9 billion. Accenture forecasts business optimisation savings exceeding $1 billion, with severance and related charges expected to total over $800 million across two fiscal quarters. Despite a temporary slowdown in growth due to reduced US government contracts, Accenture continues to invest in expanding its headcount in key markets including the US and Europe, reflecting ongoing strong demand for AI expertise.
This clear message from Accenture underlines the accelerating influence of AI in enterprise IT consulting, and is a reminder of the importance of proactive employee reskilling to thrive in an AI-driven business landscape.
5. OpenAI introduces the GDPval evaluation framework
GDPval is a new evaluation framework introduced by OpenAI designed to measure AI model performance on economically valuable, real-world tasks across forty-four occupations in nine industries that significantly contribute to the US GDP. Unlike traditional academic or artificial benchmarks, GDPval assesses AI capability on tasks that mirror actual professional deliverables, such as legal briefs, engineering blueprints, customer support conversations, or nursing care plans, created by experienced professionals averaging 14 years in their fields. This realism and diversity set GDPval apart as an impactful tool for understanding how AI can assist in real workplace settings.
The evaluation spans 1,320 specialised tasks with a gold open-source subset of 220 tasks. Tasks are graded by human occupational experts who compare model-generated deliverables blindly to those crafted by professionals, ranking them as better, equal, or worse. Early results have shown that leading AI models like GPT-5 and Claude Opus 4.1 approach or match expert quality on a substantial fraction of tasks, delivering results 100 times faster and 100 times cheaper than human experts in these professional roles. Model performance has improved significantly over time, showing clear linear progress with successive versions.
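The grading scheme described above boils down to counting verdicts. Here is a minimal sketch of how blind better/equal/worse verdicts can be summarised into a win rate and a win-or-tie rate; the function name and the sample verdicts are made up purely to show the arithmetic and are not GDPval results or OpenAI's code.

```python
from collections import Counter

ALLOWED_VERDICTS = {"better", "equal", "worse"}


def summarise_grades(grades):
    """Turn blind pairwise verdicts into win and win-or-tie rates."""
    if not grades:
        raise ValueError("at least one verdict is required")
    unknown = set(grades) - ALLOWED_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    counts = Counter(grades)
    total = len(grades)
    return {
        "win_rate": counts["better"] / total,
        "win_or_tie_rate": (counts["better"] + counts["equal"]) / total,
    }


# Made-up verdicts for ten tasks, purely to show the arithmetic.
example = ["better"] * 3 + ["equal"] * 2 + ["worse"] * 5
summary = summarise_grades(example)
```

For the made-up example, the model output is preferred on three tasks in ten and is at least as good as the expert's on five in ten.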
GDPval aims to provide organisations with a clear, evidence-based yardstick for AI adoption by linking model performance directly to outputs that drive economic value. This allows business leaders to move beyond speculative AI discussions to concrete strategies for automation and augmentation of knowledge work. The framework highlights specific occupations and tasks where AI can reliably take over routine work, freeing experts to focus on more judgement-intensive activities and driving productivity and economic growth. In this way, GDPval is a crucial step in making AI a practical tool for business and workforce optimisation.
GDPval’s evidence-based approach helps enterprises quantify the cost savings and efficiency gains achievable through AI, supporting informed investment decisions. By benchmarking AI on actual professional tasks across diverse industries, it offers a trustworthy preview of how AI can integrate into the workplace, improving operational workflows without compromising output quality. For large companies and enterprises navigating AI adoption, GDPval’s insights provide a practical foundation to build phased implementation plans prioritising impactful yet low-risk AI use cases. This aligns well with balanced AI strategies focused on augmenting human expertise while realising measurable ROI.
POB’s closing thoughts
I came across an interesting article from Microsoft this week about establishing an AI Centre of Excellence, as we have at Davies. The article is well worth a read and introduces the AI Center of Excellence (AI CoE) as an internal team within an organisation focused on driving successful AI adoption and outcomes. It provides a solid foundation and governance to prevent fragmented AI initiatives. Establishing an AI CoE involves securing executive sponsorship, appointing a skilled leader, and assembling a multidisciplinary team comprising business leaders, AI experts, data scientists, and governance specialists. The CoE can either integrate into existing teams such as a Cloud Center of Excellence or operate as a standalone team if necessary. Initially, a centralised operating model is recommended to consolidate AI expertise and accelerate adoption, evolving later into an advisory role to support product teams and promote agile implementation.
Let’s wrap up this week with an article from Harvard Business Review, which discusses whether “AI-Generated ‘Workslop’ Is Destroying Productivity”. Are “Employees using AI tools to create low-effort, passable looking work that ends up creating more work for their coworkers”? Discuss! 😀
Thanks for reading, I hope you have a great weekend! 👍
I’d love to hear your feedback on whether you enjoy reading the Substack, find it useful, or if you would like to see something different in a future post. What AI topics are you most interested in for future explainers? Are there any specific AI tools or developments you'd like to see covered? Remember, if you have any questions around this Substack, AI or how Davies can help your business, you can reply to this message to reach me directly.
Finally, remember that while I may mention interesting new services in this post, you shouldn’t upload or enter business data into any external web service or application without ensuring it has been explicitly approved for use.
Disclaimer: The views and opinions expressed in this post are my own and do not necessarily reflect those of my employer.