Everything That's New in VMware Cloud Foundation 9.1
Broadcom called 9.0 the architectural reset. They're calling 9.1 the optimization layer. After spending time with the release notes, the official blogs, and the hands-on labs, here's my take on what actually matters — and what's just marketing fluff.
I'll be honest — VCF 9.1 isn't a version that'll blow you away with brand-new capabilities. It's a version that makes everything you already have work a lot better. Think of it as Broadcom tightening every bolt, doubling every ceiling, and removing a handful of the most painful operational friction points. And honestly? That's exactly what the platform needed right now.
Compute Gets Bigger, Faster, and Smarter
Let's start with the number that will matter most to large enterprises: VCF 9.1 now supports up to 5,000 ESX hosts per environment — double what was possible before. For most organizations, that's a theoretical ceiling they'll never hit. But for large service providers and government clouds running massive shared platforms, this matters a great deal.
Equally important is what Broadcom did to lifecycle management. You can now run 256 simultaneous cluster upgrades instead of 64. That's a 4× improvement. If you're managing hundreds of workload domains, the difference between a 48-hour upgrade window and a 12-hour one is enormous — both for operations teams and for the business units depending on that infrastructure.
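To see where a jump like 48 hours to 12 hours could come from, here's a back-of-the-envelope model. It's a sketch, not Broadcom's scheduler: I'm assuming upgrades run in equal-length waves, and the cluster count and hours per wave are made-up illustration values.

```python
import math

def upgrade_window_hours(clusters: int, max_parallel: int, hours_per_wave: float) -> float:
    """Estimate the total window if clusters are upgraded in equal-length waves."""
    waves = math.ceil(clusters / max_parallel)
    return waves * hours_per_wave

# Hypothetical estate (assumption): 1,024 clusters, each wave takes 3 hours.
CLUSTERS = 1024
HOURS_PER_WAVE = 3.0

before = upgrade_window_hours(CLUSTERS, 64, HOURS_PER_WAVE)   # old concurrency limit
after = upgrade_window_hours(CLUSTERS, 256, HOURS_PER_WAVE)   # new concurrency limit
print(f"64 parallel:  {before:.0f} h")
print(f"256 parallel: {after:.0f} h")
```

Note that the 4× gain only shows up when you have more clusters than the old limit. With 64 clusters or fewer, both limits fit everything into a single wave.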
NVMe Memory Tiering — The Real Deal
This is the feature I'd push the most if I were advising a team running memory-hungry workloads. The idea is simple: hot memory pages stay in fast DRAM, colder pages spill over to NVMe. From the application's perspective, nothing changes — it still sees a single contiguous memory address space. But the effective memory per host goes up significantly, which means you can consolidate more VMs without buying more DIMMs. Broadcom's own estimates put the TCO savings at around 40%. Even if the real number is half that in production, it's worth evaluating seriously.
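If the hot/cold split feels abstract, a toy model helps. The sketch below places the most frequently touched pages in a limited number of DRAM slots and sends the rest to NVMe. Frequency-based placement is my assumption for illustration; the actual ESX policy isn't described in the material I reviewed, and the access trace is invented.

```python
from collections import Counter

def place_pages(access_log: list[int], dram_slots: int) -> tuple[set[int], set[int]]:
    """Put the most frequently touched pages in DRAM, the rest on NVMe."""
    counts = Counter(access_log)
    # most_common() returns pages ordered by access count, hottest first
    hot = {page for page, _ in counts.most_common(dram_slots)}
    cold = set(counts) - hot
    return hot, cold

# Hypothetical access trace: pages 1 and 2 are hot, pages 3 to 5 are rarely touched.
trace = [1, 2, 1, 2, 1, 3, 2, 1, 4, 2, 5, 1]
dram, nvme = place_pages(trace, dram_slots=2)
print("DRAM:", sorted(dram))
print("NVMe:", sorted(nvme))
```

The application only ever sees page numbers, never the tier, which is the point: the placement decision lives entirely below the guest.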
vMotion Encryption Gets a Hardware Boost
Encrypting live migrations has always made teams nervous because of the CPU overhead. VCF 9.1 addresses this by offloading the AES encryption work to Intel QuickAssist Technology (QAT) co-processors when available. The result, per Broadcom testing, is roughly 70% lower CPU utilization during vMotions. That means your hosts return to steady state faster after a migration and your VMs barely notice they moved.
Intel QAT offload requires hosts with compatible QAT hardware. Check your server BOM before expecting the full 70% savings. On older hosts without QAT, encrypted vMotion still works — it just runs on the CPU as before.
Elastic Provisioning — Goodbye Manual Imaging
Anyone who's ever had to image a rack of new hosts knows the pain: boot media, PXE configs, waiting, checking, waiting some more. Elastic Provisioning in 9.1 automates the entire process — parallel imaging, automatic discovery, and consistent golden image application. Add a host to the fabric and VCF takes care of the rest. It's the kind of quality-of-life improvement that won't show up in any benchmark but will save your operations team hours every single quarter.
5,000-Host Ceiling
Double the previous limit. Large service providers and government clouds now have room to grow without splitting management domains.
4× Parallel Upgrades
256 simultaneous cluster upgrades versus 64 before. A maintenance window that took a weekend can now happen overnight.
Smarter Memory
NVMe memory tiering extends effective DRAM capacity transparently. Higher VM density, lower hardware spend.
Ubuntu, Finally Native
Canonical Ubuntu 24.04 LTS images auto-sync into VCF Automation. No more hunting for images or manual uploads. Included in the base license.
Storage Costs Less, Recovers Faster
Storage is where VCF 9.1 arguably delivers its most tangible financial impact. Between global deduplication, enhanced compression, and NVMe tiering working together, Broadcom is claiming up to 39% lower storage TCO and up to 42% reduction in server costs for customers running vSAN ESA. I'd caveat those numbers with "your mileage will vary depending on data type" — dedupe ratios on encrypted or already-compressed data will be lower. But on general-purpose VM workloads, these are realistic targets.
Global Deduplication — Actually Global This Time
Previous versions of vSAN deduplicated within a disk group. Global dedup in 9.1 works across the entire cluster, meaning identical blocks on different hosts get eliminated too. It runs continuously in the background and — this is the detail that matters for regulated environments — it works on encrypted datastores. You don't have to choose between security and storage efficiency anymore.
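Here's a small sketch of why dedup scope matters. It counts stored blocks when each host is deduplicated on its own versus when one index covers the whole cluster. The SHA-256 fingerprints, the in-memory set, and the block contents are all assumptions for illustration, not how vSAN implements it.

```python
import hashlib

def unique_blocks(blocks: list[bytes]) -> int:
    """Count distinct blocks by content hash."""
    return len({hashlib.sha256(b).hexdigest() for b in blocks})

# Hypothetical cluster: two hosts, each holding three blocks.
host_a = [b"os-image", b"os-image", b"app-data-1"]
host_b = [b"os-image", b"app-data-2", b"app-data-2"]

# Scoped dedup: each host (standing in for a disk group) is deduplicated on its own.
scoped = unique_blocks(host_a) + unique_blocks(host_b)
# Global dedup: one hash index for the whole cluster.
global_ = unique_blocks(host_a + host_b)

print("blocks stored, scoped:", scoped)
print("blocks stored, global:", global_)
```

The copy of the OS image that lives on both hosts is what the scoped approach can't see. In a real cluster with hundreds of near-identical VMs, that shared content is where most of the savings would come from.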
Native Object Storage — Worth Watching
This one is marked tech preview, so temper expectations accordingly. But the concept is genuinely interesting: an S3-compatible object storage endpoint served natively from vSAN, with no external platform required. Developers who need object storage for their apps can consume it from the same infrastructure their VMs run on. When it hits GA, this could meaningfully simplify a lot of architectures.
Native Object Storage is tech preview in 9.1.x. Don't put tier-1 production workloads on it yet. Use it for testing, dev/test environments, and to start planning your architecture. Expect GA in a future 9.x release.
vSAN for Recovery — DR and Ransomware in One
Recovery got a major boost. vSAN now supports deep snapshot chains with integrated replication that serves both disaster recovery and ransomware recovery workflows. The combination with CrowdStrike EDR (more on that in the security section) means you can recover into a clean-room environment, scan the workload before it touches production networks, and only return it once it's confirmed clean. That's a genuinely mature recovery story.
Networking Gets Simpler and Opens Up
Networking in VCF has historically been the area where complexity creeps in. NSX is powerful but not simple, and requiring dedicated Edge Clusters just to route between VPCs and the physical network has been a pain point for a while. VCF 9.1 directly addresses this with the Distributed Transit Gateway — and it also does something I didn't expect: opens up NSX to peer with third-party physical fabrics via EVPN.
The Distributed Transit Gateway Changes the Game
Previously, every time you wanted to connect NSX VPCs to the physical network, you needed a dedicated Edge Cluster: extra VMs, extra overhead, extra complexity. The DTGW removes this requirement entirely. Each ESX host connects directly to the switch fabric using just a VLAN ID. No Edge VMs sitting in the middle. The result is lower latency, simpler architecture diagrams, and, for legacy environments, the ability to bridge existing VLANs into NSX VPCs without a complex migration project.
EVPN Interoperability — NSX Plays Nice with the World
This is a meaningful shift in positioning. For years, NSX required you to commit fully to the NSX underlay story. VCF 9.1 adds EVPN-based BGP peering with Arista Networks, Cisco, and SONiC. If your physical fabric is Arista and you've been hesitant to invest in VCF because of networking concerns, that objection is now significantly weaker.
vDefend micro-segmentation rules are automatically applied when workloads are deployed. New VMs and containers don't start with open network access — they start with deny-by-default and get rules applied from the platform. This is the right default posture, and it's great to see it become automatic rather than opt-in.
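Deny-by-default is easy to say and easy to get subtly wrong, so here's the logic in its simplest form. This is a generic sketch, not the vDefend rule engine: the `Rule` shape, the tag names, and the ports are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    src_tag: str
    dst_tag: str
    port: int

def is_allowed(rules: list[Rule], src_tag: str, dst_tag: str, port: int) -> bool:
    """Allow only traffic that matches an explicit rule; everything else is denied."""
    return Rule(src_tag, dst_tag, port) in rules

# Hypothetical policy: the web tier may reach the app tier on 8443, nothing else.
policy = [Rule("web", "app", 8443)]

print(is_allowed(policy, "web", "app", 8443))  # explicit allow
print(is_allowed(policy, "web", "db", 5432))   # no matching rule, so denied
print(is_allowed([], "web", "app", 8443))      # brand-new workload, no rules yet
```

The third call is the one that matters: a workload with no rules attached gets no access, rather than all of it.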
Kubernetes at the Scale Enterprises Actually Need
"64% of platform engineers identify Kubernetes as a primary focus area for achieving automated,
reliable, and standardized application deployment."
— 2025 State of Platform Engineering Report
The VKS story in 9.1 is fundamentally about scale and operational velocity. The previous Supervisor architecture had scaling bottlenecks that started showing up around 50–100 clusters. Broadcom has re-engineered the control plane from scratch, and the result is a 500-cluster-per-Supervisor ceiling that they're quite confident about. More importantly, provisioning a new cluster drops from 37 minutes to 11 minutes — and upgrade time drops from nearly 7 hours to under 2 hours. At 50 clusters, that math is significant. At 500, it's transformational.
Container-as-a-Service — Kubernetes Without the Kubernetes
Not every team needs a full Kubernetes cluster. Sometimes you just want to run a container. VCF 9.1's new Container Service executes containers directly on ESX — no cluster overhead, no YAML, no kubectl. The entire lifecycle (deploy, monitor, upgrade, delete) is handled through the VCF Automation UI. And when your app grows beyond what CaaS can offer, the platform generates the VKS YAML for you automatically. It's a genuinely sensible on-ramp.
Multi-NIC Nodes for Serious Workloads
Worker nodes can now launch with multiple vNICs, each carrying a different traffic type. Application traffic, storage I/O, and management packets are isolated at the node level. For financial services workloads where latency variance matters, or for AI training jobs that saturate storage bandwidth, this separation is more than cosmetic — it prevents noisy-neighbor issues that are hard to debug in a shared environment.
If you manage 100 clusters and each upgrade previously took 6.9 hours, that's 690 hours of upgrade windows per cycle. At 1.7 hours each, that drops to 170 hours. That's 520 hours — roughly 22 days — given back to your team every single release cycle. That number gets bigger as your cluster count grows toward 500.
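If you want to plug in your own cluster count, the arithmetic is a one-liner. This sketch uses the 6.9-hour and 1.7-hour figures quoted above and, like the example, simply adds up per-cluster hours without modelling parallel upgrades.

```python
def upgrade_hours(clusters: int, hours_each: float) -> float:
    """Total upgrade hours if per-cluster upgrade times are simply added up."""
    return clusters * hours_each

for clusters in (100, 500):
    before = upgrade_hours(clusters, 6.9)
    after = upgrade_hours(clusters, 1.7)
    saved = before - after
    print(f"{clusters} clusters: {before:.0f} h -> {after:.0f} h, "
          f"saved {saved:.0f} h (~{saved / 24:.1f} days)")
```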
Automation That Actually Closes the Loop
Broadcom ran a customer survey in March 2026 asking organizations using VCF Automation how 9.0 changed their workflows. The responses were striking: 49% reduction in time from request to ready-to-use environment, and 49% reduction in manual effort across the application lifecycle. VCF 9.1 pushes that further with some genuinely clever ideas.
App Stack Formation — The Feature I'm Most Excited About
I'll be blunt: environment drift between Dev, Test, and Production is one of the most persistent sources of "but it worked on my machine" problems in enterprise IT. App Stack Formation directly attacks this. You select a running group of VMs — along with their network topology, storage config, and OS settings — and VCF captures it as a portable OVF/OVA bundle. Boot-order sequencing is preserved. The bundle goes into the catalog. Any tenant can spin up an identical environment in minutes with zero manual configuration. That's genuinely powerful, and it's the kind of feature that tends to get quietly adopted everywhere once people realize it exists.
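Boot-order sequencing is worth a quick illustration, because it's the part that usually breaks when people clone multi-VM environments by hand. The sketch below is a generic dependency ordering using Python's standard `graphlib`; how VCF actually records the sequence inside the bundle isn't something I've seen documented, and the three-tier stack is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical three-tier stack: each VM lists the VMs that must be up first.
depends_on = {
    "db": set(),
    "app": {"db"},
    "web": {"app"},
}

def boot_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every VM boots after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

order = boot_order(depends_on)
print(" -> ".join(order))
```

Capture that ordering once with the stack and every copy of the environment comes up the same way, which is exactly the drift problem this feature is going after.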
Cost Visibility — Finally Closing the FinOps Loop
Private cloud has always had a blind spot here. Public cloud bills you per-resource with brutal granularity. Private cloud historically showed you a vCenter performance chart and left you to figure out who was consuming what. VCF 9.1 adds real cost visibility at the Org, Project, and Namespace level, with upfront pricing estimates before provisioning, email-based billing alerts, and downloadable cost reports. It's not perfect chargeback — but it's a serious step in the right direction.
Fast Deploy comes in two modes. Linked-Mode uses delta-disk linked clones: the VM powers on instantly while the full disk sync happens asynchronously in the background. Best for short-lived dev/test VMs. Direct-Mode parallelizes full-disk provisioning, with no linked clone complexity, and is better for production workloads where full disk integrity matters from the start.
Security That's Built In, Not Bolted On
The most interesting thing Broadcom has done with security in 9.1 isn't any single feature — it's the philosophy. Rather than shipping a list of security tools you can enable, they've designed the platform so that the default posture is secure. Zero Trust networking from Day 1. Automated compliance enforcement. Recovery workflows that assume breach rather than hoping it won't happen. It's a mature security mindset, and it shows.
Advanced Cyber Compliance — Audit Season Gets Easier
The ACC capability in 9.1 continuously assesses your environment against baselines like PCI DSS and Broadcom's own VCF security guidelines. When drift is detected — a config change that violates a baseline, a new VM without the right security tag — it auto-remediates. This shifts compliance from a quarterly scramble to an always-on background process. For anyone who's spent three weeks before an audit manually running scripts to check configuration drift, this is a genuine quality-of-life improvement.
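Conceptually, continuous compliance is a diff against a baseline followed by a reset. Here's a minimal sketch of that loop. The setting names are invented for illustration and don't correspond to real ACC checks or PCI DSS controls.

```python
def find_drift(baseline: dict[str, object], actual: dict[str, object]) -> dict[str, object]:
    """Return the baseline value for every setting that is missing or different."""
    return {k: v for k, v in baseline.items() if actual.get(k) != v}

def remediate(baseline: dict[str, object], actual: dict[str, object]) -> dict[str, object]:
    """Return a copy of the config with drifted settings reset to the baseline."""
    return {**actual, **find_drift(baseline, actual)}

# Hypothetical baseline and host settings; the keys are invented for illustration.
baseline = {"ssh_enabled": False, "lockdown_mode": "normal", "ntp_configured": True}
host = {"ssh_enabled": True, "lockdown_mode": "normal", "syslog_host": "log01"}

drift = find_drift(baseline, host)
fixed = remediate(baseline, host)
print("drifted settings:", sorted(drift))
print("still drifting after fix:", find_drift(baseline, fixed))
```

Settings that the baseline doesn't mention (like `syslog_host` here) are left alone, which is the behavior you'd want from any auto-remediation you're going to trust in production.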
CrowdStrike Integration — The Recovery Story I Want to Tell
This is the detail that should make security architects pay attention. When ransomware triggers a recovery event, VCF doesn't just restore a snapshot and call it done. The recovered workload boots into an isolated clean-room network. CrowdStrike EDR scans it. Only after a clean verdict does VCF return the workload to production. This substantially reduces the risk of re-infection — which is one of the most common failure modes in ransomware recovery scenarios.
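The flow is simple enough to write down as a gate. This is a sketch of the sequence described above, with no real CrowdStrike or VCF API calls; the function names, VM names, and the stand-in scanner are all hypothetical.

```python
from typing import Callable

def recover(workload: str, scan: Callable[[str], bool]) -> list[str]:
    """Walk a restored workload through isolate, scan, then promote or quarantine."""
    steps = [
        f"restore {workload} from snapshot",
        f"attach {workload} to isolated clean-room network",
        f"scan {workload}",
    ]
    if scan(workload):
        steps.append(f"reconnect {workload} to production")
    else:
        steps.append(f"keep {workload} quarantined")
    return steps

# Hypothetical scanner: flags anything on a known-bad list (stand-in for an EDR verdict).
infected = {"vm-finance-02"}

def is_clean(name: str) -> bool:
    return name not in infected

for vm in ("vm-web-01", "vm-finance-02"):
    print(recover(vm, is_clean)[-1])
```

The key design point is that production reconnect is only reachable through a clean verdict. There is no path from "restored" straight to "production".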
Live patching for critical security updates means your compliance team can stop asking for emergency maintenance windows every time a CVE drops. Patches apply without disrupting running workloads. This one change alone could meaningfully reduce the friction between security teams and operations teams at a lot of organizations.
My Honest Verdict on VCF 9.1
I came into this review expecting a polished but incremental release, and that's largely what I found, with a few genuine surprises. The Distributed Transit Gateway is more impactful than the release notes suggest; removing the Edge Cluster requirement simplifies a lot of architectures that were previously awkward. App Stack Formation is the kind of feature that solves a problem nobody had found a clean solution for, and I expect it to get adopted broadly and quietly. And the CrowdStrike recovery integration demonstrates a maturity in thinking about security that goes beyond checkbox compliance.
The numbers are real. 5,000-host scale, 500 Kubernetes clusters per Supervisor, 70% faster provisioning, 39% storage TCO reduction, 70% CPU savings on encrypted vMotion. These aren't theoretical — they're measured against specific workloads and configurations. Your numbers in production will vary, but the direction is consistently positive.
VCF 9.1 is the release that makes VCF 9.0's architecture deliver on its promises. If 9.0 was the foundation, 9.1 is the building you can actually move into.
For organizations already on VCF 9.0, the upgrade is straightforward and the gains are real. For organizations evaluating VCF for the first time, 9.1 is the strongest version of Broadcom's private cloud story to date. The remaining friction points — primarily around the complexity of initial deployment and the licensing model — haven't changed. But the platform capabilities have taken a meaningful step forward.
| Pillar | What Changed | Key Metric | My Take |
|---|---|---|---|
| 🖥️ Compute | 5,000 hosts, NVMe tiering, 256 parallel LCM, Intel QAT vMotion, Ubuntu native | 40% TCO down · 70% CPU saved | Solid across the board |
| 🗄️ Storage | Global dedup/compression, native S3 (preview), vSAN recovery, software mirroring | 39% storage TCO down | S3 preview is one to watch |
| 🌐 Networking | DTGW removes Edge Clusters, EVPN interop with Arista/Cisco/SONiC, self-service VPCs | No Edge Cluster needed | DTGW is underrated |
| ☸️ VKS | 500 clusters, 70% faster provisioning, multi-NIC nodes, CaaS, K8s 1.35 | 37m → 11m deploy | Control plane rebuild pays off |
| ⚙️ Automation | App Stack Formation, Fast Deploy, cost visibility, Canonical Ubuntu, Terraform | 49% faster provisioning | App Stack is the standout |
| 🛡️ Security | ACC continuous compliance, CrowdStrike EDR recovery, live patching, clean room | Always-on compliance | Recovery story is mature |
This post is based on official Broadcom/VMware documentation and blogs: VMware Cloud Foundation Blog · VCF 9.1 Solution Brief · VCF 9.1 Release Notes (Broadcom TechDocs) · VMware Hands-on Labs