From vSphere 8 to VCF 9.x Upgrade with No Spare Hosts: The Consolidated Domain Path with DR Strategy Design
A practical guide to adopting VMware Cloud Foundation 9.0.2 in a brownfield, two-site environment when dedicated management hardware is not an option.
The challenge
Every customer who considers VCF 9 eventually hits the same question: where do I put the management VMs? The standard guidance calls for a dedicated management domain — a separate cluster of hosts running nothing but SDDC Manager, NSX, vCenter, and the VCF Operations stack. For a greenfield deployment that is straightforward. For a brownfield vSphere 8 environment at capacity, it is a capital expenditure problem that can stall an entire modernisation programme.
The scenario this post addresses is deliberately constrained: a production vSphere 8 environment across two sites, no spare hosts, no NSX deployed, no Layer 2 stretch between sites, and a firm requirement that all management appliances are protected across sites. Broadcom's answer to this constraint is the Consolidated Domain architecture and it is more capable than most customers expect.
Consolidated Domain defined: A single VCF domain where management VMs and workload VMs share the same physical ESXi hosts, separated by vSphere DRS resource pools with reservations. No dedicated management hardware is required. This is a fully supported deployment model in VCF 9.0.2.
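The reservation model above can be sketched as a small sizing helper. This is a minimal, hypothetical calculation, not official Broadcom sizing guidance: it sums per-VM CPU and memory demand for the management appliances and adds a headroom buffer, which is the figure you would then set as the reservation on the management resource pool.

```python
def pool_reservation(vms, headroom=0.2):
    """Sum per-VM CPU (MHz) and memory (MB) demand, then add headroom
    so the management pool reservation covers peak plus a buffer."""
    cpu = sum(v["cpu_mhz"] for v in vms)
    mem = sum(v["mem_mb"] for v in vms)
    return {"cpu_mhz": int(cpu * (1 + headroom)),
            "mem_mb": int(mem * (1 + headroom))}

# Illustrative figures only -- substitute your actual appliance sizes.
MGMT_VMS = [
    {"name": "vcenter",      "cpu_mhz": 8000, "mem_mb": 30720},
    {"name": "sddc-manager", "cpu_mhz": 4000, "mem_mb": 16384},
    {"name": "nsx-mgr-1",    "cpu_mhz": 6000, "mem_mb": 24576},
]
print(pool_reservation(MGMT_VMS))
```

Setting a reservation (rather than only shares) guarantees the management stack its floor of resources even when production workloads saturate the shared hosts.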
Two-site architecture overview
The design below deploys one VCF Instance per site: Instance A at Site A (primary/protected) and Instance B at Site B (recovery). Both instances are registered under a single VCF Fleet, giving unified lifecycle management across all hosts from a single VCF Operations console, with VCF Automation alongside.
Management appliance protection tiers
This is the most critical design decision and the one most commonly misunderstood. Not all management VMs can or should be SRM-protected. The tier is determined by the VM's architecture, not by preference.
How it works: vSphere Replication continuously replicates these VMs from Site A to Site B. SRM orchestrates power-on order and applies IP customisation rules at failover. Site B uses different subnets and there is no L2 stretch, so every VM receives a new IP via VMware Tools guest customisation, and DNS must be updated post-failover. Boot order: VCF Operations powers on first, then Fleet Manager, then logging and network visibility.
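The re-IP step can be illustrated with a short sketch. This is an assumption-laden model of what an SRM IP customisation rule does in a routed (no L2 stretch) failover: keep the host portion of the address and swap the network portion for the Site B subnet. The boot-order list mirrors the sequence stated above; the component names are illustrative.

```python
import ipaddress

# Power-on sequence from the recovery plan described above.
BOOT_ORDER = ["vcf-operations", "fleet-manager", "ops-for-logs", "ops-for-networks"]

def remap_ip(site_a_ip, site_a_net, site_b_net):
    """Keep the host bits, swap the network bits -- modelling an SRM
    IP customisation rule mapping a Site A subnet to its Site B twin."""
    addr = ipaddress.ip_address(site_a_ip)
    net_a = ipaddress.ip_network(site_a_net)
    net_b = ipaddress.ip_network(site_b_net)
    host_part = int(addr) - int(net_a.network_address)
    return str(ipaddress.ip_address(int(net_b.network_address) + host_part))

print(remap_ip("10.10.20.15", "10.10.20.0/24", "10.20.20.0/24"))  # -> 10.20.20.15
```

The same host offset on a different subnet keeps runbooks readable: operators can predict the post-failover address of any appliance without consulting a spreadsheet.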
How it works: A daily SFTP file-based backup is taken. At Site B failover, a fresh VCF Automation instance is deployed and the latest backup is restored into it. No vSphere Replication is used. This follows the Broadcom VCF 9 Validated Solution: VCF Automation is not SRM-protected.
vCenter: Cannot be SRM-protected — SRM runs inside vCenter and cannot orchestrate recovery of the vCenter it depends on. Site B's vCenter is always independently operational. VAMI file-based backups are taken daily to SFTP for local recovery if Site B's vCenter itself fails.
SDDC Manager: Each site has its own instance. Daily SFTP backup. If Site B's SDDC Manager fails, restore from backup after vCenter is confirmed operational — sequential dependency.
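The sequential dependency noted above (vCenter confirmed operational before SDDC Manager is restored) generalises to the whole recovery runbook, which is really a small dependency graph. A minimal sketch using the standard-library `graphlib`; the component names and dependency edges are assumptions drawn from the restore order described in this post, not an official runbook.

```python
from graphlib import TopologicalSorter

# Hypothetical Site B recovery dependencies: each appliance is restored
# only after the components it depends on are confirmed operational.
DEPS = {
    "sddc-manager":   {"vcenter"},         # sequential dependency noted above
    "vcf-operations": {"vcenter"},
    "fleet-manager":  {"vcf-operations"},
}

order = list(TopologicalSorter(DEPS).static_order())
print(order)  # vcenter always precedes sddc-manager and vcf-operations
```

Encoding the runbook this way makes the ordering testable: adding a new appliance with the wrong dependency raises a cycle error instead of silently producing an invalid restore sequence.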
NSX Managers ×3: VLAN-only mode is deployed (no overlay transport zone on workload clusters). Each site's NSX Managers serve only their own VCF instance. No cross-site NSX sharing. No TEP VMkernel on any workload host; the standard 1500 MTU is preserved.
Identity Broker: Stateless federation broker. Identity lives in your Active Directory, not in the broker. Site B's Identity Broker was already pointing at the same AD and continues serving Site B components when Site A fails. After VCF Operations fails over to Site B, re-register VCF Operations with Site B's Identity Broker — a five-minute runbook step.
Key design considerations
DRS resource pools with reservations and limits keep management VMs from competing with production workloads. vSphere HA automatically restarts management VMs on surviving hosts.
VCF 9.0.2 requires a minimum of three hosts for the management cluster with FC/VMFS storage, down from four in VCF 5.x. Two hosts is the minimum for workload domain import.
NSX is mandatory in VCF 9 but VLAN-only mode is fully supported. No TEP VMkernel, no GENEVE encapsulation, no MTU changes on workload clusters.
SRM Network Mappings and IP Customisation Rules handle all VLAN and subnet differences between sites. This is the standard SRM routed DR model, no L2 extension required.
Array-based replication (e.g. NetApp SnapMirror) feeds SRM protection groups. The NetApp SRA maps datastores via SnapMirror relationships. LUN IDs at Site B do not need to match Site A.
The VCF Installer convergence wizard takes your existing vCenter, ESXi hosts, and NSX as inputs. Your vCenter is not replaced — it becomes the management domain vCenter.
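The mapping model behind the SRM considerations above can be shown as a sketch. All names here are hypothetical: the point is that recovery placement is resolved by name-based mappings, which is why Site B's VLAN IDs, subnets, and LUN IDs never need to match Site A's.

```python
# Illustrative SRM-style mappings (names are made up for this example).
NETWORK_MAP = {
    "siteA-mgmt-vlan110": "siteB-mgmt-vlan210",
    "siteA-prod-vlan120": "siteB-prod-vlan220",
}
DATASTORE_MAP = {
    "siteA-ds-lun07": "siteB-ds-lun31",  # LUN IDs deliberately differ
}

def recovery_placement(portgroup, datastore):
    """Resolve where a protected VM lands at Site B by name lookup,
    independent of any VLAN or LUN numbering at either site."""
    return NETWORK_MAP[portgroup], DATASTORE_MAP[datastore]

print(recovery_placement("siteA-prod-vlan120", "siteA-ds-lun07"))
```

Because the mapping is declarative, adding a new protected port group or replicated datastore is a one-line change reviewed alongside the recovery plan.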
What this design delivers
By the end of the deployment programme, the organisation has a unified VCF 9.0.2 fleet managing both sites from a single VCF Operations console, fully automated lifecycle management for all management appliances, bi-directional SRM protection with tested recovery plans, and centralised log collection via VCF Operations for Logs, all without purchasing a single additional physical server.
Recommended approach: Do not run VCF convergence and pre-convergence remediation as parallel tracks. Complete HCI cluster remediation, DVS upgrades, and NSX 4.2.1 installation as Phase 0 and Phase 1 blockers. Once those are clean, the VCF Installer convergence is typically a single-day event per site.