
From vSphere 8 to VCF 9.x Upgrade with No Spare Hosts: The Consolidated Domain Path with DR Strategy Design

A practical guide to adopting VMware Cloud Foundation 9.0.2 in a brownfield, two-site environment when dedicated management hardware is not an option.


The challenge

Every customer who considers VCF 9 eventually hits the same question: where do I put the management VMs? The standard guidance calls for a dedicated management domain — a separate cluster of hosts running nothing but SDDC Manager, NSX, vCenter, and the VCF Operations stack. For a greenfield deployment that is straightforward. For a brownfield vSphere 8 environment at capacity, it is a capital expenditure problem that can stall an entire modernisation programme.

The scenario this post addresses is deliberately constrained: a production vSphere 8 environment across two sites, no spare hosts, no NSX deployed, no Layer 2 stretch between sites, and a firm requirement that all management appliances are protected across sites. Broadcom's answer to this constraint is the Consolidated Domain architecture, and it is more capable than most customers expect.

Consolidated Domain defined: A single VCF domain where management VMs and workload VMs share the same physical ESXi hosts, separated by vSphere DRS resource pools with reservations. No dedicated management hardware is required. This is a fully supported deployment model in VCF 9.0.2.

Two-site architecture overview

The design below deploys one VCF instance per site: Instance A at Site A (primary/protected) and Instance B at Site B (recovery). Both instances are registered under a single VCF Fleet, giving unified lifecycle management across all hosts from a single VCF Operations console, with VCF Automation alongside it.

[Architecture diagram, summarised] No L2 stretch · No NSX overlay · SRM IP customisation rules handle re-IP on failover.

Site A (primary/protected, 3+ shared ESXi hosts with vSphere HA and DRS): the rp-sddc-mgmt resource pool holds VCF Operations, Fleet Manager, Ops for Logs and Ops for Networks (all SRM-protected via vSphere Replication and Live Recovery), VCF Automation (file-based backup), vCenter (VAMI backup), SDDC Manager (SFTP backup), and the NSX Managers ×3 and Identity Broker (independent per site). The rp-user-vm pool holds production workload VMs, which reach Site B through array-based storage replication.

Site B (recovery, 3+ shared ESXi hosts with vSphere HA and DRS): replicas of VCF Operations, Fleet Manager, Ops for Logs and Ops for Networks power on at failover; VCF Automation is restored from backup; vCenter B, SDDC Manager B, NSX Managers B ×3 and Identity Broker B are always-running independent instances; workload VM replicas arrive at the storage layer. Backup and restore is independent per site.

Management appliance protection tiers

This is the most critical design decision and the one most commonly misunderstood. Not all management VMs can or should be SRM-protected. The tier is determined by the VM's architecture, not by preference.

Tier 1: VMware Live Recovery
VCF Operations · Fleet Manager · Ops for Logs · Ops for Networks

How it works: vSphere Replication continuously replicates these VMs from Site A to Site B. SRM orchestrates the power-on order and applies IP customisation rules at failover. Site B uses different subnets and there is no L2 stretch, so every VM receives a new IP via VMware Tools guest customisation, and DNS must be updated post-failover. Boot order: VCF Operations powers on first, then Fleet Manager, then the logging and network visibility appliances.
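
Because every failed-over appliance receives a new IP, the DNS update sits on the critical path of the recovery plan and is worth scripting. The sketch below uses the dnspython library to send an RFC 2136 dynamic update; the zone name, TSIG key, server address, and appliance IPs are all hypothetical placeholders, and it assumes your DNS server accepts signed dynamic updates.

```python
# Minimal sketch of the post-failover DNS update, assuming an RFC 2136
# capable DNS server and dnspython (pip install dnspython).
# All names, keys, and addresses below are hypothetical placeholders.
import dns.query
import dns.rcode
import dns.tsigkeyring
import dns.update

keyring = dns.tsigkeyring.from_text({"failover-key.": "c2VjcmV0LWtleS1iYXNlNjQ="})

# Appliance FQDN -> new Site B IP, mirroring the SRM IP customisation rules
FAILED_OVER = {
    "vcf-ops.corp.example": "10.20.10.10",
    "fleet-mgr.corp.example": "10.20.10.11",
    "ops-logs.corp.example": "10.20.10.12",
    "ops-networks.corp.example": "10.20.10.13",
}

update = dns.update.Update("corp.example", keyring=keyring)
for fqdn, new_ip in FAILED_OVER.items():
    host = fqdn.removesuffix(".corp.example")
    update.delete(host, "A")             # drop the stale Site A record
    update.add(host, 300, "A", new_ip)   # short TTL during the DR window

# Send one signed update to the server authoritative for the zone
response = dns.query.tcp(update, "10.20.0.53", timeout=10)
print("DNS update rcode:", dns.rcode.to_text(response.rcode()))
```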

Tier 2: Backup and Restore
VCF Automation

How it works: a daily SFTP file-based backup is taken. At Site B failover, a fresh VCF Automation instance is deployed and the latest backup is restored into it. No vSphere Replication is used. This follows the Broadcom VCF 9 Validated Solution guidance: VCF Automation is not SRM-protected.
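
The restore at Site B is only as good as the newest file on the SFTP target, so it is worth checking backup freshness daily. A minimal sketch using paramiko follows; the host, credentials, and directory are hypothetical, and the 26-hour threshold simply gives a daily schedule a small grace window.

```python
# Hedged sketch: verify the latest VCF Automation backup on the SFTP
# target is recent. All connection values below are placeholders.
import datetime
import paramiko

SFTP_HOST, SFTP_PORT = "sftp.corp.example", 22
SFTP_USER, SFTP_PASS = "vcfbackup", "********"
BACKUP_DIR = "/backups/vcf-automation"
MAX_AGE_HOURS = 26  # daily schedule plus a small grace window

transport = paramiko.Transport((SFTP_HOST, SFTP_PORT))
transport.connect(username=SFTP_USER, password=SFTP_PASS)
sftp = paramiko.SFTPClient.from_transport(transport)

# Find the newest backup file by modification time
entries = sftp.listdir_attr(BACKUP_DIR)
if not entries:
    raise SystemExit(f"No backups found in {BACKUP_DIR}")
newest = max(entries, key=lambda e: e.st_mtime)
age = datetime.datetime.now() - datetime.datetime.fromtimestamp(newest.st_mtime)

if age > datetime.timedelta(hours=MAX_AGE_HOURS):
    print(f"WARNING: latest backup {newest.filename} is {age} old")
else:
    print(f"OK: {newest.filename}, {newest.st_size} bytes, {age} old")

sftp.close()
transport.close()
```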

Tier 3: Independent per site
vCenter (each site) · SDDC Manager (each site) · NSX Managers ×3 (each site) · Identity Broker (each site)

vCenter: Cannot be SRM-protected, because SRM depends on vCenter and cannot protect the platform it runs on. Site B's vCenter is always independently operational. VAMI file-based backups are taken daily to SFTP for local recovery if a site's vCenter itself fails.
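
The daily VAMI backup is usually scheduled in the appliance UI, but it can also be triggered on demand through the vCenter appliance REST API, which is handy as a pre-change step. The sketch below follows the appliance API as documented for vSphere 7/8 (POST /api/session, then POST /api/appliance/recovery/backup/job); the FQDN, credentials, and SFTP target are placeholders, and the exact request fields should be verified against your build's API reference.

```python
# Hedged sketch: trigger a vCenter file-based (VAMI) backup via the
# appliance REST API. Endpoint and field names follow the vSphere 7/8
# documentation; verify them against your build. Values are placeholders.
import requests

VCENTER = "vcenter-b.corp.example"
SSO_USER, SSO_PASS = "administrator@vsphere.local", "********"

s = requests.Session()
s.verify = False  # lab only; supply the CA bundle in production

# Authenticate: POST /api/session returns a session token string
token = s.post(f"https://{VCENTER}/api/session", auth=(SSO_USER, SSO_PASS)).json()
s.headers["vmware-api-session-id"] = token

# Start the backup job to the SFTP target used throughout this design
spec = {
    "location_type": "SFTP",
    "location": "sftp.corp.example/backups/vcsa-b",
    "location_user": "vcfbackup",
    "location_password": "********",
    "parts": ["common"],  # add "seat" for stats, events, alarms and tasks
}
job = s.post(f"https://{VCENTER}/api/appliance/recovery/backup/job", json=spec).json()
print("Backup job:", job.get("id"), job.get("state"))
```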

SDDC Manager: Each site has its own instance. Daily SFTP backup. If Site B's SDDC Manager fails, restore from backup after vCenter is confirmed operational — sequential dependency.
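
That sequential dependency is easy to encode in the runbook: do not start the SDDC Manager restore until vCenter answers an API login. A minimal polling sketch follows, with a hypothetical FQDN and credentials.

```python
# Gate for the runbook: poll vCenter until it accepts an API session,
# then proceed with the SDDC Manager restore. Values are placeholders.
import time
import requests

VCENTER = "vcenter-b.corp.example"
SSO_USER, SSO_PASS = "administrator@vsphere.local", "********"

def vcenter_ready(host: str) -> bool:
    try:
        r = requests.post(f"https://{host}/api/session",
                          auth=(SSO_USER, SSO_PASS), verify=False, timeout=10)
        return r.status_code == 201  # 201 Created = session token issued
    except requests.RequestException:
        return False

while not vcenter_ready(VCENTER):
    print("vCenter not ready, retrying in 60 s ...")
    time.sleep(60)
print("vCenter operational: safe to start the SDDC Manager restore.")
```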

NSX Managers ×3: Deployed in VLAN-only mode (no overlay transport zone on workload clusters). Each site's NSX Managers serve only their own VCF instance; there is no cross-site NSX sharing. No TEP VMkernel exists on any workload host, so the standard 1500 MTU is preserved.
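
A quick way to prove the VLAN-only posture is to audit every VMkernel adapter's MTU: a GENEVE TEP would need 1600 or more, so in this design every vmk should still sit at 1500. A sketch with pyVmomi follows; the vCenter FQDN and credentials are placeholders.

```python
# Hedged audit sketch: list every VMkernel adapter's MTU via pyVmomi.
# In VLAN-only mode no vmk should exceed 1500 (a GENEVE TEP needs >= 1600).
# Connection values are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter-a.corp.example",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for vnic in host.config.network.vnic:
        mtu = vnic.spec.mtu or 1500  # unset MTU defaults to 1500
        flag = "OK" if mtu <= 1500 else "CHECK: jumbo/TEP?"
        print(f"{host.name:30} {vnic.device:6} MTU {mtu:5} {flag}")
view.Destroy()
Disconnect(si)
```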

Identity Broker: Stateless federation broker. Identity lives in your Active Directory, not in the broker. Site B's Identity Broker was already pointing at the same AD and continues serving Site B components when Site A fails. After VCF Operations fails over to Site B, re-register VCF Operations with Site B's Identity Broker — a five-minute runbook step.

Key design considerations

Resource isolation

DRS resource pools with reservations and limits keep management VMs from competing with production workloads. vSphere HA automatically restarts management VMs on surviving hosts.
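
For illustration, the two resource pools from the diagram (rp-sddc-mgmt and rp-user-vm) could be created with pyVmomi as below. The cluster name, reservation sizes, and credentials are hypothetical; size the management reservation to the actual footprint of your appliances.

```python
# Hedged sketch: create the two DRS resource pools of the consolidated
# design with pyVmomi. Cluster name, sizes, and credentials are examples.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter-a.corp.example",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Locate the consolidated cluster (name is an example)
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "cluster-consolidated-a")
view.Destroy()

def alloc(reservation: int, level) -> vim.ResourceAllocationInfo:
    info = vim.ResourceAllocationInfo()
    info.reservation = reservation         # CPU in MHz, memory in MB
    info.expandableReservation = True
    info.limit = -1                        # unlimited
    info.shares = vim.SharesInfo(level=level, shares=0)  # shares ignored unless custom
    return info

# rp-sddc-mgmt: guaranteed CPU/memory so management VMs never starve
mgmt_spec = vim.ResourceConfigSpec(
    cpuAllocation=alloc(24000, vim.SharesInfo.Level.high),     # ~24 GHz reserved
    memoryAllocation=alloc(131072, vim.SharesInfo.Level.high)) # 128 GiB reserved
cluster.resourcePool.CreateResourcePool("rp-sddc-mgmt", mgmt_spec)

# rp-user-vm: normal shares, no reservation; workloads use what remains
user_spec = vim.ResourceConfigSpec(
    cpuAllocation=alloc(0, vim.SharesInfo.Level.normal),
    memoryAllocation=alloc(0, vim.SharesInfo.Level.normal))
cluster.resourcePool.CreateResourcePool("rp-user-vm", user_spec)

Disconnect(si)
```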

Host count minimum

VCF 9.0.2 requires a minimum of 3 hosts for the management cluster with FC/VMFS storage, down from 4 in VCF 5.x. Two hosts is the minimum for workload domain import.

No NSX overlay required

NSX is mandatory in VCF 9, but VLAN-only mode is fully supported. No TEP VMkernel, no GENEVE encapsulation, no MTU changes on workload clusters.

SRM without L2 stretch

SRM Network Mappings and IP Customisation Rules handle all VLAN and subnet differences between sites. This is the standard SRM routed DR model; no L2 extension is required.

Storage replication

Array-based replication (e.g. NetApp SnapMirror) feeds SRM protection groups. The NetApp SRA maps datastores via SnapMirror relationships. LUN IDs at Site B do not need to match Site A.

Brownfield convergence

The VCF Installer convergence wizard takes your existing vCenter, ESXi hosts, and NSX as inputs. Your vCenter is not replaced — it becomes the management domain vCenter.

What this design delivers

By the end of the deployment programme, the organisation has a unified VCF 9.0.2 fleet managing both sites from a single VCF Operations console, fully automated lifecycle management for all management appliances, bi-directional SRM protection with tested recovery plans, and centralised log collection via VCF Ops for Logs, all without purchasing a single additional physical server.

Recommended approach: Do not run VCF convergence and pre-convergence remediation as parallel tracks. Complete HCI cluster remediation, DVS upgrades, and NSX 4.2.1 installation as Phase 0 and Phase 1 blockers. Once those are clean, the VCF Installer convergence is typically a single-day event per site.
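
Before scheduling the convergence window, a short pre-flight script can confirm the Phase 0/1 exit criteria across both sites. The sketch below checks ESXi and DVS versions with pyVmomi; the thresholds and vCenter details are examples only and should be aligned with the VCF 9.0.2 bill of materials (the NSX 4.2.1 prerequisite would be verified separately against the NSX Manager).

```python
# Pre-convergence pre-flight sketch: report ESXi host and DVS versions.
# Thresholds and connection values are examples; align them with the
# VCF 9.0.2 BOM before relying on the result.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter-a.corp.example",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

def objects(vimtype):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    found = list(view.view)
    view.Destroy()
    return found

def vtuple(v: str):
    return tuple(int(x) for x in v.split("."))

MIN_ESXI, MIN_DVS = "8.0.3", "8.0.0"  # example gates

for host in objects(vim.HostSystem):
    ver = host.config.product.version
    status = "OK" if vtuple(ver) >= vtuple(MIN_ESXI) else "UPGRADE"
    print(f"ESXi {host.name:30} {ver:8} {status}")

for dvs in objects(vim.dvs.VmwareDistributedVirtualSwitch):
    ver = dvs.config.productInfo.version
    status = "OK" if vtuple(ver) >= vtuple(MIN_DVS) else "UPGRADE"
    print(f"DVS  {dvs.name:30} {ver:8} {status}")

Disconnect(si)
```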

