Setting up Edge Clusters in VMware vCloud Director 9.7

Currently, I am working on some overall design content for Edge Clusters inside of VMware vCloud Director 9.7. In the meantime, I wanted to share a step by step guide on establishing an Edge Cluster inside of vCD. I will have much more to share on our corporate blog shortly, but this should start some thoughtful discussions.

Quick Intro to Edge Clusters

So what’s the deal with Edge Clusters? Edge Clusters now allow a provider discrete control of tenant Edge placement. Previously, this was rather limited and only controlled at the Provider Virtual Data Center (pVDC) layer. With Edge Clusters, we can now establish this on a per-oVDC basis. In essence, these are the three main value points of Edge Clusters:

  1. Consumption of dedicated Edge Clusters for North/South traffic – optimized traffic flow while minimizing the span of Layer 2 broadcast traffic.
  2. Provide a higher level of availability with Edge nodes that can fail independently between two clusters.
  3. Ability to balance organization Edge services between multiple Edge Clusters – I do not have to use the “same” Primary and Secondary Edge Cluster for every org VDC. This can be configured on a per orgVDC basis.

Below is an overall high-level design of Edge Clusters from a physical and logical layer –

This is made possible by a new construct called VDC Network Profiles. Network Profiles allow us to define org-VDC-specific network configurations – starting now with Edge Clusters: a Primary and/or Secondary location.

The configuration of Edge Clusters on a per-orgVDC basis is all completed via the API. The steps below will show the process of instantiating Edge Clusters inside of a vCD instance while configuring them on a per-tenant basis. The diagram below provides a visual representation. All of this is configured utilizing a JSON body.

Much more to come on design considerations and further insight – for now, let’s get to how to configure this for a vCD instance.

Setting up Edge Cluster in vCD – Step by Step Configuration

In this section, we will review the necessary steps to instantiate Edge Clusters inside of a vCloud Director instance. We will break this down to manageable sections that can be easily followed.

Edge Cluster Preparation

First off, we need to prepare our newly created Edge Clusters in our vCenter along with creating resource pools. Currently, I am using two Edge Clusters, each with two nodes, for my lab – RegionA01-EDGE01 and RegionA01-EDGE02. We can have up to ten (10) Edge Clusters registered to a vCD instance; however, I am utilizing two for high availability purposes.

Let’s go ahead and create my resource pools – naming them Edge-RP-01 and Edge-RP-02 respectively –

Next, we need to create a storage policy and a tag inside of the vCenter where the Edge Cluster is located. If the provider is utilizing an existing storage policy that will be used for Edge Cluster consumption, one can skip this step. However, let’s assume this is a greenfield Edge Cluster deployment.

First, let’s create a new tag called “Edge Cluster”

Then we need to tag our datastore that resides on the Edge Clusters with this specific tag. For my lab environment, I am using “RegionA01-ISCSI01-COMP02” for Edges.

Now, let’s create a new storage policy; we will call this “Edge Storage Policy” –

We need to utilize a Rule Set that’s based on “Tag based placement” and utilizes the Storage category (this is what I utilized when creating the tag). From there, I selected my “Edge Cluster” tag –

On the Storage compatibility screen, we can verify that it is reflecting my selected (tagged) datastore, and we’re good to go from here –

Next, we need to prepare our Edge Clusters for NSX. I’m not going to walk through the steps required for this (installing VIBs, adding VTEPs, etc.); however, it is necessary that we add the clusters to the respective Transport Zone that vCD consumes for cloud services. For my lab environment, I am using “Gold-PVDC-VXLAN-NP” for this configuration –

The final step before instantiation inside of vCD is to refresh the storage policies and network pools. Navigate to Network Pools, right-click, and select Sync; then right-click the vCenter object and select “Refresh Storage Policies” –

Creation of Edge Clusters in vCloud Director

Now we are ready to create our initial Edge Clusters inside of my vCD instance. As stated before, we support up to 10 Edge Clusters, but I will be adding two to my environment for availability purposes.

I will be utilizing Postman as it’s my preferred method to work with the API. Also note we will be utilizing the new “cloudapi,” which requires bearer token authentication. If you need further guidance and an easy way to set this up, please check out the tutorial my esteemed colleague Tom Fojta wrote.

Once we have our bearer token, ensure your API version is set to 32 – this is required so we can work with the new networkProfile and EdgeCluster constructs.
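As a quick reference, the headers attached to each cloudapi call can be sketched as follows – a minimal sketch, assuming version 32.0 and a placeholder token value:

```python
# Minimal sketch of the request headers for vCD 9.7 "cloudapi" calls.
# The token value is a placeholder; version 32.0 is assumed per the text above.
API_VERSION = "32.0"

def cloudapi_headers(bearer_token: str) -> dict:
    """Bearer auth plus a versioned Accept header for cloudapi requests."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "Accept": f"application/json;version={API_VERSION}",
        "Content-Type": "application/json",
    }

headers = cloudapi_headers("example-token")
```

In Postman, these map to the Authorization and Headers tabs of each request.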

Let’s check out what’s currently configured in the “edgeClusters” section –

GET https://vcd-url/cloudapi/1.0.0/edgeClusters

As we can see, there are currently no Edge Clusters configured in my vCD instance.

Let’s prepare for what we need to create the body of our post. We will need the following attributes:

  • Name – Edge Cluster name
  • Description – friendly descriptor of the Edge Cluster
  • Under the resourcePool section, we need the following:
    • Moref – Managed Object Identifier
    • vcId – vCD’s identifier of the vCenter Server
  • storageProfileName – name of the storage profile inside of vCD (vCenter)

While Name, Description, and Storage Policy (or profile) are pretty straightforward, let’s figure out how we can get the object identifier and the vcId.

I utilized the Managed Object Browser of the vCenter to figure out the resource pool ID – remember, you want to find out the explicit ID of the created RP –

To ascertain the vcId, we will browse the vCD API and look at “vimServerReferences” – for my environment, I had a single vCenter server attached to this vCD instance.

GET https://vcd-url/api/admin/extension/vimServerReferences

The highlighted portion shows the exact ID required for the vcId portion – this starts at the “urn” prefix.

Okay, let’s go ahead and build the JSON body. Note that I am selecting raw and JSON as the content type to successfully post this –

POST https://vcd-url/cloudapi/1.0.0/edgeClusters
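The body I posted can be sketched as below. This is a minimal sketch: the moref, vcId, and names are placeholders for illustration, so substitute the values gathered from your MOB and vimServerReferences lookups.

```python
import json

# Sketch of the POST body for /cloudapi/1.0.0/edgeClusters.
# All values are placeholders; the field names mirror the attribute
# list above (name, description, resourcePool.moref/vcId, storageProfileName).
edge_cluster_body = {
    "name": "Edge-Cluster-01",
    "description": "Primary Edge Cluster",
    "resourcePool": {
        "moref": "resgroup-123",  # resource pool ID from the vCenter MOB
        "vcId": "urn:vcloud:vimserver:11111111-2222-3333-4444-555555555555",
    },
    "storageProfileName": "Edge Storage Policy",
}

payload = json.dumps(edge_cluster_body, indent=2)
```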

Once executed, one can check the status of the task inside of vCD.

If we do a GET on this location, we can see that the task was successful.

Now, if we do a GET on the edgeClusters location, we should see our first Edge Cluster. Excellent!

Now, I’m going to go ahead and build my POST body for the second Edge Cluster.

It was a success…

Now, if we do a GET on the edgeClusters, we can see both Edge Clusters registered to vCD.

One can see that there’s an ID generated for each Edge Cluster. We will need this information for configuring each oVDC. Therefore, I created a notepad entry that depicts each of these values and what I intend to establish – Edge01 is my primary Edge Cluster while Edge02 is secondary.

Applying Edge Cluster Configuration to Tenant Organization VDC

Now we are ready to apply this newly created Edge Cluster configuration to one of my tenants in my vCD instance. In this example, I am going to be configuring my organization VDC “Public-OVDC” with this new Edge Cluster –

First, let’s verify that I am using the correct network pool – yes, I see “Gold-PVDC-VXLAN-NP” configured for this oVDC –

Back to Postman – now we need to browse to this specific oVDC so we can configure the networkProfile information.

GET https://vcd-url/api/admin/org/<org-id>

Search for “vdc” in the received body – we are looking for the HREF link so we can browse to that –

Doing a GET on this VDC and searching for “NetworkProfile” will provide us the link we need for manipulating the configuration –

GET https://vcd-url/api/admin/vdc/<vdc-id>
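The link extraction can be sketched as below. The trimmed XML sample and its href value are hypothetical, shown only to illustrate pulling the NetworkProfile link out of the VDC response:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed response from GET .../api/admin/vdc/<vdc-id>;
# the href is a placeholder for illustration only.
sample_vdc_xml = """<AdminVdc xmlns="http://www.vmware.com/vcloud/v1.5">
  <Link rel="down"
        href="https://vcd-url/cloudapi/1.0.0/vdcs/urn:vcloud:vdc:1234/networkProfile"
        type="application/json"/>
</AdminVdc>"""

ns = {"vcloud": "http://www.vmware.com/vcloud/v1.5"}
root = ET.fromstring(sample_vdc_xml)

# Find the Link element whose href points at the networkProfile endpoint.
profile_href = next(
    link.get("href")
    for link in root.findall("vcloud:Link", ns)
    if "networkProfile" in (link.get("href") or "")
)
```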

If we take this newly found HREF and do a GET, we should see a clean configuration for the EdgeCluster –

Great! Now we are ready to build a body for a PUT operation. Again, I referenced my notepad entry so I can build out each respective ID –
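The PUT body can be sketched as below. The edge cluster IDs are placeholders standing in for my notepad values, and the primaryEdgeCluster/secondaryEdgeCluster field names are assumptions based on the payload shape returned by the GET – verify them against your own response before posting:

```python
import json

# Sketch of the PUT body for the oVDC networkProfile.
# IDs are placeholders; field names assumed from the GET response shape.
network_profile_body = {
    "primaryEdgeCluster": {
        "name": "Edge-Cluster-01",
        "id": "urn:vcloud:edgecluster:11111111-aaaa-bbbb-cccc-222222222222",
    },
    "secondaryEdgeCluster": {
        "name": "Edge-Cluster-02",
        "id": "urn:vcloud:edgecluster:33333333-dddd-eeee-ffff-444444444444",
    },
}

payload = json.dumps(network_profile_body, indent=2)
```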

Let’s check the status of the task…looks good…

Now, let’s do a GET on the networkProfile. We see the newly configured Edge Clusters!


Going back to my Public-OVDC, let’s go ahead and create a new Edge.

Going through the new H5 wizard for creation of an Edge –

Let’s confirm….

Now, let’s take a look at the vCenter and see what’s happening. As we can see, the Edge is being deployed to EDGE01 which is my configured primary. Great!

Now that deployment is complete, I want to turn on Edge HA to ensure it deploys the secondary instance to the EDGE02 cluster.



Now, Public-OVDC has the ability to consume distinct Edge services between these two Edge Clusters.


In summary, the use of Edge Clusters provides distinct control of Edge placement while optimizing network traffic. I will have more on this soon. Thanks!


Five Quick Tips on VMware vCloud Availability 3.0

I wanted to summarize a few things I’ve found when working with VMware vCloud Availability 3.0 (vCAv) over the past few months that will be helpful to providers and tenants. I’m sure there are others, but these are the ones that come to mind.

Tunnel (Public API) Endpoint

This is a very important step for production deployments – setting the URL endpoint to ensure proper cloud access from tenants and other vCD instances.

The public API endpoint can be configured from the Cloud Replication Management (CRM) under Configuration:

Or can be configured directly on the tunnel:

When one configures it from the CRM, it does push this change to the tunnel appliance over port 8047 (internal communication port between CRM and the tunnel appliance).

Note that you will need a proper DNS or public IP address along with the DNAT port utilized (such as 443). Any configuration or reconfiguration requires a service restart on the CRM and Tunnel appliance –

Port Requirements

The above table summarizes all necessary ports for proper vCAv management. While this is ingress communication to the vCAv appliances, utilizing the combined appliance does present a different path for configuration. Below are the explicit ports for configuration of each role inside of a single appliance:

  • vApp Replication Manager + Portal – https://appliance-ip:8046/ui/admin
    • Provides the main interface for the cloud-to-cloud replication operations. It understands the vCloud Director level concepts and works with vApps and virtual machines.
  • Replication Manager – https://appliance-ip:8044/ui/admin
    • A management service operating on the vCenter Server level. It understands the vCenter Server level concepts for starting the replication workflow for the virtual machines.
  • Replicator – https://appliance-ip:8043/ui/admin
    • Exposes the low-level HBR primitives as REST APIs.
  • Tunnel – https://appliance-ip:8047/ui/admin
    • Simplifies provider networking setup by channeling all incoming and outgoing traffic for a site through a single point.
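The role-to-port mapping above can be captured as a small quick-reference sketch – the appliance IP is a placeholder, and the ports are the ones listed in the bullets:

```python
# Quick-reference sketch of the per-role admin UIs on a combined appliance.
# The appliance IP is a placeholder; ports match the list above.
APPLIANCE_IP = "appliance-ip"

ADMIN_UI_PORTS = {
    "vApp Replication Manager + Portal": 8046,
    "Replication Manager": 8044,
    "Replicator": 8043,
    "Tunnel": 8047,
}

def admin_url(role: str) -> str:
    """Build the admin UI URL for a given vCAv role."""
    return f"https://{APPLIANCE_IP}:{ADMIN_UI_PORTS[role]}/ui/admin"
```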

HBR Management

In my lab, I’ve rebuilt vCAv many times, and in some scenarios, I’ve forgotten that I had an active replication/protection on a VM. When re-enabling protection, one would receive an error stating “This VM or vApp is already replicated” and the operation fails to protect.

This is due to the host-based replication process still being enabled on this VM. To disable it, the provider will need to locate the VM and SSH to its ESXi host. From there, we can utilize the “vim-cmd” command with the hbrsvc scope.

There are two commands you’ll need to know:

  • vim-cmd hbrsvc/getallvms
    • This will list the IDs of the VMs so you can utilize the correct one in the next command.
  • vim-cmd hbrsvc/vmreplica disable <VM_ID>
    • Disables HBR on the specified VM.

From there, one can successfully re-enable protection for this VM.

Plugin Management

There are two things I want to discuss:

  1. Plugin Visibility
  2. Changes to the vCAv Configuration and how it relates to the Availability Plugin

Plugin Visibility

When the provider configures vCAv for the first time and connects the vCD instance, it immediately makes an API call to push the Availability plugin. This is available to all by default –

You have two options on configuring visibility to the Availability plugin: 1) in vCD 9.7, utilize the Customize Portal plugin or 2) utilize the API calls to restrict access.

For 9.7 installs, it’s fairly easy – go to Customize Portal -> select vCloud Availability -> Publish and remove the selected tenants:

Pre-9.7, we will need to utilize the API to manage accessibility. My esteemed peer Chris Johnson recently did a writeup on managing access to vCAv 3.0, but my older vCAv C2C 1.5 article also applies.


Changes to the vCAv Configuration

When the provider changes a vCAv service/system configuration, the plugin must also be updated with this new information. This is important as the change could have a new tunnel address.

This is very easy – all we need to do is re-register the vCD instance from vCAv. When we re-authenticate, vCAv will push the updated plugin to vCD.

Provider and Tenant Diagrams

Below are two diagrams that depict port communication for a provider and tenant environment. This is very helpful for understanding what is required from an ingress and egress perspective –


Tenant – as discussed before, there is no need for a DNAT rule, as all traffic originates from the on-premises tunnel to communicate with the provider vCAv environment.

That’s all for now.


Removing VMware vCloud Availability 3.0 Plugin from vCenter

Recently, I had this come up where I had to remove the vCloud Availability 3.0 (vCAv) plugin from my lab vCenter. Today, there is not a way to do this through the vCAv on-premises appliance UI – it must be done directly on the vCenter. Therefore, after speaking with a colleague (Bill Leck), I received the steps on removing it from the vCenter instance.

The following steps will work with vSphere 6.5x and 6.7U1. With 6.7U2, you can skip step 2 – thanks Vladimir Velikov.

Here are the high-level steps:

  1. SSH to the vCenter
  2. Remove packages
  3. Remove endpoints from the lookup service
  4. Restart vCenter UI services

This is a very easy and straightforward process. I’ve documented the step-by-step directions below.

First, I see my vCloud Availability plugin on the vCenter instance –

Step 1

Let’s SSH to my vCenter…

Step 2

UI packages are under the /etc/vmware/vsphere-ui/cm-service-packages folder. We need to remove the specific vCAv packages from this folder. Below are the two packages to remove:



Step 3

Next, we need to remove the vCAv entity from the lookupservice SDK. First, we need to get the ID of the vCAv endpoint, and then unregister it. Below are the two commands we will utilize. Note the space between the URL and the 2>/dev/null.

/usr/lib/vmidentity/tools/scripts/ list --ep-type com.vmware.vcav.endpoint --url http://localhost:7080/lookupservice/sdk 2>/dev/null

/usr/lib/vmidentity/tools/scripts/ unregister --url http://localhost:7080/lookupservice/sdk --user '<SSO User>' --password '<SSO User password>' --id <ID of vCAv service identified by the above command> 2>/dev/null

In my environment, we can see the following when I run the first command:

I’ve also highlighted the service ID as we will need that for the next command.

Now, inputting in the second command and copying the service ID, it successfully removes the endpoint –

Last of all, when attempting to hit the lookup service for it, nothing is listed for ‘vcav’ anymore –

Step 4

Last of all, we will want to restart the vSphere UI services. 6.5 and 6.7 operate a little differently, so the syntax is listed below.

vSphere 6.5 - execute "service-control --stop vsphere-ui", followed by "service-control --start vsphere-ui"
vSphere 6.7 - execute "vmon-cli -r vsphere-ui"

I am running 6.5, so let me go ahead and stop and start the UI services.


Logging into my vCenter instance, the vCAv plugin is now removed from the Menu and shortcuts window.

Very easy process on removal. If you need to re-install the plugin, please do so through the vCAv on-premises appliance registration. Big thanks to Bill Leck for his guidance.


Overview of VMware vCloud Availability 3.0 – Tenant Deployment, Protection Workflow, Resources

Once the provider site is operational, we are ready to bring the on-premises / tenant site online for VMware vCloud Availability 3.0 (vCAv). Again, a recap of the deployment steps:

  1. Deploy vCAv On-Premises Appliance
  2. Start configuration wizard
  3. Connect to vCAv Cloud Tunnel
  4. Configuration of local placement
  5. Validation and vSphere UI

Before we get started, let’s take a look at a port mapping diagram.

What’s interesting is that one does not need a DNAT rule for tunnel traffic. The reason is that all traffic is initiated from the on-prem site, negating any ingress traffic (everything flows outbound); hence, a standard SNAT (route) is sufficient. This is great, as we do not need any network changes on the client side.

Deploy vCAv On-Premises Appliance

Deploying the appliance is very similar to the provider side. We have packaged up a standalone on-premises appliance that does not require role selection (which minimizes any client confusion). In the on-premises version, one does not have a dropdown for service role selection – just an acceptance and a typical OVF deployment –

So again, very easy and similar to typical VMware OVF deployments.

Start Configuration Wizard

Let’s open a browser to https://onprem-fqdn/ui/admin and login –

You will be prompted to change the password to the appliance. From there, let’s hit the initial setup wizard –

Set your site name and any pertinent description. Click Next when complete.

As expected, we need to establish the lookup service along with SSO credentials.

On Cloud Details, this is where we pair with our vCloud/vCAv site. Configure the public API endpoint (for my lab, I am using 8048 but I showed earlier on utilizing 443) along with your organization administrative credentials.

Toggle the “Allow Access from Cloud” option if you want users from vCD to have the ability to browse and configure VMs locally from this site.

Accept (or remove) the CEIP and let’s take a look at the final completion screen –

Before hitting Finish, let’s toggle the “Configure local placement now” option to knock this out.

Local placement sets the vCenter/resource hierarchy for cloud to on-premises / failback protection.

Next, we will see a five-step process for Local Placement – walk through the UI and select the hierarchy objects.

Validate and hit the Finish button.

Validation and vSphere UI

From our Cloud site, we can now see a new On-Prem Site showing a status of OK.

Re-logging into the on-premises appliance, we can see the Manager and Cloud status as healthy also –

From the vSphere Client, we can also see vCloud Availability available –

Protection Workflow

We have two operations available: Protection and Migration. In the two below screenshots, our options change based on what button is selected.

One can establish incoming or outgoing replications between cloud or on-prem –

While I am not going to exhaustively go through every permutation, one can see how intuitive it is to protect or migrate workloads.

Protection from On-Premises

In my source site, I only have one choice as I select from On-Prem and have a single paired vCenter/Tenant site.

From here, I select the VMs I want to protect.

Select the Target oVDC –

If there’s a Seed VM available, select it.

Now I can specify my protection settings: my RPO, storage policy, retention policy for point in time instances, and if I want to quiesce/compress the instance and traffic.

Scheduling can be defined –

Finally, we get to see our validation.

Protection Settings – Viewing, Re-Addressing

Reviewing the current state, one can ascertain the health of the current protected workload with my source, destination, and RPO –

Clicking on the Networks button brings up our menu on what we want to do on Migrate/Failover or Test Failover –

This can be applied to all associated vApps or VMs, or explicitly broken down on per vNIC basis. One can also reset the MAC. Note that all of the same vCD guidelines apply – can’t set a manual IP outside of the CIDR block of that oVDC network, etc.

Clicking the sub-button Test Failover presents similar options, but one can copy from the Migrate/Failover menu to get started.

If we need to change the Replication Owner, we can click the Owner and select the new organization owner.

Migrate/Failover/Test Failover

Going through the Migrate, Failover, and Test Failover options are very intuitive.

For Migrate, we can select to power on the recovered vApps and apply the specific preconfigured network settings (or override that and select a specific network) –

For Failover, very similar to migrate, but we can drill into a Recovery Instance –

Lastly, Test Failover provides the ability to test a VM/workload without impacting production. This can be associated to a “bubble/fenced” network and tested by the application team to verify functionality.


As a final thought, I want to say how it’s been a pleasure working with the team to see this to fruition and public release. I believe this is going to be an extremely powerful platform and this is just the start.

After vCAv 3.0 is released, I will have more material, along with many of my peers who will be discussing vCAv further. Below are some lightboard videos that introduce some of the concepts through these posts. Enjoy!