Setting up Edge Clusters in VMware vCloud Director 9.7

Currently, I am working on some overall design content for Edge Clusters inside of VMware vCloud Director 9.7. However, I wanted to share a step-by-step guide on establishing an Edge Cluster inside of vCD. I will have much more to share on our corporate blog shortly, but this should start some thoughtful discussions.

Quick Intro to Edge Clusters

So what’s the deal with Edge Clusters? Edge Clusters give a provider discrete control over tenant Edge placement. Previously, placement was rather limited and only controlled at the Provider Virtual Data Center (pVDC) layer. With Edge Clusters, we can now establish this on a per-orgVDC basis. In essence, the three main value points of Edge Clusters are:

  1. Consumption of dedicated Edge Clusters for North/South traffic – optimized traffic flow while minimizing the span of Layer 2 broadcast traffic.
  2. Provide a higher level of availability to Edge nodes, which can fail over distinctly between two clusters.
  3. Ability to balance organization Edge services between multiple Edge Clusters – I do not have to use the “same” Primary and Secondary Edge Cluster for every org VDC. This can be configured on a per orgVDC basis.

Below is an overall high-level design of Edge Clusters from a physical and logical layer –

This is made possible by a new construct called VDC Network Profiles. Network Profiles allow us to define orgVDC-specific network configurations – starting with Edge Clusters: a Primary and/or Secondary location.

The configuration of Edge Clusters on a per-orgVDC basis is completed entirely via the API. The steps below show the process of instantiating Edge Clusters inside of a vCD instance while configuring them on a per-tenant basis. The diagram below shows a visual representation. All of this is configured utilizing a JSON body.

I will have much more on design considerations and further insight soon – for now, let’s get to configuring this for a vCD instance.

Setting up Edge Cluster in vCD – Step by Step Configuration

In this section, we will review the necessary steps to instantiate Edge Clusters inside of a vCloud Director instance. We will break this down into manageable sections that can be easily followed.

Edge Cluster Preparation

First off, we need to prepare our newly created Edge Cluster in our vCenter along with creating a resource pool. Currently, I am using two Edge Clusters each with two nodes for my lab – RegionA01-EDGE01 and RegionA01-EDGE02. We can have up to ten (10) Edge Clusters registered to a vCD instance, however, I am utilizing two for high availability purposes.

Let’s go ahead and create the resource pools – naming them Edge-RP-01 and Edge-RP-02, respectively –

Next, we need to create a storage policy and a tag inside of the vCenter where the Edge Cluster is located. If the provider is utilizing an existing storage policy that will be used for Edge Cluster consumption, one can skip this step. However, let’s assume this is a greenfield Edge Cluster deployment.

First, let’s create a new tag called “Edge Cluster”

Then we need to tag our datastore that resides on the Edge Clusters with this specific tag. For my lab environment, I am using “RegionA01-ISCSI01-COMP02” for Edges.

Now, let’s create a new storage policy; we will call this “Edge Storage Policy” –

We need a Rule Set based on “Tag based placement” that uses the Storage category (which is what I used when creating the tag). From there, I selected my “Edge Cluster” tag –

On the Storage compatibility screen, we can verify that it reflects my selected (tagged) datastore, and we’re good to go from here –

Next, we need to prepare our Edge Clusters for NSX. I’m not going to walk through the steps required for this (installing VIBs, adding VTEPs, etc.); however, it is necessary that we add them to the respective Transport Zone that vCD consumes for cloud services. For my lab environment, I am using “Gold-PVDC-VXLAN-NP” for this configuration –

The final step to prepare for instantiation inside of vCD is to refresh the storage policies and network pools. Navigate to Network Pools, right-click and select Sync; then right-click the vCenter object and select “Refresh Storage Policies” –

Creation of Edge Clusters in vCloud Director

Now we are ready to create our initial Edge Clusters inside of my vCD instance. As stated before, we support up to 10 Edge Clusters, but I will be adding two to my environment for availability purposes.

I will be utilizing Postman as it’s my preferred method of working with the API. Also note we will be utilizing the new “cloudapi,” which requires bearer token authentication. If you need further guidance and an easy way to set this up, please check out the tutorial my esteemed colleague Tom Fojta wrote.

Once we have our bearer token, ensure your API version is set to 32 – this is required to work with the new networkProfile and EdgeCluster constructs.
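For reference, the requests in this walkthrough assume request headers along these lines – the token value is, of course, a placeholder:

```
Accept: application/json;version=32.0
Authorization: Bearer <token>
```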

Let’s check out what’s currently configured in the “edgeClusters” section –

GET https://vcd-url/cloudapi/1.0.0/edgeClusters

As we can see, there are currently no Edge Clusters configured in my vCD instance.
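An empty listing from the cloudapi typically comes back as a paged envelope similar to the following (illustrative – field values will vary by environment):

```json
{
  "resultTotal": 0,
  "pageCount": 0,
  "page": 1,
  "pageSize": 25,
  "values": []
}
```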

Let’s prepare for what we need to create the body of our post. We will need the following attributes:

  • name – Edge Cluster name
  • description – friendly descriptor of the Edge Cluster
  • Within the resourcePool object, we need the following:
    • moref – Managed Object Reference (identifier) of the resource pool
    • vcId – vCD’s identifier of the vCenter Server
  • storageProfileName – name of the storage profile inside of vCD (vCenter)

While name, description, and storage policy (or profile) are pretty straightforward, let’s figure out how we can get the object identifier and the vcId.

I utilized the Managed Object Browser (MOB) of the vCenter to find the resource pool ID – remember, you want the explicit ID of the created RP –

To ascertain the vcId, we will browse the vCD API and look at “vimServerReferences” – for my environment, I had a single vCenter server attached to this vCD instance.

GET https://vcd-url/api/admin/extension/vimServerReferences

The highlighted portion shows the exact ID required for the vcId portion – this starts at the “urn” prefix.

Okay, let’s go ahead and build the JSON body. Note that I am selecting raw and JSON (application/json) as my body type to successfully POST this –

POST https://vcd-url/cloudapi/1.0.0/edgeClusters
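As a sketch, the body follows the attributes listed above – the moref, vcId, and names below are placeholders for your own environment:

```json
{
  "name": "Edge-Cluster-01",
  "description": "Primary Edge Cluster for North/South traffic",
  "resourcePool": {
    "moref": "resgroup-101",
    "vcId": "urn:vcloud:vimserver:00000000-0000-0000-0000-000000000000"
  },
  "storageProfileName": "Edge Storage Policy"
}
```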

Once executed, one can check the status of the task inside of vCD.

If we do a GET on this location, we can see that the task was successful.

Now, if we do a GET on the edgeClusters location, we should see our first Edge Cluster. Excellent!

Now, I’m going to go ahead and build my POST body for the 2nd Edge Cluster.

It was a success…

Now, if we do a GET on the edgeClusters, we can see both Edge Clusters registered to vCD.
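Illustratively, the listing now contains both clusters, each with its generated ID (IDs shown as placeholders):

```json
{
  "resultTotal": 2,
  "values": [
    { "id": "urn:vcloud:edgecluster:<id-1>", "name": "Edge-Cluster-01" },
    { "id": "urn:vcloud:edgecluster:<id-2>", "name": "Edge-Cluster-02" }
  ]
}
```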

One can see that there’s an ID generated for each Edge Cluster. We will need this information for configuring each oVDC. Therefore, I created a notepad entry that depicts each of these values and what I intend to establish – Edge01 is my primary Edge Cluster while Edge02 is secondary.

Applying Edge Cluster Configuration to Tenant Organization VDC

Now we are ready to apply this newly created Edge Cluster configuration to one of my tenants in my vCD instance. In this example, I am going to be configuring my organization VDC “Public-OVDC” with this new Edge Cluster –

First, let’s verify that I am using the correct network pool – yes, I see “Gold-PVDC-VXLAN-NP” configured for this oVDC –

Back to Postman – now we need to browse to this specific oVDC so we can configure the networkProfile information.

GET https://vcd-url/api/admin/org/<org-id>

Search for “vdc” in the received body – we are looking for the HREF link so we can browse to that –

Doing a GET on this VDC and searching for “NetworkProfile” will provide us the link we need for manipulating the configuration –

GET https://vcd-url/api/admin/vdc/<vdc-id>

If we take this newly found HREF and do a GET, we should see a clean configuration for the EdgeCluster –

Great! Now we are ready to build a body for a PUT operation. Again, I referenced my notepad entry so I can build out each respective ID –
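As a sketch, the PUT body references the Edge Cluster IDs captured earlier – note that the field names below are assumptions based on the networkProfile construct, and the IDs are placeholders from my notepad entry:

```json
{
  "primaryEdgeCluster": "urn:vcloud:edgecluster:<id-1>",
  "secondaryEdgeCluster": "urn:vcloud:edgecluster:<id-2>"
}
```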

Let’s check the status of the task…looks good…

Now, let’s do a GET on the networkProfile. We see the newly configured Edge Clusters!


Going back to my Public-OVDC, let’s go ahead and create a new Edge.

Going through the new H5 wizard for creation of an Edge –

Let’s confirm….

Now, let’s take a look at the vCenter and see what’s happening. As we can see, the Edge is being deployed to EDGE01 which is my configured primary. Great!

Now, deployment is complete, but I want to turn on Edge HA to ensure it deploys the secondary instance to EDGE02 cluster.



Now, Public-OVDC has the ability to consume distinct Edge services between these two Edge Clusters.


In summary, the use of Edge Clusters provides distinct control of Edge placement while optimizing network traffic. I will have more on this soon. Thanks!


Removing VMware vCloud Availability 3.0 Plugin from vCenter

Recently, I had this come up where I had to remove the vCloud Availability 3.0 (vCAv) plugin from my lab vCenter. Today, there is not a way to do this through the vCAv on-premises appliance UI – it must be done directly on the vCenter. Therefore, after speaking with a colleague (Bill Leck), I received the steps on removing it from the vCenter instance.

The following steps work with vSphere 6.5.x and 6.7 U1. With 6.7 U2, you can skip Step 2 – thanks, Vladimir Velikov.

Here are the high-level steps:

  1. SSH to the vCenter
  2. Remove packages
  3. Remove endpoints from the lookup service
  4. Restart vCenter UI services

This is a very easy and straightforward process. I’ve documented the step-by-step directions below.

First, I see my vCloud Availability plugin on vCenter instance –

Step 1

Let’s SSH to my vCenter…

Step 2

UI packages are located under the /etc/vmware/vsphere-ui/cm-service-packages folder. We need to remove the specific vCAv packages from this folder. Below are the two packages to remove:



Step 3

Next, we need to remove the vCAv entity from the lookupservice SDK. First, we need to get the ID of the vCAV endpoint, and then unregister it. Below are the two commands we will utilize. Note the space between the URL and the 2>/dev/null.

/usr/lib/vmidentity/tools/scripts/ list --ep-type com.vmware.vcav.endpoint --url http://localhost:7080/lookupservice/sdk 2>/dev/null

/usr/lib/vmidentity/tools/scripts/ unregister --url http://localhost:7080/lookupservice/sdk --user '<SSO User>' --password '<SSO User password>' --id <ID of vCAv service identified by the above command> 2>/dev/null

In my environment, we can see the following when I run the first command:

I’ve also highlighted the service ID as we will need that for the next command.

Now, inputting in the second command and copying the service ID, it successfully removes the endpoint –

Finally, when querying the lookup service again, nothing is listed for ‘vcav’ anymore –

Step 4

Last of all, we will want to restart the vSphere UI service. 6.5 and 6.7 operate a little differently, so the syntax is listed below.

vSphere 6.5 - execute "service-control --stop vsphere-ui", followed by "service-control --start vsphere-ui"
vSphere 6.7 - execute "vmon-cli -r vsphere-ui"

I am running 6.5, so let me go ahead and stop and start the UI services.


Logging into my vCenter instance, the vCAv plugin is now removed from the Menu and shortcuts window.

A very easy removal process. If you need to re-install the plugin, please do so through the vCAv on-premises appliance registration. Big thanks to Bill Leck for his guidance.


Overview of VMware vCloud Availability 3.0 – Tenant Deployment, Protection Workflow, Resources

Once the provider site is operational, we are ready to bring the on-premises / Tenant site online for VMware vCloud Availability 3.0 (vCAv). Again, recap of the deployment steps:

  1. Deploy vCAv On-Premises Appliance
  2. Start configuration wizard
  3. Connect to vCAv Cloud Tunnel
  4. Configuration of local placement
  5. Validation and vSphere UI

Before we get started, let’s take a look at a port mapping diagram.

What’s interesting is that one does not need a DNAT rule for tunnel traffic. The reason is that all traffic is initiated from the on-prem site, negating any ingress traffic (everything flows outbound), so a standard SNAT (routed) configuration is sufficient. This is great, as we do not need any network changes on the client side.

Deploy vCAv On-Premises Appliance

Deploying the appliance is very similar to the provider side. We have packaged a standalone on-premises appliance that does not have the role selection (which minimizes any client confusion). In the on-premises version, one does not have a dropdown for service role selection – just an acceptance and a typical OVF deployment –

So again, very easy and similar to typical VMware OVF deployments.

Start Configuration Wizard

Let’s open a browser to https://onprem-fqdn/ui/admin and login –

You will be prompted to change the password to the appliance. From there, let’s hit the initial setup wizard –

Set your site name and any pertinent description. Click Next when complete.

As expected, we need to establish the lookup service along with SSO credentials.

On Cloud Details, this is where we pair with our vCloud/vCAv site. Configure the public API endpoint (for my lab, I am using 8048 but I showed earlier on utilizing 443) along with your organization administrative credentials.

Toggle the “Allow Access from Cloud” option if you want users from vCD to have the ability to browse and configure VMs locally from this site.

Accept (or remove) the CEIP and let’s take a look at the final completion screen –

Before hitting Finish, let’s toggle the “Configure local placement now” option to knock this out.

Local placement sets the vCenter/resource hierarchy for cloud to on-premises / failback protection.

Next, we will see a 5 step process for Local Placement – walk through the UI and select the hierarchy objects.

Validate and hit the Finish button.

Validation and vCenter UI

From our Cloud site, we can now see a new On-Prem Site with a status of OK.

Re-logging into the on-premises appliance, we can see the Manager and Cloud status as healthy also –

From the vSphere Client, we can also see vCloud Availability available –

Protection Workflow

We have two operations available: Protection and Migration. In the two below screenshots, our options change based on what button is selected.

One can establish incoming or outgoing replications between cloud or on-prem –

While I am not going to exhaustively go through every permutation, one can see how intuitive it is to protect or migrate workloads.

Protection from On-Premises

In my source site, I only have one choice as I select from On-Prem and have a single paired vCenter/Tenant site.

From here, I select the VMs I want to protect.

Select the Target oVDC –

If there’s a Seed VM available, select it.

Now I can specify my protection settings: my RPO, storage policy, retention policy for point in time instances, and if I want to quiesce/compress the instance and traffic.

Scheduling can be defined –

Finally, we get to see our validation.

Protection Settings – Viewing, Re-Addressing

Reviewing the current state, one can ascertain the health of the current protected workload with my source, destination, and RPO –

Clicking on the Networks button brings up our menu on what we want to do on Migrate/Failover or Test Failover –

This can be applied to all associated vApps or VMs, or explicitly broken down on per vNIC basis. One can also reset the MAC. Note that all of the same vCD guidelines apply – can’t set a manual IP outside of the CIDR block of that oVDC network, etc.

Clicking the sub-button Test Failover presents similar options, but one can copy from the Migrate/Failover menu to get started.

If we need to change the Replication Owner, we can click the Owner and select the new organization owner.

Migrate/Failover/Test Failover

Going through the Migrate, Failover, and Test Failover options are very intuitive.

For Migrate, we can select to power on the recovered vApps and apply the specific preconfigured network settings (or override that and select a specific network) –

For Failover, very similar to migrate, but we can drill into a Recovery Instance –

Lastly, Test Failover provides the ability to test a VM/workload without impacting production. This can be associated to a “bubble/fenced” network and tested by the application team to verify functionality.


As a final thought, I want to say how it’s been a pleasure working with the team to see this to fruition and public release. I believe this is going to be an extremely powerful platform and this is just the start.

After vCAv 3.0 is released, I will have more material, along with many of my peers who will be discussing vCAv further. Below are some lightboard videos that introduce some of the concepts covered in these posts. Enjoy!


Overview of VMware vCloud Availability 3.0 – Provider Deployment

In this post, we will be reviewing the steps on setting up and operationalizing vCloud Availability 3.0 (vCAv) for a provider site.

There is a presumption that you will be deploying for production, so that is what I’ll be reviewing. The consolidated (combined) appliance would be an easier deployment, but still requires the below configurations post-deployment.

Recap of the Provider steps:

  1. Deployment of Cloud Replication Management (CRM) Instance
    1. Initial Replication Management Setup
    2. Initial Setup Wizard
  2. Deploy vCAv Replicator(s)
  3. Deploy vCAv Tunnel
  4. Configuration of CRM instance and start of site wizard
  5. Configuration of Replicator
    1. Pairing Replicator with Replication Manager
  6. Configuration of Tunnel
  7. Validation

Prerequisites:
  1. Available DNS and NTP server
  2. SSO Lookup Service Address
  3. Routing and Firewall Ports in place – see below for further insight
  4. vCenter and vCD on interoperability matrix
  5. Certificate Management – all certificates can be managed via the UI utilizing PKCS#12 certificates. Services must be restarted post-import.

Provider Port Mapping

Below is a diagram my esteemed peer, Chris Johnson, worked up for our upcoming EMPOWER presentation.


  1. Establishing a DNAT rule from 443 to 8048 is crucial for tunnel connectivity. This also has to be set as the API endpoint and will be pushed from the CRM instance.
  2. Ensure we can route and have direct port access between payload/resource vCenters, replicators, and Cloud Management.

Deployment of Cloud Replication Management (CRM) Instance

All of the roles we deploy for the provider will be coming from a single OVF – this is very similar to other VMware based virtual appliances. However, during the OVF deployment process, you will be prompted for the below role selection. For deployment of CRM, select Cloud Replication Management.

Initial Replication Manager Setup

Wait a few moments post-power on for initial configuration to take place, then open a browser to https://crm-fqdn:8441 so we can set the initial lookup service configuration.

We will be prompted for changing the default password. Note this is the same process for any newly deployed vCAv appliance and must be done on initial login –

From our initial screen, we can see that we have two issues: 1) missing Lookup Service settings and 2) Configured Replicators – there are none. The latter is fine for now; we will pair the replicator once we are done with the site wizard.

Let’s go over to Configuration and set the lookup service –

Accept the certificate…

As discussed prior, we will not see any replicators right now and will come back at a later time.

Initial Setup Wizard

Open a new tab to https://crm-fqdn/ui/admin and log in with your root account.

From here, we can see a link to run the initial setup wizard –

This is a very simple wizard that brings us through the site setup. From the beginning, we need to set a site name. Note that you cannot use spaces, and it is case-sensitive.

Second, set your public API endpoint address. Note this is where traffic will ingress from your tunnel node. In my lab environment, I will be directly connecting over 8048 (compared to a traditional perimeter environment, which would utilize 443 and a DNAT rule to forward that traffic).

Here’s what I would configure if that were the case.

Next, lookup service address. You’ll be setting this quite a bit. 🙂

vCD configuration – note that you must include /api after the vCD FQDN. Also, during this initial setup, vCAv will take care of publishing the Availability plugin to your vCD instance. On boot of the CRM vCAv appliance (or during any upgrade), the plugin will refresh or push an update if required – very nice.
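For example, with a hypothetical provider FQDN, the vCD address would take this form:

```
https://vcd.provider.example.com/api
```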

Apply your vCAv license key –

Consent or remove the check for the VMware Customer Experience Improvement Program (CEIP) –

Finally, we review our desired state. Verify everything looks to your specification, and hit complete.

This will take a few moments for the configuration. You will be prompted to log back in and you will be brought to the vApp Replication Manager Admin UI page. You can now utilize vCD administrative credentials too!

Let’s click on the Configuration link on the left side. As we can see, we still have some work to do for the Replicator and Tunnel configuration.

Deploy vCAv Replicator(s)

Next up, let’s configure the Replicator instance. Repeat this process for every required Replicator needed for your environment.

Open a tab to https://replicator-fqdn/ui/admin

After setting your password, you will be prompted to set the lookup service address –

That’s it for the replicator! Now, we are ready to pair this replicator with the Replication Manager.

Pairing Replicator with Replication Manager

Open your tab to https://crm-fqdn:8441 and browse to Replicators on the left side –

Let’s click the New button and open up the wizard –

We need to provide the fully qualified domain name along with port 8043 (this is what’s utilized for the Replication Manager to Replicator API connectivity) along with the appliance password and SSO administrator credentials.

Once paired, we will see it in the list. Repeat this process for any additional replicators.

Now, from the CRM Provider UI, we can see a newly added Replicator instance. Next up, Tunnel configuration.

Configuration of Tunnel

Final configuration – let’s configure the tunnel for inbound and outbound connectivity. Browse to https://tunnel-fqdn/ui/admin and login –

Once you set your password, you will be prompted to set two things: 1) lookup service address and 2) Public API endpoint

As discussed before, the public API endpoint will be based off of your network topology. For my lab, I am using direct 8048 access. However, if I was going to DNAT from a public IP/FQDN utilizing 443, I would have the following –

Once completed, we will see the two fields completed.

Let’s hop over to the CRM Provider UI configuration and configure the tunnel –

From here, we need to establish CRM to Tunnel API communication, which happens on port 8047 –

Type in the appliance password. Once applied, we will see a tunnel configuration (again, I was using 443 for a period of time, but you will see 8048 for future configurations).

After any port changes, we recommend doing a service restart. This can be achieved by going to System Monitoring and clicking Restart Service –

Validation
After a site deployment and configuration, I always walk through to see service health.

From the main provider UI page, I can see overall system health –

From my System Monitoring page, we can see everything is green and I see my Tunnel and associated Replicators –

From vCloud Director, my plugin is also available too for self-service management.

Final Thoughts:

  1. As depicted above, deployment is rather straightforward and pretty seamless.
  2. Site Deployment must be done on a per-vCD instance basis. So if you have four sites, expect to do this four times.

Next up, Tenant/On-Premises setup.