Setting up Edge Clusters in VMware vCloud Director 9.7

Currently, I am working on some overall design content for Edge Clusters inside of VMware vCloud Director 9.7. However, I wanted to share a step-by-step guide on establishing an Edge Cluster inside of vCD. I will have much more to share on our corporate blog shortly, but this should start some thoughtful discussions.

Quick Intro to Edge Clusters

So what’s the deal with Edge Clusters? Edge Clusters now give a provider discrete control of tenant Edge placement. Previously, this was rather limited and only controlled at the Provider Virtual Data Center (pVDC) layer. With Edge Clusters, we can now establish this on a per-oVDC basis. In essence, these are the three main value points of Edge Clusters:

  1. Consumption of dedicated Edge Clusters for North/South traffic – optimized traffic flow while minimizing the span of Layer 2 broadcast traffic.
  2. Provide a higher level of availability for Edge nodes, which can fail independently between two clusters.
  3. Ability to balance organization Edge services between multiple Edge Clusters – I do not have to use the “same” Primary and Secondary Edge Cluster for every org VDC. This can be configured on a per orgVDC basis.

Below is an overall high-level design of Edge Clusters from a physical and logical layer –

This is made possible by a new construct called VDC Network Profiles. Network Profiles allow us to define org-VDC-specific network configurations – starting now with Edge Clusters: a Primary and/or Secondary location.

The configuration of Edge Clusters on a per-orgVDC basis is completed entirely via the API. The steps below will show the process of instantiating Edge Clusters inside of a vCD instance while configuring them on a per-tenant basis. The diagram below shows what this looks like visually. All of this is configured utilizing a JSON body.

Much more to come on design considerations and further insight – for now, let’s get to how to configure this for a vCD instance.

Setting up Edge Clusters in vCD – Step by Step Configuration

In this section, we will review the necessary steps to instantiate Edge Clusters inside of a vCloud Director instance. We will break this down into manageable sections that can be easily followed.

Edge Cluster Preparation

First off, we need to prepare our newly created Edge Clusters in vCenter along with creating resource pools. Currently, I am using two Edge Clusters, each with two nodes, in my lab – RegionA01-EDGE01 and RegionA01-EDGE02. We can have up to ten (10) Edge Clusters registered to a vCD instance; however, I am utilizing two for high availability purposes.

Let’s go ahead and create the resource pools – naming them Edge-RP-01 and Edge-RP-02, respectively –

Next, we need to create a storage policy and a tag inside of the vCenter where the Edge Cluster is located. If the provider is utilizing an existing storage policy that will be used for Edge Cluster consumption, one can skip this step. However, let’s assume this is a greenfield Edge Cluster deployment.

First, let’s create a new tag called “Edge Cluster” –

Then we need to tag our datastore that resides on the Edge Clusters with this specific tag. For my lab environment, I am using “RegionA01-ISCSI01-COMP02” for Edges.

Now, let’s create a new storage policy; we will call this “Edge Storage Policy” –

We need a Rule Set based on “Tag based placement” that uses the Storage category (this is the category I used when creating the tag). From there, I selected my “Edge Cluster” tag –

On the Storage compatibility screen, we can verify that it reflects my selected (tagged) datastore, and we’re good to go from here –

Next, we need to prepare our Edge Clusters for NSX. I’m not going to walk through the steps required for this (installing VIBs, adding VTEPs, etc.); however, it is necessary to add them to the respective Transport Zone that vCD consumes for cloud services. For my lab environment, I am using “Gold-PVDC-VXLAN-NP” for this configuration –

The final step is to prepare for instantiation inside of vCD: we need to refresh the storage policies and network pools. Navigate to Network Pools -> right-click and Sync, then right-click the vCenter object and select “Refresh Storage Policies” –

Creation of Edge Clusters in vCloud Director

Now we are ready to create our initial Edge Clusters inside of the vCD instance. As stated before, we support up to 10 Edge Clusters, but I will be adding two to my environment for availability purposes.

I will be utilizing Postman, as it’s my preferred method to work with the API. Also note we will be utilizing the new “cloudapi,” which requires bearer token authentication. If you need further guidance and an easy way to set this up, please check out the tutorial my esteemed colleague Tom Fojta wrote.

Once we have our bearer token, ensure your API version is set to 32 – this is required so we can work with the new networkProfile and EdgeCluster constructs.
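For reference, this is roughly what my request headers look like once the bearer token is in hand – the exact Accept string below is what I use against my 9.7 instance, so confirm it against the API documentation for your build and substitute your own token:

Accept: application/json;version=32.0
Authorization: Bearer <bearer-token>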

Let’s check out what’s currently configured under the “edgeClusters” section –

GET https://vcd-url/cloudapi/1.0.0/edgeClusters

As we can see, there are currently no Edge Clusters configured in my vCD instance.
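For illustration, an empty result from this call looks roughly like the following paginated structure (the exact fields and values will vary per environment):

{
  "resultTotal": 0,
  "pageCount": 0,
  "page": 1,
  "pageSize": 25,
  "values": []
}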

Let’s prepare for what we need to create the body of our post. We will need the following attributes:

  • Name – Edge Cluster name
  • Description – friendly descriptor of the Edge Cluster
  • Under the resourcePool object, we need the following:
    • moref – Managed Object Reference (identifier) of the resource pool
    • vcId – vCD’s identifier of the vCenter Server
  • storageProfileName – name of the storage profile inside of vCD (vCenter)

While Name, Description, and Storage Policy (or profile) are pretty straightforward, let’s figure out how we can get the object identifier and the vcId.

I utilized the Managed Object Browser of the vCenter to figure out the resource pool ID – remember, you want to find out the explicit ID of the created RP –

To ascertain the vcId, we will browse the vCD API and look at “vimServerReferences” – for my environment, I had a single vCenter server attached to this vCD instance.

GET https://vcd-url/api/admin/extension/vimServerReferences

The highlighted portion shows the exact ID required for the vcId value – it starts with the “urn” prefix.
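As an illustrative example (the UUID below is a placeholder), the value we want is the id attribute on the VimServerReference entry in the response:

id="urn:vcloud:vimserver:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"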

Okay, let’s go ahead and build the JSON body. Note that in Postman I am selecting raw with JSON (application/json) as the body type so the POST succeeds –

POST https://vcd-url/cloudapi/1.0.0/edgeClusters
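Here is a sketch of the JSON body I am posting – the moref and vcId values below are placeholders, so substitute the resource pool moref you pulled from the MOB and the ID from your own vimServerReferences response:

{
  "name": "Edge-Cluster-01",
  "description": "Primary Edge Cluster",
  "resourcePool": {
    "moref": "resgroup-1234",
    "vcId": "urn:vcloud:vimserver:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  },
  "storageProfileName": "Edge Storage Policy"
}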

Once executed, one can check the status of the task inside of vCD.

If we do a GET on this location, we can see that the task was successful.

Now, if we do a GET on the edgeClusters location, we should see our first Edge Cluster. Excellent!

Now, I’m going to go ahead and build my POST body for the 2nd Edge Cluster.

It was a success…

Now, if we do a GET on the edgeClusters, we can see both Edge Clusters registered to vCD.

One can see that there’s an ID generated for each Edge Cluster. We will need this information for configuring each oVDC. Therefore, I created a notepad entry that depicts each of these values and what I intend to establish – Edge01 is my primary Edge Cluster while Edge02 is secondary.
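As an illustration, my notepad entry looks something like this – the IDs are placeholders for the values returned by the edgeClusters GET:

Edge-Cluster-01 -> <id from the GET response> -> Primary
Edge-Cluster-02 -> <id from the GET response> -> Secondary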

Applying Edge Cluster Configuration to Tenant Organization VDC

Now we are ready to apply this newly created Edge Cluster configuration to one of my tenants in my vCD instance. In this example, I am going to be configuring my organization VDC “Public-OVDC” with this new Edge Cluster –

First, let’s verify that I am using the correct network pool – yes, I see “Gold-PVDC-VXLAN-NP” configured for this oVDC –

Back to Postman – now we need to browse to this specific oVDC so we can configure the networkProfile information.

GET https://vcd-url/api/admin/org/<org-id>

Search for “vdc” in the received body – we are looking for the HREF link so we can browse to that –
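For illustration, the reference we are after looks roughly like this in the response (attributes trimmed; the UUID is a placeholder):

<Vdc href="https://vcd-url/api/admin/vdc/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" name="Public-OVDC" />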

Doing a GET on this VDC and searching for “NetworkProfile” will provide us with the link we need for manipulating the configuration –

GET https://vcd-url/api/admin/vdc/<vdc-id>

If we take this newly found HREF and do a GET, we should see a clean (empty) configuration for the EdgeCluster –

Great! Now we are ready to build a body for a PUT operation. Again, I referenced my notepad entry so I can build out each respective ID –

Let’s check the status of the task…looks good…

Now, let’s do a GET on the networkProfile. We see the newly configured Edge Clusters!

Validation

Going back to my Public-OVDC, let’s go ahead and create a new Edge.

Going through the new H5 wizard for creation of an Edge –

Let’s confirm….

Now, let’s take a look at vCenter and see what’s happening. As we can see, the Edge is being deployed to EDGE01, which is my configured primary. Great!

Now, deployment is complete, but I want to turn on Edge HA to ensure it deploys the secondary instance to the EDGE02 cluster.

Deploying…

Operational!

Now, Public-OVDC has the ability to consume distinct Edge services between these two Edge Clusters.

Summary

In summary, the use of Edge Clusters provides distinct control of Edge placement while optimizing network traffic. I will have more on this soon. Thanks!

-Daniel

Five Quick Tips on VMware vCloud Availability 3.0

I wanted to summarize a few things I’ve found when working with VMware vCloud Availability 3.0 (vCAv) over the past few months that will be helpful to providers and tenants. I’m sure there are others, but these are the ones that come to mind.

Tunnel (Public API) Endpoint

This is a very important step for production deployments – setting the URL endpoint to ensure proper cloud access from tenants and other vCD instances.

The public API endpoint can be configured from the Cloud Replication Management (CRM) under Configuration:

Or it can be configured directly on the tunnel appliance:

When one configures it from the CRM, it pushes this change to the tunnel appliance over port 8047 (the internal communication port between the CRM and the tunnel appliance).

Note that you will need a proper DNS name or public IP address along with the DNAT port utilized (such as 443). Any configuration or reconfiguration requires a service restart on the CRM and Tunnel appliances –

Port Requirements

The above table summarizes all necessary ports for proper vCAv management. While this is ingress communication to the vCAv appliances, utilizing the combined appliance does present a different path for configuration. Below are the explicit ports for configuration of each role inside of a single appliance:

  • vApp Replication Manager + Portal – https://appliance-ip:8046/ui/admin
    • Provides the main interface for the cloud-to-cloud replication operations. It understands the vCloud Director level concepts and works with vApps and virtual machines.
  • Replication Manager – https://appliance-ip:8044/ui/admin
    • A management service operating on the vCenter Server level. It understands the vCenter Server level concepts for starting the replication workflow for the virtual machines.
  • Replicator – https://appliance-ip:8043/ui/admin
    • Exposes the low-level HBR primitives as REST APIs.
  • Tunnel – https://appliance-ip:8047/ui/admin
    • Simplifies provider networking setup by channeling all incoming and outgoing traffic for a site through a single point.

HBR Management

In my lab, I’ve rebuilt vCAv many times, and in some scenarios, I’ve forgotten that I had an active replication/protection on a VM. When re-enabling protection, one would receive an error stating “This VM or vApp is already replicated,” and the operation fails to protect.

This is due to the host-based replication (HBR) process still being enabled on this VM. To disable it, the provider will need to locate the VM and SSH to the ESXi host. From there, we can use the vim-cmd utility and the hbrsvc scope.

There are two commands you’ll need to know:

  • vim-cmd vmsvc/getallvms
    • This lists the VMs and their IDs on the host so you can use the VM ID in the next command.
  • vim-cmd hbrsvc/vmreplica.disable <VM_ID>
    • Disables HBR on the specified VM.

From there, one can successfully re-enable protection for this VM.

Plugin Management

There are two things I want to discuss:

  1. Plugin Visibility
  2. Changes to the vCAv Configuration and how it relates to the Availability Plugin

Plugin Visibility

When the provider configures vCAv for the first time and connects the vCD instance, it immediately makes an API call to push the Availability plugin. This is available to all by default –

You have two options for controlling visibility of the Availability plugin: 1) in vCD 9.7, utilize the Customize Portal feature, or 2) utilize API calls to restrict access.

For 9.7 installs, it’s fairly easy – go to Customize Portal -> select vCloud Availability -> Publish and remove the selected tenants:

Pre-9.7, we will need to utilize the API to manage accessibility. My esteemed peer Chris Johnson did a writeup on managing access to vCAv 3.0 recently, but my older vCAv C2C 1.5 article also applies.

vCloud Availability tenants list

Changes to the vCAv Configuration

When the provider changes a vCAv service/system configuration, the plugin must also be updated with this new information. This is important, as the change could include a new tunnel address.

This is very easy – all we need to do is re-register the vCD instance from vCAv. When we re-authenticate, vCAv will push the updated plugin to vCD.

Provider and Tenant Diagrams

Below are two diagrams that depict port communication for a provider and tenant environment. This is very helpful to understand what is required from an ingress and egress perspective –

Provider

Tenant – as discussed before, there is no need for a DNAT rule, as all traffic originates from the on-premises tunnel to communicate with the provider vCAv environment.

That’s all for now.

-Daniel

Overview of VMware vCloud Availability 3.0 – Tenant Deployment, Protection Workflow, Resources

Once the provider site is operational, we are ready to bring the on-premises / Tenant site online for VMware vCloud Availability 3.0 (vCAv). Again, a recap of the deployment steps:

  1. Deploy vCAv On-Premises Appliance
  2. Start configuration wizard
  3. Connect to vCAv Cloud Tunnel
  4. Configuration of local placement
  5. Validation and vSphere UI

Before we get started, let’s take a look at a port mapping diagram.

What’s interesting is that one does not need a DNAT rule for tunnel traffic. The reason is that all traffic is initiated from the on-prem site, negating any ingress traffic (everything flows outbound); hence, a standard SNAT (route) is sufficient. This is great, as we do not need any network changes on the client side.

Deploy vCAv On-Premises Appliance

Deploying the appliance is very similar to the provider side. We have packaged up a standalone on-premises appliance that does not include role selection (which minimizes any client confusion). In the on-premises version, there is no dropdown for service role selection – just an acceptance and a typical OVF deployment –

So again, very easy and similar to typical VMware OVF deployments.

Start Configuration Wizard

Let’s open a browser to https://onprem-fqdn/ui/admin and login –

You will be prompted to change the appliance password. From there, let’s hit the initial setup wizard –

Set your site name and any pertinent description. Click Next when complete.

As expected, we need to establish the lookup service along with SSO credentials.

On Cloud Details, this is where we pair with our vCloud/vCAv site. Configure the public API endpoint (for my lab, I am using 8048, but I showed earlier how to utilize 443) along with your organization administrative credentials.

Toggle the “Allow Access from Cloud” option if you want users from vCD to have the ability to browse and configure VMs locally from this site.

Accept (or uncheck) the CEIP option, and let’s take a look at the final completion screen –

Before hitting Finish, let’s toggle the “Configure local placement now” option to knock this out.

Local placement sets the vCenter/resource hierarchy for cloud to on-premises / failback protection.

Next, we will see a 5-step process for Local Placement – walk through the UI and select the hierarchy objects.

Validate and hit the Finish button.

Validation and vSphere UI

From our Cloud site, we can now see a new On-Prem Site showing a status of OK.

Re-logging into the on-premises appliance, we can see the Manager and Cloud status as healthy also –

From the vSphere Client, we can also see the vCloud Availability plugin available –

Protection Workflow

We have two operations available: Protection and Migration. In the two screenshots below, our options change based on which button is selected.

One can establish incoming or outgoing replications between cloud or on-prem –

While I am not going to exhaustively go through every permutation, one can see how intuitive it is to protect or migrate workloads.

Protection from On-Premises

In my source site, I only have one choice as I select from On-Prem and have a single paired vCenter/Tenant site.

From here, I select the VMs I want to protect.

Select the Target oVDC –

If there’s a Seed VM available, select it.

Now I can specify my protection settings: my RPO, storage policy, retention policy for point in time instances, and if I want to quiesce/compress the instance and traffic.

Scheduling can be defined –

Finally, we get to see our validation.

Protection Settings – Viewing, Re-Addressing

Reviewing the current state, one can ascertain the health of the currently protected workload along with the source, destination, and RPO –

Clicking on the Networks button brings up our menu for what we want to do on Migrate/Failover or Test Failover –

This can be applied to all associated vApps or VMs, or explicitly broken down on a per-vNIC basis. One can also reset the MAC. Note that all of the same vCD guidelines apply – you can’t set a manual IP outside of the CIDR block of that oVDC network, etc.

Clicking the sub-button Test Failover presents similar options, but one can copy from the Migrate/Failover menu to get started.

If we need to change the Replication Owner, we can click the Owner and select the new organization owner.

Migrate/Failover/Test Failover

Going through the Migrate, Failover, and Test Failover options is very intuitive.

For Migrate, we can select to power on the recovered vApps and apply the specific preconfigured network settings (or override that and select a specific network) –

For Failover, it is very similar to Migrate, but we can drill into a Recovery Instance –

Lastly, Test Failover provides the ability to test a VM/workload without impacting production. This can be associated to a “bubble/fenced” network and tested by the application team to verify functionality.

Resources

As a final thought, I want to say how it’s been a pleasure working with the team to see this to fruition and public release. I believe this is going to be an extremely powerful platform and this is just the start.

After vCAv 3.0 is released, I will have more material, along with many of my peers who will be discussing vCAv further. Below are some lightboard videos that introduce some of the concepts from these posts. Enjoy!

-Daniel

Overview of VMware vCloud Availability 3.0 – Provider Deployment

In this post, we will be reviewing the steps on setting up and operationalizing vCloud Availability 3.0 (vCAv) for a provider site.

There is a presumption that you will be deploying for production, so that is what I’ll be reviewing. The consolidated (combined) appliance would be an easier deployment, but still requires the below configurations post-deployment.

Recap of the Provider steps:

  1. Deployment of Cloud Replication Management (CRM) Instance
    1. Initial Replication Manager Setup
    2. Initial Setup Wizard
  2. Deploy vCAv Replicator(s)
  3. Deploy vCAv Tunnel
  4. Configuration of CRM instance and start of site wizard
  5. Configuration of Replicator
    1. Pairing Replicator with Replication Manager
  6. Configuration of Tunnel
  7. Validation

Prerequisites

  1. Available DNS and NTP server
  2. SSO Lookup Service Address
  3. Routing and Firewall Ports in place – see below for further insight
  4. vCenter and vCD on interoperability matrix
  5. Certificate Management – all certificates can be managed via the UI utilizing PKCS#12 certificates. Services must be restarted post-import.

Provider Port Mapping

Below is a diagram my esteemed peer, Chris Johnson, worked up for our upcoming EMPOWER presentation.

Takeaways:

  1. Establishing a DNAT rule from 443 to 8048 is crucial for tunnel connectivity. This also has to be set as the API endpoint and will be pushed from the CRM instance.
  2. Ensure we can route and have direct port access between payload/resource vCenters, replicators, and Cloud Management.

Deployment of Cloud Replication Management (CRM) Instance

All of the roles we deploy for the provider will be coming from a single OVF – this is very similar to other VMware based virtual appliances. However, during the OVF deployment process, you will be prompted for the below role selection. For deployment of CRM, select Cloud Replication Management.

Initial Replication Manager Setup

Wait a few moments post-power on for initial configuration to take place, then open a browser to https://crm-fqdn:8441 so we can set the initial lookup service configuration.

We will be prompted to change the default password. Note this is the same process for any newly deployed vCAv appliance and must be done on initial login –

From our initial screen, we can see that we have two issues: 1) missing Lookup Service settings and 2) Configured Replicators – there are none. The latter is fine for now; we will pair the replicator once we are done with the site wizard.

Let’s go over to Configuration and set the lookup service –

Accept the certificate…

As discussed prior, we will not see any replicators right now and will come back at a later time.

Initial Setup Wizard

Open a new tab to https://crm-fqdn/ui/admin and log in with your root account.

From here, we can see a link to run the initial setup wizard –

This is a very simple wizard that brings us through the site setup. To begin, we need to set a site name. Note that you cannot utilize spaces and it is case-sensitive.

Second, set your public API endpoint address. Note this is where the traffic will ingress from your tunnel node. In my lab environment, I will be directly connecting over 8048 (compared to a traditional perimeter environment that would utilize 443 and a DNAT rule to forward that traffic).

Here’s what I would use if that were the case.

Next, lookup service address. You’ll be setting this quite a bit. 🙂

vCD configuration – note that you must include /api after the vCD FQDN. Also, during this initial setup, vCAv will take care of publishing the Availability plugin to your vCD instance. On boot of the CRM vCAv appliance (or during any upgrade), the plugin will refresh or push an update if required – very nice.

Apply your vCAv license key –

Consent or remove the check for the VMware Customer Experience Improvement Program (CEIP) –

Finally, we review our desired state. Verify everything looks to your specification, and hit Complete.

This will take a few moments for the configuration. You will be prompted to log back in and you will be brought to the vApp Replication Manager Admin UI page. You can now utilize vCD administrative credentials too!

Let’s click on the Configuration link on the left side. As we can see, we still have some work to do for the Replicator and Tunnel configuration.

Deploy vCAv Replicator(s)

Next up, let’s configure the Replicator instance. Repeat this process for every required Replicator needed for your environment.

Open a tab to https://replicator-fqdn/ui/admin

After setting your password, you will be prompted to set the lookup service address –

That’s it for the replicator! Now, we are ready to pair this replicator with the Replication Manager.

Pairing Replicator with Replication Manager

Open your tab to https://crm-fqdn:8441 and browse to Replicators on the left side –

Let’s click the New button and open up the wizard –

We need to provide the fully qualified domain name and port 8043 (this is what’s utilized for Replication Manager to Replicator API connectivity), along with the appliance password and SSO administrator credentials.

Once paired, we will see it in the list. Repeat this process for any additional replicators.

Now, from the CRM Provider UI, we can see a newly added Replicator instance. Next up, Tunnel configuration.

Configuration of Tunnel

Final configuration – let’s configure the tunnel for inbound and outbound connectivity. Browse to https://tunnel-fqdn/ui/admin and login –

Once you set your password, you will be prompted to set two things: 1) lookup service address and 2) Public API endpoint

As discussed before, the public API endpoint will be based on your network topology. For my lab, I am using direct 8048 access. However, if I were going to DNAT from a public IP/FQDN utilizing 443, I would have the following –

Once completed, we will see the two fields completed.

Let’s hop over to the CRM Provider UI configuration and configure the tunnel –

From here, we need to establish CRM to Tunnel API communication, which happens on port 8047 –

Type in the appliance password. Once applied, we will see a tunnel configuration (again, I was using 443 for a period of time, but you will see 8048 for future configurations).

After any port changes, we recommend doing a service restart. This can be achieved by going to System Monitoring and clicking Restart Service –

Validation

After a site deployment and configuration, I always walk through to see service health.

From the main provider UI page, I can see overall system health –

From my System Monitoring page, we can see everything is green and I see my Tunnel and associated Replicators –

From vCloud Director, my plugin is also available for self-service management.

Final Thoughts:

  1. As depicted above, deployment is rather straightforward and pretty seamless.
  2. Site Deployment must be done on a per-vCD instance basis. So if you have four sites, expect to do this four times.

Next up, Tenant/On-Premises setup.

-Daniel