Five Quick Tips on VMware vCloud Availability 3.0

I wanted to summarize a few things I’ve found when working with VMware vCloud Availability 3.0 (vCAv) over the past few months that will be helpful to providers and tenants. I’m sure there are others, but these are the ones that come to mind.

Tunnel (Public API) Endpoint

This is a very important step for production deployments – setting the URL endpoint to ensure proper cloud access from tenants and other vCD instances.

The public API endpoint can be configured from the Cloud Replication Management (CRM) under Configuration:

Or can be configured directly on the tunnel:

Configuring it from the CRM pushes the change to the tunnel appliance over port 8047 (the internal communication port between the CRM and the tunnel appliance).

Note that you will need a proper DNS name or public IP address along with the DNAT port utilized (such as 443). Any configuration or reconfiguration requires a service restart on the CRM and Tunnel appliance.
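As a rough illustration of the note above, here is a small sketch that composes the endpoint from a DNS name (or public IP) and the DNAT port. The public_api_endpoint helper, the https://host:port form, and the vcav.example.com name are my own assumptions for illustration; check the value your environment expects before using it.

```shell
# Hypothetical helper (not part of vCAv): compose the public API endpoint
# from a DNS name or public IP and the externally visible DNAT port.
public_api_endpoint() {
  local host="$1" port="${2:-443}"   # 443 is only the example port from the text
  if [ -z "$host" ]; then
    echo "usage: public_api_endpoint <host> [port]" >&2
    return 1
  fi
  printf 'https://%s:%s\n' "$host" "$port"
}

# Placeholder host name, shown with the DNAT port and with direct 8048 access.
public_api_endpoint vcav.example.com 443
public_api_endpoint vcav.example.com 8048
```

Whatever value you settle on, remember the service restart on both the CRM and the tunnel afterwards.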

Port Requirements

The above table summarizes all necessary ports for proper vCAv management. While this is ingress communication to the vCAv appliances, utilizing the combined appliance does present a different path for configuration. Below are the explicit ports for configuration of each role inside of a single appliance:

  • vApp Replication Manager + Portal – https://appliance-ip:8046/ui/admin
    • Provides the main interface for the cloud-to-cloud replication operations. It understands the vCloud Director level concepts and works with vApps and virtual machines.
  • Replication Manager – https://appliance-ip:8044/ui/admin
    • A management service operating on the vCenter Server level. It understands the vCenter Server level concepts for starting the replication workflow for the virtual machines.
  • Replicator – https://appliance-ip:8043/ui/admin
    • Exposes the low-level HBR primitives as REST APIs.
  • Tunnel – https://appliance-ip:8047/ui/admin
    • Simplifies provider networking setup by channeling all incoming and outgoing traffic for a site through a single point.
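If you jump between roles on a combined appliance as often as I do, the list above can be turned into a tiny lookup. This is only a sketch: the ports and the /ui/admin path come from the list, while the admin_url helper and the role keywords are names I made up.

```shell
# Map a vCAv role on the combined appliance to its admin UI URL.
# Ports are taken from the list above; the role keywords are hypothetical.
admin_url() {
  local host="$1" role="$2" port
  case "$role" in
    vapp-manager) port=8046 ;;  # vApp Replication Manager + Portal
    manager)      port=8044 ;;  # Replication Manager
    replicator)   port=8043 ;;  # Replicator
    tunnel)       port=8047 ;;  # Tunnel
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  printf 'https://%s:%s/ui/admin\n' "$host" "$port"
}

# Print all four URLs for one appliance (appliance-ip is a placeholder).
for role in vapp-manager manager replicator tunnel; do
  admin_url appliance-ip "$role"
done
```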

HBR Management

In my lab, I’ve rebuilt vCAv many times and, in some scenarios, I’ve forgotten that I had an active replication/protection on a VM. When re-enabling protection, one receives an error stating “This VM or vApp is already replicated” and the protection fails.

This is because the host-based replication (HBR) process is still enabled on this VM. To disable it, the provider will need to locate the VM and SSH to its ESXi host. From there, we can use the vim-cmd utility and its hbrsvc namespace.

There are two commands you’ll need to know:

  • vim-cmd vmsvc/getallvms
    • This lists the registered VMs with their VM IDs (the Vmid column) so you can utilize the ID in the next command.
  • vim-cmd hbrsvc/vmreplica.disable <VM_ID>
    • Disables HBR on said VM.

From there, one can successfully re-enable protection for this VM.
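For anyone who does this often, the lookup and the disable can be wrapped together. Treat this as a sketch under assumptions: I am assuming the VM ID is read from vim-cmd vmsvc/getallvms, that its first two columns are the Vmid and the VM name, and that the name has no spaces. The helper names and the web01 VM name are made up.

```shell
# Print the Vmid for a VM name, reading `vim-cmd vmsvc/getallvms` output on stdin.
# Assumption: column 1 is the Vmid, column 2 is the name (names without spaces).
vmid_for_name() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Disable host-based replication for one VM by name (run on the ESXi host).
disable_hbr() {
  local vmid
  vmid="$(vim-cmd vmsvc/getallvms | vmid_for_name "$1")"
  if [ -z "$vmid" ]; then
    echo "VM not found: $1" >&2
    return 1
  fi
  vim-cmd hbrsvc/vmreplica.disable "$vmid"
}

# Usage on the ESXi host (placeholder VM name):
# disable_hbr web01
```

I kept the parsing in its own function so it can be checked against saved output before running anything against a live host.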

Plugin Management

There are two things I want to discuss:

  1. Plugin Visibility
  2. Changes to the vCAv Configuration and how it relates to the Availability Plugin

Plugin Visibility

When the provider configures vCAv for the first time and connects the vCD instance, it immediately makes an API call to push the Availability plugin. By default, the plugin is available to all tenants.

You have two options for configuring visibility of the Availability plugin: 1) in vCD 9.7, utilize the Customize Portal plugin or 2) utilize the API calls to restrict access.

For 9.7 installs, it’s fairly easy: go to Customize Portal -> select vCloud Availability -> Publish, and remove the selected tenants:

Before 9.7, we need to utilize the API to manage accessibility. My esteemed peer Chris Johnson did a writeup on managing access to vCAv 3.0 recently, but my older vCAv C2C 1.5 article also applies.

vCloud Availability tenants list
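To give a feel for the API route, here is a hedged sketch of the request I would build. The /cloudapi/extensions/ui/{id}/tenants/unpublish path, the version=31.0 Accept header, and the x-vcloud-authorization header are my recollection of the vCD UI-extension API rather than something confirmed in this post, so verify them against the API reference for your vCD version. The helper names, vcd.example.com, and the org names are placeholders.

```shell
# Build the URL that unpublishes a UI plugin from selected tenants.
# Assumption: this path matches your vCD version's UI-extension API.
plugin_unpublish_url() {
  local vcd_host="$1" plugin_id="$2"
  printf 'https://%s/cloudapi/extensions/ui/%s/tenants/unpublish\n' "$vcd_host" "$plugin_id"
}

# Build the JSON body naming the organizations to remove.
# Org names are inserted as-is, so they must not contain quotes or backslashes.
tenant_body() {
  local first=1 org
  printf '['
  for org in "$@"; do
    [ "$first" -eq 1 ] || printf ','
    printf '{"name":"%s"}' "$org"
    first=0
  done
  printf ']\n'
}

# Example call (placeholders; PLUGIN_ID and TOKEN must come from your own session):
# curl -sk -X POST "$(plugin_unpublish_url vcd.example.com "$PLUGIN_ID")" \
#   -H "Accept: application/json;version=31.0" \
#   -H "x-vcloud-authorization: $TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$(tenant_body org1 org2)"
```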

Changes to the vCAv Configuration

When the provider changes a vCAv service/system configuration, the plugin must also be updated with this new information. This matters because the change could include a new tunnel address.

This is very easy – all we need to do is re-register the vCD instance from vCAv. When we re-authenticate, vCAv will push the updated plugin to vCD.

Provider and Tenant Diagrams

Below are two diagrams that depict port communication for a provider and a tenant environment. They are very helpful for understanding what is required from an ingress and egress perspective –

Provider

Tenant – as discussed before, there is no need for a DNAT rule as all traffic originates from the on-premises tunnel to communicate to the provider vCAv environment.

That’s all for now.

-Daniel

Removing VMware vCloud Availability 3.0 Plugin from vCenter

Recently, I had to remove the vCloud Availability 3.0 (vCAv) plugin from my lab vCenter. Today, there is no way to do this through the vCAv on-premises appliance UI; it must be done directly on the vCenter. After speaking with a colleague (Bill Leck), I received the steps for removing it from the vCenter instance.

The following steps will work with vSphere 6.5x and 6.7U1. With 6.7U2, you can skip step 2 – thanks Vladimir Velikov.

Here are the high-level steps:

  1. SSH to the vCenter
  2. Remove packages
  3. Remove endpoints from the lookup service
  4. Restart vCenter UI services

This is a very easy and straightforward process. I’ve documented the step-by-step directions below.

First, I see my vCloud Availability plugin on my vCenter instance –

Step 1

Let’s SSH to my vCenter…

Step 2

UI packages are under the /etc/vmware/vsphere-ui/cm-service-packages folder. We need to remove the specific vCAv packages from this folder. Below are the two packages to remove:

rm -rf /etc/vmware/vsphere-ui/cm-service-packages/com.vmware.cis.vsphereclient.plugin/com.vmware.h4.vsphere.client-3.0.0
rm -rf /etc/vmware/vsphere-ui/cm-service-packages/com.vmware.cis.vsphereclient.plugin/com.vmware.h4.ngc.client-3.0.0

Removed….

Step 3

Next, we need to remove the vCAv entity from the lookup service. First, we need to get the ID of the vCAv endpoint, and then unregister it. Below are the two commands we will utilize. Note the space between the URL and the 2>/dev/null.

/usr/lib/vmidentity/tools/scripts/lstool.py list --ep-type com.vmware.vcav.endpoint --url http://localhost:7080/lookupservice/sdk 2>/dev/null

/usr/lib/vmidentity/tools/scripts/lstool.py unregister --url http://localhost:7080/lookupservice/sdk --user '<SSO User>' --password '<SSO User password>' --id <ID of vCAv service identified by the above command> 2>/dev/null

In my environment, we can see the following when I run the first command:

I’ve also highlighted the service ID as we will need that for the next command.
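If you would rather not copy the ID by hand, something like the sketch below can pull it out of the list output. I am assuming each service entry includes a line of the form "Service ID: <id>"; the service_ids helper is a made-up name, so compare it with what your lstool.py actually prints first.

```shell
# Print every "Service ID" value found in `lstool.py list` output on stdin.
# Assumption: each service entry has a line of the form "Service ID: <id>".
service_ids() {
  sed -n 's/^[[:space:]]*Service ID:[[:space:]]*//p'
}

# Usage on the vCenter appliance, feeding in the first command from above:
# /usr/lib/vmidentity/tools/scripts/lstool.py list --ep-type com.vmware.vcav.endpoint \
#   --url http://localhost:7080/lookupservice/sdk 2>/dev/null | service_ids
```

If more than one ID comes back, stop and check each entry before unregistering anything.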

Now, running the second command with the copied service ID successfully removes the endpoint –

Finally, when querying the lookup service for it again, nothing is listed for ‘vcav’ anymore –

Step 4

Last of all, we will want to restart the vSphere UI services. 6.5 and 6.7 operate a little differently, so the syntax is listed below.

vSphere 6.5 - execute "service-control --stop vsphere-ui", followed by "service-control --start vsphere-ui"
vSphere 6.7 - execute "vmon-cli -r vsphere-ui"
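The version split above is easy to capture in a small helper. This sketch only prints the command for a given version string so you can review it before running it; the ui_restart_cmd name and the version matching are my own.

```shell
# Print the vsphere-ui restart command for a given vSphere version.
# The commands are the ones listed above; the helper itself is hypothetical.
ui_restart_cmd() {
  case "$1" in
    6.5*) echo 'service-control --stop vsphere-ui && service-control --start vsphere-ui' ;;
    6.7*) echo 'vmon-cli -r vsphere-ui' ;;
    *) echo "unsupported version: $1" >&2; return 1 ;;
  esac
}

ui_restart_cmd 6.5
ui_restart_cmd 6.7
```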

I am running 6.5, so let me go ahead and stop and start the UI services.

Verification

Logging into my vCenter instance, the vCAv plugin is now removed from the Menu and shortcuts window.

Removal is a very easy process. If you need to re-install the plugin, please do so through the vCAv on-premises appliance registration. Big thanks to Bill Leck for his guidance.

-Daniel

Overview of VMware vCloud Availability 3.0 – Tenant Deployment, Protection Workflow, Resources

Once the provider site is operational, we are ready to bring the on-premises / Tenant site online for VMware vCloud Availability 3.0 (vCAv). Again, here is a recap of the deployment steps:

  1. Deploy vCAv On-Premises Appliance
  2. Start configuration wizard
  3. Connect to vCAv Cloud Tunnel
  4. Configuration of local placement
  5. Validation and vSphere UI

Before we get started, let’s take a look at a port mapping diagram.

What’s interesting is that one does not need a DNAT rule for tunnel traffic. All traffic is initiated from the on-prem site, so there is no ingress traffic (everything flows outbound) and a standard SNAT (route) is sufficient. This is great, as we do not need any network changes on the client side.

Deploy vCAv On-Premises Appliance

Deploying the appliance is very similar to the provider side. We have packaged up a standalone on-premises appliance that does not offer a selection of roles (which minimizes any client confusion). In the on-premises version, there is no dropdown for the service role selection, just an acceptance step and a typical OVF deployment –

So again, very easy and similar to typical VMware OVF deployments.

Start Configuration Wizard

Let’s open a browser to https://onprem-fqdn/ui/admin and log in –

You will be prompted to change the password to the appliance. From there, let’s hit the initial setup wizard –

Set your site name and any pertinent description. Click Next when complete.

As expected, we need to establish the lookup service along with SSO credentials.

On Cloud Details, this is where we pair with our vCloud/vCAv site. Configure the public API endpoint (for my lab, I am using 8048, but I showed earlier how to utilize 443) along with your organization administrative credentials.

Toggle the “Allow Access from Cloud” option if you want users from vCD to have the ability to browse and configure VMs locally from this site.

Accept (or remove) the CEIP and let’s take a look at the final completion screen –

Before hitting Finish, let’s toggle the “Configure local placement now” option to knock this out.

Local placement sets the vCenter/resource hierarchy for cloud to on-premises / failback protection.

Next, we will see a five-step process for Local Placement – walk through the UI and select the hierarchy objects.

Validate and hit the Finish button.

Validation and vCenter UI

From our Cloud site, we can now see a new On-Prem Site that shows a status of OK.

Re-logging into the on-premises appliance, we can see the Manager and Cloud status as healthy also –

From the vSphere Client, we can also see vCloud Availability available –

Protection Workflow

We have two operations available: Protection and Migration. In the two screenshots below, our options change based on which button is selected.

One can establish incoming or outgoing replications between cloud or on-prem –

While I am not going to exhaustively go through every permutation, one can see how intuitive it is to protect or migrate workloads.

Protection from On-Premises

In my source site, I only have one choice as I select from On-Prem and have a single paired vCenter/Tenant site.

From here, I select the VMs I want to protect.

Select the Target oVDC –

If there’s a Seed VM available, select it.

Now I can specify my protection settings: my RPO, storage policy, retention policy for point in time instances, and if I want to quiesce/compress the instance and traffic.

Scheduling can be defined –

Finally, we get to see our validation.

Protection Settings – Viewing, Re-Addressing

Reviewing the current state, one can ascertain the health of the current protected workload with my source, destination, and RPO –

Clicking on the Networks button brings up our menu on what we want to do on Migrate/Failover or Test Failover –

This can be applied to all associated vApps or VMs, or explicitly broken down on a per-vNIC basis. One can also reset the MAC. Note that all of the same vCD guidelines apply – can’t set a manual IP outside of the CIDR block of that oVDC network, etc.

Clicking the sub-button Test Failover presents similar options, but one can copy from the Migrate/Failover menu to get started.

If we need to change the Replication Owner, we can click the Owner and select the new organization owner.

Migrate/Failover/Test Failover

Going through the Migrate, Failover, and Test Failover options is very intuitive.

For Migrate, we can select to power on the recovered vApps and apply the specific preconfigured network settings (or override that and select a specific network) –

For Failover, very similar to migrate, but we can drill into a Recovery Instance –

Lastly, Test Failover provides the ability to test a VM/workload without impacting production. This can be associated to a “bubble/fenced” network and tested by the application team to verify functionality.

Resources

As a final thought, I want to say that it’s been a pleasure working with the team to see this to fruition and public release. I believe this is going to be an extremely powerful platform and this is just the start.

After vCAv 3.0 is released, I will have more material along with many of my peers who will be discussing vCAv further. Below are some lightboard videos that introduce some of the concepts covered in these posts. Enjoy!

-Daniel

Overview of VMware vCloud Availability 3.0 – Provider Deployment

In this post, we will be reviewing the steps on setting up and operationalizing vCloud Availability 3.0 (vCAv) for a provider site.

There is a presumption that you will be deploying for production, so that is what I’ll be reviewing. The consolidated (combined) appliance would be an easier deployment, but still requires the below configurations post-deployment.

Recap of the Provider steps:

  1. Deployment of Cloud Replication Management (CRM) Instance
    1. Initial Replication Management Setup
    2. Initial Setup Wizard
  2. Deploy vCAv Replicator(s)
  3. Deploy vCAv Tunnel
  4. Configuration of CRM instance and start of site wizard
  5. Configuration of Replicator
    1. Pairing Replicator with Replication Manager
  6. Configuration of Tunnel
  7. Validation

Prerequisites

  1. Available DNS and NTP server
  2. SSO Lookup Service Address
  3. Routing and Firewall Ports in place – see below for further insight
  4. vCenter and vCD on interoperability matrix
  5. Certificate Management – all certificates can be managed via the UI utilizing PKCS#12 certificates. Services must be restarted post-import.

Provider Port Mapping

Below is a diagram my esteemed peer, Chris Johnson, worked up for our upcoming EMPOWER presentation.

Takeaways:

  1. Establishing a DNAT rule from 443 to 8048 is crucial for tunnel connectivity. This also has to be set as the API endpoint and will be pushed from the CRM instance.
  2. Ensure we can route and have direct port access between payload/resource vCenters, replicators, and Cloud Management.
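To make the first takeaway concrete, here is what that DNAT could look like if the perimeter device were a plain Linux gateway running iptables. That is purely my assumption for illustration; the diagram does not say which firewall is in use, and an edge gateway or physical firewall would have its own syntax. The dnat_rule helper and both IP addresses are placeholders, and the function only prints the rule rather than applying it.

```shell
# Print an iptables DNAT rule that forwards public 443 to the tunnel on 8048.
# Assumption: a Linux gateway with iptables; adapt to your own perimeter device.
dnat_rule() {
  local public_ip="$1" tunnel_ip="$2"
  printf 'iptables -t nat -A PREROUTING -p tcp -d %s --dport 443 -j DNAT --to-destination %s:8048\n' \
    "$public_ip" "$tunnel_ip"
}

# Placeholder addresses: a public IP and the internal tunnel appliance IP.
dnat_rule 203.0.113.10 192.168.10.20
```

Whatever device does the translation, the public side (443 in this example) is what gets set as the API endpoint.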

Deployment of Cloud Replication Management (CRM) Instance

All of the roles we deploy for the provider will be coming from a single OVF – this is very similar to other VMware based virtual appliances. However, during the OVF deployment process, you will be prompted for the below role selection. For deployment of CRM, select Cloud Replication Management.

Initial Replication Manager Setup

Wait a few moments post-power on for initial configuration to take place, then open a browser to https://crm-fqdn:8441 so we can set the initial lookup service configuration.

We will be prompted for changing the default password. Note this is the same process for any newly deployed vCAv appliance and must be done on initial login –

From our initial screen, we can see that we have two issues: 1) missing Lookup Service settings and 2) Configured Replicators – there are none. The latter is fine for now; we will pair the replicator once we are done with the site wizard.

Let’s go over to Configuration and set the lookup service –

Accept the certificate…

As discussed, we will not see any replicators right now; we will come back to this later.

Initial Setup Wizard

Open a new tab to https://crm-fqdn/ui/admin and log in with your root account.

From here, we can see a link to run the initial setup wizard –

This is a very simple wizard that brings us through the site setup. To begin, we need to set a site name. Note that you cannot utilize spaces and it is case-sensitive.
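Since the site name is typed again later when pairing, I like to sanity-check it up front. The sketch below only encodes the two rules mentioned here (no spaces, case-sensitive); the valid_site_name helper and the sample names are made up, and the wizard may enforce other rules I have not listed.

```shell
# Return success if a candidate site name is non-empty and contains no spaces.
# Only the "no spaces" rule from the text is checked; other rules may exist.
valid_site_name() {
  case "$1" in
    '' | *' '*) return 1 ;;
    *) return 0 ;;
  esac
}

valid_site_name "Provider-Site1" && echo "Provider-Site1: ok"
valid_site_name "Provider Site" || echo "Provider Site: rejected (contains a space)"

# Case sensitivity: these two names are treated as different strings.
[ "Provider-Site1" = "provider-site1" ] || echo "names differ by case"
```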

Second, set your public API endpoint address. Note this is where traffic will ingress from your tunnel node. In my lab environment, I will be connecting directly over 8048 (compared to a traditional perimeter environment, which would utilize 443 and a DNAT rule to forward that traffic).

Here’s what I would enter if that were the case.

Next, lookup service address. You’ll be setting this quite a bit. 🙂

vCD configuration – note that you must include /api after the vCD FQDN. Also, during this initial setup, vCAv will take care of publishing the Availability plugin to your vCD instance. On boot of the CRM vCAv appliance (or during any upgrade), the plugin will refresh or push an update if required – very nice.
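Forgetting the /api suffix is an easy mistake, so here is a small sketch that normalizes whatever was pasted into the form the wizard wants. The https:// scheme and the vcd_api_url helper are my assumptions; only the /api requirement comes from the note above.

```shell
# Normalize a vCD address so it ends in /api exactly once.
# Assumption: the wizard expects https://<vcd-fqdn>/api.
vcd_api_url() {
  local fqdn="${1%/}"       # drop one trailing slash, if any
  fqdn="${fqdn#https://}"   # tolerate a pasted scheme
  fqdn="${fqdn%/api}"       # avoid doubling the /api suffix
  printf 'https://%s/api\n' "$fqdn"
}

# vcd.example.com is a placeholder; both inputs give the same result.
vcd_api_url vcd.example.com
vcd_api_url https://vcd.example.com/api/
```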

Apply your vCAv license key –

Consent or remove the check for the VMware Customer Experience Improvement Program (CEIP) –

Finally, we review our desired state. Verify everything looks to your specification, and hit complete.

This will take a few moments for the configuration. You will be prompted to log back in and you will be brought to the vApp Replication Manager Admin UI page. You can now utilize vCD administrative credentials too!

Let’s click on the Configuration link on the left side. As we can see, we still have some work to do for the Replicator and Tunnel configuration.

Deploy vCAv Replicator(s)

Next up, let’s configure the Replicator instance. Repeat this process for every Replicator required in your environment.

Open a tab to https://replicator-fqdn/ui/admin

After setting your password, you will be prompted to set the lookup service address –

That’s it for the replicator! Now, we are ready to pair this replicator with the Replication Manager.

Pairing Replicator with Replication Manager

Open your tab to https://crm-fqdn:8441 and browse to Replicators on the left side –

Let’s click the New button and open up the wizard –

We need to provide the fully qualified domain name along with port 8043 (this is what’s utilized for the Replication Manager to Replicator API connectivity) along with the appliance password and SSO administrator credentials.

Once paired, we will see it in the list. Repeat this process for any additional replicators.

Now, from the CRM Provider UI, we can see a newly added Replicator instance. Next up, Tunnel configuration.

Configuration of Tunnel

Final configuration – let’s configure the tunnel for inbound and outbound connectivity. Browse to https://tunnel-fqdn/ui/admin and log in –

Once you set your password, you will be prompted to set two things: 1) lookup service address and 2) Public API endpoint

As discussed before, the public API endpoint will be based on your network topology. For my lab, I am using direct 8048 access. However, if I were going to DNAT from a public IP/FQDN utilizing 443, I would have the following –

Once completed, we will see the two fields completed.

Let’s hop over to the CRM Provider UI configuration and configure the tunnel –

From here, we need to establish CRM to Tunnel API communication, which happens on port 8047 –

Type in the appliance password. Once applied, we will see a tunnel configuration (again, I was using 443 for a period of time, but you will see 8048 for future configurations).

After any port changes, we recommend doing a service restart. This can be achieved by going to System Monitoring and clicking Restart Service –

Validation

After a site deployment and configuration, I always walk through to see service health.

From the main provider UI page, I can see overall system health –

From my System Monitoring page, we can see everything is green and I see my Tunnel and associated Replicators –

From vCloud Director, my plugin is also available for self-service management.

Final Thoughts:

  1. As depicted above, deployment is rather straightforward and pretty seamless.
  2. Site Deployment must be done on a per-vCD instance basis. So if you have four sites, expect to do this four times.

Next up, Tenant/On-Premises setup.

-Daniel