VMware vCloud Availability for Cloud-to-Cloud DR – Installation and Setup

As discussed in my introduction post about vCloud Availability for Cloud-to-Cloud DR (vCAv-C2C), I am going to do a high-level write-up of my installation/deployment process for this new vCD site-to-site replication and migration solution.

While this write-up covers the deployment methodology, a production deployment will also need to account for the certificates, ports, and firewall rules that are required.

Below are the high-level steps I took for deployment. VMware’s documentation does go in a different order, but this is the way I rationalized it along with a combined role (all-in-one appliance).

High-Level Steps for Installation and Setup:

  1. Download OVF and deploy at each respective site
  2. Configure vCAv-C2C Replication Manager
  3. Configure vCAv-C2C Replicator
  4. Pair Replication Manager with Replicator
  5. Configure vCAv-C2C Replication Service/Manager
  6. Configure vCAv-C2C Availability Portal

Again, my high-level architecture –

vCenter Deployment

  1. Let’s grab the OVF (remember, we only need one for vCAv-C2C), and deploy it to my management cluster – 
  2.  We can see it’s the new Cloud to Cloud appliance – 
  3. As we discussed before, the team has done a great job of simplifying the deployment of vCAv-C2C to a simple and single OVF for all of the roles required for Cloud to Cloud. We can see the drop-down for each respective role. For my deployment, we will be selecting the Combined role since this is a lab environment. 
  4. We now have to put in the required network configuration while enabling SSH (this is not a mandated requirement, but nice to have in my lab environment). One thing to note – the root password is temporary. When we get into the initial portal configuration, it will prompt us to change it to a new password from a security perspective. 
  5. Final screen to complete the deployment – 
  6. Alright, initial appliance deployment is done! Now off to the configuration.

Configure vCAv-C2C Replication Manager

  1. For the initial configuration, we will want to open a browser to “https://Appliance-IP-address:8044” or in my case, https://vcav-repmgr-01a.corp.local:8044 – 
  2. Let’s click on the Configuration Portal link and now set the new password after a successful login with the password we set in the OVF deployment – 
  3. We can see we have a red x where the lookup service is missing. This is for us to point to the resource vCenter (remember, resource and management vCenters must be in the same SSO domain at each site). 
  4. Let’s go to Configuration -> Set lookup service and put in the lookup service FQDN and we should see a successful message. 
  5. Back to the diagnostics tab, we can now see the lookup service is green!
  6. Now, the next step is to correctly set up the Replicator instance and then come back to the Replication Manager for pairing.

Configure vCAv-C2C Replicator

  1. Open a browser to “https://Appliance-IP-address:8043” or in my case, https://vcav-repmgr-01a.corp.local:8043 – 
  2. Change password – 
  3. Set lookup service – again, I am using a consolidated vCenter instance for resource and management so this is pretty straightforward – 
  4. Success! 
  5. Now, back to the Replication Manager for pairing.

Pair Replication Manager with Replicator

  1. On the Replication Manager, let’s go to Replicators -> New replicator while giving the full FQDN along with the password (remember, new password!) and SSO credentials –
  2. Accept the cert and we should see the success message shortly.
  3. Now, let’s click on Show all managers and we can see the replication instance is now registered – 
  4. While from the replication instance, we can see the replication manager also configured – 
  5. Excellent! Now, off to the Replication Service Manager

Configure vCAv-C2C Replication Service/Manager

  1. Open a browser to “https://Appliance-IP-address:8046” or in my case, https://vcav-repmgr-01a.corp.local:8046 – this time, I used the new password as it seems to have propagated to the other roles. 
  2. Now, we get a nice clean wizard for the setup. 
  3. Let’s put in the Site information, this is going to be my SiteA. 
  4. Next, lookup service again with accepting the certificate – 
  5. Now, we are going to point it to the Replication Manager we previously set up. This is on port 8044, so let’s put that full FQDN along with port 8044 – 
  6. Now, we get to set up vCloud Director. I used the manual configuration here to verify everything was good to go. 
  7. Finally, we get a summary screen to show the stated configuration – 
  8. Excellent! All green and expanding out the Manager data shows our registered replication instance too. Very slick interface. 
  9. Now, off to the Availability Portal configuration as the last setup step.

Configure vCAv-C2C Availability Portal

  1. Home stretch for the first site installation and configuration – let’s open a browser to “https://Appliance-IP-address:5480” or in my case, https://vcav-repmgr-01a.corp.local:5480 
  2. I am prompted for vApp Replication Manager / vCD Connection information. By default, I did see a “” but decided to change it to the FQDN of my combined instance. Put in the vCD credentials (administrator@system) and hit connect to get a successful message – 
  3. Now, click Test to verify everything is operational from a vCD perspective – 
  4. We will be using the defaults for the database, but you are prompted to select a custom database if you so desire – 
  5. Alright, final step. Here’s where we can change the port for user access (remember, this is what will be publicly facing) along with the certificate. I am staying with the defaults for my installation. 
  6. Now, we are ready to hit the Start Service button. Running…
  7. Success! 
  8. Now, we get a nice, simple portal that’s used to verify all of our services are operational 

Wow, a very streamlined process that was very intuitive while using a sleek and simple interface.
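Since each role on the combined appliance was configured on its own port, here is a minimal Python sketch that collects the ports used in this walkthrough and checks whether each one accepts a TCP connection. The role names and helper functions are my own labels for illustration (not anything shipped with the appliance), and a successful connection only shows the port is reachable, not that the service is healthy.

```python
import socket

# Ports as used in this walkthrough for the combined appliance.
# The dictionary keys are illustrative labels, not official names.
SERVICE_PORTS = {
    "replication-manager": 8044,
    "replicator": 8043,
    "vapp-replication-manager": 8046,
    "portal-config": 5480,
}


def service_urls(host):
    """Return the configuration URL for each role on the given host."""
    return {role: f"https://{host}:{port}" for role, port in SERVICE_PORTS.items()}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_appliance(host):
    """Map each role to whether its port accepts TCP connections."""
    return {role: port_open(host, port) for role, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    host = "vcav-repmgr-01a.corp.local"  # lab FQDN from this post
    for role, url in service_urls(host).items():
        print(role, url)
```

Calling `check_appliance("vcav-repmgr-01a.corp.local")` from a machine that can resolve the lab FQDN would return a role-to-boolean map, which is handy when working out firewall rules for a production design.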

I am going to check to verify my org user can log into the portal…

Success! Our vCD authentication was passed through and now I get the vCAv C2C interface. 

Before I can pair the sites and start testing workload migration, I need to go set up my SiteB vCloud Director instance and deploy the combined appliance. In the next blog post, we will go through the Site pairing process and do a migration/replication.




What’s VMware vCloud Availability for Cloud-to-Cloud DR?

As of yesterday (May 17th, 2018), VMware announced the release of VMware vCloud Availability for Cloud-to-Cloud DR 1.0 – release notes here while my esteemed colleague, Tom Fojta, announced this on Twitter:

Many of you may be wondering, what is vCloud Availability for Cloud-to-Cloud and how may I use this in the VMware Cloud Provider Program?

To start off, vCloud Availability for Cloud-to-Cloud (vCAv-C2C) is VMware’s solution to vCloud Director instance to instance disaster recovery and migration. Here’s a nice summary of what vCAv-C2C provides:

  1. Replicate and recover vApps (VMs) between two vCD instances for migration, DR, and planned migration use cases.
  2. Complete self-serviceability for the provider and tenant administrator. A unified HTML5 portal that will be utilized alongside vCloud Director. Replication, migration, and failover can be managed completely by the tenant or provided as a managed service by the provider.
  3. A simplified and streamlined architecture to support vCloud Director 8.20, 9.0, and 9.1 while supporting vSphere 6.0U3 and 6.5U1.

vCloud Availability for Cloud to Cloud DR Installation Documentation

In my opinion, point #3 is one of the most critical benefits to both providers and tenants. When we discuss multi-tenant architecture, this does tend to add layers of complexity, but the VMware Cloud Service Provider Business Unit has done a great job of rationalizing the architecture and streamlining it for vCAv-C2C and future solutions. I will get to the architecture shortly.

Before we get into the details of vCAv-C2C, many of you have experienced our other migration or disaster recovery-based solutions. I made this simple chart to showcase each of our current VMware Cloud Provider (CSPBU) solutions and how they complement one another:

As you can see, vCAv-C2C will complement the traditional vCAv solution while vCD Extender can still be used for on-prem tenant migrations to a vCD instance. vCAv-C2C fills a void on migration between vCD instances, which is a much-needed capability for our Providers.

So let’s talk about the high-level architecture. As I mentioned before, a lot of thought and development went into vCAv-C2C to make the architecture simplified and seamless. With vCAv-C2C, everything is packaged into a simple OVA deployment – no need to manually/CLI configure a vCAv deployment anymore. I was fortunate enough to be part of the alpha testing team (along with Fojta and my other peer Fernando Escobar) and was very pleased with this capability – ease of deployment and configuration is something that is required for many of our Providers.

Furthermore, this single OVA has every role required for vCAv-C2C. Per the documentation, we have a few roles:

  1. Replication Manager
  2. Replicator Node (Large Replicator role available too)
  3. Tunnel Node

Best of all, there’s a Combined role now that can be utilized for smaller or proof of concept (PoC) deployments. This is what I’ll be using in my lab environment.

Let’s talk about a high-level architecture –

As you can see, this is an appliance-based architecture that will protect (or migrate) vApps from site to site. Moreover, we can simplify this for PoC/small deployments by using a combined vCAv-C2C appliance –

Cloud to Cloud tunneling is utilized if you are going over a public internet connection and do not have private (VPN or Direct Connect) connectivity between the two vCD instances. VMware’s documentation writeup is here along with a nice drawing that depicts the DNAT and port requirements.

As for scale and concurrency guidelines, the team did a great job of supporting a significant number of replications/migrations. From the release notes –

  • Scale Limits
    • 300 active protections for a single tenant
    • 300 active protections using a single large vCloud Availability Replicator. For more information about the replicator types, see Deploy vCloud Availability for Cloud-to-Cloud DR Services by Using the vSphere Web Client.
    • 1300 active protections across 20 tenants
    • 20 tenants with active replications
    • 7 active vCloud Availability Replicator instances
    • Up to 2 TB size of protected workloads
  • Concurrency Limits
    • 60 concurrent Protect, Test Failover, Reverse Protect, Test Failback, and Failback operations
    • 110 Concurrent Failover operations
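To make the scale limits above a bit more tangible, here is a rough Python sketch that checks a planned deployment against them. The `Plan` class and `violations` function are hypothetical names of mine, and the capacity check assumes protections are spread evenly across large Replicators, which is a simplification and not something the release notes state.

```python
from dataclasses import dataclass

# Limits quoted from the 1.0 release notes above.
MAX_PER_TENANT = 300
MAX_PER_LARGE_REPLICATOR = 300
MAX_TOTAL_PROTECTIONS = 1300
MAX_TENANTS = 20
MAX_REPLICATORS = 7


@dataclass
class Plan:
    protections_per_tenant: dict  # tenant name -> active protections
    replicators: int              # number of large Replicator instances


def violations(plan):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    total = sum(plan.protections_per_tenant.values())
    active = [t for t, n in plan.protections_per_tenant.items() if n > 0]

    for tenant, count in plan.protections_per_tenant.items():
        if count > MAX_PER_TENANT:
            problems.append(f"{tenant}: {count} protections exceeds {MAX_PER_TENANT}")
    if total > MAX_TOTAL_PROTECTIONS:
        problems.append(f"total {total} exceeds {MAX_TOTAL_PROTECTIONS}")
    if len(active) > MAX_TENANTS:
        problems.append(f"{len(active)} active tenants exceeds {MAX_TENANTS}")
    if plan.replicators > MAX_REPLICATORS:
        problems.append(f"{plan.replicators} replicators exceeds {MAX_REPLICATORS}")
    # Assumption: load is spread evenly across large Replicators.
    capacity = plan.replicators * MAX_PER_LARGE_REPLICATOR
    if total > capacity:
        problems.append(f"total {total} exceeds replicator capacity {capacity}")
    return problems
```

For example, `violations(Plan({"acme": 250, "initech": 100}, replicators=1))` flags that 350 protections are more than a single large Replicator is rated for.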

If you’re a provider, you might be wondering how do I download the bits so I can start testing it?! Well, reach out to your respective VMware Cloud Provider field team as this is going to be an initial release and we want to work with our providers on ensuring all vCAv-C2C requirements are met for a successful deployment. You can also reach out to me directly and I’ll be happy to put you in touch with your respective team.

Up next – high-level installation instructions for vCAv-C2C!


VMware Lightboard – Intro to vCloud Director Extender

This was my first VMware Lightboard and was a lot of fun! However, it was definitely more challenging than I anticipated. I started off twice and ended within 20 seconds each time because I didn’t like my initial dialog. However, what you see below is my third time. I think I did okay for the first lightboard.

We are definitely getting a lot of questions about the newest release of VMware vCloud Director and vCloud Director Extender. This lightboard should serve as an introduction to vCD Extender and what it can do for providers and consumers of vCloud environments.

I look forward to my next lightboard video! Oh, and there’s been a new release of vCloud Director Extender with some new additions – check it out here! 


A deeper look at vCloud Director and NSX Distributed Logical Router and Firewall Services with Usage Meter feature detection

Need to work on my titles, but that’s what I have for now. I want to spend some time reviewing how NSX Distributed, Edge, and DLR Firewall services behave in respect to vCloud Director and vCloud Usage Meter.

Some may know vCloud Usage Meter provides automated feature detection on a per-VM basis – essentially allowing granular-level billing for tenants. For example, tenant ACME may have 5 VM’s that need Distributed Firewall out of 20 – why charge them for all if they aren’t using it? Since version 3.6 of Usage Meter, this feature has been applied on a per VM level, for the most part.

There are three different versions of NSX available in the Cloud Provider Program –

We can see that your basic fundamentals are available in SP Base: your distributed switching and routing, Edge Firewall services, NAT’ing, etc. Advanced adds Distributed Firewalling, Service Insertion, and other advanced functionality, while Enterprise adds Cross-vCenter NSX and HW VTEP integration.

Second, let’s take a look at the chart for NSX Editions by Feature. This comes from our vCloud Usage Meter Feature Detection whitepaper. Don’t have it? Get on VMware Partner Central and grab a copy.

Take note of the statement “All VMs on all networks serviced by the edge” – this means even if you have a VM on that edge that does not use that feature, Usage Meter will charge for it since it still has access to it. For example, if you set up an L2VPN on an Edge, be aware that any other connected networks will be charged for NSX Enterprise even though they may not be able to traverse that tunnel!
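Here is a toy Python model of that “all VMs on all networks serviced by the edge” rule. It is only a sketch of the idea, not how Usage Meter is implemented: the feature keys and the `edition_per_vm` function are hypothetical, and the feature-to-edition map only includes the few mappings mentioned in this post.

```python
# Rank editions so the highest one in use on an edge can be selected.
EDITION_RANK = {"Base": 0, "Advanced": 1, "Enterprise": 2}

# Illustrative mapping only: Edge Firewall and NAT sit in Base,
# and L2VPN on an Edge is charged as Enterprise, per the text above.
FEATURE_EDITION = {
    "edge_firewall": "Base",
    "nat": "Base",
    "l2vpn": "Enterprise",
}


def edition_per_vm(edge_features, networks):
    """Return the edition charged to every VM behind one edge.

    edge_features: feature keys enabled on the edge
    networks: network name -> list of VM names serviced by that edge
    """
    edition = max(
        (FEATURE_EDITION[f] for f in edge_features),
        key=EDITION_RANK.__getitem__,
        default="Base",
    )
    # Every VM on every connected network inherits the edge's edition.
    return {vm: edition for vms in networks.values() for vm in vms}
```

With `edition_per_vm(["nat", "l2vpn"], {"net-a": ["vm1"], "net-b": ["vm2"]})`, both VMs come back as Enterprise even though only one network may actually use the tunnel.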

Lab Setup

I wanted to design a DLR environment within vCD along with using Edge and Distributed Firewall rules. I came up with the following design in my 9.x environment:

  1. Creating an NSX Distributed Logical Router (DLR) is pretty easy. Select the Advanced Gateway box and check Enable Distributed Routing. You can also enable Distributed Routing on any existing Advanced Edge – 
  2. The great thing about how vCloud Director creates Distributed Routing is it’s actually two Edges – a standard perimeter edge along with the DLR southbound. The transit interface is created as a /30 while applying static routes for any connected DLR interface. DLR’s default gateway is the Edge transit interface. Below is what I can see inside of NSX. 
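To illustrate why a /30 is enough for the transit link, here is a small Python sketch using the standard `ipaddress` module. The subnet, the org network ranges, and which side gets which address are all made-up values for illustration, not what vCD actually assigned in my lab.

```python
import ipaddress

# Hypothetical transit subnet; vCD picks the real one.
transit = ipaddress.ip_network("10.255.255.0/30")

# A /30 has four addresses, two of which are usable hosts:
# one for the perimeter Edge and one for the DLR uplink.
edge_ip, dlr_ip = transit.hosts()


def edge_static_routes(dlr_networks, next_hop):
    """Static routes the Edge needs to reach networks behind the DLR."""
    return [(ipaddress.ip_network(net), next_hop) for net in dlr_networks]


# Hypothetical routed Org VDC networks attached to the DLR.
routes = edge_static_routes(["192.168.10.0/24", "192.168.20.0/24"], dlr_ip)

# The DLR's default route points back at the Edge transit interface.
dlr_default_route = (ipaddress.ip_network("0.0.0.0/0"), edge_ip)

if __name__ == "__main__":
    print("edge:", edge_ip, "dlr:", dlr_ip)
    for network, hop in routes:
        print(network, "via", hop)
```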

The NSX Distributed Logical Router is in the NSX Base bundle and covered by the Advanced SP Bundle, which is great for providers. However, Edge Firewall is the only firewall service available in the NSX Base bundle, and some providers may want to provide “high-powered” firewalling services.

The key difference between Edge and Distributed Firewall services is we do not hairpin for forwarding decisions in the DFW model: the decision is made inside of the hypervisor before the packet ever egresses anywhere. Hence, DFW provides better throughput and capability compared to Edge or traditional firewall services – check out Wissam’s discussion on DFW capabilities here. 

vCD Access to Edge Firewall vs Distributed Firewall

I think Tomas did a great job of summarizing vCD Firewall capabilities and pointing out some of the architecture differences between the two firewall technologies.

From a vCD UI access point of view, you have to access the firewall services in two different places.

  1. Edge Firewall services are accessed by right-clicking on the Edge Gateway and selecting Edge Services. Makes sense, right? 
  2. Now Distributed Firewall management is accessed by right-clicking on the organization VDC and clicking on Manage Firewall. Not my favorite and doesn’t state DFW, but this is how it’s accessed today in the Flex UI. 

NSX Distributed Firewall Changes

Okay, let’s get into a few scenarios and test out how Usage Meter will pick up the changes.

  1. I freshly created my three vApps: T1-App-1, T1-DB-01, T1-DLR-tclinux-01/02a (Web Servers). We can run a VM History Report and pick them up and see they all have been associated with the Advanced Bundle initially. I will try to point out the vROps and NSX column in the following discussion, but we will be looking for a checkmark and an “A” that refers to Advanced usage. 
  2. So let’s go ahead and crack open the DFW and start carving up some rules. This is a simple 3-tier architecture so I’m going to lay out allowing web, web to app traffic, app to DB, allowing SSH to App and DB, and blocking everything else in Web, App, and DB Tiers. 
  3. As you can see, the Applied To for my deny rule is only applied to my three Distributed Routing tiers (or routed Org VDC Networks). Not only is this a good NSX practice, this cuts down on who gets charged for NSX Advanced!
  4. Now looking at the DFW rules from vCenter, I can see vCD created the tenant folder and applied the policies in order. Very streamlined. 
  5. Now let’s check to see if my new rules work. Yep, I can still SSH to my VM’s but do not receive any ICMP replies. Perfect! 
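The rule set described above can be modeled as an ordered, first-match list. This Python sketch is just a toy model to show why SSH passes and ICMP is dropped; it is not the NSX API, and the rule names, the source/destination labels, and the TCP ports for web, app, and DB traffic are assumptions of mine.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    src: Optional[str]    # tier name, or None for any source
    dst: Optional[str]    # tier name, or None for any destination
    proto: Optional[str]  # "tcp", "icmp", ... or None for any protocol
    port: Optional[int]   # destination port, or None for any port
    action: str


# Ordered top to bottom; ports 443, 8443, and 3306 are assumed values.
RULES = [
    Rule("allow-web", None, "web", "tcp", 443, "allow"),
    Rule("web-to-app", "web", "app", "tcp", 8443, "allow"),
    Rule("app-to-db", "app", "db", "tcp", 3306, "allow"),
    Rule("ssh-to-app", None, "app", "tcp", 22, "allow"),
    Rule("ssh-to-db", None, "db", "tcp", 22, "allow"),
    Rule("deny-tiers", None, None, None, None, "deny"),
]


def evaluate(src, dst, proto, port=None):
    """Return (rule name, action) for the first rule that matches."""
    for rule in RULES:
        if (rule.src in (None, src)
                and rule.dst in (None, dst)
                and rule.proto in (None, proto)
                and rule.port in (None, port)):
            return rule.name, rule.action
    # Not reached while the catch-all deny rule is last in the list.
    return "no-match", "deny"
```

Here `evaluate("external", "app", "tcp", 22)` hits the SSH rule, while `evaluate("external", "web", "icmp")` falls through to the final deny, which matches what I saw in the ping test.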

How did Usage Meter detect the changes?

  1. So let’s now take a look at the Usage Meter VM History Report. What’s fascinating is Usage Meter tracks every move, which it calls a state change. We can see in my tclinux-vApp (or Web) that it tracked when it started using Base NSX usage, then switched to Advanced, then my vROps Enterprise instance picked it up and added it to its inventory. All in a very short timespan. Very useful! 
  2. With the DB and App layers, we see something very similar – vROps Enterprise, however, picked up these VM’s first and then we see Advanced NSX usage. 
  3. Now, I can see a few new line items populated on my Customer Monthly Usage and Monthly Usage report. I now see Advanced Bundle with Networking, Advanced Bundle with Management, Advanced Bundle with Networking and Management.
  4. This is to be expected since my VM’s went through a few iterations – my Web VM’s initially used NSX Advanced without vROps registration, which falls in the Advanced with Networking bundle, while the DB/App VM’s were picked up by vROps before using NSX Advanced (Advanced with Management bundle). Last of all, all three are now assigned to the Advanced w/Networking and Management bundle since we are using NSX Advanced features along with vROps Enterprise.
  5. While I see the new line items, I don’t see any units to be reported. Why? Well, I have some very small VM’s (256MB in vRAM) and they just ran a few hours during this testing. Keep in mind Usage Meter averages out the usage over the entire calendar month (720 hours for a 30 day period or 744 hours in a 31 day period). Again, to be expected but was pleased that new categories were added immediately.
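The averaging in the last step is easy to see with a bit of arithmetic. This Python sketch is a deliberately simplified model (vRAM multiplied by powered-on hours, divided by the hours in the month); the real Usage Meter calculation applies additional rules that I am not modeling here, and the 3 hours of runtime is just a stand-in for “a few hours”.

```python
def hours_in_month(days):
    """Hours in a calendar month with the given number of days."""
    return days * 24


def average_vram_gb(vram_gb, hours_powered_on, days_in_month):
    """Average vRAM over the whole month under the simplified model."""
    return vram_gb * hours_powered_on / hours_in_month(days_in_month)


if __name__ == "__main__":
    # A 256 MB VM (0.25 GB) that ran for an assumed 3 hours in a 30-day month.
    print(average_vram_gb(0.25, 3, 30))
```

A 0.25 GB VM that runs for 3 hours averages out to roughly a thousandth of a GB over a 720-hour month, which is why no units showed up on my report yet.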

In summary, Usage Meter does a great job of detecting NSX feature usage and bills it accordingly based on a specific service – Distributed Firewall being one of them. Thanks!