IGMP version requirements for VXLAN logical networks

Recently we were working on an issue where VXLAN transport multicast traffic was not being forwarded by the upstream physical switches, causing an outage for the virtual machines hosted on these virtual wires.

Some Background on the environment:

This was a vCNS 5.5 environment with VXLAN deployed in Multicast mode. It was quite a large environment, with multiple virtual wires deployed and many virtual machines connected to them.
The virtual machines were not able to communicate because multicast traffic was not being forwarded by the physical switches.

Upon further investigation it was revealed that IGMPv3 joins were being received by the physical switch, and since the switch had IGMPv2 enabled, it pruned and ignored the IGMPv3 joins. To resolve the issue, IGMPv3 was enabled on the upstream switches; the ESX hosts were then able to join their multicast groups and the virtual machines became reachable on the network.

Starting with ESX 5.5, the default IGMP version on the ESX host has been changed to v3. This option is configurable and can be reverted to IGMPv2 using the ESX advanced settings:

Configuration->Advanced Settings->Net->Net.TcpipIGMPDefaultVersion
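
The same setting can also be viewed and changed from the ESXi command line; a minimal sketch, assuming the advanced option is exposed under the path /Net/TcpipIGMPDefaultVersion (the esxcli form of the setting named above):

~ # esxcli system settings advanced list -o /Net/TcpipIGMPDefaultVersion
~ # esxcli system settings advanced set -o /Net/TcpipIGMPDefaultVersion -i 2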

Hope this helps someone who runs into a similar issue.

Troubleshooting vCloud Director Internal Networks

These days I’ve been spending my time working with NSX and integrating it with vCloud Director. During some of these tests I ran into an issue with network connectivity on internal networks in vCloud Director.
To expound on the issue: I created an internal virtual datacenter network in vCloud Director and enabled DHCP services on the internal NSX Edge virtual machine that gets deployed for this network. I then deployed two Linux virtual machines connected to this internal network, on two different ESX hosts. These virtual machines should have received an IP address from the DHCP scope configured on the Edge, but for some reason they were not getting an IP address and were unable to ping the gateway (the interface on the Edge device).

To determine whether the issue was specific to the Linux guests, I moved all three virtual machines (the two Linux machines and the NSX Edge) to the same ESX host and restarted the networking service in the guest. The machine was assigned an IP address from the configured DHCP scope and was able to ping its gateway. So there was nothing wrong with the TCP/IP stack in the guest, since network traffic between virtual machines on the same ESX host never traverses the external network and is done in memory.

Digging a little deeper: The arcane world of log analysis

Bringing out the geek in me and to start digging further, I began trawling through the vmkernel logs on the ESX host to see what happens when the virtual machine powers up, i.e. does it connect to the virtual port…

2014-05-19T09:34:02.904Z cpu15:29956759)World: vm 29956760: 1462: Starting world vmm0:org1-rhel2_(bc4599c3-ff8e-432b-863e-1cdcef544661) of type 8
2014-05-19T09:34:02.904Z cpu15:29956759)Sched: vm 29956760: 6410: Adding world ‘vmm0:org1-rhel2_(bc4599c3-ff8e-432b-863e-1cdcef544661)’, group ‘host/user/pool3’, cpu: shares=-3 min=200 minLimit=-1
max=1000, mem: shares=-3 min=3072 minLimit=-1 max=16384
2014-05-19T09:34:02.904Z cpu15:29956759)Sched: vm 29956760: 6425: renamed group 57859289 to vm.29956759
2014-05-19T09:34:02.904Z cpu15:29956759)Sched: vm 29956760: 6442: group 57859289 is located under group 54783989
2014-05-19T09:34:02.907Z cpu15:29956759)MemSched: vm 29956759: 8263: extended swap to 28290 pgs
2014-05-19T09:34:03.089Z cpu15:29956759)VSCSI: 3750: handle 8370(vscsi0:0):Using sync mode due to sparse disks
2014-05-19T09:34:03.089Z cpu15:29956759)VSCSI: 3792: handle 8370(vscsi0:0):Creating Virtual Device for world 29956760 (FSS handle 1150128849) numBlocks=4194304 (bs=512)
2014-05-19T09:34:03.244Z cpu4:29956760)Net: 2292: connected org1-rhel2 (bc4599c3-ff8e-432b-863e-1cdcef544661).eth0 eth0 to vDS, portID 0x30001e7
2014-05-19T09:34:03.244Z cpu4:29956760)Net: 3055: associated dvPort 1683 with portID 0x30001e7
2014-05-19T09:34:03.247Z cpu4:29956760)NetPort: 2862: resuming traffic on DV port 1683
2014-05-19T09:34:03.247Z cpu4:29956760)vxlan: VDL2_CPSetCPEnabled:2840: Control plane enabled on VXLAN network[5001]
.
.
2014-05-19T09:39:24.824Z cpu11:27610460)WARNING: vxlan: VDL2CPCheckConnUpCB:311: Control plane connection of VXLAN network[5001] is down

The above log snippet tracks the power-on task for the virtual machine (org1-rhel2), and it's quite evident from the last line that the control plane connection is down. These internal networks use VXLAN as their underlying transport, and since VXLAN in Unicast mode relies on a controller, the next thing to check is whether the ESX hosts can communicate with the NSX controller.
On the ESX host we can query the VDS for the VXLAN configuration using esxcli.

~ # esxcli network vswitch dvs vmware vxlan list --vds-name Nebula-Networks

VDS ID                                           VDS Name         MTU   Segment ID   Gateway IP     Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  ---------------  ----  -----------  -------------  -----------------  -------------  ------------
d7 e6 3d 50 19 d7 02 36-f4 23 96 fe 64 46 1c 33  Nebula-Networks  1600  192.168.1.0  192.168.1.254  00:21:55:08:ec:40  2              1

I immediately noticed that the connection to the controller was down:

~ # esxcli network vswitch dvs vmware vxlan network list --vds-name Nebula-Networks

VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------
5000      N/A (headend replication)  Enabled ()     192.168.1.50 (down)    2           0                0
5001      N/A (headend replication)  Enabled ()     192.168.1.50 (down)    1           0                0

The ESX host establishes a connection to the NSX controller using a user world daemon, netcpa. The netcpa.log shows the communication with the controller as well as the updates that are pushed from the controller down to the ESX host. Looking at these logs, it's clear that the connection is down.

~ # tail -f /var/log/netcpa.log
2014-05-19T09:34:09.615Z [37281B70 info ‘Default’] Core: Sharding connection 192.168.1.50:0 is timeout
2014-05-19T09:34:09.615Z [37281B70 info ‘Default’] App CORE : 0 unregister connection to 192.168.1.50:0
2014-05-19T09:34:09.615Z [37281B70 info ‘Default’] User of connection 192.168.1.50:0
2014-05-19T09:34:09.615Z [37281B70 info ‘Default’] App CORE : 0 register connection to existing controller to 192.168.1.50 port 1234
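
Besides the log, we can also check whether the host has an established TCP session to the controller (port 1234, as seen in the register message above); a quick sketch using the standard esxcli connection listing:

~ # esxcli network ip connection list | grep 192.168.1.50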

To isolate further, I compared the MAC address learned for the controller's IP and found that the controller's IP address had been assigned to another machine on the network. After shutting down that machine and restarting the netcpa agent, the ESX host was able to re-establish a connection with the controller.
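
One way to do that comparison from the ESX host is to look at the host's neighbor (ARP) cache for the controller IP and compare the listed MAC against the controller VM's vNIC; a minimal sketch, using the controller IP from this environment:

~ # esxcli network ip neighbor list | grep 192.168.1.50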

~ # tail -f /var/log/netcpa.log
2014-05-19T10:37:10.471Z [5DC5DB70 info ‘Default’] Core: ShardingSlice length of peer 192.168.1.50: 4194304
2014-05-19T10:37:10.471Z [5DC5DB70 info ‘Default’] Vxlan: core app ready on 192.168.1.50:0
2014-05-19T10:37:10.472Z [5DC5DB70 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5000 controller 192.168.1.50
2014-05-19T10:37:10.472Z [5DC5DB70 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5001 controller 192.168.1.50
2014-05-19T10:37:10.472Z [5DC5DB70 info ‘Default’] Core: Controller is ready: 192.168.1.50:0
2014-05-19T10:37:10.472Z [FFE59100 info ‘Default’] Core: Sharding Segment Update message: server 192.168.1.50 startSliceId 0 numSlices 1024
2014-05-19T10:37:10.473Z [FFE59100 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5000 controller 192.168.1.50 len 23
2014-05-19T10:37:10.473Z [FFE59100 info ‘Default’] Vxlan: set VNI 5000 (mcast proxy: Enabled, arp proxy: Enabled)
2014-05-19T10:37:10.474Z [FFE59100 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5001 controller 192.168.1.50 len 23
2014-05-19T10:37:10.474Z [FFE59100 info ‘Default’] Vxlan: set VNI 5001 (mcast proxy: Enabled, arp proxy: Enabled)

If the controller IP address gets changed and cannot be reverted to the original IP, the /etc/vmware/netcpa/config-by-vsm.xml file on the ESX host can be edited to add the new controller IP address.
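
After editing the file, the netcpa agent has to be restarted for the change to take effect, and the controller connection can then be re-checked; a sketch below, noting that the init script was named netcpad on the hosts I worked with:

~ # /etc/init.d/netcpad restart
~ # esxcli network vswitch dvs vmware vxlan network list --vds-name Nebula-Networks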

While this issue may be quite simple and something that can easily happen in a large network, I hope you found the approach to the problem useful. Feedback welcome!

Unicast VXLAN: Integrating NSX and vCloud Director

One of the challenges with implementing VXLAN is configuring multicast on the physical switches to support BUM traffic. With the release of NSX for vSphere, VXLAN can be deployed in Unicast mode with the help of the NSX controller. In this article we will look at deploying VXLAN with NSX and integrating it with vCloud Director to create logical networks.

This article assumes that vCloud Director is already installed; vCloud Director should be version 5.5.0 or higher.

Deploying NSX Manager

First, download the NSX Manager and deploy the OVF image on the management cluster. The Deploy OVF wizard will require IP address details for the NSX Manager. Once the NSX Manager is deployed, connect to its web UI to register the NSX Manager with vCenter Server and the vCenter Lookup Service. Once NSX Manager is successfully registered with vCenter, the Networking & Security tab is displayed in the Web Client. All NSX configuration will be done using the Web Client. Under the Networking & Security plugin the NSX Manager should be listed.

Note: An important caveat here is to use a user account with administrator privileges to register vCenter, else the NSX Manager will not be displayed in the Web Client.

Deploying the NSX Controller

The next step is to deploy the NSX controller. Under the NSX Controller nodes section, click the '+' to add the first NSX controller. The NSX controller provides the control plane that distributes network information down to the ESX hosts. The controller can be clustered by deploying additional controllers to support a scale-out architecture and high availability.
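
Once deployed, the controller's health can be verified from the controller's own command line; a quick sketch, assuming console or SSH access to the controller node:

# show control-cluster status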

Prepare ESX hosts for VXLAN

To start deploying logical networks, the ESX hosts need to be prepared for VXLAN. Under Installation->Host Preparation, click Install against the cluster that will be prepared. The installation process pushes the VXLAN VIBs to the ESX hosts and enables the Distributed Firewall.
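
The pushed VIBs can be confirmed from the ESX host; a minimal sketch, noting that the VIB name (esx-vxlan on the builds I used) may differ between releases:

~ # esxcli software vib list | grep vxlan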

Once the ESX hosts go into 'Ready' status we can configure VXLAN. Select the virtual distributed switch and, if VXLAN traffic needs to be isolated in a VLAN, enter the VLAN number. The default MTU of 1600 should suffice. Either an IP pool or DHCP can be used to assign IP addresses to the VTEP interface that gets created as part of the VXLAN configuration. Finally, select the teaming policy for VTEP load balancing and high availability.
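
Once the configuration completes, the VTEP shows up as an additional vmkernel interface on each prepared host; a quick way to confirm it picked up an address from the IP pool or DHCP (the interface name will vary per host):

~ # esxcli network ip interface ipv4 get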

Logical Network Preparation

Once the ESX hosts are prepared and ready, the transport zone and Segment ID pool need to be created. Under Logical Network Preparation->Segment ID, enter the Segment ID pool. We do not need multicast IP addresses since we will use Unicast as the control plane mode.
Typically a transport zone and the control plane mode are also defined at this stage, but since we are integrating with vCloud Director we will allow vCloud Director to create the transport zone.

vCloud Director Configuration

At this stage there should be a VXLAN network pool created by default in vCloud Director; it will be in an error state since VXLAN was not pre-configured. Right-click the network pool and choose Repair to recreate the transport zone. Once complete, there should now be a transport zone created under the Logical Network Preparation tab. Edit the transport zone and change the control plane mode to Unicast.

Consuming Logical Networks

The network pool that was created can now be assigned to a vCloud virtual datacenter. To start consuming VXLAN logical networks, an Edge Gateway and a routed network need to be deployed within the organization. When the routed network is created, a logical switch with the segment ID is created as a port group in vCenter. Virtual machines can now be deployed, connected to the routed network, and use VXLAN as the underlying transport.

I hope you found this article useful. Questions or comments are welcome!

Configuring CA signed certificates for vCloud Director

I’ve seen a lot of questions around configuring CA signed certificates with vCloud Director and how to avoid those “Cryptographic Errors” when starting the vCloud Director service. In this post I hope to go through the configuration and set up vCloud Director to use CA signed certificates.

Before we get into the details, some information about the environment: I have an intermediate certificate server and a root certificate server. We will be issuing certificates from the intermediate certificate server but will require the complete chain to be imported into the vCloud Director keystore.

Using keytool, we will create the keystore and generate the CSRs. It’s recommended to use the keytool shipped with vCloud Director, which is located at /opt/vmware/vcloud-director/jre/bin/keytool.

1. Create the keystore and generate a certificate for the http service

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -genkey -keyalg RSA -dname "CN=nebula1.area88, OU=Cloud, O=VMware, L=BNG, ST=KAR, C=IN" -alias http

2. Create the certificate signing request file for the http service

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -certreq -alias http -file http.csr

3. Generate a certificate for the console proxy service

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -genkey -keyalg RSA -dname "CN=nebula1.area88, OU=Cloud, O=VMware, L=BNG, ST=KAR, C=IN" -alias consoleproxy

4. Create the certificate signing request for the console proxy service

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -certreq -alias consoleproxy -file consoleproxy.csr

There should now be two CSR files created, for the http and console proxy services, which need to be submitted to the CA. You can copy the contents of a CSR file by running ‘cat’ on the file.

5. On the intermediate certificate server or the root certificate server, navigate to the web enrollment page, http://hostname/certsrv. Select Request a certificate, choose Advanced certificate request, and select Submit a certificate request by using a base-64-encoded CMC or PKCS #10 file.

6. Paste the contents of the http CSR file into the Saved Request field. Select Web Server for the certificate template and click Submit.

7. Select DER encoded and Download the certificate

8. Repeat steps 6 & 7 for the console proxy CSR

9. With the certificates for http and console proxy generated, we also need the root certificate and the intermediate certificate. From the web enrollment page, download the root certificate and the intermediate certificate.

10. Now that we have the certificate chain we need to import the complete chain into the keystore.

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -import -alias root -file rootca.cer

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -import -alias intermediate -file interca.cer

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -import -alias http -file http.cer

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -import -alias consoleproxy -file consoleproxy.cer

11. We can verify that all the certificates have been imported into the keystore using the command below. You can also validate the thumbprint of each certificate in the keystore against the downloaded certificate.
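
A minimal sketch of that verification, using the same keystore path and password as in the earlier steps:

/opt/vmware/vcloud-director/jre/bin/keytool -keystore /opt/vmware/vcloud-director/etc/certificates.ks -storetype JCEKS -storepass vmware123 -list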

12. Run the vCloud Director configure script and provide the path to the keystore file.

/opt/vmware/vcloud-director/bin/configure

13. Access the vCloud Director web page and notice that the certificate should be trusted.

Michael Webster has already written a blog post on configuring CA signed certificates for vShield Manager, which you can find here.

I hope this blog post is useful and helps overcome the certificate pain that we run into. Thank you for reading.