Promiscuous mode with NSX Distributed firewall

TL;DR – When running appliances that require promiscuous mode it’s advised to place the appliance in the DFW exclusion list, especially if the appliance is a security hardened virtual appliance

Last year I was working with a customer that had rolled out NSX to implement micro-segmentation. This customer is a mid-sized customer that is integrating NSX into their existing production vSphere environment. The customer had decided to implement security policy based on higher level abstractions i.e. security groups but also decided they need security policy enforced everywhere, using the Applied To column set to Distributed Firewall. Not an ideal implementation but perhaps that discussion is for another blog post

The environment being a brownfield environment, the customer wanted to implement security policy to the existing virtual appliances. They were using F5 virtual appliance for load balancing and had specific requirements to configure promiscuous mode on the F5 port group to support VLAN groups(https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/product/relnotes_ltm_ve_10_2_1.html)

Long story short, by configuring the F5 port groups in promiscuous mode they were facing intermittent packet drops across the environment. We were quickly able to identify that the workloads impacted were the workloads that resided on the host that had the F5 virtual appliances. As we could not power off the appliance we decided to place the F5 appliance in an exclusion list and like magic, everything returned to normal, no packet drops.

Well now that we understood what was causing the issue but as with all things we needed to understand why this behaviour was seen. We started by picking a virtual machine on the F5 host and turned up logging on the rule that allowed traffic for a particular service, in this case, SSH. We also added a tag to that rule so we could filter the logs.

We started to follow an SSH session that we initiated from 10.68.0.250 to 10.68.145.170. We were seeing the SYN from 10.68.0.250 reaches 10.68.145.170 and this is allowed by the firewall as rule 1795 permits it but interestingly enough we were also seeing the same traffic flow on a different filter, in this case, 53210

 

/var/log/dfwpktlog.log
T01:29:48.363Z 28341 INET match PASS domain-c7/1795 IN 52 TCP 10.68.0.250/65448->10.68.145.170/22 S 12345_ADMIN
T01:29:48.363Z 53210 INET match PASS domain-c7/1795 IN 52 TCP 10.68.0.250/65448->10.68.145.170/22 S 12345_ADMIN

 

Note: You can get the filter hash for a virtual machine by running vsipioctl getfilters. Each vNIC gets a filter hash

 

[root@esx101:~] vsipioctl getfilters

Filter Name : nic-11240904-eth0-vmware-sfw.2
VM UUID : 50 1e 02 7b b0 88 05 47-c2 ed dd 92 22 93 12 1d
VNIC Index : 0
Service Profile : –NOT SET–
Filter Hash : 28341

 

Looking at the output of vsipioctl getfilters we identified that the duplicate filter (53210) belonged to the F5 virtual appliance.

We also captured packets at the switchport of the virtual machine and we see 10.68.145.170 sending a SYN/ACK back to the client, this packet was also seen on both filters. The subsequent frame shows the client immediately sending a RST packet to close the connection. This was a bit strange as the firewall rule allows this traffic and there was no other reason why the client would immediately send a RST packet

Screen Shot 2018-01-17 at 2.07.11 PM

 

Interestingly the dfwpktlogs had recorded packets on the duplicate filter hitting a different rule (rule 1026) and this rule had a REJECT action so a RST packet would be sent out for TCP connections


/var/log/dfwpktlog.log
T01:29:48.364Z 53210 INET match REJECT domain-c7/1026 IN 52 TCP 10.68.145.170/22->10.68.0.250/65448 SA default-deny
T01:29:48.364Z 53210 INET match REJECT domain-c7/1026 IN 40 TCP 10.68.145.170/22->10.68.0.250/65448 R default-deny

 

Why would traffic be hitting a different rule when the initial SYN was evaluated by rule 1795 ??

So, when a virtual machine is connected to a promiscuous portgroup its filter is also in promiscuous mode and hence its filter would receive all traffic. In this particular case, the F5 appliance was placed in security group that had its own security policy applied to it. Duplicated traffic that was sent to this filter was being subjected to this security policy. The security policy had a REJECT action configured and hence a RST packet was being sent to the client

Having identified this, we provided the customer with a couple of options to mitigate the issue.

I hope you found this article to be as interesting as this issue was to me.Until next time !!

VMware VDS link aggregation enhanced support

Recently while working on a NSX design we decided to use LAG from the ESX hosts to the ToR leaf switches. After configuring LACP in active mode on the physical switches we moved on to configure LACP on the distributed virtual switch. All LACP configuration for the VDS has to be done from the vSphere WebClient but looking at the WebClient we could not find the LACP option on the VDS. After some digging around we figured that the distributed virtual switch was created using the VMware C# client. A distributed virtual switch created from the C# client has only basic support for LACP.

To use advanced LACP options click the “Enhance” option in the distributed virtual switch features box from the vSphere WebClient. This allows us to choose the load balancing algorithm, LACP mode and also creates the link aggregation group with the ESX uplinks.

lag

These are the enhanced LACP features that VDS supports:

  • Support for configuring multiple link aggregation groups (LAGs)
  • LAGs are represented as uplinks in the teaming and failover policy of distributed ports or port groups. You can create a distributed switch configuration that uses both LACP and existing teaming algorithms on different port groups.
  • Multiple load balancing options for LAGs.
  • Centralized switch-level configuration available under Manage > Settings > LAC