Adding UCS vNICs causes vmnic re-order on vSphere host

I recently stumbled upon this issue when adding some new vNICs for use with a new iSCSI storage array.  I followed the usual order of operations for our UCS B-series blades: create new vNIC templates, add the vNICs to the LAN Connectivity Policy in UCS Manager, put the first vSphere host in the cluster into maintenance mode, shut it down, and then hit the blinky “pending actions” light in UCS to reboot the corresponding service profile.  Once the host came back up, I took it out of maintenance mode and let DRS migrate some VMs back over.  Only this time, vMotion kept failing for the VMs trying to move back to the updated host.

Luckily, I was already working with Cisco TAC on another issue when this reared its ugly head.  We took a look and eventually saw that, after adding the two vNICs and rebooting, the vmnics had come up a little bit out of order in vSphere:

[Screenshot: vmnic list in vSphere, with vmnic8 highlighted and showing a MAC address containing aa:a4]

In UCS, I had created the MAC pools so that each address references the fabric (A or B) and the corresponding vmnic number.  Highlighted above, we saw that vmnic8 was showing a MAC address with aa:a4, which meant it “should” actually be vmnic4 rather than vmnic8.  It just so happens that this vmnic was one of my vMotion uplinks.  TAC mentioned that this was a known issue that had yet to be resolved between Cisco and VMware.  They pointed me to a Cisco bug article that referenced a VMware KB for the fix:
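To make the MAC-based check concrete, here is a minimal sketch of how the expected vmnic name could be derived from a MAC suffix. It assumes a pool scheme where the first hex digit of the last octet is the fabric (a or b) and the second is the intended vmnic number, which is my reading of the aa:a4 example; the function names `expected_vmnic` and `check_vmnic` are hypothetical, and your own MAC pool layout may well differ.

```shell
#!/bin/sh
# Assumption: last MAC octet = <fabric a|b><vmnic number as one hex digit>,
# e.g. ...:aa:a4 -> fabric A, vmnic4. Adjust to your own MAC pool scheme.

# Print the vmnic name implied by a MAC address (or just its trailing octets).
expected_vmnic() {
    suffix=$(printf '%s' "${1##*:}" | tr 'A-F' 'a-f')   # final octet, lowercased
    case $suffix in
        [ab][0-9a-f]) ;;                                # fabric digit + number digit
        *) echo "unrecognized MAC suffix: $suffix" >&2; return 1 ;;
    esac
    echo "vmnic$((0x${suffix#?}))"                      # second hex digit as decimal
}

# Compare the name the host assigned with the name the MAC implies.
check_vmnic() {
    want=$(expected_vmnic "$2") || return 1
    if [ "$1" = "$want" ]; then
        echo "$1 ok"
    else
        echo "$1 should be $want"
    fi
}

# The case from this post: vmnic8 carrying a MAC with aa:a4.
check_vmnic vmnic8 aa:a4
```

Running it against each vmnic/MAC pair from the host's NIC list gives a quick view of which aliases have shifted.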

How VMware ESXi determines the order in which names are assigned to devices

The KB says to SSH into the vSphere host with the ordering issue and run “localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list” to output the vmnic aliases and their corresponding bus addresses:

[Screenshot: output of the alias list command, showing the vmnic aliases and their bus addresses]

Per the article, you change the alias assigned to a particular bus address by running another localcli command.  In this instance, I ran the command:

localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type pci --alias vmnic4 --bus-address s00000000:07:00

This basically tells the host to assign the alias vmnic4 to the bus address that is currently assigned to vmnic8.  As the first screenshot shows, what came up as vmnic8 really should be vmnic4 based on its MAC address.

The command had to be run four more times, once per remaining shifted vmnic, each time using the same localcli syntax with the bus address of the device being renamed:

change alias vmnic4 to vmnic5
change alias vmnic5 to vmnic6
change alias vmnic6 to vmnic7
change alias vmnic7 to vmnic8
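Since the same command gets repeated with different values, a small dry-run helper can cut down on typos. The sketch below only prints the commands so they can be reviewed before being pasted into the host; the function name `print_alias_cmds` and the `bus-address=alias` argument format are my own invention, and only the first bus address comes from this post. The other addresses have to be read from your own alias list output.

```shell
#!/bin/sh
# Dry run: print (do not execute) one "alias store" command per argument.
# Each argument has the hypothetical form <bus-address>=<new-alias>.
PLUGIN_DIR=/usr/lib/vmware/esxcli/int/

print_alias_cmds() {
    for pair in "$@"; do
        case $pair in
            *=*) ;;
            *) echo "expected <bus-address>=<alias>, got: $pair" >&2; return 1 ;;
        esac
        addr=${pair%%=*}          # text before the first "="
        new_alias=${pair#*=}      # text after the first "="
        printf 'localcli --plugin-dir %s deviceInternal alias store --bus-type pci --alias %s --bus-address %s\n' \
            "$PLUGIN_DIR" "$new_alias" "$addr"
    done
}

# The one mapping documented in this post; append your own pairs for the rest.
print_alias_cmds "s00000000:07:00=vmnic4"
```

Printing instead of executing is deliberate: a wrong alias means another reboot to find out, so it is worth eyeballing the full list against the alias list output first.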

vmnic9 was correct, so after making all the alias changes, the host was rebooted again and came back up showing the proper order:

[Screenshot: vmnic list in vSphere after the reboot, showing the proper order]

The host came out of maintenance mode, vMotions worked fine and all was right with the world.

I haven’t done much digging to see whether this is indeed a known issue tied to certain versions of vSphere/UCS.  If anyone else has run into this before, or has any info beyond what I heard from TAC (that it has yet to be worked out between VMware and Cisco), definitely post a comment with any updates.
