Testing vNUMA awareness and flat vs wide VMs

October 16, 2015

I was reading Mark Achtemichuk’s (VCDX #50) article “Does corespersocket Affect Performance?” and decided to do some experimenting of my own. Since we have a lot of monster VMs (VMs with more than 8 vCPUs), we’ve known that turning off CPU Hotplug (which enables vNUMA awareness) is beneficial in our environment. After reading Mark’s article, I wanted to find out whether doing BOTH changes would have a cumulative effect, or whether one was better than the other.
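
For reference, both changes can be scripted instead of clicked through per VM. Below is a minimal pyVmomi (Python) sketch under a few assumptions: the vCenter address, credentials, and VM name are placeholders, and “flat” is taken here to mean one core per virtual socket (letting the scheduler/vNUMA pick the topology), which may not match the exact layout used in the test.

```python
# Minimal sketch (assumptions: vCenter "vcsa.example.com", VM "Member5",
# and "flat" = 1 core per virtual socket).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; validate certs in production
si = SmartConnect(host="vcsa.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

# Find the VM by name with a container view
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "Member5")
view.DestroyView()

# Both settings require the VM to be powered off
spec = vim.vm.ConfigSpec()
spec.cpuHotAddEnabled = False   # disabling CPU hotplug re-enables vNUMA
spec.numCoresPerSocket = 1      # "flat": one core per virtual socket (assumption)
WaitForTask(vm.ReconfigVM_Task(spec))

Disconnect(si)
```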

The hosts:

All VMs run Windows Server 2012 R2 and initially had 2 sockets / X cores along with 4 GB of RAM.

Member2 = 8 vCPU VM <- This is my control box. Since the ESXi install hasn’t been tweaked to lower the vNUMA threshold from its default of >8 vCPUs, I don’t expect much change here (a sketch for lowering that threshold follows this list).

Member3 = 12 vCPU VM

Member4 = 16 vCPU VM

Member5 = 20 vCPU VM

Member6 = 30 vCPU VM <- This is my blow-out box; it definitely has to cross NUMA boundaries and exceeds the total physical core count. I expect the most dramatic changes here.
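
As an aside to the Member2 note above: by default vNUMA is only exposed to VMs with more than 8 vCPUs, and that threshold is controlled by the numa.vcpu.min advanced setting. Here’s a hedged pyVmomi sketch for lowering it on a single VM; whether the original environment changed this per VM or host-wide isn’t stated, and the threshold value below is just an example.

```python
# Sketch: lower the per-VM vNUMA threshold via an advanced setting.
# "vm" is a vim.VirtualMachine looked up as in the earlier snippet; the
# value 4 is an example only, not what was used in the original test.
from pyVim.task import WaitForTask
from pyVmomi import vim

def lower_vnuma_threshold(vm: vim.VirtualMachine, min_vcpus: int = 4) -> None:
    """Set numa.vcpu.min in the VM's advanced settings (applies at next power-on)."""
    spec = vim.vm.ConfigSpec()
    spec.extraConfig = [vim.option.OptionValue(key="numa.vcpu.min",
                                               value=str(min_vcpus))]
    WaitForTask(vm.ReconfigVM_Task(spec))
```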

I ran IntelBurnTest v2.54 on each VM (don’t forget to enable .NET 3.5 compatibility!), and the runtimes for each configuration are below (lower is better):

| Hostname | Process Name | Initial Runtime | After vCPU Hotplug Disabled | Hotplug Enabled, Cores Changed to “Flat” | After Both Changes Implemented |
| --- | --- | --- | --- | --- | --- |
| Member2 (8 cores) | IntelBurnTest | 119.66 | 119.55 | 118.89 | 119.02 |
| Member3 (12 cores) | IntelBurnTest | 111.12 | 106.11 | 106.98 | 106.33 |
| Member4 (16 cores) | IntelBurnTest | 106.42 | 101.72 | 96.48 | 96.27 |
| Member5 (20 cores) | IntelBurnTest | 129.50 | 101.06 | 93.28 | 92.66 |
| Member6 (30 cores) | IntelBurnTest | 142.19 | 140.02 | 111.45 | 111.47 |

So, a few interesting and unexpected results:

  1. There was a small improvement on the 8 core VM.
  2. Initial runtime for the 12 and 16 core VMs was better than for the 8 core VM. I’m assuming this is because most of the compute is being handled by a single host socket.
  3. I expected the 20 core box’s initial runtime to be worse, since we’re now crossing NUMA boundaries for much more of the compute.
  4. Disabling CPU hotplug barely helped the 30 core VM at all! Flattening the CPU architecture helped this VM far more than vNUMA awareness did!

Summary –

For now I’m going to recommend doing BOTH changes to monster VMs, as the results vary based on how many cores you have allocated. Disabling hotplug mainly helped the 12 to 20 core VMs, whereas going flat helped across the board on all VMs.
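
If you want to find the monster VMs in your own environment that still need these changes, a quick sketch along these lines can flag them (it reuses a connection like the one in the first snippet; the >8 vCPU cutoff mirrors the definition used above):

```python
# Sketch: report monster VMs (>8 vCPU) that still have CPU hot-add enabled or
# more than one core per socket. "content" is si.RetrieveContent() from a
# connection like the one in the first snippet.
from pyVmomi import vim

def audit_monster_vms(content, min_vcpus: int = 8) -> None:
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        cfg = vm.config
        if cfg is None:          # skip inaccessible VMs
            continue
        hw = cfg.hardware
        if hw.numCPU > min_vcpus and (cfg.cpuHotAddEnabled or
                                      hw.numCoresPerSocket > 1):
            print(f"{vm.name}: {hw.numCPU} vCPU, "
                  f"{hw.numCoresPerSocket} cores/socket, "
                  f"hot-add {'on' if cfg.cpuHotAddEnabled else 'off'}")
    view.DestroyView()
```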

I’ve got a call with Mark today, so I’m going to ask for clarity on these results.

One thought on “Testing vNUMA awareness and flat vs wide VMs”

  1. Abhilash

    Nice post! vNUMA should be taken into account when VMs are wide or configured with large memory capacity, which would cause them to span physical NUMA boundaries; this can certainly affect latency-sensitive applications.
