tl;dr – If you’ve already patched everything but verification still fails, check whether EVC is enabled on the cluster. I found that EVC did not update itself as described in the KB article. Disabling and then re-enabling EVC (at the same EVC level) allowed the new instructions to be presented to the VMs.
By now, you and the rest of the world know about the Meltdown/Spectre vulnerabilities that were disclosed on 1/3/18 or sometime thereabouts. Earlier this week, VMware released patches and this KB detailing how to apply them.
I had already taken steps to patch the (physical) systems in my environment with available BIOS/firmware updates. When vCenter Server and ESXi patches were released (links to all of which can be found in the VMware Security Advisory here), I added those patches to the pile. Guest OSes had already received patches through other channels. There were probably patches for various lamps and/or lampshade microcode that needed to be applied elsewhere. Read: there are really just a lot of patches to apply to mitigate this mess… moving on!
Being the absolute demon that he is, William Lam has created a script which will report on the vulnerability mitigation capability of ESXi hosts and VMs running on them. He’s documented that script and how to use it on his blog.
Using Mr. Lam’s script, I found that my ESXi hosts were patched properly and successfully seeing the new CPU instructions added by new microcode. The problem I had was identifying exactly why the VMs themselves weren’t receiving the same CPU instructions.
My validation process: create a new VM at VM Hardware version 8 and verify that Mr. Lam’s script reports it as such. Then upgrade the VM Hardware version, power on the VM, and run Mr. Lam’s script again for comparison. The results varied across the environment.
I narrowed the issue down to Enhanced vMotion Compatibility (EVC). On clusters where EVC was not enabled, my validation process showed that the VM was receiving the new CPU instructions. On clusters where EVC was enabled, the VM was not being presented with the new instructions.
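The comparison above can be sketched as a small triage over per-VM results. This is a minimal Python sketch, not Mr. Lam’s script: the record layout, the sample VM names, and the `triage` helper are all hypothetical, and the three feature names (IBRS, IBPB, STIBP) are my assumption for the new CPU instructions being checked.

```python
from dataclasses import dataclass

# Assumed names of the new CPU features delivered by the microcode updates.
NEW_FEATURES = frozenset({"IBRS", "IBPB", "STIBP"})


@dataclass(frozen=True)
class VmReport:
    name: str                 # hypothetical VM name
    evc_enabled: bool         # is EVC enabled on the VM's cluster?
    host_features: frozenset  # new features the ESXi host reports
    vm_features: frozenset    # new features presented to the powered-on VM


def triage(report: VmReport) -> str:
    """Classify one VM by where the new features stop being visible."""
    if not NEW_FEATURES & report.host_features:
        return "host not patched"
    if NEW_FEATURES & report.vm_features:
        return "ok"
    if report.evc_enabled:
        # Host sees the features, VM does not, and EVC sits in between.
        return "suspect EVC"
    return "check VM hardware version"


# Hypothetical sample data mirroring the two kinds of cluster described above.
reports = [
    VmReport("vm-no-evc", False, NEW_FEATURES, NEW_FEATURES),
    VmReport("vm-evc", True, NEW_FEATURES, frozenset()),
]

for r in reports:
    print(r.name, "->", triage(r))
```

The point of ordering the checks this way is that EVC is only blamed once the host itself has been confirmed as patched.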
The VMware KB article above indicates that when ESXi hosts in an EVC-enabled cluster are upgraded, the cluster maintains the current instruction set until all ESXi hosts in the cluster have been upgraded. At that time, EVC will automatically upgrade itself to enable VMs to receive the new instructions. Based on my observations, I needed to test this.
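To make the KB’s described behavior concrete, here is a deliberately simplified Python model of it. It assumes the cluster should expose only the features that every host supports, and it tracks only the new features (assumed here to be IBRS, IBPB, and STIBP), not a real EVC baseline. The function names are hypothetical and nothing here talks to vCenter.

```python
def expected_cluster_features(host_feature_sets):
    """Features the cluster should expose under the KB's description:
    only those supported by every host in the cluster."""
    sets = [frozenset(s) for s in host_feature_sets]
    if not sets:
        return frozenset()
    return frozenset.intersection(*sets)


def stale_features(host_feature_sets, exposed_to_vms):
    """Features that all hosts support but VMs are not being presented."""
    return expected_cluster_features(host_feature_sets) - frozenset(exposed_to_vms)


patched = {"IBRS", "IBPB", "STIBP"}  # assumed feature names

# One host still unpatched: the cluster is expected to hold back the new features.
print(sorted(expected_cluster_features([patched, set()])))

# All hosts patched but VMs see nothing new: the situation I was trying to confirm.
print(sorted(stale_features([patched, patched], set())))
```

If the second call returns a non-empty set for a fully patched cluster, the automatic upgrade described in the KB has not happened.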
All hosts in my EVC-enabled cluster had been patched and were reporting as such both in vCenter and in Mr. Lam’s script. My vCenter Server had also been patched appropriately (which is required for exactly this reason) and reported as such in… well, vCenter. I put a host in Maintenance Mode, removed it from the EVC-enabled cluster, and left it as a standalone host. My validation process on that host was successful! I then moved the host back into the EVC-enabled cluster and performed another validation – vulnerable again!
This confirmed to me that EVC was the culprit blocking my path to successful mitigation. After considering the potential consequences, I disabled and then re-enabled EVC on the cluster (at the same EVC level). Once it was re-enabled, all validation via Mr. Lam’s script passed. Further, Microsoft’s validation code in the Guest OS passed as well. Successful mitigation!
EVC – I won’t let you get the best of me!
Note: This information has been relayed to VMware. I expect that the KB article will be updated at some point in the near future. I’ll update this blog post to reflect that as soon as I’m made aware.