Replacing Expired PSC and VCSA Certificates

Task at hand: Replace the now-expired Machine SSL Certificates of the (still) external PSC and VCSA.

By now, there are several blog posts about how to replace the Machine SSL Certificate using the built-in Certificate Manager tool for the PSC and VCSA. I originally performed this operation after migrating from vSphere 5.5 to vSphere 6.5. If you’re interested, you can read about the migration experience here and here. Some time has passed since then, and I now need to renew those Machine SSL certificates because they have expired.

I had originally reviewed this blog post to determine the best way to manage certificates in my new vSphere 6.5 environment. I recall nightmares of certificate replacements in vSphere 5.x. My security team raised concerns about making the VMware Certificate Authority (VMCA) subordinate to the Corporate CA. I opted for the Hybrid approach: a Corporate CA-signed Machine SSL Certificate for PSC/VCSA and VMCA-signed certificates for all other vSphere-related actions. I followed this walkthrough to help me the first time around. Updating the Machine SSL certificates again follows the same procedure. Easy stuff!

Task Steps:

  • SSH to PSC
  • Run the Certificate Manager tool from /usr/lib/vmware-vmca/bin/certificate-manager
  • Select to Replace Machine SSL Certificate with Custom Signed Certificate
  • Generate CSR and key files
  • SCP the CSR from the PSC/VCSA
  • Create certificate from CSR
  • Create chain certificates for root certificate and Machine SSL certificate
  • SCP back to PSC/VCSA
  • Replace with the Certificate Manager tool
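
The chain-building step above amounts to concatenating PEM files. Here is a minimal sketch in Python, assuming the signed certificate and the CA certificates are all PEM-encoded and that the chain is ordered leaf first, then any intermediates, then the root. The function name and file names are hypothetical, and the expected order should be confirmed against VMware's documentation for your version.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"

def build_chain(cert_paths, out_path):
    """Concatenate PEM certificates, in the order given, into one chain file."""
    blocks = []
    for path in cert_paths:
        pem = Path(path).read_text().strip()
        if PEM_HEADER not in pem:
            raise ValueError(f"{path} does not look like a PEM certificate")
        blocks.append(pem)
    # One certificate after another, separated by newlines.
    Path(out_path).write_text("\n".join(blocks) + "\n")
    return len(blocks)

# Example call (hypothetical file names, not from the original post):
# build_chain(["machine_ssl.cer", "root_ca.cer"], "machine_ssl_chain.cer")
```

Refusing anything without a PEM header is only a cheap sanity check; it does not validate the certificates themselves.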

Problem at hand:

During the replacement of the certificate on my PSC (which, if you’re still using an external PSC, needs to be done before the VCSA), I encountered an error and replacing the certificate failed.

I received the error while the Certificate Manager was looking up all of the services to verify which ones needed to be updated with the new certificate thumbprint. In particular, it encountered a third-party plug-in that it didn’t like. Here’s a bit of what it was showing me:

Get service c2232449-f7cd-4fd5-ac80-dd99a0a51ee3
Update service c2232449-f7cd-4fd5-ac80-dd99a0a51ee3; spec: /tmp/svcspec_z_br_7_1
Get service 3b6f4b0a-5b54-4648-a308-51ac4fbc8459
Update service 3b6f4b0a-5b54-4648-a308-51ac4fbc8459; spec: /tmp/svcspec_cgtwn52b
Get service 2e46cfb0-8999-47e0-9891-585d5d4e0c6d
Don't update service 2e46cfb0-8999-47e0-9891-585d5d4e0c6d
Get service 32457dce-cad3-4f3f-a360-9028d2f80031
Update service 32457dce-cad3-4f3f-a360-9028d2f80031; spec: /tmp/svcspec_op6l724r
Get service a93d658b-d886-46bb-8950-06b3fbc04df4
Don't update service a93d658b-d886-46bb-8950-06b3fbc04df4
Get service a7310628-de94-42c8-b3b0-6d1c0192dddf_com.nimblestorage.hi.h5
Status : 0% Completed [Operation failed, performing automatic rollback]

Error while replacing Machine SSL Cert, please see /var/log/vmware/certificate-manager.log for more information.
Performing rollback of Machine SSL Cert...
Get site nameus : 0% Completed [Rollback Machine SSL Cert...]

This was immediately followed by the same process of looking up services to update and another failure.

Get service a93d658b-d886-46bb-8950-06b3fbc04df4
Don't update service a93d658b-d886-46bb-8950-06b3fbc04df4
Get service 32457dce-cad3-4f3f-a360-9028d2f80031
Don't update service 32457dce-cad3-4f3f-a360-9028d2f80031
Get service a7310628-de94-42c8-b3b0-6d1c0192dddf_com.nimblestorage.hi.h5

Error while reverting certificate for store : MACHINE_SSL_CERT
Rollback Status : 0% Completed [Rollback operation failed]

Error while performing rollback operation, please try Reset operation...

please see /var/log/vmware/certificate-manager.log for more information.
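
In both console excerpts above, the offending entry is the "Get service" line that never receives an "Update service" or "Don't update service" verdict. If the output is long, a small Python sketch like the following can pick it out. It assumes the line format shown above, and the function name is made up for illustration.

```python
import re

# Service IDs end at whitespace or at the ';' that precedes "spec:".
GET_RE = re.compile(r"^Get service ([^\s;]+)")
VERDICT_RE = re.compile(r"^(?:Update|Don['’]t update) service ([^\s;]+)")

def unresolved_services(output):
    """Return service IDs that were looked up but never got a verdict line."""
    pending = []
    for raw in output.splitlines():
        line = raw.strip()
        got = GET_RE.match(line)
        if got:
            pending.append(got.group(1))
            continue
        verdict = VERDICT_RE.match(line)
        if verdict and verdict.group(1) in pending:
            pending.remove(verdict.group(1))
    return pending
```

Run against the captured console text, anything left in the returned list is a candidate for the plug-in that broke the run.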

Looking into the certificate-manager.log file, I found the failure related to a command issued by the Certificate Manager.

2019-06-28T16:04:11.891Z INFO certificate-manager lstool command currently being executed is : ['/usr/java/jre-vmware/bin/java', '-Djava.security.properties=/etc/vmware/java/vmware-override-java.security', '-cp', '/usr/lib/vmware-sca/lib/lookup-client.jar:/usr/lib/vmware-sca/lib/*:/usr/lib/vmware/common-jars/*', '-Dlog4j.configuration=tool-log4j.properties', 'com.vmware.vim.lookup.client.tool.LsTool', 'get', '--no-check-cert', '--url', 'https://psc.mueller-tech.local/lookupservice/sdk', '--id', 'a7310628-de94-42c8-b3b0-6d1c0192dddf_com.nimblestorage.hi.h5', '--as-spec']
2019-06-28T16:04:13.748Z ERROR certificate-manager 'lstool get' failed: 1
2019-06-28T16:04:13.748Z INFO certificate-manager Error while reverting certificate for store : MACHINE_SSL_CERT
2019-06-28T16:04:13.749Z ERROR certificate-manager Error while performing rollback operation, please try Reset operation...
2019-06-28T16:04:13.749Z ERROR certificate-manager please see /var/log/vmware/certificate-manager.log for more information.
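
The log prints the lstool command as a Python-style list, so the failing service ID and lookup URL can be pulled out mechanically. The sketch below assumes the log format shown above (a command line followed by a "'lstool get' failed" line). The helper names are my own and are not part of any VMware tooling.

```python
import ast

MARKER = "lstool command currently being executed is : "

def lstool_args(log_line):
    """Return the argument list logged on this line, or None if there is none."""
    _, found, tail = log_line.partition(MARKER)
    if not found:
        return None
    # The log prints the command as a Python list literal of strings.
    return ast.literal_eval(tail.strip())

def option_value(args, flag):
    """Return the value that follows `flag` in args, or None if absent."""
    if flag in args:
        i = args.index(flag)
        if i + 1 < len(args):
            return args[i + 1]
    return None

def failed_lookups(log_text):
    """Yield (service_id, url) for each lstool call followed by a failure line."""
    last = None
    for line in log_text.splitlines():
        args = lstool_args(line)
        if args is not None:
            last = args
        elif "'lstool get' failed" in line and last is not None:
            yield option_value(last, "--id"), option_value(last, "--url")
            last = None
```

Feeding it the contents of /var/log/vmware/certificate-manager.log would, under those assumptions, list each service whose lookup failed.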

The solution:

I had run into this before! I had opened a ticket with VMware Support about this exact problem (with a different plug-in) in the distant past. The Certificate Manager, while reviewing a list of services to update, identified a problem with the plug-in. In this instance, it happens to be the Nimble Storage plugin for the HTML5 client that the Certificate Manager isn’t happy with.

To resolve this, I needed to unregister the plug-in through the Managed Object Browser. William Lam, the beast that he is, has written this great article about exactly how to do so. I’ve done so in the past, but quickly pulled up William’s article as a refresher.

Update: It seems that VMware released a KB article on this some time after I ran into it.

The new problem:

As I mentioned earlier, the task at hand was to replace an already-expired certificate. As a result of the certificate expiration, I wasn’t able to log in to the vSphere Web Client or the vSphere Client (HTML5 client). To add to it, vRealize Automation and Horizon services were also offline as a result of this mishap.

Do you know what else you can’t log in to when a certificate has expired? The Managed Object Browser. This fact makes it particularly difficult to unregister the extension. It’s a classic chicken-and-egg scenario. To unregister the extension via MOB, I need to replace the certificate. To replace the certificate, I need to unregister the offending extension…

Important note: I’m sharing the following solution both to remind myself for next time and to help anyone who comes across the article. I can provide no support for the actions described below. Proceed in your environment at your own risk or at the guidance of VMware Support.

The new solution:

I ended up on the phone with VMware Support, who had me make some changes to the script used by Certificate Manager. The steps were:

  • SSH to the PSC or VCSA
  • Log in to the bash shell
  • Navigate to /usr/lib/vmware/site-packages/cis (cd /usr/lib/vmware/site-packages/cis)
  • Make a copy of the file certificateManagerHelper.py (cp certificateManagerHelper.py certificateManagerHelper.py.bak)
  • Edit certificateManagerHelper.py (vi certificateManagerHelper.py)
  • You’re searching for the following block:
    if(rc != 0):
        logging.error("'lstool get' failed: {}".format(rc))
        raise Exception("'lstool get' failed: %d" % rc)
  • Edit the block by replacing the second line and commenting out the third line as shown:
    if(rc != 0):
        rc = 0;
        #raise Exception("'lstool get' failed: %d" % rc)
  • Save the file

This edit effectively tells Certificate Manager: when you’re reviewing services and 'lstool get' returns a code that does not equal zero, set that return code to zero and proceed instead of raising an exception.
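
To make the before-and-after behavior concrete, here is a small stand-alone sketch. It is not the code from certificateManagerHelper.py: the function, its parameters, and the idea of a pluggable lookup are all hypothetical. The source also does not show what the real script does with a service after the check, so this version simply records it as skipped.

```python
def update_services(service_ids, lookup, tolerate_failures):
    """Walk the service list; `lookup` returns a return code for each ID."""
    processed, skipped = [], []
    for sid in service_ids:
        rc = lookup(sid)
        if rc != 0:
            if not tolerate_failures:
                # Original behavior: the first failed lookup aborts the run.
                raise RuntimeError("'lstool get' failed: %d" % rc)
            # Patched behavior (assumed): ignore the failure and carry on.
            skipped.append(sid)
            continue
        processed.append(sid)
    return processed, skipped
```

With tolerate_failures set to False, a single bad plug-in registration stops everything, which matches the failure and rollback seen earlier. With it set to True, the remaining services are still processed.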

With this edit completed, I was able to update my Machine SSL certificates without any further issues. It’s a super helpful tip that I want to be sure to remember, and it has significantly less impact than unregistering plug-ins for this operation.

I hope someone finds this helpful!
