vRealize Automation 7.x Directory Sync Failure

tl;dr – In the event of Directory Sync failure, check the following two files on the vRA appliance for the proper Domain and Domain Controller information:

/usr/local/horizon/conf/domain_krb.properties and
/usr/local/horizon/conf/states/TENANTNAME/####/config-state.json

If stale records exist, remove them and restart the appliance. If you use an external vIDM, you may need to search for similar files there.

Scenario –

I have a vRealize Automation Appliance that hosts several tenants with identical directory configurations. Directory sync completes correctly for all tenants but one. When I attempted a manual sync on that tenant, I was met with the following error:

[Image: Nondescript Error]
That’s it. Seriously – that little red box was all. Super descriptive, right!?

To troubleshoot, I opened every different browser I had on hand. I even installed a new one with hopes that an error message would appear in that box – no dice! At this point, I didn’t really know what I was looking for. I rebooted the vRA Appliance a few times and was still met with the same issue. I opted to open a Support Request.

As part of that Support Request, Randy (VMware Support) and I verified a few easy things. I compared Directory configuration across tenants and confirmed that the bind account functioned properly. Eventually, we took a look at the connector.log file found in /storage/log/vmware/horizon/connector.log and didn’t find much worthwhile. We then followed the log with less +F while I reproduced the above-referenced error. To my delight, the log received some action.

In the log, I found the following snippets (note – these aren’t contiguous entries, but relatively close to one another) –

2018-02-05 22:43:30,974 INFO  (SimpleAsyncTaskExecutor-41084) [3010@TENANTNAME;username@TENANTNAME;127.0.0.1] com.vmware.horizon.directory.ldap.LdapConnector - Attempting to bind to sunset-dc.mueller-tech.com:389

2018-02-05 22:43:30,975 INFO  (SimpleAsyncTaskExecutor-41084) [3010@TENANTNAME;username@TENANTNAME;127.0.0.1] com.vmware.horizon.directory.ldap.LdapConnector - LDAP Context env Json Values: {
"java.naming.provider.url" : "ldap://sunset-dc.mueller-tech.com:389",
2018-02-05 21:43:09,100 WARN  (SimpleAsyncTaskExecutor-39489) [3010@TENANTNAME;username@TENANTNAME;127.0.0.1] com.vmware.horizon.directory.ldap.LdapConnector - Failed to connect to sunset-dc.mueller-tech.com:389
javax.naming.CommunicationException: sunset-dc.mueller-tech.com:389 [Root exception is java.net.UnknownHostException: sunset-dc.mueller-tech.com]

This particular tenant was trying to bind to a recently-sunset domain controller and the sync was failing as a result. Elsewhere in the log were details of exactly which users and groups were unable to update/sync.
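Picking the failing host out of a busy connector.log by eye gets tedious, so here is a minimal Python sketch that tallies failed bind targets. It assumes the WARN line format matches the excerpt above; the function name failed_bind_targets is my own and not part of any VMware tooling.

```python
import re
from collections import Counter

# Matches the WARN line format seen in the log excerpt above.
FAILED_BIND = re.compile(
    r"LdapConnector - Failed to connect to (?P<host>[^\s:]+):(?P<port>\d+)"
)


def failed_bind_targets(log_lines):
    """Count failed LDAP connection attempts per (host, port) pair."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_BIND.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts
```

To use it, open /storage/log/vmware/horizon/connector.log and pass the file object straight in; any host that shows up in the result is a candidate stale domain controller.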

While Randy researched on his end, I found some information in the VMware Identity Manager Documentation that stated the domain_krb.properties file needed to be updated manually when DCs were added or removed.

The domain_krb.properties file is located at /usr/local/horizon/conf/domain_krb.properties and contained the following:

##
#Date of Initial Creation
mueller-tech.com=sunset-dc.mueller-tech.com\:389,functioning-dc1.mueller-tech.com\:389,functioning-dc2.mueller-tech.com\:389

I took a quick snapshot of the vRA appliance and edited the file to remove the recently-sunset domain controller reference. Afterward, I issued service horizon-workspace restart and waited for the tenant to come back online. No good fortune. I rebooted the vRA appliance for good luck. Still no dice!
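For anyone who would rather script the domain_krb.properties edit than do it by hand, the following Python sketch removes one domain controller from the text of the file. It assumes every entry has the domain=host\:port,host\:port shape shown above; drop_domain_controller is a hypothetical helper name, and you should still snapshot the appliance first.

```python
def drop_domain_controller(properties_text, stale_host):
    """Return properties_text with every entry for stale_host removed.

    Comment lines (starting with '#') and lines without '=' pass through
    untouched.
    """
    cleaned = []
    for line in properties_text.splitlines():
        if line.startswith("#") or "=" not in line:
            cleaned.append(line)
            continue
        domain, _, controllers = line.partition("=")
        kept = [
            entry
            for entry in controllers.split(",")
            # Entries look like host\:port; compare only the host part.
            if entry.split("\\:")[0].strip().lower() != stale_host.lower()
        ]
        cleaned.append(f"{domain}={','.join(kept)}")
    return "\n".join(cleaned) + "\n"
```

Read the file, pass its contents and the retired hostname (sunset-dc.mueller-tech.com in my case) to the function, and write the result back. Note that if the stale host is the only controller listed for a domain, the sketch leaves an empty value behind, which you would want to handle yourself.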

At the guidance of VMware Support, I looked at the config-state.json file at /usr/local/horizon/conf/states/TENANTNAME/####/config-state.json. In this file, I found more references to the recently-sunset domain controller listed as the “kdc” entry as shown below.

"crossRefs" : [ {
"host" : "mueller-tech.com",
"rootDomainController" : "DC=mueller-tech,DC=com",
"kdc" : "sunset-dc.mueller-tech.com",
"port" : 389,
"forestDn" : "DC=mueller-tech,DC=com",
"netBiosName" : "MUELLER-TECH"
} ],
"unresolvedCrossRefs" : [ ],
"crossRefMap" : {
"DC=mueller-tech,DC=com" : {
"host" : "mueller-tech.com",
"rootDomainController" : "DC=mueller-tech,DC=com",
"kdc" : "sunset-dc.mueller-tech.com",
"port" : 389,
"forestDn" : "DC=mueller-tech,DC=com",
"netBiosName" : "MUELLER-TECH"
}
},
"netBiosNameByCrossRefMap" : {
"MUELLER-TECH" : {
"host" : "mueller-tech.com",
"rootDomainController" : "DC=mueller-tech,DC=com",
"kdc" : "sunset-dc.mueller-tech.com",
"port" : 389,
"forestDn" : "DC=MUELLER-TECH,DC=COM",
"netBiosName" : "MUELLER-TECH"

Comparing this file from the broken tenant to another with a functioning Sync, I found that the “kdc” entry needed to be updated to “ldap.mueller-tech.com” and I made changes accordingly. Another quick service horizon-workspace restart and it was time to test.
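Because the stale “kdc” value appears in several nested places, a recursive walk is less error-prone than hunting for each one in an editor. The Python sketch below updates an already-parsed config-state.json structure in place. It assumes the complete file parses as JSON (the excerpt above is only a fragment), and replace_kdc is a name I made up for illustration.

```python
def replace_kdc(node, stale_kdc, new_kdc):
    """Recursively rewrite "kdc" values equal to stale_kdc.

    Walks nested dicts and lists in place and returns how many
    entries were changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "kdc" and value == stale_kdc:
                # Replacing the value of an existing key is safe mid-iteration.
                node[key] = new_kdc
                changed += 1
            else:
                changed += replace_kdc(value, stale_kdc, new_kdc)
    elif isinstance(node, list):
        for item in node:
            changed += replace_kdc(item, stale_kdc, new_kdc)
    return changed
```

Load the file with json.load, call replace_kdc(state, "sunset-dc.mueller-tech.com", "ldap.mueller-tech.com"), and write it back with json.dump. I have not verified whether the appliance cares about the exact whitespace of the rewritten file, so keep a copy of the original alongside your snapshot.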

DICE! (Wait… is that the opposite of “no dice” or not?) Directory sync finished faster than normal and my users were able to connect as needed.

Hopefully this won’t be needed in future versions of VMware Identity Manager or the vRealize Automation Appliance. The scenario doesn’t seem to be well documented by VMware, so I hope this points someone in the right direction if need be.
