vRA 7.5 Software Components – “Failure executing script ‘10_installsoftware.bat’”

Yet another post that’s ahead of finishing my AppDefense Series… but worth sharing nonetheless. I went down a rabbit hole for quite a while on this issue and wanted to detail what I’d found.

The Background / Setup

In my environment there is a requirement to support a wide range of operating systems. To reduce blueprint sprawl, I wanted a way to dynamically select the template to clone from using a vRO workflow. I found just the thing in this post by Dennis Derks (Twitter). One important note about that blog post which wasn’t immediately clear to me: the CloneFrom custom property it mentions is a built-in custom property. I did not know this and struggled a bit after creating a differently named one and trying to code the template selection. Don’t do that; use the built-in property.

In addition to Dennis’ blog post, I wanted a way to dynamically update my Guest Customization Specifications during the selection. My goal was to get to a single blueprint for “Base OS” deployment. I added a bit to the tail end of Dennis’ scripting in a different workflow specifically for updating the built-in CloneSpec custom property.

// Mapping CustSpec by information in CloneFrom
var CloneFrom = vCACVmProperties.get("CloneFrom");
System.log("CloneFrom = " + CloneFrom);
if (CloneFrom.indexOf("Windows") !== -1) {
    System.log(CloneFrom + " Windows Evaluated");
    var CloneSpec = "Tenant Config - Windows";
    // Updating CustSpec Custom Property
    var MyProperties = new Properties();
    MyProperties.put('CloneSpec', CloneSpec);
    virtualMachineAddOrUpdateProperties = MyProperties;
}
else if (CloneFrom.indexOf("CentOS") !== -1) {
    System.log(CloneFrom + " CentOS Evaluated");
    var CloneSpec = "Tenant Config - Linux";
    // Updating CustSpec Custom Property
    var MyProperties = new Properties();
    MyProperties.put('CloneSpec', CloneSpec);
    virtualMachineAddOrUpdateProperties = MyProperties;
}
else if (CloneFrom.indexOf("RHEL") !== -1) {
    System.log(CloneFrom + " RHEL Evaluated");
    var CloneSpec = "Tenant Config - Linux";
    // Updating CustSpec Custom Property
    var MyProperties = new Properties();
    MyProperties.put('CloneSpec', CloneSpec);
    virtualMachineAddOrUpdateProperties = MyProperties;
}
else if (CloneFrom.indexOf("OS X") !== -1 || CloneFrom.indexOf("macOS") !== -1) {
    System.log(CloneFrom + " macOS/OSX Evaluated");
    System.log("No properties need updating. Blueprint is set to default to a blank Guest Customization field.");
}

It’s nothing special, but it’s the beginning of something that I can recycle by adding in some Tenant- or Blueprint-specific if statements. In fact, I just noticed I need to condense CentOS/RHEL into the same if statement…
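
As a rough idea of what that condensing could look like, here is a minimal sketch that moves the matching into a lookup table. The function name selectCloneSpec and the SPEC_BY_OS table are my own illustration and not part of vRA or vRO; only the match strings and spec names come from the script above. It is written in plain ES5 so it should sit comfortably in a vRO scriptable task, but treat it as a starting point, not a tested workflow.

```javascript
// Hypothetical lookup table: substrings of the template name -> Customization Spec.
// The match strings and spec names are taken from the script above.
var SPEC_BY_OS = [
    { match: ["Windows"], spec: "Tenant Config - Windows" },
    { match: ["CentOS", "RHEL"], spec: "Tenant Config - Linux" },
    { match: ["OS X", "macOS"], spec: null } // blueprint defaults to a blank spec
];

// Returns the spec name for a template, or null when nothing should be updated.
function selectCloneSpec(cloneFrom) {
    if (!cloneFrom) {
        return null;
    }
    for (var i = 0; i < SPEC_BY_OS.length; i++) {
        var entry = SPEC_BY_OS[i];
        for (var j = 0; j < entry.match.length; j++) {
            if (cloneFrom.indexOf(entry.match[j]) !== -1) {
                return entry.spec;
            }
        }
    }
    return null;
}
```

In the workflow itself, a non-null result would be put into the Properties object and assigned to virtualMachineAddOrUpdateProperties exactly as the original script does; a null result means there is nothing to update. Adding another OS family then becomes a one-line change to the table.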

I tested this for most every system with success. The biggest issue I found was related to macOS/OS X systems which need to run on Apple hardware. After some code and some troubleshooting, I learned that the __reservationPolicyID hidden property cannot be manipulated after the machine has allocated resources. I threw my shoddy JavaScript away and instead exposed a custom property, ReservationPolicyID, in my blueprint. I tied it to the getApplicableReservationPolicies built-in action in vRO (under com.vmware.vra.reservations) which allows the user to effectively select a Mac or Windows Reservation Policy. After this, testing Base OS deployments functioned properly.

The Problem / Symptoms – Software Components

Now that my Base OS stuff was finished, it was time to start with Software Components. I had already started testing these in a staging network but needed to move that testing to its final home. Once some network access had been opened, I began testing and everything started failing. All of my Software Components were broken. In earlier testing, I had worked with packages stored locally on the system; since then I had changed the components to pull software installs from a network location. I decided to go back to basics.

My new Software Component:

    echo Hi! > C:\Hi.txt

Super simple stuff, right? The Software Component failed.

I first checked the Deployments screen. This showed that the deployment was partially successful, but there was still an error.

The following component requests failed: vSphere__vCenter__Machine_1. Request failed: Machine _machineName_: InstallSoftware : Failure executing script ‘10_installsoftware.bat’ – The system cannot find the path specified..

Heading to Infrastructure > Monitoring > Log, I found the following errors:

InstallSoftwareWorkflow Exception: WorkItem response indicates proxy agent failed to perform task.
DynamicOps.Common.WorkItems.WorkItemResponseException: WorkItem response indicates proxy agent failed to perform task.
at DynamicOps.External.SoftwareWorkflows.InstallSoftwareWorkflow.Error_SendWorkitem(Object sender, EventArgs e)
at System.Workflow.ComponentModel.Activity.RaiseEvent(DependencyProperty dependencyEvent, Object sender, EventArgs e)
at System.Workflow.Activities.CodeActivity.Execute(ActivityExecutionContext executionContext)
at System.Workflow.ComponentModel.ActivityExecutorOperation.Run(IWorkflowCoreRuntime workflowCoreRuntime)
at System.Workflow.Runtime.Scheduler.Run()

On the deployed machine itself, GuestAgent.log showed similar failures: the ‘10_installsoftware.bat’ script could not be executed because it could not be found. In agent_bootstrap.log, I found that the appd.properties file could not be found either. I was sure I’d dealt with that issue before…

The Troubleshooting

I immediately thought to uninstall/reinstall the Guest Agent and Software Bootstrap Agent. It seemed that there was something broken and that an uninstall/reinstall could help. To no avail. I searched the firewall for any blocked traffic and found nothing. After several hours of banging my head against a wall, I gave in and asked my fellow vExperts (quick shout out to Christopher Lewis (Twitter), Chip Zoller (Twitter), Steve Kaplan (Twitter), and Sjors Robroek (Twitter)) for guidance. After the obvious things were tested, I opened a Support Request with VMware.

I got my man Randy on the phone and we read through some logs line by line. We couldn’t find much to indicate what was going on. We started at square one with a new blueprint that was hard-set for Template and Guest Customization. We reviewed logs and found that the system was unable to download the nobel-agent.jar file from the vRA appliance. I checked my firewall and found blocks… but I’d checked before and nothing was there. A hundred times! We chalked it up to my not knowing how to query properly and set the SR to auto-close.

I resumed testing today after having some firewall ports opened up. I found that my Software Components were still failing! I immediately went to square one and built up. As soon as I added the CloneFrom drop-down, all of my Software Components failed.

I looked at my systems side by side and finally things started to come together. Looking at the GuestAgent.log I found some similar entries.

[Image: Windows-Success-Install. A successfully deployed Software Component from static-assigned Template/GuestCust]
[Image: Windows-Failed-Install. Failed Software Component from dynamic Template/GuestCust]

The Solution

Notice that in the failed image, the vra.software.command is /usr/bin/python, which is not something a Windows guest should be running – that’s when it hit me. In the blueprint, I’m using a custom property that’s shown in the request as a drop-down. To save the blueprint, you still have to select a Template to clone from…

[Image: Windows-Success-Default-Template. Template selected for the static-assigned Blueprint]
[Image: Windows-Failed-Default-Template. Template selected for the drop-down Blueprint]

Earlier I referenced some code that I had worked on to set the __reservationPolicyID hidden property. The outcome of that work was that it can’t be set via the Event Broker Service, because vRA consumes its value beforehand. My best guess is that the same thing happens here: the CloneFrom property does the job of actually cloning from the selected template, but the Software Component settings appear to have already been derived from the template saved in the blueprint.

I flipped my drop-down blueprint to a Windows-based OS and my Software Components completed (or at least initiated)! This explains why I never saw anything in the firewall in previous attempts. The failures were occurring before the system even reached out to the vRA Appliance to download the nobel-agent.jar file.
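
The mismatch I eventually spotted by eye can be written down as a small sanity check. This is only a sketch under my own assumptions: the function names are hypothetical, it assumes Windows templates have “Windows” in their name (as in my mapping script), and it treats any command starting with “/” as a Unix-style path such as /usr/bin/python. How you pull the vra.software.command value out of the log is left to you.

```javascript
// Treats any command that starts with "/" as a Unix-style path (e.g. /usr/bin/python).
function isUnixStyleCommand(command) {
    return typeof command === "string" && command.indexOf("/") === 0;
}

// Returns true when the software command does not fit the template being cloned.
// Assumption: Windows templates contain "Windows" in their name.
function isCommandMismatch(cloneFrom, softwareCommand) {
    var templateName = String(cloneFrom || "");
    var isWindowsTemplate = templateName.indexOf("Windows") !== -1;
    // A Windows template paired with a Unix-style command (or the reverse) is a mismatch.
    return isWindowsTemplate === isUnixStyleCommand(softwareCommand);
}
```

Had I run something like this against the template chosen in the drop-down and the command shown in the failed log, the Windows template paired with /usr/bin/python would have been flagged straight away.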

“Ok, Mr. Man. Why didn’t you see anything in the firewall when you installed/uninstalled the Guest Agent/Bootstrap?” Simple answer – the template is still connected to a network that had access. The Blueprint gives the deployed VM a different network. I just started moving too fast to see some of the signs – rookie mistake. I should have been more thorough in my troubleshooting.

Next Steps

It seems that I’ve finally figured out where the ghost in the system is. Short-term, I know how to continue testing my Software Components. Long-term, I need to identify the best way to solve this problem. The drop-down blueprint works perfectly fine if I don’t have software to install – I’ll keep that for Blank OS requests.

The question now is whether I refactor some code in order to account for the Blueprint ID in the workflow or whether I do something different. I’m sure I’ve exposed some of the wrong ways to do things in this post. Happy to hear how others are doing things!
