Troubleshooting Citrix Machine Creation Services (MCS) Stalling at 50%
Here is another great write-up by Jacques Bensimon.
On a recent occasion at a client, after building and customizing a Windows Server 2016 RDS image (on VMware ESXi, not that this particularly matters here) and installing the Citrix Virtual Delivery Agent (VDA) for Server OS with a view toward deploying it via Machine Creation Services (MCS), it came time to actually perform the MCS deployment. Generally speaking, this is a very reliable and trouble-free process, requiring only that the Citrix Virtual Apps & Desktops (CVAD) site have a properly configured so-called “hosting connection” to the target hypervisor cluster (including sufficiently permissioned credentials) and that the user account running the machine catalog creation wizard in Citrix Studio have sufficient permissions to create machine accounts in the selected AD OU, both of which were the case here.
After going through the steps to create a new machine catalog, specifying the MCS option and selecting a fresh snapshot of the (now powered off) source VM, the number of desired machines, their naming convention, and target OU, I saw the MCS process start as always with its first step, “Copying the master image”, and its progress bar quickly reach about the halfway mark … and then appear to stop. Not being familiar with this particular vSphere environment (storage speed, activity, etc.), I wasn’t sure how long the process should take, so I let it run overnight (Note: closing Studio would have immediately canceled the machine catalog creation, so I only locked and disconnected my remote session on the DDC). By morning, however, the progress bar hadn’t moved at all, yet there was no error or timeout indication on screen.
If you’re not familiar with the under-the-covers activity involved in the MCS process: a key step after making a copy of the source image (disk) is to attach it to a dynamically created temporary “preparation VM” (with a name that starts “Preparation – catalog_name …”) and then boot up that VM off-network to perform a few image preparation steps directed by an additional small “instructions disk” also attached to the VM. Image preparation mostly entails making certain that DHCP is enabled on all network adapters and re-arming both Windows & MS Office KMS activation.

Well, looking in vCenter, there was my preparation VM, already happily booted up, so clearly the image copy step had already succeeded and the issue was that the image preparation step had stalled somewhere; otherwise that VM would already be long gone (image prep is a very quick process, after which the prep VM is powered off and deleted). Now, because this VM is brought up off the network (to avoid name or IP address conflicts), no information about the image preparation steps performed locally (and thus about any encountered issues) is available for reporting by Studio. The only provided troubleshooting avenue (in my case after canceling the stalled catalog creation process, deleting the Preparation VM and its disks, and deleting the resulting “broken” catalog) is to restart the MCS machine catalog creation process from scratch after adding the following Registry entry to the source VM (i.e. to the “master image”):
Key: HKLM\Software\Citrix\MachineIdentityServiceAgent
Value name: LOGGING
Type: REG_DWORD
Data: 1
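If you prefer to script this step on the master image, the same value can be added with a few lines of PowerShell (a sketch; run from an elevated session on the source VM):

```powershell
# Enable MCS image preparation logging (produces C:\image-prep.log on the prep VM).
$key = 'HKLM:\SOFTWARE\Citrix\MachineIdentityServiceAgent'
if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
New-ItemProperty -Path $key -Name 'LOGGING' -PropertyType DWord -Value 1 -Force | Out-Null
```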
It is normally also necessary to configure the CVAD site with a temporary setting that prevents the preparation VM from being shut down and deleted after use (or after timeout), so that the log file created as a result of the above Registry entry can be examined by logging on to the VM’s console. This is done by running the following PowerShell command on a DDC (after loading the Citrix snap-ins with Add-PSSnapin Citrix*):
Set-ProvServiceConfigurationData -Name ImageManagementPrep_NoAutoShutdown -Value $True
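Having set it, you can confirm the override is actually in place; a filtered listing of the provisioning service configuration does the job (a sketch, again run on a DDC):

```powershell
# Load the Citrix snap-ins, then pick out the no-auto-shutdown override if present.
Add-PSSnapin Citrix*
Get-ProvServiceConfigurationData |
    Where-Object { $_.Name -eq 'ImageManagementPrep_NoAutoShutdown' }
```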
However, I saw no need to perform this step, since I had already determined that the preparation VM wasn’t being shut down in this particular case, whatever the reason. So, having added the Registry entry to my image and kicked off the machine catalog creation process again, I waited a few minutes for the progress bar to stall as before, waited a few more minutes to give the image preparation process the chance to proceed far enough to produce a useful log, then logged on to the prep VM’s console to look at the log file C:\image-prep.log. Here’s a photo (no network, no e-mail!):
Huh?? No visible errors and the last line stating “Completed image preparation”?! So then why the heck … wait a minute, wait a minute: is it possible that the above-mentioned ImageManagementPrep_NoAutoShutdown site setting had been set on some previous troubleshooting occasion and never removed? So off to run some PowerShell on a DDC, and here’s the result:
Sure enough, as confirmed by the Get-ProvServiceConfigurationData cmdlet, the “no auto-shutdown” setting was indeed the cause of my stalled MCS issue. (In retrospect, it was also weird that the normal 20-minute image preparation timeout had never kicked in, but I doubt I’d have suspected the cause even if I’d been conscious of it – it would have seemed more likely that “something” was preventing the forcible shutdown and deletion of the prep VM).
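If you suspect a similar leftover in your own site, listing all provisioning service configuration overrides on a DDC makes such stragglers easy to spot (a sketch, assuming the Citrix snap-ins are available):

```powershell
# An empty result means the site is running on defaults;
# any entries returned are explicit overrides someone has set.
Add-PSSnapin Citrix*
Get-ProvServiceConfigurationData
```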
After removing the undesirable site setting (as shown above) using the command
Remove-ProvServiceConfigurationData -Name ImageManagementPrep_NoAutoShutdown
and removing the LOGGING Registry entry from the master image, a fresh MCS machine catalog creation attempt succeeded in about 5 minutes flat!
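For completeness, the temporary LOGGING value can be stripped from the master image the same way it was added (a sketch; run elevated on the source VM before taking the final snapshot):

```powershell
# Remove the image preparation logging switch from the master image.
Remove-ItemProperty -Path 'HKLM:\SOFTWARE\Citrix\MachineIdentityServiceAgent' `
    -Name 'LOGGING' -ErrorAction SilentlyContinue
```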