How to Safely Split a Netscaler AGEE HA Pair
Post by Sam Jacobs:
This post explains how to (safely!) split a Netscaler (AGEE) High-Availability (HA) pair.
An AGEE HA pair functions in active/passive mode, which means that all traffic flows through the Netscaler currently marked as primary. As long as the primary Netscaler is up and functioning (and the secondary appliance can confirm connectivity to it), the secondary Netscaler will sit quietly in the background. Should the secondary appliance lose connectivity to the primary, it will assume that the primary is down and will begin handling all traffic. You can also deliberately cause a failover to the secondary appliance via the force failover command.
It is sometimes necessary to temporarily split an AGEE HA pair. This would be done, for example, if you wished to upgrade the appliance firmware. Simply turning off HA, while seemingly intuitive, would be disastrous. With the exception of the individual Netscaler IP (NSIP) address, all other load-balancing and Access Gateway IP addresses are shared between the appliances. If HA is simply turned off, BOTH appliances will assume that they are primary, and will attempt to handle traffic. Duplicate IP addresses will begin appearing on the network, and ARP tables will become corrupt. The result will be that some users might be able to connect, but most will not.
Citrix TV has an excellent video by Ronan O’Brien on splitting an HA pair: http://www.citrix.com/tv/#videos/1414.
As I found out the hard way, however, one simple, yet quite important step was left out (see below) – hence the impetus for this blog post. To safely split the pair, back up the ns.conf file on both appliances, and open a PuTTY session to each. Then, perform the steps below in the order specified:
|On Primary|On Secondary|
| |1. set node -hastatus DISABLED|
| |2. set node -hasync DISABLED|
|3. set node -haprop DISABLED| |
| |4. clear config full|
| |5. save config|
|6. rm node 1| |
|7. save config| |
Step 1 tells the secondary appliance to stop participating in HA.
Step 2 tells the secondary appliance to stop receiving configuration updates from the primary.
Step 3 tells the primary to stop propagating configuration updates to the secondary.
Step 4 clears the entire configuration (make sure you have a backup!) of the secondary, with the exception of the NSIP.
Step 5 saves the now-cleared secondary configuration. Without it, a reboot reloads the previously saved configuration and the secondary reconnects to the primary (this is the step missing in the video above!).
Step 6 removes the secondary node from the primary’s HA configuration.
Step 7 saves the primary configuration.
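Because the steps alternate between two PuTTY sessions, it can help to print them as a checklist that labels each command with the appliance it belongs to. The following is only a minimal Python sketch of that idea, not a tool that talks to the appliances. The command strings are copied from the table above, the appliance assignments follow the step explanations, and the names `Step`, `runbook`, `commands_for` and `clear_is_saved` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    appliance: str  # "primary" or "secondary": the session the command is typed in
    command: str    # copied verbatim from the table in this post


# Appliance assignments follow the step-by-step explanations above.
SPLIT_STEPS = [
    Step(1, "secondary", "set node -hastatus DISABLED"),
    Step(2, "secondary", "set node -hasync DISABLED"),
    Step(3, "primary", "set node -haprop DISABLED"),
    Step(4, "secondary", "clear config full"),
    Step(5, "secondary", "save config"),
    Step(6, "primary", "rm node 1"),
    Step(7, "primary", "save config"),
]


def runbook(steps):
    """Return one checklist line per step, in execution order."""
    ordered = sorted(steps, key=lambda s: s.number)
    return [f"{s.number}. [{s.appliance.upper()}] {s.command}" for s in ordered]


def commands_for(steps, appliance):
    """Return the commands for one appliance, in execution order."""
    ordered = sorted(steps, key=lambda s: s.number)
    return [s.command for s in ordered if s.appliance == appliance]


def clear_is_saved(steps):
    """Check that every 'clear config full' is followed directly by a
    'save config' on the same appliance (the step the video leaves out)."""
    ordered = sorted(steps, key=lambda s: s.number)
    for current, following in zip(ordered, ordered[1:] + [None]):
        if current.command == "clear config full":
            if (following is None
                    or following.appliance != current.appliance
                    or following.command != "save config"):
                return False
    return True


# Print the checklist so it can be kept next to the two sessions.
for line in runbook(SPLIT_STEPS):
    print(line)
```

Calling `clear_is_saved` on a list with step 5 removed returns `False`, which is exactly the omission this post is about.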
You can now update the secondary appliance without worrying that it will affect production users.