When a deployment fails, the usual first step is to read the server logs, work out (or guess) what went wrong, modify the deployment, and re-run it.  If the error is obvious, this is probably the simplest approach.


However, if the logs are not clear, then it is better to debug the deployment while it is running.


First, ensure that you launched the deployment with the option to keep it running "always" (on success and on error).  This prevents SlipStream from terminating the machines when the SlipStream processing ends.

Once the deployment ends (possibly in a failure state), log into the machine via SSH.  The addresses (and URLs) can be found in the runtime properties of each deployed machine. 

On the deployed machine, set up the SlipStream environment for the client:

$ source /opt/slipstream/client/slipstream.setenv


SlipStream tries hard not to disrupt the VMs it deploys, so it does not set up the client environment by default.
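A quick way to confirm that the environment is in place is to check that the client commands resolve in the current shell. This is a minimal sketch: `commands_available` is a hypothetical helper, not part of SlipStream, and it assumes that sourcing the setenv file is what makes the client commands reachable.

```shell
# Hypothetical helper, not part of SlipStream.
# Returns 0 only if every named command resolves in the current shell;
# otherwise reports the first missing command on stderr and returns 1.
commands_available() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || {
            echo "missing: $cmd" >&2
            return 1
        }
    done
}

# Usage on the deployed machine, after sourcing slipstream.setenv:
#   commands_available ss-get ss-set ss-abort && echo "client environment ready"
```

If a command is reported missing, source the setenv file again in the same shell session.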


If your deployment has failed, the abort flag is set and any further calls to SlipStream commands such as ss-get and ss-set will also fail.  This is deliberate: it makes failures surface quickly instead of after potentially long timeouts.  To reset the abort flag, issue the following command:


$ ss-abort --cancel


From this point on, you can re-run the deployment execution scripts, which are stored in files matching "/tmp/tmp*".  Iterate on these scripts until you have understood and corrected the deployment problems.
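One iteration of this loop might look like the sketch below. It uses only standard tools and assumes the scripts are shell scripts and that bash is installed on the machine. Both function names are hypothetical helpers, not SlipStream commands.

```shell
# Hypothetical helpers, not part of SlipStream.

# List entries matching tmp* in a directory (default: /tmp), newest first.
# -d keeps ls from descending into any matching directories.
newest_scripts() {
    dir="${1:-/tmp}"
    ls -dt "$dir"/tmp* 2>/dev/null
}

# Run a script with tracing, so each command is printed (on stderr)
# before it executes. Assumes the script is a shell script.
trace_script() {
    bash -x "$1"
}

# Usage:
#   newest_scripts            # find the script to work on
#   trace_script /tmp/tmp...  # substitute a path printed by newest_scripts
```

Tracing shows the exact command on which a script stops, which is often enough to locate the problem before editing the script and running it again.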


Be sure to copy the corrected scripts back into the node definitions!  Otherwise the fixes exist only on the deployed machine and will be lost with it.
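To keep track of exactly what needs copying back, one option is to save an untouched copy of each script before editing it and diff the two afterwards. This is a sketch using standard tools only. The ".orig" suffix and both function names are assumptions made for illustration, not SlipStream conventions.

```shell
# Hypothetical helpers, not part of SlipStream.

# Save an untouched copy of a script before editing it.
# An existing copy is never overwritten, so repeated calls are safe.
backup_script() {
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}

# Show the edits made since the copy was saved, in unified diff format.
# diff exits with status 1 when the files differ, which is expected here.
show_changes() {
    diff -u "$1.orig" "$1"
}
```

Call `backup_script` on a script before the first edit, then `show_changes` on the same path once it runs cleanly. The lines marked with "+" are the ones to carry over into the node definition.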