Exchange 2010 DAG fails over to secondary node intermittently

February 3rd, 2014


I ran into a problem where an Exchange 2010 DAG would occasionally fail over to the secondary mailbox server.  I noticed that the failovers coincided with my nightly Veeam backups.

Before it backs up any data, Veeam takes a VMware snapshot of the virtual machine and then backs up the snapshot.  While VMware is committing the snapshot, the virtual machine goes through a brief “stun” period during which network connectivity can be lost.  In my case the stun dropped only one or two pings to the virtual machine, but that was enough for the DAG to identify the server as unhealthy and fail over to the secondary.

The article recommends reducing the sensitivity of the Microsoft cluster service, which the Exchange DAG runs on, to fix the problem.  Since I made the changes, no unintended failovers have occurred in my environment.
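To make "reducing the sensitivity" concrete, here is a rough sketch of the arithmetic behind it. It assumes the change involves the cluster heartbeat properties SameSubnetDelay (milliseconds between heartbeats) and SameSubnetThreshold (missed heartbeats tolerated), and that the defaults on this generation of Windows Server are 1000 ms and 5. The relaxed values and the 6-second stun are illustrative assumptions, not figures taken from the article or measured in my environment.

```python
# Illustrative model only: how the heartbeat settings bound the outage
# a cluster node can ride out before it is treated as down.

def tolerated_outage_ms(delay_ms: int, threshold: int) -> int:
    """Approximate time without heartbeats before a node is considered down."""
    return delay_ms * threshold


def survives_stun(stun_ms: int, delay_ms: int, threshold: int) -> bool:
    """True if a connectivity gap of stun_ms stays under the tolerated outage."""
    return stun_ms < tolerated_outage_ms(delay_ms, threshold)


# Assumed default values for SameSubnetDelay / SameSubnetThreshold.
DEFAULT = {"delay_ms": 1000, "threshold": 5}
# Hypothetical relaxed values, chosen only for illustration.
RELAXED = {"delay_ms": 2000, "threshold": 10}

# Hypothetical stun length; the real value varies with the VM and storage.
STUN_MS = 6000

for name, cfg in (("default", DEFAULT), ("relaxed", RELAXED)):
    window = tolerated_outage_ms(**cfg)
    ok = survives_stun(STUN_MS, **cfg)
    print(f"{name}: tolerates {window} ms, survives {STUN_MS} ms stun: {ok}")
```

The trade-off is that a larger window also delays detection of a genuine node failure, so it is worth raising the values only as far as the backups require.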


