Exchange 2013 - DAG - Failed And Suspended


I happened to be in the Exchange Control Panel and noticed that one of our DAG members was listed with a status of "Failed and Suspended". I was perplexed that neither our monitoring nor anything else caught this, but that's a whole other issue. My concern here was that the copy had obviously failed. Attempting an Update or Resume resulted in no feedback in the ECP and no change in status.


In this case, I ran Get-MailboxDatabaseCopyStatus in the EMS to see the message behind the failed status.

As I mentioned above, the status didn't change when I attempted to update or resume the database copy. I ran Update-MailboxDatabaseCopy, which one would assume would reseed the database, and even added the -DeleteExistingFiles switch specifically to start from scratch, yet received "The seeding operation failed... ...which may be due to a disk failure".
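For reference, the status check and reseed attempt described above look roughly like this in the EMS. This is a minimal sketch: the database name DB01 and server name MBX02 are hypothetical placeholders, not the names from our environment.

```powershell
# Hypothetical names: database "DB01", failed copy hosted on server "MBX02".
$copy = 'DB01\MBX02'

# Show the copy's status, queue lengths, and the error text behind the failure.
Get-MailboxDatabaseCopyStatus -Identity $copy |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ErrorMessage

# Attempt a full reseed, discarding whatever files already exist for this copy.
Update-MailboxDatabaseCopy -Identity $copy -DeleteExistingFiles
```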

At this point, one would have expected me to assume there was a disk problem. Instead, I checked that the disk was mounted, even browsed it, and wrote the message off as a typical nondescript error. I decided that the beauty of a DAG and having multiple copies is that I could just "whack" the DB copy and reseed from scratch. I removed the mailbox DB copy with Remove-MailboxDatabaseCopy.
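A sketch of that removal, again using the hypothetical DB01 and MBX02 names. The cmdlet only removes the copy from the DAG configuration; the database and log files stay on disk.

```powershell
# Hypothetical names: remove the copy of "DB01" hosted on server "MBX02".
# This prompts for confirmation and leaves the EDB and log files on disk.
Remove-MailboxDatabaseCopy -Identity 'DB01\MBX02'
```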

That proceeded as expected. However, as the cmdlet's warning states, the leftover files have to be deleted manually. When I went to clear the items (specifically the logs), I strangely received the error message "Remove-Item: The file or directory is corrupted and unreadable".

At this point, I was surprised and decided that there actually had to be a disk issue. I browsed back to the location, attempted to delete a log file by hand, and received the same error as a popup within Windows. I was amazed: I actually had a disk problem. This was only strange to me because our underlying disk is actually a NetApp LUN. That LUN holds all three DB copies from each of the three servers in this instance. So for one disk to be corrupted and not all three (first off, thank God!) left me baffled. At this point I went ahead and formatted both the drive that contained the EDB and the one that held the log files.

After confirming that the DB copy status no longer showed the original copy, I went ahead and ran the Add-MailboxDatabaseCopy command to reseed a copy of the DB from scratch. Voilà, it worked and began copying over.
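The re-add step can be sketched as follows. The names DB01 and MBX02 and the activation preference value are assumptions for illustration; adjust them to your own DAG.

```powershell
# Hypothetical names: database "DB01", server "MBX02" that lost its copy.
# Confirm the old copy is gone; this lists every remaining copy of DB01.
Get-MailboxDatabaseCopyStatus -Identity 'DB01'

# Add the copy back, which starts seeding from an existing healthy copy.
# The ActivationPreference value is an assumption; pick what fits your DAG.
Add-MailboxDatabaseCopy -Identity 'DB01' -MailboxServer 'MBX02' -ActivationPreference 2

# Watch the new copy until it reports a healthy status.
Get-MailboxDatabaseCopyStatus -Identity 'DB01\MBX02' |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength
```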

The WHY:

I suspect, from the log dates on the server and the time the copy was last inspected, that it relates to a power outage we sustained. About 3 weeks back we were getting bad power from both grids that feed our building and datacenter UPS. After dealing with the bad power, our Emerson UPS decided it had had enough and was toggling between battery and no battery power. Because it was toggling so frequently, it actually depleted the batteries. Despite knowing that, we left our systems up while the batteries charged, since power seemed to be okay: no flickers, nothing. Newton struck, and before the batteries had enough juice to sustain a brownout moment,