Continuous offline archiving of EMC VNX array performance data

May 8th, 2014 No comments

 

The EMC VNX arrays do not offer a good (inexpensive) way to archive performance data continuously to a management server for future retrieval.  You CAN turn on performance data logging and have it periodically archive to the array itself, but I prefer not to keep multiple GB of archived performance data on the same array I may be troubleshooting in the future, not to mention that it is one more item to review on my maintenance checklist.

 

Turning on performance data logging

First, to enable performance monitoring and generate NAR files, check (and uncheck) the following options in Unisphere > System > Monitoring & Alerts > Statistics > Performance Data Logging:

 

[Screenshot: Performance Data Logging options in Unisphere]

 

Next, click Start to begin performance logging.  Re-verify that you have unchecked the “stop automatically after” option.  The array will periodically archive performance data to .nar files on the array itself.  In my environment the array archives to a .nar file about once every 12 hours for each storage processor.  You can force the array to archive to a .nar file by stopping and then starting the data logging.

Note: In order to review NAR files after they are generated you must have the Unisphere Analyzer enabler installed on the array, otherwise you will have to engage EMC support to review the performance logs for you.

 

Retrieving performance logs from the array and archiving to a server

Install naviseccli on a server, then edit the VBScript code below, entering your own values for the IP addresses of the SPs, user, password, and file path.  Create a scheduled task that runs cscript.exe against the script on the server on a daily basis.  The script places a call to each SP, stores all NAR files on that SP to the directory of your choosing, then deletes all the NAR files from that SP.

 

'grab perf logs from array then delete logs off array

Set objShell = WScript.CreateObject("WScript.Shell")

'retrieve all NAR files from SP A, then SP B, into the archive folder
Set objExecObject = objShell.Exec("cmd /c naviseccli -Address <ip of SP_A> -User <san username> -Password <san password> -Scope 0 analyzer -archive -all -o -path <folder path (ex: C:\EMC\data_archive)>")
WScript.Sleep 60000
Set objExecObject = objShell.Exec("cmd /c naviseccli -Address <ip of SP_B> -User <san username> -Password <san password> -Scope 0 analyzer -archive -all -o -path <folder path (ex: C:\EMC\data_archive)>")
WScript.Sleep 60000

'delete the NAR files from each SP (Exec does not wait for the command to finish, hence the 60 second pauses)
Set objExecObject = objShell.Exec("cmd /c naviseccli -Address <ip of SP_A> -User <san username> -Password <san password> -Scope 0 analyzer -archive -delete -all -o")
WScript.Sleep 60000
Set objExecObject = objShell.Exec("cmd /c naviseccli -Address <ip of SP_B> -User <san username> -Password <san password> -Scope 0 analyzer -archive -delete -all -o")

 

You now have continuously archived data from your array that you can open in Unisphere Analyzer to review array performance.
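The fixed 60 second pauses in the script are only a guess at how long each transfer takes.  As a rough sketch of the same retrieve-then-delete sequence, written in Python rather than VBScript, the version below waits for each naviseccli call to finish and deletes from an SP only when the retrieval from that SP reported success.  The function names and the run parameter are my own illustration, and treating a non-zero exit code as failure is an assumption about naviseccli; the command arguments are the ones used in the script above.

```python
import subprocess


def archive_command(sp_ip, user, password, dest=None):
    """Build the naviseccli argument list for one SP.

    With dest set, retrieve all NAR files into that folder.
    With dest=None, delete all NAR files from the SP.
    """
    cmd = ["naviseccli", "-Address", sp_ip, "-User", user,
           "-Password", password, "-Scope", "0", "analyzer", "-archive"]
    if dest is None:
        cmd += ["-delete", "-all", "-o"]
    else:
        cmd += ["-all", "-o", "-path", dest]
    return cmd


def archive_then_delete(sp_ips, user, password, dest, run=subprocess.call):
    """Retrieve NAR files from each SP, deleting only after a clean retrieval.

    run must execute a command and return its exit code. It is a parameter
    so the sequence can be dry-run without an array (assumption: a non-zero
    exit code means the command failed).
    """
    results = {}
    for ip in sp_ips:
        if run(archive_command(ip, user, password, dest)) != 0:
            results[ip] = "retrieve failed, logs left on array"
            continue
        rc = run(archive_command(ip, user, password))
        results[ip] = "archived" if rc == 0 else "archived, delete failed"
    return results
```

Calling archive_then_delete with the two SP addresses, your credentials, and the archive folder performs the same four calls as the scheduled script, without the fixed waits.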

 

 

 

Categories: EMC VNX Tags:

Proactive copy to hot spare on EMC VNX array

March 5th, 2014 1 comment

 

 

SAN errors, oh no!

 

The process started when the array emailed me a couple of soft media errors, so I glanced at the SPA and SPB event logs in Navisphere and saw this:

[Screenshot: initial errors in the SP event logs]

Notice that the majority of errors showed up as informational and not warning or critical, meaning the array will not indicate to anyone that this drive is about to fail.  Yeesh.

Note also that all the errors occur on the same disk, Bus 0 Enclosure 1 Disk 7, an NLSAS 2TB drive in my environment.

Event codes included 0x6a0, 0x820, and 0x801, with descriptions of disk soft media error, soft SCSI bus error, and soft media error.  My suggestion is to filter only on the description and search for “error” to find all the messages.
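If you export the event descriptions to text, a few lines of Python can do the “search for error” step and tally the hits per disk.  This is only a sketch: it assumes each description is a plain string that names the disk as “Bus X Enclosure Y Disk Z”, which is how the disk is written above but is not a documented log format, and the function name is my own.

```python
import re
from collections import Counter

# Assumed wording of the disk location inside a description string.
DISK_RE = re.compile(r"Bus (\d+) Enclosure (\d+) Disk (\d+)")


def errors_per_disk(descriptions):
    """Count descriptions containing "error", grouped by disk (e.g. 0_1_7)."""
    counts = Counter()
    for text in descriptions:
        if "error" not in text.lower():
            continue  # skip events that do not mention an error
        match = DISK_RE.search(text)
        disk = "_".join(match.groups()) if match else "unknown"
        counts[disk] += 1
    return counts
```

A disk that dominates the resulting counts is the one worth raising with support.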

Reviewing the disks within the Navisphere GUI showed that no disks were faulted, the dashboard showed no errors, and the hot spare was not in use, meaning the array did not yet believe the drive should be failed.


Time to be proactive…

 

I opened a case with EMC support, sent them screenshots of the SP event logs and SPCollects from both SPA and SPB, and noted that the error occurred on the same drive over 100 times in one day.  The support representative immediately requested I do a proactive copy to the hot spare disk, and ordered a replacement disk sent to me.

A proactive copy is preferred because it avoids a RAID rebuild and the performance degradation that comes with it.  Instead of rebuilding the RAID group onto a new disk, the array copies the data directly from the failing disk to the hot spare, tells the RAID group to use the hot spare, then disables the failing disk.

I first tried to do the proactive copy from the Navisphere GUI, without success:

[Screenshot: proactive copy option greyed out in the GUI]

Note the option to copy is greyed out.  Apparently the VNX's new mixed RAID storage pools prevent this option from being used, so I moved on to the CLI.

Passing the command
naviseccli -Address <SP IP address> -User <user> -Password <password> -Scope 0 copytohotspare 0_1_7 -initiate
(where 0_1_7 is the failing disk) worked correctly, starting the proactive copy from the failing disk to the hotspare of the same type.
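For scripting this, here is a small Python sketch that builds the same command from the bus, enclosure, and disk numbers.  It assumes the disk ID is simply those three numbers joined with underscores, which matches Bus 0 Enclosure 1 Disk 7 becoming 0_1_7 above; the function names are my own.

```python
def disk_id(bus, enclosure, disk):
    """Format a disk location the way the command above expects, e.g. 0_1_7."""
    for value in (bus, enclosure, disk):
        if not isinstance(value, int) or value < 0:
            raise ValueError("bus, enclosure and disk must be non-negative integers")
    return f"{bus}_{enclosure}_{disk}"


def copy_to_hot_spare_command(sp_ip, user, password, bus, enclosure, disk):
    """Build the naviseccli argument list that starts a proactive copy."""
    return ["naviseccli", "-Address", sp_ip, "-User", user,
            "-Password", password, "-Scope", "0",
            "copytohotspare", disk_id(bus, enclosure, disk), "-initiate"]
```

The returned list can be passed to subprocess.call on a server where naviseccli is installed.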

 

 

Checking progress of the proactive copy

 

Now to check progress of the copy…

First I tried looking at the disks in the GUI:

[Screenshot: disk state in the GUI during the proactive copy]

The disk state is listed as “Copying to Hot Spare(100%)”.  Hmm, 100% doesn’t seem right, since I just started this procedure.  (Looking at the RAID LUN within the GUI showed the state as “transitioning”, without any progress indicators.)

Then I tried through the CLI:

[Screenshot: CLI output during the proactive copy]

Well, that doesn’t look right either.  I went on to look at SPCollect logs, SP event logs, and all over the Internet, including the EMC community forums, without finding any answers.  (Running getlun on the RAID LUN from the CLI didn’t show any progress indicators either.)

 

Update: I figured out a way to gauge progress, though a bit crude.  See the update at the bottom.

 

Eventually after 19 hours (NLSAS 2TB) the process completed, throwing event logs 0x6b0, 0x604, 0x67d, 0x6a8, 0x67c, 0x7a7, 0x6ab0 along with many others indicating it had marked the failing drive as failed as expected.

[Screenshot: event log after the proactive copy completed]

Complete list of event logs thrown as part of the proactive copy completion:
0x6b0, 0x712d4601, 0x906, 0x7a7, 0x608, 0x6a1, 0x602, 0x7a5, 0x6a8, 0x712789a0, 0x67b, 0x67c, 0x603, 0x602, 0x712d0508, 0x604, 0x712d0507, 0x2580, 0x906, 0x7a6, 0x799, 0x712d4602, 0x712d4601, 0x67d, 0x7400, 0x740a, 0x2580, and probably some others I missed.

 

At this point I gave the EMC CE a call and scheduled replacement of the drive.  Once the replacement drive is in place, the array should copy the information on the hot spare back to the replacement drive, then mark the hot spare drive as available again.


Update: It appears a progress indicator of sorts is included in the lustat command output in the SPCollect logs.  By running SPCollects over and over again I can gauge the progress of proactive copies, and of equalizations when the drive is replaced.

[Screenshot: progress indicator in the lustat output]

This information is contained within the SPCollect zip file, within the *_sus zip file, within the SPx_cfg_info.txt file.  By looking at the EQZ percentage, I can gauge roughly when it will finish and, more importantly, confirm that the equalization or proactive copy is progressing.
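Two EQZ readings taken some time apart are enough for a rough finish estimate.  The Python sketch below extrapolates linearly from two samples; it assumes the copy proceeds at a steady rate, which is only an approximation, and the function name is my own.

```python
from datetime import datetime, timedelta


def estimate_finish(t1, pct1, t2, pct2):
    """Linearly extrapolate when the EQZ percentage will reach 100.

    t1 and t2 are datetimes of two SPCollect runs; pct1 and pct2 are the
    EQZ percentages read from each. Returns None if there was no progress.
    """
    if pct2 <= pct1:
        return None  # stalled, or samples taken in the wrong order
    elapsed = (t2 - t1).total_seconds()
    remaining = (100 - pct2) * elapsed / (pct2 - pct1)
    return t2 + timedelta(seconds=remaining)
```

For example, readings of 10% and 20% taken two hours apart leave 80% to go at 5% per hour, so the estimate lands 16 hours after the second reading.  A None result is the signal that the copy may not be progressing.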


Update: I came across a new EMC article today that may have a command that works better for gauging progress.  I have not attempted it myself yet.

From:
https://community.emc.com/docs/DOC-7962

 

[Screenshot: EMC copy to hot spare progress]

 

 

Categories: EMC VNX Tags:

Exchange 2010 DAG fails over to secondary node intermittently

February 3rd, 2014 No comments

 

I encountered a problem where the Exchange DAG would occasionally fail over to the secondary mailbox server.  I noticed that the failovers corresponded with my nightly Veeam backups.

Research led me to the following Veeam article:  http://www.veeam.com/kb1744

Prior to actually backing up data, Veeam takes a VMware snapshot of the virtual machine, then backs up the snapshot.  The “stun” period is a brief window while VMware is committing the snapshot, during which network connectivity can be lost.  In my case the stun period dropped one or two pings to the virtual machine, which was enough to cause the Exchange DAG to identify the server as unhealthy and fail over to the secondary.

The article recommends reducing the sensitivity of the Microsoft cluster service, which the Exchange DAG runs on, to fix the problem.  After making the changes, no more unintended failovers have occurred in my environment.
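To see why a dropped ping or two matters, it helps to work out how long the cluster tolerates silence from a node.  My understanding is that the cluster sends a heartbeat every SameSubnetDelay milliseconds and considers a node down after SameSubnetThreshold consecutive misses; the Python sketch below just multiplies the two.  The numbers in the test values (1000 ms with 5 misses as the usual default, 2000 ms with 10 misses as a relaxed setting) are assumptions from memory, so check them against your own cluster and the Veeam article before relying on them.

```python
def heartbeat_tolerance_seconds(delay_ms, threshold):
    """Seconds of missed heartbeats tolerated before a node is declared down."""
    return delay_ms * threshold / 1000.0


def survives_stun(stun_seconds, delay_ms, threshold):
    """True if a network outage of stun_seconds stays inside the tolerance."""
    return stun_seconds < heartbeat_tolerance_seconds(delay_ms, threshold)
```

With the assumed default of 1000 ms and 5 misses the window is 5 seconds, so a stun of a few seconds sits right at the edge.  Doubling both values stretches the window to 20 seconds.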

 

Categories: Exchange 2010 Tags:

setup was unable to create a new system partition error while trying to install windows 7

June 5th, 2009 No comments

At the very beginning of the install process, Windows 7 RC 1 throws an error at the partition selection stage saying that it is unable to create a new partition.  No combination of deleting or formatting the partitions fixes the error.

In order to resolve the error, I restarted the machine and changed the boot order in the BIOS to boot directly from the future OS drive.  After that, Windows 7 installed correctly.

Thanks to Ars Technica for the fix.

Categories: windows 7 Tags:

Forcing Excel 2007 To Use Multiple Windows

April 17th, 2009 No comments

Excel 2007 by default opens multiple documents in the same window.  I absolutely hate that behavior, and maybe you do too.  Here is how to fix it:

Go to

My Computer
Tools
Folder Options
File Types
Choose XLSX
Go to Advanced

Uncheck “browse in same window” in advanced window.

Highlight Open

Edit

Make sure in the Action box it says &Open

Check whether the current path in “application used to perform action” contains OFFICE11 or OFFICE12, then copy the matching line below and paste it into that box:

"C:\Program Files\Microsoft Office\OFFICE11\EXCEL.EXE" "%1"

"C:\Program Files\Microsoft Office\OFFICE12\EXCEL.EXE" "%1"

Check the box next to “Use DDE”

Remove anything that is in DDE Message box and DDE Application Not Running box.

Make sure the application box says: EXCEL

And in the Topic box it says: System

Thanks to Jonathan for the fix.

Stopping The “Beep” When Using The Volume Control In Windows

February 17th, 2009 No comments

One of my top 5 most annoying things ever is the “beep” that occurs in Windows XP when moving the volume control on a non-factory Windows installation.

Here is how to fix it:

The steps are:
1. Right-click on My Computer
2. Click properties
3. On the Hardware tab, click on [Device Manager]
4. On the “View” menu, select “Show hidden devices”
5. Under “Non-Plug and Play Drivers”, right-click “Beep”
6. Click “Disable”
7. Answer [Yes] when asked if you really want to disable it
8. Answer [No] when asked if you want to reboot
9. Right-click “Beep” again.
10. Click “Properties”
11. Click [Stop]
12. On the “Driver” tab, set the Startup type to Disabled
13. Click [OK]
14. Answer [No] when asked if you want to reboot

Thanks to annoyances.org for the fix.
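The same result can probably be scripted with the built-in sc tool instead of clicking through Device Manager.  The Python sketch below runs sc stop and then sc config with start= disabled; it assumes the driver's service name is beep (matching the “Beep” entry in the steps above) and that it is run from an administrator account.  The names BEEP_COMMANDS and disable_beep are my own.

```python
import subprocess

# Stop the driver now, then keep it from loading at the next boot.
# sc expects "start=" and its value as separate arguments.
BEEP_COMMANDS = [
    ["sc", "stop", "beep"],
    ["sc", "config", "beep", "start=", "disabled"],
]


def disable_beep(run=subprocess.call):
    """Run both commands in order and return their exit codes.

    run is a parameter so the commands can be checked without executing them.
    """
    return [run(cmd) for cmd in BEEP_COMMANDS]
```

Calling disable_beep() on the XP machine should mirror steps 5 through 13 without a reboot.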

Dell Vostro 1700 XP Audio Driver Will Not Install

February 17th, 2009 No comments

I received an error when trying to install the Dell suggested audio driver for XP on my Vostro 1700.  Instead I installed the audio driver for the Inspiron 1500 from the Dell site and was up and running right away.

Thanks to howtofixcomputers.com for the fix.

Dell Vostro 1700 1390 WLAN XP Driver Will Not Install

February 17th, 2009 No comments

I was unable to install an XP wireless driver on my Dell Vostro 1700.  The 1700 uses a Dell (actually a rebranded Broadcom) 1390 WLAN mini PCI wireless card.  Instead of using the driver on the Dell support site under the service tag for your device, use this driver on the Dell site.

Thanks to this link at the Dell forums for this fix.

Path to php.ini is incorrect in phpinfo() on Windows

January 28th, 2009 No comments

I just did a manual install of PHP on a Windows XP box using IIS and no MySQL server.  The MySQL server is on another box.  However, in order to get my scripts to work with the MySQL server, I need to load the php_mysql extension within the php.ini file.

I was able to get PHP running correctly (the file-level permissions are really picky if you're running into trouble), but I noticed from phpinfo() that the mysql extension was not showing up and that the path to php.ini was not correct.

By default PHP recommends placing the php.ini file in the same directory as the other PHP files, usually something like c:\php.  However, even when the Windows system path variable is set up and the machine rebooted, PHP does not look there for its own .ini file.

The resolution is to place the php.ini file in the C:\windows directory and restart the web service.  Apparently PHP is compiled with the Windows directory as the default and doesn’t want to change the path.
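When phpinfo() shows an unexpected path, it helps to see which php.ini files actually exist on the box.  The Python sketch below checks a simplified list of places: the folder named by the PHPRC environment variable (which, as far as I know, PHP also consults), the PHP install folder, and the Windows folder.  This is not PHP's full search order, and the function name and parameters are my own.

```python
import os


def find_php_ini(php_dir, windows_dir, environ=None):
    """Return the first php.ini found in PHPRC, php_dir, or windows_dir.

    Simplified diagnostic only: PHP's real search order has more steps.
    Returns None when no php.ini exists in any of the checked folders.
    """
    environ = os.environ if environ is None else environ
    candidates = []
    if environ.get("PHPRC"):
        candidates.append(environ["PHPRC"])
    candidates += [php_dir, windows_dir]
    for folder in candidates:
        path = os.path.join(folder, "php.ini")
        if os.path.isfile(path):
            return path
    return None
```

For the setup described above, find_php_ini(r"C:\php", r"C:\windows") shows at a glance whether the file ended up where you meant to put it.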

Deleting a Windows Service Manually

January 23rd, 2009 No comments

sc delete [servicename]

You should probably check that the service is stopped first, and use this only as a last resort.
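The “is it stopped” check can be scripted by reading the STATE line that sc query [servicename] prints.  The Python sketch below parses that output; it assumes the usual English layout of the STATE line (a number followed by a word such as STOPPED or RUNNING), and the function names are my own.

```python
import re


def service_state(sc_query_output):
    """Extract the state word (e.g. STOPPED, RUNNING) from sc query output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_query_output)
    return match.group(1) if match else None


def safe_to_delete(sc_query_output):
    """True only when the output clearly reports the service as STOPPED."""
    return service_state(sc_query_output) == "STOPPED"
```

Feed it the captured output of sc query for the service, and run sc delete only when safe_to_delete returns True.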

Categories: Windows 2003 Server Tags: