Correcting Array Parity Errors (APE) on DDN S2A9900

Robert Leong -

[work in progress]

Overview - There are two ways to correct APEs.

1) Global catch-all method.  Fail two channels on the tier and do a rebuild.  This loses all RAID redundancy and leaves the RAID vulnerable if a third drive fails.  (Typical RAID 6 setup.)

2) Surgical method.  Use 'cache noderebuild' to repair a specific, known LBA.  This requires failing only one channel.

APEs are detected in two ways.  1) As a result of a tier or LUN verify within the S2A9900.  2) When a Fibre Channel client touches a block on the DDN that contains an APE.  The two cases are distinguished by the presence or absence of a WWN entry in the log when the APE occurs.  If a WWN is present, the APE was triggered from the client side.
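The WWN rule above can be sketched in a few lines of Python.  This is only an illustration: it assumes the log entries have already been parsed into records, and the ApeEvent structure, its field names, and the sample values are hypothetical, not the S2A9900 log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApeEvent:
    """One APE log entry, already parsed (hypothetical structure)."""
    tier: int
    lba: int
    wwn: Optional[str]  # None when the log entry carries no WWN


def ape_source(event: ApeEvent) -> str:
    """Return 'client' if a WWN is present, otherwise 'verify'."""
    return "client" if event.wwn else "verify"


# Made-up sample values, for illustration only.
events = [
    ApeEvent(tier=3, lba=0x1A2B00, wwn="21:00:00:e0:8b:00:00:01"),
    ApeEvent(tier=3, lba=0x1A2B00, wwn=None),
]

for e in events:
    print(f"tier {e.tier} lba {e.lba:#x}: {ape_source(e)}")
```

Parsing the actual log text into such records is left out on purpose, since the exact log layout is not shown here.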

The 'faults' command shows a summary of all APEs, if any exist.  The counts are the number of times an APE was detected, not the number of APEs contained within that tier/LUN.  If the same APE is read 1000 times, the count will show 1000.
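To make the difference between detections and distinct APEs concrete, here is a small Python sketch.  It assumes each detection has been reduced to a (tier, LBA) pair; that representation and the sample values are hypothetical and are not taken from 'faults' output.

```python
from collections import Counter

# Hypothetical detections as (tier, lba) pairs: one bad block read
# 1000 times, plus a second bad block on the same tier read twice.
detections = [(3, 0x1A2B00)] * 1000 + [(3, 0x1A2C00)] * 2


def occurrences_per_tier(detections):
    """Count every detection, the way the summary counts are described."""
    return Counter(tier for tier, _ in detections)


def distinct_apes_per_tier(detections):
    """Count each (tier, lba) pair once, however often it was hit."""
    return Counter(tier for tier, _ in set(detections))


print(occurrences_per_tier(detections)[3])    # detections on tier 3
print(distinct_apes_per_tier(detections)[3])  # distinct bad LBAs on tier 3
```

With this sample data the first count is 1002 and the second is 2, so a large count does not by itself mean that many blocks need repair.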

 

cache nodeinfo=more

 

cache noderebuild
