February 2010 Archives

When memory goes bad : EDAC errors


Recently I rebooted a storage array and saw a stream of error messages (about one per second) spewing onto the console and into the message logs:

EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x610, offset 0xa80, grain 128, syndrome 0x70, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
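Each message names the memory controller, row and channel, so a quick tally shows whether the errors cluster on one part of the memory. The function below is a minimal sketch. It assumes messages in exactly the format shown above (other EDAC drivers may word theirs differently), and `tally_edac_ce` is a hypothetical helper name, not part of any EDAC tooling.

```shell
#!/bin/sh
# Hypothetical helper: tally EDAC correctable-error (CE) messages by
# controller, row and channel. Reads kernel log lines on stdin and prints
# "count MCx row N channel M", most frequent first.
# Assumes the "EDAC MCx: CE ... row N, channel M, ..." format seen above.
tally_edac_ce() {
    awk '
        /EDAC MC[0-9]+: CE / {
            mc = ""; row = ""; chan = ""
            for (i = 1; i <= NF; i++) {
                # "MC0:" -> "MC0", "1," -> "1"
                if ($i ~ /^MC[0-9]+:$/) { mc = $i;         sub(/:$/, "", mc) }
                if ($i == "row")        { row = $(i + 1);  sub(/,$/, "", row) }
                if ($i == "channel")    { chan = $(i + 1); sub(/,$/, "", chan) }
            }
            count[mc " row " row " channel " chan]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

For example, `dmesg | tally_edac_ce` (or piping in the relevant syslog file) would show that most of the messages above come from row 1, channel 1. How a row/channel pair maps to a physical DIMM slot depends on the board, so check its documentation before pulling memory.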

After a bit of searching I found that these messages indicate a correctable error (CE) in memory. My reading also suggested that a steady stream of CEs might be a sign of impending memory bank failure, so I ordered new memory. In the meantime, rather than have all that noise in the logs, I wanted to shut it off. From the Linux kernel documentation on EDAC I figured out how to turn off CE logging by setting the 'edac_mc_log_ce' module parameter:

echo 0 > /sys/module/edac_core/parameters/edac_mc_log_ce

While logging was off, I could still verify that the number of errors kept increasing by reading 'ce_count':

cat /sys/devices/system/edac/mc/mc0/ce_count
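To keep an eye on the counter without the log noise, something like the following sketch can help. It assumes a single memory controller at mc0 (set `MC_DIR` otherwise). The per-channel files `csrowN/chM_ce_count` belong to the legacy EDAC sysfs layout and may be absent on some kernels, in which case only the total is printed. `edac_ce_snapshot` and `edac_ce_watch` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: one controller at mc0; override MC_DIR for mc1, mc2, ...
MC_DIR=${MC_DIR:-/sys/devices/system/edac/mc/mc0}

# Print the controller-wide CE total, then one line per csrow/channel
# counter (legacy csrow layout only; silently skipped if absent).
edac_ce_snapshot() {
    echo "total $(cat "$MC_DIR/ce_count")"
    for f in "$MC_DIR"/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue               # glob matched nothing
        row=$(basename "$(dirname "$f")")     # e.g. csrow1
        chan=$(basename "$f" _ce_count)       # e.g. ch1
        echo "$row $chan $(cat "$f")"
    done
}

# Poll the total every INTERVAL seconds (default 60) and print the growth.
edac_ce_watch() {
    interval=${1:-60}
    prev=$(cat "$MC_DIR/ce_count")
    while sleep "$interval"; do
        now=$(cat "$MC_DIR/ce_count")
        echo "$(date '+%Y-%m-%d %H:%M:%S') ce_count=$now (+$((now - prev)))"
        prev=$now
    done
}
```

For example, `edac_ce_watch 300` reports the increase every five minutes. Once the new memory is installed, `echo 1 > /sys/module/edac_core/parameters/edac_mc_log_ce` turns CE logging back on.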

About this Archive

This page is an archive of entries from February 2010 listed from newest to oldest.
