When memory goes bad: EDAC errors


Recently I rebooted a storage array and saw a stream of error messages (about one per second) spewing onto the console and into the message logs.

EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x610, offset 0xa80, grain 128, syndrome 0x70, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE

After a bit of searching I found that these messages indicate correctable errors (CE) in memory. My reading also suggested that this might be a sign of impending memory bank failure, so I ordered new memory. In the meantime, rather than have all that noise in the logs, I wanted to shut it off. By reading the Linux Kernel Documentation on EDAC I was able to figure out how to shut off the error logging by setting 'edac_mc_log_ce' to 0:

echo 0 > /sys/module/edac_core/parameters/edac_mc_log_ce
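If you want to be able to turn the logging back on later, it helps to note the old value before overwriting it. Here is a minimal sketch of that idea. The function name set_log_ce is my own invention for illustration, not part of EDAC; the default path is the one used above, and writing to it requires root.

```shell
# Write a new value to the EDAC CE logging parameter and print the old one.
# Usage: set_log_ce VALUE [PARAM_FILE]
# set_log_ce is a hypothetical helper name; PARAM_FILE defaults to the
# sysfs path from this post and needs root to write.
set_log_ce() {
    value="$1"
    param="${2:-/sys/module/edac_core/parameters/edac_mc_log_ce}"
    old=$(cat "$param") || return 1        # remember the current setting
    echo "$value" > "$param" || return 1   # apply the new setting
    echo "$old"                            # report what it was before
}
```

For example, old=$(set_log_ce 0) silences the messages and set_log_ce "$old" puts things back the way they were. Values written under /sys do not survive a reboot, so this has to be repeated after the next restart.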

While logging was off, I could still verify that the number of errors was increasing by looking at 'ce_count':

cat /sys/devices/system/edac/mc/mc0/ce_count
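Rather than running cat twice and comparing the numbers by eye, the counter can be sampled twice and the difference printed. This is only a rough sketch: ce_delta is a made-up helper name, the 10 second default interval is arbitrary, and the default path assumes the same mc0 controller as above.

```shell
# Sample a ce_count file twice and print how many new correctable
# errors showed up in between.
# Usage: ce_delta [COUNT_FILE] [SECONDS]
# ce_delta is a hypothetical helper name; defaults assume controller mc0.
ce_delta() {
    file="${1:-/sys/devices/system/edac/mc/mc0/ce_count}"
    interval="${2:-10}"
    before=$(cat "$file") || return 1   # first reading
    sleep "$interval"
    after=$(cat "$file") || return 1    # second reading
    echo $((after - before))            # errors seen during the interval
}
```

Calling ce_delta with no arguments waits 10 seconds and prints the number of new errors; at the rate I was seeing in the logs that should come out to roughly one per second.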

This page contains a single entry by John published on February 8, 2010 10:31 AM.
