Machine Check Exception

From Wikipedia, the free encyclopedia

A Machine Check Exception (MCE) is a hardware error that occurs when a computer's processor detects an unrecoverable hardware problem.

On Windows, the error is displayed on the widely publicised blue screen of death, with an error message like the following (the parameters inside the brackets vary):

STOP: 0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"
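
As a rough illustration of how the bracketed parameters might be read: Microsoft's bug check documentation describes, for processors that implement the Machine Check Architecture, the third and fourth parameters as the high and low 32 bits of the status register of the bank that reported the error. Assuming that layout applies to the example above (it may not on every processor), the two halves can be joined as in this Python sketch. The function name is made up for illustration.

```python
def combine_stop_parameters(high32: int, low32: int) -> int:
    """Join two 32-bit STOP parameters into one 64-bit value.

    Assumption: parameter 3 holds the high half and parameter 4 the low
    half of the bank status register. This may not hold on every CPU.
    """
    if not (0 <= high32 <= 0xFFFFFFFF and 0 <= low32 <= 0xFFFFFFFF):
        raise ValueError("each parameter must fit in 32 bits")
    return (high32 << 32) | low32


# Parameters 3 and 4 from the example STOP message above.
status = combine_stop_parameters(0xB2000000, 0x00020151)
print(f"{status:016x}")
```

With the parameters shown above, this prints b200000000020151.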

On Linux, the error is written to the kernel log and/or the console (usually only to the console when the error is non-recoverable and the machine crashes as a result):

CPU 0: Machine Check Exception: 0000000000000004
Bank 2: f200200000000863
Kernel panic: CPU context corrupt
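
The bank number and the hexadecimal value on the "Bank" line are the parts most useful for later diagnosis. The following Python sketch pulls them out of a log excerpt. It assumes the exact line layout shown above, which differs between kernel versions, and the names `BANK_LINE` and `parse_bank_lines` are hypothetical.

```python
import re

# Matches lines such as "Bank 2: f200200000000863" in the sample above.
# Assumption: other kernel versions may format this line differently.
BANK_LINE = re.compile(r"^\s*Bank\s+(\d+):\s*([0-9a-fA-F]{1,16})\s*$")


def parse_bank_lines(log_text: str) -> list[tuple[int, int]]:
    """Return (bank number, status value) pairs found in a log excerpt."""
    results = []
    for line in log_text.splitlines():
        match = BANK_LINE.match(line)
        if match:
            results.append((int(match.group(1)), int(match.group(2), 16)))
    return results


sample = """CPU 0: Machine Check Exception: 0000000000000004
Bank 2: f200200000000863
Kernel panic: CPU context corrupt"""

for bank, status in parse_bank_lines(sample):
    print(f"bank {bank}: status 0x{status:016x}")
```

Lines that do not match the pattern, such as the "CPU 0" and "Kernel panic" lines, are simply skipped.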

The error is usually due to the failure or overstressing of hardware components, in cases where the problem cannot be identified more specifically by another error message. Diagnosing the error can be difficult, although Intel Pentium processors generate more specific codes, which can be decoded by contacting the manufacturer.

MCEs require a restart to continue and often indicate a long-term general problem.

Problems that cause an MCE

Most of these errors are specific to the Pentium processor family; similar errors may occur on other processors and will cause the same problems.

Here are some of the main hardware problems that cause MCEs:

  • System bus errors (errors in communication between the processor and the motherboard).
  • Memory errors, which may include parity or error-correcting code (ECC) problems. Error checking ensures that data is stored correctly in RAM; if data is corrupted, random errors occur.
  • Cache errors in the processor. The cache stores important data and code; if it is corrupted, errors often occur.

Causes

Common causes of MCEs are overheating and incorrect hardware installation. Overheating can cause electrons to become more energetic and escape from the silicon tracks, corrupting data. Some specific user-induced causes are:

  • Overclocking (naturally increases heat output)
  • Poorly fitted heatsink/fans (the same problem can happen with excessive dust in the CPU fan)

Software can also cause errors in this way (normally by corrupting the data it is reading or writing). For example:

  • Software that reads from or writes to non-existent memory regions can confuse the processor or the system bus.
  • Certain sequences of operations may confuse the processor, for example if too many programs are used at the same time.

Decoding MCEs

As noted previously, decoding MCEs can be difficult. Normally the manufacturer (especially a processor manufacturer) will be able to provide information about specific codes.
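
Some parts of a code can still be read without the manufacturer's help. Intel's architecture manuals document the upper bits of a 64-bit bank status value as a set of flags, and the lowest 16 bits as an architectural error code. The Python sketch below decodes those fields for the "Bank 2" value from the Linux example earlier in the article. The bit positions are taken from Intel's description of the Machine Check Architecture and should be treated as an assumption on other processors. The names `STATUS_FLAGS` and `decode_status` are hypothetical, and interpreting the error code itself still requires the manufacturer's tables.

```python
# Flag positions in a 64-bit bank status value, as documented by Intel for
# the Machine Check Architecture. Treat them as an assumption elsewhere.
STATUS_FLAGS = {
    "VAL": 63,    # register contains a valid error record
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # additional information register is valid
    "ADDRV": 58,  # address register is valid
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status: int) -> dict:
    """Split a bank status value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15..0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31..16
    }


# "Bank 2" value from the Linux example earlier in the article.
decoded = decode_status(0xF200200000000863)
set_flags = [name for name, is_set in decoded["flags"].items() if is_set]
print(set_flags)
print(hex(decoded["mca_error_code"]))
```

Under these assumptions, the sample value has the VAL, OVER, UC, EN and PCC flags set and an error code of 0x863. The PCC flag is consistent with the "CPU context corrupt" line in that log.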

Third-party programs

mcelog
mcelog is a Linux program that decodes MCEs on x86-64 processors.
parsemce
Written by Dave Jones, parsemce decodes MCEs from AMD K7 processors.
