Machine Check Exception
From Wikipedia, the free encyclopedia
A Machine Check Exception, also called MCE, is a computer hardware error which occurs when a computer's central processing unit detects an unrecoverable hardware problem.
On Windows, the error is displayed on a blue screen of death containing the following error message (the parameters inside the brackets vary):
STOP: 0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"
On Linux, it gets written to the kernel log and/or the console screen (usually only to the console when the error is non-recoverable and the machine crashes as a result):
CPU 0: Machine Check Exception: 0000000000000004
Bank 2: f200200000000863
Kernel panic: CPU context corrupt
The error is usually caused by the failure or overstressing of hardware components, in cases where the fault cannot be identified more specifically by a different error message. Diagnosing the error can be difficult, although Intel Pentium processors do generate more specific codes, which can be decoded by contacting the manufacturer.
MCEs require a restart of the system to continue normal operation and often indicate a long-term problem of a general nature.
Problem types
Most of these errors are specific to the Pentium processor family, but similar errors may occur on other processors and will cause similar problems.
Here are some of the main hardware problems that cause MCEs:
- System bus errors (error communicating between the processor and the motherboard).
- Memory errors, which may include parity or error-correcting code (ECC) problems. Error checking ensures that data is stored correctly in RAM; if information is corrupted, random errors occur.
- Cache errors in the processor. The cache stores important data and code; if it is corrupted, errors often occur.
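The parity checking mentioned in the memory item above can be illustrated with a minimal sketch. It assumes a single even-parity bit stored alongside each byte. The function names are hypothetical and chosen for illustration only, and real ECC memory uses stronger codes that can also correct errors rather than only detect them.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits (data + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """Check a byte read back from memory against the parity bit stored with it."""
    return even_parity_bit(byte) == stored_parity


if __name__ == "__main__":
    original = 0b10110010                # value written to memory
    parity = even_parity_bit(original)   # parity bit stored with it
    corrupted = original ^ 0b00001000    # the same value with one bit flipped
    print(parity_ok(original, parity))   # intact data passes the check
    print(parity_ok(corrupted, parity))  # a single flipped bit fails the check
```

A single parity bit detects any odd number of flipped bits but cannot tell which bit changed, so the hardware can only report the fault, not repair it.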
Causes
Common causes of MCEs are overheating and incorrect hardware installation. Overheating can cause electrons to become more animated and thus escape from the silicon tracks, resulting in corrupted data. Some specific manually induced causes are:
- Overclocking (naturally increases heat output)
- Poorly fitted heatsinks or computer fans (excessive dust in the CPU fan can cause the same problem)
Computer software can also cause these errors (normally by corrupting data that it is reading or writing). For example:
- Software performing read or write operations on non-existent memory regions, which leads to confusion for the processor and/or the system bus.
Decoding MCEs
As noted previously, decoding MCEs can be difficult. Normally the hardware manufacturer (especially the processor manufacturer) will be able to provide information about specific codes. Consult Chapter 14 of the Intel 64 and IA-32 Architectures Software Developer's Manual[1], or the Microsoft KB article on Windows exceptions[2].
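As a rough illustration of what such decoding involves, the following sketch splits a 64-bit bank status value into the architectural fields of the IA32_MCi_STATUS register described in the Intel manual. It assumes the value printed after "Bank 2:" in the Linux example is that register. The function name and dictionary layout are hypothetical, and the sketch does not interpret the error codes themselves, which requires the manufacturer's tables.

```python
# Flag bits in the upper part of an IA32_MCi_STATUS value.
STATUS_FLAGS = {
    63: "VAL",    # the register holds valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # the bank's MISC register holds extra information
    58: "ADDRV",  # the bank's ADDR register holds a valid address
    57: "PCC",    # the processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check bank status value into its fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


if __name__ == "__main__":
    decoded = decode_mci_status(0xF200200000000863)
    print(decoded["flags"])
    print(hex(decoded["mca_error_code"]))
```

For the example value, the PCC flag is set, which is consistent with the "CPU context corrupt" panic message shown earlier. Dedicated tools such as those listed in this article go further and translate the error code fields into a description of the failing unit.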
Programs to Decode MCEs
- mcat: a Windows command-line program from AMD to decode MCEs from AMD K8, Family 0x10 and Family 0x11 processors
- mcelog: a Linux program by Andi Kleen to decode MCEs from x86-64 processors
- parsemce: a Linux program by Dave Jones to decode MCEs from AMD K7 processors