Machine Check Exception
From Wikipedia, the free encyclopedia
A Machine Check Exception (MCE) is a hardware error which occurs when a computer processor detects an unrecoverable hardware problem.
On Windows, the error is displayed on the widely publicised blue screen of death, which contains an error message of the following form (the parameters inside the brackets vary):
STOP: 0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"
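Because the bracketed parameters vary from one crash to the next, it can be useful to extract them mechanically before looking them up. The following Python sketch is illustrative only: the function name `parse_stop_line` is hypothetical, and it assumes the message always has the layout shown above.

```python
import re

# Matches a line such as:
#   STOP: 0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"
STOP_RE = re.compile(r'STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)\s*"?([A-Z_]+)?"?')


def parse_stop_line(line):
    """Return (stop_code, parameters, name), or None if the line does not match.

    This is a hypothetical helper, not part of any Windows tool.
    """
    match = STOP_RE.search(line)
    if match is None:
        return None
    stop_code = int(match.group(1), 16)
    # Split the comma-separated parameter list and convert each hex value.
    parameters = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    return stop_code, parameters, match.group(3)


example = 'STOP: 0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"'
code, params, name = parse_stop_line(example)
print(hex(code), [hex(p) for p in params], name)
```

The sketch only separates the fields; what each parameter means depends on the Windows version and the processor, so it should be checked against Microsoft's documentation.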
On Linux, it is written to the kernel log and/or the console (usually only to the console when the error is non-recoverable and the machine crashes as a result):
CPU 0: Machine Check Exception: 0000000000000004 Bank 2: f200200000000863 Kernel panic: CPU context corrupt
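A log line in this form can be split into its numeric fields with a short script. The Python sketch below is a minimal illustration that assumes the exact layout of the example above; the function name `parse_mce_line` and the field names are hypothetical, and the first hexadecimal number is assumed to be the global machine check status, with the number after the bank being that bank's status register.

```python
import re

# Matches the kernel message layout shown in the example above.
MCE_RE = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check Exception:\s+(?P<mcg_status>[0-9a-fA-F]+)"
    r"\s+Bank\s+(?P<bank>\d+):\s+(?P<status>[0-9a-fA-F]+)"
)


def parse_mce_line(line):
    """Return the CPU number, global status, bank number and bank status.

    Hypothetical helper; returns None if the line does not match.
    """
    match = MCE_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "mcg_status": int(match.group("mcg_status"), 16),  # assumed: global status
        "bank": int(match.group("bank")),
        "status": int(match.group("status"), 16),          # assumed: per-bank status
    }


line = ("CPU 0: Machine Check Exception: 0000000000000004 "
        "Bank 2: f200200000000863 Kernel panic: CPU context corrupt")
print(parse_mce_line(line))
```

Other kernel versions may print the message differently, so the pattern would need adjusting for them.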
The error is usually due to failure or overstressing of hardware components where the error cannot be more specifically identified with another error message. Diagnosing the error message can be difficult, although Intel Pentium processors do generate more specific codes which can be decoded by contacting the manufacturer.
MCEs require a restart to continue and often indicate a long-term general problem.
Problems that cause an MCE
Most of these errors are specific to the Pentium processor family; similar errors may occur on other processors and cause the same problems.
Here are some of the main hardware problems that cause MCEs:
- System bus errors (error communicating between the processor and the motherboard)
- Memory errors, which may include parity or error-correcting code (ECC) problems. Error checking ensures that data is stored correctly in RAM; if information is corrupted, random errors occur.
- Cache errors in the processor. The cache stores important data and code; if it is corrupted, errors often occur.
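The parity checking mentioned in the list above can be illustrated in software. The Python sketch below models even parity over a single byte; it is only an illustration, since real memory controllers implement parity and ECC in hardware, and the function names are hypothetical.

```python
def even_parity_bit(byte):
    """Return the extra bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Simulate writing a byte to memory together with its parity bit."""
    return byte & 0xFF, even_parity_bit(byte)


def check(stored_byte, stored_parity):
    """Return True if the byte still agrees with the parity bit saved with it."""
    return even_parity_bit(stored_byte) == stored_parity


data, parity = store(0b10110010)
corrupted = data ^ 0b00000100       # flip a single bit, as a hardware fault might

print(check(data, parity))          # intact data passes the check
print(check(corrupted, parity))     # the single-bit flip is detected
```

A single parity bit can only detect an odd number of flipped bits and cannot say which bit changed. ECC schemes store more check bits so that some errors can also be corrected.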
Causes
Common causes of MCEs are overheating and incorrect hardware installation. Overheating can push components outside the conditions they are rated for, so that signals are misread and data is corrupted. Some specific user-induced causes are:
- Overclocking (naturally increases heat output)
- Poorly fitted heat sinks or fans
Software can also trigger these errors, normally by corrupting data that it is reading or writing. For example:
- Software that reads from or writes to non-existent memory regions can confuse the processor or the system bus.
- Certain sequences of operations may cause the processor to malfunction, for example when many programs are running at the same time.
Decoding MCEs
As noted previously, decoding MCEs can be difficult. Normally the manufacturer (especially a processor manufacturer) will be able to provide information about specific codes.
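Some of the information in a status value can be read without the manufacturer's help, because the upper flag bits of the per-bank status register have a documented layout. The Python sketch below assumes the IA32_MCi_STATUS layout described in Intel's software developer's manual (bit 63 valid, 62 overflow, 61 uncorrected, 60 enabled, 59 MISC register valid, 58 ADDR register valid, 57 processor context corrupt, bits 31:16 a model-specific code, bits 15:0 the MCA error code). Other processors may differ, and the function name `decode_mci_status` is hypothetical.

```python
# Flag positions assumed from Intel's documented IA32_MCi_STATUS layout.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
}


def decode_mci_status(status):
    """Split a 64-bit bank status value into flags and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    decoded["mca_error_code"] = status & 0xFFFF
    return decoded


# Bank 2 status value taken from the Linux example in this article.
info = decode_mci_status(0xF200200000000863)
for name, value in info.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For the Linux example, the processor context corrupt flag is set, which agrees with the "CPU context corrupt" text in the kernel message. Interpreting the two error code fields still requires the manufacturer's documentation.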
3rd party programs
- mcelog
- mcelog is a Linux program that decodes MCEs on x86-64 processors
- parsemce
- Written by Dave Jones, parsemce decodes MCEs from AMD K7 processors