Intermittent fault
An intermittent fault, often called simply an "intermittent", is a malfunction of a device or system that occurs at intervals, usually irregular, in a device or system that functions normally at other times. Intermittent faults are common to all branches of technology, including computer software. An intermittent fault is caused by several contributing factors, some of which may be effectively random, which occur simultaneously. The more complex the system or mechanism involved, the greater the likelihood of an intermittent fault.
A simple example of an effectively random cause in a physical system is a borderline electrical connection in the wiring or a component of a circuit, where (cause 1, the cause that must be identified and rectified) two conductors may touch subject to (cause 2, which need not be identified) a minor change in temperature, vibration, orientation, voltage, etc. (Sometimes this is described as an "intermittent connection" rather than "fault".) In computer software a program may (cause 1) fail to initialise a variable which is required to be initially zero; if the program is run in circumstances such that memory is almost always clear before it starts, it will malfunction on the rare occasions that (cause 2) the memory where the variable is stored happens to be non-zero beforehand.
Intermittent faults are notoriously difficult to identify and repair ("troubleshoot") because each individual factor does not create the problem alone, so the factors can only be identified while the malfunction is actually occurring. The person capable of identifying and solving the problem is seldom the usual operator. Because the timing of the malfunction is unpredictable, and both device or system downtime and engineers' time incur cost, the fault is often simply tolerated if not too frequent unless it causes unacceptable problems or dangers. For example, some intermittent faults in medical life support equipment can kill a patient.
If an intermittent fault occurs for long enough during troubleshooting, it can be identified and resolved in the usual way.
Some techniques to resolve intermittent faults are:
- Automatic logging of relevant parameters over a long enough time for the fault to manifest can help; parameter values at the time of the fault may identify the cause so that appropriate remedial action can be taken.
- Changing operating circumstances while the fault is present to see if the fault temporarily clears or changes. For example, tapping components, cooling them with freezer spray, heating them. Striking the cabinet may temporarily clear the fault.
- a database of similar faults which have been resolved in identical or similar equipment[1]
- precautionary changes, without attempting to pinpoint the fault. For example, electrolytic capacitors subject to high ripple currents can be changed as a routine measure, without bothering to troubleshoot the fault at all. Connectors can be disconnected and reseated. This is sometimes a measure of desperation; things are changed until the fault stops happening, and it is hoped that it is actually resolved rather than dormant.
- In electrical systems and cable systems, time domain reflectometry techniques are used: pulses are sent down electric wiring and the pulses reflected back are examined for anomalies, for example intermittent leakage during the stresses of aircraft operation.[2]
References
- ↑ Example of an intermittent TV fault in a database : "Z3T CHASSIS - NO START UP - INTERMITTENT. D1124 (5.1V) ZENER LEAKY"
- ↑ "Spread Spectrum Time Domain Reflectometry for Locating Intermittent Faults" Furse, Cynthia; Smith, Paul; IEEE SENSORS JOURNAL, VOL. 5, NO. 6, DECEMBER 2005"
External links
- A discussion of software debugging
- Sci.electronics.repair FAQ, see section "Troubleshooting of Intermittent Problems"