Saturday, August 4, 2018

Cross Checking Systems

Figure 1 - Typical Cross checking system.
     An interesting type of redundant system is the cross checking system, which increases system integrity.  In this system two identical computer systems run the same processing in parallel.  If their results do not match exactly, a system fail is initiated.  Cross checking is one way systems are designed to protect against single-bit errors.
     The block diagram in Figure 1 depicts a typical system.  There are two processors, the control processor and the monitor processor.  The system inputs must be applied identically to both processors.  They may be sensor inputs, control inputs, clocking, and so on.  In a deterministic system, both processors should produce identical outputs for identical inputs, so both should calculate exactly the same system output.  Each processor monitors the other processor's system output and compares it to its own calculated output.  If they do not match, that processor asserts its 'Mismatch' output line.  The "System Fail" output is asserted if either processor declares a mismatch.  There are two interesting things to note about this system.  The first is that only the output from the control processor is actually used.  The second is that although system integrity increases, system reliability (typically quantified as a failure rate in FIT, or as MTBF) decreases.  This is because the system fails if either processor fails.  As a rough guide, for two identical processors with constant failure rates, the system failure rate approximately doubles and the MTBF is approximately halved.  Although MTBF decreases, what is gained is the knowledge that the system has failed.
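     The comparison logic described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the design of any particular system: `compute_output`, `faulty_compute_output` and `cross_check` are hypothetical names, the "control law" is an arbitrary deterministic function, and the single-bit error is simulated by flipping the lowest bit of the result.

```python
def compute_output(inputs):
    """Hypothetical deterministic control law: same inputs, same output."""
    return sum(inputs) & 0xFFFF


def faulty_compute_output(inputs):
    """The same law with a simulated single-bit error in the result."""
    return compute_output(inputs) ^ 0x0001


def cross_check(control, monitor, inputs):
    """Run both processors on identical inputs and compare their outputs.

    Returns (system_output, system_fail). Only the control processor's
    output is used as the system output.
    """
    control_out = control(inputs)
    monitor_out = monitor(inputs)

    # Each processor compares its own result with the other's.
    # In real hardware these are two independent comparisons.
    control_mismatch = control_out != monitor_out
    monitor_mismatch = monitor_out != control_out

    # System Fail is asserted if either processor declares a mismatch.
    system_fail = control_mismatch or monitor_mismatch
    return control_out, system_fail


if __name__ == "__main__":
    print(cross_check(compute_output, compute_output, [1, 2, 3]))
    print(cross_check(compute_output, faulty_compute_output, [1, 2, 3]))
```

     Note that the sketch cannot tell which processor is wrong.  It only reports that the two disagree, which is exactly the integrity property the text describes.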
     In applications where integrity is important, such as safety systems, cross checking is a very practical design possibility.

Saturday, June 23, 2018

Byzantine networks and the Byzantine Generals Problem

In some applications, networks and computer systems must be safe. Typically we use redundancy to protect against failures: if one component fails, another one takes over. As the tolerance for failure decreases, the cost and complexity of these systems increase. The study of Byzantine networks considers what happens when we have very little tolerance for failure. In fact, it considers what happens when a component does not just fail, but behaves maliciously and tries to make the whole system fail.

The Byzantine Generals are a metaphor for a distributed network. In this metaphor each General is a computer on the network, giving orders to its connected components. What happens when one General is 'malicious', i.e. it fails in a way that tries to take down the other Generals? What protocol should the Generals adhere to so that they do not follow a malicious General, and the rest of the system operates normally?
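One well-known answer to the protocol question is the oral-messages algorithm from the original Byzantine Generals paper by Lamport, Shostak and Pease, which needs at least 3m + 1 Generals to tolerate m malicious ones. The following is a minimal sketch of the single-traitor case with one commander and three lieutenants. The names `om1`, `majority` and `flip` are hypothetical, and the traitor is assumed to behave in one fixed way (relaying the opposite of the order it received), which is a simplification of truly arbitrary behavior.

```python
from collections import Counter

DEFAULT = "retreat"  # assumed fallback order when there is no majority


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def flip(order):
    """What a traitorous lieutenant relays in this sketch: the opposite order."""
    return "retreat" if order == "attack" else "attack"


def om1(commander_sends, traitor=None):
    """One round of relaying among the lieutenants.

    commander_sends maps each lieutenant to the order the commander sent it.
    traitor is the name of a traitorous lieutenant, or None.
    Returns the decision of every loyal lieutenant.
    """
    lieutenants = list(commander_sends)
    decisions = {}
    for me in lieutenants:
        if me == traitor:
            continue  # a traitor's own decision does not matter
        seen = [commander_sends[me]]  # order received directly
        for other in lieutenants:
            if other == me:
                continue
            relayed = commander_sends[other]
            if other == traitor:
                relayed = flip(relayed)  # the traitor lies when relaying
            seen.append(relayed)
        decisions[me] = majority(seen)
    return decisions


if __name__ == "__main__":
    # Loyal commander, traitorous lieutenant L3.
    print(om1({"L1": "attack", "L2": "attack", "L3": "attack"}, traitor="L3"))
    # Traitorous commander sending conflicting orders, all lieutenants loyal.
    print(om1({"L1": "attack", "L2": "retreat", "L3": "attack"}))
```

In both cases the loyal lieutenants end up with the same decision, and when the commander is loyal that decision is the commander's order.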

Besides Byzantine networks there are other, less stringent architectures that protect against failures. These solutions, such as redundancy and voting, relax the constraints on failure in various ways.
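As an illustration of the voting approach, here is a minimal majority voter over redundant channel outputs, in the style of triple modular redundancy. It is a sketch only: the function name `vote` and the choice to raise an error when no majority exists are assumptions, and unlike a Byzantine protocol it assumes the voter itself is trustworthy and that every channel reports the same value to everyone.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of redundant channels.

    Raises RuntimeError if no value has a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return value
    raise RuntimeError("no majority among redundant outputs")


if __name__ == "__main__":
    # Three channels, one of which has failed: the fault is masked.
    print(vote([42, 42, 17]))
```

Unlike the cross checking system, which only detects a disagreement, a voter with three or more channels can mask a single faulty channel and keep operating.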

The following link is to a proposal for identifying malicious players in a Byzantine network. The proposal assumes the network is a modern TCP/IP network, so 'omission' faults are identified by the underlying network. Click here to read the research proposal.