Why do system-level failures still occur even when fault tolerance techniques are deployed?
From a development perspective, the tight integration of a large number of components creates many potential failure modes that arise from component interactions and therefore cannot be discovered by unit testing. This project focuses on identifying system-wide design rules that must be satisfied to limit the propagation of seemingly minor faults throughout the system.
Our objectives in this project are to
Our approach has three parts: to build architectural models in the Architecture Analysis and Design Language (AADL) that expose system fault behaviors not addressed by component-fault containment techniques; to develop a formalized analysis framework for system fault containment and stability management; and to validate system architectures in the context of this framework.
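To make the idea of fault propagation across an architecture concrete, the following is a minimal sketch in Python, not AADL and not the project's actual framework. It treats an architecture as a directed graph of components and computes which components a fault at one source can reach, with and without a containment barrier. The component names, the `propagate` function, and the `barriers` parameter are all hypothetical and chosen only for illustration.

```python
from collections import deque


def propagate(connections, source, barriers=frozenset()):
    """Return the set of components a fault at `source` can reach.

    connections: dict mapping a component name to the components it feeds.
    barriers:    hypothetical set of components assumed to contain faults,
                 i.e. they may be affected but never forward the fault.
    """
    affected = {source}
    queue = deque([source])
    while queue:
        component = queue.popleft()
        if component in barriers:
            continue  # fault is contained here and not forwarded
        for downstream in connections.get(component, ()):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected


# Hypothetical five-component architecture (illustrative names only).
connections = {
    "sensor": ["filter"],
    "filter": ["controller"],
    "controller": ["actuator", "display"],
    "actuator": [],
    "display": [],
}

uncontained = propagate(connections, "sensor")
contained = propagate(connections, "sensor", barriers={"filter"})
```

In this toy model a fault in `sensor` reaches every component when nothing contains it, and stops at `filter` when that component is treated as a barrier. An analysis framework of the kind described above would need a far richer notion of fault types and behaviors than simple reachability.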
Our model-based analytic framework for this investigation includes