Study of the Reliability of Software-Implemented EDAC Technology

Abstract: Error detection and correction (EDAC) based on information redundancy is a common system-level fault-tolerance technique against single event upsets (SEUs) in space applications. Software-implemented EDAC is an alternative to hardware-implemented EDAC: encoding and checking routines add error-correcting codes (ECC) to existing memory segments so that memory errors can be detected and corrected. This paper analyzes how the correction capability and code rate of the error-correcting code, the scrubbing interval, and the amount of code to be protected affect the reliability of a software EDAC scheme. Analysis and simulation show that, for random memory errors caused by a single particle, raising the correction capability or code rate of an individual codeword, or lengthening the scrubbing interval, has little effect on reliability, whereas improving the scrubbing interval by shortening the code that a task executes, and reducing the total amount of protected code, improve reliability considerably. These conclusions can guide engineering trade-offs among memory resources, real-time performance, and reliability.
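As an illustration of the kind of scheme the abstract describes, the following is a minimal sketch of software EDAC with periodic scrubbing. It is not the paper's implementation: the choice of a Hamming(7,4) single-error-correcting code and the names hamming74_encode, hamming74_correct, hamming74_decode, and scrub are assumptions made here for illustration only.

```python
# Minimal sketch of software EDAC, assuming a Hamming(7,4) code.
# All function names are hypothetical; the paper does not specify a code.

def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit codeword (bit p-1 holds position p)."""
    bits = [0] * 8  # indices 1..7 are used, index 0 is ignored
    # Data bits occupy positions 3, 5, 6, 7.
    for i, pos in enumerate((3, 5, 6, 7)):
        bits[pos] = (nibble >> i) & 1
    # Parity bits occupy positions 1, 2, 4 and cover the standard groups.
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))

def hamming74_correct(word):
    """Return (corrected codeword, True if a bit was flipped back)."""
    syndrome = 0
    for p in range(1, 8):
        if (word >> (p - 1)) & 1:
            syndrome ^= p  # XOR of the positions of all set bits
    if syndrome:
        word ^= 1 << (syndrome - 1)  # syndrome equals the erroneous position
    return word, bool(syndrome)

def hamming74_decode(word):
    """Correct at most one flipped bit, then extract the 4 data bits."""
    word, _ = hamming74_correct(word)
    return sum(((word >> (pos - 1)) & 1) << i
               for i, pos in enumerate((3, 5, 6, 7)))

def scrub(memory):
    """One scrubbing pass: correct every codeword in place, count the fixes."""
    fixes = 0
    for i, word in enumerate(memory):
        memory[i], fixed = hamming74_correct(word)
        fixes += fixed
    return fixes

# Usage: protect three values, inject one upset, then scrub.
memory = [hamming74_encode(v) for v in (3, 9, 12)]
memory[1] ^= 1 << 4          # simulate a single event upset
repaired = scrub(memory)     # the scrub pass restores the codeword
```

A code of this kind corrects only one flipped bit per codeword, so two upsets that land in the same codeword between scrubbing passes are miscorrected. This is why the scrubbing interval and the amount of protected memory enter the reliability analysis at all.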

     
