
    2019-07-13


Related work

The goal of this section is to review related work on methods for assessing the reliability of clustered COTS-based systems.

Connelly et al. [1] propose an approach to delineating safety assurance for the use of a COTS operating system (OS) in safety-related applications that must meet SIL2. Their solution focuses on encapsulation mechanisms that isolate the effects of a COTS failure. They analyze the OS failure modes that may affect safety functions and then propose mitigation techniques, but they do not report the impact of those techniques. Unlike [1], our work provides a complete methodology with tangible, quantified results that can be reused in other contexts. Qualitative estimates are certainly needed, but they are not sufficient. Our contribution is a rigorous analysis of the clustered system, based on models and experimental results, which provides strong evidence of the system's reliability.

Jones et al. [5] survey practical methods for assessing the safety integrity of COTS-based systems against the IEC 61508 standard. The authors propose testing techniques such as stress, interface, and statistical testing (black-box methods), or tools that check software, analyze data flows, and inject faults (white-box methods). They also present the limitations of both approaches: the lack of failure data, which the supplier keeps confidential; the difficulty of evaluating test results without automated mechanisms; the problem of optimizing time-consuming tests; and the difficulty of covering a wide range of software faults. Our paper goes further: it defines an assessment methodology flow and demonstrates it on a real use case, including on-field evaluations.

Park et al. [4] evaluate the effect of rejuvenation actions on the availability of an Active/Standby cluster. The authors define a generic Markov model to estimate how system availability varies with the repair time and the rejuvenation period.
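The sketch below makes the kind of model used by Park et al. [4] concrete. It computes the steady-state availability of a small continuous-time Markov chain (CTMC) with rejuvenation. It is not the model from [4], whose states and parameters are not given here. The four states (UP, DEG, FAIL, REJ), the exponential rates, and the function names `steady_state` and `availability` are all hypothetical. The sketch also models a single node rather than an Active/Standby pair, and it approximates the rejuvenation period as an exponential trigger with the same mean.

```python
"""Steady-state availability of a small software-rejuvenation CTMC.

Hypothetical 4-state, single-node model (not the model of Park et al. [4]):
  UP   robust state
  DEG  aged, failure-probable state
  FAIL failed, under repair
  REJ  under planned rejuvenation
Rates are per hour; durations are mean times in hours.
"""


def steady_state(states, rates):
    """Solve pi * Q = 0 with sum(pi) = 1 for an irreducible CTMC.

    `rates` maps (src, dst) pairs to transition rates.
    """
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Generator matrix Q: off-diagonal rates, diagonal = minus the row sum.
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        i, j = idx[src], idx[dst]
        q[i][j] += rate
        q[i][i] -= rate
    # Solve Q^T pi = 0, replacing one redundant equation with sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / a[r][r]
    return dict(zip(states, x))


def availability(aging, fail, rejuv_interval_h, repair_h, rejuv_h):
    """Fraction of time spent in UP or DEG (the node is serving requests)."""
    rates = {
        ("UP", "DEG"): aging,                    # onset of aging
        ("DEG", "FAIL"): fail,                   # aging-related failure
        ("DEG", "REJ"): 1.0 / rejuv_interval_h,  # rejuvenation trigger (exponential approx.)
        ("FAIL", "UP"): 1.0 / repair_h,          # unplanned repair
        ("REJ", "UP"): 1.0 / rejuv_h,            # planned rejuvenation completes
    }
    pi = steady_state(["UP", "DEG", "FAIL", "REJ"], rates)
    return pi["UP"] + pi["DEG"]


if __name__ == "__main__":
    # Illustrative placeholder rates, not measured values.
    for interval in (24.0, 168.0, 720.0):
        a = availability(aging=1 / 240, fail=1 / 720,
                         rejuv_interval_h=interval, repair_h=4.0, rejuv_h=0.1)
        print(f"rejuvenation every {interval:5.0f} h -> availability {a:.6f}")
```

The loop at the bottom varies only the rejuvenation interval while the other rates stay fixed. This is the sort of what-if analysis attributed to [4]. In a real assessment those rates would have to come from measurements rather than assumptions.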
The main weaknesses of Park et al. [4] that we address are: (i) a purely theoretical estimation, with no evidence or proof from the real world; (ii) an estimation of the best rejuvenation period that ignores the fact that system aging depends on aging factors and therefore requires empirical evidence; (iii) the underestimation of the possibility that the cluster resource manager itself fails, which can certainly occur.

Skramstad et al. [2] study the solutions proposed by the academic and industrial communities for certifying critical systems composed of COTS components to a specific SIL. They reach three conclusions. First, memory can be supervised through memory-mapped storage or by regularly computing checksums. Second, the system can be tested, although this is unfeasible when the COTS component is an entire operating system. Finally, diverse COTS components can be adopted to avoid common failure modes. Unlike ours, this study is fairly superficial: the authors report a few approaches from the literature without providing a comprehensive analysis.

Finally, Pierce et al. [3] present a detailed report on how to assess Linux for safety-related systems (SRS). The document provides guidelines intended to increase confidence in the reliability of the OS. Many OS features are identified as possible sources of failure and should therefore be disabled; that is, the developer should build a monolithic kernel with a minimal set of functionalities. The study concludes that Linux, properly tuned, would be suitable for many safety-related applications with SIL1 and SIL2 requirements. This work is valuable for the amount of information it provides, but it does not predict the actual impact of the mitigation techniques on reliability.
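The first option reported by Skramstad et al. [2] is to supervise memory by regularly computing checksums. A minimal version can be sketched as follows. This is an illustration under stated assumptions, not a mechanism taken from [2]. The names `ChecksumGuard` and `supervise` are hypothetical, CRC-32 (Python's `zlib.crc32`) stands in for whatever checksum a real system would use, and the protected region is modeled as a `bytearray` rather than actual OS memory.

```python
import time
import zlib


class ChecksumGuard:
    """Detect unintended changes to a memory region that should stay constant.

    Hypothetical helper: suited to static data (configuration tables, code
    images), not to buffers the application legitimately writes at runtime.
    """

    def __init__(self, region: bytearray) -> None:
        self._region = region                 # kept by reference, so later corruption is visible
        self._reference = zlib.crc32(region)  # golden CRC-32 taken at start-up

    def check(self) -> bool:
        """Return True if the region still matches its reference checksum."""
        return zlib.crc32(self._region) == self._reference


def supervise(guard, rounds, period_s, on_corruption):
    """Check the region `rounds` times, `period_s` seconds apart.

    Calls on_corruption(round_index) and stops at the first mismatch.
    Returns True if no corruption was detected.
    """
    for i in range(rounds):
        if not guard.check():
            on_corruption(i)
            return False
        time.sleep(period_s)
    return True


if __name__ == "__main__":
    table = bytearray(b"safety parameters v1")
    guard = ChecksumGuard(table)
    print(guard.check())  # True
    table[3] ^= 0x01      # simulate a single-bit memory fault
    print(guard.check())  # False (CRC-32 detects every single-bit error)
```

A checksum only detects corruption after it has happened. In a real deployment the `on_corruption` callback would need to trigger a safe reaction, such as a failover or restart, which is outside the scope of this sketch.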
What emerges from the literature is a lack of quantitative estimates and practical examples of certification assessment in industrial applications. Several techniques and approaches have been presented, but none of them can directly support certification. This work aims to bridge this gap by providing a complete, verified approach to the SIL2 certification of systems that have already been set up.