Our paper titled “Just-in-Time Flaky Test Detection via Abstracted Failure Symptom Matching” has been accepted into the industry track of ICSME 2024. This is a collaboration between COINSE, SAP (Germany), and SAP Labs (South Korea). Congratulations!

The paper proposes a very simple idea: when a test fails due to flakiness, its symptoms (e.g., error messages and logs) may differ from those observed in non-flaky failures of the same test. Based on this intuition, we first build a database of known flaky symptoms, using the existing human labels at SAP. We then decide whether each new, incoming test failure is flaky by matching its textual symptoms against this database. To make the matching more robust, we process the symptoms via “abstraction” (e.g., masking concrete IP addresses).
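To make the abstraction and matching steps described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from the paper: the masking rules, the placeholder tokens, the function names (`abstract_symptom`, `build_flaky_db`, `is_flaky`), and the use of exact matching on abstracted text are all hypothetical choices made for this example.

```python
import re

# Hypothetical abstraction rules (assumed for illustration only).
# Each rule replaces a concrete, run-specific value with a placeholder.
ABSTRACTION_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),  # IPv4 addresses
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),          # hex values
    (re.compile(r"\b\d+\b"), "<NUM>"),                     # remaining integers
]


def abstract_symptom(text: str) -> str:
    """Mask run-specific details and normalise whitespace."""
    for pattern, placeholder in ABSTRACTION_RULES:
        text = pattern.sub(placeholder, text)
    return " ".join(text.split())


def build_flaky_db(labelled_failures):
    """Collect abstracted symptoms of failures that humans labelled as flaky.

    `labelled_failures` is an iterable of (symptom_text, is_flaky_label) pairs.
    """
    return {abstract_symptom(msg) for msg, flaky in labelled_failures if flaky}


def is_flaky(symptom: str, flaky_db) -> bool:
    """Assumed matching rule: exact equality of abstracted symptoms."""
    return abstract_symptom(symptom) in flaky_db


# Toy history with made-up messages and labels.
history = [
    ("Connection refused: 10.0.0.5:8080 after 30 s", True),
    ("AssertionError: expected 3 but got 4", False),
]
flaky_db = build_flaky_db(history)

# A new failure with a different IP, port, and timeout still matches.
new_failure = "Connection refused: 192.168.1.17:9090 after 45 s"
print(is_flaky(new_failure, flaky_db))
```

In this sketch the two connection errors differ in every concrete value, yet both reduce to the same abstracted string, so the new failure is flagged as flaky without a rerun. A less strict matching rule (for example, a similarity threshold) could be swapped in for the set lookup.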

Overview

The approach achieves about 96% precision when evaluated against real-world historical test data from SAP, while saving 58% of the machine time that would otherwise have been spent on reruns. Once again, the results show that simplicity is a huge benefit. SAP is working to integrate this into their CI/CD pipeline.