Problem
In the WooCommerce Payments project, besides unit and integration tests for PHP and JavaScript, we run E2E (end-to-end) tests for some critical flows in GitHub Actions. The problem we faced was that many of these tests were flaky, with no clear reason.
Research
First, a little about the WooCommerce Payments architecture. Besides running as a WordPress plugin, it interacts with a few servers over HTTP through their REST APIs. The E2E tests do not mock these servers; they call the real ones.
Initially, we noticed that many tests failed. A closer look showed no specific pattern; the failures seemed to happen randomly. After monitoring them further, I noticed that many HTTP responses carried error statuses, such as 503 (Service Unavailable) or 429 (Too Many Requests). These errors usually went away when the HTTP requests were sent again or, in the context of E2E tests, when the tests were run once more.
With that in mind, our approach to this issue is to re-run failed tests.
Approach and Code!
Re-running failed tests can mean several things:
- ❌ Trigger the whole E2E workflow (GitHub Action) again. This is inefficient because the build and the site setup happen again, which wastes time and computing resources.
- ❌ Use the built-in jest.retryTimes(). It does not suit the nature of our tests well because Jest retries failed tests within the same run of the test file, shortly after they fail. The same issues, such as 429 Too Many Requests, are likely to happen again.
- ✅ Re-run only the failed spec/test files in a new E2E run. This fits the organization of our tests well: many tests rely on previous tests in the same spec/test file (understandably, e.g. adding a card and then removing it), and many flaky runs have only one failed test file.
This PR includes all relevant changes as well as all details. In short, I extracted the test results of the first E2E run, picked out the failed spec files, and re-ran only those files. Besides resolving the main flakiness issue, this approach has other advantages:
- Minimal time usage – No need to re-run the whole process, including setting up the E2E test environment.
- Minimal change – No E2E setup is altered, and it’s simple to understand.
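The extract, filter, and re-run steps described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: it assumes the first run wrote a Jest JSON report (for example via `jest --json --outputFile=...`), in which each entry of `testResults` carries the spec file path in `name` and a `status` of `"passed"` or `"failed"`. The helper names `failedSpecFiles` and `rerunArgs`, the base command, and the spec paths are hypothetical.

```typescript
// The parts of a Jest `--json` report that this sketch reads.
interface SpecResult {
  name: string; // path of the spec file
  status: "passed" | "failed";
}

interface JestReport {
  testResults: SpecResult[];
}

// Hypothetical helper: list the spec files that failed in the first run.
function failedSpecFiles(report: JestReport): string[] {
  return report.testResults
    .filter((spec) => spec.status === "failed")
    .map((spec) => spec.name);
}

// Hypothetical helper: build the argument list for a second run that is
// limited to the failed files. Returns null when there is nothing to re-run.
function rerunArgs(baseCommand: string[], failedFiles: string[]): string[] | null {
  if (failedFiles.length === 0) {
    return null;
  }
  return [...baseCommand, ...failedFiles];
}

// Example with an in-memory report; a real workflow would read the report
// file produced by the first run instead.
const firstRun: JestReport = {
  testResults: [
    { name: "specs/checkout.spec.js", status: "passed" },
    { name: "specs/saved-cards.spec.js", status: "failed" },
  ],
};

// Assumes the suite can be started through the Jest CLI directly.
const secondRun = rerunArgs(
  ["npx", "jest", "--runTestsByPath"],
  failedSpecFiles(firstRun)
);
console.log(secondRun);
```

Working at the level of spec files rather than individual tests keeps dependent tests in the same file together, and the environment from the first run can be reused for the second one.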
Lessons Learned
Flakiness is almost unavoidable, especially in a setup like ours, where clients are tested against real servers. However, we can keep the flakiness rate reasonable by understanding what the E2E tools offer and the real cause of the failures.