E2E Puppeteer: Flaky Test Failures and Retries

Problem

In the WooCommerce Payments project, besides unit and integration tests for PHP and JavaScript, we run E2E (end-to-end) tests for some critical flows through GitHub Actions. The problem we faced was that many of these tests were flaky, with no clear reason.

Research

First, a little about the WooCommerce Payments architecture. Besides running as a WordPress plugin, it talks to a few servers through HTTP requests to their REST APIs. When running E2E tests, we do not mock these servers; the tests hit the real ones.

Initially, we noticed that many tests failed. Looking at them closely, there was no specific pattern; the failures appeared to be random. After monitoring them further, I noticed that many HTTP responses came back with error statuses, such as 503 (Service Unavailable) or 429 (Too Many Requests). These were usually resolved by sending the HTTP requests again or, in the context of E2E tests, by running the tests once more.

With that in mind, our approach to this issue is to re-run failed tests.

Approach and Code!

There are several ways to re-run failed tests:

  1. ❌ Trigger the whole E2E workflow (GitHub Action) again. This is inefficient because the build and the site setup happen again, which wastes time and computing resources.
  2. ❌ Use the built-in jest.retryTimes(). It does not suit our tests well because Jest retries failed tests within the same run, shortly after they fail, so the same issues, such as 429 Too Many Requests, are likely to happen again.
  3. ✅ Retry only the specific spec/test files in a new E2E run. This fits our test organization well: many tests in the same spec/test file rely on previous tests (understandably, e.g. adding a card and then removing it), and many flaky runs have only one failed test file.

This PR includes all relevant changes and details. In short, I extracted the results of the first E2E run, picked out the failed spec files, and re-ran only those files. Besides resolving the main flakiness issue, this approach has other advantages:

  • Minimal time usage – No need to re-run the whole process, including setting up the E2E test environment.
  • Minimal change – No E2E setup is altered, and it’s simple to understand.
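
To make the idea concrete, here is a minimal sketch of the "pick out failed spec files" step. It is not the code from the PR. It assumes the first run was started with Jest's --json and --outputFile options, which write a report whose testResults entries carry a name (the spec file path) and a status. The function names failedSpecFiles and rerunCommand are made up for illustration.

```javascript
// Sketch only: derive a re-run command from the JSON report of a first Jest run.
// Assumption: the report comes from `jest --json --outputFile=<path>`.

// Return the paths of spec files whose overall status is "failed".
function failedSpecFiles(report) {
  return report.testResults
    .filter((result) => result.status === "failed")
    .map((result) => result.name);
}

// Hypothetical helper: turn the raw report text into the arguments for a
// second run that covers only the failed files. An empty array means
// nothing failed, so there is nothing to re-run.
function rerunCommand(reportJson) {
  const files = failedSpecFiles(JSON.parse(reportJson));
  return files.length === 0 ? [] : ["jest", ...files];
}
```

Because a whole spec file is re-run rather than a single test, tests that depend on earlier tests in the same file still see the state they expect.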

Lessons Learned

Flakiness is almost unavoidable, especially in a setup like ours where the client is tested against real servers. However, we can keep the flakiness rate reasonable by understanding what the E2E tools offer and the real cause of failures.

PHP and JavaScript: Produce the Same Result for Now and Future

Problem

In WooCommerce Payments, we had a long-standing issue with displaying fee details in order notes. My team already had logic in JavaScript that extracts data from JSON (fetched via a REST API request) and dynamically displays fee details in a React component. The issue I worked on was to build static HTML content (the order note) from the same data, but this code had to run in PHP.

Approaches

The first option seemed reasonable and avoids duplication: update the current JavaScript code to generate HTML that can be used for both order notes and React components. However, we quickly noticed that this would be complicated because:

  1. Many changes would be needed to switch the current generation logic from React components to HTML.
  2. Having React consume and display raw HTML is not safe. The prop name dangerouslySetInnerHTML states that safety concern explicitly.
  3. For PHP to get this HTML content, it would need to send a request to the site itself to trigger the JavaScript code. Doing that correctly and safely requires even more thought.

The second option is to write new PHP code that mirrors the JavaScript code. In other words, the logic is duplicated in PHP and JavaScript.

Way to Final Solution

Once we decided to write PHP code, a new goal arose: how to ensure that PHP and JavaScript really produce the same result, so that when a developer changes the logic in one language, they know they must make the same change in the other.

I approached that requirement by running unit tests for both PHP and JavaScript against the same fixtures and expected results, which are stored in JSON files.

I first refactored the existing JavaScript code so that each line of the fee details is handled individually. Then, based on that, I added JSON test files (generated from the current JavaScript output), followed by the new PHP code and its tests. All of this was done in the second PR. The varied fixtures helped a lot in writing and tweaking the PHP code to reach the final result, which has been stable with almost no problems.
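
As a rough illustration of the shared-fixture idea (JavaScript side only), the sketch below checks one implementation against fixture entries that pair an input with its exact expected output. Everything here is hypothetical: the fixture fields, the formatFeeLine function, and the output format are invented for the example and are not the plugin's real fee logic. In the real setup the fixtures live in JSON files that the PHP tests load as well.

```javascript
// Sketch only: inline objects stand in for the shared JSON fixture files.
// All field names and the output format are assumptions for illustration.
const fixtures = [
  {
    name: "base fee with fixed part",
    input: { label: "Base fee", percentage: "2.9", fixedCents: 30 },
    expected: "Base fee: 2.9% + $0.30",
  },
  {
    name: "percentage-only fee",
    input: { label: "International card fee", percentage: "1", fixedCents: 0 },
    expected: "International card fee: 1%",
  },
];

// Hypothetical implementation under test: formats one fee line.
function formatFeeLine({ label, percentage, fixedCents }) {
  const fixedPart = fixedCents > 0 ? ` + $${(fixedCents / 100).toFixed(2)}` : "";
  return `${label}: ${percentage}%${fixedPart}`;
}

// Run every fixture through an implementation and return the names of the
// fixtures whose output differs from the expected string.
function failingFixtures(allFixtures, implementation) {
  return allFixtures
    .filter((fixture) => implementation(fixture.input) !== fixture.expected)
    .map((fixture) => fixture.name);
}
```

A PHP test suite reading the same entries would fail on exactly the fixtures where the two implementations drift apart, which is the signal a developer needs when changing only one side.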

Updates

  • August 2023: we found a bug, fixed it, and added a new fixture file. The fix was simple and should hold up well over time.
  • September 2023: the JavaScript logic changed, and developers were able to pick that up quickly and update the PHP part too.

Payment: Webhook Reliability

What is the issue?

WooCommerce Payments, like any other payment service, uses webhooks to send relevant data to merchant sites. However, merchant sites sometimes cannot receive these webhook events, for a variety of reasons. This can be a big problem if the events relate to disputes or payment status. For example, merchants may not be notified when disputes happen.


WooCommerce Payments – Embracing Stricter L-2 Support

As WooCommerce Payments continues to grow, ensuring that we provide a stable, secure, and efficient experience for merchants is a top priority. One area we’re evaluating is the policy around which versions of WordPress we support—specifically, whether to adopt a stricter L-2 policy, meaning official support only for the latest two major WordPress versions. This shift would help streamline development and testing, but it comes with tradeoffs we want to carefully consider.

Why Haven’t We Implemented a Stricter L-2 Policy Yet?

The main reason we’ve held off on enforcing a stricter L-2 policy is to support as many merchants as possible. Some stores don’t update WordPress immediately for a variety of reasons—compatibility concerns, customizations, or just caution around change. Our current “loose” L-2 policy reflects this: we say we support the latest two versions, but in practice, we don’t block older versions from running WooPayments, and our CI still runs tests against them in some cases.

Maintaining this broader support helps minimize friction for merchants and gives them more flexibility in how and when they upgrade.

Challenges If We Do Not Enforce a Stricter L-2

While broad compatibility is beneficial for users, it creates real challenges for development and testing:

  • Increased Maintenance Overhead: Supporting older WordPress versions often means keeping legacy compatibility code around, which increases the complexity of our codebase.
  • Testing Complexity: Different WordPress and PHP combinations require specific versions of PHPUnit. Our CI pipeline has to include hacks to install and configure the right versions, making the system fragile and hard to maintain.
  • Reduced Developer Velocity: Supporting many combinations means slower builds, more bugs slipping through, and more time spent troubleshooting issues that don’t affect the majority of users.
  • Inability to Adopt Modern APIs: We’re often blocked from using newer WordPress features until we drop support for versions that don’t include them.

In short, the more versions we support, the more technical debt we accumulate—and that makes it harder to build new features confidently and efficiently.

How Do We Make the Decision?

To make a responsible choice, data must drive the decision. Specifically, we need to understand how many active merchants are running WooPayments on older versions of WordPress. If that number is small—say, under 1%—then the impact of enforcing a stricter L-2 policy would be minimal.

This data could come from usage tracking, opt-in telemetry, or marketplace stats. The goal is to quantify the tradeoff: how many users would be affected vs. how much complexity we remove by dropping support.

We’re also considering what the test matrix should look like. It’s not just about which versions are supported, but also which ones we validate in CI. A leaner, more focused CI pipeline makes our code more reliable and our releases more predictable.

What We Will Do Next

Our next steps will focus on data gathering and planning:

  1. Collect usage data to see which WordPress versions are most common among WooPayments users.
  2. Define our support baseline based on that data, likely aiming to officially support only the latest two versions.
  3. Simplify our CI strategy to only cover supported versions, removing outdated PHPUnit workarounds.
  4. Introduce runtime checks in WooPayments to block usage on unsupported WordPress versions with helpful error messages.
  5. Communicate the change clearly and in advance so merchants and developers have time to adjust.
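
The "latest two versions" rule from steps 2 to 4 can be sketched as a small check. This is only an illustration of the logic, written in JavaScript; a real runtime check would live in the plugin's PHP code. The function names are hypothetical, the version numbers in the usage note are examples, and the list of released versions is assumed to be supplied by the caller, ordered oldest to newest.

```javascript
// Sketch only: decide whether a site's WordPress version falls inside the
// supported window. Names and inputs are assumptions for illustration.

// WordPress identifies a major release by its first two components,
// so "6.5.2" belongs to major version "6.5".
function majorOf(version) {
  return version.split(".").slice(0, 2).join(".");
}

// `releasedMajors` must be ordered oldest to newest. `supportedCount`
// defaults to 2, matching "the latest two major versions".
function isSupportedVersion(siteVersion, releasedMajors, supportedCount = 2) {
  const supported = releasedMajors.slice(-supportedCount);
  return supported.includes(majorOf(siteVersion));
}
```

For example, with releasedMajors set to ["6.3", "6.4", "6.5"], a site on 6.5.2 or 6.4 passes the check and a site on 6.3.1 does not. The same function could drive the CI matrix, so that the versions tested and the versions supported never drift apart.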

This is part of our broader effort to modernize the WooPayments plugin, reduce technical debt, and provide a better experience for everyone—from merchants to developers.

Testing PHP Static Functions with PHPUnit: Challenges and Solutions

Static methods in PHP can be convenient, but they pose significant challenges when it comes to unit testing. This post explores why testing static methods is hard, why it’s advisable to avoid them in testable code, and what strategies can be employed to mitigate these issues.

Why Testing Static Methods is Hard

Static methods are inherently tied to their class and cannot be easily replaced or mocked during testing. This tight coupling makes it difficult to isolate the method under test, leading to tests that are less reliable and harder to maintain.

Why We Should Avoid Static Methods in Testable Code

Using static methods can lead to code that is tightly coupled and difficult to test. This is particularly problematic when static methods perform actions like logging, caching, or interacting with external systems.

For instance, in this WooCommerce Payments PR, I encountered difficulties testing a Logger class that relied on static methods. The inability to mock these methods led to challenges in writing effective unit tests.

Strategies for Testing Static Methods

While it’s best to avoid static methods in testable code, there are scenarios where they are necessary. In such cases, consider the following strategies:

1. Use Dependency Injection – Most Preferred

Dependency Injection (DI) allows injecting dependencies into classes, making them more testable. By injecting a logger or cache handler, it’s possible to replace these dependencies with mocks during testing.

As highlighted in the blog post on Dependency Injection, DI helps in achieving loose coupling and makes unit testing more straightforward.
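
Here is a minimal sketch of constructor injection. It is written in JavaScript for brevity, but the pattern carries over directly to PHP classes and PHPUnit mocks. The class names PaymentProcessor and RecordingLogger are made up for illustration and do not come from WooCommerce Payments.

```javascript
// Sketch only: the logger is passed in rather than reached through a
// static call, so a test can hand in a recording double instead.
class PaymentProcessor {
  constructor(logger) {
    this.logger = logger;
  }

  // Returns true when the amount is valid, and logs either way.
  capture(amount) {
    if (amount <= 0) {
      this.logger.error(`Invalid amount: ${amount}`);
      return false;
    }
    this.logger.info(`Captured ${amount}`);
    return true;
  }
}

// Test double: records every call instead of writing anywhere.
class RecordingLogger {
  constructor() {
    this.messages = [];
  }
  info(message) {
    this.messages.push(["info", message]);
  }
  error(message) {
    this.messages.push(["error", message]);
  }
}
```

A test creates new PaymentProcessor(new RecordingLogger()), calls capture(), and asserts on the recorded messages, with no global state to reset afterwards.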

2. Wrap Static Methods

If you must use static methods, consider wrapping them in instance methods. This approach allows you to mock the wrapper during testing, providing greater flexibility.
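
A small sketch of the wrapper idea, again in JavaScript for brevity. StaticLogger stands in for a class whose static method cannot be changed, and LoggerWrapper and RefundService are hypothetical names. The only place that touches the static method is the wrapper.

```javascript
// Sketch only: stand-in for an existing class that exposes a static method.
class StaticLogger {
  static log(message) {
    console.log(message);
  }
}

// Thin instance wrapper: the single place that calls the static method.
class LoggerWrapper {
  log(message) {
    StaticLogger.log(message);
  }
}

// Code under test depends on the wrapper's interface, not on StaticLogger.
// Production code can rely on the default; tests pass a fake.
class RefundService {
  constructor(logger = new LoggerWrapper()) {
    this.logger = logger;
  }

  refund(orderId) {
    this.logger.log(`Refunded order ${orderId}`);
    return orderId;
  }
}
```

In a test, any object with a log method can replace the wrapper, so the static call never runs.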

3. Use Callables

Another approach is to pass callables (like closures) into your methods. This technique lets you replace static method calls with mock functions during testing. However, it is not ideal: every static call becomes an extra parameter, which complicates method signatures.
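
A minimal sketch of the callable approach, in JavaScript for brevity. StaticCache and accountLabel are hypothetical names. The static method is only the default value of a parameter, so a test can pass its own function instead.

```javascript
// Sketch only: a static cache lookup that tests would rather not hit.
const store = new Map();

class StaticCache {
  // Returns the cached value, or null when the key is missing.
  static get(key) {
    return store.has(key) ? store.get(key) : null;
  }
}

// The lookup is injected as a callable; production code uses the default.
function accountLabel(accountId, cacheGet = StaticCache.get) {
  const cached = cacheGet(`account_${accountId}`);
  return cached === null ? "unknown account" : `account: ${cached}`;
}
```

A test can call accountLabel(7, () => "Acme") without touching the real cache. The cost is visible in the signature: each static dependency adds another parameter.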

Conclusion

Do your best to avoid static methods in code you want to unit test in PHP. The only good use I have seen for them so far is utility/helper functions. Dependency Injection is the most favorable approach for writing more testable and maintainable code.

Code Snippets in functions.php May Not Work as Expected

In WordPress development, it is common practice to add custom code to a theme’s functions.php file to extend or adjust site functionality. While this approach often works for straightforward customizations, there are cases where such code does not behave as expected or fails entirely.


Limitations of functions.php

The functions.php file is associated with the currently active theme and is intended primarily for theme-specific logic. While convenient for minor adjustments, its use comes with inherent limitations:

  • It only loads while its theme is the active one.
  • It may execute before required plugins or components are fully loaded.
  • It may not be included at all in non-standard request types, such as AJAX calls.

As a result, code placed in functions.php may behave inconsistently or be ignored altogether, depending on the request context and execution order.


Common Causes of Snippet Failures

1. Load Order and Execution Timing

WordPress core, themes, and plugins all hook into the loading process at different points. If a snippet relies on functionality provided by a plugin but is executed too early (e.g., before that plugin has fully loaded), it may not work properly.

To understand when various actions and filters run, it is helpful to refer to the WordPress Plugin API Action Reference, which outlines the standard execution flow during a typical request lifecycle.

2. AJAX Requests Have a Separate Execution Path

AJAX requests in WordPress are processed via admin-ajax.php and do not necessarily load the same theme-related files used during standard HTTP requests. This can result in functions.php not being executed, or key variables and hooks being unavailable during AJAX processing.

This issue is documented in real-world cases, such as in the Edit Flow plugin issue on GitHub, where code placed in functions.php did not affect AJAX behavior as expected.


Recommended Solutions

To avoid unexpected behavior when customizing WordPress, consider the following approaches:

Use the Correct Hooks and Load Points

Ensure that custom logic is attached to the appropriate WordPress action or filter and that it runs after all necessary dependencies are available. This often involves using later hooks or adjusting the priority to accommodate plugin load order.

Consider AJAX-Specific Requirements

If the intended functionality involves handling AJAX requests, be aware that these requests follow a different execution path. Code required for AJAX processing should be located in files that are guaranteed to load in both standard and AJAX contexts.

Use a Plugin Instead of functions.php

For custom code that interacts with plugins, operates across the entire site, or needs to persist independently of the active theme, it is more appropriate to place the code in a standalone plugin. This ensures consistency regardless of the theme and gives you more control over how and when the code executes.

For critical logic, consider creating a must-use plugin, which is automatically loaded by WordPress before standard plugins and themes.