Payment: Webhook Reliability

What is the issue?

WooCommerce Payments, like any other payment service, uses webhooks to send relevant data to merchant sites. However, merchant sites sometimes cannot receive these webhook events, for a variety of reasons. This becomes a serious problem when the events relate to disputes or payment status. For example, a merchant may never be notified that a dispute has been opened.

Data? How do we measure?

On WooCommerce Payments, we log every webhook that fails to be delivered to a merchant site, so we know how often this happens. In addition, we have received multiple support requests from merchants reporting that their payments or disputes were not processed properly, and every investigation led back to this cause. According to our logs, the number of affected sites is small: around one percent or less.

Decisions we have made?

We solve this by logging failed webhook events in our storage. Typically, every two hours, each site sends a GET request to fetch its account information. Note that the affected sites only have trouble receiving webhook events; they have no trouble connecting to our servers. If a site has failed events that should be retrieved, we attach a special flag to the response to this request.
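The server-side half of this idea can be sketched as follows. This is a minimal illustration and not the actual WooCommerce Payments server code: the class `FailedEventStore`, the function `build_account_response`, and the field name `has_failed_webhook_events` are all hypothetical, and an in-memory dictionary stands in for the real storage.

```python
from collections import defaultdict
from typing import Any, Dict, List


class FailedEventStore:
    """Hypothetical in-memory stand-in for the storage of undelivered events."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def record_failure(self, site_id: str, event: Dict[str, Any]) -> None:
        # Called whenever a webhook delivery to a site fails.
        self._events[site_id].append(event)

    def has_failed_events(self, site_id: str) -> bool:
        # .get() avoids creating an empty entry for sites with no failures.
        return bool(self._events.get(site_id))


def build_account_response(
    store: FailedEventStore, site_id: str, account: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the account payload plus a flag (assumed name) for failed events."""
    response = dict(account)  # copy so the caller's data is not mutated
    response["has_failed_webhook_events"] = store.has_failed_events(site_id)
    return response
```

Because the flag rides along on a request the site already makes, sites without failed events pay no extra cost beyond one boolean in the payload.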

Then, on the merchant site, if we notice this flag, we send requests to fetch these events and schedule them for processing in Action Scheduler, the queue used by WooCommerce. We keep fetching until the server responds with something like “hey, you’ve got all failed events, there is nothing more to fetch”. At that point, we stop sending these requests.
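The fetch-until-done loop described above might look roughly like this. It is a sketch under stated assumptions, not the plugin's real code (which is PHP): `drain_failed_events`, `make_fake_fetcher`, the `events` and `has_more` response fields, and the `max_requests` safety cap are all hypothetical, and the `schedule` callback stands in for enqueuing a job in Action Scheduler.

```python
from typing import Any, Callable, Dict, List

Batch = Dict[str, Any]


def drain_failed_events(
    fetch_batch: Callable[[], Batch],
    schedule: Callable[[Dict[str, Any]], None],
    max_requests: int = 100,
) -> int:
    """Fetch failed events batch by batch and hand each one to a queue."""
    scheduled = 0
    # The cap is an assumption: it stops a misbehaving response from looping forever.
    for _ in range(max_requests):
        batch = fetch_batch()
        for event in batch["events"]:
            schedule(event)  # stand-in for queuing the event for processing
            scheduled += 1
        if not batch["has_more"]:  # server says nothing is left to fetch
            break
    return scheduled


def make_fake_fetcher(
    events: List[Dict[str, Any]], batch_size: int = 2
) -> Callable[[], Batch]:
    """Hypothetical stand-in for the server: serves stored events in batches."""
    remaining = list(events)

    def fetch_batch() -> Batch:
        batch = remaining[:batch_size]
        del remaining[:batch_size]
        return {"events": batch, "has_more": bool(remaining)}

    return fetch_batch
```

For example, with three stored events and a batch size of two, `drain_failed_events(make_fake_fetcher(events), queue.append)` makes two requests and leaves all three events in `queue` in their original order.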

Tradeoff

Because we rely on the GET account request that happens every two hours, processing of some events can be delayed by up to that amount of time. However, this is still much better than never processing these events at all, which is what would happen without this feature. Another consideration is that the affected sites are a small minority. We would therefore rather not send GET requests every hour or every minute from every merchant site to our server just to check for something that has a one percent or smaller chance of happening. We want to provide this feature without burdening the vast majority of sites with unnecessary requests. What we arrived at is a balance between the data we have and the issue we would like to resolve.

PRs / Real work

The implementation for merchant sites is in the following two PRs: the first refactors the existing code so that it is ready for the upcoming change, and the second provides all the logic behind this feature.

To deliver this feature, we also implemented several changes on the server side, using state-of-the-art techniques to optimize database queries and storage.
