Document why cache purging (sending HTTP PURGE) is synchronous
Open, Needs Triage, Public

Description

When MediaWiki sends the PURGE request to a CDN, it waits for a response before proceeding to respond to the client. If the CDN takes a moment to respond, this makes saving pages take longer.

It's become an issue for us after we implemented wildcard cache purging with Nginx,[1] to make sure that saving a page purges both the desktop and mobile versions, and both GET and HEAD responses. Nginx takes up to several seconds to process these PURGE requests.

It's been pointed out to me that the MW code explicitly requests this synchronicity by using PRESEND instead of POSTSEND in HTMLCacheUpdater.php.[2] We weren't able to tell why this choice was made. A large comment in the commit that added HTMLCacheUpdater explains why the purge is deferred,[3] but it's not clear whether it needs to be PRESEND.
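To make the PRESEND/POSTSEND distinction concrete, here is a toy Python model of a single request. It is only a sketch: the class, its fields, and the timings are invented for illustration and do not mirror MediaWiki's DeferredUpdates API. It assumes PRESEND work finishes before the response is sent and POSTSEND work starts after it.

```python
# Toy model only: names and numbers are hypothetical, not MediaWiki code.
from dataclasses import dataclass, field


@dataclass
class Request:
    handler_s: float  # time spent producing the response
    presend: list = field(default_factory=list)   # durations of updates run before sending
    postsend: list = field(default_factory=list)  # durations of updates run after sending

    def client_latency(self) -> float:
        """Seconds until the client receives the response."""
        return self.handler_s + sum(self.presend)

    def total_work(self) -> float:
        """Seconds until the server has finished everything for this request."""
        return self.client_latency() + sum(self.postsend)


purge_s = 3.0  # assumed: a slow PURGE round trip, as in the report above

as_presend = Request(handler_s=0.5, presend=[purge_s])
as_postsend = Request(handler_s=0.5, postsend=[purge_s])

print(as_presend.client_latency())   # 3.5
print(as_postsend.client_latency())  # 0.5
```

In this model the server does the same total work either way; only the moment at which the client gets its response changes.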

I've been told @aaron may know more. Thanks to @Bawolff for the pointers.

[1] https://www.mediawiki.org/wiki/Manual:Nginx_caching
[2] https://github.com/wikimedia/mediawiki/blob/master/includes/cache/HTMLCacheUpdater.php#L187
[3] https://github.com/wikimedia/mediawiki/commit/35da1bbd7cb8b4414c4fbcf331473f1024bc638d

Event Timeline

Aklapper renamed this task from Why is cache purging (sending HTTP PURGE) synchronous? to Document why cache purging (sending HTTP PURGE) is synchronous. Sep 23 2025, 8:01 AM

As a test, I've done the following:

  • Perform a s/DeferredUpdates::PRESEND/DeferredUpdates::POSTSEND/g within includes/cache/HTMLCacheUpdater.php.
  • Perform a s/::PURGE_PRESEND/::PURGE_POSTSEND/g within all of includes/. (Just to correct the name of this constant in line with the above.)
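For anyone wanting to repeat the test, the two substitutions could be scripted roughly as follows. This is a minimal Python sketch, not an official patch: the function names are made up, and it assumes you pass it the path to your own `includes/` directory. Try it on a copy of the tree first.

```python
from pathlib import Path


def patch_source(text: str, is_updater_file: bool) -> str:
    """Apply the two substitutions from the list above to one file's contents."""
    if is_updater_file:
        # First substitution: only within HTMLCacheUpdater.php.
        text = text.replace("DeferredUpdates::PRESEND", "DeferredUpdates::POSTSEND")
    # Second substitution: everywhere under includes/. It only matches
    # `::`-qualified references to the constant.
    return text.replace("::PURGE_PRESEND", "::PURGE_POSTSEND")


def patch_tree(includes_dir: Path) -> list[Path]:
    """Rewrite every .php file under includes_dir in place; return the changed paths."""
    changed = []
    for path in includes_dir.rglob("*.php"):
        old = path.read_text(encoding="utf-8")
        new = patch_source(old, path.name == "HTMLCacheUpdater.php")
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path)
    return changed
```

Usage would be something like `patch_tree(Path("includes"))` from the MediaWiki root. Because the second pattern requires a leading `::`, any unqualified mention of the constant is left untouched and is worth checking by hand.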

After this, the purging feature still works as expected with our setup using Nginx FastCGI caching. Perhaps there is no need for the deferred action to be PRESEND rather than POSTSEND, so long as it's deferred?

The comments do mention something about edge cases with race conditions in distributed server setups, but it's unclear whether the solution to this is the deferral alone, or if it specifically needs to be a PRESEND deferral.

I think:

  • If there is a justification for PRESEND under certain conditions, but POSTSEND works "well enough" (or exactly as well in single-server setups) then there should be a configuration option to choose POSTSEND.
  • If this was simply an oversight and POSTSEND works exactly as well under all conditions, then it should simply be changed; the two substitutions above appear to be enough for that.

Thoughts?

In case it's not clear, sending this PRESEND can cause significant latency if you are using traditional HTTP purging and have multiple cache servers. I think the request here is to change this to POSTSEND unless there is a compelling reason why doing that would be a bad idea.

On WMF sites, the update just queues the purge into another service, so it's quick. We also don't want the deferred purge update to end up running after a bunch of slow POSTSEND deferred updates. Since POSTSEND updates do not block the user, the user would then have a decent chance of seeing stale data, because they can follow redirects and navigate around before the deferred purge update gets started. Our own purge queuing service is pretty quick, so that risk is minimal as a PRESEND update. We can't control how fast *other* POSTSEND updates are, though.
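The stale-data window described here can be put into numbers with a small Python sketch. It is a toy model under one stated assumption: POSTSEND updates run one after another in queue order. The update names and durations are invented for illustration.

```python
def purge_start_delay(postsend_queue, purge_name="cdn-purge"):
    """Seconds between sending the response and the purge starting.

    postsend_queue is a list of (name, duration_s) pairs, assumed to run
    serially in list order after the response has been sent.
    """
    elapsed = 0.0
    for name, duration_s in postsend_queue:
        if name == purge_name:
            return elapsed
        elapsed += duration_s
    raise ValueError(f"{purge_name!r} is not in the queue")


# Hypothetical queue: two slow updates happen to be scheduled ahead of the purge.
queue = [("links-update", 1.5), ("site-stats", 0.25), ("cdn-purge", 0.5)]

print(purge_start_delay(queue))  # 1.75
```

During those 1.75 seconds the user already has the response and can request pages that the cache has not yet been told to drop. If the purge is first in the queue the delay is 0.0, which is why the order of the other POSTSEND updates matters.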

On WMF sites, the update just queues the purge into another service, so it's quick.
[...]
Our own purge queuing service is pretty quick, so that risk is minimal as a PRESEND update.

Then it might be worth adding a setting so that third-party wikis and wiki farms where purging takes longer can control whether it uses PRESEND or POSTSEND.

In this case, it sounds like it would run into the problem of users seeing stale pages (even without other slow deferred updates, but worse with them). I know that CDNCacheUpdates try to merge into batch updates, and we have an HTTP purge client class that does multiple URLs concurrently. Is it taking multiple seconds to purge a single URL? That seems like an operational problem.

This sounds like at least part of the reason why I haven't really tried to get my wikis working with a CDN (CloudFront since they're in AWS), concern about this kind of latency issue.

Wouldn't the editor having a session prevent them from seeing stale pages?

Session yes, if they're logged in, or reached Special:CreateAccount/Special:UserLogin. Anons get the ChronologyProtector cookie, but that only lasts 10s, and third-party setups often might not be configured to exempt it from caching.

Saving a page (or previewing, I think) should also get you a session cookie even if logged out. (Otherwise you wouldn't get new message notices about people mad about your edits.)

I'm pretty sure that you get a session cookie if you edit while logged out (regardless of whether you have the new temporary accounts feature enabled), unless that has changed since 1.43. I think a reasonable approach to this issue, if WMF have concerns about using POSTSEND, is to allow third-party wikis to configure this without needing to edit the source code.

I just encountered another wiki where doing these purges PRESEND was the cause of major save-time latency (it was taking multiple seconds to send CDN purges). I think this is a major performance hurdle for most wikis using HTTP-based cache purging.

Part of the issue here is that curl removed support for HTTP/1.1 pipelining, which made this much slower. Edit: It seems that without pipelining it just opens multiple TCP connections at the same time, so the latency difference really shouldn't be that much.
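The point about parallel connections can be illustrated with a short Python sketch. It does not send real HTTP requests: `fake_purge` is a stand-in that just sleeps, the URLs are placeholders, and the 50 ms delay is an assumption. With concurrent sends the total time is roughly that of the slowest purge rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_purge(url, delay_s=0.05):
    """Stand-in for one PURGE round trip; sleeps instead of using the network."""
    time.sleep(delay_s)
    return url


def purge_serial(urls):
    """Send purges one after another."""
    return [fake_purge(u) for u in urls]


def purge_concurrent(urls):
    """Send all purges at once, one worker thread per URL."""
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fake_purge, urls))


def timed(fn, urls):
    """Run fn(urls) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


# Placeholder URLs: desktop and mobile variants of one page, for example.
urls = [f"https://wiki.example.org/wiki/Page?variant={i}" for i in range(4)]

serial_result, serial_s = timed(purge_serial, urls)
concurrent_result, concurrent_s = timed(purge_concurrent, urls)
print(f"serial: {serial_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

This only helps when the individual purges are fast. If a single PURGE takes seconds on the cache server, as with the wildcard purging described in the task, concurrency does not shorten the wait.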

I suppose adding some sort of priority to deferred updates or an additional stage (early post send vs late post send) could help, but that is a lot of complexity.

One potential problem, though, is that when purging an image page, doing the purge POSTSEND might make the user see stale data, as users typically don't have a cache-busting cookie for the image server. So I guess file purges should still be PRESEND.

OTOH, AFAIK all Wikimedia purges are async, including image purges, so maybe that doesn't matter.