
Investigate deletion of event registration following move-over-redirect
Closed, Resolved (Public)

Description

As seen in T395350, moving a page over a redirect can seemingly result in the associated event registration being deleted. We need to investigate why it happened, and fix it if possible.

Event Timeline

I have replicated the sequence of events from the page history locally:

  • Create Event:Bicentenario de Bolivia, enable registration
  • Move Event:Bicentenario de Bolivia -> Event:Wikimixtura Bicentenario, leaving redirect behind
  • Move Event:Wikimixtura Bicentenario -> Event:Desafío Bicentenario, leaving redirect behind
  • Move Event:Desafío Bicentenario -> Event:Wikimixtura Bicentenario (over redirect), leaving redirect behind
  • Move Event:Wikimixtura Bicentenario -> Event:Bicentenario de Bolivia (over redirect), leaving redirect behind

But the registration never got deleted.

It's also worth noting that the timestamp in T395350#10860354 (20250526160727) is that of the most recent move. However, I just checked the code, and it seems that we update the deletion timestamp every time the event page is deleted, not only the first time. So, the event registration might have been deleted earlier than that.
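To illustrate the timestamp concern, here is a minimal Python sketch (not the CampaignEvents code) that contrasts overwriting the deletion time on every page deletion with recording it only once. `EventRecord`, `mark_deleted_always`, `mark_deleted_once` and the `FIRST` timestamp are all made up for the example; only `LATEST` comes from the task.

```python
from typing import Optional


class EventRecord:
    """Hypothetical stand-in for a stored event registration."""

    def __init__(self) -> None:
        self.deleted_at: Optional[str] = None


def mark_deleted_always(event: EventRecord, now: str) -> None:
    # Overwrites the timestamp on every call, so an earlier deletion time is lost.
    event.deleted_at = now


def mark_deleted_once(event: EventRecord, now: str) -> bool:
    # Records the timestamp only if the event is not already marked as deleted.
    if event.deleted_at is not None:
        return False
    event.deleted_at = now
    return True


FIRST = "20250520000000"   # invented earlier deletion time (assumption)
LATEST = "20250526160727"  # timestamp quoted in the task

overwritten = EventRecord()
mark_deleted_always(overwritten, FIRST)
mark_deleted_always(overwritten, LATEST)

guarded = EventRecord()
mark_deleted_once(guarded, FIRST)
second_call_changed_anything = mark_deleted_once(guarded, LATEST)
```

With the unguarded version, only the most recent time survives, which is why the stored timestamp alone cannot tell us when the first deletion happened.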

Change #1151288 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/CampaignEvents@master] Prevent multiple event deletion when event page is deleted

https://gerrit.wikimedia.org/r/1151288

> It's also worth noting that the timestamp in T395350#10860354 (20250526160727) is that of the most recent move. However, I just checked the code, and it seems that we update the deletion timestamp every time the event page is deleted, not only the first time. So, the event registration might have been deleted earlier than that.

According to the binlogs (T395350#10862819), this is not the case, and that was the first deletion of the event.

Also, here's a simplified version of the steps. All the moves left a redirect behind (suppressredirect not used), and moves marked with "OR" are over_redirect. Assuming registration is originally enabled on page A:

A -> B -> C -> B (OR) -> A (OR).
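For reference, the sequence can be replayed against a toy model. This is a rough Python sketch, not MediaWiki code: `ToyWiki` and its methods are invented, and it assumes the registration is looked up by page title, follows the page on move, and is deleted only when the page with that exact title is deleted.

```python
class ToyWiki:
    """Invented model: pages, redirects, and registrations keyed by title."""

    def __init__(self):
        self.pages = {}           # title -> redirect target, or None for a normal page
        self.registrations = {}   # title -> event name
        self.deleted_events = []  # events removed by the delete hook

    def create(self, title, event=None):
        self.pages[title] = None
        if event is not None:
            self.registrations[title] = event

    def on_page_delete_complete(self, title):
        # Exact-title lookup only; redirects are never resolved here.
        event = self.registrations.pop(title, None)
        if event is not None:
            self.deleted_events.append(event)

    def delete(self, title):
        del self.pages[title]
        self.on_page_delete_complete(title)

    def move(self, old, new):
        if new in self.pages:
            # Move over redirect: the target must be a redirect to the source.
            if self.pages[new] != old:
                raise ValueError("target exists and is not a redirect to the source")
            self.delete(new)  # synchronous, runs before the move itself
        self.pages[new] = self.pages.pop(old)
        if old in self.registrations:
            self.registrations[new] = self.registrations.pop(old)
        self.pages[old] = new  # redirect left behind


wiki = ToyWiki()
wiki.create("A", event="E")
for source, target in [("A", "B"), ("B", "C"), ("C", "B"), ("B", "A")]:
    wiki.move(source, target)
```

In this model the registration ends up back on A and nothing is recorded in `deleted_events`, which matches the failed reproduction attempts.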

Tried this again in beta but it didn't reproduce the deletion: https://meta.wikimedia.beta.wmflabs.org/w/index.php?title=Event:T395351-A&action=history

I also tried on another page, reverting the previous moves using the "revert" button from the log, but it really just prefills the form and does nothing else. The resulting histories match what we see on meta for all the involved pages, yet the event didn't get deleted. I don't see how this could be meta-only, either.

For completeness, I also tried this on testwiki, in case there's something production-specific involved in this bug that is therefore not visible in beta. Once again though, I did not succeed. I'm not sure what else to look for.

What must be true is that the deletion got triggered as part of the page move, in the PageDeleteComplete hook handler. Move-over-redirect triggers redirect deletion, which is effectively a normal page deletion that fires the PageDeleteComplete hook. However:

  • The deletion is synchronous, and therefore it happens before the page is moved. The hook handler and its callees (responsible for deleting the event) are also synchronous, hence invoked before the page move.
  • The hook handler reads everything from master, so it sees fresh data.
  • The redirect being deleted points to a page with event registration enabled, but we never attempt to resolve the redirect, so that should not matter.

Hence, I'm not seeing how handling the redirect deletion results in the event being deleted, when there are at least two reasons why it shouldn't (we read fresh data where the deleted page is still a redirect with no registration associated; and we do not attempt to resolve the redirect target).
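The redirect-resolution point can be sketched as follows. This is an illustrative Python model, not the extension's lookup: `find_event`, the `resolve_redirects` flag, the `"Event"` namespace placeholder and the event ID `1` are all assumptions. The state shown is the one just before the last move, when Event:Bicentenario de Bolivia is a redirect about to be deleted.

```python
from typing import Dict, Optional, Tuple

# (namespace, DB key); "Event" is a placeholder for the real namespace ID.
PageKey = Tuple[str, str]

redirects: Dict[PageKey, PageKey] = {
    ("Event", "Bicentenario_de_Bolivia"): ("Event", "Wikimixtura_Bicentenario"),
}
registrations: Dict[PageKey, int] = {
    ("Event", "Wikimixtura_Bicentenario"): 1,  # hypothetical event ID
}


def find_event(page: PageKey, resolve_redirects: bool = False) -> Optional[int]:
    """Look up the event for a page, optionally following redirects first."""
    if resolve_redirects:
        seen = set()
        while page in redirects and page not in seen:
            seen.add(page)
            page = redirects[page]
    return registrations.get(page)


deleted_redirect = ("Event", "Bicentenario_de_Bolivia")
exact_match = find_event(deleted_redirect)
resolved_match = find_event(deleted_redirect, resolve_redirects=True)
```

Only the resolving variant finds an event for the redirect being deleted. Since the handler does an exact lookup, it should find nothing to delete.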

However, I do see one thing that might cause this: the event-by-page cache that was added recently (r1139200 / T392784). If, for any reason, this cache held a reference to the event associated with the original (now redirect) page, we would then delete the event despite it no longer being associated with the page being deleted. The symptoms would match, and cache invalidation being involved makes this theory quite plausible. Also, having eliminated the impossible... I still need to prove this, though.


Never mind, I'm stoopid. The cache is ignored when reading from master, as it should be. Just to double-check, I verified that even if I entirely comment out the cache purge that runs on event edit, the bug still can't be reproduced.
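A small Python sketch of why a stale cache entry would not matter here, assuming reads from master bypass the cache. `EventLookup` and its `from_primary` flag are invented names, and a plain dict stands in for the database.

```python
from typing import Dict, Optional


class EventLookup:
    """Invented event-by-page lookup with a cache that is never purged."""

    def __init__(self, db: Dict[str, int]) -> None:
        self.db = db
        self.cache: Dict[str, Optional[int]] = {}

    def get_event_id(self, page: str, from_primary: bool = False) -> Optional[int]:
        if from_primary:
            # Primary reads skip the cache entirely and always see fresh data.
            return self.db.get(page)
        if page not in self.cache:
            self.cache[page] = self.db.get(page)
        return self.cache[page]


db = {"A": 1}              # event 1 is registered on page A (hypothetical ID)
lookup = EventLookup(db)
lookup.get_event_id("A")   # warms the cache with A -> 1

# The page is moved from A to B; the cache is deliberately left stale.
del db["A"]
db["B"] = 1

stale_read = lookup.get_event_id("A")
primary_read = lookup.get_event_id("A", from_primary=True)
```

The cached read still returns the old event for A, but the primary read does not. A delete handler that reads from master would therefore not pick up the stale association, even with the purge disabled.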

Another theory I looked into is that the page object passed to the hook handler is somehow wrong; there have been similar issues in the past (like T348881). But this doesn't seem to be the case. The hook handler passes the PageIdentity object (actually a PageStoreRecord) straight down to EventStore without any changes along the way. There, we just call getNamespace() and getDBkey(), which are simple getters. So, there's no chance of the object's state changing while handling the hook, meaning the state is either correct or wrong from the get-go. But checking the logic in DeletePage, $pageBeforeDelete really is created before any deletion has taken place, so it can't be wrong. Also, the fact that the bug isn't reproducible seems to imply there's something nondeterministic at play, but I haven't figured out what.
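The argument about the page object can be sketched like this, assuming it behaves like an immutable value object captured before the deletion. `PageSnapshot` and `handle_page_delete` are invented names for the example and do not mirror the real PageStoreRecord or hook handler.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class PageSnapshot:
    """Invented immutable snapshot of a page, taken before it is deleted."""

    namespace: str
    db_key: str

    def get_namespace(self) -> str:
        return self.namespace

    def get_db_key(self) -> str:
        return self.db_key


def handle_page_delete(page: PageSnapshot) -> Tuple[str, str]:
    # The handler only calls simple getters to build the lookup key.
    return (page.get_namespace(), page.get_db_key())


# "Event" is a placeholder for the real namespace ID.
page_before_delete = PageSnapshot("Event", "Bicentenario_de_Bolivia")
lookup_key = handle_page_delete(page_before_delete)

try:
    page_before_delete.db_key = "Something_else"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

If the snapshot cannot be mutated and the handler only reads from it, the lookup key can only be wrong if the snapshot was wrong when it was created.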

Change #1151288 merged by jenkins-bot:

[mediawiki/extensions/CampaignEvents@master] Prevent multiple event deletion when event page is deleted

https://gerrit.wikimedia.org/r/1151288

I investigated a bit and couldn't reproduce. Without further information, like the exact sequence of events (in case I missed something), I can't do anything else besides closing this task. Can be reopened if the issue resurfaces and we get more debugging information.