Page MenuHomePhabricator

EventBus change events involving redirect changes are sometimes incorrect
Closed, ResolvedPublic

Description

When performing api edits on testwiki changing a page from a redirect into text and back, the events produced to mediawiki.page_change.v1 have incorrect values for redirects.

Reproduction:

Monitor events from the public streams:

curl -s -N https://stream.wikimedia.org/v2/stream/mediawiki.page_change.v1 | grep "Cirrus_Updater"

edit to redirect:

curl -s -XPOST 'https://test.wikipedia.org/w/api.php' \
  -d 'action=edit&format=json&title=Cirrus%20Updater&text=%23REDIRECT%20%5B%5BSandbox%5D%5D&token=%2B%5C&formatversion=2'`

edit to plaintext:

curl -s -XPOST 'https://test.wikipedia.org/w/api.php' \
  -d 'action=edit&format=json&title=Cirrus%20Updater&text=plaintext&token=%2B%5C&formatversion=2'`

On the edit from a redirect into plaintext the page field of the event includes is_redirect: false, which is correct, but it also includes redirect_page_link which should not be populated because that is the old redirect.

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false,"redirect_page_link":{"page_title":"Sandbox","namespace_id":0,"page_id":68971,"is_redirect":false}}

on edit from plaintext into a redirect, the event incorrectly reports that there is no redirect:

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false}

Event Timeline

The events contain the wrong data because they are loading them from the replica database. This is reproducable semi-reliably simply by setting up a mariadb replica in a dev environment without any artificial lag. It's very reliably reproduced when setting mariadb master_delay to 2.

There are two separate (but similar) wrong paths to information being invoked here:

  1. On the WikiPage object passed into the PageSaveComplete hook, WikiPage::isRedirect can return false if the page was converted into a redirect by the edit. This is because WikiPage::updateRevisionOn set mRedirectTarget to null after edit, and WikiPage::getRedirectTarget fetched the state from the replica. This is used for the page.is_redirect field of the event.
  1. In PageChangeHooks::lookupRedirectTarget. Here a reference to a wiki page is resolved into the redirect target. The code takes a reference to a page, loads the relevant PageStoreRecord from a replica and passes that into the RedirectLookup service. The RedirectLookup turns the record into a WikiPage object and invokes WikiPage::getRedirectTarget which again queries the replica. The result of this is passed into PageStore and the result is used to populate the page.redirect_page_link field of the event.

Change 970432 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/core@master] WikiPage: Avoid loading redirect from replica after edit

https://gerrit.wikimedia.org/r/970432

Move events have a similar behaviour, not closely investigated but plausibly for the same reason. Moving the test page and leaving behind a redirect via the web ui on testwiki emits the following in the page field of the create page event. is_redirect should be True and redirect_page_link should contain info about the new page.

"page":{"page_id":153321,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false}

Also suspicious is that the move event has the following content. In this case it seems to be reporting the pre-move title, where i would expect it to have the new title (note the page_id is different from above, this is the moved page and the above is the redirect it left behind):

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false}

With the number of errors adding up, I wonder if what we actually need here is some sort of test harness that performs a variety of api requests and checks the events that come out the other end. To reproduce the above errors it would need to utilize a primary/replica configuration, perhaps via mwcli. Thats starting to sound like a larger project though.

EBernhardson renamed this task from Mediawik change events involving redirect changes are sometimes incorrect to EventBus change events involving redirect changes are sometimes incorrect.Nov 1 2023, 8:16 PM

Change 970875 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/EventBus@master] Move event must contain the destination page title

https://gerrit.wikimedia.org/r/970875

Change 970876 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/EventBus@master] Correct redirect serialization in page change events

https://gerrit.wikimedia.org/r/970876

Change 970875 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Move event must contain the destination page title

https://gerrit.wikimedia.org/r/970875

Change 970432 merged by jenkins-bot:

[mediawiki/core@master] WikiPage: Avoid loading redirect from replica after edit

https://gerrit.wikimedia.org/r/970432

Change 970876 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Correct redirect serialization in page change events

https://gerrit.wikimedia.org/r/970876

Works in testing, still needs to be checked after deployment (next week)

MSantos subscribed.

Thanks for pushing that forward and sorry for the noise with the triage, we are still figuring out our process as we move forward.

I'm moving this to our radar, please let us know if there's anything else you might want our support.

Rerunning the reproduction steps from above, on edit from a redirect into a normal wikitext page we get the following, looks fixed:

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false}

On edit from a normal wikitext page into a redirect, also looks fixed:

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":true,"redirect_page_link":{"page_title":"Sandbox","namespace_id":0,"page_id":68971,"is_redirect":false}}