Remove database insertion code from WikiPage::getRedirectTarget() / WikiPage::insertRedirect()
Closed, ResolvedPublic

Authored By
matmarex
Sep 13 2023, 10:34 PM
Referenced Files
F37727863: followup.tsv
Sep 15 2023, 7:12 PM
F37727864: followup.sql
Sep 15 2023, 7:12 PM
F37722413: missing_rows.tsv
Sep 14 2023, 4:46 PM
F37722412: missing_rows.sql
Sep 14 2023, 4:46 PM
F37722411: missing_fields.tsv
Sep 14 2023, 4:46 PM
F37722410: missing_fields.sql
Sep 14 2023, 4:46 PM

Description

WikiPage::getRedirectTarget() / WikiPage::insertRedirect() contain logic to detect two incomplete database migrations – introduction of the "redirect" table (2008) and addition of rd_interwiki and rd_fragment fields to that table (2011) – and insert/update the redirect table rows if they weren't migrated.

These migrations seem to have been completed in Wikimedia production, probably many years ago, either thanks to this code or some manual maintenance. We should remove this fallback.

There are a few outdated entries in production that can't be updated – either due to using syntax for redirects that is no longer accepted (in which case it has nothing to update), or due to database corruption (in which case it throws errors, e.g. T281223). I'll list them, then fix the former and file/find bugs about the latter.

(I found this while investigating T346383)

Event Timeline

491 missing rd_interwiki and rd_fragment fields: (of which 206 on ttwiki, 72 on hiwiki, 57 on hiwiktionary)


206 missing redirect rows: (of which 191 on nostalgiawiki)
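For readers without access to the attached .sql files, here is a minimal sketch of the kind of query that produces the two lists above. It is not the attached queries. The schema is a simplified, hypothetical subset of the `page` and `redirect` tables, run against an in-memory SQLite database, and it assumes an unmigrated row has both new fields set to NULL (the two columns were added together).

```python
import sqlite3

# Simplified, hypothetical subset of the MediaWiki `page` and `redirect` tables.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_title TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE redirect (
    rd_from INTEGER PRIMARY KEY,
    rd_namespace INTEGER NOT NULL,
    rd_title TEXT NOT NULL,
    rd_interwiki TEXT,  -- nullable: NULL means the 2011 migration never ran
    rd_fragment TEXT    -- nullable: same as above
);
"""

# 2008 migration not applied: page is flagged as a redirect but has no row.
MISSING_ROWS = """
SELECT page_id FROM page
LEFT JOIN redirect ON rd_from = page_id
WHERE page_is_redirect = 1 AND rd_from IS NULL
ORDER BY page_id
"""

# 2011 migration not applied: row exists, but the newer fields are NULL.
MISSING_FIELDS = """
SELECT rd_from FROM redirect
WHERE rd_interwiki IS NULL AND rd_fragment IS NULL
ORDER BY rd_from
"""


def find_unmigrated(conn):
    """Return (page ids lacking a redirect row, page ids with NULL fields)."""
    missing_rows = [row[0] for row in conn.execute(MISSING_ROWS)]
    missing_fields = [row[0] for row in conn.execute(MISSING_FIELDS)]
    return missing_rows, missing_fields


def demo_connection():
    """Build a tiny in-memory database with one example of each case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, "Migrated", 1), (2, "No_row", 1), (3, "Null_fields", 1), (4, "Article", 0)],
    )
    conn.executemany(
        "INSERT INTO redirect VALUES (?, ?, ?, ?, ?)",
        [(1, 0, "Target", "", ""), (3, 0, "Target", None, None)],
    )
    return conn
```

In the demo data, page 2 stands for a page the 2008 migration missed and page 3 for a row the 2011 migration missed; page 1 is fully migrated (empty strings, not NULL) and page 4 is not a redirect.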


So there was a maintenance script run in 2008, taking care of adding the redirect rows: T12931#155945 https://wikitech.wikimedia.org/wiki/Server_Admin_Log/Archive_12#July_22 "Tim: fixing redirect table on all wikis" – sadly no one recorded what the script was.

I can't find anything about the 2011 migration populating rd_interwiki and rd_fragment fields.

I can't find a maintenance script for these migrations, not even a deleted one. The only maint scripts that ever existed with "redirect" in the name are checkBadRedirects.php (which lists pages with a redirect row that don't currently compute as a redirect, but doesn't update anything) and fixDoubleRedirects.php (which would incidentally populate the fields, but only if the page was a double redirect).

Just in case, I visited each of the listed pages with a quick script, which should trigger the WikiPage code – none of them got the missing rows/fields populated.

10 pages throw a RevisionAccessException when accessed, and 12 more pages don't throw exceptions, but their content is not accessible. I had a look at these pages with some more queries (not sure if there's some better way), and I found that all of them are affected by T281325 (those that don't throw have already been marked as corrupted, using the script in that task).


All other pages are accessible, but they're not redirects. Many have various typos in the #REDIRECT markup, which could be fixed by editing the pages (presumably the parser used to be more lenient). A few don't look like redirects at all.

If we ever do any kind of maintenance cleanup for this, I think the correct thing is to delete any redirect rows that have null rd_interwiki and rd_fragment fields, and to set page_is_redirect=0 when the corresponding redirect row doesn't exist.
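A minimal sketch of that cleanup, under stated assumptions: the tables are a simplified, hypothetical subset of the real schema, the database is in-memory SQLite, and the function name `cleanup_unmigrated` is made up for illustration. It is not an existing maintenance script. The delete runs first, so pages whose row is deleted are also caught by the second step.

```python
import sqlite3

# Simplified, hypothetical subset of the MediaWiki `page` and `redirect` tables.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE redirect (
    rd_from INTEGER PRIMARY KEY,
    rd_title TEXT NOT NULL,
    rd_interwiki TEXT,
    rd_fragment TEXT
);
"""


def cleanup_unmigrated(conn):
    """Apply both cleanup steps; return (rows deleted, pages unflagged)."""
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: drop redirect rows that never received the 2011 fields.
        deleted = conn.execute(
            "DELETE FROM redirect "
            "WHERE rd_interwiki IS NULL AND rd_fragment IS NULL"
        ).rowcount
        # Step 2: clear the redirect flag on pages with no redirect row,
        # including pages whose row was just deleted in step 1.
        unflagged = conn.execute(
            "UPDATE page SET page_is_redirect = 0 "
            "WHERE page_is_redirect = 1 "
            "AND NOT EXISTS (SELECT 1 FROM redirect WHERE rd_from = page_id)"
        ).rowcount
    return deleted, unflagged


def make_demo_db():
    """Pages 1-3 are flagged as redirects; only page 1 is fully migrated."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 0)])
    conn.executemany(
        "INSERT INTO redirect VALUES (?, ?, ?, ?)",
        [(1, "Target", "", ""), (3, "Target", None, None)],
    )
    return conn
```

On the demo data, page 3 loses its redirect row, and pages 2 and 3 both end up with page_is_redirect=0, leaving page 1 as the only redirect.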

In the meantime, we should ensure that WikiPage::getRedirectTarget() and WikiPage::isRedirect() return false for these rows (see also T226644: WikiPage::isRedirect and WikiPage::getRedirectTarget() seemingly disagree).

Change 957976 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] WikiPage: Stop trying to insert `redirect` rows on reads

https://gerrit.wikimedia.org/r/957976

I found that refreshLinks.php --old-redirects-only can perform the 2008 migration, and refreshLinks.php --redirects-only can perform the 2011 migration – that was probably the script used by Tim. (Both options are enabled by default if you run that maint script with no parameters.)

As far as I can tell, refreshLinks.php was never a part of the automated updater for third-party wikis, although it used to suggest running this script when upgrading from some old versions (code deleted in e1988c3e, c192b6c3).

Change 959888 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Add database updater script for 2008/2011 redirect schema changes

https://gerrit.wikimedia.org/r/959888

Change 959888 merged by jenkins-bot:

[mediawiki/core@master] installer: Add database updater for 2008/2011 redirect schema changes

https://gerrit.wikimedia.org/r/959888

Change 957976 merged by jenkins-bot:

[mediawiki/core@master] WikiPage: Stop trying to insert `redirect` rows on reads

https://gerrit.wikimedia.org/r/957976

I've edited all of the pages that had typos or syntax mistakes, and were accessible to me (not protected / closed wiki / stopped by abuse filter).

The rest of them can be fixed with the database updater script. Filed T347218 for that.

Change 963112 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Remove allowances for nullable `rd_interwiki` and `rd_fragment`

https://gerrit.wikimedia.org/r/963112

Change 963113 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Remove allowances for missing `redirect` rows

https://gerrit.wikimedia.org/r/963113

Change 963112 merged by jenkins-bot:

[mediawiki/core@master] Remove allowances for nullable `rd_interwiki` and `rd_fragment`

https://gerrit.wikimedia.org/r/963112

Change 963113 merged by jenkins-bot:

[mediawiki/core@master] Remove allowances for missing `redirect` rows

https://gerrit.wikimedia.org/r/963113