Implement normalizing MediaWiki link tables
Open, MediumPublic

Related Objects

StatusSubtypeAssignedTask
OpenNone
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedMarostegui
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedABran-WMF
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedJAllemandou
ResolvedLadsgroup
ResolvedSBisson
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
DeclinedMarostegui
ResolvedMarostegui
ResolvedUmherirrender
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
OpenNone
OpenNone
ResolvedLadsgroup

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ladsgroup moved this task from Inbox to Epic - Database on the Data-Persistence board.
Ladsgroup lowered the priority of this task from High to Medium. Jan 27 2022, 12:10 AM

For templatelinks, it should be High, but the overarching ticket should be Medium.

I've reviewed the plan with @tstarling and we both think it's ready to implement. I don't believe the last iteration of the proposal is cross-cutting or otherwise in need of wider consultation, as it does not affect public APIs, nor should it affect extensions or other features and engineering teams.

This will be a big project and take some time to complete. I've confirmed with @Ladsgroup that the Data Persistence team has approved resourcing for this in this and coming quarters to see it through to completion in collaboration with Performance.

Our database internals are not part of the Stable interface, so any extensions querying or writing here directly are by definition unsupported, and I'm not aware of existing technical debt in core or bundled/deployed extensions that do this.

As with any schema, one notable case outside production where these databases are regularly queried is Toolforge. So this will need coordination with Technical Engagement on communicating these changes, and possibly also on offering an intermediary database view for compatibility (to be determined). Depending on how fast our migration goes, and on the availability and resourcing of TechEng, I anticipate that the final state of the migration (where we stop writing to the old table column) may have to be postponed to fit their schedule. It is unlikely we will get to that soon, though, so I think this is fine to start working on. I recommend that we estimate now in which quarter the last phase will be reached, then ask TechEng in which quarter they can help us with that part of the rollout, and adjust as needed.

Correct. I'm planning to send an announcement about this soon-ish, so I'll see whether we need to provide a view or can simply drop it. I doubt it'd be too popular, but I'll announce it soon regardless.

Will this affect mediawiki releases or only wikimedia servers?

This is for all MediaWiki installations; the migration will happen as part of the normal update.php process for users who run that, and otherwise the maintenance script will be manually runnable (such as for Wikimedia ourselves).

Yes. From 1.39 onwards, new installations will use the new schema. It won't make it into 1.38, though.

Will this also be implemented for the redirect table?

I don't think so. There is not much to gain because there is not much duplication: the redirect table is small, and normalization comes with an overhead that makes it not worth it.
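To make the trade-off concrete, here is a minimal sketch of what normalizing a link table means, using an in-memory SQLite database. The table and column names (`target`, `link`, `from_id`, `target_id`) and the helper functions are hypothetical and chosen for illustration only; they are not MediaWiki's actual schema or migration code.

```python
import sqlite3


def normalize_links(rows):
    """Load (from_page_id, namespace, title) rows into a normalized layout.

    Hypothetical schema: each distinct (namespace, title) pair is stored once
    in `target`, and `link` rows refer to it by a numeric id.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE target (
            target_id INTEGER PRIMARY KEY,
            namespace INTEGER NOT NULL,
            title     TEXT    NOT NULL,
            UNIQUE (namespace, title)
        );
        CREATE TABLE link (
            from_id   INTEGER NOT NULL,
            target_id INTEGER NOT NULL REFERENCES target (target_id),
            PRIMARY KEY (from_id, target_id)
        );
        """
    )
    for from_id, namespace, title in rows:
        # Store the target only if this (namespace, title) pair is new.
        conn.execute(
            "INSERT OR IGNORE INTO target (namespace, title) VALUES (?, ?)",
            (namespace, title),
        )
        (target_id,) = conn.execute(
            "SELECT target_id FROM target WHERE namespace = ? AND title = ?",
            (namespace, title),
        ).fetchone()
        # The link row now carries a small integer instead of the title.
        conn.execute(
            "INSERT OR IGNORE INTO link (from_id, target_id) VALUES (?, ?)",
            (from_id, target_id),
        )
    return conn


def count_rows(conn, table):
    """Return the number of rows in one of the two sketch tables."""
    if table not in ("target", "link"):
        raise ValueError("unknown table: " + table)
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]
```

When many pages link to the same target, `target` stays far smaller than `link`, which is where the saving comes from. When almost every target is unique, as suggested above for the redirect table, the two tables end up with roughly the same number of rows, and the extra table and lookup are pure overhead.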

Hi, just a reminder to update the schema in https://commons.wikimedia.org/w/index.php?title=File:MediaWiki_database_schema_latest.svg&redirect=no when the work is finished (or perhaps also after each of the normalizations?)

@Dvorapa I no longer update these in SVG form. Instead, we now have https://www.mediawiki.org/wiki/Manual:Database_layout/diagram which can be quickly updated on-wiki by developers with the procedure largely automated now. We publish this twice a year after a major release. It is not published for alpha commits.

Developers who build atop the alpha software prior to release may consult the schema files directly as needed.

I see. That should be mentioned on the image page, and perhaps the redirect from Commons should be changed too.

@Ladsgroup: I saw that in Scribunto, protocol-relative links are output by default, at least by the mw.title generator (maybe by others as well), for example in mw.title.new( 'Example' ):fullUrl( 'action=edit' ). Is this a problem that needs to be fixed in Scribunto? I read the email from T335819 and it's a bit confusing. It says there that the table no longer stores those links in HTTP, but also this:

If your wiki heavily uses proto-relative URLs in articles' wikitext, we recommend changing them to https instead which also improves storage as every proto-relative URLs takes up two rows.

I just thought I’d let you know since obviously Lua use is very widespread in templates.
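As a rough illustration of the quoted recommendation, the sketch below rewrites protocol-relative URLs to https and estimates the row count before and after. The function names are hypothetical, and the only rule it encodes is the email's own rule of thumb (two rows per protocol-relative URL, one otherwise); it does not model how MediaWiki actually stores these links.

```python
def to_https(url):
    """Rewrite a protocol-relative URL (//host/path) to https.

    URLs that already carry a scheme are returned unchanged.
    """
    return "https:" + url if url.startswith("//") else url


def estimated_rows(urls):
    """Estimate stored rows using the email's rule of thumb.

    Assumption: two rows per protocol-relative URL, one row otherwise.
    """
    return sum(2 if url.startswith("//") else 1 for url in urls)


links = ["//example.org/a", "https://example.org/b"]
before = estimated_rows(links)
after = estimated_rows([to_https(url) for url in links])
```

Under that assumption, `before` counts three rows and `after` counts two, so each rewritten link saves one row.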

Thanks for the pointer. To my knowledge local domains are not stored in externallinks at all (which has confused me a lot multiple times) so this shouldn't be an issue. Do you see it being recorded?