
pre 1.21 linktrail
Closed, Declined · Public

Description

linktrail has only been exposed in the API since v1.21. We need to determine how we'll support versions before v1.21.
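As a rough sketch of what version-dependent support could look like, the helper below chooses where to get the link trail from based on a MediaWiki version string. The function name, the return labels and the version parsing are hypothetical and not existing Pywikibot API; only the 1.21 threshold comes from this task.

```python
import re


def linktrail_source(mw_version):
    """Return 'siteinfo' if the API should expose linktrail, else 'fallback'.

    Hypothetical helper: the labels are placeholders for whatever
    strategy is eventually chosen for older sites.
    """
    # Use only the leading major.minor numbers, so suffixes such as
    # 'wmf3' or '-alpha' do not get in the way.
    match = re.match(r'(\d+)\.(\d+)', mw_version)
    if not match:
        raise ValueError('unrecognised version: %r' % mw_version)
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples so that 1.9 sorts before 1.21.
    return 'siteinfo' if (major, minor) >= (1, 21) else 'fallback'
```

Comparing integer tuples rather than strings matters here, since a plain string comparison would place '1.9' after '1.21'.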

Event Timeline

jayvdb raised the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb added a project: Pywikibot.
jayvdb subscribed.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · Apr 30 2015, 6:40 AM

Copying a comment from https://gerrit.wikimedia.org/r/#/c/207179/3/pywikibot/family.py,cm

If we dig up the history of linktrails, we may be able to deprecate the family definitions without _much_ loss of functionality for older versions, and *increase* our support for older versions at the same time.

We'll need to look at any changes to regex in family.py to see if the commit messages give clues for specific choices made by previous pywikibot contributors.

The values are defined in the language files, but could be overridden by MediaWiki messages
https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture#Localizing_messages
*however* I believe that overriding linktrail using a MediaWiki: message was disabled for performance reasons.

Some wikis still have the MediaWiki: message, even though it is not used, so that could be a fallback.
https://fr.wikipedia.org/wiki/MediaWiki:Linktrail
https://fr.wikipedia.org/w/api.php?action=query&meta=allmessages&ammessages=linktrail
On WMF wikis, these messages have often been deleted.
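To illustrate the message fallback, here is a minimal sketch that pulls the linktrail text out of an already-decoded `allmessages` reply and returns None when the message is absent. The function name and both sample replies are illustrative assumptions, not captured API output; the sketch also assumes the legacy JSON format stores the text under `'*'` and that formatversion=2 uses `'content'`.

```python
def linktrail_from_allmessages(response):
    """Return the linktrail message text from an allmessages reply, or None.

    Hypothetical helper working on the decoded JSON dict.
    """
    for message in response.get('query', {}).get('allmessages', []):
        if message.get('name') != 'linktrail':
            continue
        # A message the wiki does not know is flagged with a 'missing' key.
        if 'missing' in message:
            return None
        # Assumption: legacy format uses '*', formatversion=2 uses 'content'.
        return message.get('*', message.get('content'))
    return None


# Illustrative replies only; the regex is a PHP pattern, not a Python one.
present = {'query': {'allmessages': [
    {'name': 'linktrail', '*': '/^([a-z]+)(.*)$/sD'}]}}
deleted = {'query': {'allmessages': [
    {'name': 'linktrail', 'missing': ''}]}}

print(linktrail_from_allmessages(present))
print(linktrail_from_allmessages(deleted))
```

The returned value is a PHP regex with delimiters and flags, so it would still need converting before Python's re module could use it.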

As the value in those language files changed over time, our static hard-wired linktrail definitions in the Family class will be wrong on some older sites. So, what we have is not perfect, and we may be able to build an alternative which is also not perfect, but requires less maintenance.

The link trail was previously always quite close to 'unicode word', however there were a lot of problems with using PCRE's 'unicode' functionality, which is why custom sets of permitted letters were added to the link trail per language.

If the Python re unicode word matching is similar to the custom sets of letters in the MediaWiki language files, it could be good enough as a generic fallback for pre 1.21.
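A minimal sketch of such a generic fallback, assuming (as in the English default) that digits and underscores are not part of the trail. The names `GENERIC_LINKTRAIL` and `split_trail` are hypothetical, and whether this is close enough to the per-language letter sets is exactly the open question.

```python
import re

# For str patterns in Python 3, \w is Unicode-aware. [^\W\d_] is
# "word character minus decimal digits and underscore", which is
# approximately "any letter"; it still admits a few non-decimal
# numeric characters such as superscript digits.
GENERIC_LINKTRAIL = re.compile(r'[^\W\d_]*')


def split_trail(text_after_link):
    """Split the text following ']]' into (link trail, remaining text)."""
    # The pattern can match the empty string, so match() never returns None.
    match = GENERIC_LINKTRAIL.match(text_after_link)
    return match.group(), text_after_link[match.end():]


print(split_trail('s are blue'))
print(split_trail('2nd'))
```

Unlike a per-language set, this pattern accepts letters from any script, so it will be more permissive than some of the older definitions.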

Another fallback strategy would be to store information that is specific to an MW version and language (such as linktrail) on translatewiki, and fetch it from there. That way it is maintainable and reusable. We'd need to talk to the site maintainers about a page naming convention.

The change to using a unicode regex was here: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/36253, brought about due to T16512.

It would be interesting to see if that regex is 'close' to the effect of the previous linktrail regex, as it might be usable as a default.
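One way to check how 'close' two link trail regexes are is to run both over sample text and list where they disagree. The sketch below does that; both patterns are assumed stand-ins (an ASCII-only set and a Unicode letter class), not the actual regexes from r36253 or from any language file, and the helper names are hypothetical.

```python
import re


def trail_of(pattern, text):
    """Return the prefix of text that pattern treats as link trail."""
    # Both patterns below can match the empty string, so match() succeeds.
    return re.match(pattern, text).group()


def disagreements(pattern_a, pattern_b, samples):
    """Return the samples for which the two patterns give different trails."""
    return [s for s in samples
            if trail_of(pattern_a, s) != trail_of(pattern_b, s)]


ASCII_TRAIL = r'[a-z]*'        # assumed stand-in for an old per-language set
UNICODE_TRAIL = r'[^\W\d_]*'   # assumed stand-in for a Unicode letter regex

samples = ['s', 'ing.', '\u00e9s', 'Ship', '2nd', '']
print(disagreements(ASCII_TRAIL, UNICODE_TRAIL, samples))
```

With real patterns substituted in and a larger sample drawn from actual page text, the size of the disagreement list would give a rough measure of how usable the Unicode regex is as a default.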

We will never get perfect parsing of old revisions unless we load the regex from the PHP source code of the relevant MW version used at the time of the revision. That is an insane problem to solve, and it is unlikely anyone cares about accuracy that much.

Xqt triaged this task as Low priority. · Aug 26 2016, 7:46 AM
Xqt subscribed.

Support for MW < 1.23 will be dropped (T268979)