
Category sort key (cl_sortkey_prefix) set to wrong value when a page is moved and stays that way
Closed, Resolved, Public

Description

Please excuse me for my bad English. Feel free to re-write this entry.
In eswiki we have a maintenance category that uses timestamps as sortkeys, articles with older timestamps appearing first. It was working ok until a few days ago when some changes were made (see this comment by Bawolff [https://bugzilla.wikimedia.org/show_bug.cgi?id=4912#c19]). After that, we experienced two different issues:

  1. Articles that have been moved during the last three days are completely mis-sorted, appearing at the very end of the category under their initial letters. It looks like their timestamp sortkeys have been lost and only their page names are used.
  2. This issue can be easily fixed, but I'll describe it anyway, FYI:

Articles with 8-digit timestamps YYYYMMDD (usually manually tagged by users) and those with full-sized 14-digit timestamps YYYYMMDDHHMMSS (usually bot inputs) are no longer correctly sorted relative to each other. In addition, we created some "label" pages to mark month and year boundaries, keyed with 6-digit and 4-digit timestamps. Right now, these labels appear at the end of their periods rather than at the beginning. I plan to fix this issue by right-padding all timestamps with zeros up to 14 digits.

  3. Yes, a third issue appeared: pages with purely numeric page names are also mis-sorted. Our "label" pages that mark years are titled Wikipedia:2007, Wikipedia:2008 and so on, and are manually sortkeyed with 2007, 2008 and so on. These pages seem to suffer from another kind of sort error: they appear all together after all timestamped pages but before the moved ones. The issue disappears when a letter (not a number) is added to the sortkey, e.g. 2007x.
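The right-padding plan described under issue 2 could look roughly like the sketch below. It is written in Python purely for illustration; the function name `pad_timestamp` is hypothetical, and it assumes sortkeys are compared as plain strings, which may not match the wiki's actual collation.

```python
def pad_timestamp(sortkey: str, width: int = 14) -> str:
    """Right-pad a numeric timestamp sortkey with zeros up to `width` digits."""
    key = sortkey.strip()
    # Only accept pure ASCII digit strings that are not already too long.
    if not (key.isascii() and key.isdigit()) or len(key) > width:
        raise ValueError(f"not a timestamp sortkey: {sortkey!r}")
    return key.ljust(width, "0")


# Year label, month label, manual 8-digit tag, and 14-digit bot tag.
raw_keys = ["20110308", "2011", "20110301120000", "201103"]
padded = sorted(pad_timestamp(k) for k in raw_keys)
```

With every key at the same length, a year or month label pads out to the earliest possible timestamp of its period, so under plain string comparison it would sort at the beginning of that period.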

Version: unspecified
Severity: normal

Details

Reference
bz28020

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:31 PM
bzimport set Reference to bz28020.
bzimport added a subscriber: Unknown Object (MLST).

Hmm, I'm fairly confused about what's going on. For some reason it's not recognizing the custom sortkey [[Categoría:Wikipedia:Wikificar| 20110308]] correctly ( http://es.wikipedia.org/w/api.php?action=query&titles=H%C3%A9ctor%20Palomares%20Medina&prop=categories&clprop=sortkey seems to indicate a {{defaultsortkey:}} is overriding it, but defaultsortkey shouldn't override a sortkey specified per category, and the other examples in http://es.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=category:Wikipedia:Wikificar&cmsort=sortkey&cmprop=sortkey&cmstartsortkey=%203 aren't being overridden by a {{DEFAULTSORT}}).

However, the parser seems to extract it fine ( http://es.wikipedia.org/w/api.php?action=parse&page=H%C3%A9ctor%20Palomares%20Medina&prop=categories ), and null edits/purges don't fix it.
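For anyone who wants to repeat the two per-page checks for other titles, here is a small Python sketch that only builds the query URLs; it does not send any request. The endpoint and parameters are the ones used in the links above; the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

API = "http://es.wikipedia.org/w/api.php"  # endpoint taken from the links above


def stored_sortkey_url(title: str) -> str:
    """URL asking which sortkeys are stored for the page's categories."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "categories",
        "clprop": "sortkey",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return API + "?" + urlencode(params, quote_via=quote)


def parsed_categories_url(title: str) -> str:
    """URL asking the parser which categories it extracts from the page."""
    params = {"action": "parse", "page": title, "prop": "categories"}
    return API + "?" + urlencode(params, quote_via=quote)


print(stored_sortkey_url("Héctor Palomares Medina"))
print(parsed_categories_url("Héctor Palomares Medina"))
```

Comparing the output of the two queries for the same page shows whether the stored sortkey disagrees with what the parser extracts.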

So for some reason it's just throwing out the custom sortkeys.

Héctor Palomares Medina has a forced {{DEFAULTSORT}} included in {{BD|1958||Palomares Medina, Hector}}, but the other examples do not have a {{DEFAULTSORT}} at all. What *all* of them do have in common is that they have been recently moved. While I was looking into this issue, a page suddenly appeared in the lettered sections at the end of the category. Then I noticed it had just been moved.

I've just noticed that this issue is present in all recently moved articles, in all categories, not just in timestamp-sortkeyed ones. Actually, the page name (or {{DEFAULTSORT}} if it exists) is overriding all custom sortkeys in all recently moved pages. Null edits and purges don't fix it, and neither do major edits. The only thing that restores the custom sortkey is removing or changing the categorization tag itself, saving, and then re-inserting it.

Just to clarify this is every article that gets moved, not just some of them.

Will creating an article, with [[category:Foo| 20100503]] on it, then moving that article to a new title always cause the issue to be observed, or just sometimes?

I think it has something to do with category sortkeys getting associated with the prefix of the category name that comes before the space(?!). I am able to reproduce this on trunk using the following steps:

Steps to reproduce (on trunk):
Template cat containing:
[[category:Published byme]]
[[category:Published| {{{1}}}]]

Page foo containing:
{{cat|210}}

After that, move foo to a new title.

Page foo is now listed both in category:Published and Category:Published_byme with the sortkey " 210". Expected behaviour is for it to only be listed in category:Published with that sortkey. Null edits do not fix it, but adding and removing the template does. Adding another category after the fact also does not fix the sortkeys on the older categories.

Ok I think I found the issue:
*Moving a page sets all cl_sortkey_prefix on all categorylinks to whatever the value was on the first one it got out of the db.
*When doing linksupdate, we only check cl_sortkey_prefix, we don't check cl_sortkey. If these get out of sync, such that cl_sortkey_prefix is right, but cl_sortkey is not, this never gets corrected short of adding and removing the category.
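To illustrate the second point, here is a toy model in Python (not MediaWiki's PHP). Everything in it is hypothetical: the `CategoryLink` class, the two check functions, and especially `make_sortkey`, which is only a stand-in for the real collation. It just shows why a check on cl_sortkey_prefix alone can never notice a wrong cl_sortkey.

```python
from dataclasses import dataclass


@dataclass
class CategoryLink:
    """One categorylinks row, reduced to the three columns discussed above."""
    cl_to: str
    cl_sortkey_prefix: str
    cl_sortkey: str


def make_sortkey(prefix: str, title: str) -> str:
    # Hypothetical stand-in for the real collation:
    # use the custom prefix if there is one, else the title, uppercased.
    return (prefix or title).upper()


def is_stale_prefix_only(row: CategoryLink, wanted_prefix: str) -> bool:
    # Models a check that looks at cl_sortkey_prefix and nothing else.
    return row.cl_sortkey_prefix != wanted_prefix


def is_stale_both(row: CategoryLink, wanted_prefix: str, title: str) -> bool:
    # Models a check that also compares the derived cl_sortkey.
    return (row.cl_sortkey_prefix != wanted_prefix
            or row.cl_sortkey != make_sortkey(wanted_prefix, title))


# A row that is out of sync: the prefix is right, the sortkey is not.
row = CategoryLink("Published", " 210", "FOO")
print(is_stale_prefix_only(row, " 210"))  # False: the row is never rewritten
print(is_stale_both(row, " 210", "Foo"))  # True: the row would be rewritten
```

In this model, removing and re-adding the category works because it forces the row to be deleted and inserted again, regardless of what either check reports.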

Fixed in r83866. (This won't fix pages this has already happened to; it will only stop it from happening again. To fix pages that it has already happened to, you have to manually remove and re-add the category (or the template containing the category). For clarification, "fixed in r83866" means fixed in the code repo. It might take a little while before the fix is deployed to Wikimedia, but since it's a 1.17 regression, probably not too much time).

Sorry, forgot you have multiple issues here:

For issue 2 - The mis-sorting appears to be caused by adding the X to the end of the sortkey. In my testing, a page named project:2010, with the sortkey " 2010", will sort at the beginning of the things with a sortkey starting with " 2010...". Adding an X will make it sort at the end of the 2010 section, since X sorts after all digits (same for the month pages).
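A quick way to see the ordering is the Python snippet below. It assumes sortkeys compare like plain strings, character by character, which is only an approximation of the wiki's collation; the example keys are made up.

```python
# A year label, two timestamped pages, and the same label with an X appended.
keys = [" 2010X", " 20101231", " 2010", " 20100101"]

ordered = sorted(keys)
print(ordered)
# [' 2010', ' 20100101', ' 20101231', ' 2010X']
```

The bare " 2010" is a prefix of every other key, so it comes first; " 2010X" comes last because "X" compares greater than any digit.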

For issue #3 - I can't reproduce. [[Project:2010]] with
[[category:catname| 2010]] on it sorts in the expected position.

Good job, Bawolff! I'm not a programmer and comment 8 is too complicated for me to understand, but I think it will be deployed soon. I've also found this issue in articles with "normal" non-numerical sortkeys, and without leading spaces.

In my issue #2 there is no x at all. I only put one x, as a test, in a single label page. This mis-sort is present between pages tagged manually (8 digits), pages tagged by bots (14 digits) and label pages with 4 and 6 digits. Nevertheless, I'll study it more closely with the api.php queries you showed me in comment 3. I didn't know about them.

Did you remember this?
http://es.wikipedia.org/w/index.php?title=Usuario_Discusi%C3%B3n:Gustronico&diff=prev&oldid=31805957