Page MenuHomePhabricator

document / improve sortkey handling
Open, Needs TriagePublic

Description

The sortkey handling should be working similarly what it worked before the T386783 which removed also the ability to update sortkeys by using self-categorization ie:

The new version did not allow for recategorizing the same file to the same cat with a new sort key after the pipe, the old version does that well.

Rough idea on how categorization should work (needs confirmation by checking it from the code)

When user adds, moves, copies or removes category without specifying sortkey (sortkey = undefined), it should:

  • Use sortkey as wildcard which matches all categories with the specified name, regardless of whether there is a sortkey
  • If category is copied then it will copy the sortkey too
  • This is current behaviour, there is no change to this

When user defines both category name and sortkey and adds, copies or moves category :

  • It should match to all categories with that category name and add or update the sortkey to new sortkey
  • If the defined new sortkey is an empty string, then it should add or update to an empty sortkey
    • Examples:
      • Wikitext [[Category:Turku|]]
      • JSON example: "mediawikiCategories": [ { "name": "Turku", "sort": "" } ]

When user defines both category name and sortkey and removes categories:

  • It should remove the category only when both category and sortkey match (?)

Event Timeline

I think the problem Peli described is because filterPages() doesn't take the sortkey into account. (line 1535)

Since the code doesn't know in filterPages() whether the selectedcat has any sortkeys (as that would require loading every page's wikicode), we could just accept that if targetcat has any sortkey, then it is not self-categorization in filterPages(). I think that this should fix the recategorizing issue when wikitext is used.

In JSON handling, the dev version is working differently in the new version so that if a user is adding the same category but with a different sortkey, it would add it twice instead of updating the existing one. ( example ) So, this will require code changes to keep it consistent with wikitext behavior.

In wikitext sortkey handling works like this

  • add allways omit : [[Category:NewCat]]
  • copy copy from source cat : [[Category:SrcCat|SORTKEY]]…[[Category:NewCat| SORTKEY]]
  • move copy from source cat : [[Category:NewCat| SORTKEY]]
  • delete (deleted entirely)

If a user enters a category in the search field and includes a sort key by appending` |SORTKEY` to the category name (e.g. CATEGORYNAME|SORTKEY), it will work—but only as a byproduct of how the category name is copied as is into the wikitext. . As a result, there are bugs in how the category name (and its sort key) is displayed in descriptions and in the user interface. Nevertheless, some users rely on this behavior.

The editJsonCategories() function follows the wikitext implementation, but there are a few corner cases to watch for. In particular, it always uses the sort key specified in the search field, instead of preserving the old one. In wikitext code mode, the generated wikitext will be broken if both the new and the old sort keys are defined simultaneously.

Here is my version for the code

Peli documented another bug in sortkey handling. Likely existing bug, not regression by last changes

But i found a bug in Moving file to it's own parent for sorting order with use of just an empty key. This does not work as expected. The task was move file 3 to "Uncategorized images of the Rijksmuseum (Crappy crops)| " but the result was: moved to "Uncategorized images of the Rijksmuseum (Crappy crops)|Uncategorized images of the Rijksmuseum". example Test was done by move image from category overview to sort a order in same cat, trying to move it to place 1. result: error in using empty sortkey. The workaround here is to always use a key after the empty space "Uncategorized images of the Rijksmuseum (Crappy crops| 0". But it would be nicer if it got fixed

Hi @Zache, The issue came from Cat-a-lot not handling empty sort keys ([[Category:Foo| ]] or [[Category:Foo|]]) correctly. During an edit, the code treated them as new categories instead of equivalent to a bare category ([[Category:Foo]]), which caused duplicates with the category name used as a sort key.

To fix this, I added a normalization helper that splits a category into { name, sort } and treats bare categories and empty-sort categories as equivalent. Then, inside editWikitextCategories(), I updated the logic so that if an existing bare category is found when adding with an empty sort key, it replaces it with the explicit [[Category:Foo| ]] instead of duplicating. This preserves the existing behaviour for other cases while fixing the duplication issue for empty sort keys.

Another thing, this problem is only observed in wikitext pages and not json pages so we are good on that front.

The diff can be found here

Note to-self. Searchkey submit handler trims the categoryname so "Foo| " becomes "Foo|" (MediaWiki:Gadget-Cat-a-lot.js line 266 )

@adiba_anjum. Thank you! Some comments about your change. The idea of using normalizeCategory() for splitting the targetcat into categoryname and sortkey is a good one and is similar to what is used in editJsonCategories. It also partially solves the issue. However, there are some caveats as well.

The change to Check if that file is already in that category looks like something that will create unexpected results as it totally changes what the block was doing. Also, sumCmt also is not defined yet in that part of the code.

A safer route would be to update only the move and edit branches of the function so the change impacts only places that we know to be broken.

One caveat is also how wikicode expands in unexpected ways, so it differs depending on whether there is a space or not. However, trailing space is already trimmed away when the searchkey is read, so trimming needs to be handled as well.

Examples of how wikicode behaves in differently when it is written little bit differently.

when savingwikicode after savesummary
[[category:CATEGORYNAME]][[category:CATEGORYNAME]]"CATEGORYNAME"
[[category:CATEGORYNAME|]][[category:CATEGORYNAME|CATEGORYNAME]]"CATEGORYNAME"
[[category:CATEGORYNAME| ]][[category:CATEGORYNAME| ]]""

Here is my updated version of the code: diff

Thanks a lot for reviewing my patch and pointing out the caveats. I see how it was too broad and could’ve caused side effects. I’ll be more careful to think about the wider impact of changes going forward. Also, your explanation about how MediaWiki treats | vs | and how the search key handler trims spaces was really helpful, it cleared up certain misconceptions I had about the same and inadvertently had used that in my patch. I tested your diff too and it worked great