It would be nice to add a candidate finder that bases its results on a category search. This could help editors focus their translation work on a particular topic.
@ellery from time to time, we get a request to add an option to search by category in GapFinder. Given that the category system (with all its caveats) is used heavily by editors, I'm wondering if we should go ahead and offer this option. What do you think?
@Cervisiarius I'm cc-ing you here as this conversation may be related to the conversations we have around stub expansion.
I couldn't find a way to query category information from WDQS, so I resorted to traversing the category tree manually using the Action API.
I'm not sure how we should rank the candidates; the order is currently determined by the internal (hash-based) implementation of Python's set, so it is effectively arbitrary. I think we'll want to use rank_method=sitelinks in GapFinder with this search method, unless a better option exists.
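To make the ranking concern concrete, here is a minimal sketch of ordering candidates explicitly instead of relying on set iteration order. The function name `rank_by_sitelinks` and the `sitelink_counts` mapping are hypothetical and only for illustration; this is not how rank_method=sitelinks is implemented in GapFinder.

```python
def rank_by_sitelinks(candidates, sitelink_counts):
    """Return candidates sorted by descending sitelink count.

    `sitelink_counts` is a hypothetical dict mapping a title to its number
    of Wikidata sitelinks; titles missing from it are treated as 0.
    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(
        set(candidates),
        key=lambda title: (-sitelink_counts.get(title, 0), title),
    )
```

With an explicit sort key, the same input always yields the same order, regardless of how the set hashes its members.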
@leila We have a couple of options for how to integrate this into GapFinder:
- Detect if the seed is a Category and then use the category search - no action required by the user
- Require a search=category parameter to be passed to activate the category search
I think detecting if the seed is a category will provide the most seamless experience and also be simple to implement.
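The two options above could be combined roughly as follows. This is only a sketch: `is_category_seed`, `choose_search`, and the `"default"` label are hypothetical names, and the check assumes the English `Category:` prefix. Namespace names are localized on other wikis, so a real check would need the namespace names for the source language.

```python
CATEGORY_PREFIX = "Category:"  # assumption: English namespace name only


def is_category_seed(seed):
    """Return True if the seed looks like a category title, e.g. 'Category:Physics'."""
    title = seed.strip()
    has_prefix = title.lower().startswith(CATEGORY_PREFIX.lower())
    # Require something after the prefix so a bare 'Category:' is rejected.
    return has_prefix and len(title) > len(CATEGORY_PREFIX)


def choose_search(seed, search=None):
    """Pick a search method; an explicit `search` parameter always wins."""
    if search is not None:
        return search
    # "default" is a placeholder for whatever search is used otherwise.
    return "category" if is_category_seed(seed) else "default"
```

Letting an explicit parameter override the detection keeps the seamless behaviour while still allowing a caller to force one method or the other.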
The translation type now has this functionality, but I don't know how well it will work for large category trees, since it currently does a breadth-first search only until it has enough candidates (500).
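For reference, the traversal described here can be sketched as a bounded breadth-first search. The names `find_candidates` and `get_members` are hypothetical; `get_members` stands in for whatever fetches a category's pages and subcategories (for example, a wrapper around the Action API's `list=categorymembers` query), and is passed in so the sketch runs without network access.

```python
from collections import deque


def find_candidates(seed_category, get_members, limit=500):
    """Breadth-first walk of a category tree, stopping at `limit` pages.

    `get_members(category)` is a hypothetical callable returning a tuple
    `(pages, subcategories)` for the given category title.
    """
    candidates = []                    # a list keeps discovery order
    seen_pages = set()
    seen_categories = {seed_category}  # guards against cycles in the tree
    queue = deque([seed_category])

    while queue and len(candidates) < limit:
        category = queue.popleft()
        pages, subcategories = get_members(category)
        for page in pages:
            if page not in seen_pages:
                seen_pages.add(page)
                candidates.append(page)
                if len(candidates) >= limit:
                    break
        for sub in subcategories:
            if sub not in seen_categories:
                seen_categories.add(sub)
                queue.append(sub)
    return candidates
```

Because the walk stops as soon as the limit is reached, a large tree only contributes pages from the levels closest to the seed, which is the behaviour the concern above is about.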
Moving conversation here for posterity:
Can you have a look at the following page and see if the top categories can be extracted from the list of categories at the top of the page? https://en.wikipedia.org/wiki/Portal:Contents/Categories
It would be nice if we could use Categories throughout instead of Portals (from a programmatic perspective).
Either way, we'll have to address the issue of using the appropriate list based on the source language.
It does seem there are some base categories we could safely source our suggestions from, but we would have to determine them per wiki (or just not show the suggestions):
@schana the way I imagine this is that the database contains information about all categories for which we have recommendations. The user can then search within those categories and gets a "no category match found" message if there is no match. So, as long as you implement the functionality to search by category, we should be good, right?
We don't maintain a database with this information for the Translation Recommendation Type. The category search is intended to be used as a candidate finder, whose results are then filtered by Wikidata sitelinks to recommend articles for translation. The issue with the category search as it is currently implemented is getting good results from the category tree.