Search by categories in GapFinder - Translate
Open, Medium, Public

Description

It would be useful to add a candidate finder that bases its results on a category search. This would let editors focus their translation work on articles within a specific topic.

Event Timeline

@ellery from time to time, we get a request to add an option to search by category in GapFinder. Given that the category system (with all its caveats) is used heavily by editors, I'm wondering if we should go ahead and offer this option. What do you think?

@Cervisiarius I'm cc-ing you here as this conversation may be related to the conversations we have around stub expansion.

@leila We could talk to Magnus about how he does category search in https://tools.wmflabs.org/not-in-the-other-language. Also, this may be very easy to implement by using WDQS now ...

leila triaged this task as Medium priority. Feb 23 2017, 10:49 PM
leila added a project: Recommendation-API.
leila updated the task description.
leila moved this task from Backlog to Next up on the Recommendation-API board.
leila added subscribers: schana, Aklapper.

Change 340332 had a related patch set uploaded (by Nschaaf; owner: Nschaaf):
Add category search to translation type

https://gerrit.wikimedia.org/r/340332

I couldn't find a way to query category information from WDQS, so I resorted to traversing the tree manually using the action api.
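For readers following along, here is a minimal sketch of what such a manual traversal could look like. It is not the code from the patch. It assumes the action API's `list=categorymembers` query, with namespace 14 for subcategories and namespace 0 for articles. The names `category_members_params`, `traverse_category`, `fetch_members`, and `max_depth` are hypothetical, and the HTTP call is left to the caller so the walk can be exercised without network access.

```python
from collections import deque


def category_members_params(category, cmcontinue=None):
    """Build action API parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def traverse_category(seed, fetch_members, max_depth=2):
    """Breadth-first walk of the category tree starting at `seed`.

    `fetch_members(category)` is a caller-supplied function (hypothetical)
    returning (title, namespace) pairs for the members of one category.
    """
    articles = set()
    seen = {seed}  # guards against cycles in the category graph
    queue = deque([(seed, 0)])
    while queue:
        category, depth = queue.popleft()
        for title, ns in fetch_members(category):
            if ns == 14:  # subcategory
                if depth < max_depth and title not in seen:
                    seen.add(title)
                    queue.append((title, depth + 1))
            elif ns == 0:  # article in the main namespace
                articles.add(title)
    return articles
```

The depth limit matters because category trees tend to drift off topic a few levels down, and the `seen` set is needed because categories can contain each other.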

I'm not sure how we should rank the candidates; it is currently decided by the internal implementation of Python's set (hash-based). I think we'll want to use rank_method=sitelinks in GapFinder with this search method, unless a better option exists.
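To illustrate the ordering problem, a small sketch of a deterministic ranking follows. It assumes that ranking by sitelinks means ordering candidates by how many sitelinks their Wikidata item has; how `rank_method=sitelinks` is actually implemented is not stated in this thread. `rank_by_sitelinks` and `sitelink_counts` are hypothetical names.

```python
def rank_by_sitelinks(candidates, sitelink_counts):
    """Order candidates by descending sitelink count, then by title.

    `sitelink_counts` maps title -> number of sitelinks (assumed to be
    looked up elsewhere); titles missing from the map count as zero.
    The title tie-breaker makes the order independent of set iteration.
    """
    return sorted(candidates, key=lambda t: (-sitelink_counts.get(t, 0), t))
```

Unlike iterating over a set, this gives the same order on every run for the same input.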

@leila We have a couple of options for how to integrate this into GapFinder:

  1. Detect if the seed is a Category and then use the category search - no action required by the user
  2. Require a search=category parameter to be passed to activate the category search

I think detecting if the seed is a category will provide the most seamless experience and also be simple to implement.
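Option 1 could be sketched roughly as below. This is an illustration only: it assumes the seed is a page title and that a category can be recognised by its namespace prefix, which is localized per wiki. The `CATEGORY_PREFIXES` table and `is_category_seed` are hypothetical, and a real implementation would probably need the prefixes for every supported language.

```python
# Hypothetical per-language namespace prefixes; only two wikis are shown.
CATEGORY_PREFIXES = {
    "en": ("Category:",),
    "de": ("Kategorie:", "Category:"),
}


def is_category_seed(seed, lang="en"):
    """Return True if the seed title looks like a category page."""
    prefixes = CATEGORY_PREFIXES.get(lang, ("Category:",))
    normalized = seed.strip().replace("_", " ").lower()
    return normalized.startswith(tuple(p.lower() for p in prefixes))
```

With this check in place, the caller can route category seeds to the category search and everything else to the existing search, with no extra parameter for the user to set.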

Change 340332 merged by jenkins-bot:
[research/recommendation-api] Add category search to translation type

https://gerrit.wikimedia.org/r/340332

Assigned the task to myself to do some QA on the results.

Moving conversation here for posterity:

Can you have a look at the following page and see if the top categories can be extracted from the list of categories on top of this page? https://en.wikipedia.org/wiki/Portal:Contents/Categories

It would be nice if we could use Categories throughout instead of Portals (from a programmatic perspective).

Either way, we'll have to address the issue of using the appropriate list based on the source language.

It does seem there are some base categories we could safely source our suggestions from, but we would have to determine them per wiki (or just not show the suggestions):
https://en.wikipedia.org/wiki/Category:Main_topic_classifications
https://de.wikipedia.org/wiki/Kategorie:Sachsystematik

Thoughts?

@schana the way I imagine this is that the database contains information about all categories for which we have recommendations. The user can then search within those categories and gets a "no category match found" message otherwise. So, as long as you implement the functionality to search by category, we should be good, right?

We don't maintain a database with this information for the Translation Recommendation Type. This is intended to be used as a candidate finder, which is then filtered by wikidata sitelinks to recommend articles for translation. The issue with the category search as it is currently implemented is getting good results from the Category tree.
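As a rough sketch of that filtering step, assuming the sitelinks for each candidate have already been fetched from Wikidata into a dict: `filter_missing_in_target`, `sitelinks`, and `target_wiki` are hypothetical names, not the API's own.

```python
def filter_missing_in_target(candidates, sitelinks, target_wiki):
    """Keep candidates that have no article on the target wiki.

    `sitelinks` maps a source title to the set of wikis its Wikidata item
    links to, e.g. {"enwiki", "dewiki"}. Titles absent from the map are
    treated as having no sitelinks and are kept (an assumption).
    """
    return [
        title
        for title in candidates
        if target_wiki not in sitelinks.get(title, set())
    ]
```

The quality of the final recommendations still depends on the candidate list, which is why the results of the category tree walk matter more than this filter.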

leila renamed this task from Search by categories in GapFinder to Search by categories in GapFinder - Translate. Apr 13 2017, 4:10 PM

@leila Here's the task talking about category search from the hackathon: T165982

leila removed leila as the assignee of this task. Mar 18 2020, 11:42 PM
leila removed subscribers: schana, ellery.