Update suggestion items in database so that we don't show titles that already exist in target language
Closed, Resolved | Public | 1 Estimated Story Points

Description

The titles shown as suggestions are stored in the CX database. We query them by user, source language, and target language, and list them in the CX dashboard. But a title might already exist in the target language, created using CX or any other article creation method. The CX source selector and the CX translation editor have checks and warnings about an 'already existing title', so this case will not go unnoticed. Even so, showing an existing article as a suggestion is not good.

The following is the proposed strategy to keep our suggestion database updated:

  • SuggestionListManager will have a public method to delete a title from cx_suggestions for a given source language, target language, and source title.
  • APIContentTranslationSuggestion will be written to delete a suggestion, create a suggestion list, and add to a suggestion list.
  • ApiQueryContentTranslationSuggestions will check whether any suggestion is in progress or published, and filter those out of the suggestions. This can be part of the getRelevantSuggestions method: while querying, check whether the cx_translations table has an entry for the translation. ApiQueryContentTranslationSuggestions will not update the suggestion database. A translation might get deleted while in progress, and we don't want a valid suggestion to disappear from the suggestion database.
  • In the CX dashboard, when suggestions are received, before displaying each one to the user, use the action=query&prop=langlinks API to check whether the article has somehow been created in the target language. Don't list those titles, but collect them and pass them to APIContentTranslationSuggestion for deletion. This way, as long as people visit the suggestions tab, the suggestion database keeps getting updated around the clock.
  • APIContentTranslationPublish will delete the suggestion item from the database after a translation is published.
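The query-time filtering described for getRelevantSuggestions could look roughly like the sketch below. It is written in Python for illustration only, not in the extension's own code; the record fields and the status values ("draft", "published", "deleted") are assumptions, since the cx_suggestions and cx_translations schemas are not given in this task. The point it shows is that filtering reads from the translations but never modifies the stored suggestions.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record shapes; the real table schemas are not in this task.
@dataclass(frozen=True)
class Suggestion:
    source_language: str
    target_language: str
    source_title: str


@dataclass(frozen=True)
class Translation:
    source_language: str
    target_language: str
    source_title: str
    status: str  # assumed values: "draft", "published", "deleted"


# Assumed statuses that should hide a suggestion from the list.
HIDING_STATUSES = {"draft", "published"}


def get_relevant_suggestions(
    suggestions: Iterable[Suggestion],
    translations: Iterable[Translation],
) -> List[Suggestion]:
    """Return suggestions with no in-progress or published translation.

    Nothing is deleted here: if a draft is later removed, the same
    suggestion becomes visible again on the next query.
    """
    taken = {
        (t.source_language, t.target_language, t.source_title)
        for t in translations
        if t.status in HIDING_STATUSES
    }
    return [
        s
        for s in suggestions
        if (s.source_language, s.target_language, s.source_title) not in taken
    ]
```

Because the function only builds a new list, a translation that is deleted while in progress simply stops matching, and its suggestion reappears without any extra bookkeeping.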

Event Timeline

santhosh raised the priority of this task from to Needs Triage.
santhosh updated the task description.
santhosh subscribed.
santhosh removed a project: ContentTranslation.
santhosh set Security to None.
santhosh removed a subscriber: Krenair.

@Nikerabbit, please review the above strategy. Thanks.

The following is the proposed strategy to keep our suggestion database updated:

More specifically, pruning outdated suggestions

  • SuggestionListManager will have a public method to delete a title from cx_suggestions for a given source language, target language, and source title.

In case it is used by automatic pruning methods, it would be useful if it took a list of items to delete.
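A batch-oriented delete of this kind might be sketched as follows. This is an illustrative Python example using an in-memory style SQLite connection, not the actual SuggestionListManager; the function name and the column names source_language, target_language, and source_title are assumptions chosen to match the wording of the bullet above.

```python
import sqlite3
from typing import Iterable, Tuple

# (source_language, target_language, source_title); hypothetical key shape.
SuggestionKey = Tuple[str, str, str]


def delete_suggestions(conn: sqlite3.Connection, items: Iterable[SuggestionKey]) -> int:
    """Delete every listed suggestion in one transaction.

    Returns the number of rows actually removed, so callers can tell
    how many of the listed items were still present.
    """
    rows = list(items)
    if not rows:
        return 0
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM cx_suggestions "
            "WHERE source_language = ? AND target_language = ? AND source_title = ?",
            rows,
        )
    return conn.total_changes - before
```

Taking a list keeps the single-title case trivial (pass a one-element list) while letting a pruning job remove many stale entries in one transaction.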

  • APIContentTranslationSuggestion will be written to delete a suggestion, create a suggestion list, and add to a suggestion list.

This seems unnecessary until we implement "mark for later". When is that planned to be implemented?

  • ApiQueryContentTranslationSuggestions will check whether any suggestion is in progress or published, and filter those out of the suggestions. This can be part of the getRelevantSuggestions method: while querying, check whether the cx_translations table has an entry for the translation. ApiQueryContentTranslationSuggestions will not update the suggestion database. A translation might get deleted while in progress, and we don't want a valid suggestion to disappear from the suggestion database.

If it is in progress, it is okay to just not show it (but for simplicity I would be OK with deleting it in this case as well; it will reappear in the list when the list is updated, if still relevant). For published articles it should be deleted (I recommend using DeferredUpdates).
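The split suggested here (hide in-progress items, delete published ones after the response is built) can be illustrated with the small Python sketch below. It does not use or reproduce MediaWiki's DeferredUpdates; DeferredQueue is a hypothetical stand-in that only shows the idea of postponing the delete, and the status strings are assumptions.

```python
from typing import Callable, Dict, List


class DeferredQueue:
    """Hypothetical stand-in for a deferred-update mechanism."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def add(self, callback: Callable[[], None]) -> None:
        self._pending.append(callback)

    def run_all(self) -> int:
        """Run queued callbacks, e.g. after the response has been sent."""
        pending, self._pending = self._pending, []
        for callback in pending:
            callback()
        return len(pending)


def split_suggestions(
    titles: List[str],
    status_by_title: Dict[str, str],
    deferred: DeferredQueue,
    delete: Callable[[str], None],
) -> List[str]:
    """Return the titles to show; queue deletion of published ones."""
    visible = []
    for title in titles:
        status = status_by_title.get(title)  # None: no translation exists
        if status == "published":
            # Bind the current title so each callback deletes its own item.
            deferred.add(lambda t=title: delete(t))
        elif status == "draft":
            continue  # in progress: hide, but keep the suggestion stored
        else:
            visible.append(title)
    return visible
```

The caller gets its filtered list immediately, and the stored suggestions only change once run_all() is invoked.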

  • In the CX dashboard, when suggestions are received, before displaying each one to the user, use the action=query&prop=langlinks API to check whether the article has somehow been created in the target language. Don't list those titles, but collect them and pass them to APIContentTranslationSuggestion for deletion. This way, as long as people visit the suggestions tab, the suggestion database keeps getting updated around the clock.

Why not do this in the backend instead? It should be faster, and we can delay implementing the API module.

  • APIContentTranslationPublish will delete the suggestion item from the database after a translation is published.

This seems like unnecessary extra work if we implement the above features.

NB: It seems it is not possible to quote from the description directly.

Why not do this in the backend instead? It should be faster, and we can delay implementing the API module.

That is a good option. In the backend, what is the best way to find out whether 'PageX' exists and is connected through Wikidata between 'en' and 'ca'? The check might be executed from 'eswiki'. I mean, is there a way other than the equivalent of the prop=langlinks API call?

As is common, the proper abstraction is missing for ApiQueryLangLinks. We can call the API internally, the same way we do in the publishing module.
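For reference, the langlinks lookup discussed above amounts to one query against the source wiki, followed by a scan of the result for the target language. The Python sketch below only builds the request URL and interprets a response; it makes no network call. The function names are hypothetical, the Wikipedia host pattern is an assumption, and the response shape assumed is the formatversion=2 JSON layout (pages as a list, each langlink carrying "lang" and "title").

```python
from typing import Any, Dict, Iterable, Set
from urllib.parse import urlencode


def build_langlinks_url(source_language: str, titles: Iterable[str], target_language: str) -> str:
    """Build an action=query&prop=langlinks request for the source wiki."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),  # several titles per request
        "lllang": target_language,   # only return links to this language
        "format": "json",
        "formatversion": "2",
    }
    # Assumed host pattern; other wiki families use different domains.
    return f"https://{source_language}.wikipedia.org/w/api.php?" + urlencode(params)


def titles_existing_in_target(response: Dict[str, Any], target_language: str) -> Set[str]:
    """Return source titles that already have a page in the target language."""
    existing: Set[str] = set()
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            if link.get("lang") == target_language:
                existing.add(page["title"])
    return existing
```

Titles returned here are candidates for pruning. Note that the API may normalize requested titles, so the "title" in the response is not guaranteed to match the stored suggestion title character for character.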

Change 237627 had a related patch set uploaded (by Santhosh):
Suggestions: Filter out ongoing translations and existing pages

https://gerrit.wikimedia.org/r/237627

Change 237627 merged by jenkins-bot:
Suggestions: Filter out ongoing translations and existing pages

https://gerrit.wikimedia.org/r/237627