
Simple commands to generate suggestions for a set of articles
Closed, Declined; Public

Description

Even if we fix T192708 ("Let the prefill run until the end when going through all articles"), a complete run will always take a while (probably several days). It would be nice to be able to generate suggestions for a given list of items, for instance titles or DOIs that we know have been worked on recently.

There are two steps:

  • provide a list of page titles, which can simply be passed to get_proposed_edits();
  • provide a search criterion or filter from which to generate a list of titles, for instance a list of DOIs, a category, or anything else that appears in the articles of interest. Hopefully we can just use the search API, so that all the CirrusSearch capabilities are available; but we could also query links in the database or filter the results from the list of template usages.
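The second step could be sketched roughly as follows, assuming the standard MediaWiki search module (`action=query&list=search`) is used to turn a CirrusSearch query into titles. The names `API_URL`, `search_params`, `titles_from_response`, `fetch_json` and `search_titles` are hypothetical, as is the User-Agent string; none of them exist in OABot.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def search_params(query, limit=50, offset=0):
    """Build the parameters for one page of search results."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": query,   # any CirrusSearch query string
        "srlimit": limit,
        "sroffset": offset,
        "format": "json",
    }


def titles_from_response(data):
    """Extract the page titles from one decoded API response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


def fetch_json(params):
    """Perform the HTTP request (placeholder User-Agent, adjust before use)."""
    request = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "title-list-sketch/0.1 (hypothetical)"},
    )
    with urlopen(request) as response:
        return json.load(response)


def search_titles(query, fetch=fetch_json, limit=50):
    """Yield every title matching the query, following continuation."""
    offset = 0
    while True:
        data = fetch(search_params(query, limit, offset))
        yield from titles_from_response(data)
        continuation = data.get("continue", {})
        if "sroffset" not in continuation:
            break
        offset = continuation["sroffset"]
```

Each yielded title could then be handed to get_proposed_edits() as in the first step; the `fetch` argument is only there so the pagination logic can be exercised without network access.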

For now I'm doing the first of the two: I've just created a quick and dirty bash command (which is not even able to handle quotes and parentheses in titles; I'd rather avoid escaping them with pattern matching):

cd ~/www/python/src
echo $1
~/www/python/venv/bin/python -c 'from app import get_proposed_edits, app; get_proposed_edits("'$1'", True)'
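
One way around the quoting limitation, as a minimal sketch: pass the title as a command-line argument instead of splicing it into the Python source, so quotes and parentheses never need escaping. The file name `suggest_titles.py` and the helper `run_for_title` are hypothetical; `app` and `get_proposed_edits` are the names used in the command above.

```python
import sys


def run_for_title(title, propose_edits):
    """Print the title, then call propose_edits(title, True) as the one-liner does."""
    print(title)
    return propose_edits(title, True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Deferred import: only needed when run as a script from ~/www/python/src.
    from app import get_proposed_edits

    # Each argument arrives as a plain string, whatever characters it contains.
    for title in sys.argv[1:]:
        run_for_title(title, get_proposed_edits)
```

Saved as suggest_titles.py in ~/www/python/src, it would be called as `~/www/python/venv/bin/python suggest_titles.py "$1"`; the double quotes around `$1` keep titles with spaces in one argument.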

This can be passed to jsub with as little as 150 MB of memory and a wait of at least 0.5 seconds. For the 210k titles from https://zenodo.org/record/997222/files/enwiki-20170720-pages-articles-citations.tsv.xz, that will still take one week.
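
As a rough check of these figures, assuming the jobs run strictly one after another and "one week" means seven full days, the wait alone accounts for a bit more than a day, and the one-week estimate implies just under 3 seconds per title overall. The per-title figure is derived here, not measured.

```python
TITLES = 210_000
WAIT_SECONDS = 0.5
SECONDS_PER_DAY = 86_400

# Time spent purely on the 0.5 s wait, if titles are handled sequentially.
waiting_days = TITLES * WAIT_SECONDS / SECONDS_PER_DAY

# Average time per title implied by a total of one week (assumed: 7 days).
implied_seconds_per_title = 7 * SECONDS_PER_DAY / TITLES

print(f"waiting alone: {waiting_days:.2f} days")                      # 1.22 days
print(f"implied time per title: {implied_seconds_per_title:.2f} s")   # 2.88 s
```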

A proper solution would handle the multithreading within Python, e.g. https://stackoverflow.com/a/28913218/1333493
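
A minimal sketch of what that could look like, using the standard library's `concurrent.futures.ThreadPoolExecutor` (not necessarily the approach of the linked answer). The function `process_titles` and the `max_workers` default are hypothetical, and whether get_proposed_edits() is safe to call from several threads at once is an assumption that would need checking.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_titles(titles, propose_edits, max_workers=4):
    """Run propose_edits(title, True) for each title on a pool of threads.

    Returns (results, failures): two dicts keyed by title, so that one
    failing article does not abort the whole run.
    """
    results = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its title for reporting.
        futures = {pool.submit(propose_edits, t, True): t for t in titles}
        for future in as_completed(futures):
            title = futures[future]
            try:
                results[title] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[title] = exc
    return results, failures
```

With the real function this would be called as `process_titles(titles, get_proposed_edits)`; threads mostly help here if the time per title is dominated by waiting on network requests.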

Event Timeline

Nemo_bis moved this task from Bugs to Functionality on the OABot board.

Currently we're not short on suggestions and we have a workaround, so it's not the first in line.

The workaround is now to alter the pywikibot generators manually in prefill.py. I'm fine with someone closing this, but I'll do so myself only after a test run.

This is superseded.