
Make use of the project field in categorisation jobs
Closed, Resolved | Public

Description

In the monuments_all table, we do not keep the origin wiki project of the monument.

The ErfgoedBot categorization bot tries to match an image to Commons categories using several methods.
One of them follows the monument_article to look for a {{CommonsCat}}-like template on it. However, this assumes that the monument_article lives on a Wikipedia project. Since the harvesting job also supports lists from other projects (such as Wikivoyage), files from those lists cannot be categorised this way.

Update: There is now a project field in monuments_all. What is needed is for the categorisation jobs to make use of this new data.

Event Timeline

JeanFred raised the priority of this task from to High.
JeanFred updated the task description.
JeanFred subscribed.
Lokal_Profil subscribed.

Plan of attack:

Add project field to country databases and monuments_all.

Once done, this should also be used when converting wikitext links to URLs (processWikitext() in api/CommonFunctions.php).
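To illustrate what using the project for link conversion could look like, here is a minimal sketch. It is written in Python for illustration only (the real processWikitext() lives in PHP), the helper name article_url is hypothetical, and it assumes the project field holds a domain-style name such as wikipedia or wikivoyage.

```python
from urllib.parse import quote


def article_url(lang, project, title):
    """Build a page URL from language, project and title.

    Hypothetical helper: assumes `project` is a domain-style name
    such as 'wikipedia' or 'wikivoyage'.
    """
    # MediaWiki page URLs use underscores in place of spaces.
    path = quote(title.replace(' ', '_'))
    return 'https://{0}.{1}.org/wiki/{2}'.format(lang, project, path)
```

With this, the same link text resolves to a different host depending on which project the list came from, instead of always pointing at Wikipedia.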

The project field has now been added to monuments_all.

Was there a separate task for implementing this in the categorization? If not, more details should be added to this task to identify the next step.

categorize_images.py seems to include some unused bits (specifically getArticle()), but from a quick look I would say:

  • commonscatTemplates (and the functions that call it) needs to be made project aware.
  • getMonData() needs to get project from the SQL query.
  • get_new_categories() needs to make use of the project delivered through monData.
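
A rough sketch of those three points, not the actual patch. The (project, lang) keying, the placeholder template entries, the query text and its column list, and the helper names get_commonscat_templates and article_site are all assumptions made for illustration.

```python
# Hypothetical sketch: names and data below are illustrative and are
# not copied from categorize_images.py.

DEFAULT_PROJECT = 'wikipedia'

# Keyed by (project, lang) rather than by lang alone, so that the same
# language can use different templates on different projects.
COMMONSCAT_TEMPLATES = {
    ('wikipedia', 'en'): ['Commons category', 'Commonscat'],
    # Placeholder entry; the real template names would need checking.
    ('wikivoyage', 'ru'): ['Commonscat'],
}

# Assumed shape of the query: the point is only that `project` is
# selected alongside the existing columns.
MON_DATA_QUERY = (
    'SELECT lang, project, monument_article '
    'FROM monuments_all WHERE country = %s AND id = %s'
)


def get_commonscat_templates(lang, project=DEFAULT_PROJECT):
    """Return the CommonsCat-like template names for a wiki."""
    return COMMONSCAT_TEMPLATES.get((project, lang), [])


def article_site(mon_data):
    """Return (project, lang) of the wiki holding monument_article.

    Falls back to Wikipedia when the row has no project value.
    """
    project = mon_data.get('project') or DEFAULT_PROJECT
    return project, mon_data['lang']
```

get_new_categories() could then call article_site(monData) to decide which wiki to load the article from and which templates to look for on it.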
Lokal_Profil renamed this task from "The monuments_all table does not keep track of the origin wiki, hindering the categorisation" to "Make use of the project field in categorisation jobs". Jul 15 2016, 8:18 AM
Lokal_Profil updated the task description.

Change 299273 had a related patch set uploaded (by Lokal Profil):
Make categorisation project aware

https://gerrit.wikimedia.org/r/299273

Change 299273 merged by jenkins-bot:
Make categorisation project aware

https://gerrit.wikimedia.org/r/299273

Mentioned in SAL [2016-08-01T07:54:26Z] <Lokal_Profil> (correction to last line) Deployed latest from Git, 5fe42fe (T111618), 1ec3530, 9a630b5 (T139258)

@JeanFred Do we have a good test case to see if this is working? I would expect something like the number of uncategorised images on ru_(ru)?

I'm going to assume that this works as expected until I hear something suggesting otherwise.