Get WikiProject script to scrape WikiProjects from French Wikipedia
Closed, Resolved | Public | 2 Estimated Story Points

Description

To support the WikiProject-related features in CopyPatrol for French Wikipedia, we'll need Leon's script to scrape the assessment templates from French Wikipedia and build a table of them in the Labs DB.

Event Timeline

kaldari moved this task from New & TBD Tickets to Up Next (June 3-21) on the Community-Tech board.
kaldari set the point value for this task to 2.

MusikBot should be good to go. It is currently parsing talk pages to determine the WikiProjects, and from my tests this is working for French Wikipedia. There may be a few false positives here and there, but once we have Page Assessments fully deployed I can update the script to use that API.

As for integration with CopyPatrol, MusikBot now uses the lang column in the copyright_diffs table to fetch records for the given language, just as EranBot does. Similarly, I've added a wp_lang column to the wikiprojects table, so in CopyPatrol we'll SELECT wikiprojects by language.
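
For illustration, a per-language lookup against the wikiprojects table might look roughly like the sketch below. Only the table name and the wp_lang column come from this task; the wp_page_title and wp_project columns, the sample rows, and the helper name are assumptions made up for the example, and an in-memory SQLite database stands in for the Labs DB.

```python
import sqlite3


def wikiprojects_for_lang(conn, lang):
    """Return the distinct WikiProject names stored for one wiki language."""
    rows = conn.execute(
        # wp_lang is the column mentioned in this task; wp_project is hypothetical.
        "SELECT DISTINCT wp_project FROM wikiprojects "
        "WHERE wp_lang = ? ORDER BY wp_project",
        (lang,),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for the real database, with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wikiprojects (wp_page_title TEXT, wp_project TEXT, wp_lang TEXT)"
)
conn.executemany(
    "INSERT INTO wikiprojects VALUES (?, ?, ?)",
    [
        ("Paris", "France", "fr"),
        ("Paris", "Histoire", "fr"),
        ("Paris", "Cities", "en"),
    ],
)

print(wikiprojects_for_lang(conn, "fr"))
```

Passing the language as a bound parameter rather than interpolating it into the SQL string keeps the same query reusable for each wiki that gets added later.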

Relevant commit: 0d155f7

@MusikAnimal: Let's try to migrate this data soon so that Sam will have some data to test with for T145436.