
Ensure reports work for Wikidata harvests
Closed, Resolved · Public

Description

We produce various reports during a harvest:

  • unused images
  • images without
  • unknown fields
  • ...

Ensure that these still work when fed data from a Wikidata harvest. One likely issue is that e.g. an "unused images" list on a Wikipedia page assumes that the "source list" lives on the same wiki.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jul 21 2017, 12:26 PM

One potential difference is that the source field (at least when viewed through the API) lists a Wikidata source as http://www.wikidata.org/entity/<Qid> instead of e.g. //ka.wikipedia.org/w/index.php?title=<list name>
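
As a rough illustration of the difference, a report script could tell the two source shapes apart before building any links. This is only a sketch: the function name `classify_source` and both patterns are hypothetical, derived solely from the two URL forms quoted above, and are not taken from the heritage code base.

```python
import re

# Hypothetical patterns, based only on the two URL shapes quoted above.
WIKIDATA_SOURCE = re.compile(
    r'^https?://www\.wikidata\.org/entity/(?P<qid>Q\d+)$')
WIKI_PAGE_SOURCE = re.compile(
    r'^(?:https?:)?//(?P<domain>[^/]+)/w/index\.php\?title=(?P<title>[^&]+)')


def classify_source(source):
    """Classify a source field value.

    Returns ('wikidata', qid), ('wiki', (domain, title)),
    or ('unknown', source) if neither pattern matches.
    """
    match = WIKIDATA_SOURCE.match(source)
    if match:
        return ('wikidata', match.group('qid'))
    match = WIKI_PAGE_SOURCE.match(source)
    if match:
        return ('wiki', (match.group('domain'), match.group('title')))
    return ('unknown', source)
```

A report that links back to the "source list" could then branch on the first element of the result instead of assuming the source is a page on the local wiki.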

Restricted Application added a subscriber: PokestarFan. · Aug 7 2017, 3:12 PM

Change 370481 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@wikidata] Make unused image reports deal with sparql harvested data

https://gerrit.wikimedia.org/r/370481

There is likely a similar need to look at the regexes in the API. They should already be centralised though.

Change 370481 merged by jenkins-bot:
[labs/tools/heritage@wikidata] Make scripts dealing with the sparql source field deal with sparql harvested data

https://gerrit.wikimedia.org/r/370481

I think that all that is now left to do is to run and inspect the results.

Lokal_Profil closed this task as Resolved. · Aug 25 2017, 10:38 AM
Lokal_Profil claimed this task.

Closing this and adding a comment to do a run once everything is live.