
Generate lists of unillustrated sections in enwiki that have image suggestions from other wikis
Closed, Resolved (Public)

Description

Hi team!
Would it be possible to have a clean list of all unillustrated sections in English Wikipedia for which we have an image recommendation? Ideally, the format should be:

<page id>,<page title>,<section title>,<n_recommendations>

Where n_recommendations is the number of wikis where you find a matching illustrated section.
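To make the requested format concrete, here is a minimal sketch of how n_recommendations could be derived. It assumes the matches are already available as (page id, page title, section title, wiki) tuples; that input shape and the function names are hypothetical and are not taken from the code used for this task.

```python
import csv
import io
from collections import defaultdict


def count_recommendations(matches):
    """Collapse (page_id, page_title, section_title, wiki) tuples into
    one row per section, counting the distinct wikis that matched.

    The input shape is an assumption made for this sketch.
    """
    wikis_per_section = defaultdict(set)
    for page_id, page_title, section_title, wiki in matches:
        # A set ensures each wiki is counted once per section.
        wikis_per_section[(page_id, page_title, section_title)].add(wiki)
    return [
        (page_id, page_title, section_title, len(wikis))
        for (page_id, page_title, section_title), wikis
        in sorted(wikis_per_section.items())
    ]


def to_csv(rows):
    """Serialize rows as <page id>,<page title>,<section title>,<n_recommendations>."""
    buf = io.StringIO()
    # csv.writer quotes titles that contain commas.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Using the csv module rather than joining on commas matters here because page titles can themselves contain commas.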

Thank you in advance for your help!

Event Timeline

@Miriam, what should we do with sections that have multiple images? Do you want

<page id>,<page title>,<section title>,<img_title>,<n_recommendations>

?

@diego, no need to include the image title, just the total number of images that could be recommended, so one line per section per page. Thanks!

@Miriam please find the data in csv format here and the code used to generate it here.

I've also added the wikidata item id, so the columns on that file are:

<wikidata_id>,<page id>,<page title>,<section title>,<n_recommendations>

So data looks like this:

| wikidata_id+page_id | page_title | section_heading | n_recommendations |
| --- | --- | --- | --- |
| Q101227353359722 | Heinzenberg Castle | history | 4 |
| Q101354421521011 | Kaza, Himachal Pradesh | festivals & tourism | 1 |
| Q10141193122565 | Fort McPherson, Northwest Territories | history | 1 |
| Q101811511959669 | 2007–08 Boston Celtics season | roster | 1 |
| Q10181914823615 | Petrovec Municipality | geography | 1 |
| Q10207686167212040 | The East (2020 film) | production | 2 |
| Q102165215429976 | Doux, Ardennes | population | 4 |
| Q10218825102310 | Arnold Berliner | biography | 1 |
| Q10239001241126 | Dujiangyan City | history | 2 |
| Q10251348207303 | Battle of Focșani | references | 2 |

...

Let me know if you have any further questions or data requirements.
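For anyone consuming the file, a small sketch of filtering it by recommendation count might look like the following. It assumes the CSV has a header row with the column names wikidata_id, page_id, page_title, section_heading and n_recommendations; the function name is hypothetical, and the rows in the usage example are made-up placeholders, not values from the dataset.

```python
import csv
import io


def sections_with_min_recommendations(csv_text, minimum):
    """Return (page_title, section_heading, n_recommendations) for rows
    whose n_recommendations is at least `minimum`.

    Assumes a header row with the column names listed above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["page_title"], row["section_heading"], int(row["n_recommendations"]))
        for row in reader
        if int(row["n_recommendations"]) >= minimum
    ]


# Placeholder rows for illustration only.
example = (
    "wikidata_id,page_id,page_title,section_heading,n_recommendations\n"
    'Q1,1,"Example, Page",history,4\n'
    "Q2,2,Another page,geography,1\n"
)
print(sections_with_min_recommendations(example, 2))
```

For a file on disk, the same function can be called with the file's contents read as text.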