
Surface translation suggestions based on the Wikipedia Cultural Diversity Observatory
Open, Needs Triage, Public


The Wikipedia Cultural Diversity Observatory project provides data with strategic value and resources to organize and fight for more cultural diversity within Wikipedia. In particular, it provides lists of Cultural Context Content (CCC) articles that are relevant for users to translate. CCC is the group of articles in a Wikipedia language edition that relates to the editors' geographical and cultural context (places, traditions, language, politics, agriculture, biographies, events, etc.).

This ticket proposes integrating the Top CCC Articles into Content Translation suggestions. In this way, users can more easily fill the gaps for relevant articles that are missing in their language.

More aspects of the integration will be detailed as they are discussed, including how to integrate the suggestions (e.g., mixed with the current recommendations, or as part of a campaign), and the technical details to implement it.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 10 2019, 12:43 PM

Current version of the Top CCC articles is at with the name top_ccc_articles_current.db.

The database contains 3 tables for each language edition. The first holds the edition's own Top CCC articles and article features such as number of bytes and number of editors (not relevant for this case). The second tells which list each article belongs to (there are several lists, organized by topic). The third (e.g. ccc_enwiki_top_articles_page_titles) contains the Top CCC articles of all the other languages.

e.g. ccc_enwiki_top_articles_page_titles (qitem text,page_title_target text,generation_method text,measurement_date text,PRIMARY KEY (qitem, measurement_date));

In this third table, the page_title_target field contains the title when it exists; otherwise it is empty, generated by Apertium, or taken from the label of the associated Wikidata Qitem. The field generation_method specifies whether the title is real (sitelinks) or was generated by one of the other methods (label or translation).

The articles whose generation_method is not sitelinks could be used as suggestions - they are the gaps.
For any further clarification, please ping me.
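
The gap selection described above can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the table is recreated in memory from the schema quoted in the comment, the three rows are made-up placeholders, the helper name find_gaps is hypothetical, and the stored generation_method values are assumed to be exactly the strings sitelinks, label and translation.

```python
import sqlite3


def find_gaps(conn, wiki="enwiki"):
    """Return rows whose title is not backed by a real sitelink (the gaps).

    Hypothetical helper: the table name follows the pattern quoted in the
    ticket (ccc_<wiki>_top_articles_page_titles).
    """
    # Table names cannot be bound as SQL parameters, so validate the input
    # before interpolating it into the query.
    if not wiki.replace("_", "").isalnum():
        raise ValueError(f"unexpected wiki code: {wiki!r}")
    table = f"ccc_{wiki}_top_articles_page_titles"
    query = (
        f"SELECT qitem, page_title_target, generation_method FROM {table} "
        "WHERE generation_method != 'sitelinks' ORDER BY qitem"
    )
    return conn.execute(query).fetchall()


# In-memory stand-in for top_ccc_articles_current.db, using the quoted schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ccc_enwiki_top_articles_page_titles ("
    "qitem text, page_title_target text, generation_method text, "
    "measurement_date text, PRIMARY KEY (qitem, measurement_date))"
)
# Placeholder rows, not real data from the Observatory.
conn.executemany(
    "INSERT INTO ccc_enwiki_top_articles_page_titles VALUES (?, ?, ?, ?)",
    [
        ("Q1", "Existing article", "sitelinks", "2019-01-01"),
        ("Q2", "Title from label", "label", "2019-01-01"),
        ("Q3", "Machine translated title", "translation", "2019-01-01"),
    ],
)

gaps = find_gaps(conn)
for qitem, title, method in gaps:
    print(qitem, title, method)
```

Against the real file, the same helper would be called on a connection opened with sqlite3.connect("top_ccc_articles_current.db"). Rows with an empty title would still be returned, so a consumer may want to fall back to the Qitem in that case.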