Provide recommendation of which data source to use for WikiTrivia Game
Closed, Resolved, Public

Description

Background
For the Wikipedia Trivia Game experiment we'd like to leverage data about articles On This Day. The intention is that the list we pull from is as curated as practical.

Task

  • Investigate the potential data sources below, or others that come to mind, for our use case:
    • Wikipedia:Selected Anniversaries
    • List of days of the year
    • OTD helper
    • More like this
  • Demo top contenders/recommendations to PM and Design

Event Timeline

JTannerWMF triaged this task as Medium priority.

Some things to consider:

  • How to avoid mature/sensitive content, for example (from today's On this day feed in the Android app):
    • "A suicide attack of a mosque in Herat, Afghanistan kills 20 people."
    • "Eleven mountaineers from international expeditions died on K2..."
    • "A supermarket fire kills 396 people and injures 500 others..."
    • "Charles Whitman kills 16 people at the University of Texas..."
  • How to avoid items that reveal the year, for example (from today's On this day feed in the Android app):
    • "The Great Mississippi and Missouri Rivers Flood of 1993 comes to a peak"
  • How to avoid items that might be too obscure/difficult for the target audience, for example (from today's On this day feed in the Android app):
    • "The then-villa of Mayagüez, Puerto Rico, formally receives its city charter from the Royal Crown of Spain"
    • "The Spanish conquest of Iberian Navarre commences with the capture of Goizueta"

Wondering if there's any way to incorporate ranking from the pageview API, to attempt to find events that are "popular", as a sort of proxy for things that might be interesting, easier to guess, etc.
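As a rough sketch of what that ranking could look like: assume the per-article view counts have already been fetched (for example from the pageview API) into a plain dict keyed by page title, and that each event carries the titles of its linked pages under a `pages` key. Both of those shapes, the function names, and the choice to score an event by its most-viewed linked page are assumptions for illustration, not something the feed provides.

```python
def popularity(event, views):
    """Score an event by the most-viewed page it links to (0 if none are known)."""
    return max((views.get(title, 0) for title in event["pages"]), default=0)


def rank_by_popularity(events, views, top_n=None):
    """Return events sorted from most to least 'popular'.

    events: list of dicts with a "pages" list of page titles (assumed shape).
    views:  dict mapping page title -> view count, fetched elsewhere.
    top_n:  optionally keep only the first N events after sorting.
    """
    ranked = sorted(events, key=lambda e: popularity(e, views), reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Taking the maximum rather than the sum avoids favouring blurbs that simply link to many pages; whether that matches "easier to guess" would need testing.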

JTannerWMF raised the priority of this task from Medium to High. Aug 5 2024, 5:22 PM

The API that is likely to be the best fit for this application is the onthisday API, provided by our wikifeeds service.
This has the following benefits:

  • This API basically just parses the corresponding Wiki page for a particular day, but does so for a few different languages, and structures it in a consistent format intended for consumption by apps.
  • It's nicely cached, so there's little worry about spikes in traffic.
  • It has the advantage of being already semi-"curated", since it comes from a wiki page edited by the community.

This API breaks the data into the following categories:

  • events: Literally the entire list of events from the corresponding wiki page.
  • births: The list of births on that same wiki page.
  • deaths: Ditto for deaths.
  • selected: A much shorter (and more community-curated) list of events, parsed from a slightly different wiki page, which is what gets presented on the Main Page of the wiki.

Each of the above categories consists of pre-structured items with a short blurb, thumbnail, and all the relevant wiki links, which make it ready-made to plug into a sequential game-like interface.
We can draw from any of the above categories to compose a list of questions.
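To make that concrete, here is a minimal sketch of turning one category into a question list. The URL pattern is the public REST path for the feed as I understand it, and the item fields used (`text`, `year`, `pages`, `title`) are assumptions about the response shape that should be checked against a real response; `build_url`, `fetch` and `to_questions` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

# Assumed URL pattern for the wikifeeds onthisday endpoint.
BASE = ("https://{lang}.wikipedia.org/api/rest_v1/feed/onthisday/"
        "{category}/{month:02d}/{day:02d}")
CATEGORIES = {"events", "births", "deaths", "selected"}


def build_url(lang, category, month, day):
    """Build the request URL for one language, category and calendar day."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return BASE.format(lang=lang, category=category, month=month, day=day)


def fetch(lang, category, month, day):
    """Fetch and decode the feed (network call; the User-Agent is a placeholder)."""
    req = Request(build_url(lang, category, month, day),
                  headers={"User-Agent": "wikitrivia-prototype/0.1"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def to_questions(payload, category):
    """Convert feed items into {blurb, answer_year, pages} question dicts.

    Items missing a blurb or a year are skipped, since they cannot be asked.
    """
    questions = []
    for item in payload.get(category, []):
        text, year = item.get("text"), item.get("year")
        if text is None or year is None:
            continue
        titles = [p["title"] for p in item.get("pages", []) if p.get("title")]
        questions.append({"blurb": text, "answer_year": year, "pages": titles})
    return questions
```

Usage would be along the lines of `to_questions(fetch("en", "selected", 8, 1), "selected")`, swapping in `"events"`, `"births"` or `"deaths"` to draw from the other categories.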


Some further random thoughts (read: personal opinions!):

How to avoid mature/sensitive content

This would require a level of curation (nonzero!) that is not feasible for this iteration. And anyway, this is a bit subjective; items of this nature are shown on the Main Page on a regular basis, as deemed appropriate by the community.

How to avoid items that reveal the year

This is a great observation, and could be resolved by programmatically excluding any items where the blurb contains any kind of number (from one to four digits), with certain exceptions.
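A minimal sketch of such a filter, assuming blurbs are plain strings. The exception list is purely hypothetical (which exceptions are actually wanted is still open), and `reveals_number` and `filter_blurbs` are illustrative names.

```python
import re

# A standalone run of one to four digits (not part of a longer number).
NUMBER = re.compile(r"(?<!\d)\d{1,4}(?!\d)")

# Hypothetical exceptions: phrases whose digits are part of a name, not a date.
ALLOWED_PHRASES = ("K2", "Apollo 11")


def reveals_number(blurb, allowed=ALLOWED_PHRASES):
    """Return True if the blurb contains a 1-4 digit number outside the exceptions."""
    cleaned = blurb
    for phrase in allowed:
        cleaned = cleaned.replace(phrase, "")
    return NUMBER.search(cleaned) is not None


def filter_blurbs(blurbs):
    """Keep only blurbs that do not give away a number."""
    return [b for b in blurbs if not reveals_number(b)]
```

Note that this also drops blurbs with casualty counts and similar figures, not just years; that may or may not be desirable.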

How to avoid items that might be too obscure/difficult to the target audiences

Ditto the point about nonzero curation. Personally I wouldn't mind seeing items of radically varying difficulty, and could find it fun to make wild guesses and see how far off I am.

Thanks @AlexHollender-WMF and @Dbrant.
Do you think it might be feasible to test out different categories of the API with testers on the testing day on Thursday (e.g. see what it's like to use only the 'selected' category vs. the more general API call)?