
[XL] Topics personalized slide: "You read the most articles in the Arts and Culture topic area"
Open, Low, Public

Description

Background

The Apps team plans to explore whether a personalized Wikipedia in Review feature, which displays insights about a user's reading, editing, and donation history, is engaging for app users and inspires them to donate to the Wikimedia Foundation.

Requirements
  • If a user has read more than 1 article that matches a topic area, show this slide
  • If a user is not eligible, hide this slide (no collective fallback)
  • If a user has read in multiple topics, show a list of the top topics (numbers can be replaced with bullet points)
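The eligibility rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: `topTopics` and its `articleTopics` input (article title mapped to topic names) are hypothetical and not part of WMFData, the topic strings are placeholders, and the default limit of 3 is borrowed from the "top 3 read topics" wording in the engineering notes. It also assumes "more than 1 article matching a topic area" means a topic qualifies only when at least two distinct read articles carry it.

```swift
import Foundation

// Hypothetical helper: `articleTopics` maps each read article title to the
// topic areas returned for it. Returns up to `limit` qualifying topics,
// most-read first (ties broken alphabetically so the result is stable).
func topTopics(articleTopics: [String: [String]], limit: Int = 3) -> [String] {
    var counts: [String: Int] = [:]
    for topics in articleTopics.values {
        // Count each topic once per article, even if it is listed twice.
        for topic in Set(topics) {
            counts[topic, default: 0] += 1
        }
    }
    return counts
        .filter { $0.value > 1 } // assumption: "more than 1 article" per topic
        .sorted { lhs, rhs in
            lhs.value != rhs.value ? lhs.value > rhs.value : lhs.key < rhs.key
        }
        .prefix(limit)
        .map { $0.key }
}
```

An empty result would correspond to "not eligible", in which case the slide is hidden with no collective fallback.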
Designs

image.png (1×786 px, 86 KB)

Engineering Notes
  • Open question: Does the Topics API work with all languages?
    • A: Yes, see example (ores_articletopics item)
  • (data half) Gather and persist article topics across app resumes, then evaluate the slide:
    • Upon app resume, pull the persisted slide item object for this slide identifier.
    • If evaluated = no AND remote config does not have a personalized slide kill switch, pull all of the user's article page views (WMFPageView) within the remote config data population dates from Core Data (use the WMFData Core Data stack from T370216).
    • Gather the unique article titles and bucket them by Wikimedia project. Then chunk each bucket to the per-request title limit: the titles url query item accepts 50 titles at a time (500 only for clients with higher API limits).
    • For each chunk, make a call to the MediaWiki action API to gather the topics (https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=cirrusdoc&titles=Cat).
    • Fetch these up to the number of pages in the remote config max pages setting (T376040), then persist the topics in a new WMFData Core Data table, and add a relationship between it and a WMFPage.
    • Continue to keep evaluated = no, display = no for the slide model.
    • The next time the app resumes, pull a new list of WMFPages from Core Data that do NOT already have persisted topics, bucket/chunk again, and continue hitting the topic API to fetch and persist their topics.
    • When we have reached the end of gathering the topics for all viewed articles, do a final evaluation to gather the top 3 read topics. Save these topics as slide model metadata. Then set evaluated = yes. If any top read topics were saved in the slide model, set display = yes.
  • (UI half) When we create the view models before presenting slides (see T376044), fetch the persisted model from the previous bullet point. If display = yes, build a slide view model for this data (using the report model's metadata). Ensure the slide view model is inserted in the correct order, and that the associated collective "fallback" slide is NOT inserted.
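
The bucketing, chunking, and request-building steps of the data half could look something like the sketch below. It is not the WMFData implementation: `PageView`, `bucketTitles`, `chunk`, and `topicsURL` are hypothetical names, and the sketch assumes each page view carries its project as a host name such as `en.wikipedia.org`. The chunk size is left as a parameter because the notes mention both 50 and 500 titles per request.

```swift
import Foundation

// Hypothetical stand-in for a persisted page view; the real WMFPageView
// model in WMFData may look different.
struct PageView {
    let project: String // assumed to be the project host, e.g. "en.wikipedia.org"
    let title: String
}

// Groups unique article titles by project, preserving first-seen order.
func bucketTitles(_ views: [PageView]) -> [String: [String]] {
    var seen: [String: Set<String>] = [:]
    var buckets: [String: [String]] = [:]
    for view in views {
        // Only append a title the first time it appears for its project.
        if seen[view.project, default: []].insert(view.title).inserted {
            buckets[view.project, default: []].append(view.title)
        }
    }
    return buckets
}

// Splits titles into chunks of at most `size` elements.
func chunk(_ titles: [String], size: Int) -> [[String]] {
    precondition(size > 0, "chunk size must be positive")
    return stride(from: 0, to: titles.count, by: size).map {
        Array(titles[$0..<min($0 + size, titles.count)])
    }
}

// Builds the action API request from the notes for one chunk of titles.
// Multiple titles are joined with "|", as the action API expects.
func topicsURL(host: String, titles: [String]) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = host
    components.path = "/w/api.php"
    components.queryItems = [
        URLQueryItem(name: "action", value: "query"),
        URLQueryItem(name: "format", value: "json"),
        URLQueryItem(name: "formatversion", value: "2"),
        URLQueryItem(name: "prop", value: "cirrusdoc"),
        URLQueryItem(name: "titles", value: titles.joined(separator: "|"))
    ]
    return components.url
}
```

A caller would iterate over `bucketTitles(views)`, pass each bucket through `chunk(_:size:)`, and stop issuing requests once the remote config max pages setting is reached; pages left over would be picked up on the next app resume.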

Event Timeline

Tsevener updated the task description.
Tsevener renamed this task from Topics personalized slide: "You read the most articles in the Arts and Culture topic area" to [L] Topics personalized slide: "You read the most articles in the Arts and Culture topic area". Oct 3 2024, 7:00 PM
Tsevener renamed this task from [L] Topics personalized slide: "You read the most articles in the Arts and Culture topic area" to [XL] Topics personalized slide: "You read the most articles in the Arts and Culture topic area".

@HNordeenWMF

Possible scope cut:

Can we only check this against their primary app language, instead of all languages?
Also, can we only evaluate their most recent 50 (or 500, unclear how limited we are with the API) read articles?

Estimating under the assumption that we check the primary app language only, but still fetch topics for all page views from the year.

@HNordeenWMF I think it would be good to cut this slide entirely since we don't have the data readily available and it's a heavy lift to pre-populate it.

One thing we can do (before the end of this year) is to fetch and persist an article's topics whenever that article is viewed. Then whenever we do WiR for 2025, the topics will already be local and accessible.