Assess database requirements for link recommendations reading entry point
Open, Needs Triage, Public

Assigned To

None

Authored By

	Tgr
	May 9 2022, 10:35 AM

Description

Link recommendations use a set of tables called growthexperiments_link_recommendations (one per wiki) on x1, which cache data from a slow recommendation system. Currently we keep a constant pool of ~20K articles per wiki, which is enough to give users a feed of link recommendation tasks within an article topic they choose. But if we wanted to suggest link recommendation tasks to people about the article they are reading at the moment (the project name for this is "entry point in reading experience"), we'd need this data for all articles.

We want to assess: 1) whether it would be reasonable to run an experiment on a few mid-size wikis to test how much a reading entry point would help turn readers into editors and retain new editors; 2) whether it would be feasible to eventually scale up to all wikis; 3) whether moving these tables out of MediaWiki would help, hurt, be necessary, or be impossible (they just cache responses from a Kubernetes-based web service, so logically they could just as easily live in a database belonging to that service).

Currently each table is something like 50-100M (so about 5K per article). On cswiki, which is our go-to wiki for testing new features, including every article would take about 2G. On enwiki, it would be about 20G.
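As a rough check on these figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1M = 10^6 bytes) and takes the upper end of the 50-100M range with the ~20K-article pool, which gives about 5K per cached article. The article counts it prints are only what the quoted 2G and 20G totals imply at that row size, not measured wiki statistics, and the function names are hypothetical.

```python
# Back-of-the-envelope sizing from the numbers quoted in the task description.
# Assumption: decimal units; the task does not say which convention it uses.
KB = 10**3
MB = 10**6
GB = 10**9


def bytes_per_article(table_bytes: int, cached_articles: int) -> float:
    """Average storage used per cached article in one wiki's table."""
    return table_bytes / cached_articles


def full_coverage_bytes(article_count: int, per_article: float) -> float:
    """Estimated table size if every article had cached recommendations."""
    return article_count * per_article


def implied_article_count(total_bytes: int, per_article: float) -> int:
    """Article count implied by a quoted total size (not a measured value)."""
    return round(total_bytes / per_article)


# Upper end of the quoted 50-100M range, with the ~20K-article pool.
per_article = bytes_per_article(100 * MB, 20_000)

print(per_article / KB)                              # 5.0
print(implied_article_count(2 * GB, per_article))    # 400000
print(implied_article_count(20 * GB, per_article))   # 4000000
```

Using the lower end of the range (50M) halves the per-article figure, so the estimates for full coverage are sensitive to which end applies.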

Background:

link recommendations feature documentation
link recommendations technical documentation
link recommendations tables documentation: T266913: Add a link engineering: create tables in Wikimedia production
(old) reading entry point task: T240513: Newcomer tasks: entry point in reading experience
service plans: T307881: Scaling of link suggestions service

Related Objects

Mentioned In: T308084: Reduce DB space used by Echo notifications
T307881: Scaling of link suggestions service
Mentioned Here: T308084: Reduce DB space used by Echo notifications
T240513: Newcomer tasks: entry point in reading experience
T266913: Add a link engineering: create tables in Wikimedia production
T307881: Scaling of link suggestions service

Event Timeline

Tgr created this task. May 9 2022, 10:35 AM

Restricted Application added a project: Growth-Team. May 9 2022, 10:35 AM

Restricted Application added a subscriber: Aklapper.

But if we wanted to suggest link recommendation tasks to people about the article they are reading at the moment (the project name for this is "entry point in reading experience"), we'd need this data for all articles.

An alternative to caching recommendations for all articles would be to fetch data on the fly. Especially when limiting the max recommendations to a value of 2 (the minimum on most wikis), getting the data takes roughly 2 seconds. Presumably anything less than 10 seconds would work from a product perspective, since the user is actually reading or skimming the article before seeing that there are possible changes to make?

Also, given that this would be shown to logged-in newcomers only (who would probably also be able to opt out of this feature, I imagine), the traffic should be manageable.
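The on-the-fly idea could be combined with the existing cache in a cache-aside pattern: use a cached row when one exists, otherwise call the service and store the result. The sketch below is illustrative only. The function name, the dict standing in for the growthexperiments_link_recommendations table, and the `fetch_from_service` callable are all hypothetical and do not reflect the actual GrowthExperiments code or the service's API.

```python
from typing import Callable, Dict, List

# Hypothetical shape of one recommendation; the real data format is not
# described in this task.
Recommendation = Dict[str, str]


def get_link_recommendations(
    page_id: int,
    cache: Dict[int, List[Recommendation]],
    fetch_from_service: Callable[[int, int], List[Recommendation]],
    max_links: int = 2,
) -> List[Recommendation]:
    """Return cached recommendations, falling back to an on-the-fly fetch."""
    cached = cache.get(page_id)
    if cached is not None:
        return cached[:max_links]

    # Slow path: the comment above estimates roughly 2 seconds per request
    # when the maximum is limited to 2 recommendations.
    fetched = fetch_from_service(page_id, max_links)
    if fetched:
        # Store the result so repeat views of the same article are cheap.
        cache[page_id] = fetched
    return fetched
```

With this shape, only articles that eligible users actually visit end up in the cache, so storage grows with usage rather than with the total article count.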

jcrespo added a project: Data-Persistence (work done). May 9 2022, 11:01 AM

MMiller_WMF mentioned this in T307881: Scaling of link suggestions service. May 9 2022, 7:02 PM

MMiller_WMF updated the task description.

That's an interesting idea, @kostajh -- but perhaps then there would be issues around the volume of using the API? Like hitting it to get link suggestions for every page load of an article would be a lot. I imagine we would need business rules to decide which users get it, and for what articles?

ppelberg subscribed. May 9 2022, 10:20 PM

Right. Calling on the fly would be something to consider if this reading entrypoint is done for a subset of page views, not for all traffic to a page. Some variables we could play with:

all users or only authenticated users?
all authenticated users or only newcomers?
all newcomers or only accounts created in the last N days?
issue the API request immediately on page load or wait N seconds (or after the article is scrolled) before querying the link recommendation API?

If the goal is to tell the user that link suggestions exist for the article immediately on page load, then we should go with the proposal in this task, which is to generate recommendations for as many articles as possible on the wiki.
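The variables listed above could be expressed as a single eligibility check that runs before any request is issued. This is a minimal sketch under stated assumptions: the `Viewer` and `EntryPointRules` types, the 30-day and 5-second defaults, and the opt-out flag are hypothetical placeholders for whatever business rules are eventually chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Viewer:
    is_authenticated: bool
    registered_at: Optional[datetime]  # None for logged-out readers
    opted_out: bool = False


@dataclass
class EntryPointRules:
    require_authenticated: bool = True
    # The "N days" from the list above; None means no account-age limit.
    max_account_age: Optional[timedelta] = timedelta(days=30)
    # The "wait N seconds" variable; scrolling also counts as engagement.
    delay_seconds: float = 5.0


def should_query_recommendations(
    viewer: Viewer,
    rules: EntryPointRules,
    now: datetime,
    seconds_on_page: float,
    has_scrolled: bool,
) -> bool:
    """Decide whether to call the link recommendation API for this page view."""
    if viewer.opted_out:
        return False
    if rules.require_authenticated and not viewer.is_authenticated:
        return False
    if rules.max_account_age is not None:
        if viewer.registered_at is None:
            return False
        if now - viewer.registered_at > rules.max_account_age:
            return False
    # Only query once the reader has shown some engagement with the article.
    return has_scrolled or seconds_on_page >= rules.delay_seconds
```

Tightening any of these rules reduces the share of page views that trigger a request, which is the lever for keeping on-the-fly traffic manageable.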

Per discussion with @Ladsgroup:

the +2G of enabling for all articles on cswiki shouldn't be a problem
if we want to do this for all wikis (~50G?), we should offset it by freeing up space. T308084: Reduce DB space used by Echo notifications seems fairly easy to do and would probably free up more space than that.

Yup. My only request would be to be careful about massive writes or reads when enabling it in really large wikis (coordinate with the DBAs beforehand) but generally x1 is in a healthy state.

Ladsgroup mentioned this in T308084: Reduce DB space used by Echo notifications. May 12 2022, 10:34 AM

So, what is the next step? Should we try enabling it for all articles on cswiki and see how it goes?

The reading entry point is probably a large enough project that it needs to be slotted into our annual plan. Generating tasks for all cswiki articles does not take much coding, but it doesn't seem that valuable as an experiment so I'd wait until we are at the point where we actually need it.

Alright, moving off the current sprint board then and into Triaged.

kostajh moved this task from Inbox to Triaged on the Growth-Team board. May 12 2022, 11:25 AM

Thank you all! Yes, that's right @kostajh -- this would be an annual planned project. This ticket was just to begin the thought process, and we'll figure out when to actually proceed here.

Marostegui moved this task from Triage to In progress on the DBA board. May 23 2022, 1:29 PM

Marostegui removed a project: DBA.

RhinosF1 subscribed. Aug 17 2022, 1:42 PM

TheresNoTime removed a subscriber: RhinosF1. Dec 15 2022, 11:35 PM
