
Decide on scalable approach for watchlist integration of Wikidata
Open, Medium, Public

Description

Originally, we inserted a recentchanges row for each local page affected by a change in a connected Wikibase repo (e.g. if the label of Q159 was used on 10000 pages, we inserted 10000 rc rows when that label was changed). However, this was found to generate too much load (see e.g. T171027), so a hard cut-off was introduced as a quick fix; see https://gerrit.wikimedia.org/r/#/c/383384/

That situation is, however, not satisfactory. At a minimum, we want to be smarter about which pages to "ping" via the recentchanges mechanism - e.g. insert rc rows only for the most-watched pages affected by the change (a sketch of this idea follows below).
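A minimal sketch of that prioritization, in Python purely for illustration (the actual Wikibase client code is PHP, and the cap value, function name, and data shapes here are assumptions, not the existing implementation):

```python
# Illustrative sketch only: given the pages affected by an entity change and
# their watcher counts, pick the subset that should get recentchanges rows -
# all of them if the fan-out is small, otherwise only the most-watched pages
# up to a hard cap (analogous to the existing cut-off).

MAX_RC_ROWS_PER_CHANGE = 1000  # hypothetical cap


def select_rc_targets(affected_pages: dict[int, int],
                      cap: int = MAX_RC_ROWS_PER_CHANGE) -> list[int]:
    """affected_pages maps page_id -> number of watchers."""
    if len(affected_pages) <= cap:
        return list(affected_pages)
    # Prefer the most-watched pages, so the rc rows we do insert are the
    # ones most likely to show up on someone's watchlist.
    return sorted(affected_pages, key=affected_pages.get, reverse=True)[:cap]


# Example: an entity used on four pages, with a cap of 2 rc rows.
print(select_rc_targets({101: 5, 102: 0, 103: 42, 104: 1}, cap=2))  # [103, 101]
```

The open question is whether watcher counts can be queried cheaply enough at dispatch time for this to be viable at scale.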

Event Timeline

Small correction/clarification on "this was found to generate too much load": as an ops person, I interpret load as throughput/backlog work. The insertion [load] itself was not the problem (the spikes on inserts were too large, but that is something that could be smoothed); the problem lies in the proportion of wb-originated changes vs. others and in the size of the recentchanges table itself. Strictly speaking, the issues could be solved by making the 2 million different query patterns of recentchanges better, but I am going to assume that is more difficult than changing the wb rows behaviour :-).

The idea is that a more accurate phrasing would be "this causes some recentchanges- and watchlist-related queries to have >60 seconds of latency, and the table became operationally unmaintainable for some wikis".