Implement usage tracking without eu_touched
Closed, ResolvedPublic

Description

As discussed in T124737.

  • Stop using eu_touched in SELECTs
  • Stop updating (touching) it
  • Drop it.

On edit:

After a user edits a page, we immediately run DataUpdateHookHandlers::doParserCacheSaveComplete, which adds the new usage entries to the table (without removing any of the old values). Some time later a LinksUpdate job runs asynchronously and triggers DataUpdateHookHandlers::doLinksUpdateComplete, which deletes all usage entries except those in the ParserOutput of the edit that triggered the LinksUpdate.
Pruning all old entries during a LinksUpdate run is OK, as we also invalidate all older parser cache entries in that case.

Please note that usages recorded by page views that happen after the page save but before the LinksUpdate run will be lost (we insert them via DataUpdateHookHandlers::doParserCacheSaveComplete, but our LinksUpdate hook handler deletes them later on). This is a problem with the current implementation and will remain one in the new implementation without eu_touched.
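The edit flow and the race described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the Wikibase code: the UsageTable class, its method names, the page ID, and the entity/aspect values are all hypothetical stand-ins for the entity_usage table and the two hook handlers.

```python
class UsageTable:
    """Toy in-memory stand-in for the entity_usage table (hypothetical)."""

    def __init__(self):
        # Each row is (page_id, entity_id, aspect); there is no eu_touched column.
        self.rows = set()

    def add_usages(self, page_id, usages):
        # Models doParserCacheSaveComplete: insert only, never delete.
        self.rows |= {(page_id, entity, aspect) for entity, aspect in usages}

    def prune_usages(self, page_id, usages):
        # Models doLinksUpdateComplete: delete every row for the page
        # except those present in the ParserOutput of the triggering edit.
        keep = {(page_id, entity, aspect) for entity, aspect in usages}
        self.rows = {r for r in self.rows if r[0] != page_id or r in keep}

    def usages_for(self, page_id):
        return {(entity, aspect) for p, entity, aspect in self.rows if p == page_id}


table = UsageTable()
table.add_usages(1, {("Q64", "L.en")})      # usage left over from an older revision

edit_usages = {("Q42", "L.en")}             # illustrative ParserOutput usages of the edit
table.add_usages(1, edit_usages)            # runs immediately after the save
after_save = table.usages_for(1)            # old and new entries coexist

table.add_usages(1, {("Q42", "L.de")})      # page view before the LinksUpdate job
table.prune_usages(1, edit_usages)          # asynchronous LinksUpdate job
after_links_update = table.usages_for(1)    # the page view's usage is gone
```

In this model the entry added by the intermediate page view is pruned together with the stale one, which is the loss the note above describes.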

On page view:

Page views in languages we don't have in the parser cache trigger DataUpdateHookHandlers::doParserCacheSaveComplete, which inserts the additional usages into the table (but doesn't prune any).
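A minimal sketch of the page-view path, again as a toy model rather than the actual implementation: the parser cache is assumed to be keyed by page and language, and render, view_page, and the usage values are hypothetical names introduced only for illustration.

```python
usage_rows = set()   # toy entity_usage table: (page_id, entity_id, aspect)
parser_cache = {}    # toy parser cache keyed by (page_id, language)


def render(page_id, lang):
    # Hypothetical renderer: returns the output and the usages it recorded.
    return f"<page {page_id} in {lang}>", {("Q42", f"L.{lang}")}


def add_usages(page_id, usages):
    # Models doParserCacheSaveComplete: insert only, never prune.
    usage_rows.update((page_id, entity, aspect) for entity, aspect in usages)


def view_page(page_id, lang):
    key = (page_id, lang)
    if key not in parser_cache:
        # Cache miss: parse, save to the cache, and record the new usages.
        html, usages = render(page_id, lang)
        parser_cache[key] = html
        add_usages(page_id, usages)
    return parser_cache[key]


view_page(1, "en")
view_page(1, "de")   # language not yet cached, so its usages are added
view_page(1, "en")   # cache hit: nothing is parsed and the table is untouched
```

Only cache misses write to the table, and they only ever add rows; removal is left entirely to the LinksUpdate handler.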

Event Timeline

hoo created this task. Feb 4 2016, 5:08 PM
hoo updated the task description.
hoo raised the priority of this task from to Needs Triage.
hoo added subscribers: hoo, daniel, aude.
Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 4 2016, 5:08 PM
hoo claimed this task. Feb 4 2016, 5:31 PM
daniel added a comment. Feb 4 2016, 6:17 PM

@hoo please outline here how we will be keeping entity_usage up to date without eu_touched, and what assumptions we are making.

hoo updated the task description. Feb 9 2016, 10:26 PM
hoo set Security to None.

@hoo please outline here how we will be keeping entity_usage up to date without eu_touched, and what assumptions we are making.

Added what happens on edit and on page view. Also added a note about the potential race condition.

Lydia_Pintscher triaged this task as High priority. Feb 16 2016, 1:55 PM
Lydia_Pintscher moved this task from Proposed to Backlog on the Wikidata-Sprint-2016-02-16 board.
hoo added a comment. Feb 24 2016, 12:05 AM

I started working on this, but won't be able to finish it before Friday (as I don't work Wednesday and Thursday this week).

I'll probably do steps 1 and 2 in a single patch, and do only that much, so that we can revert or downgrade at any time without losing tracking data.

Change 278045 had a related patch set uploaded (by Hoo man):
Implement usage tracking without eu_touched

https://gerrit.wikimedia.org/r/278045

Change 278045 merged by jenkins-bot:
Implement usage tracking without eu_touched

https://gerrit.wikimedia.org/r/278045

hoo closed this task as Resolved.
hoo removed a project: Patch-For-Review.

Fixed on master, not yet sure when to deploy. See T132628 for dropping the field in question.

hoo moved this task from Doing to Done on the Wikidata-Sprint-2016-04-12 board. Apr 13 2016, 8:47 PM
hoo added a comment. Apr 15 2016, 6:13 AM

This has been deployed to all wikis with yesterday's train.

The write (update) traffic on all shards dropped significantly; see for example db1052 (s1/enwiki master).

Change 287581 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Remove unused method from EntityUsageTableTest

https://gerrit.wikimedia.org/r/287581

Change 287581 merged by jenkins-bot:
Remove unused method from EntityUsageTableTest

https://gerrit.wikimedia.org/r/287581