Page MenuHomePhabricator

[RfC] Implement usage tracking without eu_touched
Closed, ResolvedPublic

Description

Given the performance problems we have with needing to touch entity usage rows (T111769, T122429) I looked at Wikibase for a bit and tried to see whether we could actually drop eu_touched.

I hope the analysis below is conclusive (did I miss any major use case for the field?), thus allowing for the conclusion I drew from it.


The field is (directly) only being used in EntityUsageTable right now (except for tests). In there we have four usages of that field:

  1. EntityUsageTable::touchUsages (thus updating the field)
  2. EntityUsageTable::makeUsageRows which is only called in EntityUsageTable::addUsages (for adding new rows)
  3. EntityUsageTable::queryUsages (which can optionally be filtered by the touched time).
  4. EntityUsageTable::pruneStaleUsages (usages older than a given value of that field)

1 and 2 are only used in SqlUsageTracker::trackUsedEntities which in turn is only used in UsageUpdater::addUsagesForPage. That function is used in two places:

  • DataUpdateHookHandlers::doLinksUpdateComplete which calls it immediately before also pruning old values (with the current timestamp, thus all values in the table before that will be pruned).
  • AddUsagesForPageJob which is only fired in case a new ParserCache entry is being saved.

3 has only one usage where eu_touched actually matters: EntityUsageTable::pruneStaleUsages (which is 4) where it is used to be able to delete by PK. Thus it is not interesting here.
4 is being used in SqlUsageTracker::pruneStaleUsages only which in turn is used in UsageUpdater::pruneUsagesForPage only. That function in turn is only used in DataUpdateHookHandlers in two places:

  1. To prune entries for deleted pages (where the timestamp obviously doesn't matter)
  2. In DataUpdateHookHandlers::doLinksUpdateComplete immediately after touching the batch of relevant usages (with the current timestamp, thus all values in the table before that will be pruned).

To conclude this, it should be enough to look at DataUpdateHookHandlers in order to get the big picture.


Behaviour on edit:

After a user edited a page we (immediately) run DataUpdateHookHandlers::doParserCacheSaveComplete, thus adding the new usage entries to the table (but without touching any of the old values, yet). Some time after that a LinksUpdate job will run (asynchronously), that will trigger DataUpdateHookHandlers::doLinksUpdateComplete which deletes all usage entries, except for those in the ParserOutput of the edit that triggered the LinksUpdate.

Page views that happen between the page save but before the LinksUpdate run will have their usages being lost (as we initially insert the usages via DataUpdateHookHandlers::doParserCacheSaveComplete, but purge them in our LinksUpdate hook handler later on). That is a problem with the current implementation and will also be one in the new implementation without eu_touched.


As far as I see, we can come around using eu_touched at all by making to changes:

  1. Simply delete all old usages and insert the new ones afterwards in DataUpdateHookHandlers::doLinksUpdateComplete (obviously you would do a diff and only touch rows you need to in a real implementation)
  2. Simply keep letting AddUsagesForPageJob insert all new usages it has. In order to avoid race conditions with doLinksUpdateComplete, the job should know about the page_touched of the page in question and only actually insert its rows, in case the touched timestamp hasn't changed.

Event Timeline

hoo raised the priority of this task from to Needs Triage.
hoo updated the task description. (Show Details)
hoo added subscribers: hoo, Lydia_Pintscher, daniel and 2 others.

Note that our primary use case for usage tracking is purging the parser cache. So we need to track usages for every rendered version of a page that gets stored in the parser cache. When a page is changed or not, and what the revision timestamp is, is really not relevant.

So, we can add tracking entries whenever something is cached. How do we purge stale entries? E.g. we are tracking entries for a rendering in fr, and then the page is re-rendered in fr, without a new revision being created (because a template has changed). We can now compare the new usages for fr with the old usages recorded in the database. But for usage entries that aren't in the new rendering, we don't know if they are used by a rendering for a different language, e.g. de (entity_usage does not track the target language). We can only know that based on eu_touched: if it's larger or equal to page_touched (which gets updated when the page gets re-parsed), then the records are still needed. If touched is smaller than touched, the entries are stale, and no current rendering uses them.

So, as far as I can see, we can get rid of eu_touched only if we replace it with eu_cache_key. That would effectively mean tracking usage for each target language separately. That means a lot more rows in this table on multi-lingual projects like commons. For other projects, it would not make a difference.

I kind of like the idea of having eu_cache_key in the table: it makes explicit the fact that we are tracking parser cache entries.

So, as far as I can see, we can get rid of eu_touched only if we replace it with eu_cache_key. That would effectively mean tracking usage for each target language separately.

I don't see why that would be the case. It seems to me you are under the misconception that we purge caches for individual languages, but we don't (and can't). Once something relevant for one page changes, we need to throw away all cache entries for that page anyway.

What we already do right now (if you read my above analysis closely) is tracking all entity usages that are associated with a page (that can mean one ore more ParserCache entries with different cache keys and whatnot). We don't need to know which cache entry these keys belong to in the end, we just need to know which page we need to purge in case.

Once MediaWiki decides that the page caches need to get cleared, we throw away all entity usages rows (effectively, as we purge all that are older than wfTimestampNow()) and then insert the new usages of the (at that point) single new ParserCache entry.

I don't see why that would be the case. It seems to me you are under the misconception that we purge caches for individual languages, but we don't (and can't).

True, we don't right now. But that is being worked on. Then again, once we have proper dependency tracking of derived resources, that will itself replace the entity_usage table (which is just tracking of entities by derived resources).

Once MediaWiki decides that the page caches need to get cleared, we throw away all entity usages rows (effectively, as we purge all that are older than wfTimestampNow()) and then insert the new usages of the (at that point) single new ParserCache entry.

Yes, that was my initial plan. But that means we have to be sure that the re-rendering and putting-into-the-parser-cache happens after the save & purge. Which, surprisingly, it often doesn't. In particular, ApiStashEdit stores the new rendered version into the parser cache before it gets saved. And it will do so even for previews that never get saved.

Btw: conceptually, when tracking dependencies, we need to track the identity of resource that depends. eu_cache_key would provide this, eu_touched is a hacky way to get around that requirement.

I don't see why that would be the case. It seems to me you are under the misconception that we purge caches for individual languages, but we don't (and can't).

True, we don't right now. But that is being worked on. Then again, once we have proper dependency tracking of derived resources, that will itself replace the entity_usage table (which is just tracking of entities by derived resources).

I see that is planned, but I don't think that we should try to cater for that right now (also eu_touched would be a poor mean to handle that).

Once MediaWiki decides that the page caches need to get cleared, we throw away all entity usages rows (effectively, as we purge all that are older than wfTimestampNow()) and then insert the new usages of the (at that point) single new ParserCache entry.

Yes, that was my initial plan. But that means we have to be sure that the re-rendering and putting-into-the-parser-cache happens after the save & purge. Which, surprisingly, it often doesn't. In particular, ApiStashEdit stores the new rendered version into the parser cache before it gets saved. And it will do so even for previews that never get saved.

As discussed on IRC, I don't think that's so hard to get right and stashed edits aren't a problem here, as they don't trigger any of our hook handlers.
Keep in mind that core also does post edit updates for all of its tracking tables and it works at least reasonably well.

Btw: conceptually, when tracking dependencies, we need to track the identity of resource that depends. eu_cache_key would provide this, eu_touched is a hacky way to get around that requirement.

Not necessary (if I understand you correctly). We just need to treat all existing and valid cached versions of a page as a single entity and we're conceptually ok. That doesn't make for an exhaustive list of "dependencies" (across all cached versions that could theoretically exist)… which might not be a very nice way to handle that, but it's enough for our use case.

We talked about this on IRC some more and came to the conclusion that it would probably work without eu_touched.

It's probably not worth implementing that, as dependency graphs are upcoming anyway.

As per @hoo request, I am sharing my own opinion here:

This is blocking ROW based replication application, which should be rolled in soon. In my own opinion (database-focused) I think wikidata updates (this is one of them) are the worse mediawiki-related production cluster problem happening right now.

In that case I think we should pick this up soon. If my above analysis is correct and per the IRC discussion I had with @daniel yesterday, I don't think is very hard to do.

First step is to stop using eu_touched, then stop updating (touching) it and finally we can drop it.

I talked to Gabriel about plans to obsolete link tables and he told me, that there are no immediate plans to replace local link tables (which includes this table). Due to this, I will pick this up in the next days if @daniel is ok with that.

hoo claimed this task.
hoo updated the task description. (Show Details)

I added some comments regarding the behaviour on edit, just for the record.

We concluded this RfC in a meeting today and it will be implemented. I'll open a ticket for that.