
RESTBase content rerenders sometimes don't pick up the newest changes
Open, Needs Triage · Public

Description

While deploying the new storage model to production, we temporarily made it issue requests for mobile content to both the old and the new storage, compare the results, and log a warning if the content doesn't match. The rate of mismatches is not very high; however, all the mismatches are due to one of the responses not reflecting the change that made the rerender happen: either some Wikidata properties are missing or not updated, different revisions of some template are used, or the whole page revision is different.
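
The comparison described above can be sketched as a recursive diff over the two JSON responses. This is only an illustration under stated assumptions, not the actual RESTBase code: the function name `diff_paths`, the sample payloads, and the output format (modelled on the `diff:` lines quoted in the comments on this task) are all hypothetical.

```python
from typing import Any, Iterator


def diff_paths(old: Any, new: Any, path: str = "") -> Iterator[str]:
    """Yield one 'path old vs new' line for every leaf that differs.

    A key missing on one side is treated like a key whose value is None.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            yield from diff_paths(old.get(key), new.get(key), f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for i, (a, b) in enumerate(zip(old, new)):
            yield from diff_paths(a, b, f"{path}[{i}]")
    elif old != new:
        yield f"{path} {old} vs {new}"


# Hypothetical responses from the old and the new storage for the same title.
old_render = {"lead": {"revision": 140938455, "displaytitle": "Liste de films LGBT"}}
new_render = {"lead": {"revision": 140938593, "displaytitle": "Liste de films LGBT"}}

for line in diff_paths(old_render, new_render):
    print("diff:", line)
```

An empty result means the two renders agree, which, as noted below, does not prove that either of them is fresh.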

The current theory is that this is caused by DB replication lag: different MCS instances hit different slaves and get different results. Although the rate of such content mismatches is low, the rate of MCS getting stale data must be much higher: if both MCS instances hit a lagging slave, there will be no mismatch, but the content will be stale in both renders.

There are several possible solutions to this, but the most obvious are:

  • Introduce some delay before starting rerenders. It's simple, but doesn't guarantee correctness in any way; it just reduces the probability of errors.
  • Make it hit the master. That greatly increases the load on the master MySQL, probably beyond its capacity.
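
The first option can be sketched roughly as follows. This is a minimal illustration, not a proposal for the actual change-propagation code: the function names, the injected `sleep` and `clock` callables, and the 5-second default are assumptions, and the default is not a measured replication lag.

```python
import time
from typing import Callable


def remaining_delay(event_ts: float, now: float, min_age: float) -> float:
    """Seconds still to wait so that the event is at least `min_age` old."""
    return max(0.0, min_age - (now - event_ts))


def rerender_after_delay(
    event_ts: float,
    rerender: Callable[[], None],
    min_age: float = 5.0,  # assumed value, not a measured replication lag
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> float:
    """Wait out the remaining delay, then trigger the rerender.

    Returns the number of seconds actually waited.
    """
    wait = remaining_delay(event_ts, clock(), min_age)
    if wait > 0:
        sleep(wait)
    rerender()
    return wait
```

Measuring the delay from the event timestamp rather than from the moment the event is consumed means that events which already spent time in a queue are not delayed a second time. As the bullet above says, this only lowers the probability of reading stale data; a slave lagging by more than `min_age` still produces a stale render.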

Event Timeline

Here's a bunch of examples of when this happens, collected from the render-comparison logs we had during the transition of mobile content to the new storage. I've tried to provide an example of every class of issue I've found, but there might be more classes.

  1. The whole new revision is not picked up. This is probably due to replication lag and one of the instances hitting a stale slave. Example:
X-Triggered-By: req:23c63446-3640-42fd-9742-28817015608a,mediawiki.revision-create:https://fr.wikipedia.org/wiki/Liste_de_films_LGBT,resource_change:http://fr.wikipedia.org/api/rest_v1/page/html/Liste_de_films_LGBT
diff: .lead.revision 140938455 vs 140938593
  2. Lastmodified is not updated. The lastmodified is taken from the rev_timestamp in the DB, so it's not quite clear how the results can have the same revision but a different rev_timestamp.
X-Triggered-By: req:bebe6ad9-8fe4-4a95-b68e-4b2c0342452d,mediawiki.revision-create:https://en.wikipedia.org/wiki/User:Legacypac/CSD_log,resource_change:http://en.wikipedia.org/api/rest_v1/page/html/User%3ALegacypac%2FCSD_log
diff: .lead.lastmodified 2017-09-25T07:58:14Z vs 2017-09-25T07:58:10Z
  3. Seeing a stale version of some template.
X-Triggered-By: req:ac0f40d4-8bff-429a-a546-0cbf81ba5aba,mediawiki.revision-create:https://en.wikipedia.org/wiki/Wikipedia:Administrator_intervention_against_vandalism,change-prop.transcludes.resource-change:https://en.wikipedia.org/wiki/User%3AExcirial%2FDashboard%2FContent
diff: Some properties that were added by a template expansion are missing or different
  4. Same as the previous one, but for Wikidata changes.
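
When triaging log entries like these, it can help to split the X-Triggered-By value into its chain of events. The sketch below is an assumption-laden illustration: the format (comma-separated entries, with the first `:` separating the event name from its target) is inferred only from the examples above, and `parse_triggered_by` is a hypothetical helper, not part of RESTBase or change-propagation.

```python
from typing import List, Tuple


def parse_triggered_by(header: str) -> List[Tuple[str, str]]:
    """Split an X-Triggered-By value into (event, target) pairs.

    Assumes entries are comma-separated and that the first ':' in each
    entry separates the event name from its target. URLs contain further
    colons, so only the first one is used as the separator.
    """
    chain = []
    for entry in header.split(","):
        event, _, target = entry.strip().partition(":")
        chain.append((event, target))
    return chain


header = (
    "req:23c63446-3640-42fd-9742-28817015608a,"
    "mediawiki.revision-create:https://fr.wikipedia.org/wiki/Liste_de_films_LGBT,"
    "resource_change:http://fr.wikipedia.org/api/rest_v1/page/html/Liste_de_films_LGBT"
)
for event, target in parse_triggered_by(header):
    print(event, "->", target)
```

A target that contains a literal comma would be split incorrectly by this sketch; none of the examples above do.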

This is not really avoidable in our architecture, but I'd move it to the icebox to serve as a constant reminder.

Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:33 AM
Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:36 AM