Paste P6486

Chat Log

Authored by daniel on Dec 19 2017, 4:21 PM.
[16:35] <RoanKattouw> addshore: DanielK_WMDE_: (Moving here because both -tech and -dev are noisy) During my walk I figured out WHY the "revision does not exist" message happened. It was related to ChronologyProtector, but not in the way we thought: it happened BECAUSE it was doing its job
[16:36] <addshore> :D
[16:36] <RoanKattouw> For newly created pages/revisions, the text table rows were written to the replica (and so weren't on the master), but the revision table rows were written to the master (and replicated to the replica)
[16:36] <RoanKattouw> When you first save the page, the replica hasn't caught up yet, so CP ensures that your next page view reads from the master (which never happens otherwise), because it's the only up-to-date server
[16:37] <RoanKattouw> The master doesn't have the text row, so trying to get the text fails
[16:37] <RoanKattouw> Then when you refresh, the replica has caught up, so your page view reads from the replica, and it has both rows so it works fine
[16:37] <addshore> aaah, and then the second refresh reads from the replica
[16:37] <addshore> Sounds like a thoughtful walk :)
[16:37] <RoanKattouw> The real problems began when an AbuseFilter rule was hit, because AF was still writing to the master
[16:38] <RoanKattouw> So the master assigns that text row an old_id which it thinks is the next available old_id, but the replica has already used that ID for something else
[16:38] <addshore> and then that is the point the replication exploded
[16:38] <RoanKattouw> Then when the replica tries to replicate that insertion, it fails because of an ID collision, and replication stops
[16:39] <RoanKattouw> Leaving both the replica and the master in a broken state: the master has revision rows pointing to old_ids that don't exist, or if they do, point to AbuseFilter data
[16:39] <addshore> RoanKattouw: so wikidatawiki on beta is also broken
[16:40] <RoanKattouw> And the replica has one AbuseFilter log entry that points to an old_id that points to revision text (and no more, because replication stops at this point)
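
(Editor's sketch) The collision described above can be reproduced with a toy model. This is a minimal simulation, not MediaWiki or MySQL code: the `Table` class, its methods, and the sample payloads are all hypothetical, and it assumes old_id behaves like a simple "highest ID plus one" auto-increment key.

```python
class Table:
    """Toy stand-in for the text table on one server (hypothetical)."""

    def __init__(self):
        self.rows = {}  # old_id -> payload

    def insert(self, payload):
        # Local write: the server picks what it believes is the next free ID.
        new_id = max(self.rows, default=0) + 1
        self.rows[new_id] = payload
        return new_id

    def apply_replicated(self, row_id, payload):
        # Replicated write: the ID was chosen by the master and must be free.
        if row_id in self.rows:
            raise KeyError(f"duplicate key {row_id}")
        self.rows[row_id] = payload


def simulate():
    master_text, replica_text = Table(), Table()

    # One row that replicated normally before the bug.
    first_id = master_text.insert("old revision text")
    replica_text.apply_replicated(first_id, "old revision text")

    # Buggy path: new revision text is written straight to the replica.
    stray_id = replica_text.insert("new revision text")

    # AbuseFilter still writes to the master, which hands out the same ID.
    af_id = master_text.insert("AbuseFilter data")

    try:
        replica_text.apply_replicated(af_id, "AbuseFilter data")
        outcome = "replicated"
    except KeyError:
        outcome = "replication stopped"
    return stray_id, af_id, outcome
```

In this model both servers hand out the same ID for different content, so the replicated insert cannot be applied. That is the point at which replication stops in the log above.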
[16:40] <RoanKattouw> Yeah I can imagine
[16:40] <RoanKattouw> I'm just about to check the others
[16:40] <addshore>
[16:40] <addshore> well that ticket actually talks about enwiki
[16:40] <RoanKattouw> I fixed enwiki and deploymentwiki by transferring the text rows from the replica to the master and updating the references to them in the revision table for their new IDs
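
(Editor's sketch) The repair described above, modelled on plain dictionaries rather than real databases. The function name and data shapes are hypothetical, and the sketch assumes that any replica text row missing from the master, or differing from it, is a stray row whose revisions need repointing. The actual fix was presumably done in SQL, which the log does not show.

```python
def repair_master(master_text, replica_text, master_revisions):
    """Copy stray replica text rows to the master under fresh IDs.

    master_text / replica_text: dict mapping old_id -> text blob.
    master_revisions: dict mapping rev_id -> the old_id it references.
    Mutates the two master dicts and returns {stray old_id: new old_id}.
    """
    remap = {}
    for old_id in sorted(replica_text):
        blob = replica_text[old_id]
        if master_text.get(old_id) == blob:
            continue  # row exists identically on the master; nothing to do
        # Let the master allocate a fresh ID so nothing collides.
        new_id = max(master_text, default=0) + 1
        master_text[new_id] = blob
        remap[old_id] = new_id

    # Repoint revisions that referenced a stray ID to the transferred row.
    for rev_id, text_id in master_revisions.items():
        if text_id in remap:
            master_revisions[rev_id] = remap[text_id]
    return remap
```

Rows on the master that collide with a stray ID (the AbuseFilter data) are left in place; only the revision references move.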
[16:40] <addshore> you can probably write a query to find all wikis that have edits between now and the time the patch was first merged / landed on beta
[16:41] <RoanKattouw> I could do that but it's easier to just compare the text tables on the master and replica
[16:41] <RoanKattouw> If there are rows they disagree on, that means I need to fix things
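
(Editor's sketch) One way to express the master/replica comparison, assuming the text table of each server has already been loaded into a dictionary keyed by old_id. The function name and result keys are hypothetical; how the rows were actually fetched and compared is not shown in the log.

```python
def diff_text_tables(master, replica):
    """Report the old_ids on which the two servers disagree.

    master / replica: dict mapping old_id -> text blob.
    """
    shared = master.keys() & replica.keys()
    return {
        "only_on_master": sorted(master.keys() - replica.keys()),
        "only_on_replica": sorted(replica.keys() - master.keys()),
        "conflicting": sorted(k for k in shared if master[k] != replica[k]),
    }


def needs_fixing(master, replica):
    # Any disagreement at all means the wiki was affected.
    return any(diff_text_tables(master, replica).values())
```

A wiki with no edits during the affected window would produce three empty lists, so `needs_fixing` returns False for it.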
[16:42] <RoanKattouw> Going to start doing that with wikidatawiki now
[16:42] <addshore> well, sorry for this fallout, and thanks for helping!
[16:43] <RoanKattouw> No worries!
[16:43] <RoanKattouw> 4 layers failed here
[16:43] <RoanKattouw> (Author, reviewer, MW DB abstraction, read only flag on the DB server)
[16:43] <RoanKattouw> So I can hardly blame any individual one