
Lots of rows are missing from enwiki_p.`revision`
Closed, Declined (Public)

Description

I discovered this while looking at P2180. These revisions actually do exist in the revision table. See also T115081.

Event Timeline

Glaisher raised the priority of this task from to Needs Triage.
Glaisher updated the task description. (Show Details)
Glaisher added projects: Cloud-Services, DBA.
Glaisher added a subscriber: Glaisher.

The query in P2180 returns 1185 rows in labs but only 16 in production.
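(P2180 is a paste, so the exact query is not reproduced here. As a rough illustration of the kind of check involved, one could compare a set of known rev_ids on each host; the table and column names below are real MediaWiki schema, but the specific IDs are placeholders:)

```sql
-- Hypothetical sketch: run the same lookup on production and on the
-- labs replica (enwiki_p). The rev_id values are placeholders, not
-- the actual missing revisions from P2180.
SELECT rev_id
FROM revision
WHERE rev_id IN (12345678, 12345679, 12345680);
-- Rows returned on production but absent on labs are the missing ones.
```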

Can you verify that

a) Those missing rows are not deleted on purpose due to private data, at the two places where they are filtered (sanitarium and labs)

b) They are missing from all lab hosts (labsdb100[123])

It is OK if you cannot or do not know how; I am only asking in case you want to speed things up.

Can you verify that

a) Those missing rows are not deleted on purpose due to private data, at the two places where they are filtered (sanitarium and labs)

I believe that the revision table has individual fields nulled out rather than entire rows removed, unlike revision_userindex and similar views.
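(For context, the two sanitization styles work roughly like this; a simplified sketch, not the actual labs view definitions, which differ in detail:)

```sql
-- Style used by the revision view: keep every row, but null out
-- fields whose rev_deleted bit is set (4 = user suppressed).
CREATE OR REPLACE VIEW revision_sketch AS
SELECT rev_id,
       IF(rev_deleted & 4, NULL, rev_user)      AS rev_user,
       IF(rev_deleted & 4, NULL, rev_user_text) AS rev_user_text
FROM revision;

-- Style used by revision_userindex: drop suppressed rows entirely,
-- so that the user indexes remain safe to expose and use.
CREATE OR REPLACE VIEW revision_userindex_sketch AS
SELECT rev_id, rev_user, rev_user_text
FROM revision
WHERE rev_deleted & 4 = 0;
```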

b) They are missing from all lab hosts (labsdb100[123])

Finding out.

I left the query running on each labsdb host overnight and got these results:
labsdb100[12]: 1107 rows
labsdb1003: 1929 rows

Oh.

Perhaps T118095 is a duplicate of this task?

Despite being a mixture of old and recent pages, all missing edits seem to be from around the same dates.
This points to a missing range of transactions. I will identify the exact range and backfill those revisions to sanitarium first, and then to labs.
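(A minimal sketch of that identification step, assuming the missing rev_ids have first been collected into a scratch table; missing_revs is a hypothetical name:)

```sql
-- missing_revs: hypothetical scratch table holding the rev_ids found
-- to be absent on labs. Find the timestamp range they span on
-- production, to locate the lost range of transactions.
SELECT MIN(rev_timestamp) AS first_missing,
       MAX(rev_timestamp) AS last_missing
FROM revision
JOIN missing_revs USING (rev_id);
```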

chasemp added a subscriber: chasemp.
jcrespo moved this task from Backlog to In progress on the DBA board.

Is this task still in progress?

Is this task still in progress?

Yes. Unsurprisingly, it takes a lot of time to perform a full reimport. :-( I tried faster approaches, but they either didn't work or created worse issues; long and painful it must be.

I have given up on trying to resync production to the current labsdbs. It is almost impossible while they are in use. The soon-to-be-set-up servers won't have that problem:

  • They use row-based replication, so they should not get desynced (see the snippet after this list)
  • They are behind a proxy, so they can be depooled for maintenance at any time without affecting users
  • They use InnoDB, so they can be copied from production much more quickly
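(For reference, the replication mode in MariaDB/MySQL is controlled by binlog_format; with row-based replication the slave applies the actual row images instead of re-executing statements, which makes silent drift much less likely:)

```sql
-- Check the current replication format; normally this is set
-- permanently in my.cnf, but it can also be changed at runtime
-- (requires SUPER privilege).
SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';
```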

Enwiki is already loaded on the new hardware, which is 5x faster, and it can be tested (ask for access if you need to test it sooner). It is still in beta and not documented (so not yet announced officially), but that will be the way to fix this.

Closing as declined because I will not fix the current slaves; users will get everything fixed on the newer, faster hardware.