
Flow data missing on Wikimedia production wikis
Closed, Resolved · Public

Description

On Flow boards, content from before roughly February is missing. The rendering is intermittent, probably depending on which cache is hit.


Caused by T95869 ("Fix RevisionStorage::update()": updates were writing to rev_content, due to a wrong External Store call in update()) when running T90443 ("S14. Run FlowUpdateRevisionContentLength.php on prod wikis after it is deployed."). Also, it should never be possible to store null data in External Store (see https://gerrit.wikimedia.org/r/#/c/203479/).
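The "never store null data" rule mentioned in the description can be illustrated with a minimal sketch. This is Python written for illustration only, not the Flow (PHP) code from the Gerrit change; `store_content`, `NullContentError`, and the dict used as a backend are hypothetical names standing in for an External Store write.

```python
class NullContentError(ValueError):
    """Raised when a caller tries to store missing (None) content."""


def store_content(backend: dict, key: str, content) -> str:
    """Write content to a key-value store, refusing None.

    `backend` is a plain dict standing in for External Store (assumption).
    An empty string is still accepted; only None is rejected.
    """
    if content is None:
        # Fail loudly instead of silently persisting a null blob.
        raise NullContentError(f"refusing to store null content for {key!r}")
    backend[key] = content
    return key
```

The design choice in this sketch is to raise rather than skip the write, so that a faulty caller is noticed at the point of failure instead of after the data is gone.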

Event Timeline

Mattflaschen-WMF updated the task description.
Mattflaschen-WMF raised the priority of this task from to Needs Triage.
Restricted Application added a project: Collaboration-Team-Triage. Apr 9 2015, 5:15 PM
Restricted Application added a subscriber: Aklapper.
EBernhardson triaged this task as Unbreak Now! priority. Apr 9 2015, 5:46 PM
EBernhardson added a subscriber: EBernhardson.

Change 203479 had a related patch set uploaded (by EBernhardson):
Ensure we do not provide null data to insert in ES

https://gerrit.wikimedia.org/r/203479

These messages were sent to all affected pages (not including testwikis):

Hi all. Investigation is ongoing into a data-loss that was reported yesterday (phab:95580), which seems to have been caused by a maintenance script updating the database. This affects all topic titles and post contents on this board prior to 11 February 2015. The Operations team is currently assisting with data-recovery from backups. We'll post more information here when we have it. We apologize for not having full information for you right now. Post here if you have any questions; we'll keep this Topic updated when we know more.

Update: The developers have a plan for recovery. They're going to talk to a few more members of the Operations team, to confirm the exact details, and various options, before proceeding. That is estimated to be Monday at this point, due to various people being away for the weekend. For the current discussions, please continue as normal! I'll update this topic again, when we have more information.

Mattflaschen-WMF renamed this task from Data intermittently not rendered on officewiki to Data missing on Wikimedia production wikis. Apr 12 2015, 4:46 AM
Mattflaschen-WMF set Security to None.

Change 203771 had a related patch set uploaded (by EBernhardson):
[WIP] Locate content for revisions with null ES data

https://gerrit.wikimedia.org/r/203771

Mattflaschen-WMF updated the task description.
Legoktm renamed this task from Data missing on Wikimedia production wikis to Flow data missing on Wikimedia production wikis. Apr 13 2015, 5:32 PM

Change 203479 merged by jenkins-bot:
Ensure we do not provide null data to insert in ES

https://gerrit.wikimedia.org/r/203479

Change 203771 merged by jenkins-bot:
Locate content for revisions with null ES data

https://gerrit.wikimedia.org/r/203771

Final stats after first run of fixing things: P521

EBernhardson added a comment. Edited Apr 15 2015, 5:04 AM

This leaves us with 59 unresolved revisions across all wikis. I'm fairly certain I can manually resolve 15 of those, which will bring it down to 44. There are 11 more that resolve to their parent content, but the parent still isn't available. This means there are 33 revisions in total for which we still have no idea what the content is, and no hints yet for tracking them down.

Some of these 33 are because they are recorded as having a length of '0'. We should double check if the code allowed creating 0 length revisions at the time those were created. Some of the 33 are off by 1 byte from found content from the right timespan. It might be worthwhile to look at that content and see if it makes sense in the missing position.
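The off-by-1-byte check suggested above could be sketched along these lines. This is a hypothetical Python helper, not part of the Flow recovery scripts; it assumes the recorded length is a UTF-8 byte count, and the name `candidate_matches` and its parameters are made up for illustration.

```python
def candidate_matches(recorded_length, found_contents, tolerance=1):
    """Return (text, delta) pairs for recovered texts whose UTF-8 byte
    length is within `tolerance` bytes of the recorded length.

    `delta` is the found length minus the recorded length, so 0 is an
    exact match and +1 or -1 is an off-by-one candidate.
    """
    matches = []
    for text in found_contents:
        delta = len(text.encode("utf-8")) - recorded_length
        if abs(delta) <= tolerance:
            matches.append((text, delta))
    return matches
```

A length match only narrows the candidates; each one would still need to be read to see whether it makes sense in the missing position.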

DannyH closed this task as Resolved. Apr 16 2015, 6:23 PM
DannyH added a subscriber: DannyH.

This has been resolved; thanks to Erik and Matt for the hard work of getting all the words back where they belong.

For the few remaining unresolved revisions, it looks like they're either "0" content length, or are test posts on test pages. At this point, we don't need to go above and beyond to unduplicate a handful of posts that say "Teste 2" and "Teste 3". This is good; we're done.

As a sanity check, in betalabs
[enwiki]> select distinct(rev_mod_state) from flow_revision where rev_content_length=0;

rev_mod_state
delete
hide
suppress

4 rows in set (0.25 sec)

That's kind of hard to read. It's:

mysql> select distinct(rev_mod_state) from flow_revision where rev_content_length=0;
+---------------+
| rev_mod_state |
+---------------+
| delete        |
| hide          |
| suppress      |
|               |
+---------------+
4 rows in set (0.14 sec)

So not all of them are moderated.

I think these are all empty headers or null edits.

Except there are 5 I can't immediately explain:

mysql> select COUNT(rev_change_type), rev_change_type from flow_revision where rev_content_length=0 AND rev_mod_state NOT IN ('delete', 'hide', 'suppress') AND rev_change_type NOT IN ('create-header', 'edit-header', 'edit-post', 'create-topic-summary', 'edit-topic-summary') GROUP by rev_change_type;
+------------------------+-----------------+
| COUNT(rev_change_type) | rev_change_type |
+------------------------+-----------------+
|                      5 | reply           |
+------------------------+-----------------+
1 row in set (0.23 sec)

But this is Beta, and there is no reason to think this bug has recurred.
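For anyone wanting to try the filter above without access to Beta, here is a small sketch that runs the same query against an in-memory SQLite table. The three-column schema is a simplified assumption (the real flow_revision table has more columns), and any rows passed in are made-up test data, not values from this task.

```python
import sqlite3

# Same filter as the mysql query above, reformatted for readability.
QUERY = """
SELECT COUNT(rev_change_type), rev_change_type
FROM flow_revision
WHERE rev_content_length = 0
  AND rev_mod_state NOT IN ('delete', 'hide', 'suppress')
  AND rev_change_type NOT IN ('create-header', 'edit-header', 'edit-post',
                              'create-topic-summary', 'edit-topic-summary')
GROUP BY rev_change_type
"""


def unexplained_empty_revisions(rows):
    """Load (rev_change_type, rev_mod_state, rev_content_length) tuples
    into a throwaway table and return the query result as a list."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow_revision ("
        "rev_change_type TEXT, rev_mod_state TEXT, rev_content_length INTEGER)"
    )
    conn.executemany("INSERT INTO flow_revision VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Unmoderated rows are represented here with an empty string in rev_mod_state, matching the blank fourth row in the output earlier in the thread; a NULL value would behave differently under NOT IN.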

DannyH removed a subscriber: DannyH. Mar 24 2016, 3:54 AM