Sudden reversion to old version of page ("lastrevid" != "revid")
Closed, Resolved, Public

Description

Using Chrome on Windows 7, I noticed twice today that ANI on English Wikipedia was not displaying current threads but threads three days old. Refreshing did not make a difference, although occasionally it would display the current page.

The issue evidently struck others as well, as a user who tried to add a new section wound up adding it to the old material instead of the current page: https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595313330&oldid=595312917

This seems to have happened again, here:
https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595314104&oldid=595313971

On the English Wikipedia admins' IRC channel, an admin noted that every time he tried to load the page, it showed a post from 21:53, 10 February 2014 (UTC) at the very bottom.

When I communicated with an editor about the issue here - https://en.wikipedia.org/w/index.php?title=User_talk:NE_Ent&oldid=595314919 - I noticed that another editor had been impacted in the section above.


Version: wmf-deployment
Severity: major

bzimport set Reference to bz61319.
Mdennis-WMF created this task. Via Legacy, Feb 13 2014, 4:26 PM

Aklapper added a comment. Via Conduit, Feb 13 2014, 7:10 PM

(In reply to Maggie Dennis from comment #0)
> On Chrome Windows 7, I noted twice today that ANI on English Wikipedia was
> not displaying current threads, but was displaying threads three days old.
> Refreshing did not make a difference, although occasionally it would display
> the current page.

Did somebody try purging ([[WP:Purge]])?

Is this really a "reversion" in the sense of reverting changes, or maybe just an old version being delivered and displayed for some people (caching issues)?

Mdennis-WMF added a comment. Via Conduit, Feb 13 2014, 9:14 PM

Oh, yes. People have tried purging repeatedly. The old version is delivered and displayed erratically - sometimes I am seeing the current version and other times the old. If you look at the history, you can see that it is still happening - https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&action=history

There is also discussion at Village Pump/Technical, although I don't know if it will help:
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Discussions_disappearing_and_reappearing

Coren speculated earlier that it might be related to "new section", and when you look at the history of ANI there does seem to be something to that.

However, when it happened to me just a few minutes ago (https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595354506&oldid=595351438) I had intended to edit the last section only. I can't be sure I did, because I wiped out the pre-built edit summary. But I was aware that the problem might be related to this and intended to avoid it, anyway.

Ciencia_Al_Poder added a comment. Via Conduit, Feb 13 2014, 9:30 PM

An hour ago there was a problem with the esams cluster, similar to what happened in bug 54647 and was then tracked in bug 56545, so the problem seems to be the same: the cluster fails and we get cached pages, which is better than having no pages at all.

Anomie added a comment. Via Conduit, Feb 14 2014, 1:01 AM

This is not good.

API query: https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvlimit=1&format=jsonfm&pageids=2535910&servedby=1

{
    "query-continue": {
        "revisions": {
            "rvcontinue": 595381322
        }
    },
    "servedby": "mw1192",
    "query": {
        "pages": {
            "2535910": {
                "pageid": 2535910,
                "ns": 4,
                "title": "Wikipedia:Reference desk/Science",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2014-02-14T00:41:03Z",
                "lastrevid": 595381347,
                "counter": "",
                "length": 112194,
                "revisions": [
                    {
                        "revid": 595381347,
                        "parentid": 595381322,
                        "minor": "",
                        "user": "SineBot",
                        "timestamp": "2014-02-14T00:41:03Z",
                        "comment": "Signing comment by [[Special:Contributions/68.41.73.11|68.41.73.11]] - \"/* Freezing point? */ new section\""
                    }
                ]
            }
        }
    }
}

The "lastrevid" field and the "revid" field in revisions should be the same. I suspect that some of the slave DBs are somehow screwed up and haven't gotten the page_latest field updated.

Anomie added a comment. Via Conduit, Feb 14 2014, 1:02 AM

Oops, pasted the wrong copy.

{
    "query-continue": {
        "revisions": {
            "rvcontinue": 595381322
        }
    },
    "servedby": "mw1205",
    "query": {
        "pages": {
            "2535910": {
                "pageid": 2535910,
                "ns": 4,
                "title": "Wikipedia:Reference desk/Science",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2014-02-14T00:41:03Z",
                "lastrevid": 594888322,
                "counter": "",
                "length": 80791,
                "revisions": [
                    {
                        "revid": 595381347,
                        "parentid": 595381322,
                        "minor": "",
                        "user": "SineBot",
                        "timestamp": "2014-02-14T00:41:03Z",
                        "comment": "Signing comment by [[Special:Contributions/68.41.73.11|68.41.73.11]] - \"/* Freezing point? */ new section\""
                    }
                ]
            }
        }
    }
}
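
The consistency check described above (the page's "lastrevid" should equal the "revid" of the newest revision returned) can be sketched in Python. This is only an illustration that works on an already-parsed API response; the function name `find_stale_pages` is hypothetical, and the sample data is a trimmed copy of the mw1205 response pasted above.

```python
def find_stale_pages(response):
    """Return (pageid, lastrevid, newest_revid) for pages where the two disagree.

    Assumes the response was requested with prop=info|revisions and rvlimit=1,
    so revisions[0] is the newest revision of each page.
    """
    mismatches = []
    pages = response.get("query", {}).get("pages", {})
    for pageid, page in pages.items():
        revisions = page.get("revisions") or []
        if not revisions:
            continue  # nothing to compare against
        newest_revid = revisions[0]["revid"]
        if page.get("lastrevid") != newest_revid:
            mismatches.append((pageid, page.get("lastrevid"), newest_revid))
    return mismatches


# Trimmed copy of the response served by mw1205 (values taken from the paste above).
response = {
    "servedby": "mw1205",
    "query": {
        "pages": {
            "2535910": {
                "pageid": 2535910,
                "lastrevid": 594888322,
                "revisions": [{"revid": 595381347, "parentid": 595381322}],
            }
        }
    },
}

print(find_stale_pages(response))
```

An empty list would mean the two fields agree for every page in the response; any entry points at a server or replica that returned an outdated page_latest value.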

Anomie added a comment. Via Conduit, Feb 14 2014, 2:16 AM

More data:

anomie@terbium:/usr/local/apache/common-local$ for db in 'db1055' 'db1043' 'db1037' 'db1049' 'db1051' 'db1056'; do echo $db; echo -e 'select page_latest from page where page_id=2535910;' | mwscript sql.php --wiki=enwiki --slave=$db; done
db1055
stdClass Object
(
    [page_latest] => 595381347
)
db1043
stdClass Object
(
    [page_latest] => 595381347
)
db1037
stdClass Object
(
    [page_latest] => 595381347
)
db1049
stdClass Object
(
    [page_latest] => 595381347
)
db1051
stdClass Object
(
    [page_latest] => 595381347
)
db1056
stdClass Object
(
    [page_latest] => 594888322
)

So db1056 seems out of sync somehow.
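
The comparison above can also be done mechanically once the per-host values have been collected. The sketch below is an assumption-laden illustration, not the tooling used here: it treats the value reported by most hosts as the expected one (a heuristic; the master's value is the real authority), and the function name `find_out_of_sync` is hypothetical.

```python
from collections import Counter


def find_out_of_sync(latest_by_host):
    """Return the hosts whose page_latest differs from the most common value.

    Majority vote is only a heuristic: it assumes most replicas are correct.
    """
    counts = Counter(latest_by_host.values())
    majority_value, _ = counts.most_common(1)[0]
    return {
        host: value
        for host, value in latest_by_host.items()
        if value != majority_value
    }


# page_latest for page_id=2535910 as reported by each host in the output above.
page_latest = {
    "db1055": 595381347,
    "db1043": 595381347,
    "db1037": 595381347,
    "db1049": 595381347,
    "db1051": 595381347,
    "db1056": 594888322,
}

print(find_out_of_sync(page_latest))  # {'db1056': 594888322}
```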

gerritbot added a comment. Via Conduit, Feb 14 2014, 2:32 AM

Change 113322 had a related patch set uploaded by Springle:
depol db1056 for pt-table-sync checks bug 61319

https://gerrit.wikimedia.org/r/113322

gerritbot added a comment. Via Conduit, Feb 14 2014, 2:33 AM

Change 113322 merged by jenkins-bot:
depol db1056 for pt-table-sync checks bug 61319

https://gerrit.wikimedia.org/r/113322

Springle added a comment. Via Conduit, Feb 14 2014, 2:41 AM

db1056 has been depooled for a sync check, and the remaining slaves will get the same treatment in rotation jic.

db1056 was demoted from master a couple weeks ago, backed up and then eventually rebuilt from another unpooled s1 slave, db1050. It's possible the original problem lies on that box.

Aklapper added a comment. Via Conduit, Feb 25 2014, 3:47 PM

Sean: Anything left to do / investigate here or can this be closed as FIXED?

Aklapper added a comment. Via Conduit, Mar 20 2014, 11:47 AM

Sean: Anything left to do / investigate here or can this be closed as FIXED?
