Rerun populateParentId from some IDs (English Wikipedia)
Closed, Resolved (Public)

Description

Per my rather long and rambling commentary on bug #34922, the remaining NULLs in the WMF revision table are really beginning to cause problems.

It seems that the script simply didn't work properly the first time, since the NULLs peter out on the morning of 8 April 2008, the same day the script was committed to SVN (r32937).

For reasons not yet explained, the NULLs only start appearing around 01:30 UTC on 4 October 2007.

Some revisions in that range have 'rev_parent_id's, but not very many.

Therefore I would suggest that the script be run again to cover this period (give or take, revision IDs 162147000 to 204172300).
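
For reference, the extent of the gap could be checked with a query along the following lines. This is only a minimal sketch against a toy in-memory SQLite table, not the production MySQL schema: the column names `rev_id`, `rev_page` and `rev_parent_id` are those of the MediaWiki revision table, but the helper name `count_null_parents` and the sample rows are made up for illustration.

```python
import sqlite3


def count_null_parents(conn, start_id, end_id):
    """Count revisions in [start_id, end_id] whose rev_parent_id is NULL."""
    row = conn.execute(
        "SELECT COUNT(*) FROM revision "
        "WHERE rev_id BETWEEN ? AND ? AND rev_parent_id IS NULL",
        (start_id, end_id),
    ).fetchone()
    return row[0]


# Toy data (hypothetical rows, not real en.wp revisions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (162147000, 1, None),       # inside the range, parent missing
        (162147001, 1, None),       # inside the range, parent missing
        (162147002, 2, 0),          # inside the range, parent already set
        (204172301, 1, 162147001),  # just outside the range
    ],
)

null_count = count_null_parents(conn, 162147000, 204172300)
print(null_count)
```

Note that the comparison has to be `IS NULL`; in SQL, `rev_parent_id = NULL` never matches any row.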


Version: unspecified
Severity: normal

bzimport set Reference to bz34981.

Adding as a blocker for bug #34922, since it seems to be required for a full fix.

And removing again since, as Bawolff rightly says, MediaWiki is fixed; it's Wikimedia at fault.

(In reply to comment #0)

> Therefore I would suggest that the script be run again to cover this period
> (give or take, revision IDs 162147000 to 204172300).

Eurgh, those are en.wp revids of course. I'll have to establish at a later date whether this appears on other wikis. I suspect it probably does.

(In reply to comment #3)

> Eurgh, those are en.wp revids of course, I'll have to establish whether this
> appears on other wikis at a later date. I suspect it probably does.

Nope, doesn't appear to (I just tested Commons, de.wp and fr.wp). So just the English Wikipedia then.

Created attachment 10218
Screenshot of the issue

The attached screenshot, taken from [1], shows that edits before 14:27, 2 November 2007, and those from after 11:53, 27 March 2008 up until today, have their sizes calculated properly. The ones in the middle don't, and fall back to the revision's total size (no color and no +/- sign). This is not a wrongly calculated difference size; it just shows the total size of the page at that time.

[1] https://en.wikipedia.org/w/index.php?title=Special:Contributions/Dpmuk&dir=prev


(In reply to comment #5)

> Created attachment 10218 [details]
> Screenshot of the issue
>
> The attached screenshot taken from [1] shows that edits before 14:27, 2
> November 2007 and after 11:53, 27 March 2008 and up until today have the sizes
> calculated properly. The ones in the middle don't and fall back to a revision
> total size (no color, and no +/- sign. It's not a wrongly calculated difference
> size, it just shows the total size of the page at that time).
>
> [1]
> https://en.wikipedia.org/w/index.php?title=Special:Contributions/Dpmuk&dir=prev

Note: if you're commenting/complaining about the fallback behaviour, the fallback of just showing "the total size of the page at that time" was introduced by me in r112995 and is discussed further at bug 34922. It was a choice between doing that and showing nothing at all. I'm not sure which is better.
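
The fallback described here can be pictured roughly as follows. This is a hedged sketch in Python, not the MediaWiki code from r112995: the function name `format_size` and the exact output strings are assumptions made for illustration, and styling (bold, green/red) is left out.

```python
def format_size(rev_len, parent_len):
    """Return the size text for one entry in a contributions list.

    rev_len    -- size in bytes of the revision itself
    parent_len -- size in bytes of the parent revision, or None when the
                  parent is unknown (e.g. rev_parent_id is NULL)
    """
    if parent_len is None:
        # Fallback: no parent to diff against, so show the total page
        # size with no +/- sign.
        return f"({rev_len} bytes)"
    # Normal case: signed difference relative to the parent revision.
    return f"({rev_len - parent_len:+d})"


print(format_size(5000, 3354))  # parent known: signed difference
print(format_size(5000, None))  # parent unknown: total size only
```

The alternative mentioned above, showing nothing at all, would simply return an empty string in the `None` branch.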


daniel_money wrote:

No, I wasn't commenting about the fallback behaviour; that makes sense and is quite obvious, as the text isn't bold, isn't coloured and doesn't include a + or -. At the time of filing the bug, however, it was showing as a diff, i.e. the edit at 14:27, 2 November 2007 was showing (+1646) in green and bold. Presumably something fixed in bug #34922 solved that problem.

(In reply to comment #7)

> At the time of filing the bug however it was showing as a diff, i.e. the
> edit at 14:27, 2 November 2007 was showing (+1646) in green and bold.
> Presumably something fixed in bug #34922 solved that problem.

Well, Bawolff put in a fix that isolated those revisions giving bad diff values, and replaced them with page-sizes, hence the resolution of that bug. Of course, we'd actually quite like them as diff values, hence this bug.

Reedy added a comment. Apr 17 2012, 3:47 PM

I've hacked up the script to work between the start and end you suggested, also adding a condition of WHERE rev_parent_id IS NULL (I think I'll put that into VCS) to further reduce the number of rows read, checked and updated.

Running in a screen session as me on fenari
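
A rough sketch of what such a restricted run might look like, for illustration only. This is not the actual populateParentId script: it uses Python and a toy in-memory SQLite table, the function name `populate_parent_ids` and the sample rows are invented, the default batch size is arbitrary, and it assumes the parent is simply the highest earlier `rev_id` on the same page (the real script may choose the parent differently, for example by timestamp).

```python
import sqlite3


def populate_parent_ids(conn, start_id, end_id, batch_size=200):
    """Fill in NULL rev_parent_id values for rev_id in [start_id, end_id].

    Returns the number of rows updated.
    """
    changed = 0
    for lo in range(start_id, end_id + 1, batch_size):
        hi = min(lo + batch_size - 1, end_id)
        print(f"...doing rev_id from {lo} to {hi}")
        # Only read rows that still need fixing. The test must be
        # IS NULL; "= NULL" never matches in SQL.
        rows = conn.execute(
            "SELECT rev_id, rev_page FROM revision "
            "WHERE rev_id BETWEEN ? AND ? AND rev_parent_id IS NULL",
            (lo, hi),
        ).fetchall()
        for rev_id, rev_page in rows:
            # Assumption: parent = latest earlier revision of the same
            # page by rev_id, or 0 if this is the page's first revision.
            parent = conn.execute(
                "SELECT COALESCE(MAX(rev_id), 0) FROM revision "
                "WHERE rev_page = ? AND rev_id < ?",
                (rev_page, rev_id),
            ).fetchone()[0]
            conn.execute(
                "UPDATE revision SET rev_parent_id = ? WHERE rev_id = ?",
                (parent, rev_id),
            )
            changed += 1
        conn.commit()  # one commit per batch
    return changed


# Toy data (hypothetical): two pages, one row already populated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [(1, 10, None), (2, 20, None), (3, 10, None), (4, 10, 3), (5, 20, None)],
)

changed = populate_parent_ids(conn, 1, 5, batch_size=2)
parents = dict(conn.execute("SELECT rev_id, rev_parent_id FROM revision"))
print(changed, parents)
```

Restricting both the `rev_id` range and the NULL condition means rows that already have a parent are never touched, which is the point of the extra condition.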

(In reply to comment #9)

> I've hacked up the script to work between the start and end you suggested, also
> adding in a condition of where rev_parent_id IS NULL (I think I'll put that into
> vcs) to further reduce the number of rows read to be checked and updated
>
> Running in a screen session as me on fenari

Any luck with this? Or does the script need fixing in some other way?

Reedy added a comment. Apr 18 2012, 1:20 PM

(In reply to comment #10)

> (In reply to comment #9)
> > I've hacked up the script to work between the start and end you suggested, also
> > adding in a condition of where rev_parent_id IS NULL (I think I'll put that into
> > vcs) to further reduce the number of rows read to be checked and updated
> >
> > Running in a screen session as me on fenari
>
> Any luck with this? Or does the script need fixing in some other way?

...doing rev_id from 169280200 to 169280399

It's not going to be quick ;)

...Ah.

So that's ~23% done then in ~22 hours. On that basis, give or take, it's going to take another 3 days, which doesn't sound unreasonable.

Good good :)

...doing rev_id from 176812800 to 176812999

...doing rev_id from 185014000 to 185014199

Another two days then, give or take.

Seems finished, from what I can tell?

Reedy added a comment. Apr 23 2012, 6:16 PM

rev_parent_id population complete ... 37262590 rows [34817563 changed]

Yup, seems to be
