Rerun populateParentId from some IDs (English Wikipedia)
Closed, Resolved. Public.

Description

Per my rather long and rambling commentary on bug #34922, the remaining NULLs in the WMF revision table are really beginning to cause problems.

It seems that the script simply didn't work properly the first time, since the NULLs peter out on the morning of 8 April 2008, the same day the script was committed to SVN (r32937).

For reasons not yet explained, the NULLs only start appearing around 01:30 UTC on 4 October 2007.

Some revisions in that range have a rev_parent_id, but not very many.

Therefore I would suggest that the script be run again to cover this period (give or take, revision IDs 162147000 to 204172300).


Version: unspecified
Severity: normal

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz34981.
Jarry1250 created this task. Via Legacy, Mar 5 2012, 12:53 AM
Jarry1250 added a comment. Via Conduit, Mar 5 2012, 12:54 AM

Adding as a blocker for bug #34922, since it seems to be required for a full fix.

Jarry1250 added a comment. Via Conduit, Mar 5 2012, 12:56 AM

And removing it again since, as Bawolff rightly says, MediaWiki is fixed; it's Wikimedia at fault.

Jarry1250 added a comment. Via Conduit, Mar 5 2012, 2:08 PM

(In reply to comment #0)

> Therefore I would suggest that the script be run again to cover this period
> (give or take, revision IDs 162147000 to 204172300).

Eurgh, those are en.wp revids of course, I'll have to establish whether this appears on other wikis at a later date. I suspect it probably does.

Jarry1250 added a comment. Via Conduit, Mar 6 2012, 1:57 PM

(In reply to comment #3)

> Eurgh, those are en.wp revids of course, I'll have to establish whether this
> appears on other wikis at a later date. I suspect it probably does.

Nope, doesn't appear to (I just tested Commons, de.wp and fr.wp). So just the English Wikipedia then.

Krinkle added a comment. Via Conduit, Mar 12 2012, 1:30 AM

Created attachment 10218
Screenshot of the issue

The attached screenshot, taken from [1], shows that edits before 14:27, 2 November 2007 and after 11:53, 27 March 2008 up until today have their sizes calculated properly. The ones in the middle don't, and fall back to the revision's total size (no color and no +/- sign). It's not a wrongly calculated difference; it just shows the total size of the page at that time.

[1] https://en.wikipedia.org/w/index.php?title=Special:Contributions/Dpmuk&dir=prev

Bawolff added a comment. Via Conduit, Mar 12 2012, 1:35 AM

(In reply to comment #5)

> Created attachment 10218 [details]
> Screenshot of the issue
>
> The attached screenshot taken from [1] shows that edits before 14:27, 2
> November 2007 and after 11:53, 27 March 2008 and up until today have the sizes
> calculated properly. The ones in the middle don't and fallback to a revision
> total size (no color, and no +/- sign. It's not a wrongly calculated difference
> size, it just shows the total size of the page at that time).
>
> [1]
> https://en.wikipedia.org/w/index.php?title=Special:Contributions/Dpmuk&dir=prev

Note, in case you're commenting on or complaining about the fallback behaviour: showing just "the total size of the page at that time" was introduced by me in r112995 and is discussed further at bug 34922. The choice was between doing that and showing nothing at all. I'm not sure which is better.
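
The display rule described in this comment can be sketched as follows. This is only an illustration in Python, not MediaWiki's actual code: the function name `format_size_change`, its parameters, and the exact output strings are assumptions made for the example.

```python
from typing import Optional


def format_size_change(rev_len: int, parent_len: Optional[int]) -> str:
    """Return the size text for one row of a contributions list.

    parent_len is None when the parent revision is unknown
    (for example, because rev_parent_id is NULL).
    """
    if parent_len is None:
        # Fallback: no sign, just the total page size at that revision.
        # The "bytes" suffix is an assumption for this sketch.
        return f"{rev_len} bytes"
    diff = rev_len - parent_len
    if diff == 0:
        return "(0)"
    # The "+" format flag prints an explicit sign for both directions.
    return f"({diff:+d})"


# Hypothetical page lengths chosen so the difference is 1646 bytes.
with_parent = format_size_change(5000, 3354)
without_parent = format_size_change(5000, None)
```

With a known parent the row shows a signed difference; without one it shows an unsigned total, which is how the affected rows can be told apart at a glance.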

bzimport added a comment. Via Conduit, Mar 13 2012, 3:38 AM

daniel_money wrote:

No, I wasn't commenting about the fallback behaviour - that makes sense and is quite obvious, as the text isn't bold, isn't coloured and doesn't include a + or -. At the time of filing the bug, however, it was showing as a diff, i.e. the edit at 14:27, 2 November 2007 was showing (+1646) in green and bold. Presumably something fixed in bug #34922 solved that problem.

Jarry1250 added a comment. Via Conduit, Mar 13 2012, 8:31 AM

(In reply to comment #7)

> At the time of filing the bug however it was showing as a diff, i.e. the
> edit at 14:27, 2 November 2007 was showing (+1646) in green and bold.
> Presumably something fixed in bug #34922 solved that problem.

Well, Bawolff put in a fix that isolated those revisions giving bad diff values, and replaced them with page-sizes, hence the resolution of that bug. Of course, we'd actually quite like them as diff values, hence this bug.

Reedy added a comment. Via Conduit, Apr 17 2012, 3:47 PM

I've hacked up the script to work between the start and end you suggested, also adding in a condition of where rev_parent_id = null (I think I'll put that into vcs) to further reduce the number of rows read to be checked and updated

Running in a screen session as me on fenari
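
A minimal sketch of that kind of restricted run, for illustration only. It is not the populateParentId script: it is written in Python against an in-memory SQLite table with a simplified three-column `revision` schema, the function name `populate_parent_ids` is hypothetical, and it assumes that the parent is the previous revision of the same page by rev_id and that a first revision gets 0. The batch size of 200 matches the ranges printed in the script's progress lines. Note that in SQL the test has to be written `IS NULL`; a comparison with `= NULL` never matches.

```python
import sqlite3


def populate_parent_ids(conn, start_id, end_id, batch_size=200):
    """Fill NULL rev_parent_id values for rev_id in [start_id, end_id].

    Returns the number of rows changed.
    """
    changed = 0
    for low in range(start_id, end_id + 1, batch_size):
        high = min(low + batch_size - 1, end_id)
        # Only read rows that still need a parent (the extra condition).
        rows = conn.execute(
            "SELECT rev_id, rev_page FROM revision "
            "WHERE rev_id BETWEEN ? AND ? AND rev_parent_id IS NULL",
            (low, high),
        ).fetchall()
        for rev_id, rev_page in rows:
            # Assumption: parent = latest earlier revision of the same page.
            prev = conn.execute(
                "SELECT MAX(rev_id) FROM revision "
                "WHERE rev_page = ? AND rev_id < ?",
                (rev_page, rev_id),
            ).fetchone()[0]
            parent = prev if prev is not None else 0
            conn.execute(
                "UPDATE revision SET rev_parent_id = ? WHERE rev_id = ?",
                (parent, rev_id),
            )
            changed += 1
        conn.commit()  # one commit per batch
    return changed


# Tiny made-up data set: (rev_id, rev_page, rev_parent_id).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 10, None), (3, 11, None), (4, 10, None), (5, 11, 3)],
)
changed = populate_parent_ids(conn, 2, 4, batch_size=2)
result = conn.execute(
    "SELECT rev_id, rev_parent_id FROM revision ORDER BY rev_id"
).fetchall()
```

Filtering on `rev_parent_id IS NULL` means rows that already have a parent are never fetched or rewritten, which is the saving the extra condition is meant to give.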

Jarry1250 added a comment. Via Conduit, Apr 18 2012, 1:17 PM

(In reply to comment #9)

> I've hacked up the script to work between the start and end you suggested, also
> adding in a condition of where rev_parent_id = null (I think I'll put that into
> vcs) to further reduce the number of rows read to be checked and updated
>
> Running in a screen session as me on fenari

Any luck with this? Or does the script need fixing in some other way?

Reedy added a comment. Via Conduit, Apr 18 2012, 1:20 PM

(In reply to comment #10)

> (In reply to comment #9)
> > I've hacked up the script to work between the start and end you suggested, also
> > adding in a condition of where rev_parent_id = null (I think I'll put that into
> > vcs) to further reduce the number of rows read to be checked and updated
> >
> > Running in a screen session as me on fenari
>
> Any luck with this? Or does the script need fixing in some other way?

...doing rev_id from 169280200 to 169280399

It's not going to be quick ;)

Jarry1250 added a comment. Via Conduit, Apr 18 2012, 1:30 PM

...Ah.

So that's ~23% done then in ~22 hours. On that basis, give or take, it's going to take another 3 days, which doesn't sound unreasonable.

Good good :)
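
For reference, this kind of estimate is a straight linear extrapolation, sketched below in Python. The function name `estimate_progress` is hypothetical, and the inputs are assumptions: the endpoints are the revision IDs suggested in the task description, the current ID is taken from the progress line above, and the rows needing work are assumed to be evenly spread across the range. With those particular inputs the fraction comes out at about 17% rather than 23%, so the result is sensitive to where the run actually started.

```python
def estimate_progress(start_id, end_id, current_id, elapsed_hours):
    """Linear extrapolation: return (fraction_done, remaining_hours)."""
    fraction_done = (current_id - start_id) / (end_id - start_id)
    remaining_hours = elapsed_hours * (1 - fraction_done) / fraction_done
    return fraction_done, remaining_hours


# Endpoints from the task description; current ID from the progress line.
fraction, remaining = estimate_progress(162147000, 204172300, 169280200, 22)
print(f"{fraction:.1%} done, about {remaining / 24:.1f} days left")
```

The later progress lines can be fed through the same function to see how the estimate shifts as the run advances.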

Reedy added a comment. Via Conduit, Apr 19 2012, 11:55 AM

...doing rev_id from 176812800 to 176812999

Reedy added a comment. Via Conduit, Apr 20 2012, 10:23 AM

...doing rev_id from 185014000 to 185014199

Jarry1250 added a comment. Via Conduit, Apr 20 2012, 6:37 PM

Another two days then, give or take.

Jarry1250 added a comment. Via Conduit, Apr 22 2012, 3:42 PM

Seems finished, from what I can tell?

Reedy added a comment. Via Conduit, Apr 23 2012, 6:16 PM

rev_parent_id population complete ... 37262590 rows [34817563 changed]

Yup, seems to be
