
ETA for jobs that run in parallel is completely wrong
Closed, Declined (Public)

Description

The ETA generated for page text dumps is based on the total number of revisions in the project rather than on the number of revisions we are actually going to retrieve for the chunk. That number is unknown: we don't know how many revisions are contained in the range of 2 million page IDs we might be getting in one chunk file, and we would be waiting a *very* long time for a select count(*) to complete. Fix this to give some reasonable estimate.
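To illustrate the size of the error, here is a minimal sketch of a rate-based ETA computed against two different totals. It is not the dump scripts' actual code: the function name `eta_remaining` and all of the numbers are hypothetical, and the per-chunk revision count is assumed to be known here purely for comparison, which, as noted above, it is not in practice.

```python
def eta_remaining(elapsed_seconds, revisions_done, revisions_expected):
    """Return a rate-based estimate of the seconds left.

    Assumes revisions are processed at a roughly constant rate, so the
    time left is (revisions left) * (seconds spent per revision so far).
    """
    if elapsed_seconds <= 0 or revisions_done <= 0:
        raise ValueError("need some elapsed time and processed revisions")
    remaining = max(revisions_expected - revisions_done, 0)
    return remaining * elapsed_seconds / revisions_done


# Hypothetical figures, chosen only to show the effect of the denominator.
elapsed = 3600              # one hour spent on this chunk so far
done = 500_000              # revisions written for this chunk so far
project_total = 40_000_000  # all revisions in the project (what the ETA uses)
chunk_total = 2_000_000     # revisions in this chunk (unknown in practice)

eta_project = eta_remaining(elapsed, done, project_total)
eta_chunk = eta_remaining(elapsed, done, chunk_total)

print(f"ETA against project total: {eta_project / 3600:.1f} hours")
print(f"ETA against chunk total:   {eta_chunk / 3600:.1f} hours")
```

With these made-up figures the same progress yields 79 hours against the project total but 3 hours against the chunk total, which is the kind of gap that makes the displayed ETA misleading for jobs that run in parallel.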


Version: unspecified
Severity: minor

Details

Reference
bz27115

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 11:23 PM)
bzimport set Reference to bz27115.

(In reply to comment #0)

The ETA generated for page text dumps is based on the total number of
revisions in the project rather than on the number of revisions we are actually going

Looking at http://dumps.wikimedia.org/backup-index.html I see that some processes don't even provide an ETA at all:

2013-01-29 17:37:42 cswiki: Dump in progress

2013-01-29 14:14:25 in-progress All pages with complete edit history (.7z)
    cswiki-20130129-pages-meta-history.xml.7z 1.0 GB (written)

Is that intended? And who is interested in seeing an ETA?

The recompression steps, for example, don't show an ETA because there is no simple way to make an estimate. The only steps with an ETA are those that walk through XML; they generate an estimate based on page and revision info.

People watch these to find out when their favorite file is going to complete. I've also gotten bug reports about the ETA being much longer than it should be.

Nemo_bis lowered the priority of this task from Medium to Low. (Apr 9 2015, 7:32 AM)
Nemo_bis set Security to None.
ArielGlenn closed this task as Declined. (Feb 20 2017, 9:18 PM)

I'm going to decline this in favor of work on the rewrite project. Jobs will be split up into small pieces and run out of order anyway, so the notion of an ETA will have to change.

ArielGlenn moved this task from Backlog to Done on the Dumps-Generation board. (Feb 20 2017, 9:21 PM)