Raw webrequest partitions from 2014-09-23T18:xx:xx onwards not marked successful
Closed, Resolved (Public)

Description

From 2014-09-23T18:xx:xx onwards, no raw webrequest partitions were
marked successful [1].

What happened?

[1]


qchris@stat1002 jobs: 0 time: 00:48:38 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh

+---------------------+--------+--------+--------+--------+
| Date                |  bits  |  text  | mobile | upload |
+---------------------+--------+--------+--------+--------+

[...]

| 2014-09-23T18:xx:xx |    X   |    X   |    X   |    X   |    
| 2014-09-23T19:xx:xx |    X   |    X   |    X   |    X   |    
| 2014-09-23T20:xx:xx |    X   |    X   |    X   |    X   |    
| 2014-09-23T21:xx:xx |    X   |    X   |    X   |    X   |    
+---------------------+--------+--------+--------+--------+

Version: unspecified
Severity: normal

Details

Reference
bz71213

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:51 AM)
bzimport set Reference to bz71213.

Today's refinery deployment included the change

Ie557acff61b907e0a43c45f0ca82b5bf43a800d6

which adds a new mandatory parameter "mark_directory_done_workflow_file" to
"oozie/webrequest/partition/add".

It seems that this Oozie job was not resubmitted after the deployment.
It therefore kept running with the old properties file, which lacks a
setting for the newly added parameter.
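
As a rough illustration of that failure mode, here is a minimal sketch of a
pre-flight check that reports required parameters missing from a properties
file. It is not part of refinery or Oozie. The helper names and the sample
file contents are made up for illustration; only the parameter name
"mark_directory_done_workflow_file" comes from this report.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_parameters(props, required):
    """Return the required parameter names that are absent or empty."""
    return [name for name in required if not props.get(name)]


# Hypothetical contents of an old properties file. These keys and values
# are placeholders, not the real cluster configuration.
old_properties = """
# example values only
name_node = hdfs://example
queue_name = standard
"""

# Parameters the (hypothetical) new workflow definition expects.
required = ["name_node", "mark_directory_done_workflow_file"]

print(missing_parameters(parse_properties(old_properties), required))
```

Running such a check against the properties of an already submitted job
before deploying new workflow XML would flag the mismatch early.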

To avoid disturbing Oozie too much, I rolled back

/wmf/refinery/current/oozie/webrequest/partition/add

on the cluster to ebc92c1.

So the directory now contains XML files that work with the old properties file.

I started rerunning the affected jobs. The first few have already finished,
and the corresponding partitions are now marked successful.

The last jobs should finish in a few hours.
I will wait until then to close the bug.

The jobs reran just fine.
All affected webrequest partitions are now marked successful.

Pagecount generation automatically waited for the webrequest partitions to
be marked successful, and continued on its own once they were.

So we now have good data for each of the affected partitions/hours.