
Raw webrequest partitions for 2014-10-30T21/1H not marked successful
Closed, Declined (Public)

Description

The bits and upload webrequest partitions [1] for 2014-10-30T21/1H have
not been marked successful.

What happened?

[1]


qchris@stat1002 jobs: 0 time: 08:12:06 // exit code: 130
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh

+------------------+--------+--------+--------+--------+
| Date             |  bits  | mobile |  text  | upload |
+------------------+--------+--------+--------+--------+

[...]

| 2014-10-30T19/1H |    .   |    .   |    .   |    .   |
| 2014-10-30T20/1H |    .   |    .   |    .   |    .   |
| 2014-10-30T21/1H |    X   |    .   |    .   |    X   |
| 2014-10-30T22/1H |    .   |    .   |    .   |    .   |
| 2014-10-30T23/1H |    .   |    .   |    .   |    .   |

[...]

+------------------+--------+--------+--------+--------+

Statuses:

. --> Partition is ok
M --> Partition manually marked ok
X --> Partition is not ok (duplicates, missing, or nulls)
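
The three failure modes in the legend (duplicates, missing, nulls) can be illustrated with a per-host sequence-number check. This is only a minimal sketch, assuming each record carries a hostname and a per-host, monotonically increasing sequence number. The names `sequence_stats` and `partition_ok` and the record layout are hypothetical; they are not taken from dump_webrequest_status.sh or from the cluster's actual checks.

```python
from collections import defaultdict


def sequence_stats(records):
    """Count duplicate, missing, and null sequence numbers per host.

    `records` is an iterable of (hostname, sequence) pairs. This layout
    is an assumption made for illustration.
    """
    seen = defaultdict(list)
    nulls = defaultdict(int)
    for host, seq in records:
        if seq is None:
            nulls[host] += 1
        else:
            seen[host].append(seq)

    stats = {}
    for host in set(seen) | set(nulls):
        seqs = seen.get(host, [])
        distinct = set(seqs)
        # A gap-free run from min to max would contain this many values.
        expected = max(distinct) - min(distinct) + 1 if distinct else 0
        stats[host] = {
            "duplicates": len(seqs) - len(distinct),
            "missing": expected - len(distinct),
            "nulls": nulls.get(host, 0),
        }
    return stats


def partition_ok(stats):
    """A partition counts as ok only if every counter is zero for every host."""
    return all(v == 0 for s in stats.values() for v in s.values())
```

Under these assumptions, a host that sent each sequence number twice for a short window would show only duplicates and no missing values, which is the pattern described for this partition.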

Version: unspecified
Severity: normal

Details

Reference
bz72810

Event Timeline

bzimport raised the priority of this task to Needs Triage (Nov 22 2014, 3:53 AM).
bzimport set Reference to bz72810.
bzimport added a subscriber: Unknown Object (MLST).

For bits, it only affected cp3020.
The affected period is 2014-10-30T21:25:41/2014-10-30T21:26:26.
No lost messages, only 70660 duplicates, which is <2 seconds worth of
data for bits.

For upload, it only affected cp3018.
The affected period is 2014-10-30T21:25:18/2014-10-30T21:26:10.
No lost messages, only 34087 duplicates, which is <2 seconds worth of
data for upload.
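
The "<2 seconds" figures imply a lower bound on the message rate of each stream. The sketch below only redoes that arithmetic from the duplicate counts quoted above; the helper name `min_rate` is illustrative, and whether the rate refers to the whole cache type or to the single affected host is not stated in the comments.

```python
def min_rate(duplicates, seconds=2):
    """Lower bound on messages per second.

    If `duplicates` messages amount to less than `seconds` of data,
    the stream must carry more than duplicates / seconds messages/s.
    """
    return duplicates / seconds


# Duplicate counts quoted in the comments above.
for label, count in {"bits": 70660, "upload": 34087}.items():
    print(f"{label}: more than {min_rate(count):.1f} messages/s")
```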

I could not find anything relevant in puppet, nor in SAL.

It's again only esams.

According to ganglia, kafka.rdkafka.brokers.*.rtt.avg's Max went up
during that time on

  • cp3018 to 6.0M for analytics1018
  • cp3020 to 12.6M for analytics1018

But other caches had even higher Max values for that average:

  • cp3019 had 36.7M for analytics1021
  • cp3010 had 11.8M for analytics1021
  • cp3010 had 8.8M for analytics1022

Yet they did not show duplicates.

According to ganglia, kafka.rdkafka.brokers.*.outbuf_cnt's Max went up
during that time on

  • cp3018 to 334.9 for analytics1022 (not analytics1018! It had 28.4 max for analytics1018)
  • cp3020 to 720.8 for analytics1018

But cp3019 had 479 for analytics1021 (i.e. a similar Max value) and
did not show duplicates.

kevinator set Security to None.