strange virtual pageview jump on 2019-04-16-03
Closed, Resolved · Public

Description

Take a look at this jump in virtual pageviews, which looks quite anomalous:

https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&from=1555291024765&to=1555463824765&var-schema=VirtualPageView

Pinging @Jdlrobson and @phuedx to see if they know why this might be

Event Timeline

Nuria created this task. Apr 17 2019, 1:18 AM

This smells like bot activity....

From the SAL:

15:21 otto@deploy1001: scap-helm eventgate-analytics finished
15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:19 otto@deploy1001: scap-helm eventgate-analytics finished
15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:18 otto@deploy1001: scap-helm eventgate-analytics finished
15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades

I don't see how a rolling restart of Kafka or a deployment of EventGate could've impacted the EventLogging pipeline. There weren't any deployments to the Wikipedias near the time of the spike either.

Nuria added a comment. Apr 17 2019, 3:18 PM

EventGate and EventLogging (despite the names) share no infrastructure, so those two events are unrelated. I wonder whether the rolling restart made the metric artificially spiky?

@elukey, Nuria is right here, yes? If eventlogging-processors were stuck for a bit, then the per schema topics will jump up when raw client side event processing is reenabled.

I think so, yes. The timing matches my rolling restart of Jumbo.

There's a spike in VirtualPageview events today (see https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&from=1555459200000&to=1555516800000&var-schema=VirtualPageView) which appears to correspond to today's updates to the cluster. Further confirmation?

Ottomata added a comment. Edited · Apr 17 2019, 5:24 PM

VirtualPageview counts in Hive should be normal. It's only the Kafka message rate that should be affected: messages were delayed during the restart and then processed in a burst, causing the later jump. If you see the same jump in Hive, then this is related to something other than maintenance.
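
To illustrate the reasoning above, here is a toy sketch in Python. It is not the real eventlogging-processor code: the function `simulate`, its parameters, and the assumption that the whole backlog is drained within one minute of the processor resuming are all made up for illustration. It only shows why a stalled consumer produces a spike in the "messages processed per minute" view while counts grouped by the time the event actually happened stay flat.

```python
from collections import Counter


def simulate(events_per_minute, minutes, stall_start, stall_end):
    """Toy model of a processor that stalls for [stall_start, stall_end).

    Returns two Counters keyed by minute:
      processed_at   - events counted at the minute they were processed
                       (roughly what a per-topic rate graph would show)
      by_event_time  - events counted at the minute they were generated
                       (roughly what a count by event timestamp would show)
    """
    backlog = []  # event-generation minutes still waiting to be processed
    processed_at = Counter()
    by_event_time = Counter()

    for minute in range(minutes):
        # Clients keep producing raw events at a steady rate.
        backlog.extend([minute] * events_per_minute)

        if stall_start <= minute < stall_end:
            continue  # processor is stuck; raw events pile up

        # Hypothetical assumption: the processor drains everything at once.
        for event_minute in backlog:
            processed_at[minute] += 1
            by_event_time[event_minute] += 1
        backlog.clear()

    return processed_at, by_event_time


if __name__ == "__main__":
    processed_at, by_event_time = simulate(100, 10, 3, 6)
    for m in range(10):
        print(m, processed_at[m], by_event_time[m])
```

With a steady 100 events per minute and a stall during minutes 3 to 5, the processed-per-minute series drops to zero and then jumps when processing resumes, while the per-event-time series stays at 100 throughout. That is the pattern to look for when comparing the Grafana graph against Hive.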

Nuria added a comment. Apr 17 2019, 7:32 PM

OK, I will close the ticket because it seems pretty well established that this is related to consumers consuming at a higher rate after a restart.

Nuria closed this task as Resolved. Apr 17 2019, 7:32 PM

I've been noodling on proposing that @Stashbot and/or scap create annotations in Grafana. I think such annotations would've been useful here.

Also! Thanks, everyone!
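
As a rough sketch of the annotation idea above: the Python below turns a SAL line of the form `HH:MM who: message` into a JSON-style body with `time` (epoch milliseconds), `tags`, and `text` fields, which is the shape Grafana's annotations HTTP API accepts as far as I know. The helper name `sal_line_to_annotation`, the `sal` tag, and the assumption that SAL times are UTC are all mine and not anything Stashbot or scap does today. The sketch only builds the payload; it does not talk to Grafana.

```python
from datetime import date, datetime, timezone


def sal_line_to_annotation(day, line, tags=("sal",)):
    """Build an annotation body from one SAL line (hypothetical helper).

    day  - the calendar date the SAL entry belongs to
    line - text such as "15:16 elukey: roll restart kafka ..."
    """
    clock, rest = line.split(" ", 1)      # "15:16", "elukey: roll restart ..."
    who, message = rest.split(": ", 1)    # split only on the first ": "
    hour, minute = map(int, clock.split(":"))

    # Assumption: SAL timestamps are in UTC.
    when = datetime(day.year, day.month, day.day, hour, minute,
                    tzinfo=timezone.utc)

    return {
        "time": int(when.timestamp() * 1000),  # epoch milliseconds
        "tags": list(tags) + [who],
        "text": message,
    }


if __name__ == "__main__":
    line = ("15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] "
            "to pick up openjdk upgrades")
    print(sal_line_to_annotation(date(2019, 4, 16), line))
```

An annotation like this, drawn on the eventlogging-schema dashboard at the time of the Kafka restart, would have put the cause next to the spike without anyone needing to cross-check the SAL by hand.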