Page MenuHomePhabricator

Track instances of duplicate Popups events being logged
Closed, ResolvedPublic3 Estimated Story Points

Description

There's a concern that the duplicate Popups events that @Tbayer reported in T161769: Schema:Popups sends extraneous link interaction events in control condition are being sent by the client. While there are a couple of avenues for investigation for the server-side components of the EL architecture, we should be confident that the PP codebase is behaving correctly.

We will log in Page Previews when an event that has already been logged is logged again. This will increase our confidence that the issue is not in the Page Previews code base (and let us know whether it is part of sendBeacon or EventLogging).

AC

  • If a duplicate event is logged within a page session, then:
    • It is dropped;
    • The PagePreviews.DuplicatePopupsEvent counter is incremented in StatsD, i.e.
mw.track( 'counter.PagePreviews.DuplicatePopupsEvent', 1 );

TBD

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson set the point value for this task to 3.
MBinder_WMF set the point value for this task to 3.Apr 19 2017, 5:30 PM

My recommendation would be to do this inside and outside of the eventLogging change listener so that we can compare what the system thinks it's logging and what is actually being logged.

I think we should do this as a standalone change listener that listens to the same piece of state as the eventLogging change listener and keeps track of it and logs to statsd directly when it finds duplicates (not using the state tree or the statsd change listener). That way even if different from the rest of the code base it would be isolated and we can get rid of it in a couple of weeks fairly simply.

Change 350206 had a related patch set uploaded (by Phuedx):
[mediawiki/extensions/Popups@master] Track and discard duplicate enqueued events

https://gerrit.wikimedia.org/r/350206

Change 350552 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@master] Remove isLoggingEnabled with Null Object pattern

https://gerrit.wikimedia.org/r/350552

Reviewed, works perefectly. @phuedx - really nice job. I love the idea of NullObject pattern. Tasks like this one deserves +3! +2 is not enough.

Change 350552 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Remove isLoggingEnabled with Null Object pattern

https://gerrit.wikimedia.org/r/350552

Should I sign this off or Tilman?
@phuedx how best can I simulate a duplicate event?

Also is there more work to be done here with regards to the dashboard (https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1) ?

Change 350206 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Track and discard duplicate enqueued events

https://gerrit.wikimedia.org/r/350206

From IRC:

22:48:56 <phuedx|afk> jdlrobson: just a heads up that i'll be deploying and discussing the duplication changes
22:49:02 <phuedx|afk> on tuesday
22:49:06 <phuedx|afk> discussing w/ tilman

I'm actually going to try to get 'em deployed on Monday.

Also is there more work to be done here with regards to the dashboard (https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1) ?

Done. See the "EventLogging Data Integrity" row. Note well that I've manually incremented the PagePreviews.EventLogging.DuplicateToken and .DuplicateEvent counters so that I could create (and verify!!1) the graphs on the dashboard.

I'm actually going to try to get 'em deployed on Monday.

[There are no deploys this week because of the codfweqiad switchover](https://wikitech.wikimedia.org/wiki/Deployments#Week_of_May_1st) 👎

Change 352554 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] Remove isLoggingEnabled with Null Object pattern

https://gerrit.wikimedia.org/r/352554

Change 352555 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] Track and discard duplicate enqueued events

https://gerrit.wikimedia.org/r/352555

Change 352554 merged by jenkins-bot:
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] Remove isLoggingEnabled with Null Object pattern

https://gerrit.wikimedia.org/r/352554

Change 352555 merged by jenkins-bot:
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] Track and discard duplicate enqueued events

https://gerrit.wikimedia.org/r/352555

Mentioned in SAL (#wikimedia-operations) [2017-05-08T18:41:11Z] <catrope@tin> Synchronized php-1.29.0-wmf.21/extensions/Popups/: T163198 (duration: 00m 39s)

"And now we play the waiting game…"

Change 353035 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@master] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353035

@Jhernandez, @pmiazga: I've moved this task into Needs Code Review as it's easier to follow the conversation.

Change 353038 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@wmf/1.30.0-wmf.1] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353038

Change 353035 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353035

Change 353044 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Clear interaction after an event for it is logged in EL

https://gerrit.wikimedia.org/r/353044

Change 353058 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353058

Change 353038 merged by jenkins-bot:
[mediawiki/extensions/Popups@wmf/1.30.0-wmf.1] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353038

Change 353058 merged by jenkins-bot:
[mediawiki/extensions/Popups@wmf/1.29.0-wmf.21] eventLogging: Discard events with duplicate tokens

https://gerrit.wikimedia.org/r/353058

Mentioned in SAL (#wikimedia-operations) [2017-05-10T13:19:21Z] <hashar@tin> Synchronized php-1.30.0-wmf.1/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2017-05-10T13:24:21Z] <hashar@tin> Synchronized php-1.29.0-wmf.21/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 00m 43s)

Change 353044 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Clear interaction after an event for it is logged in EL

https://gerrit.wikimedia.org/r/353044

I've deliberately skipped QA as @Jhernandez's change affected only the EventLogging instrumentation.

@Tbayer: While @Jhernandez was testing his change locally, he found that multiple identical "opened" events could be generated by the EL subsystem (see https://usercontent.irccloud-cdn.com/file/MOQQvJIe/multi-link-click.gif). However, these should've been captured by the client-side deduplicator.

Unless I'm mistaken there is nothing left to review.

@Jdlrobson: I think either you or I could sign off on this task at this point. @Jhernandez's change made the EL subsystem a little more aggressive in how it tidied up its state.

When I read https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1 I'm seeing

Screen Shot 2017-05-17 at 10.47.30 AM.png (349×654 px, 29 KB)

Does this mean 44 out of 49 duplicate events per minute?!
What are the units for duplicate events/tokens? Are these captured in events logged?

We will log in Page Previews when an event that has already been logged is logged again. This will increase our confidence that the issue is not in the Page Previews code base (and let us know whether it is part of sendBeacon or EventLogging).

Has this increased our confidence? What are the next steps?

@Jdlrobson No, 49 per minute, and 44 duplicates in the last 24h. If you hover over the (i) you get an explanation about the metric:

Screen Shot 2017-05-17 at 12.05.34 PM.png (213×452 px, 22 KB)

Those events are filtered out right now and not sent to event logging, but logged to statsd for debugging.

When I read https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1 I'm seeing

Screen Shot 2017-05-17 at 10.47.30 AM.png (349×654 px, 29 KB)

Does this mean 44 out of 49 duplicate events per minute?!

No, but I can see why there's some ambiguity. I've updated the titles and descriptions of the metrics to highlight that they apply to the period that you've chosen using the timepicker.

We will log in Page Previews when an event that has already been logged is logged again. This will increase our confidence that the issue is not in the Page Previews code base (and let us know whether it is part of sendBeacon or EventLogging).

Has this increased our confidence? What are the next steps?

Jein. We're confident in the implementation of the de-duplicator. However, after the deduplication changes went live we still saw duplicate web requests being logged by the server, which is as frustrating as it is confusing. @Tbayer and I are planning on reopening a couple of old investigations into event duplication that were closed by Analytics.

@Tbayer has also done some initial QA of the post-de-duplicator data – say that three times quickly – in T165461: Various sanity checks for data from the rewritten Popups instrumentation and there don't seem to be "any obvious anomalies", which increases my confidence.

Jdlrobson claimed this task.