Page MenuHomePhabricator

Spin out a tiny EventLogging RL module for lightweight logging
Closed, ResolvedPublic

Description

In CentralNotice, which runs on almost every pageview, we need to send events with the minimum possible JS code. Client-side validation isn't a priority here... Instead of copypasta-ing some EventLogging methods, it'd be nice to have a minimal EL module that just assembles the URL sends the event.

Reference: https://gerrit.wikimedia.org/r/#/c/408488/

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Nuria raised the priority of this task from High to Needs Triage.

Change 450591 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] Switch to ESLint from JSCS+JSHint

https://gerrit.wikimedia.org/r/450591

Change 450597 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] Remove unused ext.eventLogging.Schema from registry

https://gerrit.wikimedia.org/r/450597

Change 450598 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] Move files to their own directory per module

https://gerrit.wikimedia.org/r/450598

Change 450591 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Switch to ESLint from JSCS+JSHint

https://gerrit.wikimedia.org/r/450591

Change 450597 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Remove unused ext.eventLogging.Schema from registry

https://gerrit.wikimedia.org/r/450597

Change 450598 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Move files to their own directory per module

https://gerrit.wikimedia.org/r/450598

Change 450642 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] Separate debug code from regular code

https://gerrit.wikimedia.org/r/450642

Change 450643 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] debug: Minor clean up

https://gerrit.wikimedia.org/r/450643

I think that Nuria was right to press for an evidence-based rationale, and I haven't seen one.

The size I am seeing is 2.3 KiB, not 3 KiB:

$ curl -s -H "Accept-Encoding: gzip" -o/dev/null -w "%{size_download} bytes\n" "https://en.wikipedia.org/w/load.php?modules=ext.eventLogging|ext.eventLogging.subscriber|ext.eventLogging.Schema"
2374 bytes

But even this number is inflated, because we typically bundle more modules together, resulting in better compression. When bundled with additional modules, the size I am seeing for client-size EventLogging code is between 1.5 and 2 KiB. I can't see how this number could make a material difference to page speed.

The principle of validating early and often (and dropping anything that fails validation) has worked very well in practice and the consequences for dropping it are not well-understood. If you want to litigate this, I'd expect stronger arguments.

The argument that I initially went by was @AndyRussG saying:

For CentralNotice, we decided not to load EL client-side code on the recommendation of performance folks (related Gerrit change), since CN and its dependencies are loaded on nearly every pageview. However, it would be preferable not to repeat the same functionality in two places... hence this task. :)

But looking at that closer, I don't see where the recommendation from the performance team is. I'm not familiar enough with how large JS payloads are on wiki projects, but is 1 or 2 KiB a problem? That's what we would save if we didn't validate client-side (from @Krinkle's examples, it looks like that's the size of most EL schema modules).

@ori I'll look into it more later, but short version: The problem isn't the size of the core module. Instead, my main concerns are:

  • The startup registry is getting rather large (35K compressed, 101K uncompressed). This needs to be downloaded, parsed, and executed before we can begin to download/parse/execute any other module, after which the page can become interactive. Aside from the CPU/wall time to process it all (ideally without jank), there is also competition with bandwidth for HTML/CSS, and memory footprint on all page views for the registry.
    • I am working with multiple teams to reduce their footprint in the registry. And every extension needs to do its bit, each single bit will not be super impactful, but combined - I hope will be. For EventLogging, the startup registry contains 55 schema.* modules, which is about 2.6K of meta data. Remember, this is not 2.6K of code, this is just meta-data, pulled down on every page view globally. See also "Reduce size of startup manifest" at T127328, and breakdown at T192623.
  • The process for logging an event using the advocated mw.track() pattern is too slow, and as a result does not work for tracking clicks (due to being async). See T192862 for a related DevTools-timeline analysis. The main issue is the asynchronous loading of the EL and EL-schema modules. Given the precedent set by CentralNotice and Popups extensions, as well as our mobile apps, it seems simple enough to just cut this out entirely; leaving only the lightweight subscriber module that listens to mw-track and then (more or less directly) goes to JSON/navigation.sendBeacon. Once there, it is also safe to perform a navigation in the browser.
    • This lightweight module can still auto-insert the schema revision number using a data-augmented RL file module, similar to what we do with mediawiki.util in core. I will write more about this later.
  • EventLogging's debug mode is fairly new and has room for improvement. The idea is to move it to a separate module (to exclude it from the prod payload), and re-implement the client-side validation there using the schema from the API.

I don't know that this is entirely related but @Jdrewniak worked on an "EventLogging lite" implementation for Portals. Maybe this can be pulled into an NPM package?

@Krinkle OK, that makes sense to me. As a lead-up to that, would it make sense to run a brief study in which we log an event when an event fails validation on the client? The code for logging of the 'validation failed' event could use an alternate, light-weight code-path so we don't end up wondering how many 'validation failed' events failed validation.

Edit: @Nuria and @Milimetric may have thoughts on this as well.

If this isn't urgent, it might be good to implement these ideas as part of the EventLogging client revamp in T185233: Modern Event Platform this FY.

The lighter EL module is something we will be working on this quarter per our agreement with performance team.

[..] would it make sense to [..] log an event when an event fails validation on the client?

I added mw.eventLog.logFailure a while back for lightweight error tracking (via Statsv). We could re-use that to track validation errors.

However, I learned today that the client actually submits the event to the server even if it fails validation client-side. This would support the theory of client-side validation primarily existing currently to help developers notice issues, given the network request would still happen and yield HTTP 204 (given offline and asynchronous processing).

This is fine actually, it means we already have statistics for validation errors (graph 1, graph 2). Kafka topic eventlogging_EventError breaks down by schema and cause ("validation" or "processor").

Would that work?

However, I learned today that the client actually submits the event to the server even if it fails validation client-side.

Oh, yeah. I completely forgot about that.

Would that work?

Yes, totally.

@Nuria, @Ottomata : wow! I had missed most of that. Cool! All the same, I don't think @Krinkle's change would make the ground shift beneath the larger project-work you have planned. It improves code quality, and the comments above build a credible case that the change would have a positive effect on performance and advance Krinkle's RL work, and that the risk of a regression in data quality is low. So I don't personally see a reason to stall it.

Ok, starting work on this. Basic plan:

  • The startup registry is getting rather large (35K compressed, 101K uncompressed). This needs to be downloaded, parsed, and executed before we can begin to download/parse/execute any other module, after which the page can become interactive. Aside from the CPU/wall time to process it all (ideally without jank), there is also competition with bandwidth for HTML/CSS, and memory footprint on all page views for the registry.
    • I am working with multiple teams to reduce their footprint in the registry. And every extension needs to do its bit, each single bit will not be super impactful, but combined - I hope will be. For EventLogging, the startup registry contains 55 schema.* modules, which is about 2.6K of meta data. Remember, this is not 2.6K of code, this is just meta-data, pulled down on every page view globally. See also "Reduce size of startup manifest" at T127328, and breakdown at T192623.

It seems like the proposed RL wildcard modules system would come in handy here?

Change 450642 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Separate debug code from regular code

https://gerrit.wikimedia.org/r/450642

Change 450643 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] debug: Minor clean up

https://gerrit.wikimedia.org/r/450643

Change 453457 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[mediawiki/extensions/EventLogging@master] [WIP] Move event validation to debug module

https://gerrit.wikimedia.org/r/453457

Change 463501 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[mediawiki/extensions/EventLogging@master] [WIP] Move event validation to debug module (2/2)

https://gerrit.wikimedia.org/r/463501

Change 453457 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Move event validation to debug module (1/2)

https://gerrit.wikimedia.org/r/453457

Change 465549 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/UniversalLanguageSelector@master] ext.uls.eventlogger: Remove use of removed setDefaults() method

https://gerrit.wikimedia.org/r/465549

Change 465549 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] ext.uls.eventlogger: Remove use of removed setDefaults() method

https://gerrit.wikimedia.org/r/465549

Change 463501 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Move event validation to debug module (2/2)

https://gerrit.wikimedia.org/r/463501

Change 463501 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Move event validation to debug module (2/2)

https://gerrit.wikimedia.org/r/463501

Locally, I tested https://gerrit.wikimedia.org/r/463501 three different ways:

  1. Without this change (master/master).
  2. With this EL change applied.
  3. With this EL change applied, and, locally modified NavTiming to remove use of the now-deprecated schema modules.

In the third case, I was expecting the beacon to be sent earlier than in previous two cases because it no longer needs the schema modules.

The main objective of the lightweight EventLogging initiative was to have a direct path from mw.track('event. *** ') to navigator.sendBeacon(), without any asynchronous code or network roundtrips. To improve performance by reducing overhead, and to reduce loss of events by eliminating the possibility of the client being aborted before it reaches the persistence layer behind the Beacon API.

However, to my surprise that didn't happen. This is because we we haven't merged the "ext.eventLogging" and "ext.eventLogging.subscriber" module. As such, we still lazy-load the remaining code. While the payload is now smaller without the schema modules, the timeline still looks the same as before.

We can either move the code to the ext.eventLogging module, or to the subscriber module. The subscriber module is the entry point we preload on all pages.

  • The module "ext.eventLogging" currently represents the public mw.eventLog class and is referenced by various extensions as dependency for this purpose.
  • The "ext.eventLogging.subscriber" module is considered private for most of its purpose, but is preloaded in the module queue on all (cached) HTML pages. And more recently, the Schema.js class was merged into it, and it gained the inSample method, so it too has become a valid dependency that some extensions can call on.

I'd recommend settling on "ext.eventLogging" (one module, with multiple files), and keeping "ext.eventLogging.subscriber" as (deprecated) empty module that just aliases to "ext.eventLogging", similar to what we did with the schema modules.

I can take care of this, @Krinkle, unless you're doing it as an urgent matter. Let me know.

Change 478228 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[mediawiki/extensions/EventLogging@master] Merge subscriber functionality into core module

https://gerrit.wikimedia.org/r/478228

Change 478228 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Merge subscriber functionality into core module

https://gerrit.wikimedia.org/r/478228

I just checked and I think we've exorcised any async or roundtrip code. @Krinkle, is there a specific test you do or a specific mw.track call that was too slow that you'd like me to check into?

@Milimetric My test is with (a simplified version of) Navigation Timing. I record a Performance timeline and confirm that the event network request happens after only 2 roundtrips of JavaScript (1: module=startup, 2: el.subscriber+ext.navtiming etc).

Screen Shot 2019-01-20 at 18.26.30.png (584×2 px, 167 KB)

@Krinkle: I tested in vagrant with navtiming sample 1/1 i see two requests for js (in sequence) and event happens right after, * I think* this is right, please let me know otherwise. Does this also need to be corrected? https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#See_logging_in_your_browser

Screen Shot 2019-01-22 at 3.45.12 PM.png (1×2 px, 348 KB)

Yep, all good. Hence closed the task.

The debugging looks nicer now, thanks to @Milimetric. But, the way to trigger it is still the same. It's documented in the code (auto published to here), and matches the Wikitech page.

Ping here @AndyRussG and @phuedx @jlinehan so they now Eventlogging is been refactored such it can now be added in every page, it does not validate events client side like it used to but you can set up that validation in your development environment. We recommend any instrumenting code that is not using eventlogging and rather sending events directly to the beacon (virtual pageviews and some FR banners) to be changed so it is resilient to changes on the EL client that might be upcoming.

Ping here @AndyRussG and @phuedx @jlinehan so they now Eventlogging is been refactored such it can now be added in every page, it does not validate events client side like it used to but you can set up that validation in your development environment. We recommend any instrumenting code that is not using eventlogging and rather sending events directly to the becaon (virtual pageviews and some FR banners) to be changed so it is resilient to changes on the EL client that might be upcoming.

Thanks!!!!!! Will do: T214709. :)