
Create logging instrumentation for Wikitext editor not affected by ad blockers
Closed, ResolvedPublic

Description

Per T251464#6182776 and T240697#6217971, it's impossible for us to see how many edits are coming from users without JS because of ad blockers and tracking blockers that block sending data via EventLogging.

There is a concern that prolific editors might have JS turned off for privacy reasons and by lowering non-JS support we would be hurting those high-impact users.

To see what percentage of completed (non-bot) edits are coming from editors with JavaScript disabled (T240697), we should implement some sort of simplistic client-side logging for the Wikitext editor.

Requirements

  1. Logging is implemented that will enable us to know the number of edits made by people who do NOT have JavaScript enabled.
  2. We should be able to aggregate the edits mentioned above by project platform (mobile and desktop) and by country.
  3. No other data should be collected in order to preserve the privacy of people who are using ad blockers for privacy reasons.
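
A minimal sketch of the aggregation these requirements describe, assuming each save is reduced to the fewest possible fields before counting. The record shape and the names `SaveRecord` and `count_no_js_edits` are hypothetical and not part of any existing schema. Grouping is by platform here; any further grouping key would be one more field on the record.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRecord:
    """Hypothetical record holding only the fields needed for the count."""
    platform: str  # "desktop" or "mobile"
    has_js: bool   # whether JavaScript ran in the edit form


def count_no_js_edits(records):
    """Count saves made without JavaScript, grouped by platform."""
    return Counter(r.platform for r in records if not r.has_js)


# Illustrative input only; these are not real measurements.
sample = [
    SaveRecord("desktop", True),
    SaveRecord("desktop", False),
    SaveRecord("mobile", False),
    SaveRecord("desktop", False),
]
print(count_no_js_edits(sample))
```

Because the record carries no user, page, or session identifier, nothing beyond the two listed fields can leak into the aggregate.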

Deployment timing

Ideally, this new instrumentation can be live on all wikis by 10-May. This means the code for this new instrumentation would need to be code-reviewed by the time the train starts on 4-May.

Implementation

The new instrumentation needed to deliver what is described in the Requirements section above should be implemented in the way decided upon in T280841, pasted here:

  • We will send an event to the VisualEditorFeatureUse schema adjacent to the existing server-side logging in WikiEditor. It will be in the form:
    • event.feature = 'mwSave'
    • event.action = 'source-has-js'.
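
The event shape above can be sketched as follows. This is an illustration under stated assumptions, not the WikiEditor code: the envelope with a `schema` key and the function name `build_feature_use_event` are hypothetical, and the sketch assumes an event is emitted only when JavaScript was detected, since the task does not say what is sent for saves without JavaScript.

```python
from typing import Optional


def build_feature_use_event(has_js: bool) -> Optional[dict]:
    """Build the save event described in the task, or None if there is nothing to send.

    Assumption: saves without JavaScript produce no event of this kind.
    """
    if not has_js:
        return None
    return {
        "schema": "VisualEditorFeatureUse",  # schema named in the task
        "event": {
            "feature": "mwSave",
            "action": "source-has-js",
        },
    }


print(build_feature_use_event(True))
```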

Done

  • The instrumentation needed to fulfill the Requirements listed above is implemented

Event Timeline

For what it's worth the Modern Event Platform endpoint https://intake-analytics.wikimedia.org/v1/events (see wikitech:Event_Platform/EventGate) is not yet blocked by uBlock Origin (and probably not by others either) when I recently checked. This could be reason enough to up the priority on migrating EditAttemptStep schema to MEP? cc @jlinehan @sdkim

If we are interested in tracking completed edits, why not place the instrument at the server? Then the client's disposition re: Javascript doesn't mean anything. Is this problem to accurately measure the number of editors with JavaScript disabled? Or to flag them somehow? Or are their edits not being counted at all currently?

For what it's worth the Modern Event Platform endpoint https://intake-analytics.wikimedia.org/v1/events (see wikitech:Event_Platform/EventGate) is not yet blocked by uBlock Origin (and probably not by others either) when I recently checked. This could be reason enough to up the priority on migrating EditAttemptStep schema to MEP?

Sadly, this is not a long-term solution, and it would not address the other problem: clients with JavaScript disabled, as opposed to clients that simply run ad blockers.

@sdkim @kzimmerman this is the first step towards really answering T240697. Since the answer to T240697 was undermined by the ad blockers, I'm going to reopen that ticket for now rather than open a new one.

For what it's worth the Modern Event Platform endpoint https://intake-analytics.wikimedia.org/v1/events (see wikitech:Event_Platform/EventGate) is not yet blocked by uBlock Origin

It is now! ;) https://github.com/easylist/easylist/commit/76c6db835f68cfb464632f392dd49d56322735ee

One thing I think we could do here is to add a hidden field to the edit form (see includes/EditPage.php) with an HTML snippet that resolves to different values depending on whether the client has JS enabled, and to submit the resolved value upon form submission. We're already using a hidden field there in a similar way to verify Unicode support: https://github.com/wikimedia/mediawiki/blob/master/includes/EditPage.php#L3003-L3004. Then we'd have that data point available on the server to send along in an analytics event.
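
A rough sketch of the hidden-field idea, written in Python for illustration rather than as MediaWiki PHP. The field name `jsSupport`, the "yes"/"no" values, and both function names are hypothetical. The field defaults to "no" in the HTML, and an inline script flips it to "yes", so the submitted value shows whether scripts ran in the form.

```python
import html

FIELD_NAME = "jsSupport"  # hypothetical field name, not taken from MediaWiki


def render_js_probe_field() -> str:
    """Return a hidden input that defaults to "no" plus a script that sets it to "yes"."""
    name = html.escape(FIELD_NAME, quote=True)
    return (
        f'<input type="hidden" name="{name}" id="{name}" value="no">'
        f'<script>document.getElementById("{name}").value = "yes";</script>'
    )


def client_had_js(form: dict) -> bool:
    """Server side: treat the save as a JS save only if the script changed the value."""
    return form.get(FIELD_NAME) == "yes"


print(render_js_probe_field())
print(client_had_js({"jsSupport": "yes"}))
```

Since the value travels with the normal form submission, no separate analytics request is made from the browser, and a missing or unchanged field is simply read as "no JavaScript".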

I'm confused by this task. There's a big leap between "see how many edits are coming from users without JS" and "create instrumentation not affected by ad blockers".

We should not work on anything attempting to bypass ad blockers. Not only is it adversarial to users, but it's also a big waste of effort, as blocking is much easier than blocking the blocking. I have no objection to measuring actions by no-JS users, to prioritize what we work on etc., but the way you phrased this is giving me reservations.

And the existing instrumentation in WikiEditor is already mostly implemented on the server-side (https://codesearch.wmcloud.org/search/?q=doEventLogging&i=nope&files=&excludeFiles=&repos=Extension:WikiEditor), and thus mostly not affected by either ad blockers or by disabling JS (except for things that happen client-side, like the ready timing, or things that require JS, like switching to VisualEditor). I think you'll need to clarify what is desired here.

I'm confused by this task. There's a big leap between "see how many edits are coming from users without JS" and "create instrumentation not affected by ad blockers".

We should not work on anything attempting to bypass ad blockers. Not only is it adversarial to users, but it's also a big waste of effort, as blocking is much easier than blocking the blocking. I have no objection to measuring actions by no-JS users, to prioritize what we work on etc., but the way you phrased this is giving me reservations.

Hi, thanks for flagging this.

I apologize for the original wording of the task. The intent is not to bypass ad blockers and user intentions. In fact, the goal of this measurement is ironically to ensure that privacy conscious editors continue to have support. We know that some people turn off JS intentionally for privacy reasons and we want to make sure we aren't undercounting them. All we need to do is count total edits per country/project/day made with no-JS, so the intent is to drop all other data (it might be hard to drop some data, so someone on eng should definitely chime in if I'm making a ridiculous statement).

Let me know if that clears things up.

I think I agree with @Mholloway's suggested approach -- a client-side input whose value is set via JavaScript which feeds into server-side logging for save events. It'll get us a clean view of the percentage of no-JS edits, without us having to care about bypassing ad-blocking in any meaningful way.

If this is agreeable, we should probably rename the task -- we don't want to create logging instrumentation that bypasses ad-blockers in the general case, just add a little data in this specific area.

[…] All we need to do is count total edits per country/project/day made with no-JS, so the intent is to drop all other data […]

a client-side input whose value is set via JavaScript which feeds into server-side logging for save events. It'll get us a clean view of the percentage of no-JS edits, […]

I wonder if our current instrumentation already provides these numbers with the same level of accuracy. I believe we have Edit-schema instrumentation in most or all JS-based editing software (most notably VisualEditor on desktop and mobile, and WikiEditor via action=edit), and we have Edit-schema instrumentation on the server-side of action=edit.

I would guess that subtracting one of these from the other would effectively yield the count of all edits made on action=edit without a Grade A JS-based editor having (successfully) been loaded during the process.

I think it may also be important to consider here that "having JS disabled" and "has not successfully loaded a Grade A JS-based editor before pressing save" are not the same thing. At the compatibility level we mostly don't have "no js" versions of software. We have a Grade C "Base" level that loads for everyone, with an optional Grade A JS-based layer over top of that. This layer may be skipped, disabled, rejected, time out, or fail to load for any number of reasons. All that to say, labeling this group as "has JS intentionally disabled" might be inaccurate and could de-emphasize the (possibly larger) part of the Grade C group that hasn't disabled JS. For example, they may be using an older browser we choose not to support JS for, they may have a browser add-on or gadget that unintentionally conflicts with our code, or they may be using a device or connection that is sometimes too slow, leaving that particular edit attempt at the Grade C level. (We've all been on bad WiFi or with poor connectivity at times, I think!)
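
The subtraction suggested above can be written out as a small sketch. It assumes both counts cover the same editing surface and time window, and the function name is hypothetical. Per the caveat above, the result counts saves where no client-side event arrived, which is broader than "JS disabled".

```python
def edits_without_grade_a_js(server_side_saves: int, client_side_saves: int) -> int:
    """Saves recorded on the server minus saves also reported by the JS-based editor.

    The difference includes every save whose client-side event never arrived,
    whatever the reason (JS disabled, Grade C browser, failed load, blocked request).
    """
    if server_side_saves < 0 or client_side_saves < 0:
        raise ValueError("counts must be non-negative")
    if client_side_saves > server_side_saves:
        raise ValueError("client-side count exceeds server-side count; inputs do not match")
    return server_side_saves - client_side_saves


# Made-up numbers purely to show the arithmetic.
print(edits_without_grade_a_js(server_side_saves=1000, client_side_saves=900))
```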

I wonder if our current instrumentation already provides these numbers with the same level of accuracy. I believe we have Edit-schema instrumentation in most or all JS-based editing software (most notably VisualEditor on desktop and mobile, and WikiEditor via action=edit), and we have Edit-schema instrumentation on the server-side of action=edit.

So-so -- we've debated this in earlier tickets. We can certainly work out the breakdown between people who're using JS-required editing methods and not (using revision tagging rather than anything adblockable, since VE's logging of edit success is done from the client-side), but in the WikiEditor group we can't currently work out whether they had to or just chose to. The issue is looking at the non-VisualEditor edits and distinguishing between people who have JS disabled and who have adblocking (since our analytics endpoints are all on blocklists).

It's fair to note that my "set a simple value in a hidden input" test doesn't tell us too much about the level of JS support available, but it does bypass the adblock issue entirely and let us differentiate those groups. I guess that if we wanted we could do a bunch of feature-testing to determine what value is set in said input, to log which notional "grade" a visitor is in if they do have JS enabled...

@JKatzWMF I'm assuming that we only need to enhance the logging from WikiEditor, since that's where it's ambiguous whether JS is supported. For VisualEditor/DiscussionTools/etc we can use the revision-log to count revisions tagged as coming from those sources, bypassing any adblocking issues with their analytics.

I apologize for the original wording of the task. The intent is not to bypass ad blockers and user intentions. In fact, the goal of this measurement is ironically to ensure that privacy conscious editors continue to have support. We know that some people turn off JS intentionally for privacy reasons and we want to make sure we aren't undercounting them. All we need to do is count total edits per country/project/day made with no-JS, so the intent is to drop all other data (it might be hard to drop some data, so someone on eng should definitely chime in if I'm making a ridiculous statement).

I think this is well intentioned but missing the point. If someone enables an ad blocker/privacy blocker, they're sending a clear signal that they do not want to be tracked or counted or whatever. The implicit tradeoff is that people reviewing analytics data will not see those "invisible" users and make decisions in a different manner than if those users showed up, but that's fundamentally what opting out of tracking is all about.

Users without JS is a different issue, see https://www.mediawiki.org/wiki/No-JavaScript_notes#Reasons_for_not_having_JavaScript.

I think this is well intentioned but missing the point. If someone enables an ad blocker/privacy blocker, they're sending a clear signal that they do not want to be tracked or counted or whatever.

Just to clarify my implementation comments above, they wouldn't involve adding in any new tracking that would catch people who weren't already being tracked. It'd just be adding a field to some tracking that's already happening server-side as part of the submission process.

[…] We can certainly work out the breakdown between people who're using JS-required editing methods and not […], but in the WikiEditor group we can't currently work out whether they had to or just chose to. The issue is looking at the non-VisualEditor edits and distinguishing between people who have JS disabled and who have adblocking (since our analytics endpoints are all on blocklists). […]

It's not clear to me what "had to" means, compared to "chose to". Also, what is the subject of these verbs: to disable WikiEditor JS (on-wiki preference), to disable all JS (browser setting), or to block EventLogging (ad blocker)? And in which bucket should people fall who use a Grade C browser (stereotypically, not a choice)? And what about people using a modern browser but with spotty connections?

I'm not asking out of concern per se. (That is, whichever classification you choose that Product finds useful seems fine to me!) My main concern is that the classification may be labelled or explained in a way that would portray the data and the people they represent incorrectly, which is a recipe for (unintentional) confirmation bias.

I'll partially quote what I said before:

[…] consider that "having JS disabled" and "has not successfully loaded Grade A JS before pressing save" are not the same thing. […]

... and I'll seemingly contradict myself by saying: most instrumentation methods result in these looking the same. But you should be fine so long as your classification is well-documented and accounts for which buckets groups of people end up in ("disabled WikiEditor", "has Grade C browser", "Grade A fully-enabled but spotty connection thus effectively Grade C", "disabled JS", "blocked EventLogging"), or whether they may end up in either bucket (I imagine some of these will remain ambiguous).

Task description update

Change
Per a conversation with @JKatzWMF in Slack, I've REMOVED the requirement to aggregate edits by country and ADDED the requirement to aggregate edits by platform.

Reason for change
As @MNeisler noted, to aggregate edits by country, we'd need to implement additional instrumentation to be able to associate edits with the locations in which they are made. This is data we had previously been collecting, but stopped after resolving T267343.

Task description update

I've added the Implementation section to the task description to reflect what @DLynch + @MNeisler decided on in T280841.

Change 683998 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/WikiEditor@master] Log whether Javascript was supported on saveSuccess

https://gerrit.wikimedia.org/r/683998

Change 683998 merged by jenkins-bot:

[mediawiki/extensions/WikiEditor@master] Log whether Javascript was supported on saveSuccess

https://gerrit.wikimedia.org/r/683998

DLynch added a project: Skipped QA.

QA can't meaningfully check this one, because it's all server-side logging.

QA can't meaningfully check this one, because it's all server-side logging.

Understood. I appreciate you being explicit about the above, @DLynch.

QA will happen in T281409.