
Spike: Does our current pageview data identify whether browsers are non-JS?
Closed, Resolved (Public)

Description

If we capture information about JS compatibility in user agent strings (or other metadata), we can use existing data to calculate the % of edits and pageviews coming from users without JS.

If not, we'll need to consider how we can instrument to collect this data.

Details

Due Date
Oct 25 2019, 7:00 AM

Event Timeline

kzimmerman triaged this task as Medium priority. Oct 7 2019, 11:07 PM
kzimmerman moved this task from Triage to Next Up on the Product-Analytics board.
kzimmerman set Due Date to Oct 25 2019, 7:00 AM.

I found a report, Clients_without_JavaScript, that Analytics completed in 2015. It looked at the percent of page requests from browsers without JavaScript support. The MediaWiki page includes useful details about the methodology, preliminary results, and queries used. The reviewed data sample came from webrequest, using a combination of the webrequest_source and user_agent fields.

@Nuria - Do you know if it is still possible to use this methodology with current data to estimate the % of requests coming from users without JS?

@MNeisler I chatted with @Nuria about this in our 1:1 today; here are her recommendations:
Use the compatibility table (https://www.mediawiki.org/wiki/Compatibility#General_information) to identify browsers for which we surface JS features vs. ones we do not.
Make sure to filter on agent_type="user"
Look at browsers that are below grade A (those browsers will not see JS features; even if they technically support JS, that will not be the user experience on our sites)
Compare pageviews from browser grade A vs. pageviews from browsers below-grade-A
Differentiate desktop & mobile

Nuria can help review data once a first pass with queries/tables/graphs is done.
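The comparison recommended above could be sketched roughly as follows. This assumes per-browser pageview rows are already available and already filtered to agent_type="user". The thresholds in `GRADE_A_MIN_MAJOR` are placeholders, not values from the Compatibility page, and the sample rows are invented to show the input shape.

```python
from collections import defaultdict

# Placeholder thresholds: browser family -> lowest major version treated as
# grade A. Illustrative only; real values must come from the Compatibility page.
GRADE_A_MIN_MAJOR = {"Chrome": 30, "Firefox": 30, "Safari": 6}


def is_grade_a(browser_family, browser_major):
    """Return True if the browser would be treated as grade A (JS served)."""
    min_major = GRADE_A_MIN_MAJOR.get(browser_family)
    if min_major is None:
        return False  # unknown families are counted as below grade A
    try:
        return int(browser_major) >= min_major
    except (TypeError, ValueError):
        return False  # missing or non-numeric versions cannot be graded


def below_grade_a_share(rows):
    """Share of pageviews from below-grade-A browsers, per access method.

    rows: iterable of (access_method, browser_family, browser_major, view_count)
    """
    total = defaultdict(int)
    below = defaultdict(int)
    for access_method, family, major, views in rows:
        total[access_method] += views
        if not is_grade_a(family, major):
            below[access_method] += views
    # Skip access methods with zero pageviews to avoid dividing by zero.
    return {m: below[m] / total[m] for m in total if total[m]}


# Invented sample rows, only to show the shape of the input.
sample = [
    ("desktop", "Chrome", "77", 900),
    ("desktop", "IE", "8", 100),
    ("mobile web", "Safari", "13", 750),
    ("mobile web", "Opera Mini", "7", 250),
]
shares = below_grade_a_share(sample)
```

Keeping the result keyed by access method keeps desktop and mobile separate, as recommended. As noted in the thread, this cannot see grade A browsers where the user has turned JS off.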

I think this satisfies the spike and is a reasonable way to estimate no-JS use. The major caveat is that we will not get information for grade A browsers where users have turned off JS.

@JKatzWMF thoughts based on your knowledge of the project? (Is there someone else we should consult about this?) If the proposed approach is ok, I'll mark this spike as resolved and figure out when someone on my team can tackle this data dive. What is the current timeline?

The major caveat is that we will not get information for grade A browsers where users have turned off JS.

Correct.

Compare pageviews from browser grade A vs. pageviews from browsers below-grade-A

Note that this can be done in a couple of ways. Using pageview_hourly data (webrequest is not needed), you can split results into mobile versus desktop per project. Using the browser_general table, you already have browser numbers that you can tally for the overall user base, so you would get an "overall idea" of JS usage per mobile/desktop, but not per project. See:

hive (wmf)> desc browser_general;
OK
col_name	data_type	comment
access_method       	string              	(desktop|mobile web|mobile app)
os_family           	string              	OS family: Windows, Android, etc.
os_major            	string              	OS major version: 8, 10, etc.
browser_family      	string              	Browser family: Chrome, Safari, etc.
browser_major       	string              	Browser major version: 47, 11, etc.
view_count          	bigint              	Number of pageviews.
year                	int                 	Unpadded year of request
month               	int                 	Unpadded month of request
day                 	int                 	Unpadded day of request

# Partition Information
# col_name            	data_type           	comment

year                	int                 	Unpadded year of request
month               	int                 	Unpadded month of request
day                 	int                 	Unpadded day of request
Time taken: 0.155 seconds, Fetched: 16 row(s)

Also: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/browser/general/browser_general.hql
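As a rough illustration of the browser_general tally, the sketch below runs the aggregation against an in-memory SQLite stand-in with the same columns as the table described above. The `grade_a` helper table, its thresholds, and the sample rows are all hypothetical. The SQL has not been run against Hive, where the cast type and database prefix would likely need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE browser_general (
    access_method TEXT, os_family TEXT, os_major TEXT,
    browser_family TEXT, browser_major TEXT, view_count INTEGER,
    year INTEGER, month INTEGER, day INTEGER
);
-- Hypothetical helper table: lowest major version counted as grade A.
CREATE TABLE grade_a (browser_family TEXT, min_major INTEGER);
""")

# Placeholder thresholds, not taken from the Compatibility page.
conn.executemany("INSERT INTO grade_a VALUES (?, ?)",
                 [("Chrome", 30), ("Firefox", 30)])

# Invented rows, only to make the query runnable.
conn.executemany(
    "INSERT INTO browser_general VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("desktop", "Windows", "10", "Chrome", "77", 800, 2019, 10, 1),
        ("desktop", "Windows", "7", "IE", "8", 200, 2019, 10, 1),
        ("mobile web", "Android", "9", "Chrome", "77", 600, 2019, 10, 1),
        ("mobile web", "Android", "4", "Opera Mini", "7", 400, 2019, 10, 1),
    ],
)

# Pageviews from grade A browsers vs. all pageviews, per access method.
QUERY = """
SELECT b.access_method,
       SUM(CASE WHEN g.browser_family IS NOT NULL
                 AND CAST(b.browser_major AS INTEGER) >= g.min_major
                THEN b.view_count ELSE 0 END) AS grade_a_views,
       SUM(b.view_count) AS total_views
FROM browser_general b
LEFT JOIN grade_a g ON g.browser_family = b.browser_family
WHERE b.year = 2019 AND b.month = 10
GROUP BY b.access_method
ORDER BY b.access_method
"""
tally = conn.execute(QUERY).fetchall()
```

Because browser_general has no project column, this gives only the overall desktop/mobile picture; a per-project split would need pageview_hourly instead.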

Please keep in mind that pageviews and pageview percentages are quite different from device counts, which are the closest measure we have to "users".

@kzimmerman pinging @dr0ptp4kt and @kaldari as my understanding was that unfortunately we do care about people who have turned off JS.
We anticipate there is going to be equal or more concern about/from folks who turn off JS intentionally, so getting the edit % (in particular) from there was primarily to understand that impact.

While <textarea>-based editing ought to continue to work without JS, yeah, getting an understanding of how much editing happens that way without JS would be helpful as a proxy.

We know that VE (both WYSIWYG and WTE 2017) are by definition JS based edits. What it seems we're probably less clear on is % of users who submit edits using the <textarea> method but still have JS turned on.

I'm wondering whether it would be possible to understand the approximate theoretical JS support, along the same lines as defined on that MW page, for edits completed using the <textarea> method?

As an aside, Opera Mini via the full page compression proxy does allow JS execution for consumptive access by running JS at their compression proxy, but it's the client-side JS features it doesn't support in this mode. Also at last check modern Opera Mini and of course other Opera browsers have actual client JS support, but I'm not clear on the various flavors of UAs or other signals that are used when the user is operating with Opera Mini using JS support turned on (instead of the compression proxy).

Also at last check modern Opera Mini and of course other Opera browsers have actual client JS support,

Keep in mind that a browser having JS support and MediaWiki serving JS to that browser are different things. MediaWiki does not include Opera Mini in its JS-enabled list, which means that Opera Mini always gets a no-JS runtime, regardless of support: https://github.com/wikimedia/mediawiki/blob/2aed14b686/resources/src/startup/startup.js#L39
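For analysis purposes, that distinction could be mirrored offline with a small user agent check. This is a hypothetical helper, not the logic in startup.js: only the Opera Mini rule comes from this thread, and the real startup check may exclude other user agents too.

```python
import re

# Only the Opera Mini rule is taken from the discussion above; any other
# exclusions would have to be copied from the actual startup check.
NO_JS_UA_PATTERN = re.compile(r"Opera Mini")


def gets_js_runtime(user_agent):
    """Return False for user agents assumed to be served the no-JS experience."""
    return NO_JS_UA_PATTERN.search(user_agent) is None
```

For example, `gets_js_runtime("Opera/9.80 (Android; Opera Mini/7.5; U; en) Presto/2.8.119")` returns `False`, while a user agent string without "Opera Mini" returns `True`.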

folks who turn off JS intentionally,

If you are interested in users who have JS turned off and are editing with the wikitext editor, you need to record that information when an edit happens. The wikitext editor records "edits" from PHP and sends them to EL: https://github.com/wikimedia/mediawiki-extensions-WikiEditor/blob/master/includes/WikiEditorHooks.php#L28

As part of this schema, you can send whether you think JS is enabled. The standard way to do that is to check whether a cookie that you previously tried to set with JS actually exists. If it does not, JS is disabled.
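A minimal sketch of the server-side half of that cookie check, written in Python for illustration (the actual logging code is PHP). The cookie name `jsProbe` is hypothetical, and the sketch assumes a script on an earlier page load has already tried to set it.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; a script on an earlier page view is assumed to
# have tried to set it.
JS_PROBE_COOKIE = "jsProbe"


def js_seems_enabled(cookie_header):
    """Guess whether JS is enabled from the request's Cookie header.

    Returns True if the probe cookie is present, False otherwise.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")  # treat a missing header as no cookies
    return JS_PROBE_COOKIE in cookies
```

The result would be sent as one field of the edit event. A missing cookie is only a hint: a user who blocks cookies, or whose first request is the edit itself, would also look like JS is off.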

Closing the spike as done, but the parent task remains and is an open question.