Here is a quick and dirty query for a related question: the share of wikitext edits made via the desktop interface that come from iOS and Android devices (a reasonable approximation of the "mobile devices" in the sense of this task). It is around 5% on average.
This will need to be compared with the number of mobile web edits (ideally also filtering out those which were made, conversely, from desktop devices, although that might be trickier to determine).
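Not the actual query, but to illustrate the kind of thing it was, here is a hypothetical sketch; the table and field names below (an edit-event table with a user-agent-derived os_family and editing-interface/platform fields) are placeholders, not the real schema that was queried:

```lang=sql
-- Hypothetical sketch: share of desktop-interface wikitext edits made from iOS/Android devices.
-- Table and field names are placeholders, not the actual schema used for the figure above.
SELECT
  SUM(IF(useragent.os_family IN ('iOS', 'Android'), 1, 0)) / COUNT(*) AS mobile_os_share
FROM event.edit              -- placeholder for whichever edit-event table was queried
WHERE year = 2018 AND month = 7
  AND event.platform = 'desktop'
  AND event.editor_interface = 'wikitext';
```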
To add more detail regarding the rationale here: As alluded to above and stated in the code, the 64 bits of entropy from generateRandomSessionId() are sufficient to avoid token collisions with ~99% probability in a sample of 500 million, which is enough for the upcoming page issues A/B test, envisaged to involve over 100 million page tokens per project. That said, we also want to keep this future-proof so that users don't even have to bother with such calculations, and it's not totally inconceivable that it could on occasion be used for larger sets too - e.g. the Virtualpageview table currently contains 7.5 billion link interactions (it doesn't use a token for them and is only an auxiliary schema feeding into the actual aggregate table used for analysis, but it illustrates the possible dimensions). That's a motivation for continuing to use the timestamp as a second source of randomness. However, also adding a second call to generateRandomSessionId() does indeed seem like overkill.
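For reference, the back-of-the-envelope math behind those numbers (standard birthday-problem approximation, using the 500 million and 7.5 billion figures mentioned above):

```lang=latex
P(\text{collision}) \approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{64}}\right)

n = 5 \times 10^8:\quad \frac{n^2}{2 \cdot 2^{64}} \approx \frac{2.5 \times 10^{17}}{3.7 \times 10^{19}} \approx 0.007
\;\Rightarrow\; P \approx 0.7\%\ \text{(i.e. no collision with} \approx 99.3\%\ \text{probability)}

n = 7.5 \times 10^9:\quad \frac{n^2}{2 \cdot 2^{64}} \approx 1.5
\;\Rightarrow\; P \approx 1 - e^{-1.5} \approx 78\%
```

So at Virtualpageview-like scale, a collision would be more likely than not with 64 bits alone, which is the argument for keeping the timestamp as an additional source of randomness.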
Mon, Aug 13
Sun, Aug 12
How would it become invalid? The research questions do not include investigation of any novelty effects, i.e. those that result from the user's initial unfamiliarity with the new design. Instead, we are interested in how the new design would change reader behavior over the long run.
On the contrary, we should actually add a run-up time of 1-2 days to the experiment time of two weeks, to reduce any novelty effects and also to account for caches updating - as we did with the last Popups A/B tests (or tried to, see T178500#3765787).
This information will impact how we set up and run the A/B test. My recommendation was that before running any official A/B test, we might want to validate the data integrity by turning this on on a single wiki, separately from the test.
Sat, Aug 11
Can someone now document the difference between mw.eventLog.pageviewToken() and mw.eventLog.newPageInteractionToken() (how is each calculated, and when, and how long does it persist)?
Fri, Aug 10
The Popups schema for page previews has both a linkInteractionToken and a pageToken. The latter is unique for each pageview, the former for each preview.
Instead of maintaining Readme documentation on dumps.wikimedia.org, we should link back to the corresponding documentation pages on Wikitech, which are more reliable and up to date. This is already done on e.g. https://dumps.wikimedia.org/other/pagecounts-raw/ .
Thu, Aug 9
For context, the code in question comes from the Analytics Engineering team's query to calculate the global version of this data, and I can see strong arguments for keeping this new per-country query consistent with that.
So I understand your remarks are about how the Analytics Engineering team could have approached this differently back in 2015 if the tagging infrastructure had been around already.
could be abstracted to a tag, so that 1) that where clause could be changed to "where tags include 'mobile-pageview'".
Its usefulness as a general tag would be limited though, considering that it only captures app views where the user has opted in to (i.e. not opted out of) data collection, as opposed to the general access_method = 'mobile app'.
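To illustrate what the proposed abstraction would look like in practice - a minimal sketch only, and the 'mobile-pageview' tag is hypothetical (it does not exist today):

```lang=sql
-- Status quo (sketch): each consumer spells out its own access-method filter
SELECT COUNT(*)
FROM wmf.webrequest
WHERE year = 2018 AND month = 8 AND day = 1
  AND is_pageview
  AND access_method IN ('mobile web', 'mobile app');

-- With a (hypothetical) tag added once at refine time, consumers would just check the tags array
SELECT COUNT(*)
FROM wmf.webrequest
WHERE year = 2018 AND month = 8 AND day = 1
  AND array_contains(tags, 'mobile-pageview');
```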
This is what we do to, for example, identify WDQS requests: we tag them when we refine, and subsequent jobs that use that data do not need to do costly regexes. The wdqs tag as an example:
See a similar tag for portal pageviews:
Great! Since I see my name in the task description, I should point out that I haven't yet seen the report myself (not insisting that I need to - just keeping the RACI record straight ;)
Wed, Aug 8
Here is a very rough estimate of the sampling ratio (or bucket sizes) we need in order to answer the research questions.
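(This is not that estimate, but for reference, the standard two-proportion sample size formula such estimates are usually based on; the baseline rate p and minimum detectable difference δ below are placeholder values, not taken from the research questions:)

```lang=latex
n_{\text{per bucket}} \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \cdot 2\,p(1-p)}{\delta^2}

\text{e.g. } \alpha = 0.05,\ 80\%\ \text{power},\ p = 0.10,\ \delta = 0.01:\quad
n \approx \frac{(1.96 + 0.84)^2 \cdot 2 \cdot 0.09}{10^{-4}} \approx 14{,}000\ \text{per bucket}
```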
This is a bit late in the game, but I want to flag that with this new schema we have an opportunity to consider whether we want to ask Analytics Engineering to ingest its data into Druid, in order to potentially make it accessible as a view or dashboard in Superset. As a first step, this would require checking whether the schema's format satisfies these (draft) guidelines.
Tue, Aug 7
Following the discussion above and on Slack, I have put some notes about terminology at https://www.mediawiki.org/wiki/Reading/Web/Quantitative_Testing#Sampling_and_bucketing . Please review.
Mon, Aug 6
Thanks, very helpful! BTW it would be good to also include this definition (or at least a pointer to it) at https://www.mediawiki.org/wiki/Reading/Web/Projects/Mobile_Page_Issues#Proposed_changes , so that people can determine which pages on which projects are going to be affected.
So from your examples above, the French templates (https://fr.wikipedia.org/wiki/Modèle:Méta_bandeau_d%27avertissement) would be excluded; however, the Spanish templates would actually be included (https://es.wikipedia.org/wiki/Arquitectura).
But I didn't see table.ambox in the HTML of https://es.wikipedia.org/wiki/Arquitectura (only e.g. ambox-text)?
We are also parsing the "severity level" of the templates in a few languages: Italian, Spanish, Russian. See here for details.
I don't think it's been explicitly stated, but I think we would want to run this test only for the article namespace, right?
Yes, considering that, per your remark above, the design will only change in the article namespace anyway.
We still need to document more precisely what we are actually counting as "page issues" with this instrumentation, especially since (per recent conversations, see e.g. T200792#4472739) there is now a more pronounced desire to measure things beyond the English Wikipedia. Currently this is only loosely described in the AC:
pages which have page issues on them e.g. ambox templates
Fri, Aug 3
Closing per @dchen
(likewise moved here from the task description - it seems that these are thoughts about the interpretation of the resulting data and suggestions about what to take into account during its analysis, which is always valuable but seems off-topic for this task per se:)
== Short term vs long term impact
Note that we should be cautious about when we run such an experiment, as new headings may arouse curiosity. It is possible that with a new heading, readers are more likely to click it to find out what kind of information they can find inside.
Such novelty effects may be possible, but they don't prevent us from running user interface A/B tests in general either, and because the ratio of repeat readers of the same page is likely rather low, I would expect them to be even less of a problem here.
It might be that rather than the heading, the content or the delivery of that content is a problem and over time the section headings themselves become associated with that content and are less preferred on mobile. The references section for example may be rarely used, not because of the title, but due to the fact that most mobile users know that clicking on an inline reference will show the associated reference.
Good point (I dwelled on it in my Wikimania presentation too), but I fail to see what it has to do with the implementation of the present task.
I would thus not recommend doing this A/B test for sections such as "External links" or "References", but rather for sections where technical words are used, where different language may lead to more accessible content.
(Moving a few things from the task description here into the comments, as they seem to be more discussion contributions than something we are all ready to commit to as part of this task:)
What does "the team" refer to? ;)
thought that this ticket could be resolved, and a new one opened for the followup work. Does that seem reasonable to you?
Well, as noted above in T184227#3936244, the original plan was a different one. But considering that half a year later, neither the URL format requested above on January 16 (necessary to extend the result beyond enwiki) nor the request to repeat the analysis following further SVG optimization work has materialized, I think it's reasonable to close this task now as done, with the option to open a new one once either of these two happens. Especially since, within the Product Analytics team's Phab review processes as they are currently set up, the presence of such open tickets seems to cause significant distraction, with several staff (including you and me right now) repeatedly spending time just for task management purposes.
Count template transclusions
Identify all templates that can render ambox class (Special:Search can help here)
For each template, check the corresponding template count at https://tools.wmflabs.org/templatecount/index.php?lang=en&namespace=10&name=Ambox#bottom
Note: this approach would lead to duplicates where more than one template is used on the same page.
Actually I think that this tool may count nested transclusions too - it appears that it simply executes the following query:
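(The tool's actual query isn't reproduced here; as a rough guess, a transclusion count of that kind against the templatelinks table might look like the following, which would indeed include nested transclusions:)

```lang=sql
-- Guess at the kind of query templatecount runs (not verified against its source):
-- number of pages transcluding Template:Ambox, directly or via nested templates
SELECT COUNT(*)
FROM templatelinks
WHERE tl_namespace = 10
  AND tl_title = 'Ambox';
```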
As discussed at T200792#4475856, there seems to be some confusion here between two related but separate questions:
- the ratio of *pages* with issues among all pages
- the ratio of *pageviews* to pages with issues, among all pageviews
For example, suppose a wiki has two pages, one with issues and one without. The first page gets 8 views and the second page gets 2 views. Then the answer to the first question would be "50%", and the answer to the second question would be "80%".
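In query form, against a hypothetical per-page table with an issues flag and a view count, the two questions would correspond to (sketch only):

```lang=sql
-- Hypothetical table: page_stats(page_id, has_issues BOOLEAN, views BIGINT)

-- 1) ratio of *pages* with issues among all pages
SELECT SUM(IF(has_issues, 1, 0)) / COUNT(*) AS page_ratio FROM page_stats;

-- 2) ratio of *pageviews* to pages with issues, among all pageviews
SELECT SUM(IF(has_issues, views, 0)) / SUM(views) AS pageview_ratio FROM page_stats;
```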
Yes (albeit after using "sampling rates" earlier for page previews too), see e.g. https://meta.wikimedia.org/wiki/Schema_talk:Popups . But the "bucket size" parlance used there (e.g. "0.04:0.04:0.92") does not match the usage in this task ("Bucketing will be 50%").
How is that a problem?
It might be possible that pages with issues are less read, so I would not rely on anything page view based to count this.
The question is actually about pageviews, so it seems kind of odd to "not rely on anything page view based to count this".
I'd thus advise against using EventLogging for this, as there are better ways to do it. I have set up some ideas here - T201123 - but I strongly advise we avoid doing this.
I assume that this advice is based on a misunderstanding (see above), and that T201123 is instead about the related but quite different question of the ratio of pages with issues, rather than the ratio of pageviews to pages with issues, which is the subject of the research question in this task.
Thu, Aug 2
I guess that's mainly a product question (e.g. do we intend to put work into improving maintenance templates on sister projects, considering that their readership is orders of magnitude below that of Wikipedia? If we expect to have the resources, great, but if not, including them here is more of a nice-to-have).
I don't quite recall that outcome. Does that refer to the size of the sample (i.e. how many sessions will be included in the experiment to send data, either as part of the test group - shown the new design - or the control group - shown the old design), which is also what @phuedx refers to above? Or to the bucketing into test vs. control within that sample, which per usual practice for A/B tests should be 50:50? (I think we have been through this kind of terminology confusion before...)
Just want to flag that the following is a new research question that we added per the discussion in the meeting on Tuesday:
What is the approximate percentage of (mobile) pageviews to pages with issues on (select languages)?
(the rest of the questions come from T191532, dating back to April)
Is this notice still needed?
Wed, Aug 1
Some context: On the web and (since recently, with T192779 and this patch) in the Android app, we do actually send the full URL of the referrer as part of the request for the actual article being opened. For the web, that's of course simply part of the HTTP standard. It is then stored temporarily as part of the webrequest table (in the referer field), and processed further to e.g. generate the referer_class field in the pageview_hourly table, or the external referrals data exposed in this dashboard or the public Clickstream datasets.
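As a minimal illustration of how that processed field is then used (a sketch, assuming the referer_class, agent_type and view_count fields of wmf.pageview_hourly):

```lang=sql
-- Pageviews by referrer class for one example day (sketch)
SELECT referer_class, SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE year = 2018 AND month = 8 AND day = 1
  AND agent_type = 'user'
GROUP BY referer_class
ORDER BY views DESC;
```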
Tue, Jul 31
Cool - I think it will be rather easy to scale this after the first article, but the instrumentation and infrastructure would need to be in place already.
App searches seem implausibly high compared to desktop or mobile web. In your first screenshot, it looks like there are 2-4x more searches from the app, but the web's traffic overall looks like about 100x that of the app, so I'm finding this hard to believe. Can you clarify what's going on here?
Agree that it would be good to know the answer to this question. (Also, are the data source and queries used for this analysis documented somewhere? Does it have to do with the dataset mentioned at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Cirrus ?)
Actually T116515 had been about images in the first place (from Commons) - it somehow morphed into a task about videos. I have (re-)filed T199407: Enable embedding of images from Wikimedia Commons as a subtask of this one.
Wed, Jul 25
What, if any, impact did this have on browsing behavior? Similar question to T191354
That task was about the impact of the video campaign on search, not the impact of the Hindi main page changes. For the latter, there is T191132.
Sat, Jul 21
Fri, Jul 20
@Tbayer a question has been raised about the values for editCount being a string like "1-4 edits" instead of numbers like 0, 5, 100 etc. Would it make a difference if these values were numbers, in terms of ease of parsing or analyzing?
I now understand that this task has been worked on and a report may already exist, but since there had been no activity here since April, I ran a couple of queries myself earlier this month to be safe, seeing that the data was going to expire. I'm leaving some raw results below, for the record and just in case. Note that @chelsyx had already posted results about pageviews as part of the separate "understanding search" task: T191354#4176867
Wed, Jul 18
Jul 13 2018
Jul 12 2018
This was a successful internship where @Zareenf did very valuable work in various areas. We should have updated this ticket with details about that work long ago, listing the various tasks that got done - but we haven't gotten around to that in a while, so I'm closing this ticket for now to reflect the conclusion of the internship.
We got a lot of good information here (thanks again, also for sharing the SWAP notebook!)
But the three questions spelled out in the task description are still not marked as resolved. If I have overlooked the answers, please feel free to point that out and tick the corresponding checkboxes (preferably also linking the answers in the task description so folks can find them easily).
BTW, the example video above in T116515#3309596 doesn't work for me right now, in either Chromium ("Requests to the server have been blocked by an extension." - even when I have no browser extension enabled) or Firefox.
This task was originally mainly about images, but then morphed into a task about videos and was closed as such. I have split the image part off into T199407.
BTW, I started a documentation page about this dataset at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Virtualpageview_hourly .
A quick (totally non-exhaustive) check of the current data:
A schema named MobileWebSettings doesn't seem to exist, did you mean something else?
Unless you or @Tbayer have used this schema in the last 6 months, I propose we remove this code from mobile.
Thanks for the ping! I don't recall having used it so far. That said, an EL schema doesn't have to be used regularly to be valuable. It's often very useful to be able to resort to an existing schema for a new data question instead of having to build a new instrumentation. How likely is it that the instrumentation is broken at this point?
Jul 11 2018
PS regarding the task description changes in T191532#4395096 about the logging of issue type and severity level:
These were based on the standup conversations earlier that day, where there was a sense that the issue type would be hard to get, but that the severity level is already available in a more accessible form.
Jul 10 2018
This happened again about an hour ago:
Jul 9 2018
To clarify, I assume that this is separate from the general HTTP referrer header that is already recorded in the referer field in the webrequest data.
Jul 6 2018
Thanks. I'm now trying out excluding all IE traffic from these countries (Iran, Pakistan, Afghanistan).
Excluding just IE11 would not seem sufficient, considering that (as you already indicated above in T193578#4242326) the traffic formerly classified as IE7 now falls into several different versions (e.g. besides IE11, also a substantial number for IE8, etc.):
Would this affect https://discovery-dev.wmflabs.org/external/ too?
After some email discussion about possible metrics options, we settled on the number of mobile pageviews (i.e. views of Wikimedia sites using either our mobile web interface or our mobile apps) by country for 2016 and 2017, which I sent over last month as CSV files - attached here too for the record. These use ISO two-letter country codes, based on the Maxmind geoIP database. "--" means unknown.
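For reproducibility, the kind of query behind those CSVs looks roughly like the following - a sketch rather than the exact query that was run, assuming the wmf.pageview_hourly table:

```lang=sql
-- Mobile (web + apps) pageviews per country for 2016; repeat with year = 2017 (sketch)
SELECT country_code, SUM(view_count) AS mobile_pageviews
FROM wmf.pageview_hourly
WHERE year = 2016
  AND agent_type = 'user'
  AND access_method IN ('mobile web', 'mobile app')
GROUP BY country_code
ORDER BY mobile_pageviews DESC;
```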