See also in general T207171 (it looks like the query given there could be reused here with minor modifications), also for some data quality caveats by @Nuria - personally I wouldn't expect these to be a big issue for this particular use case, but it's worth being aware.
Apr 9 2019
Is this also an issue for the topviews that are shown per language?
Yes, it is an issue with any top list. Now, topviews has a "spam" list, so titles that are known to be spammy traffic are removed. Those are reported by users, and while the list is great to have, it only removes the major offenders.
Apr 8 2019
Hi @Wargo, this is the right place for this specific form.
Second, even if you might not use the form, doesn't mean that it's not used frequently.
True, but conversely, it also doesn't mean that it is used frequently enough to justify enlarging it at the expense of other page elements ;)
We're trying to get more data behind the usage of those forms and their elements as part of the Advanced Mobile Contributions project.
That's a good idea. Does someone happen to have a link to the corresponding data analysis task?
There is T214935, but measuring the usage of those forms has not been in the scope of that ticket (it would likely require new instrumentation).
And in fact, the results there so far point in a very different direction. E.g. in T214935#4917889 we found that for logged-in users on enwiki, only about 6% of clicks on history pages go to "other action=history views". This includes both submissions of this date/tag search form and all clicks to the "older 50" etc. links located at the top and bottom of the revision list.
I.e. usage of the form that has been enlarged here is probably even lower than 6% in that situation. On the other hand, clicks on diff links made up 43%, and links to old revisions 10%. So we already know that the revision list - which has been pushed further down the page by this change - gets more than half of the usage. (It is probably much more than 50%, because it contains other kinds of links - user pages, user talk pages, contributions - which also occur outside the list, so we can't easily determine whether a click on them came from inside or outside the list.)
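To make the arithmetic behind "more than half" explicit, here is a minimal sketch using only the percentages quoted above. The dictionary keys are just illustrative labels, not field names from any schema.

```python
# Shares of clicks on enwiki history pages for logged-in users,
# as quoted above (percent of all clicks on history pages).
click_shares = {
    "diff links": 43,
    "old revisions": 10,
    "other action=history views": 6,  # includes form submissions AND "older 50" etc.
}

# Lower bound for the revision list: only link types that occur
# exclusively inside the list are counted here.
revision_list_lower_bound = click_shares["diff links"] + click_shares["old revisions"]

# Upper bound for the search form: it is only one part of the 6% bucket.
form_upper_bound = click_shares["other action=history views"]

print(f"revision list: at least {revision_list_lower_bound}%")
print(f"date/tag search form: at most {form_upper_bound}%")
```

Both figures are bounds rather than estimates: the list also contains user page, talk page and contributions links that are not counted here.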
The need outlined in this task still exists; actually it has since become more urgent, as Audiences teams have been making those metrics choices.
That link looks great overall. There seems to be a one-day discrepancy though between the dates given on the x-axis and in the mouseover. Also, I'm having trouble accessing this view (the chart never materializes, the spinner keeps spinning even after waiting for 5-10 minutes - tried both in Firefox and Chromium). Perhaps a general Turnilo issue?
I think we're done here but want to check with @Tbayer to see if there's anything else we'd like to check in order to resolve this
T215597: QA edit tags for moderation actions is still open; should it be considered a subtask of this?
Apr 6 2019
Two further plausibility checks look fine as well:
- The distribution of special pages viewed seems roughly plausible (even if we perhaps don't anticipate MassMessage or CX to be widely used under the AMC interface).
- Almost all requests are logged-in.
@elukey Once this is puppetized, how much would it continue to depend on particular users retaining continued access?
Apr 5 2019
Filed a whitelist patch. Rather than the Print schema I mentioned above, the ReadingDepth schema turned out to be a better example to follow here (note that in this case we don't track session IDs to begin with).
The tag is present in the webrequest table as "b%2Camc". An unencoded "b,amc" would have been slightly more aesthetically pleasing, but it's not a dealbreaker ;) It occurs on exactly the three projects where we expect it after the recent deployment. Will do some further checks.
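For anyone querying this later: "b%2Camc" is simply the percent-encoded form of "b,amc" (%2C is the comma). A minimal sketch of converting between the two with Python's standard library, in case a query or notebook needs to match either form. The helper name normalize_tag is hypothetical.

```python
from urllib.parse import quote, unquote

def normalize_tag(value):
    """Decode a percent-encoded tag value so both spellings compare equal.

    Hypothetical helper for analysis code; not part of any pipeline.
    """
    return unquote(value)

# The comma is not in quote()'s always-safe set, so it gets encoded.
encoded = quote("b,amc", safe="")
decoded = normalize_tag("b%2Camc")

print(encoded)
print(decoded)
```

Decoding before comparing means the analysis keeps working should the stored value ever switch to the unencoded form.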
Thanks! To me this looks good to go now, except perhaps that the x-axis coordinates seem a bit weird (each day appears twice - "Wed 16 Wed 16'", with the second "Wed 16" actually located at the start of Jan 17).
The schema has three different events (actions), and several fields. For QA, all these need to be checked. I have updated the task description with steps that actually achieve this.
I can confirm I have verified all of these.
OK thanks! Feel free to mark the new QA steps as passed if we have indeed verified the fields for all three actions.
"contains wprov" is not sufficient, it needs to use the parameter value we picked for this, in the format required by Varnish.
The value is also there. This I can confirm
I'm not quite sure what you meant by "check if it shows up in a page view table" - wprov parameters don't show up there, only in webrequest
Webrequest is what I'm referring to. My understanding is that there is no webrequest table for the beta cluster so we won't be able to test this unless the code is enabled in production.
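Since webrequest can't be checked on the beta cluster, the URL formatting itself can still be verified directly. A minimal sketch of the difference between "contains wprov" and "has the agreed wprov value" follows. The value "example1" is a placeholder and NOT the value picked for this feature; substitute the one from the task description.

```python
from urllib.parse import urlsplit, parse_qs

def wprov_value(url):
    """Return the URL's wprov query parameter, or None if absent or repeated."""
    values = parse_qs(urlsplit(url).query).get("wprov", [])
    return values[0] if len(values) == 1 else None

def has_expected_wprov(url, expected):
    """True only if wprov is present exactly once with the expected value."""
    return wprov_value(url) == expected

# Placeholder value for illustration only (an assumption, see above).
EXPECTED = "example1"

shared_url = "https://en.m.wikipedia.org/wiki/Example?wprov=example1"
print(has_expected_wprov(shared_url, EXPECTED))
```

A plain substring check such as `"wprov" in url` would also pass for a wrong or empty value, which is why the parsed parameter is compared instead.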
The contributions icon is not recognisable to me to be honest
and inconsistent with desktop which uses a black silhouette.
That black silhouette icon on desktop indicates your own user links (user page, notifications, talk page... ), not contributions of other users.
@CKoerner_WMF thanks for pointing that out.
@Tbayer is it possible for us to find out: for a logged-in user on desktop, from a Userpage are they more likely to navigate to History or to Contributions? @Jdlrobson makes a good point about consistency, so I'm not necessarily suggesting that we should base the design on the data entirely, but would be good to know if possible.
I've QAed and verified events are shown on share clicks.
The schema has three different events (actions), and several fields. For QA, all these need to be checked. I have updated the task description with steps that actually achieve this.
As for click through links, I can verify that the URL shared contains wprov - is that enough to call this done?
"contains wprov" is not sufficient, it needs to use the parameter value we picked for this, in the format required by Varnish. (On the other hand, I'm not quite sure what you meant by "check if it shows up in a page view table" - wprov parameters don't show up there, only in webrequest. In any case we don't need to debug the Varnish / refinery pipeline here, just make sure we format the URL as specified in its documentation.) I have updated the task description in that regard too.
Apr 4 2019
Great - the capsule dimensions look good to me. It doesn't yet seem possible to switch to a daily time series, perhaps that is an artifact of the short test period? (The dataset seems to contain data from both Jan 14 and Jan 15. But splitting by time and selecting 1D granularity results in a chart consisting just of one dot, for Jan 15.)
@Edtadros Sure, thanks for checking! I have edited the task description in that regard, let me know if this is helpful.
I also took the opportunity to rewrite the rest somewhat, adding more explanatory links and nuance.
Apr 3 2019
@pmiazga I have added the mandatory schema documentation to the talk page (feel free to rope in other maintainers, or fill out the project name): https://meta.wikimedia.org/wiki/Schema_talk:MobileWebShareButton
It also needs a whitelisting decision. I suggest following the example of Schema:Print here too (code).
Thanks @fdans - we'll also need at least some of the standard fields from the event capsule, as in the case of previous EventLogging ingestions (e.g. the aforementioned T202751, where these had been understood to be included without being listed explicitly in the task description. But I should have done that here for clarity, will do so now).
Apr 2 2019
Apr 1 2019
@elukey Sure, that totally makes sense! The end of January estimate from T178802#4647106 turned out a bit optimistic (see again our internal timeline document which I have been trying to keep up to date as information became available to me), but as of a few weeks ago this now indeed looks completed for the foreseeable future. Please remove the bits. What might the turnaround time be to reinstate them if needed?
Mar 29 2019
☝️ If necessary, we could break out migrating from using the unload event to using the pagehide event into another task.
Mar 26 2019
BTW, it may be worth including references to the Wikistats definitions (https://meta.wikimedia.org/wiki/Research:Wikistats_metrics ) which seem to have several parallels (e.g. https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Active_editors ) but perhaps with differences.
- Can lifecycleactiveLength be lifecycleActiveLength?
Sounds good, I renamed it in the task description.
- It's my understanding that this change is anticipated to be supported by Chrome only initially. I think that means we have to maintain the existing implementation but I wasn't quite clear from the meeting discussion.
Yes, that's why the task description said "in addition to visibleLength" (which will remain based on the Visibility API).
For the record: Information about how to work with the new setup was added to https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas .
Mar 25 2019
Mar 24 2019
Just to clarify for casual readers: the dumps are currently accessible. (I used them successfully in this notebook earlier this month, thanks @Chicocvenancio for fixing this earlier!) I understand this ticket is now about implementing this in a more future-proof manner - should the task description be updated?
Mar 23 2019
By looking at some of this data I can see that web crawler events are getting into hive but not into mysql (that would be something for us to fix).
Does this (i.e. that T67508 doesn't yet work for the Hive data) apply to all EL schemas? In that case it would seem a valuable addition to the documentation at https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Incompatibilities_with_the_MariaDB_setup .
Mar 22 2019
Results for the third question about the most popular non-mainspace pages for logged-in users, expanding on the enwiki results of T198218#4600385 , are now posted at https://www.mediawiki.org/wiki/Reading/Web/Advanced_mobile_contributions/Special_pages_usage#Top_non-mainspace_pages
Note that as before, this is grouped by (and linking to) page roots, e.g. https://es.wikipedia.org/wiki/Wikipedia:Consultas_de_borrado/Cloud9_(League_of_Legends) counts for https://es.wikipedia.org/wiki/Wikipedia:Consultas_de_borrado . For some page roots, a corresponding page may not exist (e.g. there is https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/-ati but no https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic ).
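For clarity, the grouping by page root can be sketched as follows. This is a simplified illustration and not the query actually used: it assumes the root is everything before the first "/" in the title, and the view counts are made up.

```python
from collections import Counter

def page_root(title):
    """Return the part of a page title before the first '/' (the page root)."""
    return title.split("/", 1)[0]

def views_by_root(views_by_title):
    """Sum view counts per page root, given a mapping of title -> views."""
    totals = Counter()
    for title, views in views_by_title.items():
        totals[page_root(title)] += views
    return totals

# Made-up counts, for illustration only.
example_views = {
    "Wikipedia:Consultas_de_borrado/Cloud9_(League_of_Legends)": 3,
    "Wikipedia:Consultas_de_borrado": 5,
    "Reconstruction:Proto-Slavic/-ati": 2,
}
print(views_by_root(example_views).most_common())
```

Note that the root "Reconstruction:Proto-Slavic" gets a total here even though no page of that title exists, which is the caveat mentioned above.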
Mar 21 2019
Results for the second question, expanding the earlier result for the top 50 special pages to 14 non-enwiki projects, are now posted on this wiki page (consisting of 15 x 50 = 750 numbers, this information is a bit unwieldy and would not fit well into a table here on Phabricator).
Note though that there seem to be some general concerns about the data quality of the PrefUpdate schema, regarding duplicate events. I have filed T218835 for this. Right now it seems that this problem could be limited to a small number of other preferences (i.e. not the AMC one we're relying on here), but it seems worth checking this later after a larger number of signups have occurred.
Mar 20 2019
@Neil_P._Quinn_WMF Are you going to take care of posting the entire slide deck on the Audiences page again?
@Edtadros and I worked through testing the 5th acceptance criterion together during our 1:1 today.
Here are the server-side EventLogging events that I captured while opting in and out of AMC mode on http://reading-web-staging.wmflabs.org/wiki/Special:MobileOptions:
To the "and compatible" part of the AC: note well the "clientValidated": true in the events above.
Thanks! This looks good enough for now, although it still seems worthwhile to run a query later - as soon as this is live in production - to verify that these events show up in the PrefUpdate EL table in Hive or MariaDB. (IIRC events from reading-web-staging are not logged there, correct?)
Mar 19 2019
Thanks for documenting the sampling on the talk page! Does "0.5 of all clicks ... based on session id" mean 0.5% of sessions or half of them? The low event rate (roughly 1 event/sec on average) would be quite surprising in the former case, considering that we have about 3500 mobile pageviews per second.
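To spell out what each reading would imply, here is a back-of-the-envelope sketch using only the two figures above (roughly 1 event/sec observed, about 3500 mobile pageviews/sec). It assumes that sampled sessions are representative of all sessions, which may not hold exactly.

```python
def implied_unsampled_rate(observed_events_per_sec, sampling_fraction):
    """Scale an observed (sampled) event rate up to the full population."""
    return observed_events_per_sec / sampling_fraction

OBSERVED_EVENTS_PER_SEC = 1.0     # rough average quoted above
MOBILE_PAGEVIEWS_PER_SEC = 3500   # rough figure quoted above

for label, fraction in [("0.5% of sessions", 0.005), ("half of sessions", 0.5)]:
    clicks_per_sec = implied_unsampled_rate(OBSERVED_EVENTS_PER_SEC, fraction)
    clicks_per_pageview = clicks_per_sec / MOBILE_PAGEVIEWS_PER_SEC
    print(f"{label}: ~{clicks_per_sec:.0f} clicks/sec overall, "
          f"~{clicks_per_pageview:.2%} of pageviews")
```

The two readings differ by a factor of 100 in the implied overall click rate, so the answer matters a lot for interpreting the data.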
Mar 16 2019
- We announce on wikitech-l, wikimedia-l, and the technical village pump that the tag will be renamed.
I don't think this is Wikimedia-l material. But a notice in Tech News would be a good idea.
What is the problem statement here? Distinguishing old data from new data or updating how it appears in the interface? (the latter is easy but the former is a big undertaking and I want to check it's worth all that effort - from what I remember, when apps started using the tag, web didn't change their tag before for this reason).
I understand it's the latter, although from the data analysis perspective it seems preferable to also rename it in the tables (it would be a permanent source of confusion if the tag's name there differs from its English name in the interface).
Mar 15 2019
@Anomie this is concerned with eventbus data that is real time; most of the data wanted here is less than a couple of months old, as information on whether edits were reverts exists in other datasets for data older than that. So (it seems) that having "rollback" or "undo" tags is actually a good measure of whether a revert has happened. In which case adding tags to this schema should be sufficient:
But I think there are several reasons why people have not been using them for reverts and instead relied on content-based revert detection, e.g. the fact that edits tagged "undo" might not actually be reverts because the user can modify the content before saving.
Another possibility is that the "rollback" and "undo" tags were only added a little over a year ago, so things written before that time wouldn't have been able to use them.
But before that, researchers and analysts were able to instead use the characteristic strings those actions leave in the edit summary (certainly a bit less reliable and convenient, but e.g. the Kittur et al. paper mentioned in https://meta.wikimedia.org/wiki/Research:Revert did something like this in 2007 already).
It also depends on your definition of "revert", whether you count those edited undos or the undoing of a revision older than the most recent one while keeping the changes from later revisions.
Sure. There's a good overview in https://meta.wikimedia.org/wiki/Research:Revert , I understand this task is about what is called "identity revert" there because that is what is most practical and already implemented in mwreverts and in mediawiki_history.
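To illustrate the difference between tag-based and content-based detection, here is a minimal sketch of identity revert detection as I understand the definition: a revision counts as a revert if its content is byte-identical to an earlier revision, with at least one different revision in between. This is a simplified illustration, not the mwreverts or mediawiki_history implementation; the function names and the radius default of 15 are assumptions made for the example.

```python
import hashlib

def checksum(text):
    """SHA-1 hex digest of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def find_identity_reverts(revision_texts, radius=15):
    """Return (reverting_index, reverted_to_index) pairs.

    revision_texts: revision contents of one page, in chronological order.
    radius: how many revisions back to look for identical content
            (hypothetical default, chosen for this example).
    """
    sums = [checksum(t) for t in revision_texts]
    reverts = []
    for i, current in enumerate(sums):
        earliest = max(0, i - radius)
        # Walk backwards to the most recent earlier revision with the same content.
        for j in range(i - 1, earliest - 1, -1):
            if sums[j] == current:
                if j < i - 1:  # something in between was actually undone
                    reverts.append((i, j))
                break
    return reverts

print(find_identity_reverts(["A", "A vandalized", "A"]))
print(find_identity_reverts(["A", "A vandalized", "A plus a tweak"]))
```

The second call shows the case mentioned above: an "undo" where the user modifies the content before saving does not restore an earlier state exactly, so it is not an identity revert even though it would carry the "undo" tag.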
Side note: I still support the idea of making this a configurable setting. For Wikipedia it'd be great to get data on what percentage of articles use <h3>, <h4>, etc. — I've reached out to @Tbayer for suggestions on how we might get that data.
There are some old (2015) stats on H5 & H6 usage at Enwiki, at T72004#1407800 which halfak kindly generated for me back then (it was an overnight query on the stats boxes)
Thanks! Do you happen to still have the underlying database query and could post it here so we could re-run it?
In the meantime, here are some dump-based numbers for two Wikipedias (I'll try to run this for enwiki too), for mainspace pages:
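For reference, a count like this can be approximated from wikitext with a simple pattern match. The sketch below is a simplified illustration and not necessarily how the numbers here were produced: it treats a line of the form "=== Title ===" as a level-3 heading (rendered as <h3>) and ignores complications such as nowiki blocks, HTML comments, and headings generated by templates. The function names are hypothetical.

```python
import re
from collections import Counter

# A heading line: 1-6 equals signs, the title, then the same run of equals signs.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def heading_levels(wikitext):
    """Count headings per level (2 for '== x ==', 3 for '=== x ===', ...)."""
    return Counter(len(m.group(1)) for m in HEADING_RE.finditer(wikitext))

def share_of_pages_with_level(pages, level):
    """Fraction of pages (iterable of wikitext strings) using a heading level."""
    pages = list(pages)
    if not pages:
        return 0.0
    using = sum(1 for text in pages if heading_levels(text)[level] > 0)
    return using / len(pages)

sample = "Intro\n== History ==\ntext\n=== Early years ===\n=== Later ===\n==== Detail ====\n"
print(heading_levels(sample))
print(share_of_pages_with_level([sample, "No headings here"], 3))
```

Applied to the mainspace pages of a dump, share_of_pages_with_level gives the percentage of articles using <h3>, <h4>, etc.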
Mar 14 2019
Another thing to consider may be edit filters. I'm not even sure whether it is possible to include tags in edit filter rules, but in any case, at least on enwiki and dewiki, no filters seem to rely on the mobile web edit tag (admin access to view results).
PS: just asked Neil in person and he confirmed it wouldn't be a concern.
The only entry I found is the wikimedia-research/2018-10-Design-Research-mobile-research-samples GitHub repo.
Another data point: Nobody seems to have been using it on Quarry so far.
Resolved after updating (!pip install --upgrade git+https://github.com/neilpquinn/wmfdata.git). I guess this can be closed in favor of T216634?
Mar 13 2019
Could we get here (from the UI) whether the user clicked the "revert" button, even? (per @Milimetric 's suggestion) and send that to the hook so the event data also has this information? This would not catch the totality of revisions but a big percentage of them, which, hey, it is a start.
We still need to check at some point that this is correctly processed on Varnish and stored in the webrequest table (where we need it to conduct the analysis). But that will have to wait until the change is in production, because the domain en.m.wikipedia.beta.wmflabs.org is not captured in webrequest. Probably best to create a separate task for that, and close this one?
Mar 12 2019
PS: Keep in mind that the above data is, as stated, about usage of templates named "Ambox". Some wikis generate the "Ambox" class name manually from a differently named template instead (e.g. itwiki, see template source, example article) and will thus be affected/improved by the new design even if they show up with 0% in the table above.
PS: I already added some information about contributors metrics gleaned from the recent Insights presentation deck, but it is incomplete.
@Neil_P._Quinn_WMF I started a page at https://www.mediawiki.org/wiki/Wikimedia_Audiences/Data_dictionary - feel free to add any links there that may already exist.
Started a prototype at https://www.mediawiki.org/wiki/Wikimedia_Audiences/Data_dictionary
Mar 11 2019
Leaving this here since we were talking about it earlier today in this context: