Thu, Sep 21
That sounds reasonable, yes. Just a heads-up that I will need to find some time to do these checks (ensuring we have all the necessary queries adapted from MariaDB and did not accidentally lose data in transition), which likely won't happen before the end of the month. (CCing @ovasileva FYI)
Update: While we're still sorting out the session-based metrics (also now in the light of T175918), here are histograms for per-pageview number of cards viewed, for the three largest wikis in the test:
The requester appears to have been quite active on hewiki for over a year, so I think it's fine to grant him/her access to the restricted task.
Not the same thing, but one may want to be aware of T139810: RFC: Overhaul the CheckUser extension too.
I left a link on that talk page to presentation slides I did about this topic years ago, which may still be of some use.
Wed, Sep 20
Verified that onBeforePrint is now sent under Firefox too. I assume that everything else still works after the update; it seems due diligence has been done for now.
@Deskana Thanks for looking into it! Strangely, it works fine for me too now. But I did reproduce it several times before filing this bug, in Firefox and Chromium under Ubuntu (up-to-date versions in each case), and on two consecutive days. (I should have mentioned that the quoted error message is from Chromium; IIRC I did not see an equivalent in the web console in Firefox, even though VE froze in the same way there.)
Works for me now in the most recent TestFlight version: 5.7.0 (1228)
(This comes from the general apps session metrics data, and the plot goes back to December 2015 - need to fix the x-axis.)
Regarding the first question, I made an initial plot of the median session length with (recent) rollout dates:
(for the record, we decided afterwards that this was secondary to investigating the aspects mentioned in T174396#3598749 ; I'll still be happy to give the Hive side a look later if needed)
Tue, Sep 19
You obviously misunderstood what "that option" referred to: The ability to correct these anomalies by using T141506#2582628 in a custom Hive query (please read the full task description including the preceding sentence). And, as also already mentioned in the task, even the warning that the data is faulty is missing in prominent places, leading users of this data astray.
Mon, Sep 18
Thanks! I was able to download one SWAP notebook successfully already.
Sat, Sep 16
Just a quick note that I have been able to reproduce this in Chrome/Chromium 60 on Ubuntu Linux:
Fri, Sep 15
Wed, Sep 13
This task is still open after more than a year, and continues to affect pageview data analysis. I have filed T175870 to remedy that.
Below is a check whether sessions within the sample are correctly bucketed with 50% probability into either the enabled or disabled condition. These numbers look sound per se. (We expect some slight deviation because of users manually disabling and enabling the feature, which however appears to happen rarely enough - generally in less than 0.01% of sessions, per the second query below.) - However, it's quite odd in combination with the corresponding result for pageviews (T175377#3598231 ).
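A minimal sketch of how such a bucketing check could be run, assuming only the two per-condition session counts are at hand. The function names are hypothetical and nothing here comes from the queries referenced in this comment; it uses a normal approximation to the binomial, which is not necessarily the method actually applied.

```python
import math


def bucketing_z_score(enabled: int, disabled: int) -> float:
    """z-score of the 'enabled' count under a fair 50/50 split.

    Uses the normal approximation to the binomial distribution,
    which is reasonable for large session counts.
    """
    n = enabled + disabled
    if n == 0:
        raise ValueError("no sessions to check")
    expected = n / 2
    std_dev = math.sqrt(n) / 2  # sqrt(n * 0.5 * 0.5)
    return (enabled - expected) / std_dev


def two_sided_p_value(z: float) -> float:
    """Two-sided tail probability of a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With made-up counts, bucketing_z_score(5100, 4900) gives 2.0. A small p-value from two_sided_p_value would suggest that the split deviates from 50% by more than chance alone explains.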
Mon, Sep 11
Sat, Sep 9
@chelsyx and I talked a bit about this today and she gave me some additional explanations; I will try to check the queries next week.
This became obsolete shortly afterwards per T170018
See https://www.mediawiki.org/wiki/Wikimedia_Apps/Short_descriptions/Research and notebook linked there
Closing this now, as the main requests were all done back in July for the occasion at which they were needed. I haven't had time to tackle the bonus task (global numbers) since; it would be more complicated and would probably justify a separate task.
Fri, Sep 8
I think we can be pragmatic about this and choose whichever is easier to implement. (It doesn't seem to be a very important product question.) If we go with the second option and log multiple events during one pageview, we will be able to connect them using pageTitle and namespaceId.
Thanks for the offer! I would have preferred to do the analysis in MySQL/MariaDB as usual (the query times don't seem too bad so far BTW), and moving to Hive will involve extra work for me in rewriting all the previously used queries. But if the problems are severe, I guess that's the best option at this point, also considering that it worked well in that recent example (T172322#3526095 ). Do note though that we will need all the fields. Also, we should still keep the MySQL table (with the existing purging policy) in case we need to fall back to it.
Thu, Sep 7
Thanks @bearND, that's good to know! But I was also thinking about desktop, and about edits to templates and the possible delay for their result being reflected in pages where these templates are transcluded. I seem to recall long delays in that situation - up to a week or more in the dark ages 5 or more years ago - but was curious about the situation today.
On desktop though, editors can remedy the problem themselves by doing a manual purge. So I guess another question would be if we could enable such manual purges for the RESTBase endpoints that are at issue in this task.
Wed, Sep 6
According to today's blog post, this project has now concluded. I assume this is the outcome? https://www.mediawiki.org/wiki/Citoid/Creating_Zotero_translators
Thanks for your work on this important topic!
Tried it out on PAWS and it works great, thanks! (This will be especially useful for archiving notebooks on our own sites - Commons - apart from/instead of third-party ones like GitHub.)
Thanks @elukey - right now, while the A/B test is still running, it's not too urgent to be able to check the latest data in real-time (although it would be great to get our hands on the Sep 1 data soon, to be able to assess the effect of a bug fix T172291#3572535).
Tue, Sep 5
Edited the task description to outline the reported issue more concretely.
Hm, my takeaway from @elukey's linked comment had actually been that we expect the experiment to fit in the available space with the planned length and event rate.
Now at 10am on August 31:
This bug still exists in production, and presumably occurs for all non-mainspace pages (e.g. https://en.wikipedia.org/wiki/Wikipedia:Verifiability ).
Mon, Sep 4
Looks like we still have replication problems @Ottomata :(
Sun, Sep 3
Reported again on the German Wikipedia (for the Android app); I verified this in the case of https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia (village pump page with a complex template on top).
Fri, Sep 1
Thanks! PS, BTW: The above queries were run on analytics-store. The copy on s1-analytics-slave replicated for 21 more minutes:
Heads-up - there seems to be a major issue with the data right now: T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana
(Not tagging this with Readers-Web-Backlog for now because it doesn't seem to be an issue with the instrumentation at this point, but perhaps it's worth putting it into the "Tracking" column there.)
Thu, Aug 31
I had been looking into this from various angles before Wikimania, including reading through the intricate investigations at T143928 (and the bugs that were uncovered, also regarding the existing per-domain uniques) to understand how we ended up with the final version of the queries, reading through the new documentation (fixing various things there myself and leaving some notes on the talk page), and doing some plausibility checks on the data itself. The monthly numbers for Wikipedia in particular look roughly plausible and consistent with the lower bound estimates we have been using previously (derived from the per-domain data), so we have started quoting them as preliminary data for public purposes. I noticed a bug affecting the data for some sister sites (not Wikipedia), which I just filed as T174640 .
I still plan to do some further consistency checks before closing this task. In particular, check that for all project families, countries and months/days,
Regarding prioritization: While this is a clear bug, it does not affect the (from the Readers team's perspective) most important part of the global uniques data, i.e. the numbers for Wikipedia, and on the traffic side I guess the downsides of including some unnecessary cookies for views to a number of smaller projects can be tolerated for some time.
Wed, Aug 30
@mforns The estimate was for around 100 events/second on average, and the new sampling rate was chosen based on the event rate from the previous instrumentation, where the peak hourly rate on weekdays was usually achieved between 13-16h UTC (and was about 4-6 times higher than the daily low). BTW most of the conversation relevant to this launch is now happening at T172291 instead.
Yes, as @phuedx and @ovasileva note, this instrumentation is not meant to run indefinitely.
As a reminder, the disk space issue has already been discussed extensively at T172322 (@Marostegui, I think you were CCed there at some point but the conversation was mainly handled by other people on the Ops side), which resulted in an assessment that this test can go ahead (T172322#3533459 ; also after the Readers team had put in some extra work to help free up space by dropping another table). It looks like the "Notify DBA and Analytics Engineering when launching" part of the present task was misunderstood a bit above as launching another assessment process essentially duplicating T172322. Rather, the Ops suggestion at T172322#3533459 had been to provide a notification so that disk space use can be monitored after the launch (also by ourselves - that's why the Grafana link is in the task description).
Tue, Aug 29
For the record, it looks like this is now documented at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Run_long_queries_in_a_screen_session_and_in_the_nice_queue .
Aug 25 2017
PS (after discussing with @JKatzWMF ): That means that it is now fine from everyone's perspective to drop log.MobileWebUIClickTracking_10742159_15423246, assuming that we retain the wmf.mobilewebuiclicktracking_10742159_15423246 version on Hive. (And regarding T172322#3537746: yes, moving it to a separate archive database instead of wmf sounds like a good idea.)
The ID numbers are revision IDs of the relevant schema. I'm not sure what it means when there are two numbers in the name.
Aug 24 2017
Thanks @elukey! I have made a note for the web team to do this as part of the experiment rollout (currently envisaged for early next week).
How is this task a duplicate of T168848? (The task description there only says "Strip balanced parentheticals", without identifying specific parenthetical elements.)
Aug 23 2017
Aug 22 2017
CCing @diego and @RobH who (judging from IRC scrollback) grappled quite a bit too with the existing onboarding process the other day. Just in case they have useful input from recent memory on the shortcomings of the current documentation.