Tbayer (Tilman Bayer)
User

User Details

User Since
Oct 20 2014, 11:21 PM (152 w, 6 d)
Availability
Available
IRC Nick
HaeB
LDAP User
Unknown
MediaWiki User
Tbayer (WMF)

Recent Activity

Thu, Sep 21

Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

@Tbayer: we can put all data in hadoop and remove it entirely from MySQL, does that sound good? I should be able to refine all data in hadoop in one go and put it on my db for you to take a look; once you let me know it is good, we can drop the table in MySQL and move the Popups table to the archive database. Let us know if this sounds good.

That sounds reasonable, yes. Just a heads-up that I will need to find some time to do these checks (ensuring we have all the necessary queries adapted from MariaDB and did not accidentally lose data in transition), which likely won't happen before the end of the month. (CCing @ovasileva FYI)

Thu, Sep 21, 8:29 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.
Thu, Sep 21, 6:59 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T158172: Analyze results of stage 0 page previews test.

Update: While we're still sorting out the session-based metrics (also now in the light of T175918), here are histograms for per-pageview number of cards viewed, for the three largest wikis in the test:


Thu, Sep 21, 5:31 PM · Page-Previews, Readers-Web-Backlog, Reading-analysis
Tbayer closed T169594: Update Product page and Key Product Metrics with July 2017 Reading data as Resolved.
Thu, Sep 21, 5:15 PM · Reading-analysis
Tbayer added a comment to T170231: Upload spam.

The requester appears to have been quite active on hewiki for over a year, so I think it's fine to grant him/her access to the restricted task.

Thu, Sep 21, 10:01 AM
Tbayer added a comment to T171635: Prototype new models to facilitate sockpuppet detection.

Not the same thing, but one may want to be aware of T139810: RFC: Overhaul the CheckUser extension too.

Thu, Sep 21, 1:56 AM · Anti-Harassment, Research-2017-18-Q2, artificial-intelligence, Research
Tbayer added a comment to T171635: Prototype new models to facilitate sockpuppet detection.

Related work:
https://meta.wikimedia.org/wiki/Research:Newsletter/2013/June#Sockpuppet_evidence_from_automated_writing_style_analysis
https://meta.wikimedia.org/wiki/Research:Newsletter/2013/November#New_sockpuppet_corpus

Thu, Sep 21, 1:31 AM · Anti-Harassment, Research-2017-18-Q2, artificial-intelligence, Research
Tbayer added a comment to T175103: Create a sketch of how sockpuppets are detected now.

I left a link on that talk page to presentation slides I did about this topic years ago, which may still be of some use.

Thu, Sep 21, 1:20 AM · Research-2017-18-Q2, Research

Wed, Sep 20

Tbayer added a comment to T176341: Deploy print styles instrumentation.

@Tbayer - do we still want to do 10%?

Wed, Sep 20, 11:38 PM · Patch-For-Review, Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer closed T169730: Define and implement instrumentation for printing on desktop web as Resolved.

Verified that onBeforePrint is now sent under Firefox too. I assume that everything else still works after the update; it seems due diligence has been done for now.

Wed, Sep 20, 11:30 PM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer closed T169730: Define and implement instrumentation for printing on desktop web, a subtask of T154965: [EPIC] Print styles - desktop, as Resolved.
Wed, Sep 20, 11:30 PM · Readers-Web-Backlog, Epic
Tbayer closed T169730: Define and implement instrumentation for printing on desktop web, a subtask of T169731: Set up print styles a/b test on all projects, as Resolved.
Wed, Sep 20, 11:30 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer closed T169730: Define and implement instrumentation for printing on desktop web, a subtask of T170021: Analyze the impact of new print styles , as Resolved.
Wed, Sep 20, 11:30 PM · Readers-Web-Backlog
Tbayer added a comment to T176246: VE freezes when trying to insert a reference with "Cannot read property 'getDataFromNode' of null".

@Deskana Thanks for looking into it! Strangely, it works fine for me too now. But I did reproduce it several times before filing this bug, in Firefox and Chromium under Ubuntu (up-to-date versions in each case), and on two consecutive days. (I should have mentioned that the quoted error message is from Chromium; IIRC I did not see an equivalent in the web console in Firefox, even though VE froze in the same way there.)

Wed, Sep 20, 10:31 PM · VisualEditor
Tbayer added a comment to T176198: Tapping on section heading in TOC does not move to the section.

Works for me now in the most recent TestFlight version: 5.7.0 (1228)

Wed, Sep 20, 7:10 PM · iOS-app-v5.7.0-Corgi-On-A-Surfboard, iOS-app-feature-TOC, Wikipedia-iOS-App-Backlog
Tbayer added a comment to T175195: Analyze impact of appearance controls on usage.

(This comes from the general apps session metrics data, and the plot goes back to December 2015 - need to fix the x-axis.)

Wed, Sep 20, 6:03 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer added a comment to T175195: Analyze impact of appearance controls on usage.

Regarding the first question, I made an initial plot of the median session length with (recent) rollout dates:

Wed, Sep 20, 6:01 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer added a comment to T169730: Define and implement instrumentation for printing on desktop web.

Thanks @phuedx - since @bmansurov and I already had a call about this earlier (with testing on his local installation), I did some checks on reading-web-staging on my own already.

Wed, Sep 20, 2:15 PM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer moved T169730: Define and implement instrumentation for printing on desktop web from Triage to In progress on the Reading-analysis board.
Wed, Sep 20, 1:31 PM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer added a project to T169730: Define and implement instrumentation for printing on desktop web: Reading-analysis.
Wed, Sep 20, 1:31 PM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer added a comment to T174396: Investigate mobile search dashboard data .

@chelsyx and I talked a bit about this today and she gave me some additional explanations; I will try to check the queries next week.

(for the record, we decided afterwards that this was secondary to investigating the aspects mentioned in T174396#3598749 ; I'll still be happy to give the Hive side a look later if needed)

Wed, Sep 20, 12:35 PM · Reading-analysis, Discovery-Analysis (Current work)

Tue, Sep 19

Tbayer added a comment to T175870: Correct pageview_hourly and derived data for T141506.

We already discussed this issue on this ticket: https://phabricator.wikimedia.org/T141506#2575088 and I second @BBlack's opinion. In a gist: I do not think this traffic should be removed, it is real (if unintentional), we count real requests coming to our servers and these are very real requests.

Tue, Sep 19, 11:22 PM · Analytics
Tbayer added a comment to T175870: Correct pageview_hourly and derived data for T141506.

And community members and the public do not have that option and are thus left with the faulty data.

This is most certainly not true, any dataset comes with caveats and issues and this is just one of several that you should be aware of: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly#Changes_and_known_problems_since_2015-06-16

You obviously misunderstood what "that option" referred to: The ability to correct these anomalies by using T141506#2582628 in a custom Hive query (please read the full task description including the preceding sentence). And, as also already mentioned in the task, even the warning that the data is faulty is missing in prominent places, leading users of this data astray.

Tue, Sep 19, 10:40 PM · Analytics
Tbayer added a comment to T135762: A/B Testing solid framework .

You can use eventlogging and wikimediaevents code at this time, there are quite a few examples of how to run ab tests in discovery's code.

My concern is mainly with the bucketing mechanism for which no standard (but many self cooked solutions) exists. This is what I would like to see standardized, since it seems to be programmed again and again.

Tue, Sep 19, 8:29 PM · Operations, Traffic, Analytics
Tbayer created T176246: VE freezes when trying to insert a reference with "Cannot read property 'getDataFromNode' of null".
Tue, Sep 19, 6:54 PM · VisualEditor
Tbayer created T176198: Tapping on section heading in TOC does not move to the section.
Tue, Sep 19, 5:16 AM · iOS-app-v5.7.0-Corgi-On-A-Surfboard, iOS-app-feature-TOC, Wikipedia-iOS-App-Backlog

Mon, Sep 18

Tbayer added a comment to T176068: Stop page previews A/B test on enwiki and dewiki.

@Tbayer, @phuedx - are we sure we're ready for this given what we've seen in T175918: EventLogging subscriber module in ready state but not sending tracked events - or is Firefox safe from that bug? (in which case, let's go ahead as planned)

Mon, Sep 18, 10:31 PM · Readers-Web-Kanban-Board, Patch-For-Review, Page-Previews, Readers-Web-Backlog
Tbayer updated subscribers of T159617: Enable downloading notebooks as PDF.

Thanks! I was able to download one SWAP notebook successfully already.

Mon, Sep 18, 9:28 PM · Patch-For-Review, PAWS

Sat, Sep 16

Tbayer created T176068: Stop page previews A/B test on enwiki and dewiki.
Sat, Sep 16, 11:48 PM · Readers-Web-Kanban-Board, Patch-For-Review, Page-Previews, Readers-Web-Backlog
Tbayer added a comment to T175918: EventLogging subscriber module in ready state but not sending tracked events.

Just a quick note that I have been able to reproduce this in Chrome/Chromium 60 on Ubuntu Linux:

Sat, Sep 16, 11:42 PM · MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), Performance-Team (Radar), Analytics-Kanban, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Analytics-EventLogging, Readers-Web-Backlog, Page-Previews
Tbayer created T176028: Data analysis support for legal team.
Sat, Sep 16, 12:14 AM · Reading-analysis

Fri, Sep 15

Tbayer created T176023: Implement IE7 correction for long-term trend charts.
Fri, Sep 15, 10:48 PM · Reading-analysis
Tbayer moved T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test from Triage to In progress on the Reading-analysis board.
Fri, Sep 15, 10:34 PM · Spike, Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews

Wed, Sep 13

Tbayer updated the task description for T175870: Correct pageview_hourly and derived data for T141506.
Wed, Sep 13, 10:23 PM · Analytics
Tbayer added a comment to T141506: Suddenly outrageous higher pageviews for main pages.

This task is still open after more than a year, and continues to affect pageview data analysis. I have filed T175870 to remedy that.

Wed, Sep 13, 9:53 PM · Reading-analysis, Analytics, Pageviews-API
Tbayer created T175870: Correct pageview_hourly and derived data for T141506.
Wed, Sep 13, 9:52 PM · Analytics
Tbayer updated the task description for T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test.
Wed, Sep 13, 8:50 PM · Spike, Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer added a comment to T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test.

Below is a check whether sessions within the sample are correctly bucketed with 50% probability into either the enabled or disabled condition. These numbers look sound per se. (We expect some slight deviation because of users manually disabling and enabling the feature, which however appears to happen rarely enough - generally in less than 0.01% of sessions, per the second query below.) - However, it's quite odd in combination with the corresponding result for pageviews (T175377#3598231 ).

Wed, Sep 13, 8:45 PM · Spike, Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
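
A minimal sketch of the kind of 50/50 bucketing sanity check described in the comment above, in Python. This is not the query used in the task: the function names and the session counts in the usage lines are hypothetical, and the normal approximation to the binomial is an assumed choice of test.

```python
import math


def bucketing_z_score(enabled: int, disabled: int, expected_share: float = 0.5) -> float:
    """Z-score of the observed 'enabled' share against the expected share.

    Uses the normal approximation to the binomial distribution, which is
    reasonable for the large session counts typical of an A/B test.
    """
    total = enabled + disabled
    if total == 0:
        raise ValueError("no sessions to check")
    observed_share = enabled / total
    standard_error = math.sqrt(expected_share * (1 - expected_share) / total)
    return (observed_share - expected_share) / standard_error


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical session counts per bucket (not taken from the task).
z = bucketing_z_score(enabled=5100, disabled=4900)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value would suggest the split deviates from 50% by more than chance alone explains; as the comment notes, a slight deviation is expected anyway because some users toggle the feature manually.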

Mon, Sep 11

Tbayer updated subscribers of T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test.

Thanks @phuedx - BTW, at the request of @Jdlrobson, I had already done a very rough check for this right after the launch on August 28 ("too early to tell, but so far not totally out of whack").

Mon, Sep 11, 10:08 PM · Spike, Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews

Sat, Sep 9

Tbayer added a comment to T174396: Investigate mobile search dashboard data .

@chelsyx and I talked a bit about this today and she gave me some additional explanations; I will try to check the queries next week.

Sat, Sep 9, 5:59 AM · Reading-analysis, Discovery-Analysis (Current work)
Tbayer added a project to T174396: Investigate mobile search dashboard data : Reading-analysis.
Sat, Sep 9, 5:57 AM · Reading-analysis, Discovery-Analysis (Current work)
Tbayer added a project to T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test: Reading-analysis.
Sat, Sep 9, 2:30 AM · Spike, Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer closed T169581: Histogram for the number of duplicated events per browser session as Declined.

This became obsolete shortly afterwards per T170018

Sat, Sep 9, 2:29 AM · Reading-analysis
Tbayer closed T169581: Histogram for the number of duplicated events per browser session, a subtask of T167391: [Spike] Isolate the source of Popups event duplication, as Declined.
Sat, Sep 9, 2:29 AM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer closed T169582: Examine several sessions which contained duplicate events as Declined.

This became obsolete shortly afterwards per T170018

Sat, Sep 9, 2:29 AM · Reading-analysis
Tbayer closed T169582: Examine several sessions which contained duplicate events, a subtask of T167391: [Spike] Isolate the source of Popups event duplication, as Declined.
Sat, Sep 9, 2:29 AM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer created T175441: Update Audiences page and Key Product Metrics with October 2017 Readers data.
Sat, Sep 9, 2:26 AM · Reading-analysis
Tbayer moved T175440: Update Audiences page and Key Product Metrics with September 2017 Readers data from Triage to Blocked on the Reading-analysis board.
Sat, Sep 9, 2:25 AM · Reading-analysis
Tbayer created T175440: Update Audiences page and Key Product Metrics with September 2017 Readers data.
Sat, Sep 9, 2:25 AM · Reading-analysis
Tbayer moved T169595: Update Product page and Key Product Metrics with August 2017 Reading data from Blocked to Next Up on the Reading-analysis board.
Sat, Sep 9, 2:23 AM · Reading-analysis
Tbayer closed T170217: Publish results of impact of wikidata description rollout as Resolved.

Added to https://www.mediawiki.org/wiki/Wikimedia_Apps/Short_descriptions/Research

Sat, Sep 9, 2:21 AM · Reading-analysis
Tbayer closed T169596: Prepare Wikimania presentation on global readership metrics as Resolved.

Slides: https://commons.wikimedia.org/wiki/File:Readership_metrics._Trends_and_stories_from_our_global_traffic_data_(Wikimania_2017_presentation).pdf

Sat, Sep 9, 2:21 AM · Reading-analysis
Tbayer closed T170218: Editor distribution analysis for wikidata desc editing as Resolved.

See https://www.mediawiki.org/wiki/Wikimedia_Apps/Short_descriptions/Research and notebook linked there

Sat, Sep 9, 2:18 AM · Reading-analysis
Tbayer moved T174976: Analyze Android feed activity from Triage to In progress on the Reading-analysis board.
Sat, Sep 9, 2:17 AM · Android-app-feature-Feeds, Wikipedia-Android-App-Backlog, Reading-analysis
Tbayer closed T169378: Estimate the daily number of mobile devices who already visited on the preceding day as Resolved.

Closing this now, as the main requests were all done back in July for the occasion at which they were needed. I haven't had time to tackle the bonus task (global numbers) yet; it would be more complicated and probably justify a separate task.

Sat, Sep 9, 2:16 AM · Reading-analysis

Fri, Sep 8

Tbayer added a comment to T174993: Vandalism in "In the news" articles persisting in the app' ?.

(e.g. you make an edit and want to show people the result).

@Tbayer If you are talking about making an edit to a page and then seeing the results being reflected in the summary and mobile-sections endpoints, this should be near-instant (within a few seconds) since ChangeProp knows how to actively update the MCS contents in RB for that particular page.

(This was probably already explained in this task, but: the issue that Nthep saw where vandalism seemed to be persisting hours after it was fixed, could the same thing happen to an edit to a Wikidata description? Or does the type of content, position etc. make the difference here? Thanks.)

Fri, Sep 8, 8:51 PM · Reading-Infrastructure-Team-Backlog, Services (watching), Mobile, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, iOS-app-Bugs, Android-app-Bugs
Tbayer added a comment to T169730: Define and implement instrumentation for printing on desktop web.
  • Implement the schema's purging strategy by submitting a patch to the whitelist or filing a task with Analytics Engineering.

@Tbayer what should be the strategy? The default is 90 days, and no change is needed if we want to keep it.

Fri, Sep 8, 5:20 PM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer updated the task description for T169730: Define and implement instrumentation for printing on desktop web.
Fri, Sep 8, 1:42 AM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer added a comment to T169730: Define and implement instrumentation for printing on desktop web.

@Tbayer could you take a look at the schema: https://meta.wikimedia.org/wiki/Schema:Print

Please feel free to update as needed. Thanks!

Fri, Sep 8, 1:40 AM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer added a comment to T169730: Define and implement instrumentation for printing on desktop web.

@Tbayer, since some browsers trigger onbeforeprint for each time a preview is rendered, do you think it would be wise to send this event only once per page view? So no matter how many times a user prints a page, this event will be sent only once. Or do we want to capture all such events for a given page when the user prints the page multiple times without reloading?

I think we can be pragmatic about this and choose whichever is easier to implement. (It doesn't seem to be a very important product question.) If we go with the second option and log multiple events during one pageview, we will be able to connect them using pageTitle and namespaceId.

Fri, Sep 8, 1:22 AM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer updated the task description for T169730: Define and implement instrumentation for printing on desktop web.
Fri, Sep 8, 1:05 AM · Reading-analysis, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Proton, Readers-Web-Kanban-Board, Readers-Web-Backlog
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

@Tbayer We talked about this and the mysql issues are many, what we can do is to make this data available on hadoop in a table form, similar (but not identical) to the couple tables we migrated recently. The gist of it is that mysql issues are many, among them the hardware and as we have mentioned before for big amounts of data MySQL just does not work.

Thanks for the offer! I would have preferred to do the analysis in MySQL/MariaDB as usual (the query times don't seem too bad so far BTW), and moving to Hive will involve extra work for me in rewriting all the previously used queries. But if the problems are severe, I guess that's the best option at this point, also considering that it worked well in that recent example (T172322#3526095 ). Do note though that we will need all the fields. Also, we should still keep the MySQL table (with the existing purging policy) in case we need to fall back to it.

Fri, Sep 8, 12:47 AM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

@Tbayer: also please take a look at the rate of intake of events, it is quite a bit higher than 100 per sec on average, more like 150 per sec: https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?refresh=5m&orgId=1&var-schema=Popups&from=now-30d&to=now

Fri, Sep 8, 12:05 AM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging

Thu, Sep 7

Tbayer added a comment to T174993: Vandalism in "In the news" articles persisting in the app' ?.

Thanks @bearND, that's good to know! But I was also thinking about desktop, and about edits to templates and the possible delay for their result being reflected in pages where these templates are transcluded. I seem to recall long delays in that situation - up to a week or more in the dark ages 5 or more years ago - but was curious about the situation today.
On desktop though, editors can remedy the problem themselves by doing a manual purge. So I guess another question would be if we could enable such manual purges for the RESTBase endpoints that are at issue in this task.

Thu, Sep 7, 11:37 PM · Reading-Infrastructure-Team-Backlog, Services (watching), Mobile, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, iOS-app-Bugs, Android-app-Bugs
Tbayer added a comment to T174993: Vandalism in "In the news" articles persisting in the app' ?.

My 2 cents as an Android app user: there is no difference to me if the time frame is 10 mins or 1h since the typical workflow is (i) spot something is wrong; (ii) refresh 2, 3 times (takes way less than a second); and then either complain or ignore. As somebody who is on the back-end side of this story, I opt for ignoring it (if I'm not in the capacity to purge it from Varnish right away), but I can relate to people that complain about it. I think that informing users about this edge case would go a long way. Posting something somewhere where people complain most often would greatly help, given the fact that solving this problem properly is not a small endeavour in technical terms.

Thu, Sep 7, 8:53 PM · Reading-Infrastructure-Team-Backlog, Services (watching), Mobile, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, iOS-app-Bugs, Android-app-Bugs

Wed, Sep 6

Tbayer added a comment to T115158: Write a Zotero translator and document process for creating new Zotero translator and getting it live in production.

According to today's blog post, this project has now concluded. I assume this is the outcome? https://www.mediawiki.org/wiki/Citoid/Creating_Zotero_translators
Thanks for your work on this important topic!

Wed, Sep 6, 9:06 PM · Possible-Tech-Projects, Outreach-Programs-Projects, Outreachy (Round-14), Documentation, VisualEditor, Citoid
Tbayer added a comment to T159617: Enable downloading notebooks as PDF.

Tried it out on PAWS and it works great, thanks! (This will be especially useful for archiving notebooks on our own sites - Commons - apart from/instead of third-party ones like GitHub.)

Wed, Sep 6, 5:48 PM · Patch-For-Review, PAWS
Tbayer added a comment to T92457: PageImages should blacklist webm files.

I disagree. We should measure how many pages are impacted by this issue. My guess is low. I'd argue an article with only a video in the lead would probably benefit from an image too.

Wed, Sep 6, 5:28 PM · Readers-Web-Backlog, Page-Previews, PageImages
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

Thanks @elukey - right now, while the A/B test is still running, it's not too urgent to be able to check the latest data in real-time (although it would be great to get our hands on the Sep 1 data soon, to be able to assess the effect of a bug fix T172291#3572535).

Wed, Sep 6, 1:44 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging

Tue, Sep 5

Tbayer added a comment to T174993: Vandalism in "In the news" articles persisting in the app' ?.

Edited the task description to outline the reported issue more concretely.

Tue, Sep 5, 11:14 PM · Reading-Infrastructure-Team-Backlog, Services (watching), Mobile, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, iOS-app-Bugs, Android-app-Bugs
Tbayer renamed T174993: Vandalism in "In the news" articles persisting in the app' ? from Vandalism persisting in the app? to Vandalism in "In the news" articles persisting in the app' ?.
Tue, Sep 5, 11:13 PM · Reading-Infrastructure-Team-Backlog, Services (watching), Mobile, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, iOS-app-Bugs, Android-app-Bugs
Tbayer added a comment to T172291: Launch page previews A/B test on enwiki and dewiki.

Yes, as @phuedx and @ovasileva note, this instrumentation is not meant to run indefinitely.
As a reminder, the disk space issue has already been discussed extensively at T172322 (@Marostegui, I think you were CCed there at some point but the conversation was mainly handled by other people on the Ops side), which resulted in an assessment that this test can go ahead (T172322#3533459 ; also after the Readers team had put in some extra work to help free up space by dropping another table). It looks like the "Notify DBA and Analytics Engineering when launching" part of the present task was misunderstood a bit above as launching another assessment process essentially duplicating T172322. Rather, the Ops suggestion at T172322#3533459 had been to provide a notification so that disk space use can be monitored after the launch (also by ourselves - that's why the Grafana link is in the task description).

Thanks for providing some context, @Tbayer.
I am fine with this, but I do encourage everyone to check the disk space graph and stop as soon as the available space is less than 900-850G (right now we have 950G available). Otherwise, we will run into serious problem (/cc @elukey )
This host grows quite a lot daily so that's why I am being very careful here, as if we do not control it, we will have a read-only host because of no disk space available :-)

Hm, my takeaway from @elukey's linked comment had actually been that we expect the experiment to fit in the available space with the planned length and event rate.

Tue, Sep 5, 8:21 AM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

Now at 10am on August 31:

Tue, Sep 5, 8:16 AM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T102192: TOC has "Read more" even if no Read more section exists.

This bug still exists in production, and presumably occurs for all non-mainspace pages (e.g. https://en.wikipedia.org/wiki/Wikipedia:Verifiability ).

Tue, Sep 5, 4:29 AM · Android-app-Bugs, WorkType-Maintenance, Wikipedia-Android-App-Backlog

Mon, Sep 4

Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

...

Looks like we still have replication problems @Ottomata :(

For reference, I think Joseph was referring to the fact that earlier, similar issues (such as T155639#3068167) were meant to have been addressed with T124307.

Mon, Sep 4, 9:48 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging

Sun, Sep 3

Tbayer added a comment to T65874: Edit link loads the wrong section in some cases (for Mobileview API).

Reported again on the German Wikipedia (for the Android app); I verified this in the case of https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia (village pump page with a complex template on top).

Sun, Sep 3, 5:22 AM · Android-app-Bugs, WorkType-Maintenance, Wikipedia-Android-App-Backlog

Fri, Sep 1

Tbayer updated subscribers of T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.
Fri, Sep 1, 7:38 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

Thanks! PS, BTW: The above queries were run on analytics-store. The copy on s1-analytics-slave replicated for 21 more minutes:

Fri, Sep 1, 7:38 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T172291: Launch page previews A/B test on enwiki and dewiki.

Heads-up - there seems to be a major issue with the data right now: T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana

Fri, Sep 1, 6:42 PM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

(Not tagging this with Readers-Web-Backlog for now because it doesn't seem to be an issue with the instrumentation at this point, but perhaps it's worth putting it into the "Tracking" column there.)

Fri, Sep 1, 6:39 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer created T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.
Fri, Sep 1, 6:38 PM · Patch-For-Review, Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a comment to T172291: Launch page previews A/B test on enwiki and dewiki.

@Tbayer - can you confirm that the sample rates are giving us adequate amounts of data?

Fri, Sep 1, 4:59 PM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog

Thu, Aug 31

Tbayer updated subscribers of T169948: Stop RelatedArticles A/B test and clean up config.

Confirmed that no events are logged after July 20

Thu, Aug 31, 8:19 PM · RelatedArticles, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests
Tbayer updated subscribers of T169550: Final Vetting of Family Wide unique devices data .

I had been looking into this from various angles before Wikimania, including reading through the intricate investigations at T143928 (and the bugs that were uncovered, also regarding the existing per-domain uniques) to understand how we ended up with the final version of the queries, reading through the new documentation (fixing various things there myself and leaving some notes on the talk page), and doing some plausibility checks on the data itself. The monthly numbers for Wikipedia in particular look roughly plausible and consistent with the lower bound estimates we have been using previously (derived from the per-domain data), so we have started quoting them as preliminary data for public purposes. I noticed a bug affecting the data for some sister sites (not Wikipedia), which I just filed as T174640 .
I still plan to do some further consistency checks before closing this task. In particular, check that for all project families, countries and months/days,

Thu, Aug 31, 3:04 AM · Reading-analysis, Analytics-Kanban
Tbayer added a comment to T174640: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie .

Regarding prioritization: While this is a clear bug, it does not affect the (from the Readers team's perspective) most important part of the global uniques data, i.e. the numbers for Wikipedia, and on the traffic side I guess the downsides of including some unnecessary cookies for views to a number of smaller projects can be tolerated for some time.

Thu, Aug 31, 2:12 AM · Traffic, Operations, Analytics
Tbayer created T174640: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie .
Thu, Aug 31, 2:08 AM · Traffic, Operations, Analytics

Wed, Aug 30

Tbayer added a comment to T172322: Calculate how much Popups events EL databases can host.

@mforns The estimate was for around 100 events/second on average, and the new sampling rate was chosen based on the event rate from the previous instrumentation, where the peak hourly rate on weekdays was usually achieved between 13-16h UTC (and was about 4-6 times higher than the daily low). BTW most of the conversation relevant to this launch is now happening at T172291 instead.
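To put the quoted average rate into storage-planning terms, here is a minimal arithmetic sketch. The function name `events_per_day` is a hypothetical helper introduced only for illustration; the only input taken from the comment above is the estimate of around 100 events/second on average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(avg_events_per_second: float) -> float:
    """Convert an average event rate (events/second) into events per day."""
    return avg_events_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # Average rate quoted in the comment above.
    print(events_per_day(100))
```

At an average of 100 events/second this comes to 8,640,000 events per day. The hourly peak-to-low ratio mentioned above affects load during the day, not this daily total.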

Wed, Aug 30, 7:19 PM · User-Elukey, Analytics-Kanban
Tbayer added a comment to T172291: Launch page previews A/B test on enwiki and dewiki.

Yes, as @phuedx and @ovasileva note, this instrumentation is not meant to run indefinitely.
As a reminder, the disk space issue has already been discussed extensively at T172322 (@Marostegui, I think you were CCed there at some point but the conversation was mainly handled by other people on the Ops side), which resulted in an assessment that this test can go ahead (T172322#3533459 ; also after the Readers team had put in some extra work to help free up space by dropping another table). It looks like the "Notify DBA and Analytics Engineering when launching" part of the present task was misunderstood a bit above as launching another assessment process essentially duplicating T172322. Rather, the Ops suggestion at T172322#3533459 had been to provide a notification so that disk space use can be monitored after the launch (also by ourselves - that's why the Grafana link is in the task description).

Wed, Aug 30, 1:26 PM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog

Tue, Aug 29

Tbayer added a comment to T156841: Hadoop: Add a lower priority queue: nice queue.

For the record, it looks like this is now documented at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Run_long_queries_in_a_screen_session_and_in_the_nice_queue.

Tue, Aug 29, 6:49 PM · Patch-For-Review, Analytics-Kanban, Analytics-Cluster

Aug 25 2017

Tbayer added a comment to T172322: Calculate how much Popups events EL databases can host.

PS (after discussing with @JKatzWMF ): That means that it is now fine from everyone's perspective to drop log.MobileWebUIClickTracking_10742159_15423246, assuming that we retain the wmf.mobilewebuiclicktracking_10742159_15423246 version on Hive. (And regarding T172322#3537746: yes, moving it to a separate archive database instead of wmf sounds like a good idea.)

Aug 25 2017, 1:49 AM · User-Elukey, Analytics-Kanban
Tbayer added a comment to T171049: Collect registration data.

...

The ID numbers are revision IDs of the relevant schema. I'm not sure what it means where there are two numbers in the name

If there are two numbers, that means that the table dates from before the big March 2017 renaming. The appended "15423246" refers to a previous revision of the event capsule schema.
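The naming convention described above can be sketched as a small parser. This is only an illustration under stated assumptions: the regular expression, the `TableName` type and the `parse_table_name` function are hypothetical, and the pattern assumes schema names contain only letters and digits. It is not an official EventLogging utility.

```python
import re
from typing import NamedTuple, Optional


class TableName(NamedTuple):
    schema: str
    schema_revision: int
    # Only present for tables that predate the March 2017 renaming.
    capsule_revision: Optional[int]


# Assumed shape: <Schema>_<schema revision>[_<event capsule revision>]
_PATTERN = re.compile(
    r"^(?P<schema>[A-Za-z][A-Za-z0-9]*)_(?P<rev>\d+)(?:_(?P<capsule>\d+))?$"
)


def parse_table_name(name: str) -> TableName:
    """Split an EventLogging-style table name into its components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized table name: {name!r}")
    capsule = match.group("capsule")
    return TableName(
        schema=match.group("schema"),
        schema_revision=int(match.group("rev")),
        capsule_revision=int(capsule) if capsule is not None else None,
    )


if __name__ == "__main__":
    print(parse_table_name("MobileWebUIClickTracking_10742159_15423246"))
```

For the table name mentioned elsewhere in this feed, the parser yields schema revision 10742159 and capsule revision 15423246; a name with a single number yields `capsule_revision=None`.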

Aug 25 2017, 1:27 AM · User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Banner-Campaign-Summer)

Aug 24 2017

Tbayer added a comment to T172322: Calculate how much Popups events EL databases can host.

So because this seems like a rather simple but performance-intensive query, perhaps we could use it as a test case for how querying EventLogging data might work in case such a table is moved to HDF

The table is now on HDFS (minus sensitive fields); @JKatzWMF and @Tbayer, please take a look.

This query:

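-- Event counts per event name, wiki, timestamp prefix (first 6 characters)
-- and edit count bucket, restricted to the stable mobile mode.
USE wmf;
SELECT COUNT(*) AS num_events,
       event_name,
       wiki,
       SUBSTR(timestamp, 0, 6),
       editCountBucket
FROM mobilewebuiclicktracking_10742159_15423246
WHERE event_mobileMode = 'stable'
GROUP BY event_name, wiki, SUBSTR(timestamp, 0, 6), editCountBucket;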

Took 2 minutes to run. I think that is the Hive equivalent of the one @JKatzWMF had running on MySQL.

Some docs:

https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Hadoop._Archived_Data

Ping us when you have time to look at the table so we can delete it from MySQL (cc @elukey)
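The shape of the aggregation in the query above can be tried locally on toy data. The following is a minimal sketch, not the production setup: it uses Python's built-in `sqlite3` module instead of Hive, the table name `events` and the sample rows are invented for illustration, and it assumes timestamps are strings of the form YYYYMMDDHHMMSS so that the first six characters identify the month. SQLite's `substr` is 1-based, so the sketch uses a start index of 1 where the Hive query uses 0.

```python
import sqlite3

# Invented sample rows: (event_name, wiki, timestamp, editCountBucket, event_mobileMode)
SAMPLE_ROWS = [
    ("click", "enwiki", "20170801120000", "0 edits", "stable"),
    ("click", "enwiki", "20170815090000", "0 edits", "stable"),
    ("click", "enwiki", "20170901000000", "0 edits", "stable"),
    ("click", "dewiki", "20170801120000", "1-4 edits", "beta"),
]


def monthly_event_counts(rows):
    """Count stable-mode events per event name, wiki, month and edit count bucket."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_name TEXT, wiki TEXT, timestamp TEXT, "
        "editCountBucket TEXT, event_mobileMode TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT COUNT(*) AS num_events,
               event_name,
               wiki,
               substr(timestamp, 1, 6) AS month,
               editCountBucket
        FROM events
        WHERE event_mobileMode = 'stable'
        GROUP BY event_name, wiki, substr(timestamp, 1, 6), editCountBucket
        ORDER BY event_name, wiki, month, editCountBucket
        """
    )
    result = cursor.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_event_counts(SAMPLE_ROWS):
        print(row)
```

With the sample rows, the beta-mode row is filtered out and the three stable enwiki events collapse into two monthly groups (two events in 201708, one in 201709).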

Aug 24 2017, 9:50 PM · User-Elukey, Analytics-Kanban
Tbayer added a comment to T172322: Calculate how much Popups events EL databases can host.

We dropped the PageContentSaveComplete table and re-gained only ~100GB, which is not what we expected. I checked some numbers on the databases and reported them in https://phabricator.wikimedia.org/T170720#3533452; I think that TokuDB's compression makes the files on the file system way smaller than what is reported by information_schema.TABLES.

The current state of dbstore1002 should allow this experiment to proceed, but I have two suggestions to make:

  1. keep dbstore1002's space consumption checked via this link
  2. alert @Marostegui when you start, together with @Ottomata

Thanks @elukey! I have made a note for the web team to do this as part of the experiment rollout (currently envisaged for early next week).

Aug 24 2017, 9:34 PM · User-Elukey, Analytics-Kanban
Tbayer added a comment to T91344: Identify specific parenthetical elements to exclude from Hovercards.

How is this task a duplicate of T168848? (The task description there only says "Strip balanced parentheticals", without identifying specific parenthetical elements.)

Aug 24 2017, 9:06 PM · Readers-Web-Backlog (Tracking), Page-Previews
Tbayer updated the task description for T172291: Launch page previews A/B test on enwiki and dewiki.
Aug 24 2017, 7:25 PM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog

Aug 23 2017

Tbayer updated the task description for T172291: Launch page previews A/B test on enwiki and dewiki.
Aug 23 2017, 11:16 PM · Patch-For-Review, Readers-Web-Kanban-Board, Page-Previews, Readers-Web-Backlog

Aug 22 2017

Tbayer added a comment to T170893: Disable page previews on various special pages.

Thanks @Jdlrobson - this was more an example as part of a general observation about the process and tradeoffs for this feature, but I have filed a task at T173865.

Aug 22 2017, 6:57 PM · Readers-Web-Kanban-Board, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer created T173865: Remove Special:CiteThisPage from previews blacklist.
Aug 22 2017, 6:54 PM · Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T170893: Disable page previews on various special pages.

On enwiki, Special:CiteThisPage contains several mainspace links (with useful previews), see e.g. https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&page=Dire_wolf&id=791135679

Aug 22 2017, 4:30 PM · Readers-Web-Kanban-Board, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer updated subscribers of T160941: Improve SSH access information in onboarding documentation.

CCing @diego and @RobH who (judging from IRC scrollback) also grappled quite a bit with the existing onboarding process the other day, just in case they have useful input from recent memory on the shortcomings of the current documentation.

Aug 22 2017, 1:21 AM · Operations, Documentation