Tbayer (Tilman Bayer)
User

User Details

User Since
Oct 20 2014, 11:21 PM (161 w, 2 d)
Availability
Available
IRC Nick
HaeB
LDAP User
Unknown
MediaWiki User
Tbayer (WMF)

Recent Activity

Yesterday

Tbayer moved T180651: Calculate Android app daily active users from Nigeria from Next Up to In progress on the Reading-analysis board.
Wed, Nov 22, 12:29 AM · New-Readers, Reading-analysis

Tue, Nov 21

Tbayer added a comment to T181064: Purge refined JSON data after 90 days.

...

that will be a problem in case of the Popups schema (and possibly others too which are no longer stored in MySQL),

If the Popups experiment is over and the volume of events will remain low, we can re-enable MySQL imports for it.

Tue, Nov 21, 10:44 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T181064: Purge refined JSON data after 90 days.

I see, but that will be a problem in case of the Popups schema (and possibly others too which are no longer stored in MySQL), as the advice in the documentation doesn't work for them: "If you want to access EL historical data (that has been kept for longer than 90 days), you'll find it in the MariaDB hosts".
So we should exempt that table until the proper purging strategies are implemented on Hive too. Is there already a task for that BTW?

Tue, Nov 21, 9:14 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T179915: Determine expected amount of usage of mobile print to PDF button per browser.

@Tbayer: Can we confirm that the Print events with event_skin = 'minerva' skin are coming from the mobile domain? I ask because it'd be interesting to see if users were printing the mobile site (and by proxy from a mobile device?) prior to us launching the print button.

Tue, Nov 21, 5:54 PM · Reading-analysis, Readers-Web-Backlog
Tbayer added a comment to T180356: Popups EventLogging events occasionally invalid.

...

So, not very often. Only about 6 invalid events in the last ~24hours.

Interesting, but the Popups schema has a very low event rate in general, because the experiment was deactivated last week (T178500). It would be good to know the ratio of errors to correctly logged events during the time of the experiment.

Tue, Nov 21, 5:50 PM · Readers-Web-Backlog, Page-Previews
Tbayer closed T179914: Deploy print to PDF button for Chrome on Android as Resolved.
Tue, Nov 21, 4:52 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer closed T179914: Deploy print to PDF button for Chrome on Android, a subtask of T179915: Determine expected amount of usage of mobile print to PDF button per browser, as Resolved.
Tue, Nov 21, 4:52 PM · Reading-analysis, Readers-Web-Backlog
Tbayer added a comment to T179914: Deploy print to PDF button for Chrome on Android.

Or to put it differently: Has someone checked that the onBeforePrint/matchMedia event (cf. T171162#3457776) behaves as expected for Chrome mobile on Android?

I don't see any specific record of us testing Chrome on mobile (I don't actually see much QA on T169730),

There was quite a bit of testing, see e.g. the link in my previous comment. But only on desktop, as far as I'm aware. Hence the question.

Tue, Nov 21, 4:51 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer updated the task description for T181064: Purge refined JSON data after 90 days.
Tue, Nov 21, 4:31 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T181064: Purge refined JSON data after 90 days.

From https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention_and_auto-purging#Work_in_progress I understand that this will implement the existing purging whitelist. I'll clarify the task description accordingly.

Tue, Nov 21, 4:29 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer updated the task description for T179915: Determine expected amount of usage of mobile print to PDF button per browser.
Tue, Nov 21, 3:34 PM · Reading-analysis, Readers-Web-Backlog
Tbayer added a comment to T180862: Download icon is confusing in file namespace.

Totally agree about the usability issues described. But I'm not quite sure I understand the additional data argument - what exactly is meant by "Number in file namespace (6) seems high"? That users are more likely to tap the button on files pages than on others? Such a conclusion would need a comparison with the pageview numbers in general.

Tue, Nov 21, 8:16 AM · Proton, Readers-Web-Backlog (Design)

Mon, Nov 20

Tbayer added a comment to T180036: Instrument time to first user link interaction.

@bmansurov let's come up with testing criteria for this card on Monday. Also who do we want to do this QA? Let's make that explicit. Anthony? Tilman? A developer?

Mon, Nov 20, 4:39 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews

Sat, Nov 18

Tbayer moved T179915: Determine expected amount of usage of mobile print to PDF button per browser from Blocked to Next Up on the Reading-analysis board.
Sat, Nov 18, 6:31 AM · Reading-analysis, Readers-Web-Backlog
Tbayer moved T180621: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months from Triage to Next Up on the Reading-analysis board.
Sat, Nov 18, 6:30 AM · Analytics-Data-Quality, Reading-analysis
Tbayer moved T180651: Calculate Android app daily active users from Nigeria from Triage to Next Up on the Reading-analysis board.
Sat, Nov 18, 6:30 AM · New-Readers, Reading-analysis
Tbayer created T180870: Update Audiences page and Key Product Metrics with December 2017 Readers data.
Sat, Nov 18, 6:29 AM · Reading-analysis
Tbayer created T180869: Update Audiences page and Key Product Metrics with November 2017 Readers data.
Sat, Nov 18, 6:29 AM · Reading-analysis
Tbayer closed T175441: Update Audiences page and Key Product Metrics with October 2017 Readers data as Resolved.
Sat, Nov 18, 6:29 AM · Reading-analysis
Tbayer closed T175440: Update Audiences page and Key Product Metrics with September 2017 Readers data as Resolved.
Sat, Nov 18, 6:28 AM · Reading-analysis
Tbayer closed T169595: Update Product page and Key Product Metrics with August 2017 Reading data as Resolved.
Sat, Nov 18, 6:28 AM · Reading-analysis
Tbayer created T180868: Add Wiktionary branding and/or attribution to "Define" popups.
Sat, Nov 18, 6:15 AM · Android-app-Bugs, Wikipedia-Android-App-Backlog
Tbayer added a comment to T180193: Create summary event for event logging.

As already alluded to in the comments to the doc, this has tradeoffs (saves some bytes of transferred data, but makes many queries more complicated and slower, and could also be an impediment to ingesting data into Druid/Pivot).

Sat, Nov 18, 1:46 AM · Wikipedia-Android-App-Backlog

Fri, Nov 17

Tbayer renamed T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again) from Stop Page Previews enwiki and dewiki A/B test (again) to Stop sending data for Page Previews enwiki and dewiki A/B test (again).
Fri, Nov 17, 11:58 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

@Tbayer - looks like all is okay. Good to resolve?

Fri, Nov 17, 11:57 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer created T180825: Investigate increase in pageviews with v190.
Fri, Nov 17, 6:17 PM · Reading-analysis, Wikipedia-Android-App-Backlog, Android-app-Bugs

Thu, Nov 16

Tbayer updated subscribers of T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version.

Aye! But MySQL users are not the only users of this data! The performance team folks use it to build Grafana dashboards.

Hm, I wonder if they need timestamp though. They might not. If we add an ISO-8601 dt field, and no one else but MySQL uses timestamp, perhaps we can just keep timestamp and make it full-on MediaWiki format in the JSON data for backwards compatibility. Am asking Timo. We'll also need to change Camus imports to use dt instead of timestamp.

Thu, Nov 16, 11:07 AM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T179914: Deploy print to PDF button for Chrome on Android.

...

I followed up on #wikimedia-analytics; this was caused by yesterday's migration of the EventLogging master database.

Thu, Nov 16, 10:55 AM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T179625: Resolve EventCapsule / MySQL / Hive schema discrepancies.

Thanks! This has stopped now (T178500), so feel free to go ahead.

Thu, Nov 16, 10:40 AM · Analytics-Kanban, Patch-For-Review, Analytics-EventLogging
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

BTW, as noted by @Jdlrobson over at T179914#3764603 , there was a weird gap followed by a spike earlier on Nov 15, a few hours before the stop of the test, which similarly happened for the Print schema.

Thu, Nov 16, 10:18 AM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T179914: Deploy print to PDF button for Chrome on Android.

...


Not sure what happened between 9 and 12:40 (but a similar no events and spike happened on Popups https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&from=now-24h&to=now&var-schema=Popups)

Thu, Nov 16, 10:05 AM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

BTW, as noted by @Jdlrobson over at T179914#3764603 , there was a weird gap followed by a spike earlier on Nov 15, a few hours before the stop of the test, which similarly happened for the Print schema.

Thu, Nov 16, 9:40 AM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

@Jdlrobson and I discussed this a bit more on IRC right after T178500#3764662. Apparently the "blocked until Wednesday" note in the task description had caused some confusion, although I'm not seeing anything unclear about the subsequent, more specific wording "It should be stopped on Thursday, Nov 16th, after we have collected four full weeks of data". (For those unfamiliar with the rationale: it is much preferable to do analysis for timespans of entire weeks, because the strong weekly (and daily) seasonality of reader behavior might distort results otherwise. And after launch, the experiment took at least a day to reach the full event rate, clearly an effect of caching, which we had similarly observed in the previous iteration.) Jon and I briefly discussed re-enabling it at the next opportunity a few hours afterwards, but that would not have served this purpose of addressing seasonality.
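
The whole-weeks rationale can be illustrated with a minimal sketch that trims a date range to the largest whole number of weeks, so that every weekday is represented equally often. The function name `trim_to_full_weeks` and the example dates are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
from datetime import date, timedelta
from typing import Tuple


def trim_to_full_weeks(start: date, end: date) -> Tuple[date, date, int]:
    """Trim an inclusive date range to a whole number of 7-day blocks.

    Returns (start, trimmed_end, number_of_full_weeks). Hypothetical helper,
    not part of any existing analysis code.
    """
    days = (end - start).days + 1  # inclusive day count
    full_weeks = days // 7
    if full_weeks == 0:
        raise ValueError("range is shorter than one full week")
    trimmed_end = start + timedelta(days=7 * full_weeks - 1)
    return start, trimmed_end, full_weeks


# Illustrative dates only: 30 days of data yield 4 full weeks.
print(trim_to_full_weeks(date(2017, 1, 2), date(2017, 1, 31)))
```

Dropping the trailing partial week avoids over-weighting whichever weekdays happen to fall at the end of the collection period.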

Thu, Nov 16, 9:31 AM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T179914: Deploy print to PDF button for Chrome on Android.

This is a bit late in the game, but did we ever test the Schema:Print instrumentation for mobile/Minerva?
Recall that there had been a bit of confusion at T169730: Define and implement instrumentation for printing on desktop web, which (as the task name still says) was initially intended for desktop only, but came to be extended to mobile later. However, that was only after @bmansurov and I had done our testing rounds.

Thu, Nov 16, 12:46 AM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer created T180651: Calculate Android app daily active users from Nigeria.
Thu, Nov 16, 12:26 AM · New-Readers, Reading-analysis

Wed, Nov 15

Tbayer added a project to T180621: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months: Analytics-Data-Quality.
Wed, Nov 15, 9:01 PM · Analytics-Data-Quality, Reading-analysis
Tbayer added a comment to T180621: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months.

For context, this came out of this discussion on Facebook.
The goal here is to first confirm the conclusion that the bot/spider by that particular organization that came up in my quick spot check is indeed causing these anomalies across the board, and then to enable @Effeietsanders to contact them so that they can update their user agent to be in line with the request at https://meta.wikimedia.org/wiki/User-Agent_policy of including the string "bot", which would prevent this from distorting the pageview data for such articles.

Wed, Nov 15, 9:00 PM · Analytics-Data-Quality, Reading-analysis
Tbayer edited projects for T180621: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months, added: Reading-analysis; removed Traffic, Operations.
Wed, Nov 15, 8:56 PM · Analytics-Data-Quality, Reading-analysis
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

Wait, this was meant to be deployed tomorrow, not today. See task description.

Wed, Nov 15, 8:48 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T178174: Remove AppInstallIId from EventLogging purging white-list.

(To record some more information here while other conversations are ongoing:)

Wed, Nov 15, 8:12 PM · Patch-For-Review, Analytics-Kanban
Tbayer updated subscribers of T158071: Check abnormal pageviews for XHamster.
Wed, Nov 15, 1:38 AM · Analytics-Kanban
Tbayer added a comment to T171881: CL support for Wikipedia Zero piracy problems.

What I really need to dig on this further is an easy way to see a list of recent WP0-abuse-related deletions on various wikis. Am I missing some way to use the deletion log search interfaces?

Z591 should be the best list we have.

If I understand correctly, that's reporting on popular WP0 downloads (which is where I was noticing the parentheses). I was looking for logs of WP0-abuse-related administrative deletions, to compare against that and find the change in URL encoding between the deletion and the WP0 accesses.

To obtain some examples, one could start from Z591#12542 (@Jdx' most recent report on files that had already been deleted but needed purging) and search Special:Log on the corresponding wiki for the file names derived from each URL, arriving at these entries.
Unless I'm overlooking something, there are no encoding discrepancies in these four examples.

Wed, Nov 15, 12:34 AM · Patch-For-Review, Community-Liaisons (Oct-Dec 2017), Zero

Tue, Nov 14

Tbayer closed T179623: Draft task for New Readers data analysis as Resolved.
Tue, Nov 14, 10:42 PM · Reading-analysis
Tbayer added a comment to T179914: Deploy print to PDF button for Chrome on Android.

Clarified the intention of the last checkbox item per today's standup.

Tue, Nov 14, 6:56 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer updated the task description for T179914: Deploy print to PDF button for Chrome on Android.
Tue, Nov 14, 6:55 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer updated the task description for T180036: Instrument time to first user link interaction.
Tue, Nov 14, 6:32 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer moved T179915: Determine expected amount of usage of mobile print to PDF button per browser from Triage to Blocked on the Reading-analysis board.
Tue, Nov 14, 6:02 PM · Reading-analysis, Readers-Web-Backlog
Tbayer added a comment to T175395: Implement Schema:Print purging strategy.

@mforns It's not necessary for analysis purposes, but can't hurt much either.
BTW I will follow up on some other loose ends here soon and then close this task.

Tue, Nov 14, 4:36 PM · MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), Analytics, Patch-For-Review, Readers-Web-Backlog, Proton

Mon, Nov 13

Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

@ovasileva: It seems like what you're looking for is a way to disable the A/B test without disabling the feature for logged-in users. I'll update the description accordingly.

Can you confirm that, from a product perspective, disabling the feature for a much larger cohort of logged-out users is OK?

Mon, Nov 13, 11:18 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

Note that we may still want to reactivate it, probably with a lower rate, to measure a new thing after T180036: Instrument time to first user link interaction is implemented.

Mon, Nov 13, 7:24 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T179625: Resolve EventCapsule / MySQL / Hive schema discrepancies.

Could this be held off two more days, when the data collection for this one ends (T178500)? Having to join two tables with incompatible formats is likely to add a lot of unnecessary complexity to the analysis.

Mon, Nov 13, 7:09 PM · Analytics-Kanban, Patch-For-Review, Analytics-EventLogging
Tbayer updated the task description for T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).
Mon, Nov 13, 7:07 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy

Fri, Nov 10

Tbayer renamed T179915: Determine expected amount of usage of mobile print to PDF button per browser from Determine expected amount of usage of print to PDF button per browser to Determine expected amount of usage of mobile print to PDF button per browser.
Fri, Nov 10, 4:07 AM · Reading-analysis, Readers-Web-Backlog
Tbayer added a comment to T180036: Instrument time to first user link interaction.

Is performance.now always available?

Considering that the Popups schema is restricted to sendBeacon capable user agents anyway,
and comparing https://caniuse.com/#feat=beacon with the "Browser compatibility" section at https://developer.mozilla.org/en-US/docs/Web/API/Performance/now , I guess that the theoretical answer is yes, in this context.

If not, for completeness, what should we send in its absence? I'm guessing undefined should be fine?

Something that ends up as NULL in the EventLogging MySQL table might be best.

Fri, Nov 10, 2:04 AM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews

Wed, Nov 8

Tbayer updated the task description for T180036: Instrument time to first user link interaction.
Wed, Nov 8, 11:44 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer updated the task description for T180036: Instrument time to first user link interaction.
Wed, Nov 8, 8:00 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer added a project to T180036: Instrument time to first user link interaction: Reading-analysis.
Wed, Nov 8, 7:58 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Readers-Web-Kanban-Board, Reading-analysis, Readers-Web-Backlog, Page-Previews
Tbayer updated subscribers of T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

...

In the longer run, analytics is removing the mysql server that hosts event logging next quarter, so things are going to have to move off mysql anyways sooner than later.

I was curious about the provenance of this statement, so Erik and I talked a bit about this yesterday. It turned out to be based on a remark by @Milimetric on IRC last week, but @Milimetric has since clarified that while the Analytics Engineering team has started recommending that schemas/analysis be moved to Hive, there are no set plans to switch off MySQL access at this point. (In fact Ops spun up a new MySQL server for EventLogging just last week, which I understand alleviates some of the immediate infrastructure concerns.) The Analytics Engineering team has previously stated that they don't want to make decisions about the future setup of EL unilaterally (T159170#3064701).

Wed, Nov 8, 6:29 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Tue, Nov 7

Tbayer added a comment to T179426: Identify typical time to first user interaction.

@Gilles: Agreed. Page Previews already sends a fair bit of data to statsv, so logging and visualizing this metric shouldn't be too much effort. Indeed, Performance-Team helped us set up and reviewed that dashboard way back when 🙂

Do we already have experience creating histograms such as the above in Grafana? (keeping in mind T179426#3737738 )
Also, would we aim to send this metric only for the first (earliest) link interaction, or select the minimum (per pageview) server-side using a page token?

Tue, Nov 7, 9:51 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer added a comment to T177969: [Spike 2h] Which browsers are downloading PDFs (including OS percentages)?.

Thanks @phuedx! So I think this discrepancy shows it's better to rely on a longer timespan. Below, I have extended it to four weeks (Oct 1-28), which is actually OK to do provided one is content to relegate one's query to the new "nice" queue on Hive (and prepared to wait a bit longer in case there are more timely queries running in the normal, non-nice queue - but even so, this query took less than two hours for these four weeks' worth of data).

Tue, Nov 7, 5:44 PM · Electron-PDFs, User-Jdlrobson, Readers-Web-Kanban-Board, Spike, Proton, Readers-Web-Backlog
Tbayer added a comment to T179426: Identify typical time to first user interaction.

And here is the same data in the form of a cumulative histogram, to make it easier to read out percentiles (e.g. the median is around 5 seconds, the tenth percentile is <0.5 seconds - again, subject to rounding errors):
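
To show how percentiles are read off a cumulative histogram, here is a minimal sketch. It assumes the durations are already available as integer seconds; the function names and the sample values are hypothetical and are not the data behind the chart.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cumulative_histogram(durations: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (value, share of observations <= value) pairs, sorted by value."""
    counts = Counter(durations)
    total = len(durations)
    running = 0
    result = []
    for value in sorted(counts):
        running += counts[value]
        result.append((value, running / total))
    return result


def percentile_from_cumulative(cumulative: List[Tuple[int, float]], fraction: float) -> int:
    """Smallest value whose cumulative share reaches the requested fraction."""
    for value, share in cumulative:
        if share >= fraction:
            return value
    raise ValueError("no data, or fraction greater than 1")


# Made-up durations in whole seconds, for illustration only.
sample = [0, 1, 5, 5, 9]
cumulative = cumulative_histogram(sample)
print(cumulative)
print(percentile_from_cumulative(cumulative, 0.5))
```

Because the inputs are whole seconds, any percentile read this way is only accurate to about one second, which is the rounding caveat mentioned in the comment.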

Tue, Nov 7, 5:12 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer added a comment to T179426: Identify typical time to first user interaction.

Here is a histogram:


As indicated above, the restriction to integer timestamps introduces some rounding errors, basically smearing out the graph a bit horizontally.

Tue, Nov 7, 4:55 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer renamed T179426: Identify typical time to first user interaction from Identify average time to user interaction to Identify typical time to first user interaction.
Tue, Nov 7, 4:46 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer moved T179426: Identify typical time to first user interaction from Triage to In progress on the Reading-analysis board.
Tue, Nov 7, 12:21 AM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews

Mon, Nov 6

Tbayer added a project to T179836: Remove EL capsule from meta and add it to codebase: Analytics-EventLogging.
Mon, Nov 6, 7:41 PM · Analytics-EventLogging, Analytics-Kanban
Tbayer added a comment to T179836: Remove EL capsule from meta and add it to codebase.

I don't quite understand the "cannot evolve on its own" argument; isn't that the case for any and all schema pages on Meta? (They are all tied to code, whether generic or instrumentation-specific.)

Mon, Nov 6, 7:41 PM · Analytics-EventLogging, Analytics-Kanban
Tbayer added a comment to T166752: Add OpenGraph tags to blog.

Thanks anyway, @Ladsgroup! Just to confirm and for the benefit of others who may pick up this task one day: Did your exploration include any of the options mentioned in T166752#3336898 (adding the image information to the beginning of the description, or adapting a watermarking plugin)?

Mon, Nov 6, 11:02 AM · User-Ladsgroup, Wikimedia-Blog

Fri, Nov 3

Tbayer updated subscribers of T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version.

As documented in https://meta.wikimedia.org/wiki/Schema:EventCapsule , EventLogging tables are currently using timestamps in the YYYYMMDDHHMMSS format.

Tilman, you made the schema change that changed the EventCapsule timestamp field to Mediawiki timestamp format! :o

https://meta.wikimedia.org/w/index.php?title=Schema:EventCapsule&diff=prev&oldid=16479585

This is actually very incorrect. The EventCapsule timestamp is utc-millisec[1].

No, these are not in the utc-millisec format, at least according to the documentation I can find: https://www.npmjs.com/package/json-gate : "A number or an integer containing the number of milliseconds that have elapsed since midnight UTC, 1 January 1970." There are no milliseconds in this data, these are integer Epoch seconds only.
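
The three representations under discussion (integer Epoch seconds, utc-millisec, and the YYYYMMDDHHMMSS format) can be related with a short sketch. The helper names are hypothetical, and this is not the conversion code used by EventLogging or the Hive refinement job.

```python
from datetime import datetime, timezone


def epoch_seconds_to_mediawiki(seconds: int) -> str:
    """Format integer Epoch seconds as a YYYYMMDDHHMMSS string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d%H%M%S")


def mediawiki_to_epoch_seconds(stamp: str) -> int:
    """Parse a YYYYMMDDHHMMSS string (interpreted as UTC) into Epoch seconds."""
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def epoch_seconds_to_utc_millisec(seconds: int) -> int:
    """utc-millisec counts milliseconds since 1970-01-01T00:00:00Z."""
    return seconds * 1000


print(epoch_seconds_to_mediawiki(1509741360))
print(epoch_seconds_to_utc_millisec(1509741360))
```

A value in utc-millisec for a present-day event has 13 digits, whereas integer Epoch seconds have 10, which is one quick way to tell the two apart in raw data.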

Fri, Nov 3, 8:36 PM · Analytics-Kanban, Analytics-EventLogging

Thu, Nov 2

Tbayer moved T175195: Analyze impact of appearance controls on usage from Triage to In progress on the Reading-analysis board.
Thu, Nov 2, 10:48 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer claimed T175195: Analyze impact of appearance controls on usage.
Thu, Nov 2, 10:48 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer updated the task description for T176023: Implement IE7 correction for long-term trend charts.
Thu, Nov 2, 9:58 PM · Reading-analysis
Tbayer moved T176023: Implement IE7 correction for long-term trend charts from Triage to In progress on the Reading-analysis board.
Thu, Nov 2, 9:52 PM · Reading-analysis
Tbayer created T179623: Draft task for New Readers data analysis.
Thu, Nov 2, 9:51 PM · Reading-analysis
Tbayer closed T169285: Data analysis support regarding WP0 uploads as Resolved.

Closing this now - the aforementioned daily query is still running, but now automated (T175227). I have also been sharing some insights from observing the data it has been generating (e.g. here and here).

Thu, Nov 2, 9:48 PM · Reading-analysis
Tbayer added a comment to T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version.

Interesting findings! Food for thought... we should probably reach out to other users of this data to get more input on the best choice going forward; how about posting to Analytics-l?

Thu, Nov 2, 7:09 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T177969: [Spike 2h] Which browsers are downloading PDFs (including OS percentages)?.

@phuedx Thanks for documenting the query used (T177969#3687130)! Can you also specify the timespan for which it was run? (I.e. the concrete values of M, N, O, P and Q.) I re-ran it for a different timespan - October 31 - and got quite different results. E.g. 38.15% for Chrome 61 on Windows 10 instead of 8.83%, but only 6.55% for "Other" instead of 14.38%, etc. This may be because the total number of downloads in the timespan used was too low, and hence the statistical error (random variation) too large.
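
As a rough illustration of why a low total makes such percentages unreliable, the sketch below computes the standard error of an observed share under a simple binomial model. The binomial assumption, the function names, and the totals used are all hypothetical; the actual download counts are not stated in the comment.

```python
import math
from typing import Tuple


def share_standard_error(share: float, total: int) -> float:
    """Standard error of an observed proportion, assuming binomial sampling."""
    if total <= 0:
        raise ValueError("total must be positive")
    return math.sqrt(share * (1.0 - share) / total)


def share_interval(share: float, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval (normal approximation), clipped to [0, 1]."""
    margin = z * share_standard_error(share, total)
    return max(0.0, share - margin), min(1.0, share + margin)


# Hypothetical totals: the same 10% share is far less certain with 50
# observations than with 5000.
for total in (50, 5000):
    print(total, share_interval(0.10, total))
```

The error shrinks with the square root of the total, so a hundredfold increase in observations narrows the interval by a factor of ten.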

Thu, Nov 2, 2:45 AM · Electron-PDFs, User-Jdlrobson, Readers-Web-Kanban-Board, Spike, Proton, Readers-Web-Backlog

Wed, Nov 1

Tbayer added a parent task for T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version: T162610: Implement EventLogging Hive refinement.
Wed, Nov 1, 10:09 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a subtask for T162610: Implement EventLogging Hive refinement: T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version.
Wed, Nov 1, 10:09 PM · Patch-For-Review, Analytics-Kanban, Analytics-EventLogging
Tbayer created T179540: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version.
Wed, Nov 1, 10:09 PM · Analytics-Kanban, Analytics-EventLogging
Tbayer added a comment to T175195: Analyze impact of appearance controls on usage.

Regarding bonus question 1, I took a quick look at that curve in Pivot (restricted to North America because timezones), but this doesn't seem to be something that can be eyeballed easily. Can discuss more in person.

Wed, Nov 1, 8:29 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer added a comment to T175195: Analyze impact of appearance controls on usage.

Regarding the second question: day-7 retention does not seem to have changed notably with the rollout of 5.6.0.

Wed, Nov 1, 8:12 PM · Reading-analysis, Wikipedia-iOS-App-Backlog
Tbayer added a project to T179426: Identify typical time to first user interaction: Reading-analysis.
Wed, Nov 1, 5:29 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer claimed T179426: Identify typical time to first user interaction.
Wed, Nov 1, 5:28 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer added a comment to T179426: Identify typical time to first user interaction.

Yes, the Popups schema has both a pageLoaded event and events for every link interaction, so this is doable (assuming pageLoaded is a good starting point to count this time from). Be aware though that it will need to be based on the server-side timestamp field, which only has a resolution of one second (combined with the client-side totalInteractionTime field, which has a millisecond resolution).
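
A minimal sketch of that calculation follows, assuming each event is a record with a page token, an action name, and an integer-second server-side timestamp. The key names `pageToken`, `action`, and `timestamp`, the action name `linkInteraction`, and the sample events are assumptions for illustration and may not match the actual Popups schema fields; the totalInteractionTime field is not used here.

```python
from typing import Dict, Iterable, Mapping


def time_to_first_interaction(events: Iterable[Mapping]) -> Dict[str, int]:
    """Seconds from pageLoaded to the earliest other event, per page token.

    Pageviews without a pageLoaded event or without any interaction are skipped.
    """
    loaded: Dict[str, int] = {}
    first: Dict[str, int] = {}
    for event in events:
        token = event["pageToken"]
        stamp = event["timestamp"]
        if event["action"] == "pageLoaded":
            loaded[token] = stamp
        else:
            # Keep the minimum timestamp seen so far for this pageview.
            first[token] = min(first.get(token, stamp), stamp)
    return {token: first[token] - loaded[token] for token in loaded if token in first}


# Made-up events for illustration.
events = [
    {"pageToken": "p1", "action": "pageLoaded", "timestamp": 100},
    {"pageToken": "p1", "action": "linkInteraction", "timestamp": 107},
    {"pageToken": "p1", "action": "linkInteraction", "timestamp": 103},
    {"pageToken": "p2", "action": "pageLoaded", "timestamp": 200},
]
print(time_to_first_interaction(events))
```

Since both timestamps are whole seconds, each difference can be off by up to one second in either direction.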

Wed, Nov 1, 5:14 PM · Reading-analysis, Readers-Web-Kanban-Board, Spike, Readers-Web-Backlog, Page-Previews
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

[15:47:59] <ottomata> HaeB: no storage issues in hadoop. we are maintaining a temporary custom import/refine job for this schema, while we work on more generically supporting eventlogging data in hadoop
[15:48:23] <ottomata> i think we can keep running the custom job for yall a while longer, seems fine with me

Wed, Nov 1, 3:49 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy
Tbayer added a comment to T178500: Stop sending data for Page Previews enwiki and dewiki A/B test (again).

From my understanding, there is no space issue on Hadoop, so it would be no problem to continue the test for say a week or two. But that's the area of expertise of Analytics Engineering and/or Ops - I'll ask in #wikimedia-analytics to confirm.

Wed, Nov 1, 3:12 PM · Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Wikimedia-Site-requests, Page-Previews, Easy

Tue, Oct 31

Tbayer added a comment to T179436: Set up measurement of page loading timings.

Perhaps take a look at the NavigationTiming schema and how the Performance team uses it - or consult with that team?

Tue, Oct 31, 9:51 PM · Wikipedia-iOS-App-Backlog, iOS-app-v5.8.0-Manatee-On-A-Skateboard

Mon, Oct 30

Tbayer added a comment to T171881: CL support for Wikipedia Zero piracy problems.

Yes, it's an interesting theory, but please note that the reports in that channel are not listing all new files or deleted files in general, but those likely to be WP0 piracy uploads. And the uploaders of these problematic files have adopted a naming scheme that includes parentheses.

Mon, Oct 30, 3:31 AM · Patch-For-Review, Community-Liaisons (Oct-Dec 2017), Zero

Oct 23 2017

Tbayer closed T178802: Add Tilman to analytics-admins as Resolved.

Thanks all!

Oct 23 2017, 8:27 PM · Patch-For-Review, Operations, Ops-Access-Requests, Analytics-Kanban
Tbayer added a comment to T177215: Build download button for mobile PDF download.

User agent sniffing generally should be avoided and is likely to lead to a very complicated regex so I want to challenge that.

A quick Google yields that navigator.userAgent.indexOf( 'Android' ) !== -1 matches the majority of Android devices and adding navigator.userAgent.indexOf( 'Kindle' ) !== -1 should catch older Kindle devices. I'd note that:

I guess the state of the art here may be ua-parser, which indeed seems to rely on the presence of the string "Android" (with this exception to exclude mobile IE on Windows Phone, and two other tests to catch Kindle and UCWEB on Android).
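
For illustration, a minimal sketch of the substring approach discussed above, with the Windows Phone exclusion added. This is only a rough approximation and not ua-parser's actual rule set; the function name is hypothetical, and the UCWEB case is left out.

```javascript
// Rough check: does this user agent string look like an Android
// (or older Kindle) device?
function looksLikeAndroid(userAgent) {
  // Mobile IE on Windows Phone includes "Android" in its user agent
  // string, so exclude it first.
  if (userAgent.indexOf('Windows Phone') !== -1) {
    return false;
  }
  return userAgent.indexOf('Android') !== -1 ||
    userAgent.indexOf('Kindle') !== -1;
}
```

In a browser this would be called as looksLikeAndroid( navigator.userAgent ). Taking the string as a parameter keeps the check testable outside a browser.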

Oct 23 2017, 8:19 PM · MW-1.31-release-notes (WMF-deploy-2017-10-24 (1.31.0-wmf.5)), Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, New-Readers

Oct 17 2017

Tbayer added a comment to T170956: Instrument the location search API usage from the iOS app.

Thanks for the update on the timing! We can chat about this in person at the offsite now, but to record a result about the first part already (populating the map): This seems to be getting a bit less than 400k requests per day on average, with 94-95% of them for a single query string (all parameters identical - is this for some kind of initial coordinate?):

Oct 17 2017, 6:31 PM · Reading-analysis, Wikipedia-iOS-App-Backlog

Oct 16 2017

Tbayer updated the task description for T176469: Relaunch page previews a/b test on en and de wiki.
Oct 16 2017, 7:38 PM · Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Page-Previews, Readers-Web-Backlog
Tbayer updated subscribers of T176469: Relaunch page previews a/b test on en and de wiki.

@phuedx, @Niedzielski: The currently stated sampling rates should work, but I sense some confusion behind T176469#3626758 and T176469#3672668 :

Oct 16 2017, 7:34 PM · Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Page-Previews, Readers-Web-Backlog
Tbayer added a comment to T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana.

Ping @Tbayer , does my explanation make sense? The struct-shaped columns will come with automagically refined eventlogging data, scooping from MySQL leaves data with column format more similar to MySQL.

Yes, I think I understand the difference. (I would be fine with calling the process of making the data available in Hive "import" too, but I can see why you prefer to call it "refining". BTW, for cross-reference, I understand that T177783 refers to this process.)

Oct 16 2017, 7:09 PM · Analytics-Kanban, Readers-Web-Backlog (Tracking), Analytics-EventLogging
Tbayer added a subtask for T176469: Relaunch page previews a/b test on en and de wiki: T177783: Eventlogging refine popups, temporary cron.
Oct 16 2017, 4:09 PM · Readers-Web-Kanban-Board, Patch-For-Review, Wikimedia-Site-requests, Page-Previews, Readers-Web-Backlog
Tbayer added a parent task for T177783: Eventlogging refine popups, temporary cron: T176469: Relaunch page previews a/b test on en and de wiki.
Oct 16 2017, 4:09 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
Tbayer added a comment to T158172: Analyze results of stage 0 page previews test.

For the record, the equivalent of T158172#3625184 for the "seen" limit we later settled on (1000ms instead of 1500ms):

Oct 16 2017, 12:55 PM · Page-Previews, Readers-Web-Backlog, Reading-analysis

Oct 13 2017

Tbayer added a comment to T178174: Remove AppInstallIId from EventLogging purging white-list.

Also, what about raw userAgents?

See T164125#3284763 ; it seems we lack a bit of institutional memory here. In general, as noted at T164125#3234590 , app user agents contain a lot less entropy than general user agents.

Oct 13 2017, 9:36 PM · Patch-For-Review, Analytics-Kanban
Tbayer updated subscribers of T178174: Remove AppInstallIId from EventLogging purging white-list.

Folks, we spent quite a bit of time just a few months ago on a comprehensive review of the purging settings for all apps schemas, which included discussion of this field (cf. e.g. T164125 ). Which schemas are affected exactly? Please do not change the settings without an opportunity for the apps teams to review the tradeoffs involved (CC @JMinor).

Oct 13 2017, 9:35 PM · Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T158172: Analyze results of stage 0 page previews test.

Update: Here is a look at the disable actions, which are still very rare.

SELECT LEFT(timestamp, 8) AS yearmonthday, COUNT(*) AS disables
FROM log.Popups_16364296
WHERE event_action = 'disabled'
GROUP BY yearmonthday
ORDER BY yearmonthday;

+--------------+----------+
| yearmonthday | disables |
+--------------+----------+
| 20170703     |        1 |
| 20170705     |        2 |
| 20170706     |        4 |
| 20170718     |        1 |
| 20170723     |        1 |
| 20170724     |        1 |
| 20170727     |        1 |
| 20170728     |        1 |
| 20170730     |        1 |
| 20170731     |        2 |
| 20170801     |        1 |
| 20170803     |        1 |
| 20170804     |        1 |
| 20170806     |        1 |
| 20170807     |        1 |
| 20170808     |        1 |
+--------------+----------+
16 rows in set (1 min 5.56 sec)
Oct 13 2017, 7:07 PM · Page-Previews, Readers-Web-Backlog, Reading-analysis