Tbayer (Tilman Bayer)
User

User Details

User Since
Oct 20 2014, 11:21 PM (191 w, 3 d)
Availability
Available
IRC Nick
HaeB
LDAP User
Unknown
MediaWiki User
Tbayer (WMF)

Recent Activity

Yesterday

Tbayer moved T148262: Vet and explore new readership engagement metric from Triage to Backlog on the Product-Analytics board.
Thu, Jun 21, 8:25 PM · Product-Analytics, Patch-For-Review, Reading-analysis
Tbayer moved T148263: Vet and explore new readership retention metric from Triage to Backlog on the Product-Analytics board.
Thu, Jun 21, 8:24 PM · Product-Analytics, Reading-analysis
Tbayer moved T148287: Outreachy Project Proposal - 5 subprojects with the Reading Department from Triage to Backlog on the Product-Analytics board.
Thu, Jun 21, 8:23 PM · Product-Analytics, Outreachy (Round-13), Reading-analysis
Tbayer moved T157307: Analyze performance of related pages feature from Triage to Backlog on the Product-Analytics board.
Thu, Jun 21, 8:20 PM · Product-Analytics, RelatedArticles, Reading-Web-Sprint-96, Reading-analysis, Readers-Web-Backlog
Tbayer moved T196159: Turn off Schema:Print (timing TBD) from Triage to Blocked on the Product-Analytics board.
Thu, Jun 21, 8:08 PM · MediaWiki-extensions-WikimediaEvents, Product-Analytics, Readers-Web-Backlog
Tbayer added a comment to T92457: PageImages not compatible with webm files.

Well, but that's a very narrow interpretation of the issue that this task should have resolved.

Sure, but at least it provides an option for certain use cases. I also hope we'd agree this is better than blacklisting all webm files.

Right, that's true of course.

It doesn't solve the case described in T92457#4173454 (the same video used with two different thumbtimes...)

That's correct.

I think this task was always a bit unhelpfully vague, so I've opened a more specific task at T197839. I'm not sure if it's feasible for us to fix it, as page images only captures the title of the associated page image and would likely need quite a large re-architecture to support that but we'll see.

Thu, Jun 21, 1:00 AM · Readers-Web-Backlog, Page-Previews, PageImages

Wed, Jun 20

Tbayer added a comment to T92457: PageImages not compatible with webm files.

...

First, it would create a lot of extra work for the editors who would have to transfer locally chosen thumbtimes to Commons as the default thumbtime, and update them there in case they are changed locally.

I'm not sure it's true that it creates a lot of work.
We've only seen 1 case in the wild so far where the page image has been the initial screen and the initial screen of that video was a black screen.

Well, but that's a very narrow interpretation of the issue that this task should have resolved. It is more appropriately described as "the initial screen is not a suitable page image".

Wed, Jun 20, 9:34 PM · Readers-Web-Backlog, Page-Previews, PageImages
Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

To record another takeaway from today's meeting: We did consider sampling/bucketing by page instead of by session ID, which (cf. T191532#4162147 ) could potentially avoid the FOUC, and which one might be able to implement based on page IDs alone. However, @Jdlrobson pointed out the pragmatic argument for using session IDs, which is that by now we have a lot of experience with that approach, and can use existing code.
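
For illustration only, the page-based alternative mentioned above could be sketched as deterministic bucketing on the page ID. The function name `page_bucket`, the salt string and the bucket labels are hypothetical and not taken from the task; this is not the session-ID approach the team settled on.

```python
import hashlib


def page_bucket(page_id: int, sampling_rate: float, salt: str = "page-issues") -> str:
    """Deterministically assign a page to a bucket (hypothetical helper).

    The same page ID always lands in the same bucket, so every view of a
    given page gets the same treatment regardless of the session.
    """
    digest = hashlib.sha256(f"{salt}:{page_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    if fraction >= sampling_rate:
        return "unsampled"
    # Split the sampled pages evenly between the two variants.
    return "control" if fraction < sampling_rate / 2 else "treatment"
```

Because the bucket depends only on the page ID, it could in principle be computed before the page is rendered, which is why this variant was considered as a way to avoid the FOUC.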

Wed, Jun 20, 7:14 PM · Readers-Web-Backlog, Page-Issue-Warnings

Mon, Jun 18

chelsyx awarded T194424: Provide separate edit tags for Android and iOS apps a Like token.
Mon, Jun 18, 9:29 PM · Reading-Infrastructure-Team-Backlog (Kanban), Product-Analytics
Tbayer added a comment to T197542: Pageviews data missing for June 14.

Might be related to T197281, see also this announcement.

Mon, Jun 18, 4:35 AM · Pageviews-API, Analytics

Fri, Jun 15

Tbayer added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

That's definitely an option. I'm not sure what the limit would be though - we'd need to fine tune that. @Tbayer any down sides of that?

No, that sounds like a reasonable suggestion, and @mforns makes a great point about the importance of keeping project and language_variant consistent with pageview_hourly.
If there is interest, we could estimate how often the truncation would happen, by looking at the distribution of page name length (encoded and unencoded) weighted by pageviews.
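
A minimal sketch of that estimate, assuming a mapping from page title to pageview count is already available. The helper name `truncation_share` and the input shape are assumptions for illustration; the actual length limit is not specified in this thread.

```python
from urllib.parse import quote


def truncation_share(pageviews_by_title, max_length, encoded=True):
    """Return the share of pageviews whose title exceeds max_length.

    pageviews_by_title: dict mapping page title -> pageview count (assumed input).
    encoded: measure the percent-encoded title instead of the raw one.
    """
    total = 0
    too_long = 0
    for title, views in pageviews_by_title.items():
        length = len(quote(title, safe="")) if encoded else len(title)
        total += views
        if length > max_length:
            too_long += views
    return too_long / total if total else 0.0
```

Running it once with `encoded=True` and once with `encoded=False` gives the two distributions mentioned above, since non-ASCII titles grow considerably when percent-encoded.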

Fri, Jun 15, 3:36 PM · Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging

Wed, Jun 13

Tbayer added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

@Jdlrobson the capsule does not have project; it has wiki (frwiki) and "webhost". I think we can probably get away with webhost here since this is only deployed to desktop. For other schemas this webhost might be m.fr.wikipedia.org and thus not equivalent to project.

Wed, Jun 13, 10:27 PM · Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging
Tbayer renamed T194424: Provide separate edit tags for Android and iOS apps from Determine rate of edits made via the Android Wikipedia app to Provide separate edit tags for Android and iOS apps (was: Determine rate of edits made via the Android Wikipedia app).
Wed, Jun 13, 5:37 PM · Reading-Infrastructure-Team-Backlog (Kanban), Product-Analytics
Tbayer added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

@Ottomata @Tbayer i believe the idea was to be consistent with pageviews.

Consistent in what sense? Recall that the main purpose of this schema is as an intermediate step for the virtualpageview_hourly table, where we only need the source's page title, page ID and namespace (consistent with the pageview_hourly table) - not the full URL.

Wed, Jun 13, 4:18 PM · Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging

Tue, Jun 12

mpopov awarded T194424: Provide separate edit tags for Android and iOS apps a Mountain of Wealth token.
Tue, Jun 12, 8:58 PM · Reading-Infrastructure-Team-Backlog (Kanban), Product-Analytics
Tbayer reassigned T189307: Instrument the Proton service to match mediawiki-services-electron-render from Tbayer to phuedx.
Tue, Jun 12, 4:06 PM · Readers-Web-Kanbanana-Board, Services (watching), Readers-Web-Backlog, Graphite, Proton
Tbayer added a comment to T184793: [EPIC] Instrument page interactions.

Why are there events for versions of IE before 11?

The number of events for those should be 0. The ratio should be 0. It's worth digging into those events and working out if they can be filtered out. Likely to be from different hosts.

Tue, Jun 12, 3:57 PM · Product-Analytics, Epic, Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Readers-Web-Kanbanana-Board, Page-Previews, Readers-Web-Backlog
Tbayer added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

What was our reason again (in T184793 and T186728) to record both source_title and source_url in the VirtualPageView schema? Would removing that redundancy avoid this problem?

Tue, Jun 12, 3:52 PM · Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging

Mon, Jun 11

Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

@Tbayer can you clarify? I think you're linking to the wrong task.

Mon, Jun 11, 11:18 AM · Reading-analysis, Product-Analytics, Analytics

Fri, Jun 8

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

Schema:PageIssues

Would it make more sense to use a (approximate) date for issuesVersion and a (threshold) number for editCountBucket? This might make queries or data manipulation easier since no conversions would be necessary.

I should have noted that editCountBucket was pulled from https://meta.wikimedia.org/wiki/Schema:Popups in the hope that we might have some easily reusable code there. (It wasn't mentioned in the task description, but @ovasileva and I agree that having this data might be useful for understanding how the effect of the new design might differ by editor experience level.)
I agree it could be preferable to use numbers (0,1,5,100, 1000) instead of strings ("0 edits", "1-4 edits", "5-99 edits", ...), but if we just use the existing setup, that's totally fine too.
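
As a sketch of the numeric variant, an edit count could be mapped to the lower threshold of its bucket as shown below. The thresholds are the ones listed above; the function name `edit_count_bucket` is hypothetical and this is not the existing Popups code.

```python
from bisect import bisect_right

# Lower bounds of the buckets, as listed in the comment above.
THRESHOLDS = (0, 1, 5, 100, 1000)


def edit_count_bucket(edit_count: int) -> int:
    """Return the lower threshold of the bucket containing edit_count."""
    if edit_count < 0:
        raise ValueError("edit_count must be non-negative")
    # bisect_right gives the number of thresholds <= edit_count.
    return THRESHOLDS[bisect_right(THRESHOLDS, edit_count) - 1]
```

With numeric values like these, queries can filter or sort on the bucket directly (for example `editCountBucket >= 5`) without parsing strings.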

Fri, Jun 8, 8:00 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T179915: Determine expected amount of usage of mobile print to PDF button per browser.

Here is a chart of PDF downloads per day (we already used this data last month in the quarterly check-in):

Fri, Jun 8, 12:50 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis
Tbayer added a comment to T138207: [Open question] Improve bot identification at scale.

Is this going to be carried forward into the 2018-19 annual plan? Improvements in this area would be very valuable for reader analytics in Audiences, too.

Fri, Jun 8, 12:11 PM · Research, Analytics

Thu, Jun 7

Tbayer added a comment to T120170: [Epic] Paid editing (COI) detection model.

Although there is also the closely related issue of undisclosed paid editing, which is an infringement of Wikimedia's Terms of Use.

Thu, Jun 7, 1:49 PM · Scoring-platform-team (Current), Research Ideas, artificial-intelligence

Wed, Jun 6

Tbayer added a comment to T138207: [Open question] Improve bot identification at scale.

Is this going to be carried forward into the 2018-19 annual plan? Improvements in this area would be very valuable for reader analytics in Audiences, too.

Wed, Jun 6, 4:34 PM · Research, Analytics
Tbayer added a comment to T196558: Evaluate alternate means to send X-Analytics information from Varnish to Hadoop..

See the investigation results at T188807 and the recently updated documentation at https://wikitech.wikimedia.org/wiki/X-Analytics .

Wed, Jun 6, 3:52 PM · Performance-Team (Radar), Analytics
Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

Thanks for the explanation, @fdans ! It seems like the best option for now regarding T189307 is to convert that kind of pageview_hourly query into an equivalent (if much slower) webrequest query that uses the old ua-parser regex. (Or do you happen to see a better solution?)

Wed, Jun 6, 3:39 PM · Reading-analysis, Product-Analytics, Analytics
Tbayer added a comment to T194427: Deploy Turnilo (possible pivot replacement).

Since Turnilo has now officially replaced Pivot, it would be great to update the documentation on Wikitech (the main page at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Pivot , and others where Pivot is mentioned).

Wed, Jun 6, 3:00 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Jun 5

Tbayer edited Description on Product-Analytics.
Tue, Jun 5, 4:35 PM
Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

I threw up a first draft of the schema at https://meta.wikimedia.org/wiki/Schema:PageIssues , based on the current task description. We still need to fully decide on the sampling/bucketing strategy.

Tue, Jun 5, 4:05 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer updated the task description for T191532: Mobile page issues - instrument page issues.
Tue, Jun 5, 3:59 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer moved T196113: Update Audiences page and Key Product Metrics with May 2018 Readers data from Blocked to Backlog on the Product-Analytics board.
Tue, Jun 5, 2:17 PM · Product-Analytics
Tbayer added a project to T194424: Provide separate edit tags for Android and iOS apps: Android-app-Bugs.
Tue, Jun 5, 2:02 PM · Reading-Infrastructure-Team-Backlog (Kanban), Product-Analytics

Sat, Jun 2

Tbayer added a comment to T195819: Pageviews-daily broken after move from Pivot to Turnilo.

Sounds good, thanks!

Sat, Jun 2, 8:02 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Tbayer reopened T195819: Pageviews-daily broken after move from Pivot to Turnilo as "Open".

Sorry, but the pageviews-daily dataset as it was linked in the task description is still broken: While the quoted error message is gone, it still only offers the "Count" measure (which is rather meaningless and likely to mislead users into believing we get only between 3 and 4 million pageviews per day overall) and not the "View Count" measure that we need.

Sat, Jun 2, 10:15 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Tbayer reopened T195819: Pageviews-daily broken after move from Pivot to Turnilo, a subtask of T194427: Deploy Turnilo (possible pivot replacement), as Open.
Sat, Jun 2, 10:15 AM · Patch-For-Review, Analytics-Kanban, Analytics

Fri, Jun 1

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

What does the modal action "external links" mean? (does the modal contain any links pointing outside Wikipedia, or is it still possible to click on external links in the article while the modal shows?)

Fri, Jun 1, 5:29 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer updated subscribers of T194961: Count link previews on the Android app .

Hi @mpopov Is this for you?

Analytics Engineering as far as I know

Yes - I discussed briefly with @Dbrant before filing this task, and it could turn out there's additional effort needed on the app's side, but for now it seems that the work done in the context of T110702 (probably in connection with the addition of referrer data T192779 ) could suffice.

Fri, Jun 1, 5:17 PM · Analytics, Android-app-Bugs, Wikipedia-Android-App-Backlog
Tbayer closed T181297: Instrument print to PDF button as Resolved.

This task has a somewhat convoluted history, but after spending some time reviewing the various parts I think we can check all the boxes and "sign off". The only part that is not entirely clear to me from the comments above is whether anyone tested the standard print path ("Share" --> "Print") for Chrome mobile (per T179915). Clarifying that might still be useful, but it doesn't block the main analysis at T179915 regarding the download button usage.

Fri, Jun 1, 12:57 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, Proton
Tbayer closed T181297: Instrument print to PDF button, a subtask of T179915: Determine expected amount of usage of mobile print to PDF button per browser, as Resolved.
Fri, Jun 1, 12:57 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis
Tbayer updated the task description for T181297: Instrument print to PDF button.
Fri, Jun 1, 12:42 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, Proton
Tbayer created T196159: Turn off Schema:Print (timing TBD).
Fri, Jun 1, 12:42 PM · MediaWiki-extensions-WikimediaEvents, Product-Analytics, Readers-Web-Backlog
Tbayer updated the task description for T179915: Determine expected amount of usage of mobile print to PDF button per browser.
Fri, Jun 1, 11:26 AM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis
Tbayer awarded T170019: Script that synchronizes EL purging white-list with schema talk pages a Like token.
Fri, Jun 1, 10:37 AM · Analytics
Tbayer added a comment to T195880: Problems with external referrals?.

From @JAllemandou 's e-mail:

[...]

  • We did change the referer_class code, but we deployed it beginning of May, not April (5th to be precise).

Does that refer to T191714: Add Ecosia and Startpage to list of search engines (which according to this log was deployed on May 2), or to some other change that could have affected the data as well?

Fri, Jun 1, 3:56 AM · Analytics-Kanban, Analytics

Thu, May 31

Tbayer moved T196114: Update Audiences page and Key Product Metrics with June 2018 Readers data from Triage to Blocked on the Product-Analytics board.
Thu, May 31, 9:23 PM · Product-Analytics
Tbayer created T196114: Update Audiences page and Key Product Metrics with June 2018 Readers data.
Thu, May 31, 9:23 PM · Product-Analytics
Tbayer moved T196113: Update Audiences page and Key Product Metrics with May 2018 Readers data from Triage to Blocked on the Product-Analytics board.
Thu, May 31, 9:22 PM · Product-Analytics
Tbayer created T196113: Update Audiences page and Key Product Metrics with May 2018 Readers data.
Thu, May 31, 9:22 PM · Product-Analytics
Tbayer closed T190601: Update Audiences page and Key Product Metrics with April 2018 Readers data as Resolved.
Thu, May 31, 9:20 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T196091: Investigate recent metamorphosis of unreferred views into internally referred views on mobile web.

Per @JKatzWMF this may be related to T154702 .

Thu, May 31, 8:12 PM · Product-Analytics
Tbayer added a comment to T196100: Traffic and content data for SDG impact report.

Related: https://www.itu.int/en/ITU-D/Statistics/Documents/publications/wsisreview2014/WSIS2014_review.pdf (a UN report from a few years back that made use of quite a bit of - mostly or even entirely - publicly available Wikipedia data)

Thu, May 31, 7:48 PM · Product-Analytics
Tbayer created T196100: Traffic and content data for SDG impact report.
Thu, May 31, 7:38 PM · Product-Analytics
Tbayer created T196091: Investigate recent metamorphosis of unreferred views into internally referred views on mobile web.
Thu, May 31, 6:10 PM · Product-Analytics

Tue, May 29

Tbayer added a comment to T195819: Pageviews-daily broken after move from Pivot to Turnilo.

This is clearly not right, segments have been loaded and nothing changed. Moreover, the title (which I didn't pay attention to before) says that it used to work with Pivot but not Turnilo, so this is definitely something that has been happening for a while (it did not start today).

I'm actually not 100% certain when I last saw it working (I do recall checking some pageview data in recent days since the switchover, but probably only used the hourly version).

Tue, May 29, 12:44 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Tbayer added a comment to T176023: Implement IE7 correction for long-term trend charts.

For the record, below is an example of the queries I have been using for this. This was based on the detailed analysis in https://phabricator.wikimedia.org/T157404 (for Pakistan - task set to private because the examination involved looking at some IP information), while including two other countries - Iran and Afghanistan - that showed a similarly anomalous pattern of IE7 views widely surpassing those from newer IE versions.

Tue, May 29, 10:12 AM · Product-Analytics, Reading-analysis
Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

Related to question #3 in the task description, I noticed that the number of IE7 pageviews has dropped a lot from May 21 to May 22; it seems these are now counted as (mainly) IE11 (and some as IE8 and IE9):

Tue, May 29, 9:59 AM · Reading-analysis, Product-Analytics, Analytics
Tbayer added a parent task for T195819: Pageviews-daily broken after move from Pivot to Turnilo: T194427: Deploy Turnilo (possible pivot replacement).
Tue, May 29, 9:36 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Tbayer added a subtask for T194427: Deploy Turnilo (possible pivot replacement): T195819: Pageviews-daily broken after move from Pivot to Turnilo.
Tue, May 29, 9:36 AM · Patch-For-Review, Analytics-Kanban, Analytics
Tbayer created T195819: Pageviews-daily broken after move from Pivot to Turnilo.
Tue, May 29, 9:35 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Tbayer updated the task description for T184677: Measure impact of Singapore data center on Wikimedia usage.
Tue, May 29, 9:16 AM · Product-Analytics, Discovery-Analysis (Current work)
Tbayer added a comment to T184677: Measure impact of Singapore data center on Wikimedia usage.

@chelsyx
A possible way to go about this is to look at the mean of "daily pageviews per device per country for desktop and mobile" to see if there is a meaningful difference there. This would be a simple calculation to get started. Since in Japan the changes were significant (1.4 to 1.2 secs of median load times per blogpost) it seems that looking at Japan first might make sense.

Japan is on the list, but the Performance team had found larger changes in some of the other countries listed in the task description.

You could calculate: "daily-user-pageviews-for-jp.wikipedia.org-in-Japan-in-desktop" divided by "daily-unique-devices-in-Japan-in-jp.wikipedia.org" and get a timeseries that would have 1 point per day. If the effect of the datacenter is significant I would expect to see a hiccup on that timeseries after the datacenter launch, meaning that there are "longer sessions".

Thanks for the suggestion! But per the task description, we are already examining the numerator and denominator separately. I don't expect that this quotient (views / devices) would yield much additional insight. Or to put it differently: Unless the switchover caused a decrease in the number of unique devices for some reason, these "longer sessions" would already be reflected in the pageview metric.
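
To make the quotient under discussion concrete, here is a minimal sketch that computes views per device for each day. The function name `views_per_device` and the dict-based inputs are assumptions for illustration, not an existing query or dataset.

```python
def views_per_device(daily_pageviews, daily_unique_devices):
    """Return {day: pageviews / unique devices} for days with device data.

    Both arguments are dicts keyed by day (assumed input shape).
    Days with missing or zero device counts are skipped.
    """
    return {
        day: daily_pageviews[day] / daily_unique_devices[day]
        for day in daily_pageviews
        if daily_unique_devices.get(day)
    }
```

If the device count stays flat across the switchover, any change in this ratio is just the change in pageviews rescaled by a constant, which is the point made in the reply above.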

Tue, May 29, 9:14 AM · Product-Analytics, Discovery-Analysis (Current work)
Tbayer added a comment to T167005: Update per-domain uniques fresh-sessions computation.

@JAllemandou Did the "about 10% of the offset" estimate in the task description refer to the daily metric?
For the monthly unique devices, the impact may have been much larger (looking at the total uniques_estimate - haven't examined the offset part separately yet):

Tue, May 29, 8:57 AM · Patch-For-Review, Analytics-Kanban

Fri, May 25

Tbayer added a comment to T195520: Multiple projects reporting Cannot access the database: No working replica DB server.

Incident report (in progress): https://wikitech.wikimedia.org/wiki/Incident_documentation/20180524-wikidata

Fri, May 25, 5:22 AM · User-Addshore, Wikidata-Campsite, MW-1.32-release-notes (WMF-deploy-2018-05-29 (1.32.0-wmf.6)), Wikidata-Ministry-Of-Magic, Wikimedia-Incident, Wikidata, Patch-For-Review, Wikimedia-General-or-Unknown, Wikimedia-log-errors, Operations

May 19 2018

Tbayer updated the task description for T191036: Contributing from the mobile web: workflows that editors do.
May 19 2018, 2:18 PM · Readers-Web-Backlog (Tracking), Wikimedia-Hackathon-2018
Tbayer added a comment to T186728: Record and aggregate page previews.

@Tbayer

I understand your position now, thanks.

I believe it is natural that there's a tension between your team (speaking more about data analysts here) and ours, because your team is on several occasions dependent on ours, and our team does not have the bandwidth that you guys would like, considering that there are other demands from other teams and from the natural growth of the data and tools we're maintaining. On the other hand, we try to be a steward of the privacy policy and data retention guidelines, which in practice end up hindering the flexibility of your data and thus adding to your work when performing data analysis. So yea, I understand your frustration.

I am not sure what privacy and bandwidth limitations had to do with this regrettable communications issue. Sure, those are necessary and sometimes difficult topics to discuss. But any questions related to these two had already been resolved at that point - we had agreed what data would be stored in the aggregate table and who would implement the aggregation (again, thanks for your work on this!). Rather, the frustration on my side was about things like people making strong but erroneous statements about what our team's data needs supposedly were, and what this task supposedly consisted in, instead of simply acknowledging and fixing the clear oversight that had been pointed out. And on your side, I understand that much of the frustration was about seeing yourself accused of deliberately diverging from the task as written when you implemented the aggregation. Again, that was not my intention and I had thought that this had been clear in T186728#4170881 , but perhaps there is something I missed, and I'm interested in what I could have done to avoid that misunderstanding.

Please understand, though, that we in Analytics are trying our best on our side as well. And that in view of this situation where conflicts already exist between some of us, aggressive comments usually do not help reach results, but rather make the conflicts bigger.

May 19 2018, 12:59 PM · MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T194555: Proposal for excluding desktop downloads from the iOS app download number we are reporting.

Thanks (belatedly) for solving this mystery by following up with App Annie and doing further research on this! BTW this is also consistent with the earlier observation that these spikes have always been confined to a single country and a single day, and appear to increase the baseline by a round number like 10,000.

May 19 2018, 11:24 AM · Product-Analytics

May 18 2018

Tbayer created T194961: Count link previews on the Android app .
May 18 2018, 3:31 PM · Analytics, Android-app-Bugs, Wikipedia-Android-App-Backlog

May 16 2018

Tbayer added a comment to T192305: Index and store page preview agreggates on Druid so they are visible in pivot/superset.

Thanks! Already looked at it in Superset with the web team yesterday.

May 16 2018, 11:12 AM · Patch-For-Review, Analytics, Analytics-Kanban

May 11 2018

Tbayer added a comment to T186728: Record and aggregate page previews.

Thanks @mforns, also for keeping the existing data up earlier while the fix was implemented (I was able to use it for our quarterly check-in deck this week). Will take a look at the new version soon.

May 11 2018, 8:40 PM · MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T178174: Remove AppInstallIId from EventLogging purging white-list.

(Since March, this conversation has been continuing elsewhere, mainly with @JMinor on the Readers team's side. Since it appeared not all participants were seeing benefits of being able to whitelist this field in appropriate cases, I'm posting here a sketch of use cases that Josh drafted earlier with my support:)

May 11 2018, 3:20 PM · Analytics-EventLogging, Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T184096: Wikidata editing behaviour on Android app.

The calculation kept failing because - I think - of time limits on PAWS; the revert analysis is a bit computationally expensive and took 8 hours when it finally completed successfully; next time we run this one probably needs to invest some time to split it into several shorter timespans.

Have you considered using SWAP instead? I've never run into resource limitations like those, probably because the service is optimized for a small number of heavy users rather than a large number of light to moderate users.

That may be a good idea, although the limiting factor here is not database load or computations in Python but the large number of web API accesses. (I have been using the API version of mwreverts because last year I couldn't get the database version to work on PAWS, even with Aaron's support - filed at https://github.com/mediawiki-utilities/python-mwreverts/issues/8 .)

May 11 2018, 12:22 PM · Contributors-Analysis, Reading-analysis, Product-Analytics

May 10 2018

Tbayer added a comment to T172137: Estimate both beta and production downloads of Zim files.

@Fjalapeno Do you need any additional data from us here?

May 10 2018, 8:20 PM · Product-Analytics, Reading-Infrastructure-Team-Backlog, Reading-analysis, Wikipedia-Android-App-Backlog, Android-app-feature-Compilations
Tbayer moved T169550: Final Vetting of Family Wide unique devices data from Triage to Doing on the Product-Analytics board.
May 10 2018, 8:09 PM · Product-Analytics, Reading-analysis, Analytics-Kanban
Tbayer added a comment to T184096: Wikidata editing behaviour on Android app.

Clarified the scope of this task per @Charlotte, and split off the question about the volume of general edits into T194424.
I also updated my analysis of reverts for description edits for the last few months, addressing question 2 in this task. (This consisted just of re-running the existing PAWS notebook and thus wasn't much work at all, but still took a while as the calculation kept failing because - I think - of time limits on PAWS; the revert analysis is a bit computationally expensive and took 8 hours when it finally completed successfully; next time we run this one probably needs to invest some time to split it into several shorter timespans).
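
A minimal sketch of splitting a long analysis window into several shorter timespans, so that each run stays within a time limit. The helper name `split_timespan` and the half-open interval convention are assumptions for illustration; this is not part of the existing PAWS notebook.

```python
from datetime import date, timedelta


def split_timespan(start: date, end: date, max_days: int):
    """Yield (chunk_start, chunk_end) pairs covering [start, end).

    Each chunk spans at most max_days days; chunk_end is exclusive.
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + timedelta(days=max_days), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end
```

Each pair can then be passed to a separate run of the revert analysis, with the partial results concatenated afterwards.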

May 10 2018, 8:01 PM · Contributors-Analysis, Reading-analysis, Product-Analytics
Tbayer created T194424: Provide separate edit tags for Android and iOS apps.
May 10 2018, 7:55 PM · Reading-Infrastructure-Team-Backlog (Kanban), Product-Analytics
Tbayer updated subscribers of T193578: Assess impact of ua-parser update on core metrics.

Super interesting findings, thanks @fdans! CCing @chelsyx regarding the implications for iOS.

May 10 2018, 5:26 PM · Reading-analysis, Product-Analytics, Analytics
Tbayer renamed T184096: Wikidata editing behaviour on Android app from Editing behaviour on Android app to Wikidata editing behaviour on Android app.
May 10 2018, 3:50 AM · Contributors-Analysis, Reading-analysis, Product-Analytics

May 9 2018

Tbayer updated subscribers of T193912: Change URL for Wikimedia Blog when new Wikimedia Foundation website launches.

@Varnent Mel and I chatted quick—we're fine with either approach as long as it works on a technical level. I'm personally a little worried about the link to the home page from the logo in the top left, but that could be disabled in the current blog's code as part of this process.

Or we directly change it from the old URL to the new one ( https://blog.wikimedia.org/ to https://wikimediafoundation.org/news/ )

@Tbayer do you have an idea of the amount of work that would be required to add a banner like WMDE's?

I imagine this is a fairly harmless change to the theme, but @Volker_E should be able to provide more of an expert answer. And IMHO we should do this anyway (i.e. even if the existing blog is moved to a newly created "archive" domain, which, as mentioned above, I would recommend against).

May 9 2018, 10:22 PM · WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Blog
Tbayer added a comment to T180336: PAWS public-link is lowercase but the paws-public server is case-sensitive.

Thanks for your work on this! (speaking as one of the victims, as user HaeB != haeb)
It seems that your upstream pull request has since gotten merged?

May 9 2018, 10:01 PM · Upstream, PAWS
Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

..

mmm.. the research for this metric about bots was plentiful and much of the "user" marked traffic is excluded from counting; the bulk of the work we did on how we excluded bot traffic wrongly labeled as user is explained here:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews/Bots_Research

Yes, I'm familiar with that page. As far as I can see, it doesn't report any results about how many non-nocookie requests might be coming from undetected bots. And in any case, this was 2015 and it's 2018 now.
Apropos, I see that under "worklog" there you mentioned "[then-] recent updates to bot regex that affect this data, our regex catches more bots via user agent (quite a bit more)", underlining the importance of the present task in general.

May 9 2018, 6:23 AM · Reading-analysis, Product-Analytics, Analytics

May 8 2018

Tbayer updated the task description for T184677: Measure impact of Singapore data center on Wikimedia usage.
May 8 2018, 10:06 PM · Product-Analytics, Discovery-Analysis (Current work)
Tbayer updated the task description for T184677: Measure impact of Singapore data center on Wikimedia usage.
May 8 2018, 9:30 PM · Product-Analytics, Discovery-Analysis (Current work)
Tbayer added a comment to T191429: Estimate impact of Facebook's article context feature.

Great work, @MNeisler! Some additional remarks inline.

Here are the updated daily Facebook-referred pageviews based on data through April 25th.

There are no significant changes following the full rollout of the article context feature on April 3, 2018. In addition, any potential effects from Facebook's article context feature appear to be too small to determine from the overall number of Facebook referrals.

Looking just at the daily pageviews to a set of news-media-related articles displays the effect of the article context feature more clearly. The plot below includes a selection of 9 news-media-related articles, including all those found in the top 30 Facebook-referred pages in the week following April 4th.

A week prior to the feature rollout (March 26 - April 3rd), there was an average of about 3 daily pageviews to these pages with a Facebook referrer. There is a significant increase of daily pageviews on April 4th; however, pageviews quickly decline after April 4th to an average of around 40 pageviews between April 11 and April 18th. Some of the higher pageviews around April 4th seen for Breitbart and potentially other sources may be the result of Facebook posts linking to the article directly in the context of news related coverage released that day.

To add, for the record: The Breitbart-related coverage and social media attention which likely is a confounding factor here is summarized e.g. in https://www.haaretz.com/us-news/.premium-breitbart-declares-war-on-wikipedia-in-facebook-s-fight-against-fake-news-1.5991915 (paywalled, but may be accessible via Google).

Potential next steps, if needed, could include trying to expand the set of news article pages, pending a complete list from Facebook, or by using Wikipedia categories

For the record: we have been thinking about using https://en.wikipedia.org/wiki/Category:Media_in_the_United_States

but overall it appears that the article context feature had a very small effect on Facebook-referred pageviews. The excerpt displayed by Facebook in the article context feature is large, so it's possible that many Facebook users who access this feature do not click through to the Wikipedia articles.

To clarify: by now we can safely assert that the effect on overall Facebook-referred pageviews is very small. That said, we still have options for exploring question 2 ("Estimate the number of additional daily pageviews resulting from the feature") further. From F17344233, it appears that we could state a lower bound of about 400 daily pageviews for this just based on this fairly small sample of 9 articles, which we might be able to increase a lot when including the long tail of all articles in the aforementioned category. Focusing on the top-referred news media articles first was a great initial approach, but (besides the shape of the chart) the fact that the article about a comparatively small website like the Daily Wire surpassed those for e.g. the NYT or Fox News is another indicator that this was indeed dominated by the controversies/attention generated by Facebook's announcement itself, rather than the feature per se.
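
To illustrate the lower-bound reasoning above: summing each sampled article's increase in average daily Facebook-referred pageviews gives a floor on the total, on the assumption that articles outside the sample did not lose views. The following is only a rough sketch; the article names and all numbers are made-up placeholders, not the data from F17344233, and the function names are hypothetical.

```python
from statistics import mean


def daily_uplift(pre, post):
    """Difference between the average daily pageviews after and before rollout."""
    return mean(post) - mean(pre)


def lower_bound_extra_pageviews(series_by_article):
    """Sum the per-article uplift over a sample of articles.

    Articles outside the sample are simply not counted, which is why the
    result is a lower bound rather than an estimate of the full effect.
    Negative uplifts are clipped to zero so that they cannot offset gains.
    """
    return sum(
        max(0.0, daily_uplift(pre, post))
        for pre, post in series_by_article.values()
    )


# Placeholder data: (daily pageviews before rollout, daily pageviews after).
sample = {
    "Article A": ([3, 4, 2], [40, 45, 38]),
    "Article B": ([1, 2, 3], [30, 28, 35]),
}

estimate = lower_bound_extra_pageviews(sample)
print(estimate)
```

Extending the sample to more articles (e.g. a whole category) can only raise this figure, which matches the expectation that the bound would increase when including the long tail.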

May 8 2018, 9:08 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T184677: Measure impact of Singapore data center on Wikimedia usage.

Singapore itself, for nonsensical reasons related to the wild world of network peering, doesn't tend to be our best comparison point anyway, even though it's the first one we turned on.

May 8 2018, 6:47 PM · Product-Analytics, Discovery-Analysis (Current work)
Tbayer added a comment to T192779: Include HTTP Referer header when navigating through internal links.

Great! Adding Analytics so that the Analytics Engineering team is aware and can double-check that this works for ingestion in the referer field of the webrequest table.

May 8 2018, 7:49 AM · Analytics, Patch-For-Review, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly)
Tbayer added a project to T192779: Include HTTP Referer header when navigating through internal links: Analytics.
May 8 2018, 7:48 AM · Analytics, Patch-For-Review, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly)

May 7 2018

Tbayer added a comment to T193912: Change URL for Wikimedia Blog when new Wikimedia Foundation website launches.

For communicating the dormancy, I think one might want to add a little note to the theme in either case (a bit like what WMDE does on their old website, see the orange note on top of https://wikimedia.de/wiki/Hauptseite ); and on the other hand, all blog posts carry their publication date in the URL anyway.

May 7 2018, 10:55 PM · WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Blog
Tbayer closed T180825: Investigate increase in pageviews with Android app v190 as Resolved.

Congratulations, Sherlock! ;)
To add: From our conversation on Friday, I also understand that the new behavior is now a bit closer to how a web browser would handle it, i.e. the app views are now more comparable to the pageviews we are registering on the web.
For our core metrics reporting for Q3, I think the takeaway is that the year-over-year comparison will not be valid until the next quarter, but perhaps we can limit it to March 2017 vs. March 2018 - CC @mpopov.

May 7 2018, 10:42 PM · Product-Analytics, Reading-analysis, Wikipedia-Android-App-Backlog, Android-app-Bugs
Tbayer added a comment to T184677: Measure impact of Singapore data center on Wikimedia usage.

This has been live in (I understand) all the planned countries for several weeks now, so we should have enough traffic data for a before vs. after comparison; also, the Performance team has published their data on the immediate speed changes they have been measuring. @MNeisler is going to take on this task; we should meet soon and discuss the approach in detail.

May 7 2018, 10:25 PM · Product-Analytics, Discovery-Analysis (Current work)

May 6 2018

Tbayer added a project to T181878: techblog.wikimedia.org should redirect to blog.wikimedia.org/c/technology: Wikimedia-Blog.
May 6 2018, 10:57 PM · Wikimedia-Blog, Wikimedia-Apache-configuration, Patch-For-Review
Tbayer added a comment to T193912: Change URL for Wikimedia Blog when new Wikimedia Foundation website launches.

Why not simply leave the existing post URLs (like https://blog.wikimedia.org/2018/05/03/why-i-women-wikipedia/ ) intact and just redirect the blog's main page https://blog.wikimedia.org/ to https://wikimediafoundation.org/news/ ? That might save quite a bit of work and help avoid unforeseen technical complications. What is the rationale for creating and maintaining a new domain like blogarchives.wikimedia.org ?

May 6 2018, 10:08 PM · WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Blog

May 4 2018

Tbayer moved T181297: Instrument print to PDF button from Next Up to Doing on the Product-Analytics board.
May 4 2018, 5:29 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, Proton

May 3 2018

Tbayer raised the priority of T190601: Update Audiences page and Key Product Metrics with April 2018 Readers data from Normal to High.
May 3 2018, 8:14 PM · Product-Analytics, Reading-analysis
Tbayer moved T190601: Update Audiences page and Key Product Metrics with April 2018 Readers data from Triage to Next Up on the Product-Analytics board.
May 3 2018, 8:14 PM · Product-Analytics, Reading-analysis
Tbayer moved T190601: Update Audiences page and Key Product Metrics with April 2018 Readers data from Blocked to Triage on the Product-Analytics board.
May 3 2018, 8:03 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T191859: [EPIC] Reading List Sync service analytics.

The reason to hash app_install_id is because these events would end up somewhere where we would be able to join with behavioral data sent by mobile apps, which we DON'T want

To clarify just in case, it's fine to log app_install_id in connection with user actions, it has been done in many different schemas for years. And "behavioral data" would seem to describe this data here too.
So I guess the "don't want" here refers to connecting user IDs with those other schemas via the app install ID, right? (in which case, fully agreed, although it seems we had been trying to prevent that with Method 1 or Method 2 anyway)

May 3 2018, 12:26 PM · Product-Analytics, Analytics, Privacy, Reading-Infrastructure-Team-Backlog, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Reading List Service
Tbayer added a comment to T191859: [EPIC] Reading List Sync service analytics.

Method 1 has the disadvantage that we would be able to find out username given crossDeviceID, which is not the case for Method 2.

How is that not the case for Method 2?

Good question! I'm no expert in cryptography but as far as I've been able to tell it is impossible to reverse a good hash function. Any attacker would basically need to make their own mapping table of usernames to hashes

Yes, it is considered impossible for practical purposes to come up with a source value when given the hash value alone (assuming that we choose a well-established hash function whose security has been widely vetted).
But in situations where one has the additional information that the hash can only come from a fairly limited set of source values, this is no longer true. That is well known and is, for example, the reason why password hashes are always stored with a salt. The situation here is even worse - the list of existing users is public and fairly small (<200 million accounts across all WMF wikis, far fewer when applying some easy heuristics, e.g. limiting to recently active users).

even if they knew exactly which hashing function was used.

Which they would, considering that our code is open source ;)
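
To make the point about limited source sets concrete, here is a minimal sketch of how a plain hash of a username could be reversed by enumerating a public user list. It assumes SHA-256 purely for illustration (the thread does not say which function Method 1 or Method 2 would use), and the usernames, key, and function names are hypothetical.

```python
import hashlib
import hmac


def hash_username(username: str) -> str:
    """Plain, unsalted hash of a username (SHA-256 is an assumed choice)."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def build_lookup(usernames):
    """Precompute hash -> username for every name in a public list."""
    return {hash_username(name): name for name in usernames}


def keyed_hash(username: str, secret_key: bytes) -> str:
    """HMAC with a secret key: not enumerable without knowing the key."""
    return hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical stand-in for the public list of existing accounts.
public_usernames = ["ExampleUserA", "ExampleUserB", "ExampleUserC"]
lookup = build_lookup(public_usernames)

# An "anonymous" identifier seen in the logged data...
observed = hash_username("ExampleUserB")
# ...is mapped back to the account by a simple dictionary lookup.
recovered = lookup.get(observed)
print(recovered)

# With a secret key, the same table no longer matches anything.
protected = keyed_hash("ExampleUserB", b"hypothetical-secret-key")
print(protected in lookup)
```

Building the table costs one hash per account, so even a couple of hundred million usernames would be feasible to enumerate. A keyed variant only helps as long as the key itself stays secret, which is a separate question from the code being open source.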

May 3 2018, 12:05 PM · Product-Analytics, Analytics, Privacy, Reading-Infrastructure-Team-Backlog, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Reading List Service

May 2 2018

Tbayer updated subscribers of T189230: Update UA parser .

Thanks for working on this! As @Nuria points out in the task description, it looks important for data quality to keep this updated. Back in 2015 (T106134), @dr0ptp4kt suggested automating these updates. Is that possible?

May 2 2018, 3:25 AM · Patch-For-Review, Analytics-Kanban, Analytics
Tbayer added a comment to T92457: PageImages not compatible with webm files.

A default thumbtime for the File: page can't currently be set,

I think that would be the perfect solution here.

May 2 2018, 3:04 AM · Readers-Web-Backlog, Page-Previews, PageImages
Tbayer updated subscribers of T191888: Change page previews configuration for opt-in accounts.

I agree that option b) is obviously preferable (because it simplifies things in the future).

May 2 2018, 2:50 AM · MW-1.32-release-notes (WMF-deploy-2018-06-12 (1.32.0-wmf.8)), Patch-For-Review, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Previews