Mar 12 2019

Tbayer claimed T217842: [Research] Get data on page actions usage.
Mar 12 2019, 7:28 PM · Web-Team-Backlog, Product-Analytics, Desktop Improvements (Vector 2022)

Mar 11 2019

Tbayer updated the task description for T198218: Generate list of most used special pages.
Mar 11 2019, 7:07 PM · Chinese-Sites, Advanced Mobile Contributions, Reading-analysis, Product-Analytics, Web-Team-Backlog (Tracking)
Tbayer added a comment to T215477: Tag Thanks actions with AMC tag.

Leaving this here since we were talking about it earlier today in this context:

Mar 11 2019, 6:31 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Product-Analytics, Advanced Mobile Contributions, Thanks, Growth-Team
Tbayer updated the task description for T198218: Generate list of most used special pages.
Mar 11 2019, 5:15 PM · Chinese-Sites, Advanced Mobile Contributions, Reading-analysis, Product-Analytics, Web-Team-Backlog (Tracking)
Tbayer added a comment to T217851: [SPIKE] Check MobileWebMainMenuClickTracking schema is still functional and determine how we can use it for AMC.

[...]

@Jdlrobson - a question here. Are we able to add anything outside the menu to the schema (for example the history button, things that will go into the overlay menu, etc)?

Mar 11 2019, 4:23 PM · Spike, Web-Team-Backlog, Advanced Mobile Contributions
Tbayer added a comment to T217851: [SPIKE] Check MobileWebMainMenuClickTracking schema is still functional and determine how we can use it for AMC.

I guess we should have been more specific in the task description - "functional" is not quite synonymous with "sending data" ;) (I had already run basically the same query beforehand...) But presumably the above also means that the underlying instrumentation code looks solid at first glance?

Mar 11 2019, 4:17 PM · Spike, Web-Team-Backlog, Advanced Mobile Contributions

Mar 9 2019

Tbayer added a comment to T181195: Add a share button to the mobile site (starting by enabling in beta).

I have updated T207280 with more concrete proposals on instrumentation and metrics based on the above discussion.

Mar 9 2019, 6:58 AM · Web-Team-Backlog (Design), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26))
Tbayer updated the task description for T207280: Track share button usage.
Mar 9 2019, 6:45 AM · Patch-For-Review, Web-Team-Backlog (Tracking), User-Jdlrobson, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MinervaNeue
Tbayer added a comment to T207280: Track share button usage.

I started to draft a schema at https://meta.wikimedia.org/wiki/Schema:MobileWebShareButton , per the discussion at T181195#4982077 ff. It is partly modeled after the Print schema, e.g. includes an event for when the button is shown (triggered during a normal page load) so we can calculate the button's clickthrough rate directly, which is a more useful success metric than the absolute number of clicks.
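
To illustrate the clickthrough-rate calculation mentioned above, here is a minimal sketch. It assumes each logged event carries an `action` field with the values `"shown"` and `"clicked"`; those names and the sample events are hypothetical and are not taken from the drafted schema.

```python
from collections import Counter


def clickthrough_rate(events):
    """Return clicks divided by impressions for a list of event dicts.

    Assumption: every event has an "action" key whose value is "shown"
    (button rendered on page load) or "clicked" (button tapped). These
    names are illustrative only.
    """
    counts = Counter(event["action"] for event in events)
    shown = counts["shown"]
    if shown == 0:
        return None  # no impressions recorded, so the rate is undefined
    return counts["clicked"] / shown


# Hypothetical sample: four impressions, one click.
sample_events = [
    {"action": "shown"},
    {"action": "shown"},
    {"action": "shown"},
    {"action": "shown"},
    {"action": "clicked"},
]
rate = clickthrough_rate(sample_events)
```

Dividing by impressions rather than reporting raw clicks keeps the metric comparable across wikis and time periods with different traffic levels.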

Mar 9 2019, 3:21 AM · Patch-For-Review, Web-Team-Backlog (Tracking), User-Jdlrobson, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MinervaNeue
Tbayer added a comment to T207280: Track share button usage.

...

Relatedly, we could query the wmf.webrequest table for requests with a specific query parameter rather than include the code that I wrote in T207280#4676470.

Pro: We wouldn't have to write any server-side code like in my comment above.
Con: We wouldn't be able to plot a graph in Grafana without any server-side code.

Both this solution and the solution in my comment above would be impossible if we were to use hash fragments, as they aren't sent to the server.

BTW, one can actually get a graph in Turnilo when using a wprov query parameter, using the (albeit heavily sampled) webrequest dataset there. See e.g. this example for the existing "Share a link to an article (from lead image toolbar, or link preview)" Android app feature. With the usual benefits ;)
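
As a rough sketch of the query-parameter approach discussed above, the following counts requests per `wprov` value from a list of request URLs. It uses plain Python rather than Hive, and the URLs and `wprov` values are made up for illustration; it is not the actual webrequest query.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_wprov(urls):
    """Count requests per wprov value found in the URL query string."""
    counts = Counter()
    for url in urls:
        # Only the query string is inspected; a fragment (#...) is a
        # separate URL component and would never reach the server anyway.
        query = parse_qs(urlsplit(url).query)
        for value in query.get("wprov", []):
            counts[value] += 1
    return counts


# Hypothetical request URLs; "abc1" and "xyz2" are placeholder values.
sample_urls = [
    "https://en.wikipedia.org/wiki/Example?wprov=abc1",
    "https://en.wikipedia.org/wiki/Example?wprov=abc1",
    "https://en.wikipedia.org/wiki/Other?wprov=xyz2",
    "https://en.wikipedia.org/wiki/Other",
    "https://en.wikipedia.org/wiki/Other#wprov=abc1",
]
wprov_counts = count_wprov(sample_urls)
```

The last sample URL carries the value in a hash fragment, so it is not counted, which mirrors the limitation noted in the quoted comment.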

Mar 9 2019, 2:54 AM · Patch-For-Review, Web-Team-Backlog (Tracking), User-Jdlrobson, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MinervaNeue

Mar 8 2019

Tbayer created T217926: Import of wmfdata fails while trying to access analytics-store .
Mar 8 2019, 10:24 PM · Wmfdata-Python, Product-Analytics

Mar 7 2019

Tbayer added a comment to T214180: Add informational links to AMC opt-in toggle.

@Tbayer is there an official way to reserve a wprov value? What do we have to do to introduce a wprov value for AMC shares? I hope it's just editing the wiki page and reserving a new value.

Mar 7 2019, 9:27 PM · Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer added a comment to T214180: Add informational links to AMC opt-in toggle.

@nray - yes, that is correct, and for now let's keep it that way. I know that there are some ideas to remove the mobile domain (I have no idea if this is going to happen)
@alexhollender - we can add some param to the URL, like https://www.mediawiki.org/wiki/Special:MyLanguage/Talk:Reading/Web/Advanced_mobile_contributions?amc_source=optin (I wouldn't worry about caching as this is served to logged-in users only).

The official, cache-kosher way to do this is to use a wprov parameter, which also makes querying the data slightly easier. But as you note, cache fragmentation shouldn't be a big issue here, also because it's just about two pages ;)

Then we would have to query Hive to get the number of page views with that GET amc_source param. Difficulty - easy to medium. /cc @Tbayer

Mar 7 2019, 8:20 PM · Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer added a comment to T217851: [SPIKE] Check MobileWebMainMenuClickTracking schema is still functional and determine how we can use it for AMC.

And once we have confirmed that the schema is working for this purpose, we need to extend it with a field indicating whether the user has enabled AMC (should probably be a separate task).

Mar 7 2019, 6:05 PM · Spike, Web-Team-Backlog, Advanced Mobile Contributions
Tbayer added a comment to T217851: [SPIKE] Check MobileWebMainMenuClickTracking schema is still functional and determine how we can use it for AMC.

We should also document the sampling method and ratio on the talk page. (Apparently InitialiseSettings.php sets a default value of 0.5, but that's obviously not the actual rate.)

Mar 7 2019, 5:59 PM · Spike, Web-Team-Backlog, Advanced Mobile Contributions
Tbayer updated the task description for T217851: [SPIKE] Check MobileWebMainMenuClickTracking schema is still functional and determine how we can use it for AMC.
Mar 7 2019, 5:55 PM · Spike, Web-Team-Backlog, Advanced Mobile Contributions
Tbayer added a comment to T214998: RFC: Remove .m. subdomain, serve mobile and desktop variants through the same URL.

It complicates SEO in the sense that, when I wrote this task, I was looking at Google Search Console for a few of our domains with an eye towards SEO for sister projects, and found that mobile traffic was split 50/50 between the dashboards for the m and non-m subdomains. So it was hard to draw any conclusions without manually aggregating the data.

Mar 7 2019, 9:03 AM · Web-Team-Backlog, Traffic-Icebox, MobileFrontend (Tracking), TechCom-RFC, SRE
Tbayer added a comment to T214998: RFC: Remove .m. subdomain, serve mobile and desktop variants through the same URL.

[..] This feels like an RFC to me. [..]

I've put it in our inbox to discuss this (or next) week, to figure out if it needs an RFC, and if not, we'll suggest an alternate facilitator to help solve the use cases and problems described in this task.

Mar 7 2019, 8:40 AM · Web-Team-Backlog, Traffic-Icebox, MobileFrontend (Tracking), TechCom-RFC, SRE

Mar 6 2019

Tbayer assigned T211842: Update Audiences page and Key Product Metrics deck with January 2019 Readers data to mpopov.
Mar 6 2019, 11:22 PM · Product-Analytics
Tbayer added a comment to T201339: Cannot access user contributions when following red link to user page on mobile.

...

  • We won't be implementing this server side as part of this ticket.
    • The reasons for frontend changes only (care of @Jdlrobson):
      • Context is important. If a user is sharing a URL in IRC or elsewhere it's probably meant to contain action=edit.

(I understand you mean "it's probably meant for the 'edit page' use case, not for the 'access contributions and other user-centered tools' use case".)

Mar 6 2019, 9:29 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Advanced Mobile Contributions, MobileFrontend
Tbayer updated the task description for T201339: Cannot access user contributions when following red link to user page on mobile.
Mar 6 2019, 12:39 AM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Advanced Mobile Contributions, MobileFrontend

Mar 5 2019

Tbayer updated subscribers of T215675: Provide mechanism to allow dynamically tag log entries.
Mar 5 2019, 9:06 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MediaWiki-Logevents, Advanced Mobile Contributions, MobileFrontend
Tbayer added a comment to T203498: Upgrade Hive to ≥ 2.0.
In T203498#5001047, @Neil_P._Quinn_WMF wrote:

It looks like CDH 6.1, which includes Hive 2.1.1, was released in December.

@elukey, what's the current thinking about deploying this? I'm sure there are many complexities: going from CDH 5 to 6 sounds like a tricky upgrade, I know there's been discussion of switching from CDH to Hortonworks or BigTop, and I've heard the larger plan is to move away from Hive and towards Presto anyway.

Is that indeed the plan? (The linked page doesn't mention whether we intend to actually abandon Hive altogether, or just to add Presto as an alternative for certain use cases like the Public Data Lake.)
If yes, what is the anticipated timeframe for this? Depending on how long we are still going to use Hive, an upgrade would still seem worthwhile.

Mar 5 2019, 8:02 PM · Product-Analytics, Analytics-Clusters
Tbayer awarded T217619: Publishing html files generated on notebook hosts a Mountain of Wealth token.
Mar 5 2019, 2:22 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Data-Engineering-Jupyter, Analytics

Mar 4 2019

Tbayer added a comment to T217438: Requesting access to stat1007 for sukhe.

@Tbayer what kind of access is needed? I guess analytics-privatedata-users but just want to be sure :)

Per https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Host_access_granted that looks correct, but @Nuria / Analytics Engineering should be able to confirm for sure.

Mar 4 2019, 7:46 PM · SRE, SRE-Access-Requests

Mar 3 2019

Tbayer added a comment to T217438: Requesting access to stat1007 for sukhe.

See also https://wikitech.wikimedia.org/wiki/Analytics/Data_access

Mar 3 2019, 5:04 AM · SRE, SRE-Access-Requests

Mar 2 2019

Tbayer added a comment to T207280: Track share button usage.

(Update: we discussed this earlier this week and decided that a simple EventLogging schema would be useful. I'm going to create one, but there are still some relevant questions open in the other task, see T181195#4982077 ff.)

Mar 2 2019, 1:37 AM · Patch-For-Review, Web-Team-Backlog (Tracking), User-Jdlrobson, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MinervaNeue
Tbayer added a comment to T181195: Add a share button to the mobile site (starting by enabling in beta).

It seems that the Web Share API also supports a "text" field to accompany the shared link. Do we want to make use of this or just pass the naked link (plus maybe page title) for now?
See also T56829 and various other tasks linked there.

Mar 2 2019, 1:31 AM · Web-Team-Backlog (Design), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26))
Tbayer added a comment to T181195: Add a share button to the mobile site (starting by enabling in beta).
In T181195#4982077, @alexhollender wrote:

@pmiazga here is a first stab at filling out the Beta feature "template". We can discuss and iterate when we meet.

  1. Feature rationale: we believe that by adding a share button to the article page on mobile web people will share Wikipedia articles more often, ultimately leading to more knowledge being spread and consumed.
  2. What we want to learn: we want to learn if people will discover and use this feature. We know that there is a share functionality built into most browsers, however our hypothesis is that since our share button will be more front-and-center, it will help make sharing top of mind for users, as well as facilitate an easier sharing process.
  3. Instrumentation: [TBD] T207280
  4. Success/failure criteria: we will be tracking the number of taps to the share button. We will control for experimental/exploratory taps.

This seems a good idea but might be a bit of a challenge regarding the instrumentation (see above).

If the number of taps is above __ we will consider the feature successful.

I guess this means the number of taps per pageview? And what might be a good way to get a baseline/benchmark (to fill in the __)?

  1. Duration: we think that four months worth of data will give us a sufficient understanding of how the feature is being used.

How did we arrive at that number?

Mar 2 2019, 1:25 AM · Web-Team-Backlog (Design), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26))
Tbayer added a comment to T181195: Add a share button to the mobile site (starting by enabling in beta).

can we instrument it in a way so that we can find out if they followed through with sharing the link?

yes, that's possible, we need to add some argument to the shared url, something like ....?share_btn, then we will be able to verify how many users are visiting Wikipedia via share links.

But sharing the link and clicking on the shared link are two different actions, taken by (usually) two different users.

Mar 2 2019, 1:18 AM · Web-Team-Backlog (Design), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26))

Feb 27 2019

Tbayer added a comment to T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

@Tbayer event_sanitized.readingdepth is backfilled using the new whitelist.
I have vetted the resulting data and it looks good to me, but please do a quick check.

I spot-checked by comparing the daily number of events with each sample field set between the sanitized and unsanitized version, and they matched. Thanks again!
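
A spot check of this kind could look roughly like the sketch below. It assumes the daily counts have already been pulled from both tables into dictionaries keyed by `(day, sample_field)`; the field names and numbers are hypothetical.

```python
def find_mismatches(unsanitized, sanitized):
    """Return {key: (unsanitized_count, sanitized_count)} where counts differ.

    Both arguments map (day, sample_field) tuples to event counts. A key
    missing from one side is treated as a count of zero.
    """
    mismatches = {}
    for key in sorted(set(unsanitized) | set(sanitized)):
        before = unsanitized.get(key, 0)
        after = sanitized.get(key, 0)
        if before != after:
            mismatches[key] = (before, after)
    return mismatches


# Hypothetical daily counts per sample flag.
unsanitized_counts = {
    ("2019-02-25", "sample_flag_a"): 1200,
    ("2019-02-25", "sample_flag_b"): 800,
    ("2019-02-26", "sample_flag_a"): 1150,
}
sanitized_counts = dict(unsanitized_counts)  # identical copy: should match

matching_result = find_mismatches(unsanitized_counts, sanitized_counts)

# Simulate a backfill gap by dropping one key from the sanitized side.
broken_counts = dict(unsanitized_counts)
del broken_counts[("2019-02-26", "sample_flag_a")]
broken_result = find_mismatches(unsanitized_counts, broken_counts)
```

An empty result means the two versions agree for every day and field that was checked.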

Feb 27 2019, 10:17 PM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer updated the task description for T215976: Data Dictionary for Core Metrics.
Feb 27 2019, 2:15 AM · Product-Analytics, Better Use Of Data
Tbayer added a comment to T214134: YoY mobile edit rates in mobile-heavy wiki segment.

The target in the annual plan was:
"Number of edits on mobile increases by 20% year-over-year in target languages. "

We should be aware that this is from the org-wide annual plan, not from the more specific Audiences annual plan which gives 10% as target instead - e.g. "Mobile web edit rate on target wikis: 10% increase", similarly for the other three related team goals. Also, "target wikis" are interpreted very differently there, see e.g. T210660 for the web team's version; I believe the apps teams may not be using the mobile-heavy segment for this either. Where did we document the decision to interpret "target wikis" in the org-wide annual plan as mobile-heavy wikis? This would also be useful input for T215976. (Back in July, there had been a sense that these are two different things - will also follow up on an email thread from back then.)

Feb 27 2019, 1:59 AM · Product-Analytics
Tbayer updated the task description for T215976: Data Dictionary for Core Metrics.
Feb 27 2019, 1:12 AM · Product-Analytics, Better Use Of Data
Tbayer updated subscribers of T213488: Superset's rolling average feature results in error message.

...

Also @HaeB, do you have an example of dashboard that I can use to trigger this issue? I tried today and failed to reproduce :(

Actually it works for me too now, on the same view where I previously encountered this error (result now; compare to the unsmoothed chart I ended up using instead in our presentation last month). Good news! Did anything change since the time that this bug was filed? (I'm seeing "0.26.3-wikimedia2" in https://superset.wikimedia.org/static/assets/version_info.json .)

Feb 27 2019, 12:45 AM · Analytics-Kanban, Product-Analytics, Analytics

Feb 26 2019

Tbayer added a comment to T216628: Update AMC setting name and description text.
In T216628#4986235, @alexhollender wrote:

@ovasileva @Tbayer proposed text:

We are actively developing new features for advanced editors (contributors?). By turning this mode on you will automatically get new features as they are released. Any feedback or collaboration would be appreciated.

Feb 26 2019, 11:10 PM · Product-QA (RW-Test-Cases), Patch-For-Review, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer moved T215477: Tag Thanks actions with AMC tag from Triage to Tracking on the Product-Analytics board.
Feb 26 2019, 6:06 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Product-Analytics, Advanced Mobile Contributions, Thanks, Growth-Team
Tbayer added a project to T215477: Tag Thanks actions with AMC tag: Product-Analytics.
Feb 26 2019, 6:06 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Product-Analytics, Advanced Mobile Contributions, Thanks, Growth-Team
Tbayer added a comment to T216628: Update AMC setting name and description text.

Shouldn't this still somehow indicate it's a mobile-only feature that won't affect people's desktop editing experience? (The "Over the next few months..." text that follows doesn't clarify this either. And btw, it seems oddly time-bound - are we going to update it after the rollout starts?)

Feb 26 2019, 5:10 PM · Product-QA (RW-Test-Cases), Patch-For-Review, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer added a project to T217110: Popups schema has the wrong type for popupDelay: Web-Team-Backlog.
Feb 26 2019, 4:58 PM · Web-Team-Backlog, Product-Analytics, Analytics-Kanban, Analytics
Tbayer awarded T216883: Document contributors movement metrics a Cup of Joe token.
Feb 26 2019, 12:47 AM · Contributors-Analysis, Product-Analytics
Tbayer updated the task description for T215976: Data Dictionary for Core Metrics.
Feb 26 2019, 12:46 AM · Product-Analytics, Better Use Of Data

Feb 25 2019

Tbayer added a comment to T214524: LDAP login advice on https://superset.wikimedia.org/ specifies wrong kind of login name.

@Tbayer should be fixed now! Thanks for the report!

Feb 25 2019, 11:08 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
Tbayer updated the task description for T215597: QA edit tags for moderation actions.
Feb 25 2019, 10:27 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Product-QA (RW-Test-Cases), Advanced Mobile Contributions
Tbayer updated the task description for T210660: [EPIC] AMC Metrics .
Feb 25 2019, 10:23 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Chinese-Sites, Epic, Advanced Mobile Contributions
Tbayer added a comment to T213461: Define moderation actions.

Marked the list as final in the task description, and added some more detail from the recent investigations in e.g. T215597. I also clarified for pending changes that we should not count automatic approvals of pending edits, only manual ones.

Feb 25 2019, 10:16 PM · Advanced Mobile Contributions
Tbayer updated the task description for T213461: Define moderation actions.
Feb 25 2019, 10:14 PM · Advanced Mobile Contributions
Tbayer added a comment to T207280: Track share button usage.

@pmiazga Let's talk about this and avoid rolling our own here - e.g. there is already an existing mechanism that avoids cache fragmentation: https://wikitech.wikimedia.org/wiki/Provenance

Feb 25 2019, 8:14 PM · Patch-For-Review, Web-Team-Backlog (Tracking), User-Jdlrobson, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), MinervaNeue

Feb 22 2019

Tbayer updated the task description for T215976: Data Dictionary for Core Metrics.
Feb 22 2019, 10:38 PM · Product-Analytics, Better Use Of Data
Tbayer updated the task description for T182314: Analyze results of enwiki and dewiki page previews a/b test.
Feb 22 2019, 9:57 PM · Product-Analytics, Web-Team-Backlog (Tracking), Reading-analysis, Page-Previews
Tbayer closed T182314: Analyze results of enwiki and dewiki page previews a/b test as Resolved.

Per discussion with @ovasileva , I'm closing this task now, and splitting off the investigation of the timing anomalies into T216852. (To recap, while we found out a lot of things about how the underlying timers work and reached a level of confidence that these anomalies should not materially affect the main takeaways for this A/B test, we have not yet found a satisfying explanation for them.)

Feb 22 2019, 9:56 PM · Product-Analytics, Web-Team-Backlog (Tracking), Reading-analysis, Page-Previews
Tbayer closed T182314: Analyze results of enwiki and dewiki page previews a/b test, a subtask of T154635: [EPIC] Deploy page previews to English and German Wikipedia, as Resolved.
Feb 22 2019, 9:56 PM · Readers-Web-Kanbanana-Board-Old, Epic, Wikimedia-Site-requests, Documentation, Page-Previews, Web-Team-Backlog
Tbayer created T216852: [Spike, 8hrs] Explain JS timer artifacts observed in EventLogging data.
Feb 22 2019, 9:53 PM · Spike, Web-Team-Backlog, Product-Analytics
Tbayer closed T174976: Analyze Android feed activity as Resolved.
Feb 22 2019, 8:54 PM · Product-Analytics, Android-app-feature-Feeds, Reading-analysis, Wikipedia-Android-App-Backlog
Tbayer updated subscribers of T174976: Analyze Android feed activity.

Closing this now (I trust @mpopov can handle any followup questions, or may already have done so in other contexts).
For the record, in case they might be of use for future reference, below are some of the results I had shared in form of a SWAP notebook back in October 2017:

Wikipeda Android app: Time spent on the feed.png (507×716 px, 32 KB)

Wikipeda Android app: Time spent on the feed (>= 2 seconds).png (512×716 px, 36 KB)

Feb 22 2019, 8:53 PM · Product-Analytics, Android-app-feature-Feeds, Reading-analysis, Wikipedia-Android-App-Backlog
Tbayer added a comment to T205681: Metrics request on portal namespace usage.

I'm going to set aside some time again on Monday (Feb 25) to wrap this up, including documenting what we now know about the data issue with the referrers (and whether/how it might affect the validity of the results for this request). Let me know in case the needs here have changed in the meantime, or also if anything else occurred to you that should be considered in the analysis.

Feb 22 2019, 7:54 PM · Data-Engineering-Icebox, Analytics-Radar, Product-Analytics
Tbayer added a comment to T186016: Analyze time to first link interaction.

PS: There is a detailed discussion of how various time-related fields in the Popups schema are generated at T182314#3956099 .

Feb 22 2019, 2:23 AM · Product-Analytics, Web-Team-Backlog (Tracking), Page-Previews, Reading-analysis
Tbayer closed T186016: Analyze time to first link interaction as Resolved.

Following up here:

Feb 22 2019, 1:52 AM · Product-Analytics, Web-Team-Backlog (Tracking), Page-Previews, Reading-analysis
Tbayer closed T186016: Analyze time to first link interaction, a subtask of T176211: Page Previews could load less JS on pageload, as Resolved.
Feb 22 2019, 1:52 AM · MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Readers-Web-Kanbanana-Board-Old, Performance-Team (Radar), Web-Team-Backlog, Page-Previews
Tbayer added a comment to T172410: Replace the current multisource analytics-store setup.

Yup, thank you DBA crew! Your work is very much appreciated.

Feb 22 2019, 12:17 AM · Analytics-Radar, Product-Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research

Feb 21 2019

Tbayer added a comment to T216658: Timestamp column in EventLogging tables have incompatible collation.

I haven't looked into it, but the naming of PrefUpdate_5563398_15423246 is unusual. IIRC, tables with an extra suffix are some kind of backup or archive table that exist because of either some migration or bug. I'm sure it has real data in it, but possibly the collation mismatch has something to do with some old data issue or migration?

I guess this comment crossed with T216658#4973658 - see the explanation there. One could for example double-check if any of the changes implemented in T160454 could have affected collation.

Feb 21 2019, 9:20 PM · Analytics, Product-Analytics
Tbayer added a comment to T216658: Timestamp column in EventLogging tables have incompatible collation.

This sucks but we're not likely to work on it, as we're moving away from mysql. We don't want to be mean though, so we can help sqoop this stuff into Hadoop if you need to use your painful workaround too much.

Understood, but do we know the reason for this discrepancy? Does it have to do with the general changes to the event capsule (T179625, see also T179540) that happened in between PrefUpdate_5563398_15423246 and PrefUpdate_5563398?

PS: actually I think that happened later; rather, the relevant capsule change was T160454 (I also just added that to the documentation).

If it's an issue that affects more than one EL schema, it would be worth documenting it on e.g. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging , with a link to @nettrom_WMF's workaround.

Feb 21 2019, 8:46 PM · Analytics, Product-Analytics
Tbayer added a comment to T216658: Timestamp column in EventLogging tables have incompatible collation.

This sucks but we're not likely to work on it, as we're moving away from mysql. We don't want to be mean though, so we can help sqoop this stuff into Hadoop if you need to use your painful workaround too much.

Feb 21 2019, 8:32 PM · Analytics, Product-Analytics
Tbayer added a comment to T211197: Build AMC opt-in toggle.

@Edtadros and I worked through testing the 5th acceptance criterion together during our 1:1 today.

Here are the server-side EventLogging events that I captured while opting in and out of AMC mode on http://reading-web-staging.wmflabs.org/wiki/Special:MobileOptions:

[...]

To the "and compatible" part of the AC: note well the "clientValidated": true in the events above.

Feb 21 2019, 12:05 AM · Product-QA (RW-Test-Cases), MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), Patch-For-Review, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions

Feb 20 2019

Tbayer added a comment to T212516: WikimediaEvents do not track logged in beta users on Special:MobileOptions.

Thanks! Out of an abundance of caution, I double-checked that the "value" values are consistent with the "isDefault" values, which seems to be the case:

Feb 20 2019, 11:54 PM · Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), MW-1.33-notes (1.33.0-wmf.12; 2019-01-08), Advanced Mobile Contributions, MobileFrontend, MediaWiki-extensions-WikimediaEvents
Tbayer updated subscribers of T214093: Modern Event Platform: Schema Guidelines and Conventions.

technically there could be 2 different headers that differ only by _ vs -.

Good thing we are doing this as map type with string keys then. Specific (object/struct) field names should never have '-' in them. :)
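
To make the collision concern concrete, here is a small sketch that normalizes header names into field-style names and reports any that end up identical. The normalization rule (lowercase, `-` replaced by `_`) and the `WMF_Last_Access` variant are assumptions for illustration, not part of the guidelines under discussion.

```python
from collections import defaultdict


def find_colliding_headers(header_names):
    """Group header names that become identical once '-' is mapped to '_'."""
    groups = defaultdict(list)
    for name in header_names:
        # Assumed normalization: lowercase and swap hyphens for underscores.
        normalized = name.lower().replace("-", "_")
        groups[normalized].append(name)
    # Keep only normalized names shared by more than one original header.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_colliding_headers(
    ["WMF-Last-Access", "WMF_Last_Access", "User-Agent"]
)
```

Storing headers in a map with string keys sidesteps this, because the original header names can be kept verbatim as keys.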

Feb 20 2019, 10:41 PM · Product-Data-Infrastructure, Analytics-Kanban, Analytics, Better Use Of Data, Product-Analytics, Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform
Tbayer added a comment to T214093: Modern Event Platform: Schema Guidelines and Conventions.

I don't think the actual date inside the WMF-Last-Access header makes any difference.

I assume the discussion here is confined to the particular CirrusSearch use case, correct?
(The actual date is definitely being used elsewhere currently and could be relevant to various other use cases of MEP.)

Sure, but the context was important. All pages visited will set your WMF-Last-Access data to today. Unless you can guarantee that your event stream will fire on the first possible web request that a user makes in a day the WMF-Last-Access data will be useless outside of the webrequest table. It will simply say "today" (but with a date).

Feb 20 2019, 10:38 PM · Product-Data-Infrastructure, Analytics-Kanban, Analytics, Better Use Of Data, Product-Analytics, Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform
Tbayer added a comment to T214093: Modern Event Platform: Schema Guidelines and Conventions.

I don't think the actual date inside the WMF-Last-Access header makes any difference.

I assume the discussion here is confined to the particular CirrusSearch use case, correct?
(The actual date is definitely being used elsewhere currently and could be relevant to various other use cases of MEP.)

Feb 20 2019, 10:20 PM · Product-Data-Infrastructure, Analytics-Kanban, Analytics, Better Use Of Data, Product-Analytics, Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform
Tbayer updated subscribers of T216297: Develop method for identifying reverts in EventBus data.

If we are trying to track revert rates closer to real time, our current best strategy is querying the API and using the mw-reverts package. However, this isn't very performant.

Indeed, but mwreverts also offers the option to use the (MySQL replica) database instead of the API, which should be much faster.
(The db option did not work on PAWS last time I tried to use it there. I filed https://github.com/mediawiki-utilities/python-mwreverts/issues/8 about this, @Halfak looked a bit into it and said it should work there too in principle, but would need some work fixing.)

Feb 20 2019, 8:23 PM · MW-1.36-notes (1.36.0-wmf.4; 2020-08-11), Patch-For-Review, Platform Engineering, Contributors-Analysis, Product-Analytics
Tbayer added a comment to T211842: Update Audiences page and Key Product Metrics deck with January 2019 Readers data.

o/ @chelsyx @mpopov

Feb 20 2019, 2:47 AM · Product-Analytics
Tbayer updated the task description for T211842: Update Audiences page and Key Product Metrics deck with January 2019 Readers data.
Feb 20 2019, 2:46 AM · Product-Analytics
Tbayer added a project to T212961: Add X-Analytics tag for AMC webrequests: XAnalytics.
Feb 20 2019, 12:54 AM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Product-QA (RW-Test-Cases), Patch-For-Review, XAnalytics, Product-Analytics, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions

Feb 19 2019

Tbayer added a comment to T214444: Update ReadingDepth instrumentation to avoid deprecated schema module (blocks loads event).

I understand this shouldn't have any impact on the logged events and their data (except maybe as consequence of the performance improvement in general), but please flag it in case that assumption turns out to be wrong.

Feb 19 2019, 9:52 PM · MW-1.33-notes (1.33.0-wmf.17; 2019-02-12), Patch-For-Review, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Performance-Team (Radar)
Tbayer updated the task description for T215616: Improve interlingual links across wikis through Wikidata IDs.
Feb 19 2019, 4:10 AM · Data-Engineering-Icebox, Analytics-Radar, Research-Freezer, MediaWiki-General, Wikidata

Feb 18 2019

Tbayer added a comment to T201123: What % of pages feature issues?.

To wrap this up, I extended the above queries for all Wikipedias (using a PAWS notebook).

Feb 18 2019, 4:43 AM · Product-Analytics, Web-Team-Backlog (Tracking), Reading-analysis, Page-Issue-Warnings
Tbayer added a comment to T193051: Remove all page previews instrumentation code.

Thanks @Tbayer! I had a look now and I am wondering if the data for page previews is still being tracked

  • The Popups schema is no longer collecting data, although you should be able to reactivate it with a simple configuration change, as we haven't yet removed the instrumentation code (this task).
  • The less detailed VirtualPageView schema is still sending data. Its main purpose is to provide aggregated content consumption numbers (how often a given page has been previewed for >1sec) that are stored in the Virtualpageview_hourly table - rather than answering product questions about how the previews feature is being used per se; we used the Popups schema for that.

/ if we still have data from previous tracking endeavours that I could look at concerning the questions outlined in T214493?

Yes, there is still data in the usual places where EventLogging data is stored, e.g. the event.popups Hive table. (I guess you may already have looked at the results published at https://www.mediawiki.org/wiki/Page_Previews/2017-18_A/B_Tests and perhaps the further details in the Phab task(s) linked from there.)

Feb 18 2019, 2:15 AM · Page-Previews, Web-Team-Backlog

Feb 17 2019

Tbayer awarded T195030: Develop availability metrics for PAWS a Like token.
Feb 17 2019, 3:04 AM · PAWS (zero-to-jupyterhub-k8s 0.8.0)

Feb 16 2019

Tbayer closed T216257: Most visited domains (pageviews) across all Wikipedia/Wikimedia as Resolved.

Yes, that should be a separate task (and may require involvement from other teams).

Feb 16 2019, 12:15 AM · Product-Analytics

Feb 15 2019

Tbayer moved T216257: Most visited domains (pageviews) across all Wikipedia/Wikimedia from Triage to Doing on the Product-Analytics board.
Feb 15 2019, 11:52 PM · Product-Analytics
Tbayer moved T215976: Data Dictionary for Core Metrics from Triage to Doing on the Product-Analytics board.
Feb 15 2019, 11:52 PM · Product-Analytics, Better Use Of Data
Tbayer added a comment to T216257: Most visited domains (pageviews) across all Wikipedia/Wikimedia.

Here is a first result: the top 15 by pageviews for January 2019, with known bots/spiders excluded. (To get the domain, combine project and access method - e.g. "it.wikipedia" "mobile web" means it.m.wikipedia.org, "en.wikipedia" "desktop" means en.wikipedia.org.)
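The project/access-method rule described above can be sketched in a few lines of Python. This is only an illustration: the function name domain_for is hypothetical, and the sketch assumes that every project has the form "language.family" and that only the two access methods named in the comment occur.

```python
def domain_for(project: str, access_method: str) -> str:
    """Combine a project and an access method into a hostname.

    Hypothetical helper: assumes projects look like "it.wikipedia" and
    handles only the "desktop" and "mobile web" access methods.
    """
    if access_method == "desktop":
        # Desktop domains are just the project plus the .org suffix.
        return f"{project}.org"
    if access_method == "mobile web":
        # Mobile web domains insert an "m" label after the language label.
        language, _, family = project.partition(".")
        if not family:
            raise ValueError(f"unexpected project format: {project!r}")
        return f"{language}.m.{family}.org"
    raise ValueError(f"unsupported access method: {access_method!r}")


print(domain_for("it.wikipedia", "mobile web"))
print(domain_for("en.wikipedia", "desktop"))
```

Projects without a language label are rejected for mobile web rather than guessed at, since the comment gives no rule for them.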

Feb 15 2019, 11:45 PM · Product-Analytics
Tbayer added a comment to T216257: Most visited domains (pageviews) across all Wikipedia/Wikimedia.

However, it seems to be missing a few domains, like when I query for blog.wikimedia.org (or phabricator.wikimedia.org)

As mentioned (admittedly somewhat obliquely) on the documentation page linked in my email, the pageview data is limited to "production sites", which currently do not include blog.wikimedia.org and phabricator.wikimedia.org. There is some traffic data for both domains in other places, but we can already be pretty certain that neither of them is in the top 15 domains by pageviews, so it's probably not worth retrieving numbers for them for this purpose.

Feb 15 2019, 11:36 PM · Product-Analytics
Tbayer added a project to T216208: ToolsDB overload and cleanup: PAWS.
Feb 15 2019, 1:05 AM · TCB-Team (now WMDE-TechWish), Phragile, Data-Services, cloud-services-team (Kanban)

Feb 14 2019

Tbayer added a comment to T211827: Request: Top articles of 2018 on all Wikipedias.

I guess this task can be closed now?

Feb 14 2019, 10:41 PM · Reading-analysis, Product-Analytics
Tbayer updated subscribers of T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

Great, thanks a lot! The sample fields were introduced in September, so no need to go further back. (CC @Groceryheist )

Feb 14 2019, 8:34 PM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer moved T212961: Add X-Analytics tag for AMC webrequests from Triage to Tracking on the Product-Analytics board.
Feb 14 2019, 7:40 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Product-QA (RW-Test-Cases), Patch-For-Review, XAnalytics, Product-Analytics, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer added a project to T212961: Add X-Analytics tag for AMC webrequests: Product-Analytics.
Feb 14 2019, 7:39 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Product-QA (RW-Test-Cases), Patch-For-Review, XAnalytics, Product-Analytics, Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), Advanced Mobile Contributions
Tbayer claimed T214935: Examine clickthrough ratios for different page elements of action=history pages.
Feb 14 2019, 7:33 PM · Web-Team-Backlog, Product-Analytics
Tbayer moved T214935: Examine clickthrough ratios for different page elements of action=history pages from Triage to Doing on the Product-Analytics board.
Feb 14 2019, 7:32 PM · Web-Team-Backlog, Product-Analytics
Tbayer updated subscribers of T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

Blocked on code review and an answer to T216096#4953210 from someone familiar with the whole EL pipeline and the purging mechanism (@mforns?).

Feb 14 2019, 7:31 PM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer moved T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema from Triage to Epics on the Product-Analytics board.
Feb 14 2019, 7:29 PM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer updated subscribers of T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

@Jdrewniak points out that in https://github.com/wikimedia/mediawiki-skins-MinervaNeue/blob/f07985c6dee5106da8f381a47214e7349fcd147e/resources/skins.minerva.scripts/pageIssuesLogger.js#L65 the spelling is still page-issues-b_sample / page-issues-a_sample (i.e. like on the schema page, not like in Hive).

Feb 14 2019, 9:06 AM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer added a comment to T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

NB: The names of these sample fields are spelled with underscores in Hive (e.g. page_issues_b_sample, see below) but with dashes on the schema page (e.g. page-issues-b_sample). Which version does the whitelist require?
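For illustration, the difference between the two spellings can be expressed as a one-line conversion. The helper name to_hive_field_name is hypothetical, and the sketch assumes that replacing dashes with underscores is the only difference between the schema-page spelling and the Hive spelling, which is all the example above shows.

```python
def to_hive_field_name(schema_field_name: str) -> str:
    """Convert a schema-page field name to its Hive-style spelling.

    Assumption: the only difference is that dashes become underscores.
    """
    return schema_field_name.replace("-", "_")


for name in ("page-issues-a_sample", "page-issues-b_sample"):
    print(name, "->", to_hive_field_name(name))
```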

Feb 14 2019, 4:10 AM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer added a comment to T208795: Measure Google Translate Pageview Impact.

See now also T215093: Aggregate and save the Google Translate Pageview count temporarily

Feb 14 2019, 3:06 AM · Reading-Admin
Tbayer added a comment to T215093: Aggregate and save the Google Translate Pageview count temporarily.

Will it be possible later to backfill/update/extend either virtualpageview_hourly or pageview_hourly with data derived from this table? (cf. T212414#4864672)

Feb 14 2019, 3:05 AM · Product-Analytics
Tbayer updated subscribers of T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

PS: patch is at https://gerrit.wikimedia.org/r/490514 (seems @gerritbot is lagging a bit currently)

Feb 14 2019, 2:43 AM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics
Tbayer added a comment to T209051: ReadingDepth schema is whitelisting both session ids and page ids.

It looks like we had forgotten to whitelist the actual pageID field in addition to the page title, probably because it was only introduced shortly after this task was created (it's in the current version of the schema page but not yet deployed). I should have caught that before +2ing Nuria's patch. I submitted a fix as part of 209051, also for the related revision ID field.

Feb 14 2019, 1:17 AM · Analytics-Radar
Tbayer added a comment to T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.

Found (and fixed) an oversight regarding ReadingDepth: T216096

Feb 14 2019, 1:08 AM · Analytics-Radar, Analytics-Kanban, Patch-For-Review, Product-Analytics, Reading-analysis
Tbayer created T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.
Feb 14 2019, 12:52 AM · Analytics-Radar, Reading Depth, Analytics-Kanban, Web-Team-Backlog (Tracking), Product-Analytics

Feb 13 2019

Tbayer added a comment to T216063: [Bug] Many ReadingDepth validation errors logged.

In case it's useful, keep in mind that it's possible to query the webrequest table for more detail on the event in question:

Feb 13 2019, 10:49 PM · Reading Depth, Web-Team-Backlog (Tracking), Analytics