Tbayer (Tilman Bayer)
User


User Details

User Since
Oct 20 2014, 11:21 PM (208 w, 6 d)
Availability
Available
IRC Nick
HaeB
LDAP User
Unknown
MediaWiki User
Tbayer (WMF) [ Global Accounts ]

Recent Activity

Today

Tbayer assigned T202789: Update Audiences page and Key Product Metrics with August 2018 Readers data to chelsyx.
Mon, Oct 22, 1:51 PM · Product-Analytics

Sat, Oct 13

Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

To add to this analysis
@Nuria had this to say today:

jdlrobson: do not trust user agents 100% "android 2" could be a who-knows-bot with user agent "android 2" this happens everyday
4:42 PM jdlrobson: or also, could be a mislabeled UA, that is, parser thinks it is Android 2 but it is really something else
4:42 PM jdlrobson: this does not happen a lot but it does happen
4:43 PM jdlrobson: I just ran some numbers yesterday and by my early estimates 5% of our traffic labeled as "user" is really bots
4:44 PM jdlrobson: so I would not expect 100% consistency, bots have "made up" UAs

Sat, Oct 13, 8:27 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis

Fri, Oct 12

Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

I took a deep dive into this data today.
I compiled a table, cross checking browser versions with browser capabilities:
https://www.mediawiki.org/wiki/User:Jdlrobson/Page_issues_analysis

[...]
Thanks again for the analysis and the recommendations!

From this analysis, I'd strongly recommend ignoring ReadingDepth data coming from Android native browser; iOS Chrome prior to 11.3 and Chrome <=38.

I guess that this was meant to read "iOS prior to 11.3", correct? (cf. above)

Fri, Oct 12, 5:29 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer added a comment to T206279: Hive join fails when using a HiveServer2 client.

FWIW, I encountered the same kind of error in beeline around the same time last week (October 2 I believe). Below is the query and the log (reproduced today). The same query works fine in Hive, and SET hive.auto.convert.join=false; fixes it in beeline as well.

Fri, Oct 12, 2:59 AM · Analytics-Cluster, Analytics, Contributors-Analysis, Product-Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

@Tbayer

@mforns Great to hear that Druid already allows ingestion of array types! But just to clarify, it seems that this involves information reduction of some kind? At least I'm only seeing scalar values in the selection dropdown in Turnilo (below).

I think it's working OK, no? There are 2 fields that are arrays right now. One of them, sectionNumbers, is an array of integers (I think that's the one in the screenshot, no?). The other one, issuesSeverity is an array of strings, and seems to be working fine on my side.

By "information reduction" (in both of these fields), I meant that several possible values will be mapped to the same value.
E.g. the arrays [0,1,2] and [0] in the EL data will both result in the integer 0 in the Druid data. In the task description and T201873 we had IIRC understood the term "flattened into a string" as mapping e.g. the array [0,1,2] into the string '[0,1,2]'.
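
A minimal sketch of the difference between the two mappings, in Python. The function names are hypothetical, and the "take the first element" rule is only an assumption consistent with the example above; it has not been confirmed as what the ingestion job actually does.

```python
import json

def first_element(values):
    """Reduce an array to its first element (assumed lossy behaviour)."""
    return values[0] if values else None

def flatten_to_string(values):
    """Serialize the whole array into one string, keeping every element."""
    return json.dumps(values, separators=(",", ":"))

a, b = [0, 1, 2], [0]
# Lossy: two different arrays collapse to the same scalar.
print(first_element(a), first_element(b))
# Lossless: each array keeps a distinct string representation.
print(flatten_to_string(a), flatten_to_string(b))
```

With the first mapping the two arrays become indistinguishable in the datasource; with the second they remain separate dimension values.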

Fri, Oct 12, 2:22 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

@Tbayer

It occurred to me afterwards though that this might be because we were looking at the Count measure instead of the Event Count measure. What is the meaning of the former? Is this documented somewhere? Is it necessary to include in the Turnilo options? This will likely not be the last time that it causes that kind of confusion.

Yes, this thing already caused confusion to other people. The EventCount metric is generated at ingestion time by our data crunching job, it represents the number of EventLogging events that fall inside the given slice/dice. The Count metric is added automatically at some point in the pipeline, it corresponds to the number of aggregated/rolled-up rows of the Druid datasource, and it will be different from EventCount in most cases. The addition of Count metric was not intended, I'm trying to see whether we can drop it. If not, I will add it as a gotcha to the documentation.
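
To illustrate the distinction, here is a rough Python sketch of roll-up on made-up events. The data and the names `roll_up`, `event_count` and `count` are hypothetical and only mimic the two metrics as described above; this is not the actual ingestion code.

```python
from collections import Counter

# Hypothetical raw events, reduced to (hour, wiki) dimension pairs.
events = [
    ("2018-10-11T20", "enwiki"),
    ("2018-10-11T20", "enwiki"),
    ("2018-10-11T20", "enwiki"),
    ("2018-10-11T20", "lvwiki"),
    ("2018-10-11T21", "enwiki"),
]

def roll_up(raw_events):
    """Collapse events sharing the same dimension values into one row each."""
    return Counter(raw_events)

rows = roll_up(events)
event_count = sum(rows.values())  # analogue of EventCount: original events
count = len(rows)                 # analogue of Count: rolled-up rows
print(event_count, count)
```

Here five events roll up into three rows, so the two numbers differ as soon as any two events share the same dimension values.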

Fri, Oct 12, 1:52 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

BTW, I understand we are focusing on use in Turnilo for now, but out of curiosity (and considering the task description) I checked Superset too and didn't see this data there yet.

Turnilo and superset both read from the same storage, Druid.

We all know that.

Fri, Oct 12, 1:50 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics

Thu, Oct 11

Tbayer claimed T205681: Metrics request on portal namespace usage.
Thu, Oct 11, 8:12 PM · Analytics, Product-Analytics
Tbayer moved T205681: Metrics request on portal namespace usage from Triage to Doing on the Product-Analytics board.
Thu, Oct 11, 8:11 PM · Analytics, Product-Analytics
Tbayer moved T202790: Update Audiences page and Key Product Metrics with September 2018 Readers data from Blocked to Doing on the Product-Analytics board.
Thu, Oct 11, 8:09 PM · Product-Analytics
Tbayer added a comment to T205681: Metrics request on portal namespace usage.

@AfroThundr3007730 Thanks for the additional background!
I should be able to get you some data for the first two groups of questions soon, based on the internal referrer data we have available.

Thu, Oct 11, 7:46 PM · Analytics, Product-Analytics

Wed, Oct 10

Tbayer added a comment to T205569: Define cross-schema event correlation approach.

the Legal team (or in the future, Privacy)

Not sure what Privacy refers to here, can you clarify a bit?

This is a term from the instrumentation DACI, it's perhaps useful to get familiar with that first (and then work on any necessary clarifications there).

Wed, Oct 10, 9:41 PM · Reading-Admin
Tbayer updated subscribers of T189554: Determine current percentage of the mobile site among desktop device pageviews.
Wed, Oct 10, 9:29 PM · Readers-Web-Backlog (Tracking), Product-Analytics, Reading-analysis
Tbayer added a comment to T205569: Define cross-schema event correlation approach.

I am not clear on whether "Define cross-schema event correlation approach" is about cross-schema data remaining after 90 days. If so, how is that in agreement with the Data retention guidelines? (It seems like it could not possibly be.) And if we want to correlate schemas for just 90 days, do we really need anything beyond session id?

My understanding has been that this task is largely separate from the question of which of the resulting data can be kept beyond 90 days. I would expect we will receive guidance from the Legal team (or in the future, Privacy) regarding this question, and that this guidance would depend on the specific data being logged (e.g., whether it contains page names or not). @dr0ptp4kt, can you clarify the scope?

Wed, Oct 10, 7:01 PM · Reading-Admin
Tbayer added a comment to T169550: Final Vetting of Family Wide unique devices data .

ping @Tbayer, do you think you could get to this task in the next month?

Wed, Oct 10, 6:33 PM · Analytics, Product-Analytics, Reading-analysis, Analytics-Kanban

Tue, Oct 9

Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

Inspired by a suggestion of @Jdlrobson, here is a version of the above query by iOS version, showing a clear change at iOS 11.3, but also some oddities at earlier versions like 9.1:

Tue, Oct 9, 10:11 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer added a comment to T178802: Add Tilman to analytics-admins.

@Tbayer, you do not need any special permissions to access any type of data; the datasources that were accessible through these permits have since been migrated to the cluster.

Tue, Oct 9, 9:14 PM · Patch-For-Review, Operations, SRE-Access-Requests, Analytics-Kanban
Tbayer added a comment to T153821: wikitech.wikimedia.org missing from pageviews API.

@Krenair if wikitech is not behind Varnish, pageviews cannot be collected. Correct. Seems that we can close the ticket?

Be that as it may - we do actually have data in the webrequest table for Wikitech. Using a somewhat simplistic pageview definition, here are the 100 most viewed pages for September 2018 (without spiders) according to that data. Looks quite plausible.

Tue, Oct 9, 7:47 AM · Analytics, Pageviews-API, wikitech.wikimedia.org, Cloud-Services
Tbayer updated subscribers of T202751: Ingest data from PageIssues EventLogging schema into Druid.

Back to the view in Turnilo: This looks very exciting indeed!

Tue, Oct 9, 4:07 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

BTW, I understand we are focusing on use in Turnilo for now, but out of curiosity (and considering the task description) I checked Superset too and didn't see this data there yet. I clicked "scan new datasources", which appears to have imported it, alongside data from some other schemas:

Tue, Oct 9, 3:48 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

Another question: It seems that the dimensions lack e.g. Ua Browser Major and other user agent derived fields (that we have and use in e.g. https://turnilo.wikimedia.org/#pageviews_daily/ ). In the web team we often need these when evaluating EL data, see e.g. this example from earlier today: T204143#4650771 . Could they be added, analogously to the pageviews data?

Tue, Oct 9, 3:35 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

@mforns Great to hear that Druid already allows ingestion of array types! But just to clarify, it seems that this involves information reduction of some kind? At least I'm only seeing scalar values in the selection dropdown (below).
If that's the case, could we document how that works - does it always pick the first element of the array? (i.e. [0,2,5] --> 0, etc.)

Tue, Oct 9, 3:27 AM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics

Mon, Oct 8

Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

[...]

With regards to the first 2, I'd need more detailed information on the versions data is missing for Chrome Mobile iOS, Android (stock browser) and (desktop) Chrome. Chrome iOS is very different from Chrome for Android (one uses webkit and the other blink for rendering). For desktop, at least Chrome 39 is needed and for Android stock browser I still don't really understand why this browser is still around and I suspect it's in maintenance mode - I wouldn't be surprised if it doesn't support sendBeacon or performance.

Mon, Oct 8, 11:57 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer updated the task description for T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.
Mon, Oct 8, 9:57 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer closed T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks as Resolved.

So about the rest of the result I had mentioned above that looked fine:

Mon, Oct 8, 9:43 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer closed T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks, a subtask of T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English), as Resolved.
Mon, Oct 8, 9:43 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests
Tbayer reopened T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't as "Open".
Mon, Oct 8, 9:40 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer reopened T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't, a subtask of T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English), as Open.
Mon, Oct 8, 9:39 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests

Sun, Oct 7

Tbayer added a comment to T118063: Reconsider the schema of the Edit event log.

By the way, we have some data on how often links are being opened in a new tab (or window), i.e. how frequently a new mw.user.sessionId() is generated in the course of a browser session (in the usual sense that aligns with session cookie storage).

Sun, Oct 7, 5:47 AM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis

Sat, Oct 6

Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

Remind me, did we do QA for this schema on Mobile Safari? If you and/or @Ryasmeen saw valid events on that browser, I would agree that it's reasonable to assume for now that we can use its data.

iOS Safari sendBeacon support was only added in 11.4 (Mar 2018). Thus older versions of 11 will not have it. Are we seeing events from 11.4 Mobile Safari?

Sat, Oct 6, 6:34 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer updated subscribers of T178802: Add Tilman to analytics-admins.

@HaeB do you still need this? Can we roll this back?

Yes, until the end of January it looks like (see also our timeline document).

Sat, Oct 6, 6:00 AM · Patch-For-Review, Operations, SRE-Access-Requests, Analytics-Kanban

Fri, Oct 5

Tbayer updated subscribers of T118063: Reconsider the schema of the Edit event log.
Fri, Oct 5, 9:33 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Tbayer updated subscribers of T118063: Reconsider the schema of the Edit event log.

I can confirm that I see the same behaviour described in T118063#4547350 – it's always good when browsers behave as specified!

I don't know where exactly this documentation should go, but yeah, it would be good to keep this straight.

My advice is to always start with the source (https://doc.wikimedia.org/mediawiki-core/master/js/source/mediawiki.user.html#mw-user-method-sessionId) as it's closest to the truth.

I can't find any reference to "sessionid" or "page token" on mediawikiwiki, so I'd also recommend that we create a high-level documentation page there that covers both mw.user.sessionId and the pageview token generated by EventLogging.

Sounds like a good idea! In the meantime, I have submitted a patch to at least add a caveat to the existing documentation.

Fri, Oct 5, 9:32 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis

Wed, Oct 3

Tbayer added a comment to T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English).

Thanks for the technical background, @elukey! I think it would be useful to add some guidance to the documentation. Developers might find concrete rate limits particularly useful (like the one we stated earlier about the old MySQL system). Especially since there was a sense earlier that the new Hadoop infrastructure would basically relieve us of worrying about throughput limitations.

Wed, Oct 3, 3:59 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests
Tbayer updated subscribers of T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English).

There was a lack of clarity about the expected event increase from https://gerrit.wikimedia.org/r/463875 , causing some misunderstanding with Analytics Engineering and the postponing of the deployment earlier today:

Wed, Oct 3, 2:10 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests
Tbayer awarded T191706: It's not possible to undo edits from the revision history or diff in "Mobile" mode a Mountain of Wealth token.
Wed, Oct 3, 3:25 AM · MobileFrontend (MobileFrontend Special Pages), MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), Readers-Web-Backlog (Design), MediaWiki-History-or-Diffs
Tbayer added a comment to T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

PPS: I added a section to the documentation.

Wed, Oct 3, 2:42 AM · Analytics-Kanban, Analytics

Tue, Oct 2

Tbayer added a comment to T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

PS: This solves my own use case and I think that of some other Python users too. Personally I wouldn't mind closing this task, although the problem as stated hasn't been solved yet, and users of other languages might not yet have a way to circumvent it.

Tue, Oct 2, 10:12 PM · Analytics-Kanban, Analytics
Tbayer added a comment to T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

Cool! This works great for me. I tweaked it a bit to make the from_email and to_email parameters optional, autogenerating them based on the server name and user name.

In[1]:
# cf. https://phabricator.wikimedia.org/T168103#4635031 :
# IPython shell magics: each `!cmd` returns the command's output as a list of lines.
notebookservername = !hostname
notebookserverdomain = notebookservername[0] + '.eqiad.wmnet'
username = !whoami
Tue, Oct 2, 9:58 PM · Analytics-Kanban, Analytics
Tbayer added a comment to T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

What type of notebook are you using? Python?

Tue, Oct 2, 5:50 PM · Analytics-Kanban, Analytics
Tbayer closed T205176: Increase default sampling ratio of ReadingDepth as Resolved.

The event rate before and after deploy looks plausible from a glance at Grafana - closing this task now.

Tue, Oct 2, 5:21 PM · Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1)
Tbayer updated the task description for T205176: Increase default sampling ratio of ReadingDepth.
Tue, Oct 2, 5:19 PM · Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1)
Tbayer updated the task description for T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English).
Tue, Oct 2, 12:20 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests

Mon, Oct 1

Tbayer added a comment to T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English).

And to follow up on T204609#4630216, the newly added wikis appear to exhibit an issues clickthrough rate that is about as low as on lvwiki (it's a bit higher on fawiki with 0.69% so far). This looks like a good reason to increase the sampling ratio to 100% on the smaller (non-enwiki) wikis, in order to have a better chance to detect changes (if any) with statistical significance. E.g. jawiki will have about 5 million mobile views during the two weeks of the test; if perhaps 1 million of these views are to pages with issues, that already would not be enough to detect a 5% increase at 0.3% clickthrough rate.
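
A back-of-the-envelope check of that last claim, as a Python sketch. It uses the standard normal-approximation sample size formula for comparing two proportions. The significance level (5%, two-sided), the power (80%), the even split between control and test, and reading "5% increase" as a relative lift are all assumptions made for illustration, not parameters stated in the task.

```python
from statistics import NormalDist

def required_n_per_group(p_base, relative_lift, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    p_new = p_base * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return (z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2

n = required_n_per_group(p_base=0.003, relative_lift=0.05)
available_per_group = 1_000_000 / 2  # assumed even split of the 1 million views
print(round(n), n > available_per_group)
```

Under these assumptions the requirement comes out at roughly 2.1 million views per group, several times the roughly 0.5 million per group that 1 million views would provide.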

Mon, Oct 1, 10:49 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests
Tbayer added a comment to T200792: Run A/B test on page issues (Farsi, Japanese, Russian, English).

We now have two full hours of data. As a first check, here is the ratio of pageloaded events to all mobile web pageviews for enwiki (analogously to T204609#4607546 for lvwiki). 2.8% at a sampling ratio of 20% would extrapolate to 16% of pageviews, which is still consistent with 19% of enwiki mainspace pages having Ambox issues (T201123#4494446).

Mon, Oct 1, 10:29 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Page-Issue-Warnings, User-notice, Wikimedia-Site-requests
Tbayer updated subscribers of T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

Ah, sorry, I responded too hastily. The problem is from inside a Jupyter notebook. I'm a bit stumped at the moment, as I can't see why this would work on the shell but not in a Notebook terminal. It likely has something to do with the systemd Jupyter notebook isolation, but still it is strange. Will continue investigating...

Mon, Oct 1, 7:51 PM · Analytics-Kanban, Analytics
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Here is the distribution of actions so far. This does not look impossible a priori (although it would mean quite a low issue clickthrough ratio of <=2% in both control and test). [...]

After some other checks that looked fine (will post the detailed results here), I happened to look at the frequency of action types again.[1]

Mon, Oct 1, 3:31 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests

Fri, Sep 28

Tbayer added a comment to T205562: Ingest data into druid for readingDepth schema .

@Nuria I figured that percentiles including the median might be more demanding, but I didn't expect that the mean would be a problem too. Considering that Druid's aggregators include sums and counts, is there a way to calculate their quotient (i.e. the mean) later in Turnilo or Superset?
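
As a rough illustration of the idea, in Python with made-up rows: the field names and the `mean_by` helper are hypothetical, and this says nothing about whether Turnilo or Superset can express such a quotient. The point is only that a mean can be recovered from stored sums and counts, whereas a median cannot.

```python
# Hypothetical rolled-up rows: each carries a sum and a count for one slice.
rows = [
    {"wiki": "enwiki", "visible_length_sum": 120_000, "event_count": 40},
    {"wiki": "enwiki", "visible_length_sum": 60_000, "event_count": 20},
    {"wiki": "lvwiki", "visible_length_sum": 9_000, "event_count": 6},
]

def mean_by(rows, dimension, sum_field, count_field):
    """Derive per-group means as (total of sums) / (total of counts)."""
    totals = {}
    for row in rows:
        s, c = totals.get(row[dimension], (0, 0))
        totals[row[dimension]] = (s + row[sum_field], c + row[count_field])
    return {key: s / c for key, (s, c) in totals.items() if c}

means = mean_by(rows, "wiki", "visible_length_sum", "event_count")
print(means)
```

Because sums and counts both add up across rows, the quotient stays correct however the data is sliced afterwards.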

Fri, Sep 28, 10:01 PM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics
Tbayer added a comment to T201123: What % of pages feature issues?.

On the Persian Wikipedia, the ratio of pages with (Ambox) issues is around 5%: https://phabricator.wikimedia.org/T201123

Fri, Sep 28, 9:50 PM · Product-Analytics, Readers-Web-Backlog (Tracking), Reading-analysis, Page-Issue-Warnings
Tbayer added a comment to T204090: QA page issues feature.

Two bugs involving overlapping text on the Catalan Wikipedia:

Fri, Sep 28, 9:06 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Audiences-QA, Page-Issue-Warnings
Tbayer updated subscribers of T202751: Ingest data from PageIssues EventLogging schema into Druid.

For the record: @Nuria and I discussed this task earlier this week, and I understand that the AE team feels that due to the workload from other projects it might not be possible to implement this (specifically, the subtask T201873) before the end of Q2. The team has suggested tackling T205562: Ingest data into druid for readingDepth schema first instead, as it might be an easier case.

Fri, Sep 28, 8:46 PM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T205641: Add ability to bucketize integers as part of event ingestion.

As indicated over at T205562#4626349 , in many or most cases we will want to treat such integer fields as measures, rather than as dimensions. It seems bucketing only makes sense for the latter.

Fri, Sep 28, 8:43 PM · Patch-For-Review, Analytics-Kanban, Analytics
Tbayer added a comment to T205562: Ingest data into druid for readingDepth schema .

Hi @Tbayer, just wanted to let you know that at the moment we can't index the time values stored in this dataset due to their high cardinality, so I wanted to ask if the dataset has value to you in druid without them, or you would prefer to wait for the completion of T205641 (referenced here as a subtask), which would put the time values into buckets and make it possible for them to be ingested in druid?

Oh, those time fields (visibleLength and totalLength are particularly relevant) are to be understood as measures, not as dimensions, to use the terms referred to in https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines and the Druid documentation. It's actually very similar to the examples from the draft guidelines about "time spent" and "time since last action".

Fri, Sep 28, 8:42 PM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Re-checking the ratio of pageloaded events per pageview after the fix for T205355 has been deployed:
This looks much more plausible now than earlier (T204609#4607546), with rates around the estimated 10%.

Fri, Sep 28, 12:35 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer renamed T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks from Turn on page issues A/B test for Latvian wikipedia to Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.
Fri, Sep 28, 12:12 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T205176: Increase default sampling ratio of ReadingDepth.

Further to T205176#4614616:

Between 12 AM and 6 AM today we saw an error rate of between 0.0005% and 0.0067%.

[...]
Thanks! Does this mean we can consider the AC "Analyse any errors that are introduced in the EventLogging pipeline relating to this change" fulfilled?

Fri, Sep 28, 12:08 AM · Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1)

Thu, Sep 27

Tbayer added a comment to T205355: A/B config flag should be subject to ResourceLoader caching rules not HTML caching rules.

Over to you. We're back in action on Latvian Wikipedia. I'd hope without the problems of caching this time round...

Thu, Sep 27, 11:55 PM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1)
Tbayer moved T160492: Conduct further data quality checks on the ReadingDepth schema from Next Up to Doing on the Product-Analytics board.
Thu, Sep 27, 8:25 PM · Product-Analytics, Reading-analysis
Tbayer awarded T205562: Ingest data into druid for readingDepth schema a Mountain of Wealth token.
Thu, Sep 27, 1:29 AM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics
Tbayer updated subscribers of T205562: Ingest data into druid for readingDepth schema .
Thu, Sep 27, 1:28 AM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics

Wed, Sep 26

Tbayer reassigned T160492: Conduct further data quality checks on the ReadingDepth schema from Tbayer to Groceryheist.
Wed, Sep 26, 11:56 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

I didn't realise we were excluding all of Safari. That seems a bit extreme imo given we have seen this issue only on 11.1.1 on desktop and we could just exclude that user agent.

I hope this doesn't mean we are excluding iPhone/iPad.
If so I recommend more testing on different Safari versions to increase our confidence. We have no reason right now to believe that all Safari versions are bad based on 2 desktop browsers.

Remind me, did we do QA for this schema on Mobile Safari? If you and/or @Ryasmeen saw valid events on that browser, I would agree that it's reasonable to assume for now that we can use its data.

Wed, Sep 26, 12:30 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis

Tue, Sep 25

Tbayer added a comment to T201063: Modern Event Platform: Event Schema Registry.

As a data analyst or product manager, I want a canonical place where I can easily draft schema definitions and implementation details in collaboration with product engineers during implementation (example), document and access them once a schema is live, and correct and amend them later as needed.

Hm, I'd split this up into different use cases. How about:

  • As a data analyst, I want a place where I can easily draft schema definitions and implementation details in collaboration with product engineers during implementation

    and Neil's original:
  • As a data analyst/product manager, I want a canonical place where I can easily document schema definitions and implementation details.

They seemed to be closely connected to me, also because of the "correct and amend them later as needed" part. But we can split them if you prefer.

I left out the product manager user in the one about editing/drafting schemas, only because the few I've talked to haven't had the need to edit them. We can ask around and see if I missed that and there is actually a use case for them here.

Tue, Sep 25, 6:34 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Tbayer updated the task description for T205454: LDAP Access request for Nathan TeBlunthuis (groceryheist / nathante).
Tue, Sep 25, 4:27 PM · LDAP-Access-Requests
Tbayer added a comment to T199252: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest.

Outside of the scope of this ticket, but I wanted to note it. This is what happened to the index for it.m.wikipedia.org after we submitted the sitemap for it.wikipedia.org:

[...]
I was also curious what the impact of the sitemap rollout would look like for the desktop domain it.wikipedia.org itself:



Tue, Sep 25, 3:19 AM · Patch-For-Review, SEO, Performance-Team, Operations, Traffic, Wikimedia-General-or-Unknown
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

For the record: @Jdlrobson has found the likely reason for the initially low event rate ("Minerva A/B tests are not subject to HTML caching time. Config added inside SkinMinerva is subject to the rules of HTML caching and can take several days ..."). The fix is being worked on at T205355: A/B config flag should be subject to ResourceLoader caching rules not HTML caching rules

Tue, Sep 25, 3:02 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests

Mon, Sep 24

Tbayer added a comment to T199252: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest.

@Nemo_bis Regarding your comment above, about Google not respecting canonical URLs: it's not that they don't respect them, it's that we're not currently giving them the type of hints that they want specifically for mobile discovery[1].

Mon, Sep 24, 1:20 AM · Patch-For-Review, SEO, Performance-Team, Operations, Traffic, Wikimedia-General-or-Unknown

Sep 22 2018

Tbayer renamed T168103: heirloom-mailx fails trying to send out email from SWAP notebook from mailx fails trying to send out email from SWAP to heirloom-mailx fails trying to send out email from SWAP notebook.
Sep 22 2018, 2:57 AM · Analytics-Kanban, Analytics
Tbayer updated subscribers of T168103: heirloom-mailx fails trying to send out email from SWAP notebook.

Added Analytics, as the AE team took ownership of SWAP last year.
@Ottomata, any thoughts? @Neil_P._Quinn_WMF and I wonder if it has to do with the virtual environment.

Sep 22 2018, 2:56 AM · Analytics-Kanban, Analytics
Tbayer added a project to T168103: heirloom-mailx fails trying to send out email from SWAP notebook: Analytics.
Sep 22 2018, 2:54 AM · Analytics-Kanban, Analytics
Tbayer created T205176: Increase default sampling ratio of ReadingDepth.
Sep 22 2018, 2:17 AM · Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1)

Sep 21 2018

Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Here is a look at the ratio of pageLoaded events from the PageIssues schema to all applicable views on lvwiki (more precisely, mobile web (-domain) pageviews to mainspace pages, excluding spider views).

Sep 21 2018, 11:55 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T201123: What % of pages feature issues?.

On Latvian Wikipedia, the ratio of pages with issues is around 10%: https://quarry.wmflabs.org/query/29838 (using the above approach to count Ambox-using pages, adapting @TheDJ's queries and combining them into one)

Sep 21 2018, 10:48 PM · Product-Analytics, Readers-Web-Backlog (Tracking), Reading-analysis, Page-Issue-Warnings
Tbayer added a comment to T201123: What % of pages feature issues?.

https://quarry.wmflabs.org/query/28877
1,549,538 articles in en.wp feature usage of a (1 or more) Module:Message_box (the meta template for all Template:*mbox'es)
https://quarry.wmflabs.org/query/28878
1,076,899 articles in en.wp feature usage of a Template:Ambox

https://quarry.wmflabs.org/query/28879
There are currently 5,695,246 articles on en.wp (redirects excluded)

So 1,549,538/5,695,246*100 = 27%
and 1,076,899/5,695,246*100 = 19%

https://en.wikipedia.org/wiki/Module:Message_box
Across all namespaces, Module:Message_box is used on 6,323,224 pages, which is 14%

Thanks @TheDJ! (Also for taking care to count only distinct pages: the query used by the templatecounts tool, which the task description proposed for this question, actually counts multiple occurrences of a template on the same page separately, cf. T201123#4476734.)
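
The percentages quoted in this comment can be re-derived in a few lines. This is only a minimal sketch that reuses the three counts from the Quarry results above; the constant and function names are illustrative and do not come from any of the queries.

```python
# Counts quoted from the Quarry results above (en.wp, redirects excluded).
TOTAL_ARTICLES = 5_695_246
MESSAGE_BOX_ARTICLES = 1_549_538  # articles using Module:Message_box
AMBOX_ARTICLES = 1_076_899        # articles using Template:Ambox


def share_percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return part / whole * 100


message_box_share = share_percent(MESSAGE_BOX_ARTICLES, TOTAL_ARTICLES)
ambox_share = share_percent(AMBOX_ARTICLES, TOTAL_ARTICLES)

print(f"Module:Message_box: {message_box_share:.1f}% of articles")
print(f"Template:Ambox:     {ambox_share:.1f}% of articles")
```

Rounded to whole percentages, this gives the 27% and 19% stated above.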

Sep 21 2018, 10:45 PM · Product-Analytics, Readers-Web-Backlog (Tracking), Reading-analysis, Page-Issue-Warnings
Tbayer added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

@Tbayer
Yes, no renames will be needed! We'll find a solution to the array field and implement it soon.

Thanks for confirming!

The only caveat is that fields (dimensions) with high cardinality, like pageToken, sessionToken, pageTitle and pageIdSource, perform very badly in Druid, so I would blacklist them from Druid ingestion if possible.

Sep 21 2018, 3:34 PM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Reading-analysis, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Interesting... can we rule out the following?

  • doNotTrack header was present

Yes, that can be ruled out. Compare the PageIssues event rate from T204609#4601701 (or [2] below) with e.g. the print button event rate of the Print schema (lvwiki, Minerva, sampled at 10%).[1]

  • events triggered errors due to uri length and were not processed

Can be ruled out. That was a very rare occurrence even in T196904 (where the event query string contained a page title / URL twice, and we only have one page title field here). Besides, it wouldn't explain the inconsistent logging for the same page in the https://lv.m.wikipedia.org/wiki/Filozofija example.

  • events haven't made it to Hive yet

Super unlikely. (Other schemas, e.g. Print [1], don't seem to be seeing such a delay. And repeating the query from T204609#4601701 >13h later doesn't show any retroactive increases in the events logged.[2])

I'll think about other reasons in the meantime...

Sep 21 2018, 3:04 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Could you provide some example pages which have DEFAULT priority? It would be useful to verify they are behaving as expected as they were difficult to test...

Sep 21 2018, 2:31 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer updated subscribers of T201124: Provide standard/reproducible way to access a PageToken.

By the way, can we find out / keep track of which other schemas are now using this newly standardized pageview token?

It looks like it has been adopted in QuickSurveys (https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/QuickSurveys/+/454431/ / T204921).

Sep 21 2018, 1:59 AM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1), Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board-Old, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

No red flags in issuesVersion, isAnon, and namespaceId either.

SELECT event.issuesVersion AS issuesVersion, COUNT(*) AS events
FROM event.pageissues 
WHERE year >0 
GROUP BY event.issuesVersion;
Sep 21 2018, 1:47 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

And the distribution of values of the sectionNumbers and issuesSeverity fields looks plausible too - at least there are a lot of different kinds of combinations represented.

Sep 21 2018, 1:39 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

Here is the distribution of actions so far. This does not look impossible a priori (although it would mean quite a low issue clickthrough ratio of <=2% in both control and test). So the missing events are not caused by an entire category of actions missing.

action                events
pageLoaded            1341
issueClicked          14
editClicked           9
modalClose            7
modalInternalClicked  2
modalEditClicked      2
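
As a rough cross-check of the "<=2%" remark, the pooled clickthrough ratio can be computed from the distribution above. This is a sketch only: it pools control and test (the table does not split them), and the variable names are illustrative.

```python
# Event counts per action, copied from the distribution above.
action_counts = {
    "pageLoaded": 1341,
    "issueClicked": 14,
    "editClicked": 9,
    "modalClose": 7,
    "modalInternalClicked": 2,
    "modalEditClicked": 2,
}

total_events = sum(action_counts.values())

# Pooled issue clickthrough: issue clicks per logged page load
# (control and test combined, since the table does not separate them).
clickthrough = action_counts["issueClicked"] / action_counts["pageLoaded"]

print(f"total events: {total_events}")
print(f"pooled issue clickthrough: {clickthrough:.2%}")
```

The pooled figure comes out at roughly 1%, which is compatible with the per-group bound mentioned above but does not by itself confirm it for each group.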
Sep 21 2018, 1:10 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

OK, let's look at the article https://lv.m.wikipedia.org/wiki/Filozofija , which currently is tagged as lacking references and receives between 20 and 80 pageviews per day.

However, it hasn't generated any PageIssues events so far:

SELECT COUNT(*) FROM event.pageissues 
WHERE year >0 
AND event.pageTitle = 'Filozofija';

_c0
0
1 row selected (58.149 seconds)

Do we know which user agents those views come from? Just want to rule out this being a problem with grade C browsers... (e.g. browsers where we don't run JS)

Sep 21 2018, 12:37 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests

Sep 20 2018

Tbayer moved T148262: Vet and explore new readership engagement metric from Backlog to Doing on the Product-Analytics board.
Sep 20 2018, 10:01 PM · Product-Analytics, Patch-For-Review, Reading-analysis
Tbayer moved T148263: Vet and explore new readership retention metric from Backlog to Doing on the Product-Analytics board.
Sep 20 2018, 10:01 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

On the other hand, here is a list of the 100 pages that have generated the most events so far.

Sep 20 2018, 2:32 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

OK, let's look at the article https://lv.m.wikipedia.org/wiki/Filozofija , which currently is tagged as lacking references and receives between 20 and 80 pageviews per day.

Sep 20 2018, 2:22 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

We have over half a day's worth of data in the table now, including from daytime hours in Latvia, and the caches should have caught up. But the event rate remains surprisingly low (see Grafana and query below) - about 1-2 events per minute, whereas lv.m.wikipedia.org receives 70-80k views/day currently (https://stats.wikimedia.org/v2/#/lv.wikipedia.org/reading/total-page-views/normal|bar|1-Month|access~mobile-web ) or around 40-60 views/minute. Maybe content quality is very high on this wiki... (Will do further checks.)
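
For reference, here is a back-of-the-envelope version of the comparison above. It is a sketch that assumes views are spread evenly over the day and ignores sampling and which views are actually eligible to send events; all names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def per_minute(views_per_day: float) -> float:
    """Convert a daily view count into an average per-minute rate."""
    return views_per_day / MINUTES_PER_DAY


# Daily mobile web views for lv.m.wikipedia.org, as quoted above.
views_low = per_minute(70_000)
views_high = per_minute(80_000)

# Observed event rate, as quoted above (events per minute).
events_low, events_high = 1, 2

# Most generous and least generous ratio of events to views.
ratio_min = events_low / views_high
ratio_max = events_high / views_low

print(f"views/minute: {views_low:.1f} to {views_high:.1f}")
print(f"events per view: {ratio_min:.1%} to {ratio_max:.1%}")
```

Under these assumptions only a few percent of views are producing events, which is why the rate looks surprisingly low.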

Sep 20 2018, 1:42 PM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests
Tbayer added a comment to T201124: Provide standard/reproducible way to access a PageToken.

By the way, can we find out / keep track of which other schemas are now using this newly standardized pageview token?

Sep 20 2018, 1:10 PM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1), Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board-Old, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings
Tbayer updated subscribers of T204143: ReadingDepth events are not being sent in browsers where navigator.sendBeacon should be supported but in practice isn't.

Perfect. Thanks @Ryasmeen! In that case, let's go ahead and close this task and confirm that we will be avoiding Safari in our analysis @Tbayer

Sep 20 2018, 5:56 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Product-Analytics, Reading-analysis
Tbayer added a comment to T198218: Generate list of most used special pages.

We haven't really decided in which direction to take this task from here. Which of the above combinations would be most useful to extend to the other four languages? (logged in vs all views, mobile domain vs. mobile + desktop)

Sep 20 2018, 2:03 AM · Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking)
Tbayer added a comment to T198218: Generate list of most used special pages.

@Tbayer thanks, that's helpful. Is there any way to include the use of Talk pages within these results?

Sep 20 2018, 1:59 AM · Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking)
Tbayer added a comment to T198218: Generate list of most used special pages.

Here is the (or an) answer to the second question from the task, about the top 50 pages outside the article namespace (enwiki, July, logged in views, desktop+mobile).

Sep 20 2018, 1:54 AM · Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking)
Tbayer added a comment to T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks.

OK, the pageissues table just materialized in Hadoop with the data from the first hour - 19 events, 14 of which seem to be your test views. Let's wait a bit for the caches...

Sep 20 2018, 1:27 AM · Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Patch-For-Review, Wikimedia-Site-requests

Sep 19 2018

Tbayer added a comment to T198218: Generate list of most used special pages.

And here is the analogous list of special pages with the most logged-in views on the mobile site (en.m.wikipedia.org in this case), also for July 2018. Comparing with the above result, one finds e.g. that 6% of views to Special:Watchlist are on the mobile domain.

Sep 19 2018, 7:19 PM · Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking)
Tbayer added a comment to T204746: It should be possible to opt into new page issues treatment via query string parameter.

To add a clarification from kickoff, for the record: This will not switch on the instrumentation, so we would still need to resort to other means for checking events are being sent correctly on a particular wiki.

Sep 19 2018, 5:16 PM · User-Ryasmeen, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, Audiences-QA, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1), MinervaNeue
Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

BTW, to clarify just in case (because it's not explicit in the QA steps listed in the task description):
For the ReadingDepth schema, the most important events to check are the pageUnloaded actions that are sent when a tab is closed or the user navigates away to a different page.

Added more detail to the task description in that regard (after discussion with @Ryasmeen ).

Checked that 'action: pageLoaded' is sent on the initial page load, and 'action: pageUnloaded' is sent when I navigated away to a different page. However, I couldn't verify 'action: pageUnloaded' for iOS Safari and Chrome Android because the toast message disappears too quickly right before navigating away. But for desktop Chrome and Firefox it's working correctly. @Tbayer: Do you think that's an acceptable test coverage for this? :)

Sep 19 2018, 6:53 AM · User-Ryasmeen, Audiences-QA, MW-1.32-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1), Page-Issue-Warnings
Tbayer added a comment to T204790: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users.

Seconding this request:
@Groceryheist will be working with @ovasileva and myself on this project under a WMF contract, doing research on understanding reader behavior, focused on ReadingDepth EventLogging data (in combination with data from webrequest and pageview_hourly).
The three access groups listed in the task are what we have determined as necessary for this work (per https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups and https://wikitech.wikimedia.org/wiki/SWAP#Access ).

Sep 19 2018, 1:02 AM · Patch-For-Review, Operations, SRE-Access-Requests

Sep 18 2018

Tbayer updated the task description for T179915: Determine expected amount of usage of mobile print to PDF button per browser.
Sep 18 2018, 9:36 PM · Product-Analytics, New-Readers, Readers-Web-Backlog (Tracking), Reading-analysis
Tbayer added a comment to T203134: Generate various histograms and time series for exploring the new reader retention metric.

I finished creating time-series graphs looking at the users' average next return time (within 31 days and 7 days) for a variety of countries and projects from December 2016 through July 2018.

I’m currently reviewing the graphs and their breakdowns to identify trends. Below are some initial observations:

....

  • On English Wikipedia, there were a number of sudden drops on desktop between May and July 2017, where the avg return time within 31 days decreased from around 5.5 to 1.0 or 2.0 days. Similar drops during this timeframe were also seen for Wikimedia and Wikisource projects and for the US, Japan, France, and Russia. I’ll investigate further by looking through the raw dataset and using daily histograms of return time around those dates.

Interesting! So it seems that the average may have been integer-valued on these drop days in F26025897 ? That would point to a data artifact.
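
One way to test the integer-valued hypothesis is to flag days whose average is (numerically) a whole number. The sketch below uses made-up daily averages purely for illustration; they are not the values from F26025897, and the helper name is hypothetical.

```python
def is_integer_valued(value: float, tol: float = 1e-9) -> bool:
    """True if value is a whole number within a small tolerance."""
    return abs(value - round(value)) <= tol


# HYPOTHETICAL daily averages (days), not taken from the real dataset.
daily_avg_return_time = {
    "2017-05-01": 5.47,
    "2017-05-02": 1.0,
    "2017-05-03": 5.52,
    "2017-05-04": 2.0,
}

# An average over many users landing exactly on an integer is unlikely
# by chance, so such days are candidates for a data artifact.
suspicious_days = [
    day for day, avg in daily_avg_return_time.items() if is_integer_valued(avg)
]
print(suspicious_days)
```

Days flagged this way would then be the first ones to inspect in the raw dataset.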

Sep 18 2018, 9:34 PM · Product-Analytics
Tbayer added a comment to T204746: It should be possible to opt into new page issues treatment via query string parameter.

This would be great. Just to double-check (apologies if that's a naive question): Would that query parameter survive across several issue clicks and modal clicks? e.g. https://en.m.wikipedia.org/wiki/Pharmacovigilance?pageissues=new2018 --> https://en.m.wikipedia.org/wiki/Pharmacovigilance?pageissues=new2018#/issues/all
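
To make the URL structure in the example explicit: the query string and the fragment are separate components, so appending `#/issues/all` does not alter `pageissues=new2018`. The sketch below only parses the second example URL with Python's standard library; whether the MinervaNeue code keeps the parameter across navigation is a separate question that it does not answer.

```python
from urllib.parse import parse_qs, urlsplit

# Second example URL from the comment above.
url = (
    "https://en.m.wikipedia.org/wiki/Pharmacovigilance"
    "?pageissues=new2018#/issues/all"
)

parts = urlsplit(url)
query = parse_qs(parts.query)

print("path:    ", parts.path)
print("query:   ", query)
print("fragment:", parts.fragment)
```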

Sep 18 2018, 9:11 PM · User-Ryasmeen, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, Audiences-QA, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1), MinervaNeue