Tbayer (Tilman Bayer)
User

User Details

User Since
Oct 20 2014, 11:21 PM (199 w, 1 d)
Availability
Available
IRC Nick
HaeB
LDAP User
Unknown
MediaWiki User
Tbayer (WMF) [ Global Accounts ]

Recent Activity

Today

Tbayer added a comment to T201973: Determine ratio of wikitext edits made from mobile devices that use the desktop interface.

Here is a quick and dirty query for a related question: The ratio of wikitext edits using the desktop interface made from iOS and Android devices (a good approximation of the "mobile devices" in the sense of this task). It is around 5% on average.
This will need to be compared with the number of mobile web edits (ideally also filtering out those which were made, conversely, from desktop devices, although that might be trickier to determine).

Wed, Aug 15, 12:40 AM · Product-Analytics

Yesterday

Tbayer created T201973: Determine ratio of wikitext edits made from mobile devices that use the desktop interface.
Tue, Aug 14, 11:52 PM · Product-Analytics
Tbayer added a comment to T201124: Provide standard/reproducible way to access a PageToken.

@Tbayer and I discussed and reviewed. Here are our conclusions:

  • For page issues reading depth specifically, generateRandomSessionId() would be adequately random. However, the solution proposed in the outstanding patch is general purpose and would be applied to other metrics so the timestamp _is_ wanted.

To add more detail regarding the rationale here: as alluded to above and stated in the code, the 64 bits of entropy from generateRandomSessionId() are sufficient to prevent token collisions with ~99% probability in a sample of 500 million. That is enough for the upcoming page issues A/B test, which is envisaged to feature up to, or somewhat more than, 100 million pagetokens per project. That said, we also want to keep this future-proof so that users don't even have to bother with such calculations, and it's not inconceivable that it could occasionally be used for larger sets too: e.g. the Virtualpageview table currently consists of 7.5 billion link interactions (it doesn't use a token for them and is only an auxiliary schema feeding into the actual aggregate table used for analysis, but this illustrates the possible dimensions). That's a motivation for continuing to use the timestamp as a second source of randomness. However, also adding a second call to generateRandomSessionId() does indeed seem like overkill.
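The ~99% figure can be checked with the standard birthday-problem approximation. This is a minimal sketch assuming tokens behave like independent, uniformly random 64-bit values; the function name is hypothetical and this is not the calculation used in the EventLogging code itself.

```python
import math


def collision_probability(n_tokens: int, bits: int = 64) -> float:
    """Birthday-problem approximation: P(at least one collision) among n_tokens
    independent, uniformly random tokens drawn from a space of 2**bits values."""
    space = 2.0 ** bits
    # P(no collision) ~= exp(-n^2 / (2 * space)); expm1 keeps precision when tiny
    return -math.expm1(-(n_tokens ** 2) / (2.0 * space))


for n in (100_000_000, 500_000_000, 7_500_000_000):
    p = collision_probability(n)
    print(f"{n:>13,} tokens: P(no collision) ~ {1 - p:.2%}")
```

For 500 million tokens this gives roughly a 99.3% chance of no collision, in line with the ~99% quoted above, while at the 7.5 billion scale of the Virtualpageview table the probability of at least one collision exceeds 75%, which is why an additional source of randomness matters at that size.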

Tue, Aug 14, 10:49 PM · Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

We've estimated high as we anticipate a possibility of needing to support Tilman in analysis and bug identification. Hopefully we won't and this will prove easy.

Tue, Aug 14, 4:33 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer updated the task description for T200792: Run A/B test on page issues.
Tue, Aug 14, 4:22 PM · Wikimedia-Site-requests, Readers-Web-Backlog

Mon, Aug 13

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

@Tbayer: Did you want to set up a meeting with an engineer to go through the instrumentation manually (further to the QA steps that @Jdlrobson added)? IIRC we did that for Page Previews and it was useful.

Mon, Aug 13, 11:58 AM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings

Sun, Aug 12

Tbayer updated the task description for T200792: Run A/B test on page issues.
Sun, Aug 12, 9:19 AM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

We decided to run the test everywhere but I don't see any harm in running it for a bit longer on one wiki.

My choice of words with "test run" was bad. Let me try asking this another way.

If we run the A/B test and there proves to be a problem with the data collected, what will we do? Will we just turn it off and on again, or would we need to change the wiki we are running the A/B test on because the sample has been exposed to the new treatment and thus the A/B test has become invalid?

How would it become invalid? The research questions do not include investigation of any novelty effects, i.e. those that result from the user's initial unfamiliarity with the new design. Instead, we are interested in how the new design would change reader behavior over the long run.
On the contrary, we should actually add a run-up time of 1-2 days to the experiment time of two weeks, to reduce any novelty effects and also to account for caches updating, as we did with the last Popups A/B tests (or tried to, see T178500#3765787).
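As a small illustration of the run-up idea, the analysis window simply starts after the run-up days and then covers the full experiment period. This is a sketch only: the function name and the launch date below are hypothetical, and the defaults just mirror the 1-2 day run-up and two-week figures mentioned above.

```python
from datetime import date, timedelta


def analysis_window(launch: date, run_up_days: int = 2, experiment_days: int = 14) -> tuple:
    """Return (first_day, last_day) of data to analyse: skip the run-up days after
    launch (novelty effects, cache updates), then keep experiment_days full days."""
    first_day = launch + timedelta(days=run_up_days)
    last_day = first_day + timedelta(days=experiment_days - 1)
    return first_day, last_day


# Hypothetical launch date, for illustration only
print(analysis_window(date(2018, 9, 1)))
```

Events logged during the run-up would still be collected, but excluded from the comparison between test and control.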

This information will impact how we set up and run the A/B test. My recommendation was that before running any official A/B test we might want to validate the data integrity by turning this on on a single wiki separately from the test.

Sun, Aug 12, 9:17 AM · Wikimedia-Site-requests, Readers-Web-Backlog

Sat, Aug 11

Tbayer added a comment to T201124: Provide standard/reproducible way to access a PageToken.

Add a getter for the pageViewToken property defined in the ext.eventLogging.subscriber module.

Done in https://gerrit.wikimedia.org/r/451885 as mw.eventLog.pageviewToken() and mw.eventlog.newPageInteractionToken().

Can someone now document the difference between mw.eventLog.pageviewToken() and mw.eventlog.newPageInteractionToken() (how is each calculated, and when, and how long does it persist)?

Sat, Aug 11, 1:24 AM · Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings, Readers-Web-Backlog

Fri, Aug 10

Tbayer added a comment to T201124: Provide standard/reproducible way to access a PageToken.

When refactoring, please keep in mind that in the Page Previews repo, when we create a pageInteractionToken, this token has to be unique for each preview. We cannot use the same token if the user dwells on the same link twice.

The Popups schema for page previews has both a linkInteractionToken and a pageToken. The latter is unique for each pageview, the former for each preview.
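The two token scopes can be sketched as follows. The field names pageToken and linkInteractionToken come from the Popups schema mentioned above; the class and helper names are hypothetical, and using Python's secrets module for 64-bit tokens is an assumption for illustration, not how generateRandomSessionId() is implemented.

```python
import secrets


def new_token(n_bytes: int = 8) -> str:
    """Random hex token with 8 * n_bytes bits of entropy (64 bits by default)."""
    return secrets.token_hex(n_bytes)


class PageviewSession:
    """Hypothetical model of the token scopes: one pageToken per pageview,
    and a fresh linkInteractionToken for every preview shown during it."""

    def __init__(self):
        self.page_token = new_token()
        self.previews = []

    def show_preview(self, link_title: str) -> dict:
        event = {
            "pageToken": self.page_token,         # shared by the whole pageview
            "linkInteractionToken": new_token(),  # unique for this one preview
            "link": link_title,
        }
        self.previews.append(event)
        return event


session = PageviewSession()
first = session.show_preview("Foo")
second = session.show_preview("Foo")  # same link dwelled on twice
```

Dwelling on the same link twice yields two events with the same pageToken but different linkInteractionTokens, which is the behaviour the quoted comment asks to preserve.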

Fri, Aug 10, 5:07 PM · Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings, Readers-Web-Backlog
Tbayer added a comment to T201653: Missing documentation for pageviews dataset.

Instead of maintaining Readme documentation on dumps.wikimedia.org, we should link back to the corresponding documentation pages on Wikitech, which are more reliable and up to date. This is already done on e.g. https://dumps.wikimedia.org/other/pagecounts-raw/ .

Fri, Aug 10, 4:59 PM · Patch-For-Review, Analytics-Kanban, Datasets-General-or-Unknown, Documentation, Analytics

Thu, Aug 9

Tbayer moved T200794: Analyze results of page issues A/B test from Triage to Blocked on the Product-Analytics board.
Thu, Aug 9, 8:13 PM · Readers-Web-Backlog (Tracking), Product-Analytics, Reading-analysis
Tbayer added a comment to T186828: Productionize per-country daily & monthly active app user stats.

@chelsyx thanks for working on this. Will review as is but let me outline an approach that would have survived the test of time better: we could have created a "tag" (see: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest/Tagging) for when a pageview is a mobile pageview, that way all this code:

AND ((parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'action') = 'mobileview' AND uri_path == '/w/api.php')
       OR (uri_path LIKE '/api/rest_v1%' AND uri_query == ''))
   AND COALESCE(x_analytics_map['wmfuuid'],
                parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL
   AND webrequest_source IN ('text')

For context, the code in question comes from the Analytics Engineering team's query to calculate the global version of this data, and I can see strong arguments for keeping this new per-country query consistent with that.
So I understand your remarks are about how the Analytics Engineering team could have approached this differently back in 2015 if the tagging infrastructure had been around already.

could be abstracted into a tag, so that 1) the WHERE clause could be changed to "where tags include 'mobile-pageview'".

Its usefulness as a general tag would be limited, though, considering that it only captures app views where the user has opted in / not opted out of data collection, as opposed to the general access_method = 'mobile app'.

This is what we do to, for example, identify WDQS requests: we tag them when we refine, so subsequent jobs that use that data do not need to run costly regexes. The WDQS tag as an example:

https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java

See a similar tag for portal pageviews:

https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/PortalTagger.java
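To make the tagging idea concrete, here is a rough Python restatement of the quoted WHERE clause as a per-request tagging function. It is only a sketch under stated assumptions: the real taggers linked above are Java classes in refinery-source, the row is modelled as a plain dict with the webrequest field names used in the query, and the helper names are hypothetical. The tag name is the one proposed in the quoted comment.

```python
from urllib.parse import parse_qs

TAG_NAME = "mobile-pageview"  # tag name as proposed in the quoted comment


def _first(params: dict, key: str):
    """First value of a query-string parameter, or None if absent."""
    values = params.get(key)
    return values[0] if values else None


def is_app_pageview(row: dict) -> bool:
    """Rough Python restatement of the quoted HiveQL WHERE clause for one row."""
    uri_path = row.get("uri_path", "")
    uri_query = row.get("uri_query", "")
    params = parse_qs(uri_query.lstrip("?"))
    via_api = _first(params, "action") == "mobileview" and uri_path == "/w/api.php"
    via_rest = uri_path.startswith("/api/rest_v1") and uri_query == ""
    # COALESCE(a, b) IS NOT NULL holds when either value is present
    has_app_id = (row.get("x_analytics_map", {}).get("wmfuuid") is not None
                  or _first(params, "appInstallID") is not None)
    return (via_api or via_rest) and has_app_id and row.get("webrequest_source") == "text"


def tag_request(row: dict) -> list:
    """Compute tags once at refine time, so later jobs can skip the parsing above."""
    return [TAG_NAME] if is_app_pageview(row) else []
```

Downstream jobs would then only need a cheap membership check such as `[r for r in rows if TAG_NAME in r["tags"]]` instead of repeating the URL parsing. As noted in the reply above, this condition only captures app requests carrying an app ID, so its usefulness as a general-purpose tag is limited.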

Thu, Aug 9, 7:45 PM · Patch-For-Review, Product-Analytics, Analytics, Discovery-Analysis, Reading-analysis
Tbayer updated the task description for T191532: Mobile page issues - instrument page issues.
Thu, Aug 9, 4:00 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer updated the task description for T191532: Mobile page issues - instrument page issues.
Thu, Aug 9, 3:59 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T193294: CPS report of traffic impact during campaign.

Great! Since I see my name in the task description, I should point out that I haven't yet seen the report myself (not insisting that I need to - just keeping the RACI record straight ;)

Thu, Aug 9, 1:58 AM · New-Readers
Tbayer added a comment to T201123: What % of pages feature issues?.
As discussed at T200792#4475856, there seems to be some confusion here between two related but separate questions:

Yes, it seems I might be confused. Do we need to know the answer to the question "What % of pages feature issues?" If not, let's decline this task!

Thu, Aug 9, 1:51 AM · Readers-Web-Backlog, Page-Issue-Warnings

Wed, Aug 8

Tbayer updated the task description for T200792: Run A/B test on page issues.
Wed, Aug 8, 6:10 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

Here is a very rough estimate of the sampling ratio (or bucket sizes) we need in order to answer the research questions.

Wed, Aug 8, 6:00 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

This is a bit late in the game, but I want to flag that with this new schema we have an opportunity to consider whether we want to ask Analytics Engineering to ingest its data into Druid, in order to potentially make it accessible as a view or dashboard in Superset. As a first step, this would require checking whether the schema's format satisfies these (draft) guidelines.

Wed, Aug 8, 4:27 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

Ping @Tbayer let's mark this as resolved?

We got a lot of good information here (thanks again, also for sharing the SWAP notebook!)
But the three questions spelled out in the task description are still not marked as resolved. If I have overlooked the answers, please feel free to point that out and tick the corresponding checkboxes (preferably also linking the answers in the task description so folks can find them easily).

Regarding the first one ("What percentage of global human pageviews [...] are going to be reclassified as spider pageviews?"), we have some partial information from your great results in T193578#4196915. But as I pointed out later that day in T193578#4198163, these still need to be applied to the agent_type = 'user' subset. I might take a look at your notebook now to see if I can work it out myself.

Wed, Aug 8, 1:55 PM · Reading-analysis, Product-Analytics, Analytics

Tue, Aug 7

Tbayer added a subtask for T191532: Mobile page issues - instrument page issues: T201124: Provide standard/reproducible way to access a PageToken.
Tue, Aug 7, 5:11 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a parent task for T201124: Provide standard/reproducible way to access a PageToken: T191532: Mobile page issues - instrument page issues.
Tue, Aug 7, 5:11 PM · Patch-For-Review, Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Previews, Readers-Web-Kanbanana-Board, Technical-Debt, Performance-Team (Radar), Page-Issue-Warnings, Readers-Web-Backlog
Tbayer added a comment to T198947: There should not be multiple h1 tags on mobile page HTML: Restructure mobile web header for SEO and accessibility.

Google isn't using our mobile content for indexing anyway. They are using the Parsoid-format HTML directly from RESTbase AFAIK.

Tue, Aug 7, 4:58 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Patch-For-Review, Readers-Web-Kanbanana-Board, Accessibility, Readers-Web-Backlog, MinervaNeue, Mobile, SEO
Tbayer added a comment to T200792: Run A/B test on page issues.

Following the discussion above and on Slack, I have put some notes about terminology at https://www.mediawiki.org/wiki/Reading/Web/Quantitative_Testing#Sampling_and_bucketing . Please review.

Tue, Aug 7, 2:37 PM · Wikimedia-Site-requests, Readers-Web-Backlog

Mon, Aug 6

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

@Tbayer I think #3 is the most accurate description.

We are essentially parsing elements with any of the CSS classes table.ambox table.tmbox table.cmbox table.fmbox (details here) which are classes that derive from the following templates:

  • Ambox (article message box)
  • Tmbox (talk page message box)
  • Cmbox (category message box)
  • Fmbox (file message box)

    However, we're only showing the new treatment for the article namespace.

Thanks, very helpful! BTW it would be good to also include this definition (or at least a pointer to it) at https://www.mediawiki.org/wiki/Reading/Web/Projects/Mobile_Page_Issues#Proposed_changes , so that people can determine which pages on which projects are going to be affected.

So from your above examples, the French templates (https://fr.wikipedia.org/wiki/Modèle:Méta_bandeau_d%27avertissement) would be excluded; however, the Spanish templates would actually be included (https://es.wikipedia.org/wiki/Arquitectura).

But I didn't see table.ambox in the HTML of https://es.wikipedia.org/wiki/Arquitectura ? (only e.g. ambox-text)
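A selector like table.ambox matches a table element whose class attribute contains the whitespace-separated token "ambox" exactly, so an element whose classes are only things like ambox-text would not be counted. The sketch below illustrates that matching rule with Python's standard html.parser; it is a hypothetical stand-in, not the MinervaNeue code, and the HTML snippets in the test are made up.

```python
from html.parser import HTMLParser

ISSUE_BOX_CLASSES = {"ambox", "tmbox", "cmbox", "fmbox"}


class IssueBoxFinder(HTMLParser):
    """Collects <table> elements carrying one of the message-box classes,
    i.e. roughly what table.ambox, table.tmbox, table.cmbox, table.fmbox match."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # CSS class selectors compare whole whitespace-separated tokens
        classes = set((dict(attrs).get("class") or "").split())
        hits = classes & ISSUE_BOX_CLASSES
        if hits:
            self.matches.append(" ".join(sorted(hits)))


def find_issue_boxes(html: str) -> list:
    finder = IssueBoxFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

Under this rule a page whose warning box only carries classes such as ambox-text would produce no matches, which is the discrepancy raised about the Spanish example.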

We are also parsing the "severity level" of the templates in a few languages: Italian, Spanish, Russian. See here for details.

I don't think it's been explicitly stated, but I think we would want to run this test only for the article namespace, right?

Yes, considering that per your remark above the design will only change in article namespace too.

Mon, Aug 6, 3:24 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer updated the task description for T191532: Mobile page issues - instrument page issues.
Mon, Aug 6, 1:29 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

We still need to document more precisely what we are actually counting as "page issues" with this instrumentation, especially since (per recent conversations, see e.g. T200792#4472739 ) there is now a more pronounced desire to measure things beyond the English Wikipedia. Currently this is only loosely described in the AC:

pages which have page issues on them e.g. ambox templates

Mon, Aug 6, 1:28 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings

Fri, Aug 3

Tbayer closed T187590: Mobile readers location data as Resolved.

Closing per @dchen

Fri, Aug 3, 11:22 PM · Product-Analytics, Reading-analysis
Tbayer awarded T201022: Third party resources loaded by wikimediafoundation.org an Evil Spooky Haunted Tree token.
Fri, Aug 3, 6:13 PM · Privacy, wikimediafoundation.org
Tbayer added a comment to T200810: Make it possible to A/B test different section headings on mobile web.

(likewise moved here from the task description: it seems that these are thoughts about the interpretation of the resulting data and suggestions on what to take into account during its analysis, which is always valuable but seems off-topic for this task per se:)

== Short-term vs. long-term impact ==
Note that we should be cautious about how long we run such an experiment, as new headings may arouse curiosity. It is possible that with a new heading, readers are more likely to click it to find out what kind of information they can find inside.

Such novelty effects may be possible, but they don't prevent us from running user interface A/B tests either, and because the ratio of repeat readers to the same page is likely rather low, I would expect them to be even less of a problem here.

It might be that rather than the heading, the content or the delivery of that content is a problem and over time the section headings themselves become associated with that content and are less preferred on mobile. The references section for example may be rarely used, not because of the title, but due to the fact that most mobile users know that clicking on an inline reference will show the associated reference.

Good point (I dwelled on it in my Wikimania presentation too), but I fail to see what it has to do with the implementation of the present task.

I would thus not recommend doing this A/B test for sections such as "External links", "References" but more for sections where technical words are used, where different language may lead to more accessible content.

Fri, Aug 3, 3:30 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer updated the task description for T200810: Make it possible to A/B test different section headings on mobile web.
Fri, Aug 3, 3:25 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer updated the task description for T200810: Make it possible to A/B test different section headings on mobile web.
Fri, Aug 3, 3:24 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer added a comment to T200810: Make it possible to A/B test different section headings on mobile web.

(Moving a few things from the task description here into the comments, as they seem to be discussion contributions rather than something we are all ready to commit to as part of this task:)

Fri, Aug 3, 3:23 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer added a comment to T187590: Mobile readers location data.

@Tbayer in phab review today we noticed this is in blocked. Can you add a note as to why?

Fri, Aug 3, 3:03 PM · Product-Analytics, Reading-analysis
Tbayer closed T184227: Measured impact of SVG optimizations as Resolved.

@Tbayer During Phab Review, the team

What does "the team" refer to? ;)

thought that this ticket could be resolved, and a new one opened for the followup work. Does that seem reasonable to you?

Well, as noted above in T184227#3936244, the original plan was a different one. But considering that half a year later, neither the URL format requested above on January 16 (necessary to extend the result beyond enwiki) nor the request to repeat the analysis following further SVG optimization work has materialized, I think it's reasonable to close this task now as done, with the option to open a new one once either of these two happens. This is especially so since, within the Product Analytics team's Phab Review processes as they are currently set up, the presence of such open tickets seems to cause significant distraction, with several staff (including you and me right now) repeatedly spending time just for task management purposes.

Fri, Aug 3, 2:48 PM · Product-Analytics, Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), UI-Standardization
Tbayer closed T184227: Measured impact of SVG optimizations, a subtask of T178867: Unify and optimize SVG markup across Foundation products, as Resolved.
Fri, Aug 3, 2:48 PM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Front-end-Standards-Group, UI-Standardization
Tbayer added a comment to T201123: What % of pages feature issues?.

Count template transclusions
  1. Identify all templates that can render ambox class (Special:Search can help here)
  2. For each template, check corresponding template count https://tools.wmflabs.org/templatecount/index.php?lang=en&namespace=10&name=Ambox#bottom
Note: this approach would lead to duplicates where more than one template is used in the same page.

Actually I think that this tool may count nested transclusions too - it appears that it simply executes the following query:

Fri, Aug 3, 2:30 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer renamed T201123: What % of pages feature issues? from What % of page views feature page issues? to What % of pages feature issues?.
Fri, Aug 3, 12:55 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T201123: What % of pages feature issues?.

As discussed at T200792#4475856, there seems to be some confusion here between two related but separate questions:

  1. the ratio of *pages* with issues among all pages
  2. the ratio of *pageviews* to pages with issues, among all pageviews

For example, suppose a wiki has two pages, one with issues and one without. The first page gets 8 views, and the second page gets 2 views. Then the answer to question 1 would be "50%" (1 of 2 pages), while the answer to question 2 would be "80%" (8 of 10 pageviews).
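The same two-page example, computed both ways. The function name and the page titles are hypothetical; the point is only that the first question counts pages while the second weights each page by its pageviews.

```python
def issue_shares(pageviews_by_page: dict, pages_with_issues: set) -> tuple:
    """Return (share of pages with issues, share of pageviews to pages with issues)."""
    n_pages = len(pageviews_by_page)
    page_share = sum(1 for page in pageviews_by_page if page in pages_with_issues) / n_pages
    total_views = sum(pageviews_by_page.values())
    view_share = sum(views for page, views in pageviews_by_page.items()
                     if page in pages_with_issues) / total_views
    return page_share, view_share


views = {"Page with issues": 8, "Clean page": 2}
print(issue_shares(views, {"Page with issues"}))  # (0.5, 0.8)
```

The two measures diverge whenever pages with issues are read more (or less) often than the average page, which is why they answer different questions.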

Fri, Aug 3, 12:55 PM · Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T200792: Run A/B test on page issues.

An A/B test by definition will split users into 2 equal buckets: 50% group A and 50% group B. The sampling rate defines the total number of users in the test, so if we are using a sampling rate of 10% of 100 users, that will be 10 users, with 5 in group A and 5 in group B.

IIRC we deliberately avoided using the term "sampling rate" when talking about the Page Previews A/B tests and stuck to "bucket size" to avoid confusion.

Yes (albeit after using "sampling rates" earlier for page previews too), see e.g. https://meta.wikimedia.org/wiki/Schema_talk:Popups . But the "bucket size" parlance used there (e.g. "0.04:0.04:0.92") does not match the usage in this task ("Bucketing will be 50%").
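The two ways of describing the setup are interchangeable: a sampling rate plus a 50:50 test/control split can be written out as explicit bucket sizes in the "0.04:0.04:0.92" style. The sketch below shows that mapping. The bucket names, function names, and the idea of deriving a uniform value from a hash of a session token are all assumptions for illustration, not how the page issues or Page Previews instrumentation actually assigns buckets.

```python
import hashlib


def uniform_from_token(token: str) -> float:
    """Deterministic pseudo-uniform value in [0, 1) derived from a session token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def sizes_from_sampling(sampling_rate: float, test_share: float = 0.5) -> dict:
    """Express 'sampling rate + test/control split' as explicit bucket sizes."""
    return {
        "test": sampling_rate * test_share,
        "control": sampling_rate * (1 - test_share),
        "not sampled": 1 - sampling_rate,
    }


def assign_bucket(u: float, bucket_sizes: dict) -> str:
    """Map u in [0, 1) onto consecutive intervals whose lengths are the bucket sizes."""
    cumulative = 0.0
    for name, size in bucket_sizes.items():
        cumulative += size
        if u < cumulative:
            return name
    return list(bucket_sizes)[-1]  # guards against float rounding near 1.0
```

With this, sizes_from_sampling(0.08) corresponds to the 0.04:0.04:0.92 notation, and sizes_from_sampling(0.10) to the "10% of 100 users, 5 in A and 5 in B" example quoted above. Stating both numbers explicitly avoids the ambiguity of a bare "Bucketing will be 50%".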

Fri, Aug 3, 12:40 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

What is the approximate percentage of (mobile) pageviews to pages with issues on (select languages)?
If we want better accuracy, we should instead send the pageLoaded event for all pageviews, at the expense of a higher sample rate.

If we want to know this with high accuracy, this still doesn't seem perfect. Infrequently visited pages would be missed in the A/B test.

How is that a problem?

It might be possible that pages with issues are less read, so I would not rely on anything page view based to count this.

The question is actually about pageviews, so it seems kind of odd to "not rely on anything page view based to count this".

I'd thus advise against using EventLogging for this, as there are better ways to do this. I have set up some ideas here (T201123), but I strongly advise we avoid doing this.

I assume that this advice is based on a misunderstanding, see above, and that T201123 is instead about the related but quite different question about the ratio of pages with issues, instead of the ratio of pageviews to pages with issues that is the subject of the research question in this task.

Fri, Aug 3, 9:09 AM · Wikimedia-Site-requests, Readers-Web-Backlog

Thu, Aug 2

Tbayer added a comment to T200792: Run A/B test on page issues.

turn on A/B test for page issues for all projects

Many projects don't use page issues, so ideally we would not run the A/B test there. Should this be limited to Wikipedia projects?

I would be okay with only looking at Wikipedias, but @Tbayer - any thoughts on this? In standup we discussed that running everywhere would also help us identify the list of projects where these improvements are not helpful (projects including Wikipedias that do not use page issues, or like frwiki, use different versions of page issues that are not covered by these changes).

I guess that's mainly a product question (e.g. do we intend to put work into improving maintenance templates on sister projects, considering that their readership is orders of magnitude below that of Wikipedia? If we expect to have the resources, great, but if not, including them here is more of a nice-to-have).

Thu, Aug 2, 3:58 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

Discussed during grooming and identified the following questions:

  • should we drop the size of the test bucket (proposed change to 10%)

I don't quite recall that outcome. Does that refer to the size of the sample (how many sessions will be included in the experiment to send data, either as part of the test group - shown the new design - or the control group - old design, which is also what @phuedx refers to above)? Or to the bucketing into test vs. control within that sample, which per usual practice for A/B tests should be 50:50? (I think we have been through this kind of terminology confusion before...)

Thu, Aug 2, 3:49 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200792: Run A/B test on page issues.

Just want to flag that the following is a new research question that we added per the discussion in the meeting on Tuesday:

What is the approximate percentage of (mobile) pageviews to pages with issues on (select languages)?

(the rest of the questions come from T191532, dating back to April)

Thu, Aug 2, 3:40 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer added a comment to T200754: Redirects for new Wikimedia Foundation website.

For a majority of the pages - we want the end user to be sent to the main site - and the 404 page is fine as it conveys it’s changed. The reality is that a vast majority of the pages on Governance Wiki will not be staying there. There is no reason to maintain links to inactive pages replaced by the new site. Active policies are getting redirects.

Thu, Aug 2, 10:05 AM · wikimediafoundation.org, WMF-Communications
Tbayer added a comment to T108980: Write help page for using citoid in VisualEditor.

Is this notice still needed?

Thu, Aug 2, 9:43 AM · Need-volunteer, Documentation, VisualEditor

Wed, Aug 1

Tbayer added a comment to T198978: Investigate logging inbound "referrer" for app opens in EventLogging.

Some context: On the web and (since recently, with T192779 and this patch) in the Android app, we do actually send the full url of the referrer as part of the request for the actual article being opened. For the web, that's of course simply part of the HTTP standard. It is then stored temporarily as part of the webrequest table (in the referer field), and processed further to e.g. generate the referer_class field in the pageview_hourly table, or the external referrals data exposed in this dashboard or the public Clickstream datasets.

Wed, Aug 1, 5:29 PM · Spike, Wikipedia-iOS-App-Backlog
Tbayer updated the task description for T200111: Investigate the spikes in average user return time in Indonesia and Bangladesh on Wikipedia.
Wed, Aug 1, 2:05 PM · Product-Analytics, Reading-analysis

Tue, Jul 31

Tbayer added a comment to T200810: Make it possible to A/B test different section headings on mobile web.

If needed I am happy to personally change the heading on EN WP from one to the other option as part of this test. We could maybe start by looking at 3 articles?

Cool - I think it will be rather easy to scale this after the first article, but the instrumentation and infrastructure would need to be in place already.

Tue, Jul 31, 5:41 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer updated the task description for T200792: Run A/B test on page issues.
Tue, Jul 31, 4:35 PM · Wikimedia-Site-requests, Readers-Web-Backlog
Tbayer created T200810: Make it possible to A/B test different section headings on mobile web.
Tue, Jul 31, 3:57 PM · Reading Epics (Analytics), Epic, Readers-Web-Backlog
Tbayer awarded T135908: Add a possibility to delete a draft a Cookie token.
Tue, Jul 31, 6:32 AM · Quarry
Tbayer added a comment to T191132: Investigate search behavior changes resulting from changes to the mobile Hindi Wikipedia main page.

Thanks @chelsyx . Apologies for the very slow response time on this, I've been OOO on sabbatical and just back now.

There are a couple of things I'd like to understand more here.

[...]

App searches seem incorrectly high as compared to desktop or mobile web. In your first screenshot, they look like 2-4x more searches from the app, but the web's traffic overall looks like about 100x that of the app, so I'm finding this hard to believe. Can you clarify what's going on here?

Agree that it would be good to know the answer to this question. (Also, are the data source and queries used for this analysis documented somewhere? Does it have to do with the dataset mentioned at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Cirrus ?)

Tue, Jul 31, 6:19 AM · Product-Analytics, Discovery-Analysis, New-Readers
Tbayer added a comment to T191405: Mark interwiki links as external links when they fall outside the WMF family.

This request is a bit too Wikimedia-centric. The idea of the interwiki map is to hold together all the MediaWiki wikis in a collaborative division of labour (http://meatballwiki.org/wiki/InterWiki ).

Tue, Jul 31, 5:32 AM · MediaWiki-Parser, MediaWiki-Interface, MediaWiki-extensions-Interwiki
Tbayer awarded T199517: Investigate June Unique devices increase of 170% for wikidata an Evil Spooky Haunted Tree token.
Tue, Jul 31, 2:36 AM · WMDE-Analytics-Engineering, Analytics, User-Addshore, Wikidata
Tbayer added a comment to T116515: Enable embedding of media from Wikimedia Commons.

maybe we need to adjust content security policy?

Tue, Jul 31, 2:21 AM · RelEng-Archive-FY201718-Q1, Phabricator (2017-06-01)
Tbayer created T200757: Embedded Commons videos are broken.
Tue, Jul 31, 2:20 AM · Phabricator
Tbayer added a comment to T186246: Enable image hotlinking.

Actually T116515 had been about images in the first place (from Commons) - it somehow morphed into a task about videos. I have (re-)filed T199407: Enable embedding of images from Wikimedia Commons as a subtask of this one.

Tue, Jul 31, 2:11 AM · Phabricator

Wed, Jul 25

Tbayer added a comment to T199726: Understand if ru.wikipedia main page changes had impact on mobile users.

What, if any, impact did this have on browsing behavior? Similar question to T191354

That task was about the impact of the video campaign on search, not the impact of the Hindi main page changes. For the latter, there is T191132.

Wed, Jul 25, 6:25 AM · Product-Analytics, New-Readers

Sat, Jul 21

Tbayer reopened T200052: Mandate that account passwords must be a minimum of eight characters on Wikimedia projects as "Open".

Duplicate of T32574: there is a patch attached to T32574 which implements a password length requirement. It does not yet display a password meter.

I'm not sure how this is a duplicate of T32574: Display a password strength bar...

Sat, Jul 21, 7:06 PM · Security, Wikimedia-General-or-Unknown

Fri, Jul 20

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

...

@Tbayer a question has been raised about the values for editCount being a string like "1-4 edits" instead of numbers like 0, 5, 100 etc. Would it make a difference if these values were numbers, in terms of ease of parsing or analyzing?
https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/445184/6/resources/skins.minerva.scripts/cleanuptemplates.js#36
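One practical difference between string buckets and numbers: string labels sort lexicographically, so an analysis has to parse them back into bounds before ordering or comparing them. A minimal sketch, assuming labels of the forms "N edits", "N-M edits" and "N+ edits"; only "1-4 edits" comes from the question above, and the other labels in the example are hypothetical:

```python
import re
from typing import Optional, Tuple

# Hypothetical label formats; only "1-4 edits" is confirmed by the question above.
_RANGE = re.compile(r"^(\d+)-(\d+) edits$")
_OPEN = re.compile(r"^(\d+)\+ edits$")
_SINGLE = re.compile(r"^(\d+) edits?$")


def parse_edit_bucket(label: str) -> Tuple[int, Optional[int]]:
    """Turn a bucket label into (lower, upper) bounds; upper is None if open-ended."""
    if m := _RANGE.match(label):
        return int(m.group(1)), int(m.group(2))
    if m := _OPEN.match(label):
        return int(m.group(1)), None
    if m := _SINGLE.match(label):
        n = int(m.group(1))
        return n, n
    raise ValueError(f"unrecognized editCount bucket: {label!r}")


labels = ["5-99 edits", "0 edits", "1-4 edits", "1000+ edits"]  # hypothetical labels
print(sorted(labels))  # plain string order puts "1000+ edits" before "5-99 edits"
print(sorted(labels, key=lambda s: parse_edit_bucket(s)[0]))  # numeric order by lower bound
```

With plain numbers instead of labels, the parsing step would not be needed, but the bucketing itself (which coarsens the data) would have to happen elsewhere.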

Fri, Jul 20, 9:48 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer added a comment to T185584: Understanding main page traffic around Hindi video campaign.

I now understand that this task has been worked on and a report may already exist. But since there had been no activity here since April, I ran a couple of queries myself earlier this month to be safe, seeing that the data was going to expire. I'm leaving some raw results below, for the record and just in case. Note that @chelsyx had already posted results about pageviews as part of the separate "understanding search" task: T191354#4176867

Fri, Jul 20, 2:31 AM · New-Readers, Product-Analytics, Reading-analysis

Wed, Jul 18

Tbayer added a comment to T195349: Remove bugs from Analysis who is affected by edit conflicts..

@Lea_WMDE

By also including users who were not self-created (removing event_user_is_created_by_self = true from the HiveQL query) and revisions that were subsequently deleted (removing revision_is_deleted = false from the HiveQL query), I was able to lower the critical percentage of those who have encountered an edit conflict without having any revisions to only 2.08%:

EditsGroup TotalConflicts Average Median Percentage
G0                   2232    1.33      1       2.08
G1                   8024    1.22      1       7.48
G2                   9372    1.44      1       8.74
G3                   2900    1.81      1       2.7 
G4                  84710    5.00      2      79.0
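The Percentage column appears to be each group's share of all conflicts counted in the table (the five values sum to 100). A quick check of that reading, using only the TotalConflicts figures from the table:

```python
# TotalConflicts per EditsGroup, copied from the table above.
total_conflicts = {"G0": 2232, "G1": 8024, "G2": 9372, "G3": 2900, "G4": 84710}

grand_total = sum(total_conflicts.values())
# Share of all counted conflicts per group, in percent, rounded to two decimals.
shares = {group: round(100 * n / grand_total, 2) for group, n in total_conflicts.items()}
print(grand_total, shares)
```

The resulting shares match the table (G4 comes out as 78.99, which the table shows rounded to one decimal as 79.0).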

Once again, the EditsGroup values are the following:
G0 means 0 edits (or the user was not found in the wmf.mediawiki_history table), G1 is 1-10 edits, G2 is 11-100 edits, G3 is 101-200 edits, and G4 is more than 200 edits.
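The grouping rule above can be sketched as a small function. The analysis itself uses a HiveQL query that isn't shown here, so this Python version is only an illustration of the same bucket boundaries, with None standing in (as an assumption) for a user not found in wmf.mediawiki_history:

```python
from typing import Optional


def edits_group(edit_count: Optional[int]) -> str:
    """Map a user's edit count to the EditsGroup labels G0-G4 described above.

    None represents a user not found in wmf.mediawiki_history; like users
    with 0 edits, such users go into G0.
    """
    if edit_count is None:
        return "G0"
    if edit_count < 0:
        raise ValueError("edit count cannot be negative")
    if edit_count == 0:
        return "G0"
    if edit_count <= 10:
        return "G1"
    if edit_count <= 100:
        return "G2"
    if edit_count <= 200:
        return "G3"
    return "G4"


print([edits_group(n) for n in (None, 0, 10, 11, 200, 201)])
```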

Is this data from the EditConflict schema? (Or if it is from TwoColConflictConflict, does that schema capture all edit conflicts, or only those where the new two-column feature was used?)

Wed, Jul 18, 2:01 PM · TCB-Team, Two-Column-Edit-Conflict-Merge, User-GoranSMilovanovic
Tbayer closed T184793: [EPIC] Instrument page interactions as Resolved.
Wed, Jul 18, 9:32 AM · Product-Analytics, Epic, Reading-analysis, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Readers-Web-Kanbanana-Board, Page-Previews, Readers-Web-Backlog
Tbayer closed T184793: [EPIC] Instrument page interactions, a subtask of T154635: [EPIC] Deploy page previews to English and German Wikipedia, as Resolved.
Wed, Jul 18, 9:32 AM · Readers-Web-Kanbanana-Board, Epic, Wikimedia-Site-requests, Documentation, Page-Previews, Readers-Web-Backlog
Tbayer closed T184793: [EPIC] Instrument page interactions, a subtask of T173952: Remove A/B testing instrumentation code, as Resolved.
Wed, Jul 18, 9:32 AM · MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review, Readers-Web-Kanbanana-Board, Technical-Debt, Readers-Web-Backlog, Page-Previews
Tbayer closed T184793: [EPIC] Instrument page interactions, a subtask of T184801: Remove A/B testing code and EventLogging instrumentation, as Resolved.
Wed, Jul 18, 9:32 AM · Page-Previews, Readers-Web-Backlog
Pirroh awarded T156980: Ability to view other users' notebooks a Yellow Medal token.
Wed, Jul 18, 12:25 AM · PAWS

Jul 13 2018

Tbayer changed the edit policy for T199157: [Spike ??hrs] Sticky header instrumentation.
Jul 13 2018, 4:39 PM · Analytics, Readers-Web-Backlog, MinervaNeue, Design

Jul 12 2018

Tbayer closed T148287: Outreachy Project Proposal - 5 subprojects with the Reading Department as Resolved.

This was a successful internship in which @Zareenf did very valuable work in various areas. We should have updated this ticket long ago with details about this work, listing the various tasks that got done, but we haven't gotten around to that, so I'm closing this ticket for now to reflect the conclusion of the internship.

Jul 12 2018, 11:16 PM · Product-Analytics, Outreachy (Round-13), Reading-analysis
Tbayer closed T148287: Outreachy Project Proposal - 5 subprojects with the Reading Department, a subtask of T148260: Investigate frequency of section titles in Wikipedia articles, as Resolved.
Jul 12 2018, 11:16 PM · Product-Analytics, Reading-analysis
Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

Ping @Tbayer let's mark this as resolved?

We got a lot of good information here (thanks again, also for sharing the SWAP notebook!).
But the three questions spelled out in the task description are not yet marked as resolved. If I have overlooked the answers, please feel free to point them out and tick the corresponding checkboxes (preferably also linking the answers in the task description so folks can find them easily).

Jul 12 2018, 7:55 PM · Reading-analysis, Product-Analytics, Analytics
phuedx awarded T186728: Record and aggregate page previews a Love token.
Jul 12 2018, 12:42 PM · MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T116515: Enable embedding of media from Wikimedia Commons.

BTW, the example video above in T116515#3309596 doesn't work for me right now, in either Chromium ("Requests to the server have been blocked by an extension." - even when I have no browser extension enabled) or Firefox.

Jul 12 2018, 6:32 AM · RelEng-Archive-FY201718-Q1, Phabricator (2017-06-01)
Tbayer added a comment to T116515: Enable embedding of media from Wikimedia Commons.

This task was originally mainly about images, but then morphed into a task about videos and was closed as such. I have split the image part off into T199407.

Jul 12 2018, 6:30 AM · RelEng-Archive-FY201718-Q1, Phabricator (2017-06-01)
Tbayer added a parent task for T199407: Enable embedding of images from Wikimedia Commons: T186246: Enable image hotlinking.
Jul 12 2018, 6:29 AM · Phabricator
Tbayer added a subtask for T186246: Enable image hotlinking: T199407: Enable embedding of images from Wikimedia Commons.
Jul 12 2018, 6:29 AM · Phabricator
Tbayer created T199407: Enable embedding of images from Wikimedia Commons.
Jul 12 2018, 6:27 AM · Phabricator
Tbayer updated the task description for T193524: Publish data on seen page previews.
Jul 12 2018, 5:55 AM · Analytics
Tbayer added a comment to T186728: Record and aggregate page previews.

BTW, I started a documentation page about this dataset at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Virtualpageview_hourly .

Jul 12 2018, 3:00 AM · MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, Analytics-Kanban
Tbayer added a comment to T193505: Remove MobileWebMainMenuClickTracking schema from mobile.

A quick (totally non-exhaustive) check of the current data:

Jul 12 2018, 2:20 AM · Readers-Web-Backlog, MinervaNeue (Desktop)
Tbayer added a comment to T193505: Remove MobileWebMainMenuClickTracking schema from mobile.

That was a different schema (MobileWebSettings). This schema is MobileWebMainMenuClickTracking, which has been neglected, probably doesn't work and appears to be unused.

A schema named MobileWebSettings doesn't seem to exist, did you mean something else?

Unless you or @Tbayer have used this schema in the last 6 months, I propose we remove this code from mobile.

Thanks for the ping! I don't recall having used it so far. That said, an EL schema doesn't have to be used regularly to be valuable. It's often very useful to be able to resort to an existing schema for a new data question instead of having to build a new instrumentation. How likely is it that the instrumentation is broken at this point?

Jul 12 2018, 1:11 AM · Readers-Web-Backlog, MinervaNeue (Desktop)
Tbayer raised the priority of T193505: Remove MobileWebMainMenuClickTracking schema from mobile from Low to Normal.
Jul 12 2018, 12:56 AM · Readers-Web-Backlog, MinervaNeue (Desktop)
Tbayer reopened T193505: Remove MobileWebMainMenuClickTracking schema from mobile as "Open".

CCing @phuedx as the other maintainer of this schema (see schema page)

Jul 12 2018, 12:55 AM · Readers-Web-Backlog, MinervaNeue (Desktop)
Tbayer closed T193505: Remove MobileWebMainMenuClickTracking schema from mobile as Declined.
Jul 12 2018, 12:54 AM · Readers-Web-Backlog, MinervaNeue (Desktop)

Jul 11 2018

Tbayer added a comment to T191532: Mobile page issues - instrument page issues.

PS regarding the task description changes in T191532#4395096 about the logging of issue type and severity level:
These were based on the standup conversations earlier that day, where there was a sense that issue type would be hard to get, but that the severity level is already available in more accessible form.

Jul 11 2018, 10:53 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Audiences-QA, Readers-Web-Kanbanana-Board, Readers-Web-Backlog, Page-Issue-Warnings
Tbayer awarded T198974: Rate-limit is too harsh and affects human users an Evil Spooky Haunted Tree token.
Jul 11 2018, 10:08 PM · Patch-For-Review, Phabricator
Tbayer updated the task description for T199354: Replace namespace re-definitions and hardcoded numbers with .
Jul 11 2018, 6:27 PM · Page-Previews, MinervaNeue, MobileFrontend, Technical-Debt, Readers-Web-Backlog
Tbayer awarded T198908: Alarms on throughput on camus imported data a Like token.
Jul 11 2018, 9:40 AM · Analytics-Kanban, Patch-For-Review, Wikimedia-Incident, cloud-services-team (Kanban), Analytics

Jul 10 2018

Tbayer added a comment to T148461: Bot Identification: Inconsistent data in #all-sites-by-os-and-browser for IE7.

Following up on this: our prior version of ua-parser was misclassifying this traffic as IE7. The traffic looks automated in nature, but the true classification of the user agent has shifted from IE7 to (mostly) IE11.

Jul 10 2018, 3:14 PM · Analytics
Tbayer added a comment to T198612: Exclude WMDE/WMF IP from rate limiting / throttling.

This happened again about an hour ago:

Jul 10 2018, 12:37 AM · Patch-For-Review, Phabricator

Jul 9 2018

Tbayer added a comment to T196113: Update Audiences page and Key Product Metrics with May 2018 Readers data.

Over to you, @mpopov and @chelsyx ;)

Jul 9 2018, 8:14 PM · Product-Analytics
Tbayer updated the task description for T196113: Update Audiences page and Key Product Metrics with May 2018 Readers data.
Jul 9 2018, 8:12 PM · Product-Analytics
Tbayer added a comment to T172009: Add referer to WebrequestData.

To clarify, I assume that this is separate from the general HTTP referrer header that is already recorded in the referer field in the webrequest data.

Jul 9 2018, 7:04 PM · Product-Analytics, Analytics, Discovery-Analysis, Discovery

Jul 6 2018

Tbayer added a comment to T193578: Assess impact of ua-parser update on core metrics.

@Tbayer ok, so looking at IE traffic in these countries for the months before the update, we can see that IE11 traffic was about 1-1.5% of all those pageviews.


I think the right approach is just to assume that 98.5% of the IE traffic in those countries is bogus. Reparsing ua strings with the old regexes in webrequest would be a super long query that would carry pretty much the same error as this approach.

Basically, we're going from assuming that all IE7 pageviews are false, to assuming that 98.5% of IE11 pageviews are false.
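The estimation strategies discussed in this thread differ only in what share of the IE pageviews from the affected countries is discarded: 98.5% under the proposal above, or 100% if all IE traffic from those countries is excluded. A minimal sketch, assuming a single aggregate IE pageview count per country; the figure used below is made up for illustration:

```python
def estimated_genuine_ie(ie_pageviews: float, bogus_share: float = 0.985) -> float:
    """IE pageviews that remain after discounting the share assumed to be bogus."""
    if not 0.0 <= bogus_share <= 1.0:
        raise ValueError("bogus_share must be between 0 and 1")
    return ie_pageviews * (1.0 - bogus_share)


hypothetical_ie_views = 1_000_000  # made-up figure, not from the actual data
print(estimated_genuine_ie(hypothetical_ie_views))       # keeps roughly 15,000
print(estimated_genuine_ie(hypothetical_ie_views, 1.0))  # excluding all IE traffic
```

Discarding everything is simpler and does not depend on the 1-1.5% pre-update estimate, at the cost of also dropping the small amount of genuine IE traffic.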

Thanks. I'm now trying out excluding all IE traffic from these countries (Iran, Pakistan, Afghanistan).
Excluding just IE11 would not seem sufficient, considering that (as you already indicated above in T193578#4242326 ) the traffic formerly classified as IE7 now falls into several different versions (e.g. besides IE11 also a substantial number for IE8, etc.):

Jul 6 2018, 9:54 PM · Reading-analysis, Product-Analytics, Analytics
Tbayer added a comment to T186044: Reorganize metrics dashboard for Search Platform.

Would this affect https://discovery-dev.wmflabs.org/external/ too?

Jul 6 2018, 3:05 AM · Product-Analytics, Discovery-Search (Current work)
Tbayer closed T196100: Traffic and content data for SDG impact report as Resolved.
Jul 6 2018, 3:04 AM · Product-Analytics
Tbayer added a comment to T196100: Traffic and content data for SDG impact report.

After some email discussion about possible metrics options, we settled on the number of mobile pageviews (i.e. views of Wikimedia sites using either our mobile web interface or our mobile apps) by country for 2016 and 2017, which I sent over last month as CSV files - attached here too for the record. These use ISO two-letter country codes, based on the Maxmind geoIP database. "--" means unknown.
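The metric described here combines two access methods per country. A minimal sketch of that aggregation, assuming input rows of (country code, access method, pageviews) with access methods labeled "desktop", "mobile web" and "mobile app"; the rows below are made up and are not the figures that were sent:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MOBILE_ACCESS_METHODS = {"mobile web", "mobile app"}


def mobile_pageviews_by_country(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum mobile web and mobile app pageviews per country code.

    "--" (unknown country) is kept as its own bucket rather than dropped.
    """
    totals: Dict[str, int] = defaultdict(int)
    for country_code, access_method, pageviews in rows:
        if access_method in MOBILE_ACCESS_METHODS:
            totals[country_code] += pageviews
    return dict(totals)


# Made-up rows for illustration only; not real traffic figures.
example_rows = [
    ("IN", "mobile web", 120),
    ("IN", "mobile app", 30),
    ("IN", "desktop", 500),
    ("--", "mobile web", 7),
]
print(mobile_pageviews_by_country(example_rows))
```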

Jul 6 2018, 2:54 AM · Product-Analytics

Jul 5 2018

Tbayer updated subscribers of T198612: Exclude WMDE/WMF IP from rate limiting / throttling.

WMF IPs could become affected too, once the San Francisco office opens this Monday morning.

This is happening now for myself and others (including @Neil_P._Quinn_WMF and @JKatzWMF):

Jul 5 2018, 8:21 PM · Patch-For-Review, Phabricator