
ERI Metrics: Measure click-through actions from RC page and create "Productivity" baseline
Closed, Resolved (Public)

Description

The question here is, can we measure whether the new RC pages are more productive, in the sense that users are finding more edits that require some action? To do this, we'll need to measure whether users take an action on pages they click to from RC page.

We need to establish a baseline for RC Page tool usage before we release the beta. (Establishing the baseline after beta release would be bad, since many of our most active users will be in the beta.) This exercise will also flush out any issues with the tracking mechanisms we've put in place.

Proposed Productivity Metrics

  • Action Ratio: What is the ratio of total clicks on edit results to clicks that lead to a page where the user takes one of a specified set of actions (Revert, Undo, click to Edit the page...)? Our hypothesis is that the closer the ratio is to 1:1, the better the tool is doing at helping users find the pages they're looking for.
  • Quality Filter Action Ratio: Can we sort our results so that we know if people who used particular filters have higher Action Ratios? A high-value candidate here would be the ORES Quality filters. It would be very interesting to know whether users of these filters get more hits than other users.
  • Newcomer Action Ratio: Along the same lines, it would be relevant to ERI success if we could see whether users of the Newcomer filter perform certain actions more or less often than others. I.e., do people tracking Newcomers Revert and Undo more or less? Do they Thank and hit Talk more or less?
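
As a rough illustration of the ratios proposed above, here is a minimal Python sketch. It assumes each click on an RC result has already been reduced to a record with a hypothetical `next_action` field (the first action taken on the target page, or `None`) and a hypothetical `used_quality_filter` flag. The set of qualifying actions is a placeholder, since the final list is still an open question in this task, and the sample records are made up for the example.

```python
from collections import defaultdict

# Placeholder set of actions that count as "taking action".
# The real list is undecided (see Questions/Issues).
ACTION_LINK_TYPES = {"edit", "undo", "thank", "rollback", "talk"}


def action_ratio(clicks):
    """Return (total result clicks, clicks followed by a qualifying action)."""
    total = len(clicks)
    acted = sum(1 for c in clicks if c["next_action"] in ACTION_LINK_TYPES)
    return total, acted


def action_ratio_by_group(clicks, key):
    """Compute the same pair separately for each value of `key`."""
    groups = defaultdict(list)
    for c in clicks:
        groups[c[key]].append(c)
    return {group: action_ratio(members) for group, members in groups.items()}


# Made-up sample data, for illustration only.
clicks = [
    {"next_action": "undo", "used_quality_filter": True},
    {"next_action": None, "used_quality_filter": True},
    {"next_action": "rollback", "used_quality_filter": False},
    {"next_action": None, "used_quality_filter": False},
    {"next_action": "patrol", "used_quality_filter": False},
]

overall = action_ratio(clicks)
by_filter = action_ratio_by_group(clicks, "used_quality_filter")
```

Reporting the pair (total, acted) rather than a single quotient keeps small samples visible, which matters when comparing a heavily used filter against a rarely used one.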

Questions/Issues

  • For the Action Ratio, what is the set of "actions" that we'd want to count?
    1. Include: Edit, Undo, Thank, Rollback, Rollback Vandal, Talk. Don't include: clicking a link to go to any other page. What else?
    2. Can we track actions taken in Twinkle? Just clicking to launch Twinkle is not a true indication of taking action, since the top-level Twinkle menu includes non-actions like "Last," which just shows the previous Diff. Can we do something like: if action=Twinkle, record the next action? And then have a list of those that count?
    3. I don't think Mark as Patrolled should count. If all you do is mark a page as patrolled, that basically means you didn't find what you were looking for, doesn't it?
  • My understanding from @Catrope is that path analysis on our system is only really reliable for the first action after the user's click. So the metrics proposed above work within that limitation. If more sophisticated analyses are feasible, we can think more ambitiously. So, two questions:
    1. Are more complex analyses feasible? E.g., could we follow the reviewer for X number of clicks, to find out if any of those actions included the specified set (Revert, Undo, etc.) on the target page? E.g., could we know if the reviewer eventually Reverted, after checking some facts and diffs?
    2. If we can't do the more sophisticated analyses, do we believe the proposed metrics provide relative but useful measures of success? (I.e., even if we don't have a full picture of what users actually do, will we know if things got better?)
  • I'm thinking a full week or a month might be the relevant period for this type of analysis, so that normal weekly rhythms average out.
  • Do we need to produce the baseline figures now for all wikis we will ever want to measure? Or will the data be available indefinitely?
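
To make the "follow the reviewer for X clicks" question concrete, here is a minimal Python sketch. It assumes we had an ordered list of events per session, each with hypothetical `page` and `type` fields; whether such a sequence can be reconstructed reliably is exactly the open question above. The action set is again a placeholder and the sample session is made up.

```python
# Placeholder set of qualifying actions (undecided in this task).
ACTION_LINK_TYPES = {"edit", "undo", "thank", "rollback", "talk"}


def acted_within(events, start, window):
    """True if any of the `window` events after index `start` is a
    qualifying action on the same page as events[start]."""
    target = events[start]["page"]
    following = events[start + 1 : start + 1 + window]
    return any(
        e["page"] == target and e["type"] in ACTION_LINK_TYPES
        for e in following
    )


# Made-up session: the reviewer opens a diff, checks history, then undoes.
session = [
    {"page": "Sandbox", "type": "diff"},
    {"page": "Sandbox", "type": "history"},
    {"page": "Sandbox", "type": "undo"},
]
```

With `window=1` this reduces to the "first action after the click" metric. Larger windows would capture a revert that follows some fact-checking.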

Steps

  • We don't need to build a graphing tool out of the gate. Our goal, I think, is to produce a spreadsheet from which we can extract meaningful conclusions. If we want to automate analysis, we can do that later.

My sense of the best way to proceed is this:

  1. Investigate the issues, determine what is possible and how involved the project will be, then report back.
  2. If required, put in place whatever tools are necessary to get the data we want.
  3. Make a trial run at producing analysis for two of the ORES wikis, one large and one small. Say en.wiki and pl.wiki?
  4. Refine methodology/technology as needed.
  5. Rerun the analysis of the two wikis above.
  6. Determine a test set and acquire baseline figures, since figures will not be available indefinitely.

Event Timeline

jmatazzoni renamed this task from ERI Metrics: Measure click-through actions from RC page and create baseline to ERI Metrics: Measure click-through actions from RC page and create 'Productivity" baseline.Feb 18 2017, 12:20 AM
jmatazzoni updated the task description.

@Catrope, are you proposing to use EventLogging click-tracking, or something else? Would the click-tracking only be on the RC page, or also on other pages?

I am thinking about some possible complexities, such as Navigation popups.

> I don't think Mark as Patrolled should count. If all you do is mark a page as patrolled, that basically means you didn't find what you were looking for, doesn't it?

Mark as Patrolled is a constructive action that other users depend on. However, it's indirect, so I see the argument for excluding it.

> Do we need to produce the baseline figures now for all wikis we will ever want to measure? Or will the data be available indefinitely?

If we're using EventLogging, we need to measure the baseline first (even if we don't initially look at the data). If we don't measure it, we won't be able to retrieve it later from existing sources. Measuring it on all wikis is the same work as measuring it on a couple.

How long the data is available depends on how it's collected, and what information we collect. It might be cleared after 90 days, but it depends. See https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/Data_retention_and_auto-purging#Purging_strategies .

> @Catrope, are you proposing to use EventLogging click-tracking, or something else? Would the click-tracking only be on the RC page, or also on other pages?
>
> I am thinking about some possible complexities, such as Navigation popups.

I was thinking we'd add something like ?fromrc=1 to the hrefs of links in the RC results area (links to pages, diffs, history, etc.), and then have EventLogging track clicks on specific links (e.g. edit, undo, rollback, patrol) if that query string parameter is set. It would not be on the RC page itself, only other pages, and we could probably limit it to diffs and history (although new page patrol works by clicking a patrol link on the page view itself, so maybe we want that too?).
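
For readers unfamiliar with the approach, here is a small Python sketch of the URL side of this idea. The real instrumentation would be JavaScript in the WikimediaEvents extension. The function names `add_fromrc` and `came_from_rc` are made up for illustration and are not taken from that code.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def add_fromrc(href):
    """Return `href` with fromrc=1 appended to its query string (idempotent)."""
    parts = urlsplit(href)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if ("fromrc", "1") not in query:
        query.append(("fromrc", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def came_from_rc(url):
    """True if the URL carries the fromrc=1 marker, i.e. clicks on this
    page should be logged."""
    return parse_qs(urlsplit(url).query).get("fromrc") == ["1"]


tagged = add_fromrc("/w/index.php?title=Sandbox&diff=prev")
```

Links generated by gadgets such as Navigation popups would never pass through a function like `add_fromrc`, which is why they fall outside this kind of tracking.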

You're right that Navigation popups could break that strategy because the links it generates wouldn't have ?fromrc=1 in them. On the other hand, Twinkle could cause trouble because we might not be able to track clicks on links it presents (or if we could, we probably couldn't interpret what those clicks mean).

Re click-tracking on the RC page itself, we should probably also include clicks on the rollback links that are presented directly on the RC page.

@Mattflaschen-WMF and others, any thoughts about this "query string param then EventLog specific link clicks" strategy?

Re cross-referencing the results against filter usage: yes, we can do that. We'd need to record the session ID to then cross-reference that against the filter data (see T160928), and we'd only be able to cross-reference data that's less than 90 days old.
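
A minimal Python sketch of that cross-reference, assuming both event streams have been loaded as dicts with a `sessionId` and a timezone-aware `timestamp`. The loading step and the exact field layout are assumptions. The 90-day cutoff mirrors the retention limit mentioned above, and the sample events are made up.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def cross_reference(click_events, filter_events, now):
    """Pair each recent click event with the filter events from the same
    session, ignoring anything older than the retention window."""
    cutoff = now - RETENTION
    filters_by_session = {}
    for f in filter_events:
        if f["timestamp"] >= cutoff:
            filters_by_session.setdefault(f["sessionId"], []).append(f)
    return [
        (c, filters_by_session[c["sessionId"]])
        for c in click_events
        if c["timestamp"] >= cutoff and c["sessionId"] in filters_by_session
    ]


# Made-up events, for illustration only.
utc = timezone.utc
now = datetime(2017, 6, 1, tzinfo=utc)
clicks = [
    {"sessionId": "s1", "timestamp": datetime(2017, 5, 20, tzinfo=utc)},
    {"sessionId": "s2", "timestamp": datetime(2017, 5, 21, tzinfo=utc)},
    {"sessionId": "s1", "timestamp": datetime(2017, 1, 1, tzinfo=utc)},
]
filters = [
    {"sessionId": "s1", "timestamp": datetime(2017, 5, 20, tzinfo=utc)},
]
pairs = cross_reference(clicks, filters, now)
```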

Change 344291 had a related patch set uploaded (by Catrope):
[mediawiki/extensions/WikimediaEvents@master] RCFilters: Log click actions on RC page and on pages linked from there

https://gerrit.wikimedia.org/r/344291

Change 344291 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] RCFilters: Log click actions on RC page and on pages linked from there

https://gerrit.wikimedia.org/r/344291

@Catrope In betalabs I do not see any count for ChangesListClickTracking:

@deployment-eventlogging03:/srv/log/eventlogging$ grep 'ChangesListClickTracking.*' all-events.log |wc
      0       0       0
@deployment-eventlogging03:/srv/log/eventlogging$ grep 'enhancedFiltersEnabled.*' all-events.log |wc
     24    1375   18351

enhancedFiltersEnabled gets counted immediately after any filter selection is made.

Re-checked: only the betalabs Wikidata wiki records to EventLogging. The count increases more or less immediately after a user clicks on any link on the RC page.

A sample EventLogging record (userId is omitted):

$ grep 'ChangesListClickTracking.*' all-events.log

{"event": {"enhancedFiltersEnabled": false, "fromPage": "Recentchanges", "fromQuery": "", "linkType": "contribs", "sessionId": "8b25f6fe19b9c3b8", "userId": ***}, "recvFrom": "deployment-cache-text04.deployment-prep.eqiad.wmflabs", "revision": 16484895, "schema": "ChangesListClickTracking", "seqId": 1157926, "timestamp": 1491506511, "userAgent": "{\"os_minor\": \"10\", \"os_major\": \"10\", \"device_family\": \"Other\", \"os_family\": \"Mac OS X\", \"browser_minor\": \"0\", \"wmf_app_version\": \"-\", \"browser_major\": \"57\", \"browser_family\": \"Chrome\"}", "uuid": "301107e95d9855508f66ce7b80d08d1f", "webHost": "wikidata.beta.wmflabs.org", "wiki": "wikidatawiki"}
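
One detail in this record is easy to miss: `userAgent` is itself a JSON-encoded string, so it needs a second decode. A minimal Python sketch follows, using a trimmed copy of the record above. `userId` and several top-level fields are left out for brevity, and the helper name `parse_record` is made up.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample record above.
RAW = (
    r'{"event": {"enhancedFiltersEnabled": false, "fromPage": "Recentchanges", '
    r'"fromQuery": "", "linkType": "contribs", "sessionId": "8b25f6fe19b9c3b8"}, '
    r'"schema": "ChangesListClickTracking", "timestamp": 1491506511, '
    r'"userAgent": "{\"os_family\": \"Mac OS X\", \"browser_family\": \"Chrome\", '
    r'\"browser_major\": \"57\"}", "wiki": "wikidatawiki"}'
)


def parse_record(raw):
    """Decode one log line: the record, its nested userAgent, and the
    timestamp as a UTC datetime (timestamp is in Unix seconds)."""
    record = json.loads(raw)
    user_agent = json.loads(record["userAgent"])  # second decode
    when = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return record, user_agent, when


record, user_agent, when = parse_record(RAW)
```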

Not sure if that info is relevant:
deployment-eventlogging03:/var/log/upstart$ sudo grep 'Unable to validate*' eventlogging_processor-client-side-00.log

2017-04-06 18:02:25,095 [27312] (MainThread) Unable to validate: ?
%7B%22event%22%3A%7B%22pagename%22%3A%22Recentchanges%22%2C%22enhancedFiltersEnabled%22%3Atrue%2C%22userId%22%3A4462%2C%22hideliu%22%3Afalse%2C%22hideanons%22%3Atrue%2C%22userExpLevel%22%3A%22learner%22%2C%22hidemyself%22%3Afalse%2C%22hidebyothers%22%3Afalse%2C%22hidebots%22%3Atrue%2C%22hidehumans%22%3Afalse%2C%22hideminor%22%3Afalse%2C%22hidemajor%22%3Afalse%2C%22hidepageedits%22%3Afalse%2C%22hidenewpages%22%3Afalse%2C%22hidecategorization%22%3Afalse%2C%22hideWikibase%22%3Afalse%2C%22hidelog%22%3Afalse%7D%2C%22schema%22%3A%22ChangesListFilters%22%2C%22revision%22%3A16484266%2C%22clientValidated%22%3Afalse%2C%22wiki%22%3A%22enwiki%22%2C%22webHost%22%3A%22en.wikipedia.beta.wmflabs.org%22%2C%22userAgent%22%3A%22Mozilla%2F5.0%5Cu0020%28Macintosh%3B%5Cu0020Intel%5Cu0020Mac%5Cu0020OS%5Cu0020X%5Cu002010_10_5%29%5Cu0020AppleWebKit%2F537.36%5Cu0020%28KHTML%2C%5Cu0020like%5Cu0020Gecko%29%5Cu0020Chrome%2F57.0.2987.133%5Cu0020Safari%2F537.36%22%7D;	deployment-cache-text04.deployment-prep.eqiad.wmflabs	1157157	2017-04-06T18:02:25	-	"MediaWiki/1.29.0-alpha" (Additional properties are not allowed (u'userExpLevel' was unexpected))
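
The failing payload is percent-encoded, which makes it hard to read in the log. Here is a minimal Python sketch for decoding such a line and listing event properties that a schema does not allow. The line below is a shortened version of the one above. The `ALLOWED` set is a stand-in for illustration, not the real ChangesListFilters schema.

```python
import json
from urllib.parse import unquote

# Shortened version of the log line above: "?" + encoded JSON + ";" + tab + host.
LINE = (
    "?%7B%22event%22%3A%7B%22pagename%22%3A%22Recentchanges%22%2C"
    "%22enhancedFiltersEnabled%22%3Atrue%2C%22userExpLevel%22%3A%22learner%22%7D%2C"
    "%22schema%22%3A%22ChangesListFilters%22%7D;"
    "\tdeployment-cache-text04.deployment-prep.eqiad.wmflabs"
)

# Stand-in for the schema's permitted event properties (assumption).
ALLOWED = {"pagename", "enhancedFiltersEnabled"}


def decode_payload(line):
    """Extract and decode the JSON payload from a raw client-side log line."""
    encoded = line.split("\t")[0].lstrip("?").rstrip(";")
    return json.loads(unquote(encoded))


def unexpected_properties(payload, allowed):
    """Return event properties that are not in the allowed set."""
    return set(payload["event"]) - allowed


payload = decode_payload(LINE)
extra = unexpected_properties(payload, ALLOWED)
```

Here the result singles out `userExpLevel`, matching the "Additional properties are not allowed" message at the end of the log line.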

> I was thinking we'd add something like ?fromrc=1 to the hrefs of links in the RC results area (links to pages, diffs, history, etc.), and then have EventLogging track clicks on specific links (e.g. edit, undo, rollback, patrol) if that query string parameter is set. It would not be on the RC page itself, only other pages, and we could probably limit it to diffs and history (although new page patrol works by clicking a patrol link on the page view itself, so maybe we want that too?).

apparently this was already implemented. unfortunately, this causes serious pain to some editors:
it seems that the ?fromrc=1 is appended to the href after the click, rather than when the page is constructed. as a result, the browser does not recognize the page as "visited", and does not color the link correctly.

BROKEN WORKFLOW :
some (many) patrollers work off "recent changes" page, where they click on the page (or on "prev", or any other "diff" link), examine the edit, and return to recentchanges page.
until this change, the "prev" link was painted as "visited", but now, the "visited" page has the "fromrc=1" parameter, while the link on the rc page does not. this makes the link non-visited, and the browser paints it as such.

please add the "fromrc=1" to the links when constructing the pages, rather than after the click.
(if you can read Hebrew, see the original complaint HERE.)

peace.

> BROKEN WORKFLOW :
> some (many) patrollers work off "recent changes" page, where they click on the page (or on "prev", or any other "diff" link), examine the edit, and return to recentchanges page.
> until this change, the "prev" link was painted as "visited", but now, the "visited" page has the "fromrc=1" parameter, while the link on the rc page does not. this makes the link non-visited, and the browser paints it as such.

Seconded. This is quite disruptive and makes rc patrolling almost impossible.

Change 348504 had a related patch set uploaded (by Catrope):
[mediawiki/extensions/WikimediaEvents@master] RecentChangesClicks: Don't modify URL in click handler

https://gerrit.wikimedia.org/r/348504

I had not thought about link visitedness at all when implementing this. My apologies for the disruption! (And for missing the first comment about it.)

I've uploaded a fix, but because of Wednesday's server switch there's a moratorium on non-emergency deployments this week. I don't know if this counts as an emergency, but I'll ask. If it doesn't, then this fix will reach hewiki on April 26th and other Wikipedias on April 27th.

Change 348504 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] RecentChangesClicks: Don't modify URL in click handler

https://gerrit.wikimedia.org/r/348504

Change 348626 had a related patch set uploaded (by Catrope):
[mediawiki/extensions/WikimediaEvents@wmf/1.29.0-wmf.20] RecentChangesClicks: Don't modify URL in click handler

https://gerrit.wikimedia.org/r/348626

Change 348626 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.29.0-wmf.20] RecentChangesClicks: Don't modify URL in click handler

https://gerrit.wikimedia.org/r/348626

Mentioned in SAL (#wikimedia-operations) [2017-04-17T22:08:30Z] <catrope@tin> Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 16m 23s)

I was granted permission to deploy the fix, and tried to deploy it, but because of issues with the deployment server the deployment didn't fully work, and the fix is now half-deployed. I'm trying to get someone to fix that server so I can deploy it properly.

Mentioned in SAL (#wikimedia-operations) [2017-04-17T22:46:01Z] <catrope@tin> Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 03m 01s)

> I was granted permission to deploy the fix, and tried to deploy it, but because of issues with the deployment server the deployment didn't fully work, and the fix is now half-deployed. I'm trying to get someone to fix that server so I can deploy it properly.

This should be done now.

> This should be done now.

Thank you!

Change 353210 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/WikimediaEvents@master] RecentChangesClicks: Address minor performance concerns

https://gerrit.wikimedia.org/r/353210

> I had not thought about link visitedness at all when implementing this. My apologies for the disruption!

The Special:Search satisfaction instrumentation was changed at some point to use History#replaceState to silently change the URL back to normal on the other side. I believe their main concern was keeping this (ugly) query parameter out of URLs that people share from the address bar on social media. However, it might solve the visitedness problem as well, since that would make the clean URL part of the history.

I've verified this just now on test.wikipedia.org (revision). The page contains a link to Redirect and a link to Sandbox2. Use the browser Dev Tools to make the visited styles more obvious (e.g. a:visited { color:green; }). Clicking the link to Redirect will load a page at "/wiki/Redirect" which contains the content of the Sandbox2 page with a note on top saying "Redirected from Redirect". There is also client-side JavaScript on that page rewriting the URL using the History API to /wiki/Sandbox2. Once loaded in a background tab, both links became green.
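
For illustration, here is a Python sketch of the URL computation behind that idea: given the tagged URL, work out the clean URL that the target page would restore. In the browser the restoring step would be a JavaScript call to history.replaceState, which is not shown here. The helper name `strip_param` is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url, name="fromrc"):
    """Return `url` without the query parameter `name`."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


clean = strip_param("/w/index.php?title=Sandbox&diff=prev&fromrc=1")
```

One caveat with this sketch: `urlencode` re-encodes the remaining parameters, so a title containing characters such as ":" may come back with different percent-encoding than the original link. Since visitedness depends on the URLs matching, a real implementation would want to remove the parameter without touching the rest of the string.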

Change 353210 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] RecentChangesClicks: Address minor performance concerns

https://gerrit.wikimedia.org/r/353210

Change 353211 had a related patch set uploaded (by Catrope; owner: Krinkle):
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.1] RecentChangesClicks: Address minor performance concerns

https://gerrit.wikimedia.org/r/353211

Change 353211 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.1] RecentChangesClicks: Address minor performance concerns

https://gerrit.wikimedia.org/r/353211

Mentioned in SAL (#wikimedia-operations) [2017-05-11T18:23:32Z] <thcipriani@tin> Synchronized php-1.30.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: SWAT: [[gerrit:353211|RecentChangesClicks: Address minor performance concerns]] T158458 (duration: 00m 42s)

Change 359498 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/WikimediaEvents@master] Follow-up 6b83e12aee71: reenable fromrc handling, but without breaking visited links

https://gerrit.wikimedia.org/r/359498

Change 359498 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Follow-up 6b83e12aee71: reenable fromrc handling, but without breaking visited links

https://gerrit.wikimedia.org/r/359498

@Catrope Here is what I checked in betalabs (it would be great if you could provide some feedback):

(1) visited links - they are still visited, not changing back
(2) monitored for a couple of days for recorded events and for live events (i.e. click and see if they are being recorded):
There is a time gap in /srv/log/eventlogging/all-events.log. e.g. on July 14, all-events.log showed:

  • First timestamp recorded: July 09
  • Last timestamp recorded: July 12

(3) No relevant validation errors (only events for the MobileWikiAppProtectedEditAttempt and MobileWikiAppFindInPage schemas)

/var/log/upstart$ sudo grep 'Unable to validate.*' eventlogging_processor-client-side-00.log

(4) Checked for the following:

@deployment-eventlogging03:/srv/log/eventlogging$ grep  'enhancedFiltersEnabled.*' all-events.log |wc 
   1320   68516  930858

@deployment-eventlogging03:/srv/log/eventlogging$ grep  'ChangesListFilters.*' all-events.log | wc
   1320   68516  930858

@deployment-eventlogging03:/srv/log/eventlogging$ grep 'ChangesListClickTracking.*' all-events.log |wc
      0       0       0

@deployment-eventlogging03:/srv/log/eventlogging$ grep 'ChangesListHighlights.*' all-events.log |wc
      5     237    3268

@deployment-eventlogging03:/srv/log/eventlogging$ grep 'RecentChangesTopLinks.*' all-events.log |wc
      0       0       0

Re-checked (2) - the updates are immediately shown; no time gaps.

Re-checked (4) - count for the tracking events on RC page seems to be correct

@deployment-eventlogging03:/srv/log/eventlogging$ grep  'ChangesListFilters.*' all-events.log | wc
   1351   68386  944918
@deployment-eventlogging03:/srv/log/eventlogging$ grep 'ChangesListClickTracking.*' all-events.log |wc
      8     416    6501
@deployment-eventlogging03:/srv/log/eventlogging$ grep 'ChangesListHighlights.*' all-events.log |wc
     34    1648   22198
@deployment-eventlogging03:/srv/log/eventlogging$ grep 'RecentChangesTopLinks.*' all-events.log |wc
      3     141    2116
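
A note on the counts above: grep matches a substring anywhere in the line, and the sample ChangesListClickTracking record earlier in this task also contains an `enhancedFiltersEnabled` field, so substring counts for different patterns can overlap. Here is a minimal Python sketch that counts by the parsed `schema` field instead. It assumes all-events.log holds one JSON object per line, as the sample record suggests. The sample lines are made up.

```python
import json
from collections import Counter


def count_by_schema(lines):
    """Count log lines per value of the top-level "schema" field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        counts[record.get("schema", "<missing>")] += 1
    return counts


# Made-up lines, for illustration only.
sample_lines = [
    '{"schema": "ChangesListFilters", "event": {"enhancedFiltersEnabled": true}}',
    '{"schema": "ChangesListClickTracking", "event": {"enhancedFiltersEnabled": false}}',
    '{"schema": "ChangesListFilters", "event": {"enhancedFiltersEnabled": true}}',
    "not json",
    "",
]
counts = count_by_schema(sample_lines)
```

To run it against the real log, pass an open file object, e.g. `count_by_schema(open(path))`, where `path` points at all-events.log.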

QA Recommendation: Resolve

@Catrope I see this has moved to my column, which means it's done. That's great, but I'm not sure how I get the results, so I can review them. Can you show me?