The question here is: can we measure whether the new RC pages are more productive, in the sense that users are finding more edits that require some action? To do this, we'll need to measure whether users take an action on the pages they click through to from the RC page.
We need to establish a baseline for RC Page tool usage before we release the beta. (Establishing the baseline after beta release would be bad, since many of our most active users will be in the beta.) This exercise will also flush out any issues with the tracking mechanisms we've put in place.
Proposed Productivity Metrics
- Action Ratio: What is the ratio of total clicks on edit results to clicks that lead to a page where the user takes one of a specified set of actions (Revert, Undo, clicking to Edit the page...)? Our hypothesis is that the closer this ratio is to 1:1, the better the tool is doing at helping users find the pages they're looking for. (See the sketch after this list.)
- Quality Filter Action Ratio: Can we segment our results so that we know whether people who used particular filters have higher Action Ratios? A high-value candidate here would be the ORES Quality filters. It would be very interesting to know whether users of these filters get more hits than other users.
- Newcomer Action Ratio: Along the same lines, it would be relevant to ERI success if we could see whether users of the Newcomer filter perform certain actions more or less often than others. I.e., do people tracking Newcomers Revert and Undo more or less? Do they Thank and hit Talk more or less?
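As a rough illustration, here is a minimal Python sketch of how the Action Ratio and the filter-segmented ratios might be computed once we have per-click data. The record fields, filter names, and action names are placeholders for illustration, not the actual schema of whatever tracking mechanism we put in place.

```python
from collections import defaultdict

# Hypothetical click-through records from the RC page: one row per click,
# with the first follow-up action (if any) and the filters active at click time.
# Field and filter names here are placeholders, not the real tracking schema.
clicks = [
    {"session": "a1", "filters": ["ores-damaging"], "first_action": "rollback"},
    {"session": "a2", "filters": ["newcomer"], "first_action": None},
    {"session": "a3", "filters": ["ores-damaging", "newcomer"], "first_action": "thank"},
    {"session": "a4", "filters": [], "first_action": "edit"},
]

# The set of actions we count as "productive" (still an open question below).
COUNTED_ACTIONS = {"edit", "undo", "thank", "rollback", "rollback-vandal", "talk"}

def action_ratio(rows):
    """Fraction of clicks whose first follow-up action is one we count.

    Expressed as actions/clicks, so 1.0 is the 1:1 case in the hypothesis above."""
    total = len(rows)
    acted = sum(1 for r in rows if r["first_action"] in COUNTED_ACTIONS)
    return acted / total if total else 0.0

print("overall:", action_ratio(clicks))

# The same ratio segmented by filter, e.g. to compare ORES Quality filter
# users (or Newcomer filter users) against everyone else.
by_filter = defaultdict(list)
for row in clicks:
    for f in row["filters"] or ["(no filter)"]:
        by_filter[f].append(row)

for name, rows in sorted(by_filter.items()):
    print(f"{name}: {action_ratio(rows):.2f} over {len(rows)} clicks")
```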
Questions/Issues
- For the Action Ratio, what is the set of "actions" that we'd want to count?
- Include: Edit, Undo, Thank, Rollback, Rollback Vandal, Talk. Don't include: clicking a link to go to any other page. What else?
- Can we track actions taken in Twinkle? Just clicking to launch Twinkle is not a true indication of taking action, since the top-level Twinkle menu includes non-actions like "Last," which just shows the previous Diff. Can we do something like "if action = Twinkle, record the next action," and then keep a list of those that count? (See the sketch after this list.)
- I don't think Mark as Patrolled should count. If all you do is mark a page as patrolled, that basically means you didn't find what you were looking for, doesn't it?
- My understanding from @Catrope is that path analysis on our system is only really reliable for the first action after the user's click. So the metrics proposed above work within that limitation. If more sophisticated analyses are feasible, we can think more ambitiously. So, two questions:
- Are more complex analyses feasible? E.g., could we follow the reviewer for X number of clicks, to find out whether any of those clicks included one of the specified actions (Revert, Undo, etc.) on the target page? Could we know, for instance, whether the reviewer eventually Reverted, after checking some facts and diffs?
- If we can't do the more sophisticated analyses, do we believe the proposed metrics provide relative but useful measures of success? (I.e., even if we don't have a full picture of what users actually do, will we know if things got better?)
- I'm thinking a week or a month might be the relevant period for this type of analysis, to avoid being skewed by normal weekly rhythms.
- Do we need to produce the baseline figures now for all wikis we will ever want to measure? Or will the data be available indefinitely?
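To make the "first action after the click" limitation and the Twinkle next-action idea concrete, here is a minimal sketch, assuming we can see an ordered list of actions per click. The action names and the list of Twinkle menu items that would count are assumptions for illustration only.

```python
# Action names and the set of Twinkle items that should count are assumptions,
# not confirmed tracking values.
COUNTED_ACTIONS = {"edit", "undo", "thank", "rollback", "rollback-vandal", "talk"}
COUNTED_TWINKLE_ACTIONS = {"rollback", "rollback-vandal", "warn"}  # "last" etc. excluded

def first_counted_action(actions):
    """actions: what the user did after clicking through from RC, in order.

    Returns the first action we'd count toward the Action Ratio, or None,
    applying the "if action = Twinkle, record the next action" rule."""
    if not actions:
        return None
    first = actions[0]
    if first == "twinkle":
        # Look one step further: count it only if the next Twinkle action
        # is on the approved list; opening the menu alone doesn't count.
        nxt = actions[1] if len(actions) > 1 else None
        return nxt if nxt in COUNTED_TWINKLE_ACTIONS else None
    return first if first in COUNTED_ACTIONS else None

# A reviewer opens Twinkle, then uses its rollback: counts as "rollback".
print(first_counted_action(["twinkle", "rollback", "talk"]))
# A reviewer only marks the page as patrolled: counts as nothing.
print(first_counted_action(["mark-patrolled"]))
```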
Steps
- We don't need to build a graphing tool right out of the gate. Our goal, I think, is to produce a spreadsheet from which we can extract meaningful conclusions (a minimal sketch of that output follows the steps below). If we want to automate the analysis, we can do that later.
My sense of the best way to proceed is this:
- Investigate the issues, determine what is possible and how involved the project will be, then report back.
- If required, put in place whatever tools are necessary to get the data we want.
- Make a trial run at producing analysis for two of the ORES wikis, one large and one small. Say en.wiki and pl.wiki?
- Refine methodology/technology as needed.
- Rerun the analysis of the two wikis above.
- Determine a test set and acquire baseline figures, since figures will not be available indefinitely.
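For the trial-run spreadsheet, something as simple as one CSV row per wiki per metric would probably do to start. A minimal sketch, with placeholder column names and zeroed numbers rather than real data:

```python
import csv

# Placeholder rows for the trial-run spreadsheet: one row per wiki per metric.
# Wiki names match the proposal above; counts and ratios would come from the
# tracking data once the tools are in place.
rows = [
    {"wiki": "en.wiki", "metric": "action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "en.wiki", "metric": "quality_filter_action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "pl.wiki", "metric": "action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "pl.wiki", "metric": "quality_filter_action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
]

with open("rc_baseline.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wiki", "metric", "clicks", "actions", "ratio"])
    writer.writeheader()
    writer.writerows(rows)
```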