The question here is: can we measure whether the new RC pages are more productive, in the sense that users are finding more edits that require some action? To do this, we'll need to measure whether users take an action on the pages they click through to from the RC page.
We need to establish a baseline for RC Page tool usage before we release the beta. (Establishing the baseline after beta release would be bad, since many of our most active users will be in the beta.) This exercise will also flush out any issues with the tracking mechanisms we've put in place.
Proposed Productivity Metrics
- Action Ratio: What is the ratio of total clicks on edit results to clicks that lead to a page where the user takes one of a specified set of actions (Revert, Undo, clicking to Edit the page...)? Our hypothesis is that the closer this ratio is to 1:1, the better the tool is doing at helping users find the pages they're looking for. (See the sketch after this list.)
- Quality Filter Action Ratio: Can we segment our results so that we know whether people who used particular filters have higher Action Ratios? A high-value candidate here would be the ORES Quality filters. It would be very interesting to know whether users of these filters get more hits than other users.
- Newcomer Action Ratio: Along the same lines, it would be relevant to ERI success if we could see whether users of the Newcomer filter perform certain actions more or less often than others. I.e., do people tracking Newcomers Revert and Undo more or less? Do they Thank and hit Talk more or less?
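As a rough illustration, here is a minimal Python sketch of how the Action Ratio and the filter-segmented ratios might be computed once we have per-click data. The record fields, filter names, and action names are placeholders for illustration, not the actual schema of whatever tracking mechanism we put in place.

```python
from collections import defaultdict

# Hypothetical click-through records from the RC page: one row per click,
# with the first follow-up action (if any) and the filters active at click time.
# Field and filter names here are placeholders, not the real tracking schema.
clicks = [
    {"session": "a1", "filters": ["ores-damaging"], "first_action": "rollback"},
    {"session": "a2", "filters": ["newcomer"], "first_action": None},
    {"session": "a3", "filters": ["ores-damaging", "newcomer"], "first_action": "thank"},
    {"session": "a4", "filters": [], "first_action": "edit"},
]

# The set of actions we count as "productive" (still an open question below).
COUNTED_ACTIONS = {"edit", "undo", "thank", "rollback", "rollback-vandal", "talk"}

def action_ratio(rows):
    """Fraction of clicks whose first follow-up action is one we count.

    Expressed as actions/clicks, so 1.0 is the 1:1 case in the hypothesis above."""
    total = len(rows)
    acted = sum(1 for r in rows if r["first_action"] in COUNTED_ACTIONS)
    return acted / total if total else 0.0

print("overall:", action_ratio(clicks))

# The same ratio segmented by filter, e.g. to compare ORES Quality filter
# users (or Newcomer filter users) against everyone else.
by_filter = defaultdict(list)
for row in clicks:
    for f in row["filters"] or ["(no filter)"]:
        by_filter[f].append(row)

for name, rows in sorted(by_filter.items()):
    print(f"{name}: {action_ratio(rows):.2f} over {len(rows)} clicks")
```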
Questions/Issues
- For the Action Ratio, what is the set of "actions" that we'd want to count?
- Include: Edit, Undo, Thank, Rollback, Rollback Vandal, Talk. Don't include: clicking a link to go to any other page. What else?
- Can we track actions taken in Twinkle? Just clicking to launch Twinkle is not a true indication of taking action, since the top-level Twinkle menu includes non-actions like "Last," which just shows the previous Diff. Can we do something like "if action = Twinkle, record the next action," and then keep a list of those that count? (See the sketch after this list.)
- I don't think Mark as Patrolled should count. If all you do is mark a page as patrolled, that basically means you didn't find what you were looking for, doesn't it?
- My understanding from @Catrope is that path analysis on our system is only really reliable for the first action after the user's click. So the metrics proposed above work within that limitation. If more sophisticated analyses are feasible, we can think more ambitiously. So, two questions:
- Are more complex analyses feasible? E.g., could we follow the reviewer for X number of clicks, to find out whether any of those clicks included one of the specified actions (Revert, Undo, etc.) on the target page? Could we know, for instance, whether the reviewer eventually Reverted, after checking some facts and diffs?
- If we can't do the more sophisticated analyses, do we believe the proposed metrics provide relative but useful measures of success? (I.e., even if we don't have a full picture of what users actually do, will we know if things got better?)
- I'm thinking a week or a month might be the relevant period for this type of analysis, to avoid being skewed by normal weekly rhythms.
- Do we need to produce the baseline figures now for all wikis we will ever want to measure? Or will the data be available indefinitely?
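To make the "first action after the click" limitation and the Twinkle next-action idea concrete, here is a minimal sketch, assuming we can see an ordered list of actions per click. The action names and the list of Twinkle menu items that would count are assumptions for illustration only.

```python
# Action names and the set of Twinkle items that should count are assumptions,
# not confirmed tracking values.
COUNTED_ACTIONS = {"edit", "undo", "thank", "rollback", "rollback-vandal", "talk"}
COUNTED_TWINKLE_ACTIONS = {"rollback", "rollback-vandal", "warn"}  # "last" etc. excluded

def first_counted_action(actions):
    """actions: what the user did after clicking through from RC, in order.

    Returns the first action we'd count toward the Action Ratio, or None,
    applying the "if action = Twinkle, record the next action" rule."""
    if not actions:
        return None
    first = actions[0]
    if first == "twinkle":
        # Look one step further: count it only if the next Twinkle action
        # is on the approved list; opening the menu alone doesn't count.
        nxt = actions[1] if len(actions) > 1 else None
        return nxt if nxt in COUNTED_TWINKLE_ACTIONS else None
    return first if first in COUNTED_ACTIONS else None

# A reviewer opens Twinkle, then uses its rollback: counts as "rollback".
print(first_counted_action(["twinkle", "rollback", "talk"]))
# A reviewer only marks the page as patrolled: counts as nothing.
print(first_counted_action(["mark-patrolled"]))
```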
Steps
- We don't need to build a graphing tool right out of the gate. Our goal, I think, is to produce a spreadsheet from which we can extract meaningful conclusions (a minimal sketch of that output follows the steps below). If we want to automate the analysis, we can do that later.
My sense of the best way to proceed is this:
- Investigate the issues, determine what is possible and how involved the project will be, then report back.
- If required, put in place whatever tools are necessary to get the data we want.
- Make a trial run at producing analysis for two of the ORES wikis, one large and one small. Say en.wiki and pl.wiki?
- Refine methodology/technology as needed.
- Rerun the analysis of the two wikis above.
- Determine a test set and acquire baseline figures, since figures will not be available indefinitely.
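For the trial-run spreadsheet, something as simple as one CSV row per wiki per metric would probably do to start. A minimal sketch, with placeholder column names and zeroed numbers rather than real data:

```python
import csv

# Placeholder rows for the trial-run spreadsheet: one row per wiki per metric.
# Wiki names match the proposal above; counts and ratios would come from the
# tracking data once the tools are in place.
rows = [
    {"wiki": "en.wiki", "metric": "action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "en.wiki", "metric": "quality_filter_action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "pl.wiki", "metric": "action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
    {"wiki": "pl.wiki", "metric": "quality_filter_action_ratio", "clicks": 0, "actions": 0, "ratio": 0.0},
]

with open("rc_baseline.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wiki", "metric", "clicks", "actions", "ratio"])
    writer.writeheader()
    writer.writerows(rows)
```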