
Data need: Explore range of article revision comparisons
Closed, Resolved | Public | 2 Estimated Story Points

Description

For the revision slider, it would be useful to know what users actually do when comparing revisions. With that knowledge I/we could optimize the design for the most common tasks instead of heading in a direction that does not cover frequent use cases.

Data needs:

  • How many steps back in time do users go when comparing versions? (measured as the current revision minus n revisions)
  • How big is the difference between the compared versions? (measured in revisions)
  • Is it common to jump back and forth between versions within a short time, or do users find the revisions quickly without iterating and adjusting often?

Data Format
All data makes the most sense if given in [[ https://en.wikipedia.org/wiki/Quantile#Specialized_quantiles | percentiles/deciles ]], so we can see the distribution.
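As a rough illustration of the requested format, deciles for one of the measures could be computed as sketched below. The sample values and the `deciles` helper are hypothetical, and the choice of the "inclusive" interpolation method is an assumption; this is not the analysis that was actually run.

```python
from statistics import quantiles


def deciles(values):
    """Return the nine cut points that split `values` into ten groups."""
    # Assumption: treat the logged values as the whole population
    # ("inclusive"), rather than as a sample of a larger one.
    return quantiles(values, n=10, method="inclusive")


# Hypothetical sample: how many revisions back the newer compared
# revision was, for ten diff views.
steps_back = [0, 0, 0, 1, 1, 2, 3, 5, 8, 40]
cuts = deciles(steps_back)

for i, cut in enumerate(cuts, start=1):
    print(f"{i * 10}th percentile: {cut}")
```

Reporting the cut points rather than a single mean keeps long tails visible, such as the occasional comparison that reaches far back in the history.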

Skills
Could be delivered as Excel/Calc, CSV, or an R data frame.

Conversation

On a diff you have a "previous" and "next" edit button.
Would tracking how often these were used be of use?

Yes, that is a reasonable proxy for what interests me, I suppose.

Event Timeline

Change 287647 had a related patch set uploaded (by Addshore):
Track dewiki diff page usage

https://gerrit.wikimedia.org/r/287647

Change 288158 had a related patch set uploaded (by Addshore):
Don't log dewiki_diffstats to logstash

https://gerrit.wikimedia.org/r/288158

The above change would provide a JSON log of data that we could then work with.
A blob would be logged every time the diff view was loaded on dewiki.
The blob would contain:

  • current timestamp
  • revision ids for the diff
  • page id
  • total number of revisions of the page
  • number of revisions between the compared revisions
  • number of revisions back in time of the latest revision being compared
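To make the blob contents concrete, here is a minimal sketch of how such a record could be assembled. The field names, the `build_diff_blob` helper, and the oldest-first `history` list are all hypothetical; the actual patch runs inside MediaWiki and may name and compute things differently. In particular, "revisions between" is assumed here to exclude both compared revisions.

```python
import json
import time


def build_diff_blob(page_id, old_rev_id, new_rev_id, history, now=None):
    """Build one log record for a diff view.

    `history` is a hypothetical list of the page's revision ids,
    ordered oldest first.
    """
    older, newer = sorted((history.index(old_rev_id), history.index(new_rev_id)))
    return {
        "timestamp": int(time.time()) if now is None else now,
        "old_rev_id": old_rev_id,
        "new_rev_id": new_rev_id,
        "page_id": page_id,
        "total_revisions": len(history),
        # Assumption: count only the intermediate revisions.
        "revisions_between": newer - older - 1,
        # How far the newer compared revision is from the latest one.
        "revisions_back": len(history) - 1 - newer,
    }


history = [100, 105, 110, 120, 130]
blob = build_diff_blob(page_id=7, old_rev_id=105, new_rev_id=120, history=history, now=0)
print(json.dumps(blob))
```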

This would likely need review / approval from someone at the WMF to ensure this is okay and not going to stress anything too much.

Just a summary of the data Jan and I are interested in:
For the revision view as is (without the new revision slider):

  • date and position of older revision (as in "the nth revision")
  • date and position of younger revision
  • total number of revisions of this article
  • maybe: article id

So the patch above collects the oldid and newid; from these we can get the timestamp of each and its position in the history.
It also directly collects the number of revisions of the page.
The article id can be derived from either of the revision ids, but it is also included in the patch.
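The derivation described above could look roughly like the following sketch. It assumes the page history is already available as a list of `(rev_id, timestamp)` pairs ordered oldest first; that input, the ISO timestamp strings, and the helper names are hypothetical and not part of the patch.

```python
def index_history(history):
    """Map rev_id -> (1-based position, timestamp) for an oldest-first history."""
    return {rev_id: (n, ts) for n, (rev_id, ts) in enumerate(history, start=1)}


def describe_diff(old_id, new_id, history):
    """Return date and position ("the nth revision") for both compared revisions."""
    lookup = index_history(history)
    old_pos, old_ts = lookup[old_id]
    new_pos, new_ts = lookup[new_id]
    return {
        "old_position": old_pos,
        "old_date": old_ts,
        "new_position": new_pos,
        "new_date": new_ts,
        "total_revisions": len(history),
    }


# Hypothetical page history, oldest first.
page_history = [
    (100, "2016-01-01T00:00:00Z"),
    (105, "2016-02-01T00:00:00Z"),
    (110, "2016-03-01T00:00:00Z"),
]
summary = describe_diff(100, 110, page_history)
print(summary)
```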

Tobi_WMDE_SW set the point value for this task to 2.
Lea_WMDE renamed this task from "Data need: User Behaviour when comparing article revisions" to "Data need: Data need: Explore range of article revision comparisons". Jun 6 2016, 3:50 PM
Lea_WMDE updated the task description.
WMDE-leszek renamed this task from "Data need: Data need: Explore range of article revision comparisons" to "Data need: Explore range of article revision comparisons". Jun 27 2016, 11:35 AM

Change 287647 merged by jenkins-bot:
Track dewiki diff page usage

https://gerrit.wikimedia.org/r/287647

Change 288158 abandoned by Addshore:
Don't log dewiki_diffstats to logstash

https://gerrit.wikimedia.org/r/288158

Change 299744 had a related patch set (by Addshore) published:
Make dewiki_diffstats debug instead of info

https://gerrit.wikimedia.org/r/299744

Change 299744 merged by jenkins-bot:
Make dewiki_diffstats debug instead of info

https://gerrit.wikimedia.org/r/299744

Change 288158 restored by Addshore:
Don't log dewiki_diffstats to logstash

https://gerrit.wikimedia.org/r/288158

Change 288158 merged by jenkins-bot:
Add dewiki_diffstats to wmgMonologChannels

https://gerrit.wikimedia.org/r/288158

Change merged; the logs are now accessible on fluorine.

Mentioned in SAL [2016-07-25T23:11:15Z] <dereckson@tin> Synchronized wmf-config/InitialiseSettings.php: Add dewiki_diffstats to wmgMonologChannels ([[Gerrit:288158]], T134861) (duration: 00m 25s)