Description

For VisualEditor's performance benchmarking, it would be hugely helpful to have an understanding of the central tendency of the length of pages which actually get edited (by all users, using all editing tools), rather than of all articles that exist (lots of articles don't get edited frequently, or even from year to year, while some get edited multiple times a day). I'm not sure which measures would be most useful; median and standard deviation, perhaps?

Details
| Project | Branch | Lines +/- | Subject |
|---|---|---|---|
| analytics/limn-edit-data | master | +85 -0 | Analyze page size impact on editing |
Event Timeline
Thanks for the tag, Grace.
James, how about a breakdown of the "success" metric by "size of page" where the size can be:
- more than 1 SD below the median
- below the median, but within 1 SD of it
- above the median, but within 1 SD of it
- more than 1 SD above the median
So you'd have the overall success rate, plus the rate for each of those four classes? This could be visualized as a stacked area chart under the total success rate.
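For illustration, a minimal pandas sketch of that bucketing, assuming an `events` DataFrame with a `page_len` column (bytes) and a boolean `success` column; those column names are placeholders, not the actual EventLogging schema fields:

```python
import numpy as np
import pandas as pd

def success_by_size_class(events: pd.DataFrame) -> pd.Series:
    """Success rate per page-size class, relative to the median +/- 1 SD."""
    median = events["page_len"].median()
    sd = events["page_len"].std()

    bins = [-np.inf, median - sd, median, median + sd, np.inf]
    labels = [
        "more than 1 SD below median",
        "within 1 SD below median",
        "within 1 SD above median",
        "more than 1 SD above median",
    ]
    classed = events.assign(
        size_class=pd.cut(events["page_len"], bins=bins, labels=labels)
    )
    # The mean of a boolean column is the success rate for that class.
    return classed.groupby("size_class", observed=False)["success"].mean()
```

The four per-class series could then be stacked under the overall rate, as suggested above.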
That'd be great. :-) The initial impetus of this request is to try to come up with a coherent, reasonable article/set of articles for which we can measure performance numbers using our synthetic benchmarks, as part of the "is VisualEditor fast enough" requirement. However, this sounds like a really great extension of the idea.
Hm. Then, in the short term, would you rather I give you a set with N articles in each of these sixteen categories?
- four quartiles by size of article
- four quartiles by relative editing traffic
If so, what N would you like?
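A rough sketch of that 4 × 4 sampling, again in pandas; the `articles` DataFrame and its `page_len` and `edits_last_30d` columns are assumptions made up for illustration:

```python
import pandas as pd

def sample_quartile_grid(articles: pd.DataFrame, n: int, seed: int = 42) -> pd.DataFrame:
    """Draw up to n articles from each (size quartile x edit-traffic quartile) cell."""
    cells = articles.assign(
        size_q=pd.qcut(articles["page_len"], 4, labels=False),
        # Edit counts are heavily tied at the low end, so drop duplicate bin edges.
        traffic_q=pd.qcut(articles["edits_last_30d"], 4, labels=False, duplicates="drop"),
    )
    return (
        cells.groupby(["size_q", "traffic_q"], group_keys=False)
        .apply(lambda cell: cell.sample(min(n, len(cell)), random_state=seed))
    )
```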
The fact that page length is not set in the event [1] means this will be a little trickier, as we have to make prepared statements and join to every wiki db.
Yeah, not available on the client, sadly, so an additional client request we wouldn't otherwise need.
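To make the join concrete, something along these lines would have to run against every wiki database, since the event only carries the page identifier. This is a hedged sketch rather than the actual patch: the host name, wiki list, and shape of the event rows are placeholders, while `page.page_id` and `page.page_len` are the standard MediaWiki columns:

```python
import os
import pymysql

WIKI_DBS = ["enwiki", "dewiki", "frwiki"]  # placeholder list of wiki databases
PAGE_LEN_SQL = "SELECT page_len FROM page WHERE page_id = %s"

def attach_page_lengths(events, host="analytics-store.eqiad.wmnet"):
    """events: iterable of dicts with 'wiki' and 'page_id' keys (assumed shape)."""
    out = []
    for wiki in WIKI_DBS:
        conn = pymysql.connect(
            host=host, db=wiki, read_default_file=os.path.expanduser("~/.my.cnf")
        )
        try:
            with conn.cursor() as cur:
                for ev in (e for e in events if e["wiki"] == wiki):
                    # Parameterized query per wiki DB, standing in for the
                    # prepared statements mentioned above.
                    cur.execute(PAGE_LEN_SQL, (ev["page_id"],))
                    row = cur.fetchone()
                    out.append({**ev, "page_len": row[0] if row else None})
        finally:
            conn.close()
    return out
```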
Change 195895 had a related patch set uploaded (by Milimetric):
Analyze page size impact on editing
Doing this requires ad-hoc cron jobs to synchronize data from EventLogging and the other MediaWiki databases into a combined staging table. A data analyst could do this task, or we could build infrastructure to solve this problem in general. But the analytics team wants to stay away from ad-hoc work going forward and favors infrastructure work (as per our new infrastructure denomination :))
I'm removing the analytics projects from this task to make it clear we'd rather have an analyst on the editing team take on this type of work.
Are you looking for a set of summary statistics for different categories of articles by editing frequency? For example, "articles which have been edited in the past month have an average length of X kB, with a standard deviation of Y kB." Or would you prefer what Milimetric suggested: a (more or less) random selection of pages in each quartile of article size and editing traffic?
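If the first option is what's wanted, the computation would be roughly the following (another hedged pandas sketch; `articles`, `page_len`, and `last_edited` are assumed names, not a real table):

```python
import pandas as pd

def length_summary(articles: pd.DataFrame) -> pd.Series:
    """Summary stats of page length for articles edited in the last 30 days."""
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=30)
    recent = articles.loc[articles["last_edited"] >= cutoff, "page_len"]
    return pd.Series({
        "articles": len(recent),
        "mean_kb": recent.mean() / 1024,
        "sd_kb": recent.std() / 1024,
        "median_kb": recent.median() / 1024,
    })
```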