Remove edit size information in the Edit Counter in new XTools
Closed, Resolved | Public | 2 Estimated Story Points

Description

It says that my average edit size on English Wikipedia is 31,250.8:

Average edit size: 	31,250.8

This number is totally bogus and fixing it would be expensive. See below.

Since this has apparently always been broken and no one even noticed, it doesn't seem like it is important information. Please remove the following fields from the interface:

  • Average edit size
  • Small edits (<20 bytes)
  • Large edits (>1000 bytes)

Event Timeline

That's being calculated by:

SELECT 'average_size' AS `key`, AVG(rev_len) AS val FROM $revisionTable WHERE rev_user = :userId

So yep, bytes, and high maybe because you've got some massive outlier that's skewing things?

Should we be displaying the median size instead?

@Samwilson: That's the wrong query to use. rev_len is the size of the entire revision, i.e. the size of the article immediately after the edit. To get the edit size, you have to subtract the rev_len of the revision before the edit from the rev_len of the edit, which is probably way too expensive to do. Just giving the average or median rev_len doesn't convey any useful information.
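To illustrate the distinction above, here is a minimal sketch on invented data. The page history, the user names, and the `edit_sizes` helper are all hypothetical; only `rev_len` and `rev_parent_id` correspond to columns mentioned in this thread.

```python
from statistics import mean

# Hypothetical history of one page: (rev_id, rev_parent_id, rev_len, editor).
# rev_len is the size of the whole page after the edit, in bytes.
revisions = [
    (1, None, 30000, "alice"),  # page creation
    (2, 1, 30010, "bob"),       # bob adds 10 bytes
    (3, 2, 29990, "bob"),       # bob removes 20 bytes
    (4, 3, 30040, "bob"),       # bob adds 50 bytes
]

rev_len_by_id = {rev_id: rev_len for rev_id, _, rev_len, _ in revisions}


def edit_sizes(editor):
    """Signed byte change of each edit by `editor`: rev_len minus the parent's rev_len."""
    sizes = []
    for rev_id, parent_id, rev_len, who in revisions:
        if who != editor:
            continue
        parent_len = rev_len_by_id.get(parent_id, 0)  # no parent means page creation
        sizes.append(rev_len - parent_len)
    return sizes


bob_rev_lens = [rev_len for _, _, rev_len, who in revisions if who == "bob"]
avg_rev_len = mean(bob_rev_lens)         # what AVG(rev_len) measures: page size
avg_edit_size = mean(edit_sizes("bob"))  # what the field is supposed to show
```

Here `avg_rev_len` is roughly 30,013 bytes, which mostly reflects how big the page is, while `avg_edit_size` is roughly 13 bytes.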

kaldari renamed this task from Average edit size is mysterious in new XTools to Average edit size is bogus in new XTools. Jul 10 2017, 5:49 PM
kaldari triaged this task as Medium priority.
kaldari updated the task description.
kaldari renamed this task from Average edit size is bogus in new XTools to Remove all the average edit size information in the Edit Counter in new XTools. Jul 12 2017, 12:05 AM
kaldari renamed this task from Remove all the average edit size information in the Edit Counter in new XTools to Remove edit size information in the Edit Counter in new XTools.
kaldari raised the priority of this task from Medium to High.
kaldari updated the task description.
kaldari edited projects, added Community-Tech-Sprint; removed Community-Tech.
kaldari set the point value for this task to 2.

@Samwilson: That's the wrong query to use. rev_len is the size of the entire revision, i.e. the size of the article immediately after the edit. To get the edit size, you have to subtract the rev_len of the revision before the edit from the rev_len of the edit, which is probably way too expensive to do. Just giving the average or median rev_len doesn't convey any useful information.

Hm, why don't we just correct that query? Do we really need to remove that field?

@Luke081515: Correcting the query would mean doubling the number of revisions we have to look at for the user. The Edit Counter interface is already quite slow as it is, especially for people with more than a few thousand edits. I think it would be a bad idea to make it significantly slower. I imagine this is why these fields were disabled in the old XTools interface.

When there's a will there's a way, but you should also pay mind to reverts vs non-reverts. People care about content, not that you undid a vandal's 10K removal :) The old XTools does not account for this, so it's misleading as-is.
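One way the revert concern could be handled is sketched below. It is only an assumption, not something XTools is known to do: a revision is treated as a revert when its content hash matches an earlier revision of the same page. The `sha1`, `len`, and `user` keys and the `non_revert_edit_sizes` helper are hypothetical names for illustration.

```python
def non_revert_edit_sizes(history, editor):
    """Signed edit sizes for `editor`, skipping edits that restore earlier content.

    `history` is one page's revisions ordered oldest to newest, each a dict with
    hypothetical keys "user", "len" (page size in bytes) and "sha1" (content hash).
    """
    seen_hashes = set()
    prev_len = 0  # a page creation is measured against an empty page
    sizes = []
    for rev in history:
        is_revert = rev["sha1"] in seen_hashes  # same content existed before
        if rev["user"] == editor and not is_revert:
            sizes.append(rev["len"] - prev_len)
        seen_hashes.add(rev["sha1"])
        prev_len = rev["len"]
    return sizes


history = [
    {"user": "alice", "len": 12000, "sha1": "a1"},
    {"user": "vandal", "len": 2000, "sha1": "b2"},      # removes 10K
    {"user": "patroller", "len": 12000, "sha1": "a1"},  # restores alice's version
    {"user": "patroller", "len": 12150, "sha1": "c3"},  # adds 150 bytes of content
]

patroller_sizes = non_revert_edit_sizes(history, "patroller")
```

With this toy history only the 150-byte addition is counted for the patroller; the 10,000-byte restoration is skipped.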

@MusikAnimal: Whenever I use the old XTools, it always just shows me "extended" for these 3 fields, no matter who I'm looking up. What does "extended" mean?

I think this might do it (user 59944 is Kaldari):

MariaDB [enwiki_p]> SELECT AVG(sizes.size) AS average_size,
    ->     COUNT(CASE WHEN sizes.size < 20 THEN 1 END) AS small_edits,
    ->     COUNT(CASE WHEN sizes.size > 1000 THEN 1 END) AS large_edits
    -> FROM (
    ->     SELECT (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS size
    ->     FROM revision_userindex AS revs
    ->     LEFT JOIN revision_userindex AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
    ->     WHERE revs.rev_user = 59944
    -> ) sizes;
+--------------+-------------+-------------+
| average_size | small_edits | large_edits |
+--------------+-------------+-------------+
|     102.2419 |       31838 |        1432 |
+--------------+-------------+-------------+
1 row in set (20.46 sec)

Still probably too slow.
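For anyone who wants to try the shape of that query without replica access, here is a rough sketch against an in-memory SQLite database. The toy `revision` table, its rows, and user ID 42 are invented; the `CAST(... AS SIGNED)` is dropped because SQLite integers are already signed. It says nothing about performance on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_parent_id INTEGER,
    rev_user      INTEGER,
    rev_len       INTEGER
);
-- Invented rows: user 42 makes edits of +10, +1500 and -800 bytes.
INSERT INTO revision VALUES
    (1, NULL, 7, 500),
    (2, 1, 42, 510),
    (3, 2, 42, 2010),
    (4, 3, 7, 2000),
    (5, 4, 42, 1200);
""")

# Same structure as the query above: join each revision to its parent
# and take the difference in rev_len as the edit size.
QUERY = """
SELECT AVG(sizes.size),
       COUNT(CASE WHEN sizes.size < 20 THEN 1 END),
       COUNT(CASE WHEN sizes.size > 1000 THEN 1 END)
FROM (
    SELECT revs.rev_len - IFNULL(parentrevs.rev_len, 0) AS size
    FROM revision AS revs
    LEFT JOIN revision AS parentrevs ON revs.rev_parent_id = parentrevs.rev_id
    WHERE revs.rev_user = ?
) AS sizes
"""

average_size, small_edits, large_edits = conn.execute(QUERY, (42,)).fetchone()
```

Because the sizes are signed, every removal satisfies `size < 20` and is counted as a small edit (the -800 byte edit here is one of them). Comparing `ABS(sizes.size)` instead would classify edits by magnitude.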

@MusikAnimal: Whenever I use the old XTools, it always just shows me "extended" for these 3 fields, no matter who I'm looking up. What does "extended" mean?

TODO? Haha, not sure. I guess I'm one of the people that didn't notice it wasn't reporting it. It was there at some point, but was probably wrong or removed because it was too expensive.

And for me:

MariaDB [enwiki_p]> SELECT AVG(sizes.size) AS average_size,
    ->     COUNT(CASE WHEN sizes.size < 20 THEN 1 END) AS small_edits,
    ->     COUNT(CASE WHEN sizes.size > 1000 THEN 1 END) AS large_edits
    -> FROM (
    ->     SELECT (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS size
    ->     FROM revision_userindex AS revs
    ->     LEFT JOIN revision_userindex AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
    ->     WHERE revs.rev_user = 14882394
    -> ) sizes;
+--------------+-------------+-------------+
| average_size | small_edits | large_edits |
+--------------+-------------+-------------+
|     860.1665 |       43455 |       20747 |
+--------------+-------------+-------------+
1 row in set (1 min 27.22 sec)

So yeah, might have to ditch this!

However I was thinking, not necessarily right now, that instead of throwing out these fun stats we could instead limit them to users who don't have a bajillion edits, or even introduce an actual SQL LIMIT, and just show a message saying that only N edits are being looked at. I think around 50K is reasonable.

@MusikAnimal: Feel free to create a new task for that. For now though, let's axe it.

I couldn't resist moving forward with my idea to add the data for the past 5,000 edits (50,000 was a stretch). I ran this against my own account (~108,000 edits) and that query took ~1.4 seconds, which is plenty fast in my opinion. The PR for this is at https://github.com/x-tools/xtools-rebirth/pull/43. A footnote is used to indicate which fields only account for the past 5,000 edits. I did not write tests because the testable functions do nothing more than reference what the Repo returns. All of this I'd estimate at 2 points of work.
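The limiting idea can be sketched roughly as follows. This is not the code from the PR: the toy table, the rows, the `recent_edit_stats` helper, and the choice to order by `rev_timestamp` are assumptions made for illustration, run against in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_parent_id INTEGER,"
    " rev_user INTEGER, rev_len INTEGER, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 7, 1000, "20170701000000"),
        (2, 1, 42, 1100, "20170702000000"),  # +100
        (3, 2, 42, 1105, "20170703000000"),  # +5
        (4, 3, 42, 3105, "20170704000000"),  # +2000
        (5, 4, 7, 3000, "20170705000000"),
        (6, 5, 42, 2700, "20170706000000"),  # -300
    ],
)


def recent_edit_stats(conn, user_id, limit):
    """Average size, small-edit count and large-edit count over the user's
    `limit` most recent revisions only."""
    query = """
    SELECT AVG(sizes.size),
           COUNT(CASE WHEN sizes.size < 20 THEN 1 END),
           COUNT(CASE WHEN sizes.size > 1000 THEN 1 END)
    FROM (
        SELECT revs.rev_len - IFNULL(parentrevs.rev_len, 0) AS size
        FROM (
            -- Pick the newest revisions first, so only `limit` parents are joined.
            SELECT rev_parent_id, rev_len
            FROM revision
            WHERE rev_user = ?
            ORDER BY rev_timestamp DESC
            LIMIT ?
        ) AS revs
        LEFT JOIN revision AS parentrevs ON revs.rev_parent_id = parentrevs.rev_id
    ) AS sizes
    """
    return conn.execute(query, (user_id, limit)).fetchone()


last_three = recent_edit_stats(conn, 42, 3)   # the thread uses 5,000; 3 suits the toy data
everything = recent_edit_stats(conn, 42, 10)  # limit larger than the user's edit count
```

Applying the limit in the innermost subquery bounds the number of parent lookups regardless of how many edits the user has.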

The PR to remove the data completely is at https://github.com/x-tools/xtools-rebirth/pull/44, which by my estimate was maybe 0.5 points... certainly not 2! And you must admit, after removing this data, all that unused real estate on the page looks very sad :( Especially if you load data for an admin, which adds a bunch of fields in the middle column, there's a lot of whitespace. If we stick with axing the edit size data, to fill out the page and give it some happy sunshine, I'd like to add a pie chart for minor edits vs non-minor, and/or live vs deleted (both of which require little effort to add thanks to the pieChart helper).

I couldn't resist moving forward with my idea to add the data for the past 5,000 edits (50,000 was a stretch).

Fine with me, although I hope we are still on track for officially launching the new version on Monday.

We are only missing RfX Analysis, RfX Vote Calculator, and fixes on the edit counter according to the description of T153112: Epic: Rewriting XTools. My goal is to have the first RfX tools done by Saturday. @Samwilson and I discussed tagging the first 3.0 release on Sunday. After that we can roll any bugfixes left over into the 3.0.1 release.

I've merged #43, the limiting code, and it's deployed to prod and dev so we can see how it works.

My average edit size is -79.4 bytes!