Adjust vertical scale for a better resolution with big changes
Open, MediumPublic

Description

Story:
As a reviewer I do not care so much about seeing the difference between small changes (e.g. whether a change is 1, 2, or 5 bytes). Seeing the difference between a 400-byte and an 800-byte change is far more important.

Proposed solution:
Normalize small changes into chunks of just a few pixels in height, and go logarithmic starting around 50 bytes.

Based on feedback in: https://www.mediawiki.org/wiki/Topic:Tu7w34h8zi8z8ppb

Careful: There might be changes needed once we see this. This is unfortunately a case of "we don't know if this is right until we see it". So please check back with @Lea_WMDE and WMDE-Design before merging.

Event Timeline

@Lea_WMDE @WMDE-Fisch: Did you (or anybody else) try out how such scaling works with a larger sample of changes? I am uncertain here – changing the scaling-the-bars paradigm from linear to logarithmic strikes me, data-viz-wise, as very unusual, which can indicate that it is not widespread for a reason.

Currently the scale is already kind of logarithmic [1] – but since it's all about just a few pixels, we're not really able to use its full potential. I think flattening the lower end makes sense.

[1] https://gerrit.wikimedia.org/g/mediawiki/extensions/RevisionSlider/+/master/modules/ext.RevisionSlider.RevisionListView.js#224
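For context, log-based bar scaling along the lines of [1] could look roughly like this (a hypothetical sketch with assumed names, not the actual RevisionListView.js code):

```javascript
// Hypothetical sketch of log-style bar scaling (assumed function and
// parameter names): the bar height grows with the logarithm of the change
// size, normalized against the largest change in the loaded set.
function barHeight( diffSize, maxDiffSize, maxBarHeight ) {
	var size = Math.abs( diffSize );
	if ( size === 0 ) {
		return 0;
	}
	// The +1 avoids Math.log( 1 ) === 0 flattening 1-byte edits to zero height.
	return maxBarHeight * Math.log( size + 1 ) / Math.log( maxDiffSize + 1 );
}
```

Because the logarithm is concave, a 500-byte change already gets close to the full bar height when the maximum is 1000 bytes – small changes eat up most of the pixel range, which is exactly the problem this task describes.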

Lea_WMDE updated the task description. (Show Details)
Lea_WMDE subscribed.
wassan.anmol117 subscribed.

I am interested in working on this task.

The solution I have in mind: keep the height the same for all changes smaller than 100 bytes (both positive and negative), and let the logarithmic algorithm calculate the height when a change is 100 bytes or larger.
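That proposal could be sketched like this (THRESHOLD, MIN_HEIGHT, and the function name are illustrative assumptions, not the eventual patch):

```javascript
// Sketch of the proposed cut-off: every change below THRESHOLD bytes gets
// the same minimal bar height; larger changes scale logarithmically above
// that baseline. All names here are assumptions for illustration.
var THRESHOLD = 100;
var MIN_HEIGHT = 5;

function scaledHeight( diffSize, maxDiffSize, maxBarHeight ) {
	var size = Math.abs( diffSize );
	if ( size < THRESHOLD ) {
		return MIN_HEIGHT;
	}
	// Shift by THRESHOLD so the curve starts at the baseline height
	// instead of jumping, and avoid Math.log( 0 ) at size === THRESHOLD.
	var range = Math.log( maxDiffSize - THRESHOLD + 1 );
	return MIN_HEIGHT + ( maxBarHeight - MIN_HEIGHT ) *
		Math.log( size - THRESHOLD + 1 ) / range;
}
```

The shift by THRESHOLD matters: without it, the first change above the cut-off jumps from the flat baseline straight into the middle of the log curve, which is the spike discussed further down in this thread.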

Sounds good to me, let's try it out and then we can test how it feels!

Change 452350 had a related patch set uploaded (by Wassan.anmol; owner: Anmol Wassan):
[mediawiki/extensions/RevisionSlider@master] Adjust vertical scale for a better resolution with big changes

https://gerrit.wikimedia.org/r/452350

Change 454808 had a related patch set uploaded (by Gopavasanth; owner: Gopavasanth):
[mediawiki/extensions/RevisionSlider@master] Adjust vertical scale for a better resolution with big changes

https://gerrit.wikimedia.org/r/454808

While playing with this together with @Lea_WMDE we decided to change the default threshold for the cut-off to 50.

Change 458202 had a related patch set uploaded (by Gopavasanth; owner: Gopavasanth):
[mediawiki/extensions/RevisionSlider@master] Adjust vertical scale for a better resolution with big changes

https://gerrit.wikimedia.org/r/458202

Change 458202 abandoned by Gopavasanth:
Adjust vertical scale for a better resolution with big changes

Reason:
as of I7104232ba7b3b155e48a07df6ce7878a0e4300e2

https://gerrit.wikimedia.org/r/458202

I mapped the algorithms as graphs via LibreOffice Calc. It might be that I did something wrong, but this is what I get:

image.png (526×560 px, 29 KB)

I see a few problems:

  • For edits with exactly 50 bytes it fails (I think because it tries to calculate Math.log( 0 )).
  • The bars are actually smaller than before. Is this intentional?
  • I'm not sure if it's a good idea to draw all edits below a cut-off size at the same height, because they cannot be distinguished any more.

For edits with exactly 50 bytes it fails (I think because it tries to calculate Math.log( 0 )).

So @thiemowmde, to fix this we can change the logic from diffSize > -50 && diffSize < 50 to diffSize >= -50 && diffSize <= 50? That's the straightforward way to fix it.
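The boundary bug can be shown in isolation (hypothetical helper, not the actual patch code):

```javascript
// With the exclusive comparison, a 50-byte edit falls through to the log
// branch and ends up computing Math.log( 50 - 50 ), i.e. Math.log( 0 ),
// which is -Infinity. Making the comparison inclusive routes the edge
// case to the flat branch instead.
function isSmallChange( diffSize ) {
	return diffSize >= -50 && diffSize <= 50; // was: > -50 && < 50
}
```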

Thanks @thiemowmde for that graph. We should have done this right away. Now we have a better understanding of how our algorithm behaves without hitting refresh all the time. - The formula in the document had some minor issues though (e.g. it used the wrong field as the minimum revision size). I fixed that here and also added a few more values so we can at least see the logarithmic effect as a curve.

Yes thanks, @Gopavasanth, that should fix the issue that is also visible in the graph. - Still, I guess the main thing that bothers us is that the curve is very steep in the beginning. We would want it to be smoother there, not "skipping" several pixels in height for just a few bytes of size difference.

So maybe some adjustments to the log calculation could help here. I have not thought about this in depth yet, though. - Luckily we can play with the graph a bit to see what might help. Feel free to give it a try (I will as well).

Even with the fixed .ods my biggest issues with this approach remain: All bars are smaller than before, and there is a huge spike that heavily exaggerates edits with 52 or more bytes, while edits with 51 or less bytes all look identical.

Size of edit    Height of bar
48              5
49              5
50              5
51              5
52              36
53              54
54              66

One could reasonably ask why an edit with 52 bytes is 7 times more "important" than an edit that is only 1 byte smaller.

To be honest, when I read this task's description, the most trivial solution I can think of is a linear scale. Really. What this ticket asks for is pretty much the opposite of what a logarithmic scale does, so why not get rid of it if it's not wanted? Or choose another base for the logarithm so it doesn't behave so extremely. Currently it's still log(n), but it does not need to be.
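Both alternatives can be compared with toy helpers (the names are made up for illustration, not taken from the extension):

```javascript
// A plain linear scale: bar height is directly proportional to edit size.
function linearHeight( size, maxSize, maxHeight ) {
	return maxHeight * size / maxSize;
}

// A logarithm with a configurable base; larger bases grow more slowly,
// e.g. log_8( 64 ) === 2 while log_2( 64 ) === 6.
function logBase( x, base ) {
	return Math.log( x ) / Math.log( base );
}
```

One caveat on the base idea: if bar heights are normalized against the largest change, the base cancels out (log_b(x) / log_b(max) = ln(x) / ln(max)), so changing the base only makes a difference when the scale is absolute rather than normalized.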

Gerrit also shows bars to visualize the size of a change. This feature is quite similar to what Revision-Slider does.

While looking at my Gerrit dashboard today, I realized it does something neat: it caps the scale at 500 lines per change. All changes that happen to touch 500 or more lines are shown as a maxed-out 100% bar. Only changes between 0 and 500 lines affect the size of the bar.

I find this super clever. From a certain point on, it does not matter any more whether a change is "very large" or "extremely large". It's large. I don't feel any information is lost by applying such an arbitrary upper limit – as long as it is high enough. The positive effect of this limit is that all the "normal" changes, which are typically much smaller, don't all end up as a thin line. I can clearly distinguish changes that touch, let's say, 50 lines (at least 10% of the bar) from changes that touch 100 lines (at least 20%).

I suggest applying the same idea to RevisionSlider.
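Gerrit-style clamping could be sketched for RevisionSlider like this (MAX_DIFF_SIZE is an assumed constant, not taken from either codebase):

```javascript
// Everything at or above the cap renders as a full-height bar; only sizes
// below it scale, linearly. The cap value is an assumption for this sketch.
var MAX_DIFF_SIZE = 5000; // bytes; Gerrit uses 500 lines

function clampedHeight( diffSize, maxBarHeight ) {
	var size = Math.min( Math.abs( diffSize ), MAX_DIFF_SIZE );
	return Math.ceil( maxBarHeight * size / MAX_DIFF_SIZE );
}
```

With a fixed cap, the scale is also stable across page loads: the same edit always gets the same bar height, regardless of which other revisions happen to be in view.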

Sounds awesome to me. @WMDE-Fisch, @Lea_WMDE, what do you say?

Sounds good to me. @Lea_WMDE is currently out of office for some time so there won't be an answer from her side I guess. :-)

So what would be good numbers for the lower and upper bounds in our case? Should they be fixed, or vary according to the sizes of the currently loaded diffs?

Personally, I don't think such an upper bound should be variable. The idea really is to have it fixed.

I suggest not fiddling with a lower bound. Keep it at 0.

To find a good upper bound, we can look at the distribution of a few thousand random edits. I believe we will see a long-tail curve then. We might find something like "1% of the edits are larger than 5000 bytes". That could be our number then.
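Deriving that number from sample data could look like this (an assumed helper, not part of the extension):

```javascript
// Returns the size below which a fraction p of the sampled edits fall,
// e.g. percentile( sizes, 0.99 ) gives a cap that only ~1% of edits exceed.
function percentile( sizes, p ) {
	var sorted = sizes.map( Math.abs ).sort( function ( a, b ) {
		return a - b;
	} );
	var index = Math.min( sorted.length - 1, Math.floor( sorted.length * p ) );
	return sorted[ index ];
}
```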

Change 454808 abandoned by WMDE-Fisch:

[mediawiki/extensions/RevisionSlider@master] Adjust vertical scale for a better resolution with big changes

Reason:

Just cleaning up. This is blocked on a product decision and has a very low priority. It's unclear if we'll come back to this issue but at least not in the foreseeable future.

https://gerrit.wikimedia.org/r/454808

Change 452350 abandoned by WMDE-Fisch:

[mediawiki/extensions/RevisionSlider@master] Adjust vertical scale for a better resolution with big changes

Reason:

Just cleaning up. This is blocked on a product decision and has a very low priority. It's unclear if we'll come back to this issue but at least not in the foreseeable future.

https://gerrit.wikimedia.org/r/452350