
Benchmark performance between Wikidiff2 and its PHP counterpart
Closed, Resolved · Public · Estimated Story Points: 5

Description

The #1 most-voted wish from the 2022 Community Wishlist is "Better diff handling of paragraph splits".

As part of some initial investigations, we want to know whether newer PHP versions are closing the performance gap between core's DiffEngine (PHP) and wikidiff2.
The goal of this task is to compare both engines and gather some basic metrics to determine whether wikidiff2 is still needed or whether we could consider deprecating it.


Things to test/consider

  • Compare C++ wikidiff2 against PHP 7.4 & PHP 8 (if possible)
  • With and without paragraph matching in wikidiff2 (by default, the feature turns off for texts longer than 100 lines).
  • Compare output equivalence. T318377
  • Use several different line and text lengths, plus documents dominated by moves vs. ones dominated by changes (a rough timing harness is sketched below).
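One way to collect comparable numbers is a small harness like the following (a sketch, not the Gerrit patch discussed below; `wikidiff2_do_diff` is the extension's real entry point, while the file names and run count are placeholders):

```php
<?php
// Rough timing harness (a sketch, not the Gerrit patch): report the
// rate/total/mean/max metrics used in this task for any diff engine.
// File names and the run count are placeholders.
function bench( string $label, callable $engine, int $runs = 100 ): void {
	$total = 0.0;
	$max = 0.0;
	for ( $i = 0; $i < $runs; $i++ ) {
		$start = hrtime( true );
		$engine();
		$ms = ( hrtime( true ) - $start ) / 1e6;
		$total += $ms;
		$max = max( $max, $ms );
	}
	printf( "%s: rate %.1f/s, total %.2fms, mean %.2fms, max %.2fms\n",
		$label, $runs / ( $total / 1000 ), $total, $total / $runs, $max );
}

$old = file_get_contents( 'old.txt' );
$new = file_get_contents( 'new.txt' );

// wikidiff2's entry point; 2 = number of context lines. Swap in a call
// to core's DiffEngine to benchmark the pure-PHP side.
bench( 'wikidiff2', fn () => wikidiff2_do_diff( $old, $new, 2 ) );
```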

Event Timeline

awight updated the task description.

You should be able to disable moved-line detection by setting ini_set( 'wikidiff2.moved_paragraph_detection_cutoff', 0 ); in your LocalSettings.php.
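For instance (a minimal sketch; the ini key is the one quoted above, and the 100-line default comes from the task description):

```php
// LocalSettings.php — setting the cutoff to 0 disables moved-paragraph
// detection entirely; per the task description, the default is 100 lines.
ini_set( 'wikidiff2.moved_paragraph_detection_cutoff', 0 );
```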

Change 831150 had a related patch set uploaded (by Dmaza; author: Dmaza):

[mediawiki/core@master] [WIP] [Do Not Merge] Benchmark diff engines

https://gerrit.wikimedia.org/r/831150

I've created the above patch (DO NOT MERGE), and here is what I've found so far with the very limited data I'm testing against:

  • wikidiff2 is almost 100 times faster than PHP 7.2 on the large text
  • wikidiff2 is roughly 44 times faster than PHP 7.2 on the small text
  • PHP 8.2 with JIT enabled is 20 times faster than PHP 7.2 on the large text
  • paragraph move detection slows wikidiff2 down significantly on larger texts

Below are some raw numbers using the above patch. For context, I'm running this on my MBP via Docker. Each file is tested 100 times; this is the output:

randomly_large x100
|       | PHP v7.2.34 | WikiDiff2 v1.13.0 | WikiDiff2 v1.13.0 w/ paragraph matching | PHP v8.2.0beta2 | PHP v8.2.0beta2 w/ JIT enabled |
|-------|-------------|-------------------|-----------------------------------------|-----------------|--------------------------------|
| rate  | 0.3/s       | 29.4/s            | 0.4/s                                   | 2.4/s           | 6.1/s                          |
| total | 329157.56ms | 3404.90ms         | 233691.34ms                             | 40923.42ms      | 16419.43ms                     |
| mean  | 3291.58ms   | 34.05ms           | 2336.91ms                               | 409.23ms        | 164.19ms                       |
| max   | 5336.50ms   | 39.89ms           | 2593.64ms                               | 487.92ms        | 294.52ms                       |

small_text x100
|       | PHP v7.2.34 | WikiDiff2 v1.13.0 | WikiDiff2 v1.13.0 w/ paragraph matching | PHP v8.2.0beta2 | PHP v8.2.0beta2 w/ JIT enabled |
|-------|-------------|-------------------|-----------------------------------------|-----------------|--------------------------------|
| rate  | 52.6/s      | 2270.3/s          | 508.2/s                                 | 375.0/s         | 437.7/s                        |
| total | 1902.75ms   | 44.05ms           | 196.76ms                                | 266.70ms        | 228.47ms                       |
| mean  | 19.03ms     | 0.44ms            | 1.97ms                                  | 2.67ms          | 2.28ms                         |
| max   | 33.20ms     | 0.95ms            | 2.89ms                                  | 11.23ms         | 53.25ms                        |

@awight Do you or your team have good examples of different documents I could use to run a more thorough test? I could really use some help with that.

@tstarling is there something specific I should be testing for to better gauge performance between the different engines?

Very promising results so far!

I don't have a good source of documents. There's the test directory and the built-in benchmarks, but these are more useful for checking correctness, and not a good sample of actual data.

It might be interesting to log events from TextSlotDiffRenderer and gather some metrics from each document pair. But there's already a basic StatsD metric, MediaWiki.diff_time. I made a draft dashboard to explore; please feel free to take it over and improve it. I think it shows that approximately 30 diffs run every second and that the vast majority take less than 2ms, but there's also a constant stream of long jobs that take 5s and can sometimes go up to the 60s timeout.
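A sketch of per-diff timing along those lines (the StatsD factory and its timing() method are MediaWiki's; the hook point and the inputs here are assumptions):

```php
<?php
// Illustrative only: emit a per-diff timing metric via MediaWiki's StatsD
// factory, along the lines of the existing MediaWiki.diff_time metric.
// $oldText and $newText are assumed to hold the two revision texts.
use MediaWiki\MediaWikiServices;

$stats = MediaWikiServices::getInstance()->getStatsdDataFactory();
$start = microtime( true );
$diff = wikidiff2_do_diff( $oldText, $newText, 2 ); // requires wikidiff2
$stats->timing( 'diff_time', ( microtime( true ) - $start ) * 1000 );
```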

Comparing with your table, it seems that your "small text" is a representative sample of the workload, and for that document the new PHP implementation is a reasonable replacement. The "max" times for the PHP implementation are strange; I wonder if something else is happening, like garbage collection or other work tying up the single thread.
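One way to test the garbage-collection hypothesis (a guess, not a confirmed diagnosis):

```php
<?php
// Sketch: re-run with PHP's cycle collector disabled to see whether the
// outlier "max" timings disappear. GC is only a hypothesis here; swap in
// the pure-PHP DiffEngine call to test that code path specifically.
$old = file_get_contents( 'old.txt' ); // placeholder inputs
$new = file_get_contents( 'new.txt' );

gc_collect_cycles(); // flush any pending cycles before timing
gc_disable();        // keep the collector from firing mid-measurement

$max = 0.0;
for ( $i = 0; $i < 100; $i++ ) {
	$start = hrtime( true );
	wikidiff2_do_diff( $old, $new, 2 );
	$max = max( $max, ( hrtime( true ) - $start ) / 1e6 );
}
gc_enable();
printf( "max with GC disabled: %.2fms\n", $max );
```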

Move detection increases the time order from O(N^2) to O(N^3). It's naively implemented. Someone just needs to fix the algorithm.
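For intuition, the naive shape looks roughly like this (an illustration of the complexity argument only, not wikidiff2's actual code):

```php
<?php
// Illustration of the complexity argument: naive move detection compares
// every deleted line against every added line (O(N^2) pairs), and each
// comparison costs time linear in the line length, so the whole pass is
// roughly cubic. Not wikidiff2's source.
function similarity( string $a, string $b ): float {
	// Crude positional similarity, linear in the line length; the real
	// metric in wikidiff2 is word-based and more involved.
	$len = min( strlen( $a ), strlen( $b ) );
	if ( $len === 0 ) {
		return 0.0;
	}
	$same = 0;
	for ( $i = 0; $i < $len; $i++ ) {
		if ( $a[$i] === $b[$i] ) {
			$same++;
		}
	}
	return $same / max( strlen( $a ), strlen( $b ) );
}

function findMoves( array $deleted, array $added ): array {
	$moves = [];
	foreach ( $deleted as $i => $del ) {            // O(N)
		foreach ( $added as $j => $add ) {          // × O(N)
			if ( similarity( $del, $add ) > 0.7 ) { // × O(len); threshold arbitrary
				$moves[$i] = $j;
				break;
			}
		}
	}
	return $moves;
}
```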

> Move detection increases the time order from O(N^2) to O(N^3). It's naively implemented. Someone just needs to fix the algorithm.

That's probably true, but to keep this insight in context, maybe someone should implement the improved algorithm in PHP?

I think the benchmark results above support the idea of leaving it in C++. The point of porting it from PHP to C++ in the first place was to provide a user-visible (>100ms) latency benefit for large and complex diffs and to avoid timeouts. It continues to achieve that. Maybe I'm missing some context here -- why would you want to implement paragraph matching in PHP?

No worries, I think I'm missing more context than you ;-). My thought was along the same lines as the Parsoid rewrite: moving these microservices into a more or less monolithic codebase and a single programming language is a big maintenance benefit (i.e., how many Wikimedia devs are going to jump into that C++ PHP extension to fix the algorithm...), and it would make the enhanced functionality available for third-party wikis. Currently the performance is slightly lower, but not unreasonably so, IMHO.

> My thought was along the same lines as the Parsoid rewrite: moving these microservices into a more or less monolithic codebase and a single programming language is a big maintenance benefit (i.e., how many Wikimedia devs are going to jump into that C++ PHP extension to fix the algorithm...)

Excluding people who have only changed the CI and build system, there have been 15 contributors, which seems not too terrible. It looks like more people have contributed to DiffEngine.h than to DiffEngine.php. You're right that it's a minority skill, but I do think it's a skill we need, even without wikidiff2. It's not our only extension, and I've made some contributions to the PHP core this year which were needed by WMF.

> and it would make the enhanced functionality available for third-party wikis. Currently the performance is slightly lower, but not unreasonably so, IMHO.

wikidiff2 can be used by third-party wikis. If you're running on a shared host where you are not able to install PHP extensions, you should expect degraded performance. The main obstacle is T196132.
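For reference, a third-party wiki can check whether the extension is present (a minimal sketch; extension_loaded() and phpversion() are standard PHP, and the messages are illustrative):

```php
<?php
// Sketch: detect whether the wikidiff2 extension is loaded. When it is
// not, MediaWiki falls back to the pure-PHP DiffEngine (slower, per the
// benchmarks above).
if ( extension_loaded( 'wikidiff2' ) ) {
	echo 'wikidiff2 ' . phpversion( 'wikidiff2' ) . " is available\n";
} else {
	echo "wikidiff2 not installed; the pure-PHP DiffEngine will be used\n";
}
```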

I think wikidiff2 is a good model to follow for future performance projects. I've been talking for a couple of years about replacing part of Parsoid with C, specifically a C code generation frontend for WikiPEG. So that might happen at some point.

Running PHP code means running on an open-source platform written in C. If we don't have C skills, we will be at the mercy of the platform's limitations.

It would be helpful to understand what the goal or proposal in this task is besides "TBD/WIP" :-)

Some related tasks:

> It might be interesting to log events from TextSlotDiffRenderer and gather some metrics from each document pair. But there's already a basic StatsD metric, MediaWiki.diff_time. I made a draft dashboard to explore; please feel free to take it over and improve it. I think it shows that approximately 30 diffs run every second and that the vast majority take less than 2ms, but there's also a constant stream of long jobs that take 5s and can sometimes go up to the 60s timeout.

Interesting. Is the max always capping at ~60s because everything beyond that times out?

> The "max" times for the PHP implementation are strange; I wonder if something else is happening, like garbage collection or other work tying up the single thread.

I'm assuming you're talking about the PHP 8 numbers. Thanks for pointing that out; I have no idea. You could be right about GC and my local env.


> It would be helpful to understand what the goal or proposal in this task is besides "TBD/WIP" :-)
>
> Some related tasks:

Sorry about that. I've updated the task description.
I don't think C/C++ is an issue. The burden IMO is maintaining two different implementations and keeping them in sync.

Change 831150 abandoned by Dmaza:

[mediawiki/core@master] [WIP] [Do Not Merge] Benchmark diff engines

Reason:

https://gerrit.wikimedia.org/r/831150