
Compare Parsoid perf on current production servers vs a newer test server
Closed, Resolved · Public

Description

It is known that Parsoid's raw performance for a wt->html transformation will not be as good as the core parser's, simply because Parsoid does more work. While we can and should continue to improve Parsoid's raw performance, that is unlikely to get us to raw performance parity.

There are two options available to us:

  1. some form of incremental parsing solution (limited to, say, section edits) that is enabled by Parsoid's approach
  2. throw hardware at the problem, since matching the core parser's performance on equivalent hardware is not a performance goal

This phab task is about exploring strategy 2. To get a realistic sense of what is feasible, @Sbailey noted in a recent team meeting that we probably need to get our hands on a test server. I tried to figure out whether @tstarling had some ready-made insights based on his recent performance work on Parsoid, and it turns out he doesn't.

So, this is a tracking task to:
(2a) check with serviceops to see what our options are wrt acquiring a test server
(2b) benchmark Parsoid wt2html on current production hardware and on the test server to get a sense of what kind of benefits we can get.
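The wt2html benchmarking in (2b) boils down to timing repeated requests against a depooled host. A minimal, hypothetical sketch of the timing half (the `fetch` callable, warmup count, and percentile choices are assumptions for illustration, not details from this task):

```python
import statistics
import time

def benchmark(fetch, runs=50, warmup=5):
    """Call `fetch` repeatedly and return wall-clock latency stats (seconds)."""
    for _ in range(warmup):              # let caches settle before measuring
        fetch()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return {
        "p50": statistics.median(samples),
        "p99": statistics.quantiles(samples, n=100)[98],
        "mean": statistics.fmean(samples),
    }
```

A real run would pass a `fetch` that requests a Parsoid wt->html URL on the depooled server (e.g. via `urllib.request.urlopen`) and would compare the same stats across the old and new hardware.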

Event Timeline

ssastry triaged this task as High priority. Dec 8 2021, 12:37 AM
ssastry created this task.

In T269459#7522285, Tim has a table that compares Dodo performance against PHP DOM performance. That table indicates that with PHP DOM (which is what we are currently running in production), GC overhead on that particular page is about 4%, so there isn't much benefit to be gained by focusing on GC-related memory work. So, as long as memory availability on the test server is on par (or better) and CPU caches are also on par (or better), what we are really assessing is how much performance we can gain from a faster / newer CPU.
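That 4% figure bounds the possible win via Amdahl's law: even eliminating GC work entirely would speed the parse up by at most about 1.04x. A quick sketch of the arithmetic:

```python
def max_speedup(fraction, factor=float("inf")):
    """Amdahl's law: overall speedup when `fraction` of the runtime is
    sped up by `factor` (factor=inf means that work is eliminated)."""
    remaining = 0.0 if factor == float("inf") else fraction / factor
    return 1.0 / ((1.0 - fraction) + remaining)

# If GC is ~4% of runtime, removing it entirely is at most a ~4.2% win:
print(round(max_speedup(0.04), 3))  # -> 1.042
```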

On a temporary basis, the easiest option is to just depool one of the new appservers we just got in codfw and let you run whatever tests you'd like against it. The eqiad parsoid cluster hardware is from 2017 and codfw is from 2019, so I'd expect *some* immediate benefit, especially in eqiad.

T155645 has the current (2017) eqiad parsoid server specs, T231255 has current (2019) codfw parsoid server specs, and T271156 has the specs of the new codfw appservers. If you can't see those tickets I can email the details to you.

> On a temporary basis, the easiest option is to just depool one of the new appservers we just got in codfw and let you run whatever tests you'd like against it. The eqiad parsoid cluster hardware is from 2017 and codfw is from 2019, so I'd expect *some* immediate benefit, especially in eqiad.

That would be helpful! Thanks!

I would suggest that instead of trying a test server, we should focus on making parsoid tests run on kubernetes, which is where parsoid will be running soon.

Full logs are at https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-serviceops/20211208.txt - summary of today's IRC convo:

  • Goal is to understand performance of a single request (latencies), for that just using real hardware will be fine, regardless of the k8s future
  • We can also test PHP 7.2 vs 7.4 at the same time
  • ServiceOps needs a set of URLs to hit for the benchmarking script (https://gerrit.wikimedia.org/g/operations/software/benchmw)
  • There is a deadline of Dec 17th for Product's pre-annual-planning stuff
  • We are also interested in a baseline, so numbers on an eqiad cluster server, after depooling
  • I believe the Dec 17 deadline is tight, so there are no promises from your end that we'll meet it
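For the set of URLs mentioned above, one option is a small generator producing one Parsoid REST URL per (title, revid) pair. This is only an illustrative sketch; the `/w/rest.php/{domain}/v3/page/html/` path shape, the sample pages, and the revision IDs are assumptions, not the actual input given to benchmw:

```python
from urllib.parse import quote

def parsoid_urls(host, domain, pages):
    """One Parsoid wt->html URL per (title, revid) pair."""
    return [
        f"http://{host}/w/rest.php/{domain}/v3/page/html/"
        f"{quote(title, safe='')}/{revid}"
        for title, revid in pages
    ]

# Illustrative pages and revision IDs only:
for url in parsoid_urls("mw1456.eqiad.wmnet", "en.wikipedia.org",
                        [("Hospet", 12345), ("Barack Obama", 67890)]):
    print(url)
```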

Now that I think about it, maybe the Dec 17 deadline is not that relevant, since we are not going to get old servers for the parsoid cluster :-) ... you are going to buy the newest servers on the market, so we will get whatever performance boost they provide.

Benchmarking a current Parsoid server is straightforward: just depool it and start the script. Since all the new servers are non-Parsoid servers, we either need to switch the puppet role to role(parsoid) or emulate that. I'm not sure if we've tested flipping between appserver/parsoid roles like that before without a reimage. P18231 has the differences in hiera; we could just tack that onto one host's hiera (mw1456) to emulate role(parsoid) while still being an appserver.

I think the main value of not switching roles is that we don't have to adjust LVS config, which means we don't need to worry about conflicting with the ongoing LVS maintenance.

If we go that route, we'll also need to hack wmf-config to treat that host as $wmgServerGroup = 'parsoid';. That should raise the memory limit and enable the Parsoid REST endpoints on that host, which we'd undo before repooling it.

If feasible, it would be useful to get core parser performance on these revids as well (either on a baseline eqiad production server, or on the new test server). Thanks!

Do you mean something like https://en.wikipedia.org/w/api.php?format=json&action=parse&title=Hospet&text={{:Hospet}}? I'm not aware of a way to parse a specific revision ID using that API; is that okay?
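For reference, the query string in that example can be built programmatically; a small sketch whose parameters simply mirror the URL quoted above (the transclusion trick, `text={{:Title}}`, makes the core parser parse the page's current wikitext):

```python
from urllib.parse import urlencode

def core_parse_url(title, api="https://en.wikipedia.org/w/api.php"):
    """Build an action=parse URL that transcludes `title`, so the core
    parser renders the page's current revision."""
    params = urlencode({
        "format": "json",
        "action": "parse",
        "title": title,
        "text": "{{:%s}}" % title,
    })
    return f"{api}?{params}"

print(core_parse_url("Hospet"))
```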


> If we go that route, we'll also need to hack wmf-config to treat that host as $wmgServerGroup = 'parsoid';. That should raise the memory limit and enable the Parsoid REST endpoints on that host, which we'd undo before repooling it.

Parsoid endpoints should be enabled everywhere at this point (Tim enabled it sometime in Sep/Oct as part of performance debugging), so maybe you don't need to do anything special. Please verify, of course. :)


> Do you mean something like https://en.wikipedia.org/w/api.php?format=json&action=parse&title=Hospet&text={{:Hospet}}? I'm not aware of a way to parse a specific revision ID using that API; is that okay?

That is probably the only way available right now.

Change 747900 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] Pretend mw1456 is a parsoid appserver for benchmarking

https://gerrit.wikimedia.org/r/747900

Mentioned in SAL (#wikimedia-operations) [2021-12-16T19:51:28Z] <legoktm> depooling mw1456 for benchmarking (T297259)

Change 747900 merged by Legoktm:

[operations/puppet@production] Pretend mw1456 is a parsoid appserver for benchmarking

https://gerrit.wikimedia.org/r/747900

OK, mw1456 is depooled and should have PHP/envoy configured the same as parsoid servers do. Once the train rolls out, I'll start running it against that and then wtp1025.

Mentioned in SAL (#wikimedia-operations) [2021-12-17T02:07:04Z] <legoktm> depooling wtp1025 for benchmarking (T297259)

I've posted all the raw data and images at https://people.wikimedia.org/~legoktm/T297259/data/ and am still digging into it. From a quick eyeball of the graphs, I think we're looking at roughly 10%-18% faster with no other optimization/tuning.
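For readers following along with the posted data, the 10%-18% figure corresponds to a simple per-page latency reduction. A sketch of that calculation with illustrative numbers (not taken from the linked data set):

```python
def pct_faster(old_ms, new_ms):
    """Percent latency reduction going from old to new hardware."""
    return 100.0 * (old_ms - new_ms) / old_ms

# Illustrative: a page taking 1000 ms on the 2017 hardware and 850 ms
# on the new appserver is 15% faster.
print(round(pct_faster(1000.0, 850.0), 1))  # -> 15.0
```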

Thanks! Were you (are you) able to run the Parsoid vs. core parser performance benchmarks as well?

I just started the core perf benchmarking run.

All the raw results plus images are at https://people.wikimedia.org/~legoktm/T297259/data/. I believe @ssastry and co. are all set with this; please re-open if you need more info :)