
Log Parsoid server-side save performance
Closed, Resolved · Public · 0 Estimated Story Points

Description

Time-series graph showing save timing from 1/28 to 4/20

Compared to the start of February 2014, median user-perceived save time is up by approximately a full second, while save time at the 75th percentile is up ~4.5 seconds, from 6.5 to 11 seconds. Backend profiling data for action=visualeditoredit corroborates this and points to the backend as the source of the slowdown.

The attached graph can be manipulated in Graphite Composer at http://graphite.wikimedia.org/; look for it under User Graphs -> ori.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=66914

Attached:

ve-save-time.png (521×733 px, 78 KB)

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:21 AM
bzimport set Reference to bz64171.

Do we have data about page save time trends for wikitext saves? I ask because slowness on save in the backend can only really be in Parsoid or in code paths that are taken on every save.

Maybe the "backend profiling data" you're talking about can shed more light on this?

(In reply to Roan Kattouw from comment #1)

> Maybe the "backend profiling data" you're talking about can shed more light on this?

The backend profiling data is available in Graphite under the 'MediaWiki' namespace; it consists of aggregated wfProfile{In,Out} measurements collected on 2% of requests.
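
As a concrete sketch of what produces those measurements (wfProfileIn/wfProfileOut are real MediaWiki core functions; the surrounding function and its workload are illustrative assumptions):

```php
// Illustrative sketch: wfProfileIn/wfProfileOut are MediaWiki's real
// profiling pair; the function itself is invented for this example.
function doEditWork() {
	wfProfileIn( __METHOD__ ); // open a timed section named after the method
	// ... the work being measured ...
	wfProfileOut( __METHOD__ ); // close the section; durations are aggregated
	// and shipped to Graphite for the sampled 2% of requests
}
```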

Some of the useful metrics are MediaWiki.API.visualeditoredit, MediaWiki.API.edit, and MediaWiki.WikiPage.doEditContent. So far the data seems to suggest that the regression is Parsoid-related, but some additional analysis is required before that can be established definitively.

http://graphite.wikimedia.org/render/?width=947&height=521&_salt=1398121734.894&from=00%3A00_20140201&until=23%3A59_20140421&lineMode=connected&target=movingAverage(MediaWiki.API.visualeditoredit.tp90%2C300)&target=movingAverage(MediaWiki.WikiPage.doEditContent.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)

You have a pending change against Parsoid to add instrumentation, right? Mind linking that change here?

(In reply to Roan Kattouw from comment #3)

> You have a pending change against Parsoid to add instrumentation, right? Mind linking that change here?

https://gerrit.wikimedia.org/r/#/c/112647/

(In reply to Ori Livneh from comment #4)

> (In reply to Roan Kattouw from comment #3)
>
> > You have a pending change against Parsoid to add instrumentation, right? Mind linking that change here?
>
> https://gerrit.wikimedia.org/r/#/c/112647/

Note that the request timings are readily available in the VE extension. The internal Parsoid timings are returned by Parsoid in a header.
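
As a hedged sketch of how the VE extension could pick that header up from a Parsoid HTTP response (the header name X-Parsoid-Performance is an assumption from memory, and the log group is hypothetical):

```php
// Sketch only: the X-Parsoid-Performance header name and the
// 'parsoid-perf' log group are assumptions, not confirmed code.
$req = MWHttpRequest::factory( $parsoidUrl, [ 'method' => 'GET' ] );
$status = $req->execute();
if ( $status->isOK() ) {
	$timings = $req->getResponseHeader( 'X-Parsoid-Performance' );
	wfDebugLog( 'parsoid-perf', "Parsoid internal timings: $timings" );
}
```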

In [1], the visualeditoredit API endpoint does not seem to change much. My understanding is that this is the time for the entire save request, including Parsoid serialization. Is that correct?

[1]: http://graphite.wikimedia.org/render/?width=947&height=521&_salt=1398121734.894&from=00%3A00_20140101&until=23%3A59_20140421&lineMode=connected&target=movingAverage(MediaWiki.API.visualeditoredit.tp90%2C300)&target=movingAverage(MediaWiki.WikiPage.doEditContent.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300)&target=movingAverage(MediaWiki.API.edit.tp90%2C300

(In reply to Gabriel Wicke from comment #5)

> Note that the request timings are readily available in the VE extension. The internal Parsoid timings are returned by Parsoid in a header.

I should know -- I added this header. But as I recall, we agreed it'd be more practical to report this directly from Parsoid, which is why I submitted https://gerrit.wikimedia.org/r/#/c/112647/. We really need to get Parsoid performance data into Graphite. Could you either accept/amend my patch, or provide an alternate implementation?

As I said earlier, the most important metric is request timing to the Parsoid API, which includes the cache layers. Parsoid itself should not see most requests (they are served from cache), so it naturally can't report on them.

The information you are looking for can be readily measured by timing getHTML (line 85) and postHTML (line 152) in ApiVisualEditor.php.
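
A minimal sketch of that measurement, assuming the two method names above; the timing wrapper, arguments, and log group are illustrative rather than the extension's actual code:

```php
// Sketch: wrap the two Parsoid round-trips in ApiVisualEditor.php with
// wall-clock timers; $page and $parserParams are placeholders.
$start = microtime( true );
$html = $this->getHTML( $page, $parserParams );
wfDebugLog( 've-perf', sprintf( 'getHTML took %.3f s', microtime( true ) - $start ) );

$start = microtime( true );
$wikitext = $this->postHTML( $page, $html, $parserParams );
wfDebugLog( 've-perf', sprintf( 'postHTML took %.3f s', microtime( true ) - $start ) );
```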

Change 130543 had a related patch set uploaded by GWicke:
WIP Bug 64171: Log Parsoid request timings to eventlogging

https://gerrit.wikimedia.org/r/130543

I think the instrumentation in the WIP patch above should shed some light by breaking the save-pipeline time into Parsoid serialization time plus everything else.
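
For reference, a hedged sketch of what such an EventLogging call can look like from PHP (EventLogging::logEvent is the extension's real entry point; the schema name, revision id, and fields are invented and need not match the WIP patch):

```php
// Sketch only: 'VEParsoidSaveTiming' and its fields are hypothetical.
$schemaRevId = 1234567; // revision id of the schema page (placeholder)
EventLogging::logEvent( 'VEParsoidSaveTiming', $schemaRevId, [
	'parsoidMs' => $parsoidMs, // time spent in the Parsoid serialization call
	'totalMs'   => $totalMs,   // end-to-end save-pipeline time
] );
```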

In addition to the instrumentation in the VE extension to measure Parsoid serialization time, we could, on the Parsoid side, break our serialization pipeline down into (a) DOM-diff time, (b) Varnish hit/miss info, and (c) serialization time.

In round-trip testing on slower servers, where there is no DOM-diff and full serialization is performed (unlike the faster selective serialization in production), we found html2wt times to be consistently fast (< 3 s in almost all cases). Unfortunately, those numbers are currently buried in our rt-server database, which has grown too big. Once the current round of rt-testing finishes, I'll see whether I can access those times and post some numbers here.

On my laptop, a full serialization of the 3.8 MB Barack Obama page completed in ~4.2 s, and a DOM-diff of a lightly edited (~5-6 text edits) version of the Barack Obama HTML completed in ~2 s.

All that said, DOM-diff could still be a bottleneck in some heavy-edit scenarios -- we haven't really done any serious perf tests on that component. If this first-level breakdown of the time to get Parsoid HTML reveals that we need a finer breakdown of the Parsoid numbers as above, we can prioritize that and add instrumentation in Parsoid (or repurpose Ori's existing patch).

It's also worth keeping in mind that most of the user-perceived save time on large articles over typical DSL connections currently goes to the network transfer to the server.

The attached screenshot shows a save of the [[en:San Francisco]] article from a residential 16 Mbit DSL connection in SF: 23 s is spent uploading 3 MB of uncompressed HTML, and 7 s is spent waiting for and transferring the response. We already discussed this in https://bugzilla.wikimedia.org/show_bug.cgi?id=53093#c8. Since then the situation has improved slightly thanks to the use of multipart/form-data content encoding in VE POSTs.

We are currently working on removing data attributes from the HTML; they make up the bulk of the uncompressed HTML content due to excessive &quot; escaping. See https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs and bug 52936.

Created attachment 15292
Save request from VE for [[en:San Francisco]] from a 16 Mbit DSL connection in San Francisco. Most of the time is spent transferring the 3 MB of uncompressed HTML to the server.

Attached:

saverequest.png (209×490 px, 13 KB)

Upload compression was added in T68914.

Roan and I have discussed re-introducing change markers and only sending content which has changed.

If we over-cautiously mark child nodes of the document as changed whenever a transaction touches them, we would likely send only a few paragraphs of data for each edit; on large articles this would be a few percent of the total data.
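
A rough sketch of the extraction step, using PHP's DOMDocument purely for illustration; the data-ve-changed marker attribute is hypothetical, and the real logic would live in the VE client:

```php
// Illustration only: collect the top-level nodes that a transaction has
// marked as changed, so only those fragments go back to Parsoid.
$doc = new DOMDocument();
$doc->loadHTML( $editedHtml );
$changed = [];
$body = $doc->getElementsByTagName( 'body' )->item( 0 );
foreach ( $body->childNodes as $node ) {
	if ( $node instanceof DOMElement && $node->hasAttribute( 'data-ve-changed' ) ) {
		$changed[] = $doc->saveHTML( $node ); // a few paragraphs instead of megabytes
	}
}
```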

Jdforrester-WMF renamed this task from "VisualEditor: Page save time at 75th percentile up 70% from February" to "VE page save time at 75th percentile up 70% from February". Dec 2 2014, 9:39 PM
Jdforrester-WMF set Security to None.

> Upload compression was added in T68914.
>
> Roan and I have discussed re-introducing change markers and only sending content which has changed.
>
> If we over-cautiously mark child nodes of the document as changed whenever a transaction touches them, we would likely send only a few paragraphs of data for each edit; on large articles this would be a few percent of the total data.

That's T70042: When saving a page, only send back to Parsoid the parts of the page that have been changed.

J.

Mooeypoo lowered the priority of this task from High to Medium. Jan 9 2015, 12:51 AM

Change 130543 had a related patch set uploaded (by Alex Monk):
WIP: Log Parsoid request timings to EventLogging

https://gerrit.wikimedia.org/r/130543

Patch-For-Review

Jdforrester-WMF renamed this task from "VE page save time at 75th percentile up 70% from February" to "Log Parsoid server-side save performance". Feb 10 2015, 9:05 PM
Jdforrester-WMF edited a custom field.

Change 130543 abandoned by Alex Monk:
WIP: Log Parsoid request timings to EventLogging

Reason:
Looks like Parsoid are going to do https://gerrit.wikimedia.org/r/#/c/186219/

https://gerrit.wikimedia.org/r/130543

Change 186219 had a related patch set uploaded (by Arlolra):
Add performance instrumentation and port node-txstatsd wrapper to Parsoid

https://gerrit.wikimedia.org/r/186219

Patch-For-Review

Change 186219 merged by jenkins-bot:
Add perf. instrumentation and port node-txstatsd wrapper to Parsoid

https://gerrit.wikimedia.org/r/186219

Change 191475 had a related patch set uploaded (by Memeht):
Add txstatsd settings for beta labs and production

https://gerrit.wikimedia.org/r/191475

Patch-For-Review

Change 191475 merged by jenkins-bot:
Add txstatsd settings for beta labs and production

https://gerrit.wikimedia.org/r/191475

ssastry claimed this task.

Now deployed and live in production.