Record performance numbers during RT testing
Closed, Resolved (Public)

Assigned To

Authored By

	ssastry
	Mar 28 2013, 9:47 PM

Description

The RT testing setup should measure performance for individual pages and record the numbers in the database, so we have a sense of how Parsoid is performing (and whether it is improving or regressing).

The numbers could be split into (a) parse time, (b) serialize time, and (c) total time, plus anything else useful, and recorded in the db. We could also add web endpoints to display performance stats of various kinds, possibly highlighting significant perf. improvements or regressions, outliers in parse/serialize times, and anything else that would be useful.
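
As a rough illustration of the parse / serialize / total split, here is a minimal Python sketch. It is not Parsoid code: `parse` and `serialize` are hypothetical callables standing in for the wikitext-to-HTML and HTML-to-wikitext steps, and the stat names other than `time:parse` are assumptions.

```python
import time


def timed_round_trip(wikitext, parse, serialize):
    """Run one round trip and return (output, stats in milliseconds).

    `parse` and `serialize` are hypothetical stand-ins for the two
    Parsoid steps; they are passed in so the sketch stays self-contained.
    """
    t0 = time.perf_counter()
    html = parse(wikitext)
    t1 = time.perf_counter()
    output = serialize(html)
    t2 = time.perf_counter()

    stats = {
        "time:parse": (t1 - t0) * 1000.0,
        "time:serialize": (t2 - t1) * 1000.0,  # assumed key name
        "time:total": (t2 - t0) * 1000.0,      # assumed key name
    }
    return output, stats
```

The stats come back as a plain dict so that new measurements can be added later without changing the function's shape.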

Version: unspecified
Severity: normal

Details

Reference: bz46659

Related Objects

Status	Assigned	Task
Resolved	• marcoil	T48659 Record performance numbers during RT testing
Resolved	cscott	T46651 Add support for round-trip testing of non-English wiki pages
Resolved	• marcoil	T46652 Improve performance of round-trip test setup

Event Timeline

• bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:26 AM

• bzimport added a project: Parsoid-Tests.

• bzimport set Reference to bz46659.

ssastry created this task. Mar 28 2013, 9:47 PM

Since we need to change the schema for bug 44652 and bug 44651 too, it probably makes sense to tackle these three bugs at once. I'm therefore adding a cyclic dependency.

It would also be useful to collect stats on the number of template expansions, extensions and images in a page.

We've also been asked (by James_F) about the bandwidth implications of Parsoid markup. So we could collect, for each page, the size of the wikitext (raw and gzip-compressed) and the size of the corresponding Parsoid markup (raw, gzip-compressed, and gzip-compressed after stripping data-parsoid attributes).

The most important size number for production will really be:

gzip-compressed, without data-parsoid, but with unique id on each node.
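
The size measurements discussed above could be sketched roughly as follows, in Python. The regex-based stripping is only an approximation that assumes `data-parsoid` is serialized as a quoted attribute (a real implementation would more likely work on the DOM). The stat key names are hypothetical, and the "unique id on each node" variant is not covered here.

```python
import gzip
import re

# Assumption: data-parsoid appears as a single- or double-quoted attribute.
DATA_PARSOID_RE = re.compile(r"""\s+data-parsoid=(?:"[^"]*"|'[^']*')""")


def strip_data_parsoid(html):
    """Remove data-parsoid attributes from serialized HTML (approximate)."""
    return DATA_PARSOID_RE.sub("", html)


def gzip_size(text):
    """Size in bytes of the UTF-8 encoded text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


def size_stats(wikitext, html):
    """Collect raw and gzip-compressed sizes; key names are hypothetical."""
    return {
        "size:wt:raw": len(wikitext.encode("utf-8")),
        "size:wt:gzip": gzip_size(wikitext),
        "size:html:raw": len(html.encode("utf-8")),
        "size:html:gzip": gzip_size(html),
        "size:html:gzip-stripped": gzip_size(strip_data_parsoid(html)),
    }
```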

Change 78398 had a related patch set uploaded by Marcoil:
WIP - Record performance numbers during RT testing

https://gerrit.wikimedia.org/r/78398

The patch I've uploaded to Gerrit is still a Work In Progress, but the basic functionality is there.

Principal points:

The database schema: I've opted for a (key, value) table as that allows adding new stats without having to change the schema every time, which can be a pain.
The method of collecting the data is very rudimentary, but to start with I didn't want to dig too deeply into Parsoid's core. There's a pretty good library for collecting this kind of metrics at https://github.com/s2js/probes , but I'm not sure how compatible it would be with our code…
The data is passed by appending some new tags to the XML output, like:

<perfstats>
<perfstat type="time:parse">12345</perfstat>
</perfstats>

I'll appreciate comments and reviews :)
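
To make the (key, value) idea concrete, here is a small Python sketch that reads `perfstat` tags and stores them as rows. It is not the code from the Gerrit patch: the sample XML, the `page` column, and the use of SQLite in place of the real database are all assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample of the appended tags; the second stat is made up.
SAMPLE = """<perfstats>
<perfstat type="time:parse">12345</perfstat>
<perfstat type="time:serialize">678</perfstat>
</perfstats>"""


def extract_perfstats(xml_text):
    """Return a list of (type, value) pairs from a <perfstats> fragment."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), int(el.text)) for el in root.iter("perfstat")]


def store_perfstats(conn, page, commit_hash, stats):
    """Insert one row per stat into a (key, value) style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS perfstats ("
        " page TEXT, commit_hash TEXT, type TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO perfstats VALUES (?, ?, ?, ?)",
        [(page, commit_hash, key, value) for key, value in stats],
    )
```

A new stat type then only needs a new `type` string in the output, with no schema change.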

The (key,value) setup is the right solution -- yes, we want the flexibility of adding new stats without changing schema. I'll comment on specifics of the solution on the gerrit patch.

Collecting easy-to-collect stats is a good start for now so we can test this approach. There is no reason to instrument Parsoid's core until it becomes necessary. Fine-grained tracing of badly performing pages is more of a debugging exercise; for rt-testing, we want high-level numbers (parse, serialize, sizes, etc.). If it became necessary, we could consider splitting the parse times into parsoid-code time and api-wait time, but I don't see it as a requirement now.

IMO, probes are probably not necessary in rt-testing, but could be useful for perf. debugging or gathering finer grained metrics if desired.

Change 78398 merged by jenkins-bot:
Record performance numbers during RT testing

https://gerrit.wikimedia.org/r/78398

Additional work on perfstats UI will be tracked in a new bug or on IRC.

The flexibility of key/value storage comes at the price of some storage (and probably access) efficiency:

-rw-rw---- 1 mysql mysql 232M Aug 24 20:44 perfstats.ibd
-rw-rw---- 1 mysql mysql 396M Aug 24 20:44 results.ibd

We might need to optimize this a bit at some point.

Actually, the perfstats table size can easily be reduced 2-3x with the following fixes to the schema:

Add an id field to the commits table
At least in the perfstats table (and possibly some or all others as well), replace commit_hash with a commit_id
Add a stat_keys(id, key) and replace the type field in the perfstats table with stat_key_id field.
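
A rough sketch of what the three schema fixes above could look like, written in Python against SQLite purely for illustration (the production tables are in MySQL). The `page_id` and `value` columns and the helper names are assumptions; only `commits`, `commit_id`, `stat_keys(id, key)` and `stat_key_id` come from the list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE commits (
    id   INTEGER PRIMARY KEY,
    hash TEXT UNIQUE NOT NULL
);
CREATE TABLE stat_keys (
    id    INTEGER PRIMARY KEY,
    "key" TEXT UNIQUE NOT NULL
);
CREATE TABLE perfstats (
    page_id     INTEGER NOT NULL,  -- assumed column
    commit_id   INTEGER NOT NULL REFERENCES commits(id),
    stat_key_id INTEGER NOT NULL REFERENCES stat_keys(id),
    value       INTEGER NOT NULL   -- assumed column
);
"""


def get_or_create(conn, table, column, value):
    """Return the id of the row holding `value`, inserting it if missing.

    `table` and `column` are fixed internal names here, never user input.
    """
    conn.execute(
        f'INSERT OR IGNORE INTO {table} ("{column}") VALUES (?)', (value,)
    )
    row = conn.execute(
        f'SELECT id FROM {table} WHERE "{column}" = ?', (value,)
    ).fetchone()
    return row[0]


def record_stat(conn, page_id, commit_hash, key, value):
    """Store one stat using small integer ids instead of repeated strings."""
    commit_id = get_or_create(conn, "commits", "hash", commit_hash)
    key_id = get_or_create(conn, "stat_keys", "key", key)
    conn.execute(
        "INSERT INTO perfstats VALUES (?, ?, ?, ?)",
        (page_id, commit_id, key_id, value),
    )
```

Each perfstats row then carries two small integers rather than a full commit hash and a stat-name string, which is where the space saving would come from.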
