Page MenuHomePhabricator

Parsoid performance analysis
Closed, ResolvedPublic0 Estimate Story Points

Description

Tim to review Parsoid for performance. Especially interested in API queries.

Related Objects

Event Timeline

tstarling claimed this task.
tstarling raised the priority of this task from to Needs Triage.
tstarling triaged this task as Medium priority.
tstarling updated the task description. (Show Details)
tstarling added a project: MediaWiki-Core-Team.
tstarling moved this task to In Dev/Progress on the MediaWiki-Core-Team board.
tstarling set Security to None.
tstarling added a subscriber: tstarling.

Parsoid API requests account for about 44% of the request rate at the API cluster. Parsoid's action=expandtemplates accounts for 31% of total API wall clock time as accounted by api.log (which omits setup time).

Source data:

Baseline:

[0206][tstarling@fluorine:/a/mw-log]$ zcat archive/api.log-20150105.gz |  perl ~/api-stats.pl | head -n20
count          time (s)       action
587827         35468.292      [Total]
157230         11175.614      expandtemplates
125231         4025.265       query/imageinfo
37826          3308.852       query/pageimages
53135          3258.091       opensearch
22898          2040.357       parse
18490          1859.015       mobileview
45614          1567.970       query/revisions
40874          1516.924       query
1885           1134.007       stashedit
24732          701.929        wbgetclaims
4737           690.069        query/pageprops|pageimages
488            687.032        edit
3657           648.877        no-action
779            557.804        query/extlinks|categories|revisions|iwlinks|images|imageinfo|links|extracts
7959           466.527        wbgetentities
128            147.536        login
370            135.682        feedrecentchanges
5988           107.322        query/info

Filtered for parsoid client IPs:

[0208][tstarling@fluorine:/a/mw-log]$ zcat archive/api.log-20150105.gz | grep -Ff ~/parsoid-ips | perl ~/api-stats.pl | head -n20
count          time (s)       action
261575         13413.248      [Total]
156962         11151.541      expandtemplates
77088          1399.304       query/imageinfo
3601           458.264        parse
23849          397.740        query/revisions
75             6.399          query

Confirmation of lack of errors introduced by the shortcut of using grep -Ff on whole log lines instead of selecting the IP address field:

[0251][tstarling@fluorine:/a/mw-log]$ zcat archive/api.log-20150105.gz | awk '{print $8}' | grep -Ff ~/parsoid-ips | wc -l
261575
Krenair added a subscriber: Krenair.Jan 6 2015, 3:24 AM

The most convincing argument in favour of not batching seems to be for latency optimisation. But latency is irrelevant for ParsoidCacheUpdate jobs. On wtp1001, I captured about 9046 requests between 03:18 and 03:34, and of those, 8848 or 98% had an X-Parsoid header, indicating they originated from ParsoidCacheUpdate. Those requests could use large batch sizes with no effect on the end user. Job execution rate would be unaffected if Parsoid's --num-workers was retuned to give the same CPU utilisation.

Source data:

root@wtp1001:~# tcpdump -r ~/requests.pcap -A | grep ' HTTP/1\.1' | wc -l
reading from file /root/requests.pcap, link-type EN10MB (Ethernet)
9046
root@wtp1001:~# tcpdump -r ~/requests.pcap -A | grep 'GET /' | wc -l
reading from file /root/requests.pcap, link-type EN10MB (Ethernet)
9186
root@wtp1001:~# tcpdump -r ~/requests.pcap -A | grep 'X-Parsoid' | wc -l
reading from file /root/requests.pcap, link-type EN10MB (Ethernet)
8848

I don't think that dashboard thing is related.

marcoil added a subscriber: marcoil.Jan 6 2015, 3:18 PM
Memeht added a comment.EditedJan 7 2015, 9:21 PM

Heya, interesting results! Quick clarification: is api.log some sort of general logs or did you implement some performance instrumentation?

api.log already existed, it is the "api" log channel, generated by ApiMain::logRequest().

Restricted Application added a project: VisualEditor. · View Herald TranscriptJan 23 2015, 6:41 PM
tstarling closed this task as Resolved.Feb 10 2015, 11:24 PM
bd808 moved this task from Done to Archive on the MediaWiki-Core-Team board.Feb 17 2015, 11:44 PM
ori moved this task from Backlog to Done on the VisualEditor-Performance board.Feb 19 2015, 2:29 AM
Jdforrester-WMF moved this task from Blocked to Q4 on the VisualEditor board.Feb 21 2015, 12:36 AM