Page MenuHomePhabricator

Parsoid extremely slow to render a short article
Closed, ResolvedPublic

Description

Parsoid timeouts with the following (short) article:
http://parsoid-lb.eqiad.wikimedia.org/euwiki/Cebus?oldid=3949332

The PHP render works well:
https://eu.wikipedia.org/w/index.php?title=Cebus&oldid=3949332


Version: unspecified
Severity: normal

Details

Reference
bz72564

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:53 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz72564.

One information, my timeout configuration is at 120 seconds. If you wait longer you might get the HTML.

Interesting and rather odd.

I confirmed that using curl to get this page from Parsoid takes ~150 secs (It is now cached, and using curl returns the page rapidly -- so you have to force a refresh in the browser). But, if I take the wikitext and parse the wikitext via the parse.js script on the command-line, it renders in 5 secs. Could be the new timeout code?

(In reply to ssastry from comment #2)

Interesting and rather odd.

I confirmed that using curl to get this page from Parsoid takes ~150 secs
(It is now cached, and using curl returns the page rapidly -- so you have to
force a refresh in the browser). But, if I take the wikitext and parse the
wikitext via the parse.js script on the command-line, it renders in 5 secs.
Could be the new timeout code?

Oh, never mind, that new timeout code hasn't been deployed yet.

I can reproduce the behavior on localhost:8000/

Could be this (timeout fetching wikitext for the page).

[info][euwiki/Cebus?oldid=3949332] started parsing
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":8}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":7}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":6}
[info][euwiki/Cebus?oldid=3949332] started parsing
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":5}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":8}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":4}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":7}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":3}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":6}
[warning/api][euwiki/Cebus?oldid=3949332] Failed API request, {"error":{"code":"ETIMEDOUT"},"retries-remaining":2}

What I can add: it was the only one in a range of 50.000 euwiki articles generating this timeout error.

Here is my HTTP header dump for http://parsoid-lb.eqiad.wikimedia.org/euwiki/Cebus?oldid=3949332

GET /euwiki/Cebus?oldid=3949332 HTTP/1.1
Host: parsoid-lb.eqiad.wikimedia.org
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 0
Cookie: GeoIP=CH:Dubendorf:47.3954:8.6184:v4
Connection: keep-alive

HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Cache-Control: private,no-cache,s-maxage=0
content-revision-id: 3949332
X-Parsoid-Performance: duration=270362; start=1414501238071
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
X-Varnish: 497764938, 820175809
Via: 1.1 varnish, 1.1 varnish
Content-Length: 2646
Accept-Ranges: bytes
Date: Tue, 28 Oct 2014 13:05:08 GMT
Age: 0
Connection: keep-alive
X-Cache: cp1058 miss (0), cp1058 frontend miss (0)

You'll notice that the Taxotaula automatikoa template isn't being rendered in PHP there. The API response is,

{
  "expandtemplates": {
      "categories": [
          {
              "sortkey": "",
              "*": "Orrialdearen_espantsio_sakonera_gainditu_da"
          },
          {
              "sortkey": "",
              "*": "Kontuan_hartu_ez_diren_txantiloiak_dituzten_orrialdeak"
          },
          {
              "sortkey": "",
              "*": "Txantiloiaren_inklusio_tamaina_gainditu_den_orrialdeak"
          }
      ],
      "wikitext": "[[:Txantiloi:Taxotaula automatikoa]]<!-- WARNING: template omitted, post-expand include size too large -->"
  }
}

However, it takes ~40s to generate that response from the API. Parsoid has a timeout of 30 for template expansions. So, we repeatedly hit the timeout and retry 10 times.

Are these sane defaults on our part?

The other thing to notice is the arguments to that template,

{{taxotaula automatikoa
|display parents = 2
|display children = 1
|image = Capuchin Costa Rica.jpg
|image caption = ''[[Cebus capucinus]]''
|authority = [[Johann Christian Polycarp Erxleben|Erxleben]], 1777
|parents authority = [[Charles Lucien Bonaparte|Bonaparte]], 1831
}}

There's no taxon specified there. Adding an artificial one brings the parse time down to something reasonable.

Arlolra triaged this task as Medium priority.Nov 25 2014, 11:15 PM
Arlolra claimed this task.

These days, the rendering time seems acceptable.

Retries were reduced in https://gerrit.wikimedia.org/r/#/c/206362/