Citoid can't scrape this URL. In the past the request would time out, and restbase/citoid would return a 404 response for it.
Description
Event Timeline
Comment Actions
I tested the actual backend request to citoid
curl 'citoid.discovery.wmnet:1970/api?format=mediawiki-basefields&search=https%3A%2F%2Fwww.babycentre.co.uk%2Fc5112%2Fbefore-you-begin'
and that indeed hangs as well. I will dig deeper, but one question stands out: why is neither RESTBase nor Varnish timing out on its own?
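For anyone who wants to repeat the check without waiting for the request to hang indefinitely, here is a rough Python sketch of the same backend call with a client-side timeout. The host, port, and query parameters come from the curl command above; the `http://` scheme is assumed (curl's default when none is given), and `build_citoid_url`, `probe`, and the 10-second default are hypothetical names and values chosen for illustration, not anything citoid ships.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

# Host and port are taken from the curl command above; the http:// scheme
# is an assumption (curl falls back to HTTP when no scheme is given).
CITOID_API = "http://citoid.discovery.wmnet:1970/api"


def build_citoid_url(target, fmt="mediawiki-basefields", base=CITOID_API):
    """Return the citoid API URL that requests a citation for `target`."""
    query = urllib.parse.urlencode({"format": fmt, "search": target})
    return f"{base}?{query}"


def probe(url, timeout_s=10.0):
    """Fetch `url`, giving up after `timeout_s` seconds of socket inactivity.

    Returns a (outcome, detail) tuple: ("ok", status), ("http_error", status),
    ("timeout", None) or ("error", reason).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return ("ok", resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status (e.g. 404 or 504).
        return ("http_error", exc.code)
    except (socket.timeout, TimeoutError):
        return ("timeout", None)
    except urllib.error.URLError as exc:
        # Timeouts during connect are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ("timeout", None)
        return ("error", str(exc.reason))
    except OSError as exc:
        # E.g. the server closed the connection without sending a response.
        return ("error", str(exc))
```

Calling `probe(build_citoid_url("https://www.babycentre.co.uk/c5112/before-you-begin"))` from a host that can reach the service should then report a timeout quickly instead of hanging.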
Comment Actions
Correcting myself: in most cases, citoid times out after 120 seconds returning an empty response. Sometimes, it returns a 404.
Comment Actions
Couple of notes:
- It's broken in beta on citoid as well: curl 'https://citoid-beta.wmflabs.org/api?format=mediawiki&search=https%3A%2F%2Fwww.babycentre.co.uk%2Fc5112%2Fbefore-you-begin&basefields=true'
- RESTBase has a backend request timeout of 2 minutes and does 1 retry. I've verified that RESTBase running on my computer, pointed at beta citoid, returns a 504 after 4 minutes.
- Varnish is probably retrying 504s as well, which adds a multiplier to the 4-minute timeout.
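To make the arithmetic in the notes above concrete, here is a small Python sketch of how the waits stack up. The 120-second timeout and the single RESTBase retry are from the notes; the number of Varnish retries is not known, so the value used below is purely a placeholder, and `worst_case_seconds` and `end_to_end_seconds` are hypothetical helper names.

```python
def worst_case_seconds(timeout_s, retries):
    """Upper bound on one layer's wait: the first attempt plus each retry."""
    return timeout_s * (1 + retries)


def end_to_end_seconds(backend_timeout_s, restbase_retries, varnish_retries):
    """Upper bound when Varnish re-issues the whole RESTBase request."""
    restbase_wait = worst_case_seconds(backend_timeout_s, restbase_retries)
    return restbase_wait * (1 + varnish_retries)


# RESTBase alone: 2-minute timeout, 1 retry.
restbase_wait = worst_case_seconds(120, 1)
print(restbase_wait / 60, "minutes")

# With a hypothetical single Varnish retry on the 504 (actual count unknown).
print(end_to_end_seconds(120, 1, varnish_retries=1) / 60, "minutes")
```

With zero Varnish retries the bound stays at the 4 minutes observed locally; every additional Varnish retry adds another 4 minutes before the client sees a final response.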
Comment Actions
I caught an error in production:
{ "name": "citoid", "hostname": "citoid-production-76db86989b-8td54", "pid": 16, "level": 40, "err": { "message": "504: internal_http_error", "name": "citoid", "stack": "Error: socket hang up\n at createHangUpError (_http_client.js:253:15)\n at TLSSocket.socketOnEnd (_http_client.js:345:23)\n at emitNone (events.js:91:20)\n at TLSSocket.emit (events.js:185:7)\n at endReadableNT (_stream_readable.js:974:12)\n at _combinedTickCallback (internal/process/next_tick.js:80:11)\n at process._tickCallback (internal/process/next_tick.js:104:9)", "status": 504, "body": { "type": "internal_http_error", "description": "Error: socket hang up", "error": { "code": "ECONNRESET", "attempts": 2 }, "stack": "Error: socket hang up\n at createHangUpError (_http_client.js:253:15)\n at TLSSocket.socketOnEnd (_http_client.js:345:23)\n at emitNone (events.js:91:20)\n at TLSSocket.emit (events.js:185:7)\n at endReadableNT (_stream_readable.js:974:12)\n at _combinedTickCallback (internal/process/next_tick.js:80:11)\n at process._tickCallback (internal/process/next_tick.js:104:9)", "uri": "https://www.babycentre.co.uk/c5112/before-you-begin", "method": "get" }, "levelPath": "warn/scraper" }, "msg": "504: internal_http_error", "time": "2019-05-23T14:32:47.605Z", "v": 0 }
So it would seem to be an HTTPS connection error.
This is even more baffling because other HTTPS requests do not fail; they complete correctly.
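Since the interesting fields are buried in that log line, here is a short Python sketch that pulls them out. `LOG_LINE` is a trimmed copy of the entry quoted above with the stack traces omitted, and `summarize` is a hypothetical helper, not part of citoid.

```python
import json

# Trimmed copy of the production log entry quoted above (stack traces omitted).
LOG_LINE = (
    '{"name": "citoid", "level": 40, "err": {"status": 504, "body": '
    '{"type": "internal_http_error", "description": "Error: socket hang up", '
    '"error": {"code": "ECONNRESET", "attempts": 2}, '
    '"uri": "https://www.babycentre.co.uk/c5112/before-you-begin", '
    '"method": "get"}, "levelPath": "warn/scraper"}, '
    '"msg": "504: internal_http_error", "time": "2019-05-23T14:32:47.605Z", "v": 0}'
)


def summarize(line):
    """Extract the fields that describe the failed outbound request."""
    entry = json.loads(line)
    body = entry["err"]["body"]
    return {
        "time": entry["time"],
        "status": entry["err"]["status"],
        "uri": body["uri"],
        "code": body["error"]["code"],
        "attempts": body["error"]["attempts"],
    }


for key, value in summarize(LOG_LINE).items():
    print(f"{key}: {value}")
```

Read this way, the entry says the outbound GET to the target site ended in `ECONNRESET` with `attempts` at 2, and citoid reported that as a 504.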