
Some citoid requests aren't timing out and are pending indefinitely
Open, Needs Triage, Public

Description

Example: http://en.wikipedia.org/api/rest_v1/data/citation/mediawiki-basefields/https%3A%2F%2Fwww.babycentre.co.uk%2Fc5112%2Fbefore-you-begin

Citoid can't scrape this URL. In the past, the request would time out and restbase/citoid would return a 404 response for it.

Event Timeline

I tested the actual backend request to citoid

curl 'citoid.discovery.wmnet:1970/api?format=mediawiki-basefields&search=https%3A%2F%2Fwww.babycentre.co.uk%2Fc5112%2Fbefore-you-begin'

and that indeed hangs as well. I will dig deeper, but a question stands out: why is neither RESTBase nor Varnish timing out on its own?
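For anyone reproducing this without waiting on a hung connection, here is a minimal Python sketch of the same backend request with a client-side timeout. The `http://` scheme is an assumption (the curl command above gives none), `build_citoid_url` and `probe` are hypothetical helper names, and the `timeout` passed to `urlopen` bounds each blocking socket operation rather than the whole request, so it is only a rough guard.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: plain http, since the curl command above specifies no scheme.
CITOID_API = "http://citoid.discovery.wmnet:1970/api"


def build_citoid_url(target, fmt="mediawiki-basefields", base=CITOID_API):
    """Return the backend URL with the target percent-encoded in `search`."""
    query = urllib.parse.urlencode({"format": fmt, "search": target})
    return f"{base}?{query}"


def probe(url, timeout_s=10.0):
    """Hypothetical helper: report how the request ends instead of hanging."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error status, such as the 404 citoid sometimes returns.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers socket timeouts, connection resets, and URLError.
        return f"no response: {exc!r}"


if __name__ == "__main__":
    url = build_citoid_url("https://www.babycentre.co.uk/c5112/before-you-begin")
    print(url)
    print(probe(url))
```

Building the URL with `urlencode` reproduces the query string used in the curl command, so the two should hit the same endpoint.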

Correcting myself: in most cases, citoid times out after 120 seconds, returning an empty response. Sometimes, it returns a 404.

A couple of notes:

  1. It's broken in beta on citoid as well: curl 'https://citoid-beta.wmflabs.org/api?format=mediawiki&search=https%3A%2F%2Fwww.babycentre.co.uk%2Fc5112%2Fbefore-you-begin&basefields=true'
  2. RESTBase has a backend request timeout of 2 minutes and does 1 retry. I've verified that RESTBase running on my computer, pointed at beta citoid, returns a 504 after 4 minutes.
  3. Varnish is probably retrying 504s as well, which adds a multiplier to the 4 minutes timeout.
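The arithmetic behind notes 2 and 3 can be sketched as follows. The RESTBase figures (120 s timeout, 1 retry) come from the note above; the Varnish retry count is not known here, so the value of 1 below is purely a hypothetical placeholder to show how the multiplier compounds.

```python
def worst_case_seconds(timeout_s, retries):
    """Upper bound on the wait when every attempt runs into its timeout."""
    return timeout_s * (1 + retries)


# RESTBase: 2 minute backend timeout, 1 retry (from note 2).
restbase_s = worst_case_seconds(120, 1)

# Hypothetical: assume Varnish retries a 504 once. The real count is unknown.
varnish_retries = 1
end_to_end_s = worst_case_seconds(restbase_s, varnish_retries)

print(f"RESTBase worst case: {restbase_s / 60:.0f} min")
print(f"With {varnish_retries} Varnish retry: {end_to_end_s / 60:.0f} min")
```

Each retrying layer multiplies the layer below it, which is why a client can wait far longer than the 120 seconds citoid itself takes to give up.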

I caught an error in production:

{
  "name": "citoid",
  "hostname": "citoid-production-76db86989b-8td54",
  "pid": 16,
  "level": 40,
  "err": {
    "message": "504: internal_http_error",
    "name": "citoid",
    "stack": "Error: socket hang up\n    at createHangUpError (_http_client.js:253:15)\n    at TLSSocket.socketOnEnd (_http_client.js:345:23)\n    at emitNone (events.js:91:20)\n    at TLSSocket.emit (events.js:185:7)\n    at endReadableNT (_stream_readable.js:974:12)\n    at _combinedTickCallback (internal/process/next_tick.js:80:11)\n    at process._tickCallback (internal/process/next_tick.js:104:9)",
    "status": 504,
    "body": {
      "type": "internal_http_error",
      "description": "Error: socket hang up",
      "error": {
        "code": "ECONNRESET",
        "attempts": 2
      },
      "stack": "Error: socket hang up\n    at createHangUpError (_http_client.js:253:15)\n    at TLSSocket.socketOnEnd (_http_client.js:345:23)\n    at emitNone (events.js:91:20)\n    at TLSSocket.emit (events.js:185:7)\n    at endReadableNT (_stream_readable.js:974:12)\n    at _combinedTickCallback (internal/process/next_tick.js:80:11)\n    at process._tickCallback (internal/process/next_tick.js:104:9)",
      "uri": "https://www.babycentre.co.uk/c5112/before-you-begin",
      "method": "get"
    },
    "levelPath": "warn/scraper"
  },
  "msg": "504: internal_http_error",
  "time": "2019-05-23T14:32:47.605Z",
  "v": 0
}

So it would seem to be an HTTPS connection error.
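The fields that support this reading can be pulled out of the record programmatically. The sketch below assumes the record follows the bunyan log format (the numeric `level` and `"v": 0` fields suggest it) and uses a trimmed copy of the entry above; `summarize` is a hypothetical helper, not part of citoid.

```python
import json

# Standard bunyan numeric levels; 40 matches the "warn/scraper" levelPath above.
BUNYAN_LEVELS = {10: "trace", 20: "debug", 30: "info",
                 40: "warn", 50: "error", 60: "fatal"}


def summarize(record_json):
    """Extract the fields that identify the failure from one log record."""
    rec = json.loads(record_json)
    err = rec.get("err", {})
    body = err.get("body", {})
    error = body.get("error", {})
    return {
        "level": BUNYAN_LEVELS.get(rec.get("level"), "unknown"),
        "status": err.get("status"),
        "code": error.get("code"),
        "attempts": error.get("attempts"),
        "uri": body.get("uri"),
    }


# Trimmed copy of the production record, keeping only the fields used here.
SAMPLE = json.dumps({
    "level": 40,
    "err": {
        "status": 504,
        "body": {
            "error": {"code": "ECONNRESET", "attempts": 2},
            "uri": "https://www.babycentre.co.uk/c5112/before-you-begin",
        },
    },
})

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The `ECONNRESET` code together with `attempts: 2` indicates that the scraper's outbound request was reset by the remote end on both tries, which is consistent with the `TLSSocket` frames in the stack trace.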

This is even more baffling as it seems other https requests do not fail and complete correctly.

Anyway, this is clearly a citoid issue, as it can be reproduced in beta as well.

Volans subscribed.

Given the latest updates, removing Operations.