Page MenuHomePhabricator

changeprop logs nothing when a HTTP action fails
Closed, ResolvedPublic

Description

When testing out the lift-wing endpoints, we were getting silent failures even on a successful match because of an issue with us using the puppet ca bundle and therefore getting TLS errors. We didn't see errors in logs until trace was set, and this was only doable because we were in staging. Not only is this not an option in prod, even if it were we wouldn't catch the errors. This is one manifestation of our overall lack of visibility into changeprop logs and error logging in general.

Ideally this should be logged at ERROR level *and* should emit a metric.

This was the error in question:

{
	  "name": "changeprop-staging",
	  "hostname": "changeprop-staging-55b6f597d7-prlwv",
	  "pid": 141,
	  "level": "TRACE",
	  "message": "Processed event sample",
	  "event_str": "THE_BODY_REDACTED",
	  "request": {
	    "uri": "https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-goodfaith%3Apredict",
	    "headers": {
	      "host": "enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org",
	      "content-type": "application/json",
	      "x-request-id": "ee7f1f1f-4896-4ef1-bd24-32dc5028241d",
	      "x-triggered-by": "req:ee7ffff1f-1111-4ef1-bd24-32dc5028241d,mediawiki.revision-create:https://en.wikipedia.org/wiki/The_Embrace",
	      "user-agent": "ChangePropagation/WMF"
	    },
	    "body": "{THE_BODY_REDACTED}"
	  },
	  "response": {
	    "status": 504,
	    "headers": {
	      "content-type": "application/problem+json"
	    },
	    "body": {
	      "type": "internal_http_error",
	      "detail": "unable to get local issuer certificate",
	      "internalStack": "Error: unable to get local issuer certificate\n    at TLSSocket.onConnectSecure (_tls_wrap.js:1055:34)\n    at TLSSocket.emit (events.js:189:13)\n    at TLSSocket._finishInit (_tls_wrap.js:633:8)",
	      "internalURI": "https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-goodfaith%3Apredict",
	      "internalQuery": "{}",
	      "internalErr": "unable to get local issuer certificate",
	      "internalMethod": "post"
	    }
	  },
	  "stream": "mediawiki.revision-create",
	  "rule_name": "liftwing_goodfaith",
	  "executor": "RuleExecutor",
	  "levelPath": "trace/sample",
	  "msg": "Processed event sample",
	  "time": "2023-01-24T16:33:12.282Z",
	  "v": 0
	}