
scb2003 reports 'Internal error in changeprop'
Closed, Resolved, Public

Description

Description TBA, but we observed many errors coming from this host in:

https://logstash.wikimedia.org/goto/7b7aea332f2a58897aa63d2943eb69f1

TypeError: "value" argument is out of bounds
    at checkInt (buffer.js:1041:11)
    at Buffer.writeInt32BE (buffer.js:1244:5)
    at HTCPPurger._constructHTCPRequest (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/htcp-purge/index.js:98:16)
    at P.all.urls.map (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/htcp-purge/index.js:57:35)
    at Array.map (native)
    at HTCPPurger.purge (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/htcp-purge/index.js:56:27)
    at PurgeService.purge (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/src/sys/purge.js:37:28)
    at tryCatcher (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/bluebird/js/release/util.js:16:23)
    at /srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/bluebird/js/release/method.js:15:34
    at handlerWrapper (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/hyperswitch.js:422:37)
    at next (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/hyperswitch.js:408:42)
    at Object.module.exports [as filter] (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/filters/validator.js:272:12)
    at handlerWrapper (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/hyperswitch.js:420:27)
    at next (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/hyperswitch.js:408:42)
    at Object.module.exports [as filter] (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/filters/metrics.js:16:12)
    at handlerWrapper (/srv/deployment/changeprop/deploy-cache/revs/c25a1c25ca9eb8d4a7c1459add82849b16f665b3/node_modules/hyperswitch/lib/hyperswitch.js:420:27)

Event Timeline

jijiki triaged this task as Medium priority. Feb 1 2020, 5:26 PM

Mentioned in SAL (#wikimedia-operations) [2020-02-01T18:17:51Z] <effie> pool scb2003, no need for host to stay depooled - T244069

Oh! The core reason is that a counter has overflowed int32. Our software is so stable, we can overflow int32 now!

The HTCP purging protocol includes a sequence number for each request, written into an int32 field of the HTCP request. The counter is maintained in an in-memory variable that only ever grows. We need to reset the variable when we reach the int32 boundary.
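For anyone who wants to see the failure in isolation, here is a minimal sketch of what happens when the counter passes the int32 limit. The names `writeSequence` and `tryWrite` are made up for illustration and are not taken from htcp-purge; the only real API used is Node's `Buffer.writeInt32BE`. The error class differs between Node.js versions (the trace above shows a TypeError, newer versions throw a RangeError), so the sketch only reports that a write failed.

```javascript
// Illustrative reproduction only; function names are hypothetical,
// not identifiers from htcp-purge.
const INT32_MAX = 0x7fffffff; // 2147483647, largest value writeInt32BE accepts

function writeSequence(seqNum) {
    const buf = Buffer.alloc(4);
    buf.writeInt32BE(seqNum, 0); // throws once seqNum exceeds INT32_MAX
    return buf;
}

function tryWrite(seqNum) {
    try {
        writeSequence(seqNum);
        return 'ok';
    } catch (e) {
        // The error class depends on the Node.js version, so report its name.
        return e.name;
    }
}

const atLimit = tryWrite(INT32_MAX);       // still fits in the field
const pastLimit = tryWrite(INT32_MAX + 1); // one past the limit fails
console.log(atLimit, pastLimit);
```

A long-running process that increments the counter on every purge will eventually reach the second case, which matches the behaviour seen on this host.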

I've restarted the service on the host, which should fix the immediate issue.

Achievement unlocked.

Joe added a subscriber: Joe.

Maybe change it to use a 64 bit integer instead?

> Maybe change it to use a 64 bit integer instead?

The HTCP protocol only allows us 32 bits. If there were space in a datagram, I'd happily give you 64 bits @Joe. I'd give you 128 bits if I could!

However, the id in question only needs to be unique within the possible lifetime of a UDP datagram, so it's OK to roll over.
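A rollover along those lines could look roughly like the sketch below. This is an assumption about the general shape of such a fix, not the actual change made in htcp-purge; `makeSequence` and `MAX_SEQ` are hypothetical names. The counter hands out ids up to the int32 maximum and then restarts at 0, so every value it returns can be written with `Buffer.writeInt32BE`.

```javascript
// Sketch of a wrapping sequence counter; names are hypothetical and this
// is not the code from the htcp-purge pull request.
const MAX_SEQ = 0x7fffffff; // largest value that fits the int32 field

function makeSequence(start = 0) {
    let seq = start;
    return function next() {
        const current = seq;
        // Restart at 0 instead of growing past the int32 boundary.
        seq = seq >= MAX_SEQ ? 0 : seq + 1;
        return current;
    };
}

// Start just below the limit to show the rollover.
const next = makeSequence(MAX_SEQ - 1);
const ids = [next(), next(), next()];

// Every id produced can be written into the 4-byte field without throwing.
const buf = Buffer.alloc(4);
ids.forEach((id) => buf.writeInt32BE(id, 0));
console.log(ids); // [ 2147483646, 2147483647, 0 ]
```

Since ids only need to be unique while a datagram is in flight, reusing 0 after roughly two billion requests should be harmless.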

Changes to the library are tracked under https://github.com/wikimedia/htcp-purge/pull/9, so let's not close this until that's merged, at least so we don't lose track.

@Clarakosi has fixed the underlying issue, and htcp-purge@0.3.1 has been published. It will be deployed with the next change-prop deploy.