Page MenuHomePhabricator

Rollback DNS record creation on dynamicproxy registration failure
Open, MediumPublic

Description

Follow up suggestion from T288962: Cannot create web proxy because of Duplicate RecordSet

My sense making about what happened here is that the proxy was partially created by having it's DNS record setup but not getting the proper state data about that pushed into the wmcloud.org front proxy configuration. If you read the code that actually does this work you can see that the DNS record is created first and then the config change is sent to the front proxy for storage. There is an error handler block for the update to the config, but when it detects a failure it does not roll back the DNS record. This feels like a thing that could be improved.

[17:11]  <    bd808> bstorm: having looked at the code I don't think we even get to blame rabbit. This looks like a boring HTTP request failure talking to the dynamicproxy control API
[17:11]  <   bstorm> ah damn it
[17:11]  <   bstorm> I mean, that sounds like a fixable thing (possibly)
[17:12]  <   bstorm> We won't have logs if it's an error at dynamic proxy at this point.
[17:12]  <    bd808> The boring fix would be to rm the DNS record when there is a failure caught updating the dyncamicproxy config
[17:13]  <   bstorm> True. That's a boring yet entirely sensible fix
[17:13]  <    bd808> the gold plated fix would be to also add logic to check for an existing DNS entry before trying to create a new one
[17:14]  <   bstorm> And yell at you before you try it?
[17:14]  <    bd808> I guess I was thinking of making it work like an UPSERT

Event Timeline

It would be really useful if we are able to log an error with an id (request id?) anywhere, besides if it should try/retry the action or not. Now that we have central logging should be doable, making it more trackable and debuggable, unblocking some extra observability and possible future automation.

It would. We could probably find logs from the horizon side. The project-proxy end isn't on central logging, but if it was part of the problem (as we were guessing the other day), then it at least might leave some on-disk logs that might be easier to dig in with some kind of an ID in there. By default the Lua probably doesn't log anything but nginx might.

nskaggs triaged this task as Medium priority.Aug 23 2021, 2:19 PM