Our current worker_connections limit is too low, causing at least one outage. Tune that, and tune other things as needed.
Event Timeline
For the record, see https://wikitech.wikimedia.org/wiki/Incident_documentation/ToolsProxy20160823 and https://gerrit.wikimedia.org/r/#/c/297829/3/modules/dynamicproxy/templates/nginx.conf, which removed the override that raised worker_connections from 768 to 8096.
So I fell into a bit of a hole yesterday around this. nginx is running different versions on tools-static and tools-proxy: tools-static is on trusty because it was inheriting from the toollabs base class, which brings in *all* gridengine...
I got rid of the inheritance and recreated the instances on jessie (the new ones are tools-static-*, the old ones tools-web-static-*), but they only have an 80G disk, and the cdnjs clone nearly fills it because it keeps all of history.
So things that need to happen:
- Convert the cdnjs clone to a cron rather than from puppet directly
- Switch the floating IP assigned to tools-web-static-01 to tools-static-01
- Kill tools-web-static-* nodes
- Set up new worker_connections config...
Remember we can't switch the floating IP from horizon, since we need to control which IP gets assigned where. This needs to happen on labcontrol1001.
Change 306958 had a related patch set uploaded (by Madhuvishy):
Convert puppet clone of cdnjs to cron
Change 307621 had a related patch set uploaded (by Madhuvishy):
toollabs: Convert cdnjs pull cron command to one line
Change 307621 merged by Madhuvishy:
toollabs: Convert cdnjs pull cron command to one line
Mentioned in SAL [2016-09-06T22:10:29Z] <madhuvishy> Deleted instance tools-web-static-01 and tools-web-static-02 (T143637)
Change 309450 had a related patch set uploaded (by Madhuvishy):
dynamicproxy: Override nginx worker_connections default
Change 309450 merged by Madhuvishy:
dynamicproxy: Override nginx worker_connections default
Mentioned in SAL (#wikimedia-labs) [2016-09-13T21:09:48Z] <madhuvishy> Bumped proxy nginx worker_connections limit T143637
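For reference, an override of this kind might look roughly like the sketch below. This is an illustration only, not a copy of the merged dynamicproxy template: 8096 is the higher figure mentioned earlier in this task, while the worker_processes and worker_rlimit_nofile settings are assumptions added to make the fragment complete.

```nginx
# Illustrative sketch, not the actual dynamicproxy nginx.conf template.

# Assumption: one worker per CPU core.
worker_processes auto;

# Assumption: raise the per-worker open file limit so it is not
# lower than worker_connections; the exact value here is made up.
worker_rlimit_nofile 16384;

events {
    # 768 is the lower figure mentioned in this task; 8096 is the
    # overridden value it refers to.
    worker_connections 8096;
}
```

Note that worker_connections counts every connection a worker holds, including connections to proxied backends and not only client connections, so a reverse proxy will typically use more than one connection per client request.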