Beta cluster Varnish fails VCL compilation because citoid.wmflabs.org does not resolve
Closed, Resolved · Public · 0 Story Points

Description

Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: Error:
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: Message from VCC-compiler:
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: Backend host '"citoid.wmflabs.org"' could not be resolved to an IP address:
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: Name or service not known
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: (Sorry if that error message is gibberish.)
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: ('/etc/varnish/wikimedia-common_text-backend.inc.vcl' Line 99 Pos 17)
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: .host = "citoid.wmflabs.org";
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: ----------------####################-
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: Running VCC-compiler failed, exited with 2
Jun 21 11:28:12 deployment-cache-text04 varnishd[12163]: VCL compilation failed
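
A pre-flight check along these lines could catch the problem before varnishd reloads. Varnish resolves every backend `.host` when the VCL is compiled, so a single name that does not resolve (here citoid.wmflabs.org) fails the whole compilation. This is a hypothetical helper, not part of the puppet or Varnish tooling: `unresolvable_backends` and `fake_resolve` are illustrative names, and the resolver is injectable so the demo runs without network access.

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_backends(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the backend hosts that fail DNS resolution (hypothetical helper).

    Mirrors the VCC compiler's requirement: every backend .host must
    resolve to an IP address, otherwise VCL compilation aborts.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except (socket.gaierror, socket.herror):
            failed.append(host)
    return failed


def fake_resolve(host: str) -> str:
    """Stand-in resolver for the demo; 192.0.2.10 is a documentation IP, not real."""
    known = {"cxserver-beta.wmflabs.org": "192.0.2.10"}
    if host not in known:
        raise socket.gaierror(-2, "Name or service not known")
    return known[host]


if __name__ == "__main__":
    hosts = ["citoid.wmflabs.org", "cxserver-beta.wmflabs.org"]
    print(unresolvable_backends(hosts, resolve=fake_resolve))  # ['citoid.wmflabs.org']
```

Dropping the `resolve=` argument falls back to the system resolver, which is what varnishd itself would see on the cache host.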

Does that come from conftool?

hashar created this task. Jun 21 2017, 11:33 AM
Restricted Application added a subscriber: Aklapper. Jun 21 2017, 11:33 AM

Change 360639 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] (DO NOT SUBMIT) confftool: remove citoid to unbreak beta

https://gerrit.wikimedia.org/r/360639

I have cherry-picked https://gerrit.wikimedia.org/r/360639 on the beta cluster. Somehow that drops the citoid backend from the Varnish configuration, so Varnish now starts fine.

It seems most services on the beta cluster are exposed via the Labs web proxy as <service>-beta.wmflabs.org.

On the deployment-cache-text04 instance, /etc/varnish/wikimedia-common_text-backend.inc.vcl contains:

backend be_citoid_wmflabs_org {
        .host = "citoid.wmflabs.org";
        .port = "1970";
        .connect_timeout = 5s;
        .first_byte_timeout = 180s;
        .max_connections = 1000;
}


backend be_cxserver_beta_wmflabs_org {
        .host = "cxserver-beta.wmflabs.org";
        .port = "8080";
        .connect_timeout = 5s;
        .first_byte_timeout = 180s;
        .max_connections = 1000;
}

E.g. cxserver is set up properly (it points at cxserver-beta.wmflabs.org) but citoid is not. citoid should use the same trick.
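
The convention can be checked mechanically. A minimal sketch, assuming the `<service>-beta.wmflabs.org` pattern inferred from the working cxserver backend is the rule to follow; `follows_beta_convention` and `suggest_beta_host` are hypothetical names, not existing tooling.

```python
import re

# Hypothetical rule, inferred from the working cxserver backend:
# beta services are published as <service>-beta.wmflabs.org.
BETA_HOST_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*-beta\.wmflabs\.org")
WMFLABS_SUFFIX = ".wmflabs.org"


def follows_beta_convention(host: str) -> bool:
    """True if host looks like <service>-beta.wmflabs.org."""
    return BETA_HOST_RE.fullmatch(host) is not None


def suggest_beta_host(host: str) -> str:
    """Suggest the -beta proxy name for a plain <service>.wmflabs.org host."""
    if follows_beta_convention(host):
        return host
    if not host.endswith(WMFLABS_SUFFIX):
        raise ValueError(f"not a wmflabs.org host: {host!r}")
    service = host[: -len(WMFLABS_SUFFIX)]
    return f"{service}-beta{WMFLABS_SUFFIX}"


if __name__ == "__main__":
    for host in ("cxserver-beta.wmflabs.org", "citoid.wmflabs.org"):
        print(host, follows_beta_convention(host), suggest_beta_host(host))
```

For citoid this suggests citoid-beta.wmflabs.org, the same name that ends up in the Horizon Hiera fix.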

It turns out the Hiera configuration lives in Horizon (https://horizon.wikimedia.org/project/prefixpuppet/), under the deployment-cache-text prefix, which has:

cache::app_directors:
  citoid_backend:
    backends:
      eqiad: citoid.wmflabs.org
    be_opts:
      port: 1970
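
To see how this Hiera entry turns into the failing stanza, here is a rough Python model of the rendering. It is not puppet's actual template: the `be_` naming rule and the default timeouts are inferred from the two generated backends quoted earlier, the YAML is assumed to be loaded into a plain dict, and only the eqiad entry is used. Fed the Hiera data above, it reproduces the be_citoid_wmflabs_org stanza shown earlier.

```python
import re
from typing import Any, Dict

# Assumed defaults, copied from the generated stanzas on deployment-cache-text04;
# the real puppet template may set them differently.
DEFAULT_BE_OPTS: Dict[str, Any] = {
    "connect_timeout": "5s",
    "first_byte_timeout": "180s",
    "max_connections": 1000,
}

# The Horizon Hiera entry as a dict (assumes the YAML was loaded as-is).
APP_DIRECTORS: Dict[str, Any] = {
    "citoid_backend": {
        "backends": {"eqiad": "citoid.wmflabs.org"},
        "be_opts": {"port": 1970},
    },
}


def backend_name(host: str) -> str:
    """Derive the VCL backend name: dots and dashes become underscores."""
    return "be_" + re.sub(r"[.-]", "_", host)


def render_backend(host: str, be_opts: Dict[str, Any]) -> str:
    """Render one VCL backend stanza in the layout seen on the cache host."""
    opts = {**DEFAULT_BE_OPTS, **be_opts}
    return "\n".join([
        f"backend {backend_name(host)} {{",
        f'        .host = "{host}";',
        f'        .port = "{opts["port"]}";',
        f"        .connect_timeout = {opts['connect_timeout']};",
        f"        .first_byte_timeout = {opts['first_byte_timeout']};",
        f"        .max_connections = {opts['max_connections']};",
        "}",
    ])


def render_directors(app_directors: Dict[str, Any], dc: str = "eqiad") -> str:
    """Render a backend for each director's host in the given datacenter."""
    return "\n\n".join(
        render_backend(d["backends"][dc], d["be_opts"])
        for d in app_directors.values()
    )


if __name__ == "__main__":
    print(render_directors(APP_DIRECTORS))
```

Swapping the eqiad host for citoid-beta.wmflabs.org yields a backend named be_citoid_beta_wmflabs_org that points at a resolvable name, matching the cxserver pattern.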

Restricted Application added a project: VisualEditor. Jun 21 2017, 3:34 PM

Mentioned in SAL (#wikimedia-releng) [2017-06-21T15:35:48Z] <hashar> deployment-prep changing Varnish director for citoid from citoid.wmflabs.org to citoid-beta.wmflabs.org ( via https://horizon.wikimedia.org/project/prefixpuppet/ ) - T168519

Change 360639 abandoned by Hashar:
(DO NOT SUBMIT) confftool: remove citoid to unbreak beta

Reason:
fixed the rule in hiera

https://gerrit.wikimedia.org/r/360639

So Varnish is fixed. The leftover question is why the beta cluster Varnish cache uses the wmflabs web proxy as a backend when it could reach the instance directly.

https://horizon.wikimedia.org/project/proxy/ has entries such as:

citoid-beta.wmflabs.org -> http://10.68.20.183:1970
graphoid-beta.wmflabs.org -> http://10.68.20.183:19000
restbase-beta.wmflabs.org -> http://10.68.17.189:7231
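
Each proxy entry already names the instance address and port, so in principle Varnish could point at them directly. A small sketch that extracts those (ip, port) pairs from the three entries above; the `host -> url` text layout is only how the entries are written here, not a Horizon export format, and `direct_backends` is a hypothetical helper.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# The three proxy mappings listed above, one "public host -> backend URL" per line.
PROXY_ENTRIES = """\
citoid-beta.wmflabs.org -> http://10.68.20.183:1970
graphoid-beta.wmflabs.org -> http://10.68.20.183:19000
restbase-beta.wmflabs.org -> http://10.68.17.189:7231
"""


def direct_backends(entries: str) -> Dict[str, Tuple[str, int]]:
    """Map each public proxy hostname to the instance (ip, port) behind it."""
    result: Dict[str, Tuple[str, int]] = {}
    for line in entries.splitlines():
        if not line.strip():
            continue
        public_host, target = (part.strip() for part in line.split("->", 1))
        url = urlsplit(target)
        result[public_host] = (url.hostname, url.port)
    return result


if __name__ == "__main__":
    for host, (ip, port) in direct_backends(PROXY_ENTRIES).items():
        print(f"{host}: .host = \"{ip}\"; .port = \"{port}\";")
```

The trade-off, discussed below, is that hardcoding instance IPs in Hiera means every IP change needs a config update, whereas the proxy keeps that mapping in one place.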

And Varnish is configured via https://horizon.wikimedia.org/project/prefixpuppet/

mobrovac added a subscriber: mobrovac.

Since we use the Horizon proxy, can't we just remove these from puppet altogether?

hashar closed this task as Resolved. Jun 22 2017, 2:13 PM
hashar claimed this task.

What I suspect is that, with the proxy approach:

  • all the backends are publicly exposed
  • changing IP/backend can be done directly in horizon
  • it does not require a puppet patch + cherry pick

So that is probably good :]

Jdforrester-WMF set the point value for this task to 0.