Details
Status | Assigned | Task
---|---|---
Resolved | • mobrovac | T190266 Switch the Recommendation API to use the internal WDQS cluster
Resolved | Smalyshev | T178492 Create a more controlled WDQS cluster
Resolved | Gehel | T187766 Install / configure new WDQS servers
Resolved | Gehel | T187800 rack/setup/install wdqs200[4-6]
Resolved | • ayounsi | T188303 switch port configuration for wdq200[4-6]
Resolved | RobH | T182991 New WDQS clusters eqiad + codfw
Event Timeline
Change 415872 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] [WIP] wdqs: configure the new internal cluster
Change 416921 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: enable LDF server is now configurable
Change 416921 merged by Gehel:
[operations/puppet@production] wdqs: enable LDF server is now configurable
Change 416961 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: configure new servers wdqs200[4-6]
Change 417202 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: use the raid10-gpt-srv-lvm-ext4 partman config for new wdqs nodes
Change 415872 merged by Gehel:
[operations/puppet@production] wdqs: configure the new internal cluster
Change 417202 merged by Gehel:
[operations/puppet@production] wdqs: use the raid10-gpt-srv-lvm-ext4 partman config for new wdqs nodes
Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['wdqs2004.codfw.wmnet', 'wdqs2005.codfw.wmnet', 'wdqs2006.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201803090941_gehel_15241.log.
Change 416961 merged by Gehel:
[operations/puppet@production] wdqs: configure new servers wdqs200[4-6]
Change 417782 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: comment out wdqs_internal nodes from eqiad
Change 417782 merged by Gehel:
[operations/puppet@production] wdqs: comment out wdqs_internal nodes from eqiad
Completed auto-reimage of hosts:
['wdqs2006.codfw.wmnet']
Of which those FAILED:
['wdqs2006.codfw.wmnet']
Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['wdqs2006.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201803091054_gehel_31807.log.
Initial data import is in progress on wdqs200[456] (note that wdqs2006 has issues with its mgmt interface). The eqiad servers are not yet racked (T188432).
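For readers unfamiliar with the shorthand, `wdqs200[456]` and `wdqs200[4-6]` both refer to the three hosts listed in the reimage output above. A minimal sketch of that expansion follows; the helper name `expand_hosts` and the default domain argument are illustrative assumptions, not part of any tooling used on this task.

```python
import re
from typing import List


def expand_hosts(pattern: str, domain: str = "codfw.wmnet") -> List[str]:
    """Expand a single bracket group, e.g. 'wdqs200[4-6]' or 'wdqs200[456]'.

    Hypothetical helper: handles one group of single digits only.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: treat the pattern as a plain host name.
        return [f"{pattern}.{domain}"]
    prefix, body, suffix = match.groups()
    range_match = re.fullmatch(r"(\d)-(\d)", body)
    if range_match:
        # Range form such as [4-6]: inclusive on both ends.
        start, end = int(range_match.group(1)), int(range_match.group(2))
        digits = [str(d) for d in range(start, end + 1)]
    else:
        # Enumerated form such as [456]: one host per character.
        digits = list(body)
    return [f"{prefix}{d}{suffix}.{domain}" for d in digits]


print(expand_hosts("wdqs200[456]"))
```

Both notations expand to the same list, which is why they are used interchangeably in the comments and commit messages here.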
Change 419264 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: collect prometheus metrics for both wdqs clusters
Change 419264 merged by Gehel:
[operations/puppet@production] wdqs: collect prometheus metrics for both wdqs clusters
Change 419707 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: add pigz package
Change 424260 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: configure new servers wdqs100[6-8]
Change 424260 merged by Gehel:
[operations/puppet@production] wdqs: configure new servers wdqs100[6-8]
Change 424587 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/dns@master] wdqs: new wdqs-internal service
Change 424599 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: LVS and conftool configuration for new wdqs-internal service
Change 425051 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/dns@master] wdqs-internal: new entry for service discovery
Change 425051 abandoned by Gehel:
wdqs-internal: new entry for service discovery
Reason:
replaced by https://gerrit.wikimedia.org/r/#/c/424587/
Change 425275 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/deploy@master] add new wdqs-internal cluster to scap targets
Change 425275 merged by Smalyshev:
[wikidata/query/deploy@master] add new wdqs-internal cluster to scap targets
Change 424587 merged by Gehel:
[operations/dns@master] wdqs: new wdqs-internal service
Change 424599 merged by Gehel:
[operations/puppet@production] wdqs: LVS and conftool configuration for new wdqs-internal service
Change 426926 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs-internal: new service discovery entry
Change 426926 merged by Gehel:
[operations/puppet@production] wdqs-internal: new service discovery entry
Mentioned in SAL (#wikimedia-operations) [2018-04-16T14:25:38Z] <vgutierrez> restarting pybal on lvs2006 - T187766
Mentioned in SAL (#wikimedia-operations) [2018-04-16T14:42:00Z] <vgutierrez> restart pybal on lvs1006 - T187766
Mentioned in SAL (#wikimedia-operations) [2018-04-16T14:49:41Z] <vgutierrez> restart pybal on lvs2003 - T187766
Mentioned in SAL (#wikimedia-operations) [2018-04-16T14:53:13Z] <vgutierrez> restart pybal on lvs1003 - T187766
The DC-specific endpoints and the service discovery endpoint seem to work correctly:
- curl -s wdqs-internal.svc.eqiad.wmnet/readiness-probe
- curl -s wdqs-internal.svc.codfw.wmnet/readiness-probe
- curl -s wdqs-internal.discovery.wmnet/readiness-probe (<- this is the endpoint to use for any internal client)
I'd like @Smalyshev to have a look and validate that this all looks correct before we send real traffic...
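The three curl checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes plain HTTP (what curl uses when no scheme is given) and treats an HTTP 200 as "ready", which the comment above does not actually state. The function names `http_status` and `check_endpoints` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Host names taken from the curl commands above.
ENDPOINTS = [
    "wdqs-internal.svc.eqiad.wmnet",
    "wdqs-internal.svc.codfw.wmnet",
    "wdqs-internal.discovery.wmnet",  # the one internal clients should use
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except OSError:
        # DNS failure, connection refused, timeout, and similar.
        return 0


def check_endpoints(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Probe each endpoint; assumption: status 200 means the probe passed."""
    results = {}
    for host in ENDPOINTS:
        url = f"http://{host}/readiness-probe"
        results[host] = fetch(url) == 200
    return results
```

Calling `check_endpoints()` only makes sense from a host that can resolve the `.wmnet` names; elsewhere every entry comes back `False`. The `fetch` parameter exists so the logic can be exercised without network access.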
Change 427160 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: tune performance limits for the new wdqs-internal cluster
Change 427160 merged by Gehel:
[operations/puppet@production] wdqs: tune performance limits for the new wdqs-internal cluster