Create termbox release for test.wikidata.org
Closed, Resolved · Public

Description

It would be good to have a version of the termbox service that reads from test.wikidata.org.

@fsero pointed out that just tweaking a new values file would not be sufficient. Neither would just installing it on a different port.

Actually it needs:

  • DNS entries
  • LVS entries
  • wmf-service checks

I suggest that we use termbox-test as the name and 3031 as the port.

Details

Related Gerrit Patches:
operations/dns : master · Assign termbox-test.svc.{eqiad,codfw}.wmnet LVS IPs
operations/dns : master · k8s: introducing termbox-test.staging.svc.eqiad.wmnet
operations/deployment-charts : master · Add termbox-test release
operations/deployment-charts : master · Add termbox-test release
operations/puppet : production · termbox: add Kubernetes stanzas for test
operations/puppet : production · Introduce termbox-test LVS configuration
operations/dns : master · Enable discovery for termbox-test

Event Timeline

Tarrow created this task. Jun 28 2019, 10:09 AM
Restricted Application added a subscriber: Aklapper. Jun 28 2019, 10:09 AM
Tarrow claimed this task. Jul 9 2019, 8:39 AM
Tarrow edited projects, added Wikidata-Termbox-Iteration-19; removed Wikidata-Termbox.
Tarrow moved this task from To Do to Doing on the Wikidata-Termbox-Iteration-19 board.

Change 521449 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[operations/puppet@production] Introduce termbox-test LVS configuration

https://gerrit.wikimedia.org/r/521449

Change 521452 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[operations/puppet@production] termbox: add Kubernetes stanzas for test

https://gerrit.wikimedia.org/r/521452

Change 521456 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[operations/dns@master] Assign termbox-test.svc.{eqiad,codfw}.wmnet LVS IPs

https://gerrit.wikimedia.org/r/521456

Change 521459 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[operations/dns@master] Enable discovery for termbox-test

https://gerrit.wikimedia.org/r/521459

Tarrow updated the task description. Jul 9 2019, 11:04 AM
Tarrow added a subscriber: akosiaris.

I believe this patch is waiting on:

From IRC

<akosiaris> tarrow: sure, I am not sure we should be adding a new LVS service though
3:29 PM we've recently had some concerns about the scalability of it, I'll reach out to traffic to make sure

So, having looked into this: we don't really have LVS for testing services (as they don't really need high availability). In fact, we don't really have testing services in production at all. We are also running into some performance issues with pybal (the software that powers LVS automation), so we are trying not to push more LVS services into production until that is cleared up.

That being said, maybe we can address the need for a version of the service that uses test.wikidata.org with a different release, without having to go through LVS. The question to ask, though, is: will it be addressed by some other software in production, or just by testing tools?

This service would just be to serve test.wikidata.org.

I'm not clear whether you would see this only as a testing service (because it doesn't have any real users) or as a production service (because test.wikidata.org is a real wiki running on production hardware). The only software in production that will address it (as far as I can see) is MediaWiki.

I believe we need to keep test.wikidata.org as similar to www.wikidata.org as possible if it is to remain valuable as a test instance, so I think we need to have it configured to render the termbox via SSR.

I am not really worried about how HA it is but was just trying to strive for consistency with www.wikidata.org.

I've brought this up in the weekly SRE meeting. Overall there are a number of concerns. I'll list them below in no particular order:

  • There is no real precedent for having a service powering testwikis.
  • There are some pybal/LVS concerns about scaling and performance. Essentially, we won't be able to handle lots of LVS testing services, so we would like to avoid setting up LVS for this. Some of these concerns should be (and are being) handled, but there are also operational issues: LVS, for better or worse, is not something you want to change daily, and its configuration system is built with that in mind.
  • Adding LVS, which is a mechanism for achieving high availability, in front of a testing service makes little sense.
  • We do have a staging environment, which presumably could be used for this.

That being said, the use case is obviously useful, and it does make sense to find a way to support it.

With all that in mind, it probably makes more sense to create some DNS hostname like termbox-test.staging.svc.eqiad.wmnet pointing to the hosts powering the staging cluster and either:

  • Reuse the current staging termbox helm release, with some minor config changes to have it address test.wikidata.org
  • Add a new specific helm release which talks to test.wikidata.org.

@Tarrow, @Jakob_WMDE, how does this sound?

This sounds great! Actually:

... create some DNS hostname like termbox-test.staging.svc.eqiad.wmnet pointing to the hosts powering the staging cluster and ...
Add a new specific helm release which talks to test.wikidata.org.

is perfect for this need I think. It's almost what I wanted to do but didn't know how.

I think we can't reuse staging because otherwise we'll miss out on proper staging for www.wikidata.org.

I'm really keen to make progress on this so I'm happy to make the patches if that helps. I guess I have some questions:

  • How and where do I create termbox-test.staging.svc.eqiad.wmnet? CNAMEs to kubestage1001/kubestage1002?
  • How do I create the new helm release? Mostly by copying and pasting the staging helmfile.d files?
akosiaris triaged this task as High priority. Jul 19 2019, 8:43 AM
akosiaris moved this task from Backlog to Doing on the serviceops board.

Change 521459 abandoned by Alexandros Kosiaris:
Enable discovery for termbox-test

Reason:
Per https://phabricator.wikimedia.org/T226814#5346159

https://gerrit.wikimedia.org/r/521459

Change 521449 abandoned by Alexandros Kosiaris:
Introduce termbox-test LVS configuration

Reason:
Per https://phabricator.wikimedia.org/T226814#5346159

https://gerrit.wikimedia.org/r/521449

Change 521452 abandoned by Alexandros Kosiaris:
termbox: add Kubernetes stanzas for test

Reason:
Per https://phabricator.wikimedia.org/T226814#5346159

https://gerrit.wikimedia.org/r/521452

Change 524797 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/dns@master] k8s: introducing termbox-test.staging.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/524797

Thanks for the work and your thoughts on this so far @akosiaris and @fsero!
Would you be able to estimate when the new version of the service could be available? We'd gladly take the bad news, no need for optimistic estimates :)

The deployment to test.wikidata.org is part of the plan we've created for deploying our new feature to Wikidata, and this actually seems to be the only step that we're missing. If the deployment to test.wikidata.org (including the testwiki "version" of the service) is going to take some more weeks, we'd likely consider alternative deployment plans and timelines.

And to repeat what @Tarrow has mentioned above: if there is any work we could help with to get this done, please let us know.

Change 524817 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[operations/deployment-charts@master] Add termbox-test release

https://gerrit.wikimedia.org/r/524817

Change 524817 merged by Fsero:
[operations/deployment-charts@master] Add termbox-test release

https://gerrit.wikimedia.org/r/524817

Change 525054 had a related patch set uploaded (by Fsero; owner: Tarrow):
[operations/deployment-charts@master] Add termbox-test release

https://gerrit.wikimedia.org/r/525054

Change 525054 merged by Fsero:
[operations/deployment-charts@master] Add termbox-test release

https://gerrit.wikimedia.org/r/525054

Change 524797 merged by Fsero:
[operations/dns@master] k8s: introducing termbox-test.staging.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/524797

fsero closed this task as Resolved. Jul 23 2019, 10:06 AM

This has been deployed via the DNS artifact previously discussed.

fsero@deploy1001:~$ curl -v termbox-test.staging.svc.eqiad.wmnet:3031/_info
*   Trying 10.64.16.92...
* TCP_NODELAY set
* Connected to termbox-test.staging.svc.eqiad.wmnet (10.64.16.92) port 3031 (#0)
> GET /_info HTTP/1.1
> Host: termbox-test.staging.svc.eqiad.wmnet:3031
> User-Agent: curl/7.52.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: application/json; charset=utf-8
< Content-Length: 45
< ETag: W/"2d-1x8woC5uJVpEHsLEfQAm4TMbX4I"
< Vary: Accept-Encoding
< Date: Tue, 23 Jul 2019 10:05:39 GMT
< Connection: keep-alive
< 
* Curl_http_done: called premature == 0
* Connection #0 to host termbox-test.staging.svc.eqiad.wmnet left intact
{"name":"wikibase-termbox","version":"0.1.0"}
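For anyone who wants to repeat this check from a script rather than with curl, here is a minimal sketch in Python. The hostname, port, path and expected response body are taken from the transcript above; the function names `parse_info` and `fetch_info` are made up for this illustration and are not part of termbox or any Wikimedia tooling. `fetch_info` is only expected to work from a host that can reach the staging cluster, so the example below exercises only the parsing step against the body shown above.

```python
import json
from urllib.request import urlopen

# Hostname, port and path as used in the curl transcript above.
INFO_URL = "http://termbox-test.staging.svc.eqiad.wmnet:3031/_info"
# Service name reported by /_info in the transcript above.
EXPECTED_NAME = "wikibase-termbox"
# Response body copied from the transcript above.
SAMPLE_BODY = '{"name":"wikibase-termbox","version":"0.1.0"}'


def parse_info(body: str) -> dict:
    """Parse an /_info response body and check that it identifies termbox."""
    info = json.loads(body)
    if not isinstance(info, dict):
        raise ValueError("expected a JSON object")
    if info.get("name") != EXPECTED_NAME:
        raise ValueError(f"unexpected service name: {info.get('name')!r}")
    if "version" not in info:
        raise ValueError("response has no 'version' field")
    return info


def fetch_info(url: str = INFO_URL, timeout: float = 5.0) -> dict:
    """Fetch /_info over HTTP (hypothetical helper; needs network access to staging)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_info(response.read().decode("utf-8"))


# Offline example: validate the body captured in the transcript.
info = parse_info(SAMPLE_BODY)
print(info["name"], info["version"])
```

Keeping the parsing separate from the HTTP call means the validation can be tried anywhere, while `fetch_info()` would only be called from a deployment host such as the one used above.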

Change 521456 abandoned by Tarrow:
Assign termbox-test.svc.{eqiad,codfw}.wmnet LVS IPs

https://gerrit.wikimedia.org/r/521456