Expose wdqs1009 to wdqs users and gather feedback
Closed, Resolved · Public · 5 Estimated Story Points

Description

As a wdqs maintainer I would like to expose some test servers to wdqs users so that I can collect feedback on breaking/major changes.

There are a few changes on which we would like to gather feedback before moving forward:

  • T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS
  • T244590: Rework the WDQS updater as an event driven application

wdqs1009 has been reloaded with skolem IRIs instead of blank nodes and is close to being updated using the streaming updater.

AC:

  • a new service (name tbd but could be: query-test.wikidata.org) is available publicly

Event Timeline

Change 636420 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] [wdqs] switch wikibase:isSomeValue to skolem for wdqs1009

https://gerrit.wikimedia.org/r/636420

Change 636420 merged by Ryan Kemper:
[operations/puppet@production] [wdqs] switch wikibase:isSomeValue to skolem for wdqs1009

https://gerrit.wikimedia.org/r/636420

CBogen set the point value for this task to 5. (Nov 2 2020, 6:22 PM)

This task is about exposing wdqs1009 as a test server to the internet. This needs a new DNS entry, configuration of the caching / routing layer and communication with the traffic team.

RKemper triaged this task as Medium priority.

TODO from IRC meeting with bblack/gehel: create a DNS entry (CNAME to dyna.wm.o), another set of entries in backend.yaml map, create another minisite (with the appropriate configuration)

Change 668173 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: expose wdqs1009 externally

https://gerrit.wikimedia.org/r/668173

Posting logs of our IRC convo from ~1 month ago for context when I tag people for review:

[2021-02-04 11:31:36] <gehel> bblack: around ?
[2021-02-04 11:31:58] <gehel> I'm here with ryankemper to talk about T266470, unless you want to jump in a meet
[2021-02-04 11:31:59] <stashbot> T266470: Expose wdqs1009 to wdqs users and gather feedback - https://phabricator.wikimedia.org/T266470
[2021-02-04 11:32:51] <gehel> taking notes in https://etherpad.wikimedia.org/p/expose-wdqs1009
[2021-02-04 11:33:08] <bblack> gehel: hi :)
[2021-02-04 11:33:15] <gehel> o/
[2021-02-04 11:33:15] <bblack> ryankemper: hi too :)
[2021-02-04 11:33:23] <ryankemper> \o
[2021-02-04 11:33:40] <bblack> so, I guess my best understanding of the present stuff is what I see in:
[2021-02-04 11:33:47] <gehel> context: we want to expose wdqs1009 as a test server so that our users can make sure we're not breaking anything with the new WDQS updater
[2021-02-04 11:33:53] <bblack> https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/profile/trafficserver/backend.yaml#157
[2021-02-04 11:34:12] <bblack> there's several related definitions there for https://query.wikidata.org/
[2021-02-04 11:34:24] <bblack> for different subspaces of the URI routing to different backends
[2021-02-04 11:34:40] <bblack>  /bigdata/ldf goes explicitly to wdqs1005
[2021-02-04 11:35:07] <bblack>  /bigdata and /sparql both go to the discovery service to a set of wdqsNNNN
[2021-02-04 11:35:25] <bblack> but the root url hits our misc webapps setup, for the JS powering the query UI
[2021-02-04 11:35:29] <gehel> yep, somewhat similar, but in this case, we want to have a different FQDN (probably query-preview.wikidata.org or something similar) and route to a different server that isn't part of the current WDQS pools
[2021-02-04 11:36:45] <bblack> so in this case, I would guess, you'd want no special exception for /bigdata/ldf, and you'd want all of /bigdata and /sparql -> wdqs1009
[2021-02-04 11:37:06] <bblack> which is something we can just define in that file I linked, and adding a name at the DNS layer, and done
[2021-02-04 11:37:16] <bblack> but the root URL for the UI part, might be a little trickier
[2021-02-04 11:37:39] <bblack> I need to see where that's defined in git, and it might need a new site definition, and to template in a new place to send the UI queries...
[2021-02-04 11:37:57] <bblack> but there's no other special requirements I think/expect?
[2021-02-04 11:37:57] <gehel> we'll probably want to create another microsite for the JS part, that's fairly straightforward
[2021-02-04 11:38:15] <ryankemper> no other special requirements IIRC
[2021-02-04 11:38:17] <gehel> yep, that seems fairly simple to me
[2021-02-04 11:38:22] <gehel> so to resume:
[2021-02-04 11:39:08] <gehel> create a DNS entry (CNAME to dyna.wm.o), another set of entries in backend.yaml map, create another minisite (with the appropriate configuration)
[2021-02-04 11:39:22] <gehel> do we need to do anything for SSL certs? Or is that fully automated?
[2021-02-04 11:39:48] <bblack> that part's all shared between everything that runs through the cache, so long as you stick to our canonical domains and just change the leading hostname part
[2021-02-04 11:40:04] <bblack> to e.g. query-preview.wikidata.org
[2021-02-04 11:40:05] <gehel> "canonical" includes wikidata.org?
[2021-02-04 11:40:11] <gehel> ok, good!
[2021-02-04 11:40:37] <bblack> yeah for reference, the canonical set is: https://wikitech.wikimedia.org/wiki/HTTPS#For_the_Foundation's_canonical_domain_names
[2021-02-04 11:41:13] <gehel> ok, one last stupid and off topic question: what does "dyna" stand for in dyna.wm.o?
[2021-02-04 11:42:14] <bblack> it's not a "real" DNS RR-type you would see in the wild.  It's a custom thing our authserver implements which provides dynamic A and AAAA records depending on geodns.  that's how we route to the closest of our geographic edges for users
[2021-02-04 11:42:36] <bblack> https://github.com/gdnsd/gdnsd/wiki/GdnsdPluginGeoip
[2021-02-04 11:42:41] <bblack> ^ that, basically
[2021-02-04 11:42:57] <gehel> thanks!
[2021-02-04 11:43:23] <bblack> templates/wikimedia.org:dyna            600 IN DYNA geoip!text-addrs
[2021-02-04 11:43:36] <bblack> ^ is what all the other production CNAMEs are pointing towards
[2021-02-04 11:43:54] <bblack> gehel: also I don't know if you've created new service hostnames since the move to netbox
[2021-02-04 11:44:03] <bblack> but we don't do commits on the ops/dns repo for that kind of thing anymore
[2021-02-04 11:44:25] <gehel> Oh no, I haven't
[2021-02-04 11:44:33] <gehel> ryankemper: is the one who's going to do it
[2021-02-04 11:45:00] <gehel> I assume it's documented
[2021-02-04 11:45:01] <bblack> oh wait, I'm getting ahead of ourselves
[2021-02-04 11:45:03] — gehel is searching
[2021-02-04 11:45:13] <bblack> that part still is done in ops/dns ! :)
[2021-02-04 11:45:25] <gehel> now I'm curious anyway!
[2021-02-04 11:45:43] <bblack> https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/refs/heads/master/templates/wikidata.org#35
[2021-02-04 11:45:49] <bblack> ^ is where the new hostname goes for DNS
[2021-02-04 11:46:52] <bblack> [ https://wikitech.wikimedia.org/wiki/DNS/Netbox if you want to be curious about how Netbox is used to automate a lot of other DNS records]
[2021-02-04 11:47:25] <gehel> cool! no more looking for a free IP!
[2021-02-04 11:48:16] <bblack> ryankemper: is that enough info to go on to propose patches? are there parts we need to help more with?
[2021-02-04 11:48:29] <gehel> if we want to prepare everything and turn it on when ready, we can already prepare the microsite, the mapping and only publish the DNS entry when everything is working
[2021-02-04 11:48:37] <ryankemper> bblack: no I think that's a great starting point, thanks for the context!
[2021-02-04 11:48:57] <gehel> anything special about merging the mapping?
[2021-02-04 11:48:57] <ryankemper> wasn't aware of the canonical hostname stuff so glad to know that it's already automated
[2021-02-04 11:48:58] <bblack> yeah probably push the microsite update first, then the backend.yaml change in puppet, then the DNS entry last
[2021-02-04 11:49:14] <gehel> we'll ping you when we're ready!
[2021-02-04 11:49:21] <bblack> ok sounds good
[2021-02-04 11:49:22] <ryankemper> thanks
[2021-02-04 11:49:28] <gehel> thanks for the help!
[2021-02-04 11:49:55] <bblack> np!

Change 668255 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/dns@master] wdqs: new service alias query-preview

https://gerrit.wikimedia.org/r/668255

Change 668543 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: new query-preview microsite

https://gerrit.wikimedia.org/r/668543

Change 668543 merged by Ryan Kemper:
[operations/puppet@production] wdqs: new query-preview microsite

https://gerrit.wikimedia.org/r/668543

Mentioned in SAL (#wikimedia-operations) [2021-03-05T00:39:34Z] <ryankemper> T266470 Ran sudo run-puppet-agent on miscweb1002 without issue; /var/log/apache2/query*.log looks as expected

Change 668173 merged by Ryan Kemper:
[operations/puppet@production] wdqs: expose wdqs1009 externally

https://gerrit.wikimedia.org/r/668173

Mentioned in SAL (#wikimedia-operations) [2021-03-05T00:47:02Z] <ryankemper> T266470 [ats] Deploying new mappings for query-preview.wikidata.org microsite: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668173/

Here's how the ATS mapping looks after deploying the backend.yaml changes:

ryankemper@cp1075:~$ sudo cat /etc/trafficserver/remap.config | grep query
map http://query-preview.wikidata.org/bigdata https://wdqs1009.eqiad.wmnet/bigdata
map http://query-preview.wikidata.org/sparql https://wdqs1009.eqiad.wmnet/sparql
map http://query-preview.wikidata.org https://webserver-misc-apps.discovery.wmnet
map http://query.wikidata.org/bigdata/ldf https://wdqs1005.eqiad.wmnet/bigdata/ldf
map http://query.wikidata.org/sparql https://wdqs.discovery.wmnet/sparql
map http://query.wikidata.org/bigdata https://wdqs.discovery.wmnet/bigdata
map http://query.wikidata.org https://webserver-misc-apps.discovery.wmnet
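To make the effect of these rules concrete, here is a minimal Python sketch of how a request URL would be routed. It is not how Apache Traffic Server is implemented. It assumes that rules are tried in file order, that the host must match exactly, that the path is matched as a plain string prefix, and that the remainder of the path and the query string are appended to the target. The function name resolve is hypothetical.

```python
from urllib.parse import urlsplit

# Rules copied from the remap.config output above, in file order.
REMAP_RULES = [
    ("http://query-preview.wikidata.org/bigdata", "https://wdqs1009.eqiad.wmnet/bigdata"),
    ("http://query-preview.wikidata.org/sparql", "https://wdqs1009.eqiad.wmnet/sparql"),
    ("http://query-preview.wikidata.org", "https://webserver-misc-apps.discovery.wmnet"),
    ("http://query.wikidata.org/bigdata/ldf", "https://wdqs1005.eqiad.wmnet/bigdata/ldf"),
    ("http://query.wikidata.org/sparql", "https://wdqs.discovery.wmnet/sparql"),
    ("http://query.wikidata.org/bigdata", "https://wdqs.discovery.wmnet/bigdata"),
    ("http://query.wikidata.org", "https://webserver-misc-apps.discovery.wmnet"),
]


def resolve(url, rules=REMAP_RULES):
    """Return the backend URL for `url`, or None if no rule matches.

    Assumption (not verified against ATS): the first rule in file order
    whose host is equal and whose path is a string prefix of the request
    path wins. The scheme is ignored in this sketch.
    """
    req = urlsplit(url)
    for source, target in rules:
        src = urlsplit(source)
        if req.hostname != src.hostname:
            continue
        if not req.path.startswith(src.path):
            continue
        # Keep whatever follows the matched prefix, plus the query string.
        rest = req.path[len(src.path):]
        query = "?" + req.query if req.query else ""
        return target + rest + query
    return None


if __name__ == "__main__":
    for u in (
        "http://query-preview.wikidata.org/sparql?query=ASK%7B%7D",
        "http://query-preview.wikidata.org/",
        "http://query.wikidata.org/bigdata/ldf",
    ):
        print(u, "->", resolve(u))
```

Under those assumptions, SPARQL traffic for query-preview.wikidata.org lands on wdqs1009 while the root URL still goes to the shared misc-apps webserver, which is why the separate microsite was needed.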

Mentioned in SAL (#wikimedia-operations) [2021-03-05T00:50:39Z] <ryankemper> T266470 [ats] sudo cumin 'A:cp-ats' 'sudo run-puppet-agent'

Change 669972 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] New cert for webserver-misc-apps

https://gerrit.wikimedia.org/r/669972

Change 669972 merged by BBlack:
[operations/puppet@production] New cert for webserver-misc-apps

https://gerrit.wikimedia.org/r/669972

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:52:47Z] <ryankemper> T266470 Temporarily disabling puppet on all wdqs* hosts in preparation for wdqs.discovery.wmnet certificate revocation

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:53:38Z] <ryankemper> T266470 ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:55:08Z] <ryankemper> T266470 Certificate revoked: ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean wdqs.discovery.wmnet

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:56:08Z] <ryankemper> T266470 In the /srv/private repo, /srv/private/modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml has been edited to add the relevant alt_names

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:57:18Z] <ryankemper> T266470 sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks (full paths not provided to fit the IRC line)

Mentioned in SAL (#wikimedia-operations) [2021-03-10T04:58:17Z] <ryankemper> T266470 The above two actions mean that we're ready to generate the new certificate files. Proceeding: sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d on ryankemper@puppetmaster1001:/srv/private

Certificate(wdqs.discovery.wmnet, authorities=[PuppetCA(puppetmaster1001.eqiad.wmnet_8140)]):
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.key.private.pem: PRESENT (mtime: 2019-10-21T08:11:52.741452)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.key.public.pem: PRESENT (mtime: 2019-10-21T08:11:52.745452)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem: PRESENT (mtime: 2021-03-10T04:58:24.029629)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/ca.crt.pem: PRESENT (mtime: 2019-12-10T14:26:42.496697)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12: PRESENT (mtime: 2021-03-10T04:58:24.049629)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks: PRESENT (mtime: 2021-03-10T04:58:25.893621)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/truststore.jks: PRESENT (mtime: 2021-03-10T04:58:26.865617)


ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/truststore.jks
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12

no changes added to commit (use "git add" and/or "git commit -a")

Relevant status after running cergen. (The git commit has not been created yet, as the dirty working directory shows.)

Change 670337 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: update cert for query-preview.wikidata.org

https://gerrit.wikimedia.org/r/670337

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:06:47Z] <ryankemper> T266470 New wdqs.discovery.wmnet.crt added to public operations/puppet repo: https://gerrit.wikimedia.org/r/c/operations/puppet/+/670337/

Change 670337 merged by Ryan Kemper:
[operations/puppet@production] wdqs: update cert for query-preview.wikidata.org

https://gerrit.wikimedia.org/r/670337

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:13:29Z] <ryankemper> T266470 [/srv/private] chown gitpuppet:gitpuppet on all modified files (were owned by root, probably because I sudo'd - may be that a git commit hook would have caught that but explicitly chowning just to be safe)

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:15:25Z] <ryankemper> T266470 [/srv/private] All changes committed to private git repo, commit SHA ec1d6cfae8c72e4f807b343cdb9f25c27817d98d

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:18:15Z] <ryankemper> Enabling puppet on single public wdqs host to verify certificate update is without issue: ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root" followed by ryankemper@wdqs1004:~$ sudo run-puppet-agent

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:20:55Z] <ryankemper> T266470 Enabled puppet on single public wdqs host to verify certificate update is without issue: ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root" followed by ryankemper@wdqs1004:~$ sudo run-puppet-agent

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:24:33Z] <ryankemper> T266470 Test queries passing on wdqs1004, and https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs&from=now-1h&to=now looks as expected. Proceeding to rest of fleet

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:26:29Z] <ryankemper> T266470 ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"' and ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'

Mentioned in SAL (#wikimedia-operations) [2021-03-10T05:27:34Z] <ryankemper> T266470 Rollout of updated certificate complete. We're now ready to implement envoy for wdqs-test which will allow wdqs1009 to be reachable via port 443 and thereby allow us to go live with query-preview.wikidata.org when the time comes
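The rollout order used in the SAL entries above (puppet disabled fleet-wide, re-enabled on wdqs1004 first, test queries and Grafana checked, then the rest of the fleet) can be sketched in Python as follows. This is only an illustration of the pattern, not tooling that exists in the repo: staged_rollout, apply and verify are hypothetical names, and the host list in the usage example is illustrative rather than the real A:wdqs-all membership.

```python
def staged_rollout(hosts, canary, apply, verify):
    """Apply a change to `canary` first, verify it, then do the other hosts.

    `apply(host)` stands in for re-enabling puppet and running the agent;
    `verify(host)` stands in for the manual checks (test queries, dashboards).
    Returns the hosts in the order they were changed.
    """
    if canary not in hosts:
        raise ValueError(f"canary {canary!r} is not in the host list")

    changed = []
    apply(canary)
    changed.append(canary)
    if not verify(canary):
        # Stop here so the rest of the fleet keeps the old state.
        raise RuntimeError(f"canary {canary} failed verification; stopping")

    for host in hosts:
        if host == canary:
            continue
        apply(host)
        changed.append(host)
    return changed


if __name__ == "__main__":
    fleet = ["wdqs1004", "wdqs1005", "wdqs1009"]  # illustrative subset
    order = staged_rollout(
        fleet,
        canary="wdqs1004",
        apply=lambda h: print("enable puppet and run agent on", h),
        verify=lambda h: True,  # placeholder for the real checks
    )
    print(order)
```

The point of the design is that a bad certificate only ever reaches one host before a human has looked at it.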

Change 670339 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: impl. envoy for wdqs-test

https://gerrit.wikimedia.org/r/670339

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:14:18Z] <ryankemper> T266470 Temporarily disabling puppet on all wdqs* hosts in preparation for wdqs.discovery.wmnet certificate revocation

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:14:27Z] <ryankemper> T266470 on ryankemper@cumin1001: sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:15:43Z] <ryankemper> T266470 sudo puppet cert clean wdqs.discovery.wmnet

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:16:45Z] <ryankemper> T266470 sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks (full paths not provided to fit the IRC line)

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:18:17Z] <ryankemper> T266470 sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d

Certificate(wdqs.discovery.wmnet, authorities=[PuppetCA(puppetmaster1001.eqiad.wmnet_8140)]):
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.key.private.pem: PRESENT (mtime: 2019-10-21T08:11:52.741452)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.key.public.pem: PRESENT (mtime: 2019-10-21T08:11:52.745452)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem: PRESENT (mtime: 2021-03-10T19:17:36.206506)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/ca.crt.pem: PRESENT (mtime: 2019-12-10T14:26:42.496697)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12: PRESENT (mtime: 2021-03-10T19:17:36.226506)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks: PRESENT (mtime: 2021-03-10T19:17:38.030503)
        /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/truststore.jks: PRESENT (mtime: 2021-03-10T19:17:38.938502)


ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/truststore.jks
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks
        modified:   modules/secret/secrets/certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12

Change 670562 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: revert wdqs.discovery.wmnet changes

https://gerrit.wikimedia.org/r/670562

Change 670562 merged by Ryan Kemper:
[operations/puppet@production] wdqs: revert wdqs.discovery.wmnet changes

https://gerrit.wikimedia.org/r/670562

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:26:23Z] <ryankemper> T266470 sudo chown -Rv gitpuppet:gitpuppet /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/ && sudo chown -v gitpuppet:gitpuppet /srv/private/modules/secret/secrets/ssl/wdqs.discovery.wmnet.key

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:27:18Z] <ryankemper> T266470 /srv/private commit SHA for this change is 45852086679616bccb5bba3dd6396082b0f25a3d

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:28:32Z] <ryankemper> T266470 ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root" && sudo run-puppet-agent

Mentioned in SAL (#wikimedia-operations) [2021-03-10T19:32:17Z] <ryankemper> T266470 ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"' && ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'

Finished rolling back to the previous iteration of the wdqs.discovery.wmnet cert, since we're now going to create a net-new cert, wdqs1009.eqiad.wmnet, for wdqs-test.

Hit a big blocker with the current proposed approach of using wdqs1009.eqiad.wmnet as the cert name:

ryankemper@puppetmaster1001:/srv/private$ sudo cergen -c 'wdqs1009.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
2021-03-12 07:55:03,774 INFO     cergen                                   Generating certificates ['wdqs1009.eqiad.wmnet'] with force=False
2021-03-12 07:55:03,774 INFO     Certificate(wdqs1009.eqiad.wmnet)        Generating all files, force=False...
2021-03-12 07:55:03,776 INFO     Certificate(wdqs1009.eqiad.wmnet)        Generating certificate file
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
Traceback (most recent call last):
  File "/usr/bin/cergen", line 11, in <module>
    load_entry_point('cergen==0.2.5', 'console_scripts', 'cergen')()
  File "/usr/lib/python3/dist-packages/cergen/main.py", line 93, in main
    certificate.generate(force=args['--force'])
  File "/usr/lib/python3/dist-packages/cergen/certificate.py", line 304, in generate
    self.generate_crt(force=force)
  File "/usr/lib/python3/dist-packages/cergen/certificate.py", line 344, in generate_crt
    self.x509_cert = self.authority.sign(csr, self.expiry)
  File "/usr/lib/python3/dist-packages/cergen/puppet.py", line 193, in sign
    self.name, common_name
RuntimeError: puppetmaster1001.eqiad.wmnet_8140 already has a signed certificate for wdqs1009.eqiad.wmnet. If you are trying to regenerate a Puppet CA signed certificate, you need to first remove the certificate from the Puppet CA. On the puppetmaster, `puppet cert clean wdqs1009.eqiad.wmnet` should do it.

The important part of that output is:

RuntimeError: puppetmaster1001.eqiad.wmnet_8140 already has a signed certificate for wdqs1009.eqiad.wmnet. If you are trying to regenerate a Puppet CA signed certificate, you need to first remove the certificate from the Puppet CA. On the puppetmaster, `puppet cert clean wdqs1009.eqiad.wmnet` should do it.

Corresponding to this cergen manifest file modules/secret/secrets/certificates/certificate.manifests.d/wdqs1009.certs.yaml:

wdqs1009.eqiad.wmnet:
  authority: puppet_ca
  expiry: null
  alt_names: ["wdqs1009.eqiad.wmnet","query-preview.wikidata.org"]
  key:
    password: REDACTED
    algorithm: ec

I'm a bit lost on the right way to proceed. Thinking out loud here:

Option 1: Regenerate the cert by running puppet cert clean wdqs1009.eqiad.wmnet before cergen. This feels wrong, though; I think it ultimately comes down to using cergen in a way that wasn't intended, because the name collides with the cert puppet already has.

Option 2: Use the service name as the certificate name instead:

query-preview.wikidata.org:
  authority: puppet_ca
  expiry: null
  alt_names: ["query-preview.wikidata.org", "wdqs1009.eqiad.wmnet"]
  key:
    password: REDACTED
    algorithm: ec

Might want to talk to traffic again. But in the interest of iteration speed I think trying out Option 2 is not a bad idea; there's not really any risk, since this only concerns wdqs1009.eqiad.wmnet, which is a test host.

Option 2 fails to even generate the cert. All the cergen documentation is written for a certificate like query-preview.discovery.wmnet and not wdqs1009.eqiad.wmnet or query-preview.wikidata.org. So I do think this just isn't what cergen is built to do.

ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        modules/secret/secrets/certificates/certificate.manifests.d/wdqs1009.certs.yaml

nothing added to commit but untracked files present (use "git add" to track)
ryankemper@puppetmaster1001:/srv/private$ sudo cergen -c 'wdqs1009.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
2021-03-12 08:06:03,210 INFO     cergen                                   Generating certificates [] with force=False

Status of certificates []

Ah, so poking around the certificate.manifests.d directory I see certs that don't necessarily follow the discovery.wmnet pattern. To me that implies Option 2 should be working, so I might be missing something. Here's an example that doesn't use discovery:

ryankemper@puppetmaster1001:/srv/private$ cat modules/secret/secrets/certificates/certificate.manifests.d/analytics_http_ui.certs.yaml
yarn.wikimedia.org:
  authority: puppet_ca
  expiry: null
  alt_names: ["yarn.wikimedia.org", "hue.wikimedia.org", "hue-next.wikimedia.org", "superset.wikimedia.org", "pivot.wikimedia.org", "turnilo.wikimedia.org", "stats.wikimedia.org", "analytics.wikimedia.org", "piwik.wikimedia.org", "datasets.wikimedia.org"]
  key:
    password: REDACTED
    algorithm: ec

EDIT: Okay, figured it out. The problem was the filename I was using. Option 2 works if I name the file like so: modules/secret/secrets/certificates/certificate.manifests.d/query-preview.certs.yaml. I don't like that it doesn't have wdqs in the name but that can be dealt with at a later point.

ryankemper@puppetmaster1001:/srv/private$ sudo cergen -c 'query-preview.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
2021-03-12 08:19:15,044 INFO     cergen                                   Generating certificates ['query-preview.wikidata.org'] with force=False
2021-03-12 08:19:15,044 INFO     Certificate(query-preview.wikidata.org)  Generating all files, force=False...
2021-03-12 08:19:15,046 INFO     Certificate(query-preview.wikidata.org)  Generating certificate file
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2021-03-12 08:19:16,459 INFO     Certificate(query-preview.wikidata.org)  Generating CA certificate file
2021-03-12 08:19:16,459 INFO     Certificate(query-preview.wikidata.org)  Generating PKCS12 keystore file
2021-03-12 08:19:16,769 INFO     Certificate(query-preview.wikidata.org)  Generating Java keystore file
2021-03-12 08:19:17,826 INFO     Certificate(query-preview.wikidata.org)  Importing PuppetCA(puppetmaster1001.eqiad.wmnet_8140) cert into Java keystore
2021-03-12 08:19:18,849 INFO     Certificate(query-preview.wikidata.org)  Generating Java truststore file with CA certificate PuppetCA(puppetmaster1001.eqiad.wmnet_8140)

Status of certificates ['query-preview.wikidata.org']

Certificate(query-preview.wikidata.org, authorities=[PuppetCA(puppetmaster1001.eqiad.wmnet_8140)]):
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/query-preview.wikidata.org.key.private.pem: PRESENT (mtime: 2021-03-12T08:19:15.045442)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/query-preview.wikidata.org.key.public.pem: PRESENT (mtime: 2021-03-12T08:19:15.045442)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/query-preview.wikidata.org.crt.pem: PRESENT (mtime: 2021-03-12T08:19:16.457441)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/ca.crt.pem: PRESENT (mtime: 2021-03-12T08:19:16.457441)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/query-preview.wikidata.org.keystore.p12: PRESENT (mtime: 2021-03-12T08:19:16.473441)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/query-preview.wikidata.org.keystore.jks: PRESENT (mtime: 2021-03-12T08:19:18.285438)
        /srv/private/modules/secret/secrets/certificates/query-preview.wikidata.org/truststore.jks: PRESENT (mtime: 2021-03-12T08:19:19.201437)
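One reading that fits all three cergen runs in this thread is that the -c argument is a regular expression applied to the certificate names (the top-level keys in the manifests) rather than to the manifest filenames. I have not checked the cergen source, so treat that as an assumption; the sketch below only shows that this reading reproduces the observed results, using a hypothetical helper select_certificates and Python's re.match.

```python
import re


def select_certificates(names, pattern):
    """Return the certificate names that `pattern` matches from the start.

    Assumption: this mimics how `cergen -c <regex>` filters certificates.
    Whether cergen anchors the match the same way is not verified here.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


if __name__ == "__main__":
    # Option 1 manifest: key wdqs1009.eqiad.wmnet, run with -c 'wdqs1009.*'
    print(select_certificates(["wdqs1009.eqiad.wmnet"], "wdqs1009.*"))
    # Option 2 manifest: key query-preview.wikidata.org, still -c 'wdqs1009.*'
    print(select_certificates(["query-preview.wikidata.org"], "wdqs1009.*"))
    # Final run: same key, -c 'query-preview.*'
    print(select_certificates(["query-preview.wikidata.org"], "query-preview.*"))
```

If that reading is right, the manifest could probably keep wdqs in its filename as long as the -c pattern matches the certificate name, but that has not been tried here.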

Change 671079 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[labs/private@master] wdqs: new query-preview for wdqs1009 (test host)

https://gerrit.wikimedia.org/r/671079

Change 671079 merged by Gehel:
[labs/private@master] wdqs: new query-preview for wdqs1009 (test host)

https://gerrit.wikimedia.org/r/671079

Change 670339 merged by Ryan Kemper:
[operations/puppet@production] wdqs: impl. envoy for wdqs-test

https://gerrit.wikimedia.org/r/670339

Current status for when I pick this back up:

  • query-preview.wikidata.org (with wdqs1009.eqiad.wmnet as an alt_name) is generated
Error: /Stage[main]/Envoyproxy/Exec[verify-envoy-config]: Failed to call refresh: '/usr/local/sbin/build-envoy-config -c '/etc/envoy'' returned 1 instead of one of [0]
Error: /Stage[main]/Envoyproxy/Exec[verify-envoy-config]: '/usr/local/sbin/build-envoy-config -c '/etc/envoy'' returned 1 instead of one of [0]

Error: /Stage[main]/Profile::Tlsproxy::Envoy/Sslcert::Certificate[query-preview.wikidata.org]/File[/etc/ssl/localcerts/query-preview.wikidata.org.crt]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///files/ssl/query-preview.wikidata.org.crt

Change 671267 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: new query-preview cert for wdqs-test

https://gerrit.wikimedia.org/r/671267

I missed a step yesterday: I updated /srv/private and the public labs/private repo, but did not update operations/puppet with the new pubkey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/671267
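
A minimal sketch of a pre-merge check for the missed step. The function name check_public_cert is hypothetical, and the files/ssl/ location inside an operations/puppet checkout is an assumption inferred from the puppet:///files/ssl/ source in the error above, not a documented layout.

```shell
# Hypothetical helper: verify that the public cert for NAME is present in a
# local checkout of operations/puppet.
# Assumption: puppet:///files/ssl/<name>.crt maps to <repo>/files/ssl/<name>.crt.
check_public_cert() {
    repo="$1"
    name="$2"
    crt="$repo/files/ssl/$name.crt"
    if [ -s "$crt" ]; then
        echo "found $crt"
    else
        echo "missing $crt" >&2
        return 1
    fi
}
```

Usage, with a placeholder checkout path: check_public_cert ~/src/operations-puppet query-preview.wikidata.org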

Change 671267 merged by Ryan Kemper:
[operations/puppet@production] wdqs: new query-preview cert for wdqs-test

https://gerrit.wikimedia.org/r/671267

Change 671273 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] wdqs: new query-preview cert for wdqs-test

https://gerrit.wikimedia.org/r/671273

Change 671273 merged by Ryan Kemper:
[operations/puppet@production] wdqs: new query-preview cert for wdqs-test

https://gerrit.wikimedia.org/r/671273

Mentioned in SAL (#wikimedia-operations) [2021-03-12T22:53:50Z] <ryankemper> T266470 Manually disabled service notifications for Check no envoy runtime configuration is left persistent, will need to circle back on Monday to restore notifications

Change 668255 merged by Ryan Kemper:
[operations/dns@master] wdqs: new service alias query-preview

https://gerrit.wikimedia.org/r/668255


The issues with envoy were resolved by running sudo /usr/local/sbin/build-envoy-config -c /etc/envoy to properly build /etc/envoy/envoy.yaml. Puppet should have done this already (the build is triggered on a sudo systemctl restart envoyproxy.service), but it did not, perhaps because of a race condition. See https://gerrit.wikimedia.org/g/operations/puppet/+/b7dacbca9fae42b32bb91fd485a3f2c70ff903b3/modules/envoyproxy/manifests/init.pp#81 and https://gerrit.wikimedia.org/g/operations/puppet/+/b7dacbca9fae42b32bb91fd485a3f2c70ff903b3/modules/envoyproxy/manifests/conf.pp#30 for the puppet code that normally does this automatically.

Envoy is now working properly and wdqs1009 is reachable from the caching layer.
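
A minimal sketch of the manual envoy fix, wrapped so that a failed build or a missing envoy.yaml is reported explicitly. The function name rebuild_envoy_config and the BUILD_CMD override are hypothetical and exist only for illustration; the build-envoy-config path, the -c flag and /etc/envoy/envoy.yaml are the ones quoted in the comment above. It has to run as root on the host.

```shell
# Hypothetical wrapper around the manual fix.
# BUILD_CMD is an illustrative override; it defaults to the script named above.
rebuild_envoy_config() {
    conf_dir="${1:-/etc/envoy}"
    build_cmd="${BUILD_CMD:-/usr/local/sbin/build-envoy-config}"

    # Build the config the same way the manual command does.
    if ! "$build_cmd" -c "$conf_dir"; then
        echo "build failed for $conf_dir" >&2
        return 1
    fi

    # The build is expected to leave a non-empty envoy.yaml behind.
    if [ ! -s "$conf_dir/envoy.yaml" ]; then
        echo "$conf_dir/envoy.yaml missing or empty" >&2
        return 1
    fi

    echo "envoy config OK: $conf_dir/envoy.yaml"
}
```

Called without arguments it targets /etc/envoy. Restarting envoyproxy.service afterwards is left to the operator.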

DNS change logs

ryankemper@authdns1001:~$ sudo authdns-update
Updating authdns1001.wikimedia.org (self)...
Pulling the current revision from https://gerrit.wikimedia.org/r/operations/dns.git
Reviewing 85d9b49dc2ff0f8e3657f6f2cd91ce3df79bd1cf...

 templates/wikidata.org | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git templates/wikidata.org templates/wikidata.org
index 9f1a5ed3..cea393b1 100644
--- templates/wikidata.org
+++ templates/wikidata.org
@@ -32,7 +32,8 @@
 ; Servers (alphabetic order)

 ; Service aliases
-query       1D IN CNAME dyna.wikimedia.org.
+query         1D IN CNAME dyna.wikimedia.org.
+query-preview 1D IN CNAME dyna.wikimedia.org.

 ; Wikis (alphabetic order)


Merge these changes? (yes/no)? yes
Updating 3a84bd1c..85d9b49d
Fast-forward
 templates/wikidata.org | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Deploying via utils/deploy-check.py...
Assembling and testing data in /tmp/dns-check.voy682v4
 -- Generating zonefiles from zone templates
 -- Processed 205 zones into directory /tmp/dns-check.voy682v4/zones
OK: No tabs
Summary of violations:
    W001|MISSING_IP_FOR_NAME_AND_PTR: 35
    W002|MISSING_PTR_FOR_NAME_AND_IP: 27
    W103|MISSING_MGMT_FOR_NAME: 59
    W105|TOO_MANY_PUBLIC_NAMES: 3
RESULT: 0 Errors, 124 Warnings, 0 Ignored violations, 0 Ignored lines
 -- Copying automatically generated zone files under target tree
 -- Copying repo-driven real config files and admin_state
 -- Copying puppetized config and GeoIP from /etc/gdnsd
 -- Checking for illegal tabs in zonefiles
 -- Running zone_validator to check WMF rules
 -- Running /usr/sbin/gdnsd checkconf on /tmp/dns-check.voy682v4
 -- Preflight checkconf is OK
Deploying from /tmp/dns-check.voy682v4 to system dirs
 -- Descending to subdirectory: netbox
 -- Done with subdir: netbox
 -- Zone changed: wikidata.org
Reloading gdnsd zonefiles
info: Zone data reloaded
OK
---------------
authdns2001.wikimedia.org,dns[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002].wikimedia.org (11)
---------------
OK - authdns updated successfully

OK - authdns-update successful on all nodes!

Validation looks good:

user@computer ~/wmf/gui-deploy [production]% for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any query-preview.wikidata.org ; done

; <<>> DiG 9.10.6 <<>> @ns0.wikimedia.org -t any query-preview.wikidata.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58419
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1024
;; QUESTION SECTION:
;query-preview.wikidata.org.    IN      ANY

;; ANSWER SECTION:
query-preview.wikidata.org. 86400 IN    CNAME   dyna.wikimedia.org.

;; Query time: 82 msec
;; SERVER: 208.80.154.238#53(208.80.154.238)
;; WHEN: Fri Mar 12 17:06:00 PST 2021
;; MSG SIZE  rcvd: 84


; <<>> DiG 9.10.6 <<>> @ns1.wikimedia.org -t any query-preview.wikidata.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3886
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1024
;; QUESTION SECTION:
;query-preview.wikidata.org.    IN      ANY

;; ANSWER SECTION:
query-preview.wikidata.org. 86400 IN    CNAME   dyna.wikimedia.org.

;; Query time: 61 msec
;; SERVER: 208.80.153.231#53(208.80.153.231)
;; WHEN: Fri Mar 12 17:06:00 PST 2021
;; MSG SIZE  rcvd: 84


; <<>> DiG 9.10.6 <<>> @ns2.wikimedia.org -t any query-preview.wikidata.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53512
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1024
;; QUESTION SECTION:
;query-preview.wikidata.org.    IN      ANY

;; ANSWER SECTION:
query-preview.wikidata.org. 86400 IN    CNAME   dyna.wikimedia.org.

;; Query time: 345 msec
;; SERVER: 91.198.174.239#53(91.198.174.239)
;; WHEN: Fri Mar 12 17:06:01 PST 2021
;; MSG SIZE  rcvd: 84
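
The manual check above can be scripted so that a wrong or missing answer fails loudly instead of having to be read off the dig output. This is only a sketch: cname_targets and check_cname are hypothetical helper names, and they parse dig's default text output from stdin rather than calling dig themselves.

```shell
# Print the CNAME target(s) for NAME found in dig output read from stdin.
# Answer lines look like: "<name>. <ttl> IN CNAME <target>."
cname_targets() {
    awk -v name="$1." '$1 == name && $3 == "IN" && $4 == "CNAME" { print $5 }'
}

# Succeed only if the dig output on stdin contains a CNAME for NAME and
# every such CNAME points at EXPECTED (with trailing dot).
check_cname() {
    targets=$(cname_targets "$1" | sort -u)
    [ "$targets" = "$2" ]
}

# Example use against the three authoritative servers (not run here):
#   for i in 0 1 2; do
#       dig @ns${i}.wikimedia.org -t any query-preview.wikidata.org \
#           | check_cname query-preview.wikidata.org dyna.wikimedia.org. \
#           || echo "ns${i}: unexpected answer"
#   done
```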

Mentioned in SAL (#wikimedia-operations) [2021-03-13T01:18:44Z] <ryankemper> T266470 Re-enabled icinga service notifications for Check no envoy runtime configuration is left persistent on wdqs100[9,10]

I didn't notice this task before. Where can I read the feedback you got? From whom did you get feedback?

@dcausse thanks for the pointers. It might be worth switching structured data on Commons to the new approach first. It is much less integrated in all sorts of processes and it uses a ton of blank nodes. Or was Commons already switched? See for example P170 (creator) on https://commons.wikimedia.org/entity/M106076433.rdf for how blank nodes are currently used on Commons.

Change 908635 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):
[operations/puppet@production] wdqs: remove query-preview microsite

https://gerrit.wikimedia.org/r/908635

Change 908635 merged by Ryan Kemper:
[operations/puppet@production] wdqs: remove query-preview microsite

https://gerrit.wikimedia.org/r/908635