
Make testreduce web UI publicly accessible on the internet
Closed, Resolved, Public

Description

At one point, scandium had http://parsoid-rt-tests.wikimedia.org/ pointed at the parsoid-rt web UI. But once scandium became a MediaWiki appserver, and since Parsoid's rt test services are not as heavily security-hardened, tested, or updated as production code, we decided to disable that public web access as part of this patch.

Now, as we move the parsoid-rt and parsoid-rt-client node services from scandium to testreduce1001, we can revisit this decision. testreduce1001 is not (and need not be) a MediaWiki app server and does not need to run PHP code at all. Right now, parsoid-rt on testreduce1001 still connects to a database hosted on a production database server. However, as noted in T257906#6390890, parsoid-rt on testreduce1001 can simply connect to a local database on testreduce1001 and be completely isolated from production services (it still needs enough access to issue Parsoid REST API requests to scandium).

So, here are some tasks:

  • Enable MySQL/MariaDB on testreduce1001
  • Create a new database
  • Initialize it with a fresh set of test titles
  • Revert some version of https://gerrit.wikimedia.org/r/c/operations/puppet/+/534271 to enable the webserver on testreduce1001 and to point parsoid-rt-tests.wikimedia.org to the parsoid-rt web UI
  • Create a certificate for testreduce.discovery.wmnet in the private repo, copy it to the public repo, and create a fake cert in labs/private
  • Add testreduce.discovery.wmnet to DNS and point it to testreduce1001
  • Add envoy on the backend for TLS termination and let it talk to nginx on port 8001 as the upstream
  • Add parsoid-rt-tests to the envoy TLS cert for testreduce.discovery.wmnet

This is not high priority; if any of this is cumbersome or involves a lot of work, feel free to decline. This can also be done after the parent task is resolved. So feel free to edit/update the task as appropriate.

Details

Repo               Branch      Lines +/-
operations/puppet  production  +2 -0
operations/puppet  production  +2 -0
operations/puppet  production  +20 -19
operations/puppet  production  +0 -1
operations/puppet  production  +1 -0
operations/puppet  production  +1 -0
operations/puppet  production  +4 -1
operations/puppet  production  +4 -0
operations/dns     master      +1 -0
operations/puppet  production  +3 -0
labs/private       master      +3 -0
operations/puppet  production  +23 -0
operations/puppet  production  +2 -2
operations/dns     master      +1 -0
operations/puppet  production  +1 -1
operations/puppet  production  +6 -8
operations/puppet  production  +2 -0

Event Timeline

I will be AFK for about two weeks. If this needs attention earlier (I assume not, given the low priority), please contact the subteam.

Dzahn triaged this task as Medium priority.Nov 3 2020, 2:14 AM
Dzahn moved this task from Incoming 🐫 to API Gateway 🥌 on the serviceops board.

Change 654318 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid: include a generic mariadb server in testreduce role

https://gerrit.wikimedia.org/r/654318

Change 654318 merged by Dzahn:
[operations/puppet@production] parsoid: include a generic mariadb server in testreduce role

https://gerrit.wikimedia.org/r/654318

Change 654322 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid::testing: fix duplicate declaration of mariadb-client for buster

https://gerrit.wikimedia.org/r/654322

Change 654322 merged by Dzahn:
[operations/puppet@production] parsoid::testing: fix duplicate declaration of mariadb-client for buster

https://gerrit.wikimedia.org/r/654322

A generic MariaDB server has now been installed by puppet on testreduce1001 (no change on scandium, which at first conflicted with this).

The config is in /etc/my.cnf.

The data_dir is /srv/sqldata.

No database has been created yet though.

Change 653998 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] Revert "remove parsoid-vd/parsoid-rt.wikimedia.org"

https://gerrit.wikimedia.org/r/653998

Next we need to re-create the DNS entries (parsoid-rt-tests, parsoid-vd-tests) before we can point them to the new backend in the caching layer.

nginx config exists on testreduce1001 as it is puppetized.

Couple observations:

  1. We don't need parsoid-vd-tests on testreduce1001 anymore, since there are no immediate plans to run visual diff tests there. If we need to do that on production VMs in the future, it will probably get its own VM.
  2. If the test DB is going to be local on testreduce1001, then all members of parsoid-test-roots should have all privileges on that DB. You can pick the same db-name and user-name from /etc/testreduce/parsoid-rt.settings.js for the new local DB.

Change 654351 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ATS: re-add config for parsoid-rt-tests.wikimedia.org

https://gerrit.wikimedia.org/r/654351

@ssastry ACK, only "rt" no "vd" needed. Adjusted the patches accordingly.

Regarding the database:

  • I created a new database "testreduce" on the local MariaDB server
  • I then granted "all privileges" to a user also called "testreduce" and with the password from parsoid-rt.settings.js.

So the database name, user name, and password are all exactly as before; the only difference is that the database now runs on localhost instead of on m5-master.
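For reference, the two steps above (create the database, grant all privileges to the `testreduce` user) roughly correspond to the SQL rendered below. This is a minimal sketch, not the commands that were actually run: the statement forms (a separate `CREATE USER IF NOT EXISTS`, an account limited to `'testreduce'@'localhost'`) and the helper names are assumptions, escaping assumes MariaDB's default `sql_mode` (backslash escapes enabled), and the real password lives in parsoid-rt.settings.js and is not shown.

```python
def quote_ident(name: str) -> str:
    """Quote a MariaDB identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(value: str) -> str:
    """Quote a string literal: escape backslashes, then double single quotes.

    Assumes the default sql_mode (no NO_BACKSLASH_ESCAPES).
    """
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def setup_statements(db: str, user: str, password: str, host: str = "localhost") -> list[str]:
    """Render hypothetical SQL that creates the DB and a user with full rights on it."""
    account = f"{quote_string(user)}@{quote_string(host)}"
    return [
        f"CREATE DATABASE IF NOT EXISTS {quote_ident(db)};",
        f"CREATE USER IF NOT EXISTS {account} IDENTIFIED BY {quote_string(password)};",
        f"GRANT ALL PRIVILEGES ON {quote_ident(db)}.* TO {account};",
    ]
```

For example, `setup_statements("testreduce", "testreduce", password)` yields three statements that an administrator could feed to the `mysql` client; the `'localhost'` account matches the `mysql -h localhost` login shown below.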

[testreduce1001:~] $ mysql -h localhost -u testreduce testreduce -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 56

Now you can use that user to import the data you need.

Change 654565 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid::testing: switch db_host from m5-master to localhost

https://gerrit.wikimedia.org/r/654565

Change 654565 merged by Dzahn:
[operations/puppet@production] parsoid::testing: switch db_host from m5-master to localhost

https://gerrit.wikimedia.org/r/654565

Change 653998 merged by Dzahn:
[operations/dns@master] Revert "remove parsoid-rt-tests.wikimedia.org"

https://gerrit.wikimedia.org/r/653998

Change 658679 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[operations/puppet@production] Parsoid Testing: Switch rt/vd server db hosts to localhost

https://gerrit.wikimedia.org/r/658679

Change 658679 merged by Dzahn:
[operations/puppet@production] Parsoid Testing: Switch rt/vd server db hosts to localhost

https://gerrit.wikimedia.org/r/658679

Change 658695 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add certificate for testreduce.discovery.wmnet

https://gerrit.wikimedia.org/r/658695

Change 658696 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] add fake cert for testreduce.discovery.wmnet

https://gerrit.wikimedia.org/r/658696

Change 658695 merged by Dzahn:
[operations/puppet@production] add certificate for testreduce.discovery.wmnet

https://gerrit.wikimedia.org/r/658695

Change 658696 merged by Dzahn:
[labs/private@master] add fake cert for testreduce.discovery.wmnet

https://gerrit.wikimedia.org/r/658696

Change 654351 merged by Dzahn:
[operations/puppet@production] ATS: re-add config for parsoid-rt-tests.wikimedia.org

https://gerrit.wikimedia.org/r/654351

Change 658701 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add testreduce.discovery.wmnet, point to testreduce1001

https://gerrit.wikimedia.org/r/658701

Change 658701 merged by Dzahn:
[operations/dns@master] add testreduce.discovery.wmnet, point to testreduce1001

https://gerrit.wikimedia.org/r/658701

Change 658706 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid/testreduce: add envoy on testreduce1001 for TLS termination

https://gerrit.wikimedia.org/r/658706

Change 658706 merged by Dzahn:
[operations/puppet@production] parsoid/testreduce: add envoy on testreduce1001 for TLS termination

https://gerrit.wikimedia.org/r/658706

Change 658708 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] trafficserver/parsoid: switch TLS termination to 443, upstream port 8001

https://gerrit.wikimedia.org/r/658708

Change 658708 merged by Dzahn:
[operations/puppet@production] trafficserver/parsoid: switch TLS termination to 443, upstream port 8001

https://gerrit.wikimedia.org/r/658708

Change 659051 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

Change 659051 merged by Dzahn:
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

Mentioned in SAL (#wikimedia-operations) [2021-01-27T18:50:14Z] <mutante> testreduce1001 - making nginx listen on IPv6 and restarting it T266509

Change 659058 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid/testing: let nginx also listen on IPv6

https://gerrit.wikimedia.org/r/659058

Change 659058 merged by Dzahn:
[operations/puppet@production] parsoid/testing: let nginx also listen on IPv6

https://gerrit.wikimedia.org/r/659058

Change 666694 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] Revert "parsoid::testreduce: let envoy listen on IPv6 as well"

https://gerrit.wikimedia.org/r/666694

Change 666694 merged by Dzahn:
[operations/puppet@production] Revert "parsoid::testreduce: let envoy listen on IPv6 as well"

https://gerrit.wikimedia.org/r/666694

Mentioned in SAL (#wikimedia-operations) [2021-03-12T21:52:08Z] <mutante> puppetmaster1001 sudo puppet cert clean testreduce.discovery.wmnet (T266509)

Change 671275 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ssl: add regenerated TLS cert for testreduce with new SAN

https://gerrit.wikimedia.org/r/671275

Change 671275 merged by Dzahn:
[operations/puppet@production] ssl: add regenerated TLS cert for testreduce with new SAN

https://gerrit.wikimedia.org/r/671275

@ssastry Done! https://parsoid-rt-tests.wikimedia.org/ has been reactivated.

It needed the parsoid-rt-tests.wikimedia.org name on the envoy certificate to allow for TLS termination on the backend. Fixed that and now it's back.
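Envoy can only terminate TLS for the public name if that name is on its certificate. A minimal sketch of checking that coverage, assuming the SAN list comes from `ssl.SSLSocket.getpeercert()` on a verified connection (which reports it under `subjectAltName` as `('DNS', name)` pairs). The wildcard handling is simplified (whole left-most label only), and the helper names are hypothetical.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Check one SAN DNS entry against a hostname, case-insensitively.

    Simplified: a wildcard is only honoured as the whole left-most label
    ("*.example.org") and matches exactly one label.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return head != "" and "." + rest == pattern[1:]
    return pattern == hostname


def dns_sans(peercert: dict) -> list[str]:
    """Extract DNS names from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in peercert.get("subjectAltName", ()) if kind == "DNS"]


def missing_names(san_names: list[str], required: list[str]) -> list[str]:
    """Return the required hostnames that no SAN entry covers."""
    return [
        name for name in required
        if not any(hostname_matches(pattern, name) for pattern in san_names)
    ]
```

With `required = ["testreduce.discovery.wmnet", "parsoid-rt-tests.wikimedia.org"]`, an empty result from `missing_names(dns_sans(cert), required)` means both names are covered. A certificate listing only `testreduce.discovery.wmnet` (an illustrative guess at the pre-fix state) would report `parsoid-rt-tests.wikimedia.org` as missing.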

@Dzahn The static files aren't being served, e.g. https://parsoid-rt-tests.wikimedia.org/static/style.css

curl http://localhost:8003/static/style.css on testreduce1001 returns the expected result though.

A quick glance at nginx-parsoid-testing seems alright, so maybe the requests aren't making it there?

Reopening to follow up on the failure to fully serve all the static files.

To follow up on Arlo's comments above, it looks like something is intercepting requests to parsoid-rt-tests.wikimedia.org/static/ ... That URL returns 503 forbidden even when I stop nginx. Additionally, if I add /static1/ or any other path component that is not /static/ to the testreduce server, it works fine. Only this exact /static/ path component is a problem.

@Dzahn or someone in serviceops, can you take a look please?

Ping! This would also make accessing test results less cumbersome without needing to set up ssh tunnels.

So, I have been debugging this again, and the summary is:

The chain here is (traffic layer) -> envoy (443) -> nginx (8001) -> nodejs (8003).

And I can fetch the missing /static/style.css from all three places:

I can get it directly from nodejs:

curl http://localhost:8003/static/style.css (works)

I can get it from nginx:

curl http://localhost:8001/static/style.css (works)

I can get it from envoy:

curl --connect-to parsoid-rt-tests.wikimedia.org:443:localhost:443 https://parsoid-rt-tests.wikimedia.org/static/style.css (works)

All of this works, which rules out several of the things we had been guessing it could be.
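The reasoning here is a walk along the proxy chain: test each hop from the backend outwards, and the fault lies between the last hop that returns the file and the first one that doesn't. A minimal sketch of that check; the layer labels and the `probe`/`locate_fault` helpers are hypothetical, and `probe` only handles plain-HTTP hops (the envoy hop needs the SNI/Host override that `curl --connect-to` provides).

```python
import urllib.error
import urllib.request
from typing import List, Optional, Tuple


def probe(url: str, timeout: float = 5.0) -> int:
    """GET url and return the HTTP status code, including error codes like 404.

    Connection failures (e.g. nothing listening on the port) are not caught.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def locate_fault(
    results: List[Tuple[str, int]],
) -> Optional[Tuple[Optional[str], str]]:
    """Find where a request starts failing along a proxy chain.

    results lists (layer, status) pairs ordered from the backend outwards.
    Returns (last_good_layer, first_bad_layer), or None if every hop got a 2xx.
    """
    last_good: Optional[str] = None
    for layer, status in results:
        if not 200 <= status < 300:
            return (last_good, layer)
        last_good = layer
    return None
```

Feeding in what was observed (nodejs on 8003, nginx on 8001, and envoy on 443 all return the file, while the public URL returns an error) makes `locate_fault` return `("envoy:443", "traffic layer")`, i.e. the problem sits in front of testreduce1001.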

> looks like something is intercepting requests to parsoid-rt-tests.wikimedia.org/static/ ... That URL returns 503 forbidden even when I stop nginx. Additionally, if I add /static1/ or any other path component that is not /static/ to the testreduce server, it works fine. Only this exact /static/ path component is a problem.

@ssastry Sorry for the delay here, but see above. I can confirm it works through the entire stack on testreduce1001 itself, so I already suspected it must be the traffic layer. Then I re-read your comment and eventually found this:

# normalize all /static to the same hostname for caching
if (req.url ~ "^/static/") { set req.http.host = "<%= @vcl_config.fetch("static_host") %>"; }

This is in puppet/modules/varnish/templates/text-frontend.inc.vcl.erb so... I am relatively sure now this is varnish and outside testreduce1001.
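The rule fires for any request whose path begins with exactly `/static/`, regardless of which site's hostname it came in on. That fits what was observed: `/static/` is rerouted, `/static1/` is not, and stopping nginx makes no difference, presumably because the rewritten Host header sends the request to a different backend before it ever reaches testreduce1001. A minimal Python emulation of the condition; `static_host` stands in for `vcl_config["static_host"]` from Hiera, whose real value is not shown here, so the placeholder name is hypothetical.

```python
import re

# Mirrors the VCL condition: if (req.url ~ "^/static/") { set req.http.host = static_host; }
STATIC_PREFIX = re.compile(r"^/static/")


def frontend_host(host: str, url: str, static_host: str) -> str:
    """Return the Host header after the /static/ normalization rule.

    Any URL whose path starts with "/static/" gets its Host rewritten to
    static_host, no matter which site the request was for.
    """
    if STATIC_PREFIX.search(url):
        return static_host
    return host


if __name__ == "__main__":
    # Hypothetical placeholder; the real value comes from vcl_config in Hiera.
    static_host = "static-host.example.invalid"
    site = "parsoid-rt-tests.wikimedia.org"
    for path in ("/", "/static/style.css", "/static1/style.css"):
        print(path, "->", frontend_host(site, path, static_host))
```

Only `/static/style.css` is mapped to the placeholder static host; `/` and `/static1/style.css` keep `parsoid-rt-tests.wikimedia.org`.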

The options seem to be either to escalate this as a bug to Traffic, or to avoid the specific string "static" in your URLs. I will add Traffic to this now.

Hey Traffic, I added you to this ticket because I think the line in the Varnish config above, which handles URLs starting with /static/ differently, caused this issue for a non-MediaWiki/appserver backend behind Varnish.

Effect: https://parsoid-rt-tests.wikimedia.org/ works, but https://parsoid-rt-tests.wikimedia.org/static/style.css returns 404, even though the backend serves it fine.

Should other clients just avoid that kind of URL, or can we exclude certain backends from that rule?

edit: uploaded suggested fix after Majavah pointed me to the list of "alternate_domains" in Hiera.

Change 749574 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] add parsoid-rt-tests.wikimedia.org to alternate_domains

https://gerrit.wikimedia.org/r/749574

Change 749574 merged by Dzahn:

[operations/puppet@production] add parsoid-rt-tests.wikimedia.org to alternate_domains

https://gerrit.wikimedia.org/r/749574

Perfect! It works now. Thanks!

Great! Thanks for confirming, and sorry for the delay in between.

Broken again... Not sure if someone reverted the patch or something else overwrote your changes, but https://parsoid-rt-tests.wikimedia.org/static/style.css is 404ing.

It's not 404ing for me right now. I suspect it was still cached on some of the caching servers. The change has not been reverted.

Okay, thanks! :) Yes, working for me as well now.

Change 789561 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::text_haproxy: Add missing parsoid-rt-tests.wm.o to alternate domains

https://gerrit.wikimedia.org/r/789561

Change 789561 merged by Vgutierrez:

[operations/puppet@production] cache::text_haproxy: Add missing parsoid-rt-tests.wm.o to alternate domains

https://gerrit.wikimedia.org/r/789561