Page MenuHomePhabricator

Default lighttpd config created by `webservice` breaks serving files starting with the same string as the tool's name under `--canonical`
Closed, ResolvedPublic

Description

Discovered by @Magnus when porting the reasonator tool to --canonical.

The default lighttpd config contains a directive to strip the legacy tool name prefix from the URL path when looking for files on disk:

alias.url = ( "/{toolname}" => "{home}/public_html/" )

{toolname} and {home} here are template values which are replaced by the webservice script with values like 'reasonator' and '/data/project/reasonator'.

This alias directive is needed for the legacy URL scheme, and mostly harmless for the new canonical URL scheme except when the tool serves a file that starts with the tool's exact name.

In the case of reasonator, the URL https://reasonator.toolforge.org/reasonator_types.js ends up being translated into the file path '/data/project/reasonator/public_html/_types.js' rather than the expected '/data/project/reasonator/public_html/reasonator_types.js' when this directive is included in the lighttpd config.

Event Timeline

Magnus created this task.Jun 6 2020, 8:54 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 6 2020, 8:54 AM
Magnus triaged this task as High priority.Jun 6 2020, 8:54 AM
Magnus added a comment.Jun 6 2020, 8:57 AM

Other files load fine. Haven't checked all though

Magnus raised the priority of this task from High to Unbreak Now!.Jun 6 2020, 12:43 PM

Same with https://geohack.toolforge.org/geohack.php . Geohack is kinda central for Wikipedia so please help!

Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptJun 6 2020, 12:43 PM

Does restarting the webservice without --canonical (i. e. go back to tools.wmflabs.org) unbreak the tool for now?

Magnus added a comment.Jun 6 2020, 3:10 PM

@LucasWerkmeister thanks that worked, for now

bd808 added a subscriber: bd808.Jun 6 2020, 3:47 PM

@Magnus was there any error.log output to help debug what went wrong? The 1.3G $HOME/error.log starting with binary data for reasonator is not making it easy for me to guess what happened. Geohack has a 22G $HOME/error.log that I'm not even going to try to find useful data in.

Magnus added a comment.Jun 7 2020, 3:34 PM

I cleared the error log on reasonator and switched to the previous URL schema, then back to the canonical one. No server errors in the log (some unrelated PhP ones, now cleared again)

Magnus added a comment.Jun 7 2020, 4:04 PM

I think this is a blocker to moving all tools to the canonical URL. T234617 ?

I don’t understand why this is an UBN when as far as I understand we know a workaround (restart without --canonical)?

bd808 added a comment.Jun 8 2020, 4:50 PM

I don’t understand why this is an UBN when as far as I understand we know a workaround (restart without --canonical)?

@LucasWerkmeister because the next phase of T234617: Toolforge. introduce new domain toolforge.org is forced migration to the *.toolforge.org naming scheme for all tools. @Magnus is doing the right thing here to signal that he is having migration problems with a widely used tool. We can debate UBN! vs High, but meh. Let's just figure out the bug and fix it.

bd808 claimed this task.Jun 8 2020, 5:15 PM

The bug is related to the filename starting with the tool's name. I turned on verbose logging and found:

2020-06-08 17:05:50: (response.c.430) Request-URI     :  /reasonator_types.js
2020-06-08 17:05:50: (response.c.431) URI-scheme      :  http
2020-06-08 17:05:50: (response.c.432) URI-authority   :  reasonator.toolforge.org
2020-06-08 17:05:50: (response.c.433) URI-path (raw)  :  /reasonator_types.js
2020-06-08 17:05:50: (response.c.434) URI-path (clean):  /reasonator_types.js
2020-06-08 17:05:50: (response.c.435) URI-query       :
2020-06-08 17:05:50: (mod_access.c.177) -- mod_access_uri_handler called
2020-06-08 17:05:50: (response.c.580) -- before doc_root
2020-06-08 17:05:50: (response.c.581) Doc-Root     : /data/project/reasonator/public_html
2020-06-08 17:05:50: (response.c.582) Rel-Path     : /reasonator_types.js
2020-06-08 17:05:50: (response.c.583) Path         :
2020-06-08 17:05:50: (response.c.625) -- after doc_root
2020-06-08 17:05:50: (response.c.626) Doc-Root     : /data/project/reasonator/public_html
2020-06-08 17:05:50: (response.c.627) Rel-Path     : /reasonator_types.js
2020-06-08 17:05:50: (response.c.628) Path         : /data/project/reasonator/public_html/reasonator_types.js
2020-06-08 17:05:50: (response.c.652) -- logical -> physical
2020-06-08 17:05:50: (response.c.653) Doc-Root     : /data/project/reasonator/public_html
2020-06-08 17:05:50: (response.c.654) Basedir      : /data/project/reasonator/public_html/
2020-06-08 17:05:50: (response.c.655) Rel-Path     : /reasonator_types.js
2020-06-08 17:05:50: (response.c.656) Path         : /data/project/reasonator/public_html/_types.js
2020-06-08 17:05:50: (response.c.668) -- handling physical path
2020-06-08 17:05:50: (response.c.669) Path         : /data/project/reasonator/public_html/_types.js
2020-06-08 17:05:50: (response.c.144) -- file not found
2020-06-08 17:05:50: (response.c.145) Path         : /data/project/reasonator/public_html/_types.js

The generate lighttpd config (found in /var/run/lighttpd/reasonator inside the running pod) contains:

alias.url = ( "/reasonator" => "/data/project/reasonator/public_html/")

This config is part of the default configuration generated by webservice. That means we need to fix it upstream of the usage by this particular tool.

As a horrible short term fix for reasonator, I did:

$ ln -s $HOME/public_html/reasonator_types.js $HOME/public_html/_types.js
bd808 renamed this task from Reasonator tool on canonical URL does not expose a JS file to Default lighttpd config created by `webservice` breaks serving files starting with the same string as the tool's name under `--canonical`.Jun 8 2020, 5:28 PM
bd808 updated the task description. (Show Details)
Magnus added a comment.Jun 8 2020, 6:55 PM

I don’t understand why this is an UBN when as far as I understand we know a workaround (restart without --canonical)?

Because the oauth I switched to requires the canonical URL. So it's either broken completely (canonical URL), or view but no edit (old URL).

Magnus added a comment.Jun 8 2020, 6:57 PM

This config is part of the default configuration generated by webservice. That means we need to fix it upstream of the usage by this particular tool.

OK thanks

As a horrible short term fix for reasonator, I did:

$ ln -s $HOME/public_html/reasonator_types.js $HOME/public_html/_types.js

Ouch 😂

I'll try to change it in the tool as well so I can proceed now-ish.

I don’t understand why this is an UBN when as far as I understand we know a workaround (restart without --canonical)?

Because the oauth I switched to requires the canonical URL. So it's either broken completely (canonical URL), or view but no edit (old URL).

I see, thanks.

Change 603668 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/software/tools-webservice@master] Disable tool name alias in lighttpd config with --canonical

https://gerrit.wikimedia.org/r/603668

Change 603668 merged by jenkins-bot:
[operations/software/tools-webservice@master] Disable tool name alias in lighttpd config with --canonical

https://gerrit.wikimedia.org/r/603668

Mentioned in SAL (#wikimedia-cloud) [2020-06-15T18:14:25Z] <bd808> Rebuilding all Docker images to pick up webservice 0.71 (T254640, T253412)

bd808 added a comment.Jun 16 2020, 7:45 PM

My patch worked in a limited test environment, but does not seem to do anything in the live cluster. :/ I will try to debug.

bd808 added a comment.Jun 16 2020, 9:54 PM

My patch worked in a limited test environment, but does not seem to do anything in the live cluster. :/ I will try to debug.

Fun times! I figured out that the current webservice 0.71 build does what I wanted it to do when using --backend=gridengine, but not with --backend=kubernetes. I was piggy backing this change on some prior work for the --canonical flag, but did not properly trace the code paths. I now see that our toolsws.backends.GridEngineBackend class tells webservice-runner about --canonical being set, but toolsws.backends.KubernetesBackend does not. This made sense for the prior work as KubernetesBackend handles the ingress rule setup in a different place (the generated Kubernetes Ingress objects) than GridEngineBackend (the webservice-runner call which in turn talks to the dynamicproxy instance which routes traffic to the correct host and port on the job grid). This was missed in my testing as the thing that I tested was the lighttpd config generated by the expected call to webservice-runner rather than a full end-to-end test of webservice on the different backends.

Change 606023 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/software/tools-webservice@master] Pass --canonical to webservice-runner inside k8s pod

https://gerrit.wikimedia.org/r/606023

Change 606023 merged by jenkins-bot:
[operations/software/tools-webservice@master] Pass --canonical to webservice-runner inside k8s pod

https://gerrit.wikimedia.org/r/606023

bd808 closed this task as Resolved.Jun 17 2020, 6:31 PM

The second patch has fixed this for the Kubernetes backend as well.

$ kubectl exec -it $(kubectl get pods --output=jsonpath={.items..metadata.name}) -- /bin/bash
$ grep alias /var/run/lighttpd/bd808-test
  "mod_alias",
#alias.url = ( "/bd808-test" => "/data/project/bd808-test/public_html/" )

@Magnus A restart of any tool you have with problem should fix it. Note that the fix only works when --canonical is used as well at this point, so you could still see the problem with tools where you are just testing the $TOOL.toolforge.org routing and not forcing it to be used exclusively.

NicoV added a subscriber: NicoV.Jul 8 2020, 10:17 AM

Hello. Is it the same problem for my tool WPCleaner, T257384 ?
The redirect removes the top directory in the path, breaking all the tools for each user as the tool can't update itself.