Page MenuHomePhabricator

WPCleaner not updating due to lack of HTTP redirect handling in getdown library
Closed, ResolvedPublic

Description

Due to T234617 (Toolforge migration), WPCleaner doesn't manage to update itself when run with getdown. See my post.

See T257384 description for a way to bypass this issue.

Event Timeline

Problem with 308: Permanent Redirect.

With the migration, I see that my tool, WPCleaner , doesn't manage to update itself on user's installation due to the Permanent Redirect. T257495
I'm using getdown tool to have an automatic update when a user launches the tool: I submitted an issue there.

But I'm surprised to see that the tool replaced the content of some files with HTML code telling there's a Permanent Redirect.

<html>
<head><title>308 Permanent Redirect</title></head>
<body bgcolor="white">
<center><h1>308 Permanent Redirect</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>

Is this normal to have the HTML code, and not be redirected directly?
Is there any way to solve this kind of problem?

That is the redirect page that is delivered by the legacy redirecting proxy. It will also be served with a Location: ... header that the requesting user-agent (browser) can choose to follow (like Firefox would) or not (curl without -L).

My guess is that the client you are using to see this is also not understanding the 308 status code. RFC 7238 was published in 2014, but I would not be surprised to hear that there are HTTP client libraries which have not been updated to support it.

@bd808
Ok, for the explanation. Is there a possibility to change at least temporarily the way the redirect is done (not a 308)?

In my case, even if it's a bug in getdown, the tool I use for automatic updates of WPCleaner, and they fixed it, it won't help me: I will still need to warn all the users to do a manual action to make the automatic update functional again (either retrieving the last version of getdown if it's fixed, or modifying one text file containing the base URL for WPCleaner). Do you have any suggestion to avoid this painful process of finding all users and asking them to do something manual?

@NicoV Have you been able to do any testing to see if getdown handles redirects at all? If not then there probably isn't much we can do at the system level at this point.

If it does deal with a 301 redirect, and you can figure out the User Agent string sent by getdown, we could try to adjust the nginx configuration in the redirector to use the legacy 301 redirect code rather than the 308 code when visited by that user-agent.

Changing the response code for all URLs handled by the redirector is not ideal. The earlier HTTP specifications left the proper handling of legacy 301 and 302 redirect codes received in response to a POST request unspecified. In practice this means that serving a 301 in response to a POST can not be trusted to trigger a new POST to the alternate URL provided in the Location header. The introduction of the 308 status code resolved this ambiguity in the specification.

@bd808
I've got no answer from getdown developers, so I went browsing their code: I highly doubt that getdown is handling any kind of HTTP redirect, so using a 301 instead of a 308 won't achieve anything. They don't seem to use any HTTP aware library, just basic java.net classes.

I think there's another solution on your side: rather than a classic redirect (the URL change is visible by the user), could you proxy some of the files on your end so that it's transparent for the client? (like P flag for Apache, if I understand correctly the concept).
There would only be a few files to proxy I think, because once getdown has retrieved its configuration files, it will have the new URL (in the getdown.txt file) and it will use it for getting the big files. And it can be temporary, because once a client is updated it will have the configuration file pointing to the new domain.

If you're ok with that, can we try with the following files (wpcleaner-test is a beta version of WPCleaner, so only used by me and maybe a few others) so I can test on my end:

Thanks

@bd808
I've got no answer from getdown developers, so I went browsing their code: I highly doubt that getdown is handling any kind of HTTP redirect, so using a 301 instead of a 308 won't achieve anything. They don't seem to use any HTTP aware library, just basic java.net classes.

That is unfortunate.

I think there's another solution on your side: rather than a classic redirect (the URL change is visible by the user), could you proxy some of the files on your end so that it's transparent for the client? (like P flag for Apache, if I understand correctly the concept).
There would only be a few files to proxy I think, because once getdown has retrieved its configuration files, it will have the new URL (in the getdown.txt file) and it will use it for getting the big files. And it can be temporary, because once a client is updated it will have the configuration file pointing to the new domain.

If we did this, how would we know when we can remove the "temporary" setup?

I have looked at the redirect service configuration and have an idea how we could shim in reverse proxy handling for a single tool. The service was not designed to serve as a reverse proxy however, so I would not want to do this for an extended period of time. We already had an 85 day period for folks to test and ask for accommodations for their tools. I believe that testing with --canonical and your wpcleaner-test tool would have revealed this same problem in that time period. I don't say that to shame you, but I do not want to create an indefinitely long lived hack to support a shortcoming in a single tool.

To clarify something here. Everyone on the older version is as of right now facing the problem in T257384 correct? Only those of us who have manually intervened are able to run it?

To clarify something here. Everyone on the older version is as of right now facing the problem in T257384 correct? Only those of us who have manually intervened are able to run it?

Not exactly, no. We fixed the incorrect redirect target problem that was documented in T257384: Redirect for digest.txt incorrect. That then led to finding this bug in the getdown library which makes it fail badly on any redirect. For the end user, the result is likely the same (or maybe worse?), but the cause is different.

What I mean is, are all existing installs that have not been manually changed failing to launch, or is it only a subset of those installs?

@bd808
The temporary setup would be for a few weeks (ideally in my mind a bit after the holidays), so that most users will use WPCleaner at least once. When they use it once, it will retrieve the new configuration and will be fixed.

For the 85 day period, the problem is that I wasn't aware of this migration: I don't know if I missed the notification, or didn't receive it, but I didn't know this migration was going on until a very recent post from @Bamyers99. In the short time I had, I did some modifications, but didn't have enough time (a lot of work at the office before the holidays) to test that it would work once the migration was completed.

@Jerodlycett
The problem described in T257384 is related but different : the incorrect redirect target prevented WPCleaner to even start giving an error message to users. Now that this is fixed, WPCleaner starts without any visible problem (but the update process silently fails in the background). So, the current situation is that users can use WPCleaner as if everything was normal, but an outdated version of it and they don't know it's an outdated version... The problem in this situation is to reach to the users to tell them to upgrade, or to find a way to restore the automatic upgrade (at least for a majority of the users)

@bd808
The temporary setup would be for a few weeks (ideally in my mind a bit after the holidays)

By "holidays" do you mean Northern hemisphere summer (June-August) or something else? Can you propose an explicit end date?

bd808 renamed this task from WPCleaner not updating after Toolforge migration to WPCleaner not updating due to lack of HTTP redirect handling in getdown library.Jul 12 2020, 4:51 PM

@bd808
I'm indeed not subscribed to cloud-announce, I will register for it then...

Yes, I meant Northen hemisphere holidays. I propose around mid-September, I hope most users will have run WPCleaner at least once.

@bd808 Any news on the option for the temporary setup? Do you think it can be implemented?
Or do I start tracking users who haven't updated manually WPCleaner (I left the messages on English and French wikipedia to alert users and explain how to update manually)

Change 615872 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Temp handling for tools.wmflabs.org/wpcleaner

https://gerrit.wikimedia.org/r/615872

Custom ingress needed in the tool's namespace to go with https://gerrit.wikimedia.org/r/615872:

$HOME/T257495-ingress.yaml
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: wpcleaner-t257495
  namespace: tool-wpcleaner
  labels:
    name: wpcleaner-t257495
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^(/wpcleaner)$ $1/ redirect;
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - host: tools.wmflabs.org
      http:
        paths:
          - backend:
              serviceName: wpcleaner
              servicePort: 8000
            path: /wpcleaner(/|$)(.*)

I have created this ingress object via kubectl apply --validate=true -f T257495-ingress.yaml. With the ingress created, curl can be used from inside Toolforge to test the expected behavior of the Puppet config change:

$ curl -H 'Host: tools.wmflabs.org' http://k8s.tools.eqiad1.wikimedia.cloud:30000/wpcleaner/wpcleaner/getdown.txt
# Configuration file for running WPCleaner through getdown

# General configuration
appbase = https://wpcleaner.toolforge.org/wpcleaner/
allow_offline = true

# Configure the update UI
ui.name = WPCleaner
ui.icon = commons-nuvola-web-broom-64px.png
ui.icon = commons-nuvola-web-broom.png
# ui.error_background =
# ui.progress_bar =
# ui.progress_text =
# ui.status =
# ui.status_text =
ui.install_error = https://wpcleaner.toolforge.org/
ui.hide_decorations = true

# Code
code = WPCleaner.jar
code = libs/commons-codec.jar
code = libs/commons-compress.jar
code = libs/commons-httpclient.jar
code = libs/commons-lang3.jar
code = libs/commons-logging.jar
code = libs/gettext-commons.jar
code = libs/jackson-annotations.jar
code = libs/jackson-core.jar
code = libs/jackson-databind.jar
code = libs/jaxen.jar
code = libs/jdom.jar
code = libs/logback-classic.jar
code = libs/logback-core.jar
code = libs/slf4j-api.jar
code = libs/xercesImpl.jar
code = libs/xml-apis.jar
code = logback.xml

# Resources
resource = WPCleaner.png
resource = commons-nuvola-web-broom-64px.png
resource = commons-nuvola-web-broom.png
resource = WPCleaner.ico
resource = Bot.bat
resource = WPCleaner.bat
resource = Bot.sh
resource = WPCleaner.sh
resource = libs/LICENSE_commons-codec.txt
resource = libs/NOTICE_commons-codec.txt
resource = libs/LICENSE_commons-compress.txt
resource = libs/NOTICE_commons-compress.txt
resource = libs/LICENSE_commons-httpclient.txt
resource = libs/NOTICE_commons-httpclient.txt
resource = libs/LICENSE_commons-io.txt
resource = libs/NOTICE_commons-io.txt
resource = libs/LICENSE_commons-lang3.txt
resource = libs/NOTICE_commons-lang3.txt
resource = libs/LICENSE_commons-logging.txt
resource = libs/NOTICE_commons-logging.txt
resource = libs/LICENSE_gettext-commons.txt
resource = libs/LICENSE_jackson.txt
resource = libs/LICENSE_jdom.txt
resource = libs/LICENSE_jaxen.txt
resource = libs/LICENSE_xerces.txt
resource = libs/NOTICE_xerces.txt
resource = libs/getdown-launcher-1.8.6.jar
resource = tasks/cswiki/_Weekly.txt
resource = tasks/enwiki/_Common.txt
resource = tasks/enwiki/_Weekly.txt
resource = tasks/enwiki/ISBN_ISSN.txt
resource = tasks/enwiki/ListCheckWiki.txt
resource = tasks/enwiki/ListCheckWiki_After.txt
resource = tasks/enwiki/ListCheckWiki_Before.txt
resource = tasks/enwiki/MarkCheckWiki.txt
resource = tasks/frwiki/_Common.txt
resource = tasks/frwiki/_Common_Other.txt
resource = tasks/frwiki/_Common_Talk.txt
resource = tasks/frwiki/_Daily.txt
resource = tasks/frwiki/_Weekly.txt
resource = tasks/frwiki/FixDump.txt
resource = tasks/frwiki/ISBN_ISSN.txt
resource = tasks/frwiki/ListCheckWiki.txt
resource = tasks/frwiki/ListCheckWiki_After.txt
resource = tasks/frwiki/ListCheckWiki_Before.txt
resource = tasks/frwiki/MarkCheckWiki.txt
resource = tasks/frwiki/UpdateCheckWiki.txt
resource = tasks/meta/_Common.txt
resource = tasks/meta/ListCheckWiki.txt

# Requirements on Java
java_min_version = 1070000
# java_location = [windows] /java_vm/java_windows.jar
# java_location = [linux] /java_vm/java_linux.jar

# Parameters passed to the JVM
jvmarg = -Xmx1024M
optimum_jvmargs = -Xmx4096M
jvmarg = -Dlogback.configurationFile=logback.xml

# The main Java class
class = org.wikipediacleaner.WikipediaCleaner

# Lets us pass "client" as the app id and use the rest of the command line as app arguments
client.class = org.wikipediacleaner.WikipediaCleaner

# Lets us pass "bot" as the app id and use the rest of the command line as app arguments
bot.class = org.wikipediacleaner.Bot

# We don't pass any argument
# apparg =

@bd808 Any news on the option for the temporary setup? Do you think it can be implemented?

I think I have something figured out with the patch to the legacy redirector and the custom ingress object described above. It needs a bit more testing before we deploy, but I think it is close.

Change 615872 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Temp handling for tools.wmflabs.org/wpcleaner

https://gerrit.wikimedia.org/r/615872

@bd808 Any news on the option for the temporary setup? Do you think it can be implemented?

I think I have something figured out with the patch to the legacy redirector and the custom ingress object described above. It needs a bit more testing before we deploy, but I think it is close.

Changes are deployed. URLs under https://tools.wmflabs.org/wpcleaner/ are now proxied rather than redirected. I will make a cleanup task to remove that functionality on or after 2020-09-15.

Thanks a lot @bd808 !

I tested with an old installation of WPCleaner, and it managed to update itself and is now pointing to wpcleaner.toolforge.org !

I will monitor WPCleaner's version on English wikipedia and French wikipedia to check if the upgrade works for other users.