
Add a tool to proxy API requests to WhoColor API
Closed, Resolved · Public · 5 Estimated Story Points

Description

For the WhoWroteThat extension and gadget, we want to provide an extra layer of privacy so that users' IP addresses and user agents are not shared outside Wikimedia servers.

Acceptance criteria:

  • Register WikiWho API account
  • Create a Toolforge/VPS tool that proxies requests to WhoColor API, authenticating using basic access authentication
  • Ask the WikiWho maintainers for more quota
  • Change WhoWroteThat code to use the new proxy
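
The second criterion can be illustrated with a minimal Python sketch. This is not the implementation from the PRs below: the function name build_upstream_headers, the allow-list of forwarded headers, and the proxy's User-Agent string are all assumptions made for illustration. It shows the general idea of dropping client-identifying headers and attaching a basic access authentication header before a request leaves Wikimedia servers. The client's IP is hidden simply because the proxy, not the browser, opens the connection to the API.

```python
import base64

# Hypothetical allow-list: only these client headers are passed upstream.
FORWARDED_HEADERS = {"accept", "accept-language"}


def build_upstream_headers(client_headers, username, password):
    """Return the headers to send to the upstream API.

    Client-identifying headers (User-Agent, X-Forwarded-For, Cookie, ...)
    are dropped because they are not on the allow-list, and a Basic
    Authorization header is added for the proxy's own API account.
    """
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() in FORWARDED_HEADERS
    }
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers["Authorization"] = f"Basic {token}"
    # Identify the proxy itself rather than the end user's browser
    # (illustrative value, not taken from the task).
    headers["User-Agent"] = "WhoWroteThat-proxy (illustrative)"
    return headers
```

An allow-list is used here rather than a deny-list so that any header not explicitly reviewed is dropped by default.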

Event Timeline

ifried set the point value for this task to 5. Sep 3 2019, 11:31 PM
ifried moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.

PR for the proxy script: https://github.com/wikimedia/WhoWroteThat/pull/48/files. I will change WWT to use the proxy in a separate PR.

Credentials are in ~/wikiwho.ini on the wikiwho Toolforge account. You all have access if you become community-tech-tools, then become wikiwho. The proxy is up and running now, but note that requests must be made from Wikipedia. Example endpoint (returns a 403 if you attempt to access it directly): https://tools.wmflabs.org/wikiwho/en/whocolor/v1.0.0-beta/Hanksy/
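
For readers unfamiliar with this kind of restriction, here is a rough Python sketch of one way a "requests must come from Wikipedia" check could work. The task does not say how the proxy actually decides this, so the use of the Referer header and the function names below are assumptions. A Referer check is also easy to spoof outside a browser, so it limits casual direct access rather than providing real access control.

```python
from urllib.parse import urlsplit


def is_wikipedia_referer(referer):
    """Return True if the Referer points at wikipedia.org or one of its subdomains."""
    if not referer:
        return False
    host = urlsplit(referer).hostname or ""
    # Require an exact match or a dot-separated subdomain, so that
    # look-alike hosts such as "evilwikipedia.org" are rejected.
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def status_for_request(headers):
    """Return 200 if the request may be proxied, otherwise 403."""
    return 200 if is_wikipedia_referer(headers.get("Referer")) else 403
```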

Nothing to QA yet. Still need to do steps 3 and 4 of the acceptance criteria.

I'm running into the same issue we had with SVG Translate where responses are getting truncated; see T217815.

There was an issue with very large responses (e.g. for [[Barack Obama]]). This was fixed by using cURL. PR at https://github.com/wikimedia/WhoWroteThat/pull/97

The proxy is now set up on both Toolforge and VPS. I did the latter only because I thought Toolforge was the reason why large responses weren't working, but https://github.com/wikimedia/WhoWroteThat/pull/97 fixed that.

Now the question is, which do we use? On average, it seems Toolforge and VPS take around the same amount of time to give a response. Unfortunately both are slower than hitting api.wikiwho.net directly, but only by a second or two.
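
A comparison like this can be repeated with a small timing helper. The sketch below is only an illustration of the measurement, not the method used for the numbers above. The helper name average_latency and the default of five runs are assumptions.

```python
import time
from statistics import mean


def average_latency(fetch, url, runs=5):
    """Call fetch(url) `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Here fetch can be any callable that performs the request, for example a wrapper around urllib.request.urlopen that sets whatever headers the proxy requires. Passing the same wrapper for the Toolforge, VPS and api.wikiwho.net URLs keeps the comparison like for like.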

The VPS instance is now doing everything entirely through the Apache config. I'm not 100% certain the basic auth was set correctly, and there's no way to test this without attempting to hit the API limits (which are high to begin with). Here's what I have; some review would be appreciated:

<VirtualHost *:80>
        ServerName wikiwho.wmflabs.org
        ServerAdmin tools.wikiwho@tools.wmflabs.org

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

        <Directory />
                AuthUserFile /etc/apache2/htpasswd
                AuthName "Identify"
                AuthType Basic
                require valid-user
        </Directory>

        RequestHeader set Authorization "Basic [hash]"

        SSLProxyEngine on
        ProxyPass / https://api.wikiwho.net/
        ProxyPassReverse / https://api.wikiwho.net/

        RewriteEngine On

        RewriteCond %{HTTP:X-Forwarded-Proto} !https
        RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R=301,L]
</VirtualHost>

where [hash] is the output of echo -n "username:password" | base64, and /etc/apache2/htpasswd was created with htpasswd -c /out/of/web/space/htpasswd username.
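
One partial sanity check that does not involve the API limits is to decode the value placed in the RequestHeader line and confirm it round-trips to the intended credentials. The Python sketch below is a suggestion, not part of the deployed setup, and the helper name decode_basic_auth is an assumption. It only verifies that the header is well formed, not that the API accepts it.

```python
import base64


def decode_basic_auth(header_value):
    """Split a 'Basic <token>' header value into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token, validate=True).decode("utf-8")
    # Only the first colon separates the fields; a password may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("token does not contain 'username:password'")
    return username, password


# Token produced by: echo -n "username:password" | base64
print(decode_basic_auth("Basic dXNlcm5hbWU6cGFzc3dvcmQ="))
```

If the token was generated without the -n flag, the decoded password ends in a newline, which is an easy mistake to spot this way.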

I also don't yet have the VPS proxy set to only accept requests from wikipedia.org, but I think this is doable.

PR: https://github.com/wikimedia/WhoWroteThat/pull/114

Instructions for setting up a new instance are at https://wikitech.wikimedia.org/wiki/Tool:CommTech#WikiWho. This includes the updated Apache config. Some review on that would still be appreciated!

MusikAnimal updated the task description.

Resolving because there's nothing else to test or review.