
pywikibot support for https-only
Closed, Resolved (Public)

Description

Wikimedia sites are apparently moving to HTTPS only, which means that any code using http:// URLs may break.
The announcement is very light on details.

https://meta.wikimedia.org/wiki/HTTPS#2015

Both core and compat support HTTPS. However, both have instances where a URL is created manually (not via Family methods) with an http:// protocol/scheme.

I suspect that the servers will still redirect http:// URLs to https://, but that needs to be confirmed.
In any case, we should fix any instances where the code is not using the Family methods to create a URL.
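To illustrate the difference between building a URL by hand and going through a Family-style method, here is a minimal sketch. `MiniFamily` is a hypothetical, heavily simplified stand-in, not pywikibot's actual Family class or its method signatures. It only shows the idea of deciding the scheme in one place so that callers never write "http://" themselves.

```python
class MiniFamily:
    """Hypothetical stand-in for a family object (not pywikibot's API)."""

    def __init__(self, hostnames, protocol='https'):
        self.hostnames = hostnames  # e.g. {'en': 'en.wikipedia.org'}
        self.default_protocol = protocol

    def protocol(self, code):
        # A per-language override could be looked up here; this sketch
        # returns the same scheme for every code.
        return self.default_protocol

    def base_url(self, code, uri):
        # The only place where the scheme is joined to the host.
        return f'{self.protocol(code)}://{self.hostnames[code]}{uri}'


family = MiniFamily({'en': 'en.wikipedia.org'})

# Manual construction (the pattern to remove) hardcodes the scheme:
manual = 'http://' + family.hostnames['en'] + '/w/api.php'

# Going through the family picks up the configured scheme instead:
print(family.base_url('en', '/w/api.php'))
# https://en.wikipedia.org/w/api.php
```

Switching every generated URL then only requires changing the `protocol` default, while hand-built strings like `manual` have to be hunted down one by one.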

Here is one regex that finds some examples of the problem:

git grep '^[^#@]*http:\/\/.*wiki.*\.org'
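For experimenting with the pattern outside git, here is a rough Python port of the same regex. `find_http_wiki_lines` is a hypothetical helper. Like the grep, it only flags lines with no `#` or `@` anywhere before the `http://`, which roughly skips commented-out code.

```python
import re

# Same pattern as the git grep above; "/" needs no escaping in Python.
HTTP_WIKI_RE = re.compile(r'^[^#@]*http://.*wiki.*\.org')


def find_http_wiki_lines(text):
    """Return (line number, line) pairs that the pattern flags."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HTTP_WIKI_RE.search(line)
    ]


sample = """\
api = 'http://en.wikipedia.org/w/api.php'
# see http://en.wikipedia.org/wiki/Main_Page
url = 'https://en.wikipedia.org/w/api.php'
"""
print(find_http_wiki_lines(sample))
# [(1, "api = 'http://en.wikipedia.org/w/api.php'")]
```

The commented line is skipped because of the leading `#`, and the https line is not flagged because it never contains the substring `http://`.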

Event Timeline

jayvdb raised the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb subscribed.

It is good to be careful, but I am sure the WMF will keep the servers listening on HTTP and serving 301 redirects to HTTPS.

Yes, we'll be serving HTTP 301s on port 80 indefinitely, as far as I'm aware. It's still better to avoid the redirect and go directly to HTTPS when possible, though.
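Following the advice to go directly to HTTPS instead of relying on the 301, a client can upgrade URLs before sending any request. This is a minimal sketch using only `urllib.parse`; `to_https` is a hypothetical helper name, and it assumes the host serves the same paths over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with an http scheme replaced by https.

    Other schemes (already https, protocol-relative, etc.) are returned
    unchanged, so the function is safe to apply more than once.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme != 'http':
        return url
    netloc = parts.netloc
    # An explicit HTTP port would be wrong for HTTPS, so drop it.
    if netloc.endswith(':80'):
        netloc = netloc[:-3]
    return urlunsplit(parts._replace(scheme='https', netloc=netloc))


print(to_https('http://en.wikipedia.org/w/api.php?action=query'))
# https://en.wikipedia.org/w/api.php?action=query
```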

(Also, one primary production domain is not matched by the regex at the top: wiktionary.org.)
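One way to close that gap is an alternation in the pattern. A minimal sketch in Python, assuming wiktionary.org is the only production domain the original pattern misses (as stated above):

```python
import re

# Original pattern: "wiktionary" does not contain the substring "wiki".
ORIGINAL_RE = re.compile(r'^[^#@]*http://.*wiki.*\.org')

# Extended pattern (an assumption, not from the task): also accept
# "wiktionary" as the host fragment.
EXTENDED_RE = re.compile(r'^[^#@]*http://.*(?:wiki|wiktionary).*\.org')

line = "url = 'http://en.wiktionary.org/w/api.php'"
print(bool(ORIGINAL_RE.search(line)))  # False
print(bool(EXTENDED_RE.search(line)))  # True
```

The same alternation should also work on the command line with `git grep -E '^[^#@]*http://.*(wiki|wiktionary).*\.org'`.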

Okay, looking through the Pywikibot repository (excluding the tests):

  • There are three instances where links are used as documentation, so they won't cause the bot to fail (the patrol script, pywikibot.interwiki_graph, pywikibot.family)
  • One instance is in the example result for rcstream, so at most a test might fail (this needs a closer look)
  • One mismatch is in pywikibot.page.FilePage (for Wikitravel, which as far as I know is not managed by the WMF; fixing T74847 should reduce the need for it anyway)
  • The only large problem would be the Wikidata family, which uses a bunch of hardcoded links. To be honest, looking at it, I think there must be a more dynamic or less verbose implementation (e.g. just the name of the Wikidata entry instead of the full URL).
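As a sketch of the less verbose approach suggested in the last item, the code could store only entity IDs and derive URIs from a single constant. The helper names and the ID pattern (items, properties and lexemes only) are assumptions for illustration; the prefix is Wikidata's concept-URI form, e.g. `http://www.wikidata.org/entity/Q60`.

```python
import re

# The one place where the scheme and host of entity URIs are spelled out.
ENTITY_PREFIX = 'http://www.wikidata.org/entity/'

# Hypothetical, simplified ID pattern: items (Q), properties (P), lexemes (L).
ENTITY_ID_RE = re.compile(r'[QPL][1-9][0-9]*')


def entity_uri(entity_id):
    """Build the concept URI for an entity ID such as 'Q60' or 'P31'."""
    if not ENTITY_ID_RE.fullmatch(entity_id):
        raise ValueError(f'not an entity ID: {entity_id!r}')
    return ENTITY_PREFIX + entity_id


def entity_id(uri):
    """Inverse of entity_uri(): extract the ID from a concept URI."""
    suffix = uri[len(ENTITY_PREFIX):]
    if not uri.startswith(ENTITY_PREFIX) or not ENTITY_ID_RE.fullmatch(suffix):
        raise ValueError(f'not a Wikidata concept URI: {uri!r}')
    return suffix


print(entity_uri('Q60'))  # http://www.wikidata.org/entity/Q60
```

With this layout, a decision about the scheme of entity URIs has to be made in exactly one line instead of in every hardcoded link.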

The tests are harder to assess, because a request via HTTP might be intentional, and most matches come from the Q60.wd test data file, so we might want to download a newer version of it.

Regarding hard-coded http values for Wikidata, in code and in tests, see T102741.

Nemo_bis renamed this task from "https only support" to "pywikibot support for https-only" (Jun 17 2015, 6:53 AM).
Nemo_bis set Security to None.

Is pywikibot still using http:// URLs for WMF sites? The real problem we've run into with bot code in general is that a lot of bots send POST requests directly to their configured http URLs (as opposed to, say, sending a GET to the configured URL, seeing the 301 to https, and then using https for all further POST requests). POST traffic cannot effectively be redirected and will eventually just break if it isn't sent over https...
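The POST problem is easy to reproduce without any network access. As a stand-in for a bot's HTTP layer, this sketch uses Python's standard `urllib.request` and asks its redirect handler which request it would send after a 301; the URL is a placeholder. (The `requests` library likewise turns a POST into a GET on a 301, though a given bot framework's own HTTP layer may behave differently.)

```python
import urllib.request

# A POST to an http:// URL; passing data makes the method POST.
post = urllib.request.Request(
    'http://example.org/w/api.php',  # placeholder URL
    data=b'action=edit&title=Sandbox',
)

# Ask the default redirect handler what it would do after a 301.
handler = urllib.request.HTTPRedirectHandler()
follow_up = handler.redirect_request(
    post, None, 301, 'Moved Permanently', {},
    'https://example.org/w/api.php',
)

print(post.get_method())       # POST
print(follow_up.get_method())  # GET
print(follow_up.data)          # None: the form body was dropped
```

The body is silently discarded, so the only reliable fix is to send the POST to the https:// URL in the first place.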

Well, if you look at my comment (T102315#1371485), you'll see that for most purposes we are making HTTPS requests, not HTTP ones.

Andrew subscribed.

As I understand it, this ticket is a request for updates to the pywikibot code. I'm going to remove the Operations tag; please re-add with a clear request if you need something from Ops.

Change 388428 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[pywikibot/core@master] Change last links from http to https

https://gerrit.wikimedia.org/r/388428

Change 388428 merged by jenkins-bot:
[pywikibot/core@master] Change last http links to https

https://gerrit.wikimedia.org/r/388428

Framawiki claimed this task.
Framawiki subscribed.

All http links in the code have been changed to https. The ones that remain are in comments.

I'm not sure those last Wikidata links should have been changed, per T153563. See the note in https://gerrit.wikimedia.org/r/390872

Change 390872 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] Use http scheme for Wikidata entity URIs

https://gerrit.wikimedia.org/r/390872

Change 390872 merged by jenkins-bot:
[pywikibot/core@master] Use http scheme for Wikidata entity URIs

https://gerrit.wikimedia.org/r/390872
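Following the change above, a bulk http-to-https rewrite needs an exception list: Wikidata entity URIs are identifiers that appear verbatim in the data (for example as quantity units such as `http://www.wikidata.org/entity/Q11573`), so they are compared as strings rather than fetched. A minimal sketch, where `IDENTIFIER_PREFIXES` and `should_upgrade` are hypothetical names:

```python
from urllib.parse import urlsplit

# Hypothetical list of URL prefixes that are identifiers, not links to
# fetch; they must stay byte-for-byte identical to the data.
IDENTIFIER_PREFIXES = (
    'http://www.wikidata.org/entity/',
)


def should_upgrade(url):
    """Return True if an http:// URL may safely be rewritten to https://."""
    if urlsplit(url).scheme != 'http':
        return False  # already https, or not a web URL at all
    return not url.startswith(IDENTIFIER_PREFIXES)


print(should_upgrade('http://www.wikidata.org/wiki/Q60'))    # True
print(should_upgrade('http://www.wikidata.org/entity/Q60'))  # False
```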