
citoid/VisualEditor silently changes source from HTTPS to HTTP
Open, Needs Triage | Public | BUG REPORT

Description

citoid/VisualEditor silently changes source from HTTPS to HTTP

  • hit edit (in this case it was a section edit link, not the full-page one)
  • click an existing ref
  • click "convert" button
  • click "insert" button
  • publish

https://en.wikipedia.org/wiki/special:diff/1222072576

Android 14 Chrome Beta 125.0.6422.14

Event Timeline

This will be an upstream issue with our data source for Citoid, as it normalises URLs in the API response, as seen here: https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/https%3A%2F%2Fwww.thecity.nyc%2F2024%2F05%2F02%2Fnypd-officer-fired-gun-columbia-hamilton-hall-raid%2F?action=query&format=json

Someone has probably set the canonical URL for that site as http:// instead of https://.

On that website there's <meta property="og:url" content="http://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/" /> in the source, which I'd assume is what's being picked up on.

(There's also <link rel="canonical" href="https://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/" /> being ignored, so I'd imagine there's some prioritization of canonicalization going on.)
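To make the mismatch concrete, here is a minimal sketch (Python standard library only) that pulls both values out of the markup quoted above. The class and function names are hypothetical, and this is not a claim about how Citoid or its upstream data source extracts metadata; it only shows that the two signals disagree on the scheme.

```python
from html.parser import HTMLParser


class CanonicalSignals(HTMLParser):
    """Collect the og:url and rel="canonical" values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og_url = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:url":
            self.og_url = attrs.get("content")
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def canonical_signals(html):
    """Return both candidate canonical URLs found in the page source."""
    parser = CanonicalSignals()
    parser.feed(html)
    return {"og:url": parser.og_url, "canonical": parser.canonical}


# The two tags as quoted in the comments above.
page = """<head>
<meta property="og:url" content="http://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/" />
<link rel="canonical" href="https://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/" />
</head>"""

signals = canonical_signals(page)
for name, url in signals.items():
    print(name, url)
```

Whichever of the two a consumer reads first decides the scheme that ends up in the citation, which would be consistent with the behaviour reported here.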

I see the same markup @DLynch described. Regardless, there's no situation where citation generation by VisualEditor should change the URL without telling the user it's changed.

There are also other signals to consider:

We could also just hardcode a preference that we shouldn't downgrade to HTTP if the original was HTTPS, no matter what any other signal says (at least if the rest of the URL is unchanged).

$ curl -A 'Mozilla/5.0 (X11; CrOS x86_64 14541.0.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36' -vs http://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/ 2>&1 | egrep -e '^< Locat' -e '^< HTTP/1'
< HTTP/1.1 301 Moved Permanently
< Location: https://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/
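The hardcoded no-downgrade preference described above could look roughly like the following. This is a sketch under stated assumptions: `prefer_https` is a hypothetical helper, not an existing Citoid or VisualEditor function, and "rest of the URL unchanged" is taken to mean the host (compared case-insensitively), path, query, and fragment all match.

```python
from urllib.parse import urlsplit


def prefer_https(original, candidate):
    """Return the URL to cite, refusing a bare https -> http downgrade.

    `original` is what the user supplied; `candidate` is what the
    citation service returned. Any other kind of change is passed through.
    """
    o, c = urlsplit(original), urlsplit(candidate)
    same_rest = (o.netloc.lower(), o.path, o.query, o.fragment) == (
        c.netloc.lower(), c.path, c.query, c.fragment)
    if o.scheme == "https" and c.scheme == "http" and same_rest:
        # Only the scheme differs: keep the candidate but restore https.
        return c._replace(scheme="https").geturl()
    return candidate


https_url = "https://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/"
http_url = "http://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/"
print(prefer_https(https_url, http_url))
```

A candidate whose host or path differs is returned untouched, so un-shortening and other canonicalisation would still go through; only the scheme-only downgrade is blocked.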

regardless there's no situation where citation generation by VisualEditor should change the URL without telling the user it's changed.

This actually happens quite regularly: the URL gets converted to a canonical URL, or a link-shortener URL gets un-shortened. I think most of the time people are not bothered by this (it's never been raised as an issue before), but we should try to prevent simple https -> http downgrades as you say.

I'm not completely for or against substituting in the canonical URL in general. Besides the aforementioned additional signals, such as whether the new URL returns a 200, I must restate that:

I object to any sort of substitution without notifying the user that a change has been made. For example, the change could be shown after clicking convert, before insert.

You say no one's complained. I wonder how many people actually noticed this was happening?
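As a rough illustration of the notification idea, the check between "convert" and "insert" could be as small as comparing the URL the user had with the one that came back. The function name and the message wording below are hypothetical; nothing like this is confirmed to exist in VisualEditor.

```python
from urllib.parse import urlsplit


def describe_url_change(original, returned):
    """Return a human-readable notice if the URL changed, else None."""
    if original == returned:
        return None
    o, r = urlsplit(original), urlsplit(returned)
    changes = []
    if o.scheme != r.scheme:
        changes.append(f"scheme {o.scheme} -> {r.scheme}")
    if o.netloc.lower() != r.netloc.lower():
        changes.append(f"host {o.netloc} -> {r.netloc}")
    if (o.path, o.query, o.fragment) != (r.path, r.query, r.fragment):
        changes.append("path or query changed")
    # Fall back to a generic label, e.g. when only the host's letter case differs.
    detail = ", ".join(changes) or "formatting only"
    return f"The citation URL was changed ({detail}): {original} -> {returned}"


notice = describe_url_change(
    "https://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/",
    "http://www.thecity.nyc/2024/05/02/nypd-officer-fired-gun-columbia-hamilton-hall-raid/",
)
print(notice)
```

Returning `None` for an unchanged URL means the common case needs no extra step, while any substitution, including un-shortening, produces something the user can see before inserting.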