
Parsoid should use protocol-relative URLs for media
Closed, Resolved (Public)

Description

Currently Parsoid/JS uses the URLs provided by ApiQueryImageInfo to generate media links, whether <img src> or <source src> or what have you.

In ApiQueryImageInfo.php they are generated with the following code:

						$vals['thumburl'] = wfExpandUrl( $mto->getUrl(), PROTO_CURRENT );
[...]
				$vals['url'] = wfExpandUrl( $file->getFullUrl(), PROTO_CURRENT );
			}
			$vals['descriptionurl'] = wfExpandUrl( $file->getDescriptionUrl(), PROTO_CURRENT );

			$shortDescriptionUrl = $file->getDescriptionShortUrl();
			if ( $shortDescriptionUrl !== null ) {
				$vals['descriptionshorturl'] = wfExpandUrl( $shortDescriptionUrl, PROTO_CURRENT );

Note the use of wfExpandUrl( ..., PROTO_CURRENT ). This makes the protocol match whatever protocol is being used to make the *API request* (i.e., http or https). In Parsoid/JS we always use https:// to talk to the MediaWiki production cluster, so these always come out as https://. In Parsoid/PHP we are currently debugging on a non-production machine with SSL termination, so they always come out as http://.
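
To illustrate why the two setups disagree, here is a simplified model of expanding a protocol-relative URL with the protocol of the incoming request. The function name and the sample URL are hypothetical, and this is not MediaWiki's wfExpandUrl() implementation, only a sketch of the behavior described above.

```javascript
// Simplified model only; NOT MediaWiki's wfExpandUrl().
// `requestProtocol` stands in for the protocol of the API request
// ("http" or "https"), which is what PROTO_CURRENT is described as using.
function expandWithRequestProtocol(url, requestProtocol) {
  // Protocol-relative URLs start with "//"; prefix them with the scheme.
  if (url.startsWith("//")) {
    return `${requestProtocol}:${url}`;
  }
  // Anything else is returned unchanged in this sketch.
  return url;
}

// Hypothetical thumbnail URL, used for illustration only.
const thumb = "//upload.example.org/thumb/Foo.jpg";
console.log(expandWithRequestProtocol(thumb, "https"));
console.log(expandWithRequestProtocol(thumb, "http"));
```

The same stored URL yields a different absolute URL depending only on how the API was reached, which is the mismatch reported here.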

But actual links in the parsed HTML output of the legacy parser use protocol-relative form to match the protocol-relative configuration of $wgServer.

We should probably match this -- there's no reason to hard-code a specific protocol into our HTML.

On the PHP side this is easy: remove the wfExpandUrl( ..., PROTO_CURRENT ) call in Parsoid/extension/src/Config/DataAccess.php so we use the same URLs that MediaTransformOutput::getUrl() returns.

It's a little more complicated for the Parsoid/JS side, because we have to reverse-engineer an appropriate URL from the fully-expanded URL that the API gives us. It's probably easiest just to force Parsoid/JS to always emit protocol-relative URLs (strip everything up to and including the first colon) -- and this strategy is probably appropriate for the standalone/API configuration of Parsoid/PHP as well.
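
A minimal sketch of that stripping step might look like the following. The helper name is hypothetical and this is not the code from the merged patches. It also deviates slightly from "strip everything up to and including the first colon": it only strips a leading http: or https: scheme, on the assumption that URLs which are already relative (and may contain a colon in the path, as in a File: title) should be left alone.

```javascript
// Hypothetical helper, not taken from the Parsoid codebase.
// Converts an absolute http(s) URL into protocol-relative form.
function toProtocolRelative(url) {
  // Match a leading "http:" or "https:" followed by "//".
  // Group 1 captures everything after the colon.
  const match = /^https?:(\/\/.*)$/i.exec(url);
  // URLs without an http(s) scheme are returned unchanged.
  return match ? match[1] : url;
}

// Sample URLs below are for illustration only.
console.log(toProtocolRelative("https://upload.example.org/a/b/Foo.jpg"));
console.log(toProtocolRelative("/wiki/File:Foo.jpg"));
```

Restricting the match to an http(s) scheme is a design choice made for this sketch; a literal "first colon" split would also truncate relative paths that happen to contain a colon.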

Event Timeline

ssastry triaged this task as Medium priority. Oct 10 2019, 10:43 PM
ssastry edited projects, added Parsoid-PHP; removed Parsoid.

Change 549221 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] WIP: use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549221

Change 549221 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549221

Change 549632 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Followup: mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549632

Change 549632 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Followup: mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549632

Change 550547 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/TimedMediaHandler@master] Add url-format options to TimedMediaTransformOutput::getAPIData()

https://gerrit.wikimedia.org/r/550547

Change 550551 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Followup #2: mostly-use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/550551

Change 550547 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Add url-format options to TimedMediaTransformOutput::getAPIData()

https://gerrit.wikimedia.org/r/550547

Change 550551 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Followup #2: mostly-use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/550551