
Parsoid should use protocol-relative URLs for media
Closed, Resolved · Public

Description

Currently Parsoid/JS uses the URLs provided by ApiQueryImageInfo to generate media links, whether <img src>, <source src>, or what have you.

In ApiQueryImageInfo.php these are generated with the following code:

						$vals['thumburl'] = wfExpandUrl( $mto->getUrl(), PROTO_CURRENT );
[...]
				$vals['url'] = wfExpandUrl( $file->getFullUrl(), PROTO_CURRENT );
			}
			$vals['descriptionurl'] = wfExpandUrl( $file->getDescriptionUrl(), PROTO_CURRENT );

			$shortDescriptionUrl = $file->getDescriptionShortUrl();
			if ( $shortDescriptionUrl !== null ) {
				$vals['descriptionshorturl'] = wfExpandUrl( $shortDescriptionUrl, PROTO_CURRENT );

Note the use of wfExpandUrl( ..., PROTO_CURRENT ). This makes the protocol match whatever protocol is being used to make the *api request* (i.e., http or https). In Parsoid/JS we always use https:// to talk to the mediawiki production cluster, so these always come out as https://. In Parsoid/PHP we are currently debugging on a non-production machine with SSL termination, so the API request itself arrives over plain http and the URLs always come out as http://.

But the actual links in the legacy parser's HTML output use the protocol-relative form, matching the protocol-relative configuration of $wgServer.

We should probably match this -- there's no reason to hard code a specific protocol into our HTML.
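To illustrate why the protocol-relative form is enough, here is a small sketch using the standard WHATWG URL class (a global in Node.js and in browsers). The page and media URLs are made-up examples, not taken from real Parsoid output. It shows the same protocol-relative reference picking up whichever scheme the containing page was served with.

```javascript
// Illustrative only: both the media URL and the page URLs are hypothetical.
const mediaSrc = '//upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg';

// Resolve the same protocol-relative reference against a page served
// over http and against the same page served over https.
const overHttp = new URL(mediaSrc, 'http://en.wikipedia.org/wiki/Example').href;
const overHttps = new URL(mediaSrc, 'https://en.wikipedia.org/wiki/Example').href;

console.log(overHttp);  // scheme is inherited from the http page
console.log(overHttps); // scheme is inherited from the https page
```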

On the PHP side this is easy: remove the wfExpandUrl( ..., PROTO_CURRENT ) in Parsoid/extension/src/Config/DataAccess.php so we use the same URLs that MediaTransformOutput::getUrl() returns.

It's a little more complicated for the Parsoid/JS side, because we have to reverse-engineer an appropriate URL from the fully expanded URL that the API gives us. It's probably easiest just to force Parsoid/JS to always emit protocol-relative URLs (strip everything up to and including the first colon) -- and this strategy is probably appropriate for the standalone/api configuration of Parsoid/PHP as well.
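A minimal sketch of that stripping step, for illustration only. The helper name toProtocolRelative is hypothetical and this is not the code from the merged patches. It is also slightly more conservative than "strip everything up to the first colon": as an assumption, it only rewrites URLs that start with http:// or https://, so that paths or query strings that happen to contain a colon (e.g. File:Foo.jpg) are left alone.

```javascript
// Hypothetical helper, not the actual Parsoid implementation.
// Turns an absolute http(s) URL into its protocol-relative form.
function toProtocolRelative(url) {
  // Drop a leading "http:" or "https:" only when it is followed by "//".
  // Anything else (already protocol-relative, path-only, data: URIs)
  // is returned unchanged.
  return url.replace(/^https?:(?=\/\/)/i, '');
}

// Example inputs are made up for illustration.
const fromHttps = toProtocolRelative('https://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg');
const fromHttp = toProtocolRelative('http://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg');
const alreadyRelative = toProtocolRelative('//upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg');
const pathOnly = toProtocolRelative('/w/index.php?title=File:Example.jpg');

console.log(fromHttps, fromHttp, alreadyRelative, pathOnly);
```

Restricting the match to http and https is a design choice for this sketch. Whether other schemes can show up in the imageinfo response is not something the task description states.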

Details

Related Gerrit Patches:
mediawiki/services/parsoid : master · Followup #2: mostly-use protocol-relative URLs for media
mediawiki/extensions/TimedMediaHandler : master · Add url-format options to TimedMediaTransformOutput::getAPIData()
mediawiki/services/parsoid : master · Followup: mostly use protocol-relative URLs for media
mediawiki/services/parsoid : master · Mostly use protocol-relative URLs for media

Event Timeline

cscott created this task. · Oct 10 2019, 8:38 PM
Restricted Application added a subscriber: Aklapper. · Oct 10 2019, 8:38 PM
ssastry triaged this task as Medium priority. · Oct 10 2019, 10:43 PM
ssastry edited projects, added Parsoid-PHP; removed Parsoid.
ssastry raised the priority of this task from Medium to High. · Nov 5 2019, 4:53 PM
cscott claimed this task. · Nov 6 2019, 5:13 PM

Change 549221 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] WIP: use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549221

Change 549221 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549221

Change 549632 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Followup: mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549632

Change 549632 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Followup: mostly use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/549632

Change 550547 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/TimedMediaHandler@master] Add url-format options to TimedMediaTransformOutput::getAPIData()

https://gerrit.wikimedia.org/r/550547

Change 550551 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Followup #2: mostly-use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/550551

Change 550547 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Add url-format options to TimedMediaTransformOutput::getAPIData()

https://gerrit.wikimedia.org/r/550547

Change 550551 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Followup #2: mostly-use protocol-relative URLs for media

https://gerrit.wikimedia.org/r/550551

cscott closed this task as Resolved. · Thu, Nov 14, 5:31 PM