Parsoid is trying to use relative addressing of images, even though the wikitext doesn't.
Open, Normal, Public

Description

I've set up Parsoid with MW and am successfully parsing articles without images.
Pages with images cause Parsoid to fail with the message "Generation of the document file has failed. Status: Bundling process died with non zero code: 1"
Looking in the logs, I find:

Jan 26 11:36:57 wikiServer mw-ocg-service: {"name":"mw-ocg-service","hostname":"wikiServer","pid":989,"level":50,"channel":"backend.bundler.bin","job":{"id":"57e5f36ffb0d4eae511d9cbd59ab8f0c155de1a4","writer":"rdf2latex"},"msg":"Error: Bad article title: ./File:20130705163106.gif\n at ParsoidResult._resolve (/opt/ocg/mw-ocg-bundler/lib/parsoid.js:52:18)\n at null.<anonymous> (/opt/ocg/mw-ocg-bundler/lib/parsoid.js:104:16)\n at Array.map (native)\n at ParsoidResult.getImages (/opt/ocg/mw-ocg-bundler/lib/parsoid.js:98:29)\n at siteinfo.fetch.then.then.then.then.then.then.then.then.item.title (/opt/ocg/mw-ocg-bundler/lib/index.js:191:12)\n at run (/opt/ocg/mw-ocg-bundler/node_modules/core-js/modules/es6.promise.js:89:39)\n at /opt/ocg/mw-ocg-bundler/node_modules/core-js/modules/es6.promise.js:100:28\n at process._tickCallback (node.js:448:13)","time":"2016-01-26T10:36:57.308Z","v":0}
Jan 26 11:36:57 wikiServer mw-ocg-service: {"name":"mw-ocg-service","hostname":"wikiServer","pid":989,"level":50,"channel":"backend.bundler.error","job":{"id":"57e5f36ffb0d4eae511d9cbd59ab8f0c155de1a4","writer":"rdf2latex"},"err":{"message":"Bundlingprocess died with non zero code: 1","name":"BundlerError","stack":"BundlerError: Bundling process died with non zero code: 1\n at fork.catch.then.console.info.channel (/opt/ocg/mw-ocg-service/lib/threads/backend.js:560:11)\n at run (/opt/ocg/mw-ocg-service/node_modules/core-js/modules/es6.promise.js:89:39)\n at /opt/ocg/mw-ocg-service/node_modules/core-js/modules/es6.promise.js:100:28\n at process._tickCallback (node.js:448:13)"},"msg":"Bundling process died with non zero code: 1","time":"2016-01-26T10:36:57.313Z","v":0}

In the article itself, the markup is "[[File:20130705163106.gif|alt=foss i akserselva|frame|akserselva]]"

When I manually try to open the same article title that Parsoid is requesting, MW responds with "Title has relative path. Relative page titles (./, ../) are invalid, because they will often be unreachable when handled by user's browser."

Jssl created this task. Jan 26 2016, 10:44 AM
cscott added a subscriber: cscott. Jan 26 2016, 8:05 PM

I need more help here from @Jssl. It sounds like your wiki configuration is broken. Can you provide a screenshot of where MW is responding with "Title has relative path"? It sounds like there's some PHP-side misconfiguration there.

In particular, [[File:20130705163106.gif|alt=foss i akserselva|frame|akserselva]] should be fine. That doesn't involve relative paths on the MW side, so I don't understand why MW is complaining about that.

Jssl added a comment. Jan 26 2016, 8:22 PM

Well, sort of. It's mw-ocg-service that's doing the complaining (or rather, trying to fetch the relatively addressed file):


I tried changing the url to reflect what mw-ocg is trying, and that gives me the error about relative paths:

Does this help?

cscott added a comment. Feb 8 2016, 3:57 PM

It seems that what we need is the wikitext input which causes parsoid to output an image tag with a relative path. If that exists (and isn't just a misconfiguration), then that would be a parsoid bug.

In particular, the exception is being thrown in this code:

ParsoidResult.prototype._resolve = function(href) {
	var path = url.parse(url.resolve(this.getBaseHref(), href), false, true).
		pathname;
	// Now remove the articlepath.
	var m = this._resolveRE.exec(path);
	if (!m) { throw new Error('Bad article title: ' + href); }

So we just attempted to resolve the href against the base href. If we're still getting a relative path here, it seems like the base href is missing or (more likely) invalid. The getBaseHref function is:

ParsoidResult.prototype.getBaseHref = function() {
	var result = '';
	var base = this.document.querySelector('head > base[href]');
	if (base) {
		result = base.getAttribute('href').replace(/^\/\//, 'https://');
	}
	this.getBaseHref = function() { return result; };
	return result;
};

which just takes the <base href="...."> element from the <head> which should be emitted by parsoid. In parsoid, that href is emitted by:

	// Add base href pointing to the wiki root
	appendToHead(document, 'base', { href: env.conf.wiki.baseURI });

which in turn comes from this processing of the siteinfo response from the wiki:

	if (resultConf.general) {
		if (general.mainpage) {
			this.mainpage = general.mainpage;
		}

		if (general.articlepath) {
			if (general.server) {
				this.baseURI = general.server + general.articlepath.replace(/\$1/, '');
			}

			this.articlePath = general.articlepath;
		}

So it seems like your MediaWiki installation either doesn't have $wgServer defined, or has it defined to something invalid or relative. It's also possible that $wgScriptPath and/or $wgArticlePath are broken in some way.
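
To make the dependency on the wiki configuration concrete, here is a minimal standalone sketch of the baseURI computation quoted above. The helper name `computeBaseURI` and the two sample siteinfo objects are hypothetical and made up for illustration; only the `server + articlepath.replace(/\$1/, '')` expression is taken from the Parsoid excerpt.

```javascript
// Hypothetical helper mirroring the expression in the Parsoid excerpt above.
function computeBaseURI(general) {
	if (general.articlepath && general.server) {
		return general.server + general.articlepath.replace(/\$1/, '');
	}
	// Without both values, no base URI can be derived.
	return undefined;
}

// Made-up siteinfo values, for illustration only.
var pathStyle = { server: 'http://wiki.example.org', articlepath: '/wiki/$1' };
var queryStyle = { server: 'http://wiki.example.org', articlepath: '/index.php?title=$1' };

console.log(computeBaseURI(pathStyle));
console.log(computeBaseURI(queryStyle));
```

Whatever the wiki reports for the server and article path carries straight through into the <base href>, so an unusual value in either setting would show up there.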

cscott added a comment. Feb 8 2016, 4:34 PM

From IRC:
jsl4: cscott: when I do curl -L http://localhost:8000/alpha/v3/page/html/Main_Page , I get output like this: <img resource="./File:AlphaPro_11_5.PNG" src="//alpha.zenitel.com/images/thumb/e/eb/AlphaPro_11_5.PNG/100px-AlphaPro_11_5.PNG" data-file-width="1278" data-file-height="853" data-file-type="bitmap" height="67" width="100" data-parsoid='{"a":{"resource":"./File:AlphaPro_11_5.PNG","height":"67","width":"100"},"sa":{"resource":"Image:AlphaPro 11 5.PNG"}}'/>
(10:58:03 AM) cscott-free: jsl4: it seems like your $wgServer (or possibly $wgScriptPath or $wgArticlePath) are broken in your wiki's LocalSettings.php
(10:59:05 AM) cscott-free: jsl4: what do you get in the <base href="..."> section of the <head> when you curl -L http://localhost:8000/alpha/v3/page/html/Main_Page
(11:00:10 AM) cscott-free: jsl4: if it does turn out to be a misconfiguration, it might be worth trying to emit a better/more informative error message in that case, so i'm interested to learn the details in any case.
(11:00:20 AM) jsl4: ><base href="http://alpha.zenitel.com/index.php?title="/>
(11:02:11 AM) cscott-free: jsl4: ah, that should probably be just "http://alpha.zenitel.com/index.php". i'm pretty sure your article paths should be http://alpha.zenitel.com/index.php/PAGENAME
(11:02:16 AM) cscott-free: (not http://alpha.zenitel.com/index.php?title=PAGENAME)

Perhaps we can emit a better warning in Parsoid if the baseURI has a query part?
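
As a rough sketch of why a base href with a query part breaks resolution, and of what such a check could look like: the snippet below uses the standard WHATWG `URL` class rather than the `url.resolve` call from the bundler excerpt. `baseHasQuery` is a hypothetical helper, not an existing Parsoid function, and the path-style base is only an assumed example of what a different configuration might emit.

```javascript
// Base href reported in the IRC log above.
var queryBase = 'http://alpha.zenitel.com/index.php?title=';
// Assumed path-style base, for comparison only.
var pathBase = 'http://alpha.zenitel.com/index.php/';
var href = './File:20130705163106.gif';

// Relative resolution discards the base's query string and its last path segment.
var fromQueryBase = new URL(href, queryBase).pathname;
var fromPathBase = new URL(href, pathBase).pathname;
console.log(fromQueryBase); // /File:20130705163106.gif
console.log(fromPathBase);  // /index.php/File:20130705163106.gif

// Hypothetical check that a warning could be based on.
function baseHasQuery(baseURI) {
	return new URL(baseURI).search !== '';
}
console.log(baseHasQuery(queryBase)); // true
console.log(baseHasQuery(pathBase));  // false
```

With the query-style base, the resolved path no longer contains index.php at all, which would plausibly explain why stripping the article path fails and "Bad article title" is thrown.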

Jssl added a comment. Feb 9 2016, 8:38 AM

From checking the configuration, these are the current values used:

$wgScriptPath = "";
$wgServer = "http://alpha.zenitel.com";

$wgArticlePath isn't defined anywhere, as I haven't gotten as far as setting up rewrite rules yet.

Note that these settings are (to the best of my recollection) as defined by the installer itself; I can't remember changing them since installing.

Jssl added a comment. Feb 19 2016, 9:25 AM

Are there any more details I can provide to help figure out why this isn't working?

Arlolra triaged this task as Normal priority. Apr 19 2016, 10:41 PM
Jssl added a comment. Apr 20 2016, 9:06 AM

Hi guys. I've tried redeploying OCG with Parsoid on a standalone machine, using the instructions from https://www.mediawiki.org/wiki/Parsoid/Setup.
Rendering still fails with the same error messages. Is there anything I can do to provide additional details or info about the setup? I'm more than happy to provide access to the box itself, or to upload the complete configuration or a dump of the system.

Change 284232 had a related patch set uploaded (by Cscott):
Resolve titles even when $wgArticlePath involves a query string.

https://gerrit.wikimedia.org/r/284232

Setting $wgUsePathInfo to true in your wiki configuration should fix the problem, if you can do that.

Otherwise, there's a patch for mw-ocg-bundler which should help: https://gerrit.wikimedia.org/r/284232
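
For readers who cannot change the wiki configuration, the general idea of tolerating a query-style article path can be sketched as follows. This is not the code from change 284232, whose contents are not shown in this thread; `titleFromHref` and its fallback rule are assumptions made purely for illustration.

```javascript
// Hypothetical illustration only; not the actual mw-ocg-bundler patch.
function titleFromHref(baseHref, href) {
	var base = new URL(baseHref);
	if (base.search !== '') {
		// Query-style article path: relative resolution would drop the query,
		// so treat the href as a title relative to the wiki root instead.
		return decodeURIComponent(href.replace(/^\.\//, ''));
	}
	// Path-style article path: resolve normally, then strip the base path.
	var path = new URL(href, base).pathname;
	if (path.indexOf(base.pathname) !== 0) {
		throw new Error('Bad article title: ' + href);
	}
	return decodeURIComponent(path.slice(base.pathname.length));
}

console.log(titleFromHref('http://alpha.zenitel.com/index.php?title=', './File:20130705163106.gif'));
console.log(titleFromHref('http://alpha.zenitel.com/index.php/', './File:20130705163106.gif'));
```

Both calls should yield the same title, File:20130705163106.gif, regardless of which article path style the base href uses.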

Change 284232 merged by jenkins-bot:
Resolve titles even when $wgArticlePath involves a query string.

https://gerrit.wikimedia.org/r/284232

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.