QueryPath error in Html2Wiki
Closed, Resolved. Public.

Description

MediaWiki: 1.24.2

After importing cc.zip (see https://www.mediawiki.org/wiki/Extension:Html2Wiki#Import_a_blog_post_.2F_webpage_complete ), the following error appears:

PHP Catchable fatal error:  Argument 1 passed to DOMXPath::__construct() must be an instance of DOMDocument, null given, called in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 406 and defined in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 440,

Event Timeline

Axone_Su raised the priority of this task from to High.
Axone_Su updated the task description.
Axone_Su added a subscriber: Axone_Su.
Restricted Application added a subscriber: Aklapper. Apr 21 2015, 10:35 PM
Axone_Su set Security to None.

Can you provide more details on your installation? OS, Webserver? Have you had any successes?

I just installed the extension on a clean Ubuntu with Apache and was successful with the example. https://freephile.org/wiki/43316

Admittedly, my results show the need to be able to specify 'recipes' for selecting / deselecting desirable content because I doubt anybody wants to capture the navigation content of an HTML page. This is a feature that I hope to work on shortly.

freephile renamed this task from "Does not import html page with Html2Wiki" to "QueryPath error in Html2Wiki". Apr 28 2015, 5:57 PM
freephile changed the task status from Open to Stalled. Apr 28 2015, 6:26 PM

I just committed an update that wraps the file processing in a try/catch block, which should provide more error detail. Run 'git pull' to get it.

Hello!
I have the same problem: when I try to import a zip file containing HTML, I see this error:

PHP message: PHP Catchable fatal error: Argument 1 passed to DOMXPath::__construct() must be an instance of DOMDocument, null given, called in ...mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 406 and defined in ...mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 440.

OS: Debian 7.8
Nginx 1.8.0, PHP 5.5.25-1, MediaWiki 1.25.1, Html2Wiki 2015.02 (6632bef) 20:53, 21 of June 2015, pandoc 1.9.4.2-2.

Part of LocalSettings.php:
...
wfLoadExtension( 'SyntaxHighlight_GeSHi' );
$wgNamespacesWithSubpages[NS_MAIN] = true;
require_once( "$IP/extensions/Html2Wiki/Html2Wiki.php" );
...

Ran this with the latest from your repo.

Two calls, the first is with your old code, the second is with the new code:

Old and busted:
[Fri Aug 07 15:56:29.414352 2015] [:error] [pid 30850] [client 10.2.100.50:52751] PHP Catchable fatal error: Argument 1 passed to DOMXPath::__construct() must be an instance of DOMDocument, null given, called in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 406 and defined in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 440, referer: http://ibss-info/mediawiki/index.php/Special:Html2Wiki

New hotness:
[Fri Aug 07 16:00:26.984102 2015] [:error] [pid 30848] [client 10.2.100.50:53064] PHP Fatal error: Call to undefined function htmlqp() in /var/lib/mediawiki/extensions/Html2Wiki/specials/SpecialHtml2Wiki.php on line 1439, referer: http://ibss-info/mediawiki/index.php/Special:Html2Wiki

versions:

Product Version
MediaWiki 1.25.1
PHP 5.5.9-1ubuntu4.11 (apache2handler)
MySQL 5.5.44-0ubuntu0.14.04.1

Foozleface, the htmlqp() function is defined by the 'querypath' project. This is an external dependency managed by Composer. Did you run "composer install" during the installation (described at https://www.mediawiki.org/wiki/Extension:Html2Wiki#Installation)? (It doesn't hurt to run "composer install" again.)

If you did, you should have a directory structure like so:
[MEDIAWIKI_DIR]/extensions/Html2Wiki/vendor/querypath/querypath

(where MEDIAWIKI_DIR means the full filesystem path to wherever you installed MediaWiki.)
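
The directory check described above can be scripted. What follows is only a minimal sketch: the function name check_querypath is made up for illustration and is not part of Html2Wiki or Composer. It reports whether the vendor/querypath/querypath directory exists under a given Html2Wiki directory and whether the current user can read and enter it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Html2Wiki): check that Composer has
# installed QueryPath under the given Html2Wiki extension directory.
check_querypath() {
    ext_dir=$1
    qp_dir="$ext_dir/vendor/querypath/querypath"

    # The directory is absent when "composer install" has not been run.
    if [ ! -d "$qp_dir" ]; then
        echo "missing: $qp_dir (run 'composer install' in $ext_dir)"
        return 1
    fi

    # The directory must be readable and searchable by whoever runs this.
    if [ ! -r "$qp_dir" ] || [ ! -x "$qp_dir" ]; then
        echo "not readable by this user: $qp_dir"
        return 2
    fi

    echo "ok: $qp_dir"
}

# Example call (substitute your real path for MEDIAWIKI_DIR):
# check_querypath MEDIAWIKI_DIR/extensions/Html2Wiki
```

To test what the web server sees, run it as the web server's account rather than as root; the account name varies by distribution.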

Check to make sure that these new directories are readable by your web server. They might not be if you ran the composer command as root.

To correct permissions, you can run a command like the following:

cd MEDIAWIKI_DIR/extensions/Html2Wiki
# list files that are not readable by "other" users
sudo find . ! -perm -o=r -ls
# change those files to be readable by "other" users
sudo chmod -R o+r ./
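
One caveat about the commands above: read permission on a file is not enough if "other" users cannot enter the directories leading to it, because directories also need the execute (search) bit. The sketch below lists both kinds of problem. The function name list_unreadable is hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper: list entries under a directory that "other" users
# (which usually includes the web server account) cannot access.
list_unreadable() {
    dir=$1
    # Regular files that "other" users cannot read.
    find "$dir" -type f ! -perm -o=r
    # Directories that "other" users cannot both list and enter.
    find "$dir" -type d ! -perm -o=rx
}

# Example call, run from the Html2Wiki directory:
# list_unreadable .
```

If it prints anything, `sudo chmod -R o+rX ./` grants read on everything and adds the execute bit only to directories (and to files that are already executable by someone).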

If the permissions seem to be correct, does your Apache error log tell you anything more about this error?

Okay, better news, I got composer up and running. We've moved on to:

[:error] [pid 30842] [client 10.2.100.50:64862] PHP Catchable fatal error: Argument 1 passed to DOMXPath::__construct() must be an instance of DOMDocument, null given, called in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 406 and defined in /var/lib/mediawiki/extensions/Html2Wiki/vendor/querypath/querypath/src/QueryPath/CSS/DOMTraverser.php on line 440, referer: http://ibss-info/mediawiki/index.php/Special:Html2Wiki

Oh, and permissions are (and were) good. Thanks for the followup for my brain-deaditude about composer.

Please download the newest version (or git pull)

I committed a new version which wraps all calls to htmlqp() in try/catch blocks so that you can see what content is having parse errors.

QueryPath is an excellent library, but not all HTML is as excellent.

Thanks for spurring me to do this, I meant to do this earlier. I will mark this as closed.

freephile closed this task as Resolved. Aug 8 2015, 3:12 AM

P.S. If you can report back whether it shows you which content is malformed or unparsable, I'd appreciate it. I don't have a good "bad" example to test with.

Switching to email for now...

It was also found that a missing Tidy dependency could produce this error. The latest code now checks for all dependencies. The documentation has also been updated to be clearer and more concise.
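
For anyone who wants to check dependencies by hand before importing, here is a rough sketch. It is not the check the extension itself performs, and the function name missing_commands is invented for this example. It only tests whether the named programs are on the PATH. The list of names to pass is also an assumption, and this thread does not say whether Tidy is needed as a command-line program or as a PHP extension.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example call (the command names here are assumptions, adjust as needed):
# missing_commands pandoc composer tidy
#
# If Tidy is used as a PHP extension, it will not show up on PATH;
# list the loaded PHP modules instead:
# php -m | grep -i '^tidy$'
```

No output from missing_commands means every listed program was found.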