
dumpHTML output has wrong relative paths to math, images, etc.
Closed, Resolved, Public

Description

Author: gomyhr

Description:
php dumpHTML.php -d /some/directory

produces directory contents approximately like this:
user@server:/some/directory$ ls
a b c d e f g h i index.html l m math n o p q raw s t

The HTML files end up in places like /some/directory/s/a/n/Sandbox.html, but they refer to math files in "../../../../math", which resolves to /some/math, instead of "../../../math", which resolves to /some/directory/math. As a result, no advanced math formulas show up in the exported HTML.

If --image-snapshot is used together with the fix from bug 12122, so that the images get relative paths, the relative image paths have the same problem as the math paths, except that the directory name is upload. Curiously, the relative paths to files in the raw directory are correct ("../../../raw").

Also, the HTML files refer to CSS and images in "../../../../skins", but the skins are not copied to the new directory.
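
To illustrate the depth mismatch described above, here is a minimal shell sketch that computes how many "../" components are needed to get from a dumped file back to the dump root. The helper name rel_prefix is hypothetical and is not part of dumpHTML; it only shows the arithmetic, assuming the file path is given relative to the dump directory.

```shell
# Hypothetical helper (not part of dumpHTML): print the "../" prefix that
# leads from a file back to the dump root. The argument is assumed to be
# the file's path relative to the dump directory.
# Note: uses the global variables dir and prefix (plain POSIX sh, no "local").
rel_prefix() {
    dir=$(dirname "$1")
    prefix=""
    # Add one "../" for every directory level between the file and the root.
    while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
        prefix="../$prefix"
        dir=$(dirname "$dir")
    done
    printf '%s\n' "$prefix"
}

# s/a/n/Sandbox.html sits three levels below the dump root:
printf '%smath\n' "$(rel_prefix s/a/n/Sandbox.html)"   # prints ../../../math
```

A file three directories deep needs three "../" components, which matches the correct "../../../raw" links and shows that the "../../../../math" links go one level too far.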

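Workaround (dump into a subdirectory, so that the extra "../" still lands inside /some/directory, then put math and skins there):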
php dumpHTML.php -d /some/directory/wiki
cd /some/directory/wiki
mv math ..
cp -r <mwdir>/skins ..


Version: 1.12.x
Severity: enhancement

Details

Reference
bz13061

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:02 PM
bzimport set Reference to bz13061.
bzimport added a subscriber: Unknown Object (MLST).

I think this is solved in version v2.11, which creates MD5-hashed filenames for everything (articles, images, thumbnail images, media files). Dumps can be safely written to CD/DVD-ROMs; see https://bugzilla.wikimedia.org/show_bug.cgi?id=8147 for the attached diff and download URL.