
Standalone markup-to-HTML translator?
Closed, Resolved (Public)

Description

Author: anakin

Description:
I have recently been using a personal installation of MediaWiki to maintain a set of web pages on my home machine, for my own personal use in scribbling assorted notes about things. I've found MediaWiki's markup to be pretty much ideal for this purpose, since it supports mathematics typesetting (many of my notes are mathematical in nature), tables and image embedding, but is an extremely light markup format (much easier to write than LaTeX or even HTML) and has much better support for generating output in the form of a collection of HTML pages with hyperlinks between them.

However, the rest of the MediaWiki infrastructure is not useful to me, and in fact is somewhat counterproductive: my database and web server administration skills are virtually zero, and so I fluctuate between accidentally allowing anyone to edit my personal notes pages and accidentally not even allowing myself to. And if my database ever gets corrupted, I've really no idea how I'll restore it; migrating it through a Debian upgrade was more than enough of a headache.

I would much prefer that my files of notes were simply stored on disk as text files in MediaWiki markup format, and that there were some standalone Unix utility I could just run from a makefile which converted them all into static HTML pages. Then I could keep all my notes, and the accompanying image files and (occasionally) scripts that generate image files from PostScript source or similar, in an ordinary source control repository, which would allow me to track changes and control access to them by means which I'm more familiar with and which are more suited to my particular needs.

I've checked over the MediaWiki 1.12 download archive and haven't found any obvious indication of such a utility already existing; but it seems to me that it shouldn't be too difficult to construct one by writing a simple front end on some of the existing code.

Does such a thing exist (or is there an existing Bugzilla entry for it) which I've missed? If not, would there be any interest in providing one?

(I'm willing to write code for it myself, given a few pointers about where to start. My vague thought is that one would start with includes/Parser.php, and write a simple command-line front end to the Parser class plus some back-end hooks to divert the database access functions to a text-file based storage model. But I'd like to find out if there's interest in including such a utility in the real code base before I start; if I have to maintain my own hacked-on version of the MediaWiki code base in perpetuity then I might look at alternative options instead.)
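The makefile-driven workflow described above can be sketched roughly as follows. This is only an illustration in Python, not the MediaWiki Parser class: it handles a tiny subset of the markup (headings, bold, italic, internal links), and the `.wiki` file extension, the function names and the one-HTML-file-per-page layout are all assumptions made for the example.

```python
import html
import re
import sys
from pathlib import Path


def render_inline(text: str) -> str:
    """Convert a tiny subset of inline wiki markup to HTML."""
    # quote=False leaves apostrophes alone so the bold/italic markers survive.
    text = html.escape(text, quote=False)
    # '''bold''' must be handled before ''italic''.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    def link(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Assumption: every page becomes a sibling file named Page_Title.html.
        href = target.replace(" ", "_") + ".html"
        return f'<a href="{href}">{label}</a>'

    # [[Target]] or [[Target|label]] becomes a relative link.
    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)


def render_page(source: str) -> str:
    """Render headings and paragraphs; everything else is out of scope here."""
    out = []
    paragraph = []

    def flush() -> None:
        if paragraph:
            out.append("<p>" + " ".join(paragraph) + "</p>")
            paragraph.clear()

    for line in source.splitlines():
        stripped = line.strip()
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", stripped)
        if heading:
            flush()
            level = len(heading.group(1))
            out.append(f"<h{level}>{render_inline(heading.group(2))}</h{level}>")
        elif not stripped:
            flush()  # a blank line ends the current paragraph
        else:
            paragraph.append(render_inline(stripped))
    flush()
    return "\n".join(out)


def build_site(src_dir: Path, out_dir: Path) -> list:
    """Convert every *.wiki text file in src_dir to an .html file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.wiki")):
        dest = out_dir / (src.stem + ".html")
        dest.write_text(render_page(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(dest)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    for page in build_site(Path(sys.argv[1]), Path(sys.argv[2])):
        print(page)
```

Saved under a hypothetical name such as `wiki2html.py`, it could be called from a makefile rule as `python wiki2html.py notes/ html/`. Because the pages are plain text files on disk, no database or web server is involved, which is the point of the request; full MediaWiki features such as templates, tables and math would still need the real parser.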


Version: unspecified
Severity: enhancement

Details

Reference
bz15118

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:17 PM
bzimport set Reference to bz15118.
bzimport added a subscriber: Unknown Object (MLST).

Just for kicks, I tried playing around with php-gtk this past weekend to see if I could implement a standalone parser app. In theory, it could be done, but I can't seem to get it to work locally (a thread-safe build of PHP + extensions doesn't jibe well with non-thread-safe php-gtk, and by not well I mean not at all).

^_^ For my own kicks, a while back I was experimenting with my own little parser.

http://dev.dev.wiki-tools.com/extensions2/TestingGrounds/XWT.php
http://dev.wiki-tools.com/extensions2/TestingGrounds/XWT.php
(Btw, the dev.dev. one has Xdebug enabled, but it's a single instance. The dev. one runs on my production PHP, which is monitored to keep it alive.)

XWT, or eXtensible Wiki Text, is a little project. (The name is just because the original idea was to use XML; now it's an object tree, and it'll be easily convertible to an XML or JSON format.)

The idea is to create a fully rule-based parser for a Wiki Text *INSPIRED* syntax. I can't stress enough that it is only inspired. I can guarantee that while 90% of the syntax is the same, there are parts of the syntax that will be different, and most of the quirks that are used in MW aren't valid in XWT. (For example, {{Start}}Link{{End}} to create a link is invalid in XWT. Additionally, the parametrized Image: embed format won't be valid; I'll be doing something using curly brace syntax because of parse differences.)

Originally the idea was to convert XWT into an XML format that WYSIWYG editors could modify and feed back to the parser, getting the same XWT back with only the modifications applied. But when you think about it, the first step of the parser is completely independent of any data MediaWiki has stored in the DB, and only a few config parameters would even be needed. So it does have other possible applications, perhaps standalone, or ;) perhaps a port to JavaScript, heh... ^_^ Imagine the browser doing the conversion live.
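To illustrate what a rule-based first stage that yields an object tree might look like, here is a minimal sketch. It is not XWT's actual code (that is not shown in this thread); the rule table, node shapes and function name are hypothetical, and Python is used purely for illustration. The point is that this stage needs no database at all and that the resulting tree serializes directly to JSON.

```python
import json
import re

# Each rule is (node type, compiled pattern). When two rules match at the
# same position, the rule listed first wins, so bold is tried before italic.
RULES = [
    ("bold", re.compile(r"'''(.+?)'''")),
    ("italic", re.compile(r"''(.+?)''")),
    ("link", re.compile(r"\[\[([^\]|]+)\]\]")),
]


def parse(text):
    """Turn a line of markup into a list of plain dict nodes."""
    nodes = []
    pos = 0
    while pos < len(text):
        # Pick the rule whose match starts earliest at or after pos.
        best = None
        for kind, pattern in RULES:
            match = pattern.search(text, pos)
            if match and (best is None or match.start() < best[1].start()):
                best = (kind, match)
        if best is None:
            # No rule matches anymore: the rest is plain text.
            nodes.append({"type": "text", "value": text[pos:]})
            break
        kind, match = best
        if match.start() > pos:
            nodes.append({"type": "text", "value": text[pos:match.start()]})
        if kind == "link":
            nodes.append({"type": "link", "target": match.group(1)})
        else:
            # Formatting nodes may contain further markup, so recurse.
            nodes.append({"type": kind, "children": parse(match.group(1))})
        pos = match.end()
    return nodes


tree = parse("a '''b [[C]]''' d")
print(json.dumps(tree, indent=2))
```

Because the tree is built only from dicts, lists and strings, an editor could in principle modify it and hand it back for re-serialization, which is the round trip described above.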

Resolving this LATER. We need a standalone parser, period, before we can think about using it in third-party apps.

This is work in progress and basically working already: https://www.mediawiki.org/wiki/Parsoid
Hence closing as FIXED nowadays.