When wikitext was created, in 1995, it served a vital function in allowing inexperienced users to easily create and edit pages. In the twenty years since, no standard wikitext has emerged, and since 2004 the stripped-down formatting of Markdown has become the plaintext formatting syntax of choice for much of the web. MediaWiki-style wikitext has not been adopted outside our project, and its mindshare has been declining for a decade.
It is time to decouple wikitext from core.
It should be possible to create an HTML-only wiki, with VisualEditor as the primary editing mechanism and no wikitext parsing for typical views and edits. Advanced users could install Parsoid to round-trip from the HTML DOM to wikitext for source editing, translating from wikitext back to the HTML DOM for database storage and display. Eventually new projects may arise to similarly allow round-trip "source" editing in other formats, such as Markdown or a new and refreshed "wikitext 2.0". But simple installations need none of that.
After outlining this vision, we will describe the architectural changes needed to achieve it:
- ContentHandler laid the groundwork for non-wikitext page content, and we must build on it: an HTML-format "MediaWiki DOM" ContentHandler must be written, using DOM methods to separate sections and extract redirects. The "MediaWiki DOM" Content implementation must extract secondary data (links, categories, etc.) directly from the DOM. (Alternatively, page metadata could be stored in a separate JSON "page metadata" attachment, with custom editors provided for it.)
- An HTML-based DifferenceEngine must be implemented to allow visualizing changes without resorting to wikitext.
- VisualEditor must be tweaked to fetch MediaWiki DOM directly, bypassing Parsoid, and to save it the same way.
- System messages must be associated with a content model, to allow HTML-formatted system messages. Localization workflows need to accommodate non-wikitext messages. Most messages do not need formatting and should probably shift to a "plaintext" content model.
- The Sanitizer will need improvement so that it is appropriate to run directly on MediaWiki DOM.
- Compatibility thunks are also desirable. These would use Parsoid to dynamically generate wikitext from the MediaWiki DOM, allowing some legacy extensions and APIs to keep functioning.
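To make the first item more concrete, here is a minimal sketch of extracting secondary data straight from stored HTML, with no wikitext parse involved. It assumes Parsoid-style markup in which wikilinks carry `rel="mw:WikiLink"`, categories and redirects are `<link>` elements with `rel="mw:PageProp/Category"` and `rel="mw:PageProp/redirect"`, and targets are relative hrefs of the form `./Title`. The names `MetadataExtractor`, `extract_metadata`, and `SAMPLE` are hypothetical and used only for illustration. A real implementation would live in PHP inside the Content class; Python is used here only to keep the sketch short.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MetadataExtractor(HTMLParser):
    """Collects wikilinks, categories, and a redirect target from page HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.categories = []
        self.redirect = None

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        attr = dict(attrs)
        rel = (attr.get("rel") or "").split()
        href = unquote(attr.get("href") or "")
        # Assumption: page targets are relative hrefs like "./Title".
        target = href[2:] if href.startswith("./") else href
        if tag == "a" and "mw:WikiLink" in rel:
            self.links.append(target)
        elif tag == "link" and "mw:PageProp/Category" in rel:
            # Drop the namespace prefix: "Category:Foo" becomes "Foo".
            self.categories.append(target.split(":", 1)[-1])
        elif tag == "link" and "mw:PageProp/redirect" in rel:
            self.redirect = target


def extract_metadata(html):
    """Return the secondary data found in an HTML page body."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.links,
        "categories": parser.categories,
        "redirect": parser.redirect,
    }


# Hypothetical page body used only to exercise the sketch.
SAMPLE = (
    '<p>See <a rel="mw:WikiLink" href="./Parsoid">Parsoid</a> and '
    '<a rel="mw:ExtLink" href="https://example.org">an external site</a>.</p>'
    '<link rel="mw:PageProp/Category" href="./Category:Proposals"/>'
)

if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

The external link is deliberately ignored, since only internal links feed the link tables. Sort keys, interwiki prefixes, and section splitting are left out of this sketch.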
A rough prototype may be demonstrated. Attendees will be able to suggest other areas that might present roadblocks to an HTML-only wiki.
The long-term goal of the Parsoid team is for Parsoid to eventually disappear, replaced by HTML-only wikis and round-trip conversion tools to simpler "source" formats. The main Wikipedia projects will continue to rely on wikitext for a long time yet, but this work would be the first step towards deprecating Parsoid for some users: allowing small wikis to install a monolithic PHP-only MediaWiki core with native HTML storage and visual editing, in the same way Flow has been able to use native HTML storage.
- Agreement with stakeholders on the major implementation tasks above.
- Input from the broader community about wikitext dependencies in our tooling, processes, extensions, gadgets, etc. that could be reimagined, could be reimplemented using Parsoid DOM, or would require deeper thought.