
Explain the wiki syntax in detailed EBNF
Closed, Declined (Public)

Description

Author: xmlizer

Description:
It is important to make a project to give the exact EBNF syntax, which contains all
the subtleties of the wiki syntax


Version: unspecified
Severity: enhancement

Details

Reference
bz7

Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 21 2014, 6:42 PM)
bzimport set Reference to bz7.
bzimport added a subscriber: Unknown Object (MLST).

leercontainer-bugzilla wrote:

(In reply to comment #0)

It is important to make a project to give the exact EBNF syntax, which contains all
the subtleties of the wiki syntax

Why don't you start a meta page with the basic framework?

timwi wrote:

I boggled my mind over this recently. What exactly would the [E]BNF for Wiki
Syntax describe?

In theoretical computer science, formal grammars are used to generate a language
(a set of strings). Some grammars can be turned into a characteristic algorithm,
i.e. one that determines if a given string is in the language. The algorithm is
said to "accept" or "reject" input strings. However, MediaWiki is supposed to
accept *ALL* strings: all strings are valid inputs and are turned into some
valid XHTML.
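
To make the distinction concrete, here is a minimal Python sketch, not taken from MediaWiki. It contrasts a recognizer, which may reject its input, with a renderer that is total, meaning every string produces some output. The toy italic rule and both function names (`accepts`, `render_total`) are assumptions made up for this illustration.

```python
import html
import re


def accepts(text: str) -> bool:
    """Recognizer for a toy language: the '' markers must come in pairs."""
    return text.count("''") % 2 == 0


def render_total(text: str) -> str:
    """Total renderer: never rejects; unmatched markup stays literal text."""
    # quote=False leaves apostrophes alone so the '' markers survive escaping.
    escaped = html.escape(text, quote=False)
    # Paired markers become <i>...</i>; a stray marker simply fails to match.
    return re.sub(r"''(.+?)''", r"<i>\1</i>", escaped)


print(accepts("''a"), render_total("''a"))
print(accepts("''a''"), render_total("''a''"))
```

The recognizer has two outcomes, while the renderer always returns markup. That is why "is this string in the language?" is not the interesting question for wikitext.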

In practice, grammars are used to write parsers such as the one I'm currently
working on. Here, the grammar tells the parser what to do - or more precisely,
the production rules do, and as such, they sort of set out the semantics of the
mark-up. But how do you clarify semantics without the production rules?

Makes you wonder about stuff :)

timwi wrote:

Oh, and I forgot to mention this. EBNF seems to be for context-free grammars
only. The MediaWiki syntax for lists is not context-free however. I am
circumventing this in my parser by using a post-processing step, but if you're
only writing BNF, you can't do that...
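
As a rough illustration of what such a post-processing step could look like (this is a hypothetical sketch, not the flex/bison parser discussed here), the Python below compares each line's `*`/`#` prefix with the lists currently open and emits open/close events. The function name `list_events` and the event format are assumptions, and real MediaWiki list nesting has more cases than this handles.

```python
def list_events(lines):
    """Turn wiki list lines into open/item/close events by comparing prefixes."""
    tags = {"*": "ul", "#": "ol"}
    stack = []   # markers of the lists currently open, e.g. ["*", "#"]
    events = []
    for line in lines:
        prefix_len = len(line) - len(line.lstrip("*#"))
        prefix, body = line[:prefix_len], line[prefix_len:].strip()
        # How many open lists does this line's prefix still agree with?
        common = 0
        while (common < min(len(stack), len(prefix))
               and stack[common] == prefix[common]):
            common += 1
        # Close every open list the new prefix no longer matches.
        while len(stack) > common:
            events.append(("close", tags[stack.pop()]))
        # Open a list for each remaining marker in the prefix.
        for marker in prefix[common:]:
            stack.append(marker)
            events.append(("open", tags[marker]))
        events.append(("item" if prefix else "text", body))
    # End of input closes whatever is still open.
    while stack:
        events.append(("close", tags[stack.pop()]))
    return events


for event in list_events(["* a", "** b", "*# c", "d"]):
    print(event)
```

What a line means depends on the prefix of the line before it, and that state lives in `stack` rather than in any single production rule.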

wmahan_04 wrote:

(In reply to comment #4)

Oh, and I forgot to mention this. EBNF seems to be for context-free grammars
only. The MediaWiki syntax for lists is not context-free however. I am
circumventing this in my parser by using a post-processing step, but if you're
only writing BNF, you can't do that...

In light of that, is this bug WONTFIX? Or is it possible to describe wiki syntax
in some sort of pseudo-BNF, short of duplicating your flex/bison parser?

robchur wrote:

This bug is a "go write it on Meta" fix. ;-)

Not sure I understand why this was closed.
A formal grammar is something we really need (and it may require
fixes to the grammar as well ;)

Some work has been going on at mediawiki.org
(http://www.mediawiki.org/wiki/Markup_spec and
http://www.mediawiki.org/wiki/Markup_spec/BNF/). It's early days and any input
would be appreciated.

A hopefully complete representation of the MW 1.12 preprocessor in ABNF is at:

http://www.mediawiki.org/wiki/Preprocessor_ABNF

Please note that the set of production rules alone does not allow you to derive the correct parse tree from a given input text. Wikitext is ambiguous in lots of complex and interesting ways. The disambiguation rules need to be specified along with the grammar.

I found the preprocessor ABNF project an enlightening exercise. You can say a lot about the syntax in a short space. And while I attempted to explain the disambiguation process, I know of no way to do this rigorously, without resorting to writing algorithms.

No, it is not fixed. That page only describes a tiny portion of parser behaviour.

We have a fairly complete PEG tokenizer grammar in Parsoid (http://www.mediawiki.org/wiki/Parsoid), which describes the context-free portions of wikitext. Context-sensitive portions are handled in token stream transformers. The PEG parse tree is flattened to a token stream so that we can support unbalanced template expansions, and finally converted into a DOM using a tree builder library according to the error recovery algorithms described in the HTML5 spec.
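
The three-stage shape described above can be sketched as a toy Python pipeline. This is not Parsoid's code or API: the function names (`tokenize`, `resolve_quotes`, `build_tree`), the token format, and the single bold rule are assumptions chosen to keep the example small, and the recovery in the last stage is far simpler than the HTML5 algorithms.

```python
import re


def tokenize(text):
    """Stage 1: flat token stream (stand-in for a grammar-driven tokenizer)."""
    tokens = []
    for piece in re.split(r"(''')", text):
        if piece == "'''":
            tokens.append(("quote", None))
        elif piece:
            tokens.append(("text", piece))
    return tokens


def resolve_quotes(tokens):
    """Stage 2: stream transformer; a quote opens or closes depending on state."""
    out, bold_open = [], False
    for kind, value in tokens:
        if kind == "quote":
            out.append(("close" if bold_open else "open", "b"))
            bold_open = not bold_open
        else:
            out.append((kind, value))
    return out


def build_tree(tokens):
    """Stage 3: tree builder that tolerates unbalanced open/close tokens."""
    root = {"tag": "root", "children": []}
    stack = [root]
    for kind, value in tokens:
        if kind == "open":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "close":
            if len(stack) > 1:   # ignore a close with nothing open
                stack.pop()
        else:
            stack[-1]["children"].append(value)
    return root   # elements still open are implicitly closed at end of input


print(build_tree(resolve_quotes(tokenize("a '''b"))))
```

Because stage 1 returns a flat stream instead of a tree, an unbalanced input such as `a '''b` is not an error. The later stages decide how to recover.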

The grammar is interspersed with actions and uses syntactic scope flags to compress the grammar productions a bit, so it is not the most readable grammar ever. Unrolling productions for all scope permutations might not help that much either, as this would increase the size of the grammar a lot.

Describing all of WikiText in EBNF is simply impossible, as parts of it are context-sensitive. Closing as wontfix for that reason.