
Add new datatypes for wikitext
Open, Medium, Public


We may create a new datatype for wikitext. Values may have simple HTML formatting (bold, italics, etc.) and links (internal, external), but not templates, parser functions, magic words, or most tags. The text may be:

  1. Untranslated wikitext
  2. Monolingual wikitext
  3. Multilingual wikitext (the most common case)

This may be exported in three ways:

  1. Wikitext as used in Wikimedia projects (e.g. [[c:|Commons]] is a '''Wikimedia''' project.) (Default display in the Wikidata interface; T42128 needs to be resolved first)
  2. Wikitext as used in third-party MediaWiki projects (e.g. [[commons:|Commons]] is a '''Wikimedia''' project.)
  3. HTML (e.g. <a href="">Commons</a> is a <b>Wikimedia</b> project.)
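
The three export forms above could be derived from one stored value roughly as follows. This is only a minimal sketch, not a proposed implementation: the function names, the `prefix_map` argument and the `resolve_url` callback are hypothetical, the placeholder URL is made up, and only piped links, bold and italics are handled.

```python
import html
import re

# Piped link such as [[c:|Commons]]; group 1 is the target, group 2 the label.
LINK = re.compile(r"\[\[([^\]|]*)\|([^\]]*)\]\]")
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def to_third_party_wikitext(text, prefix_map):
    """Rewrite interwiki prefixes (e.g. c: -> commons:) for non-Wikimedia wikis.

    prefix_map is a hypothetical mapping supplied by the caller.
    """
    def repl(match):
        target, label = match.group(1), match.group(2)
        prefix, sep, rest = target.partition(":")
        if sep and prefix in prefix_map:
            target = prefix_map[prefix] + ":" + rest
        return "[[" + target + "|" + label + "]]"
    return LINK.sub(repl, text)


def to_html(text, resolve_url):
    """Render the restricted wikitext as HTML.

    resolve_url is a hypothetical callback that maps a link target to a URL.
    """
    # Escape first, so any literal markup in the value cannot leak through.
    out = html.escape(text, quote=False)
    # Bold must be handled before italics, because ''' contains ''.
    out = BOLD.sub(r"<b>\1</b>", out)
    out = ITALIC.sub(r"<i>\1</i>", out)

    def repl(match):
        url = resolve_url(html.unescape(match.group(1)))
        return '<a href="%s">%s</a>' % (html.escape(url, quote=True), match.group(2))
    return LINK.sub(repl, out)


stored = "[[c:|Commons]] is a '''Wikimedia''' project."
print(to_third_party_wikitext(stored, {"c": "commons"}))
print(to_html(stored, lambda target: "https://example.org/" + target))
```

The Wikimedia form is the stored value itself in this sketch, so only the other two forms need a conversion step.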

Use cases:

  1. Several usage note properties ( for example)
  2. Pages like , which is proposed to be converted to items
  3. File descriptions in Commons may contain links to other pages
  4. See T139573: Simple html formatting within Wikidata labels

Event Timeline

Another use case, discussed several times and used on frwiki, is image legends, which frequently use wikitext on Wikipedia.

Bugreporter renamed this task from "Add a new datatype for wikitext" to "Add new datatypes for wikitext". Feb 15 2017, 7:53 PM

We are still waiting for T86517 before chemical nomenclature can be added to Wikidata, and adding it won't be possible without simple formatting.

Formatting (like <sub>, <sup>, <i> and <small>) is an inseparable part of chemical systematic names, and the need to use formatting is clearly indicated in the IUPAC nomenclature recommendations. The inability to add fully correct chemical names would be a significant step backwards, especially since many other chemistry databases provide fully correct names (cf. the ChEBI database, for example: (R)-methyl phenyl sulfoxide). It is understandable that labels, descriptions and aliases are not meant for this, because they are just plain text, but properties should allow correct data to be added.

It is therefore important to:

  1. allow simple formatting in the multilingual and monolingual text datatypes, or
  2. provide a different way to indicate which parts of multilingual/monolingual values should be formatted by the end user and how, e.g. by using a regex in a specific property that (a) would format the value in WD and (b) could be used by the end user of the data.

Using the second option has some limitations. For example, right now we are losing information about which parts of a title should be italic, which is quite important when reusing the titles of scientific papers, e.g. Anti-complement Activity of Constituents from the Stem-Bark of //Juglans mandshurica//. Because of this I'm not able to reuse WD data, as the imported title won't be typographically correct. Does this seem a minor problem? Maybe, but it is a problem that decides whether the title gets used or not.
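
To make the second option more concrete, here is a rough sketch of how a data consumer might apply such a formatting regex to a plain-text value. Everything in it is an assumption for illustration: the `apply_format_rules` helper, the shape of the rules (pattern and tag pairs), and the two patterns themselves do not correspond to any existing Wikidata property.

```python
import html
import re


def apply_format_rules(plain, rules):
    """Wrap every regex match in the given HTML tag.

    rules is a hypothetical list of (pattern, tag) pairs, e.g. as they might
    be stored on a property. Overlapping and empty matches are skipped.
    """
    spans = []
    for pattern, tag in rules:
        for match in re.finditer(pattern, plain):
            if match.start() < match.end():
                spans.append((match.start(), match.end(), tag))
    spans.sort()

    out, pos = [], 0
    for start, end, tag in spans:
        if start < pos:
            continue  # overlaps a span that was already emitted
        out.append(html.escape(plain[pos:start], quote=False))
        inner = html.escape(plain[start:end], quote=False)
        out.append("<%s>%s</%s>" % (tag, inner, tag))
        pos = end
    out.append(html.escape(plain[pos:], quote=False))
    return "".join(out)


# Stereodescriptor in a chemical name: italicise the R inside parentheses.
print(apply_format_rules("(R)-methyl phenyl sulfoxide", [(r"(?<=\()R(?=\))", "i")]))

# Species name in a paper title; this pattern only works for this one title.
title = ("Anti-complement Activity of Constituents from the Stem-Bark of "
         "Juglans mandshurica")
print(apply_format_rules(title, [(r"Juglans mandshurica", "i")]))
```

The second call illustrates the limitation mentioned above: a rule for paper titles would have to know every species name in advance, whereas formatting stored with the value itself would not.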