Prototype Abstract Wikipedia in Scribunto
Closed, Resolved (Public)

Description

The Scribunto extension for MediaWiki provides many of the facilities required for prototyping Abstract Wikipedia:

  • Ability to create and modify code-objects on wiki, collaboratively.
  • Ability to invoke these code-objects from articles to generate prose.
  • A programming environment featuring an expressive, general-purpose language (Lua).
  • Behavior of code can vary by user or content language.
  • Interfaces for querying Wikidata.

As a proof-of-concept, I tried to implement the example given in figures 1 and 2 of Architecture for a Multilingual Wikipedia on MetaWiki, and I think it worked pretty well:

Hot take: this demonstrates the suitability of Scribunto as a platform for prototyping Abstract Wikipedia, which is something we should start ASAP, so the learnings from that process can influence the design of Wikifunctions.

Event Timeline

I'm not convinced we should do this but happy to discuss.

As the person behind Ninai and Udiron, I must agree with Cory :-). Besides, there are some facilities I believe are needed for traversing the item and lexeme RDF graph, which do not yet exist, that will make multilingual text generation more feasible in the long run.

Cool, sounds like you've thought about this quite a lot -- can you expand on that / do you have this written down somewhere?

The specific facilities in question would allow me to find, given a particular item/sense as a starting point, all senses connected to it via a certain set of properties. Starting at "band"@nb (meaning 'leash'), for example, I should be able to get to "leash"@en, ideally without needing to know what specific property or inverse property sequence was needed to get there (although perhaps knowing this path might be useful for debugging).

In Ninai this is implemented by running SPARQL queries for all connections involving certain properties (translation, synonym, antonym, troponym of, hyperonym, pertainym, item for this sense, predicate for, demonym of), putting them into a NetworkX graph, and then traversing certain subgraphs of that graph (when looking for verbs or other predicates, only the 'predicate for', 'translation', and 'synonym' relations are currently used in the traversal).

Given the size of the resulting graph, generating it frequently is not desirable (Ninai caches the graph on disk for a week by default, or for a longer or shorter period if so configured). As such, having a way to traverse the existing RDF graph in a similar fashion would be useful.
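To make the traversal described above concrete, here is a minimal sketch in plain Python. It is not Ninai's actual code: it uses a hand-written breadth-first search instead of NetworkX, and the edge list, the `Q-leash` item ID, the `other-sense@xx` node, and the function names `build_adjacency` and `find_path` are all hypothetical placeholders. Each edge is stored in both directions, with inverse hops marked by a `^` prefix, so the caller does not need to know in advance which property or inverse property sequence connects two senses, while the returned path remains available for debugging.

```python
from collections import defaultdict, deque

# Hypothetical toy edges: (source, property label, target).
# "Q-leash" and "other-sense@xx" are placeholders, not real Wikidata entities.
EDGES = [
    ("band@nb", "item for this sense", "Q-leash"),
    ("leash@en", "item for this sense", "Q-leash"),
    ("band@nb", "antonym", "other-sense@xx"),
]

def build_adjacency(edges):
    """Index every edge in both directions; inverse hops get a '^' prefix."""
    adjacency = defaultdict(list)
    for source, prop, target in edges:
        adjacency[source].append((prop, target))
        adjacency[target].append(("^" + prop, source))
    return adjacency

def find_path(adjacency, start, goal, allowed):
    """Breadth-first search from start to goal using only allowed properties.

    Returns a list of (from, property, to) hops, or None if goal is unreachable.
    """
    queue = deque([start])
    came_from = {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start to rebuild the path.
            path = []
            while came_from[node] is not None:
                previous, prop = came_from[node]
                path.append((previous, prop, node))
                node = previous
            return list(reversed(path))
        for prop, neighbour in adjacency[node]:
            if prop.lstrip("^") in allowed and neighbour not in came_from:
                came_from[neighbour] = (node, prop)
                queue.append(neighbour)
    return None

adjacency = build_adjacency(EDGES)
path = find_path(adjacency, "band@nb", "leash@en", {"item for this sense"})
```

In this toy graph, reaching "leash"@en from "band"@nb requires one forward hop and one inverse hop, which is the kind of path that cannot be followed when only forward property paths are available.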

@Mahir256 that's a very useful comment, thank you. I think there is no disagreement, actually.

I edited the task description to read "provides many of the facilities" rather than "all".

It's not that I think there are no additional facilities to implement, but that it makes the most sense to me to prototype these additional facilities on top of Scribunto, since in addition to providing the major building blocks, it is also quite extensible. (As an example, the Wikidata APIs that I used in the demo above are part of the Wikibase extension, not Scribunto proper.) I don't expect Scribunto to be an exact fit, but it does a huge amount of the heavy lifting required.

I will fully accept the first two bullet points you mentioned, and to some extent the third bullet point, but believe the fourth is somewhat limited and must take issue with the fifth.

With respect to the qualifications in the above sentence:

  • A programming environment featuring an expressive, general-purpose language (Lua).
    • (Python is offered as an implementation language at Wikifunctions's launch, whereas I understand Lua to be coming later (T307171).)
    • Wikifunctions functions are statically typed, yet Lua offers no way to mimic this with type annotations, as Python's typing module does. (Some Lua forks do have type annotations, but those are not in Scribunto.)
    • Wikifunctions requires inputs/outputs to be immutable, yet there is no way to enforce such immutability on Lua tables the way Python's NamedTuple does.
  • Behavior of code can vary by user or content language.
    • Not all languages which have lexemes have MediaWiki content languages, making customization to the user's preferred language incomplete.
    • Even for some more frequently occurring languages, there isn't a proper correspondence between language codes that might be returned by MediaWiki and language codes or corresponding language items used within lexemes. (Consider Bokmål which uses 'nb' and the item for 'Bokmål' on Wikidata but 'no' (and the item for 'Norwegian', probably) on Wikipedia.)
  • Interfaces for querying Wikidata.
    • In my view, the absence of the specific facilities I mentioned in my previous comment negates the validity of this point and will make a Scribunto prototype at this time more unwieldy than it needs to be.
    • (Never mind, of course, that lexicographical data access is only enabled on bnwiktionary and euwiktionary.)
    • (Even if lexicographical data access were enabled elsewhere, a breadth-first search through forward property paths could be mimicked in Lua, but it is very likely to run up against limits on execution time and resource usage, and any inverse property paths which might link two entities cannot currently be used in this way.)
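To illustrate the typing and immutability points in the list above, here is a small Python sketch of what annotated, immutable inputs and outputs can look like with `typing.NamedTuple`. The `Lexeme` type, its fields, and the `with_language` helper are hypothetical and are not taken from Ninai, Udiron, or Wikifunctions. Note that Python does not enforce the annotations at runtime; they only help when a static checker is run over the code. The immutability, by contrast, is enforced at runtime.

```python
from typing import NamedTuple

class Lexeme(NamedTuple):
    """Hypothetical immutable record with annotated fields."""
    lemma: str
    language: str

def with_language(lexeme: Lexeme, language: str) -> Lexeme:
    # Returns a new value instead of modifying the input.
    return lexeme._replace(language=language)

original = Lexeme(lemma="band", language="no")
relabelled = with_language(original, "nb")

# Assigning to a field of a NamedTuple instance raises AttributeError.
try:
    original.language = "nb"
except AttributeError:
    mutation_rejected = True
else:
    mutation_rejected = False
```

After this runs, `original` still carries the language code it was created with, and `relabelled` is a separate value.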

I believe it is time to revisit this. The NLG architecture proposal put forward by @AGutman-WMF envisions a system where non-programmers implement renderers via a template language (or a UI that generates the syntax of this template language). The template language is translated into the Wikifunctions composition language and executed on Wikifunctions. Effectively, the templating language is a kind of intermediate representation that gets "compiled" into Wikifunctions code.

I think it is fair to ask whether Wikifunctions is the correct execution platform to be targeting. AIUI, with the NLG system being proposed, most Abstract Wikipedia contributors will not be interacting with Wikifunctions directly, and thus will not stand to derive much benefit from the Wikifunctions UI or its support for multiple implementation languages. So what is gained by targeting a language that is partially-specified, complex, suffers from performance issues, and lacks the ecosystem of tooling that modern programming environments provide?

IMO we should, at the very least, prototype an implementation that translates the template language to Lua, so we have some basis for a competitive evaluation.

I agree with @ori that it's worthwhile to attempt a Lua prototype of this.
However, this raises some design questions:

  1. Where would the NLG templates be stored? Would they exist as special pages within Wikipedia (as Wikitext templates do, AFAIU)?
  2. Would the NLG templates be compiled into Lua code at authoring time, or would they be interpreted by a Lua parser on the fly? This affects the question of how calls to sub-templates should be handled: as normal function calls, or as templates which need special parsing.
  3. In general, how would one go about executing the functions embedded within template slots? The most straightforward possibility is to use Lua's loadstring function; however, this is currently disabled in Scribunto. It would also allow arbitrary Lua code to run in a template slot, which is arguably too much. Another option is to parse the functional expressions in the slots and call the functions through the global environment table _G.
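The last option in question 3 (parse the slot expression and dispatch by name, rather than evaluating arbitrary code) can be sketched as follows. This is a Python analogue of the idea, not Lua and not the proposed template language: it borrows Python's own expression grammar via the standard `ast` module as a stand-in for the slot syntax, and the functions `upper` and `join`, the `ALLOWED` table, and `run_slot` are hypothetical names. An explicit allow-list is used here where the Lua version might look names up in `_G`, so that only designated functions are callable from a slot.

```python
import ast

# Hypothetical renderer functions that a template slot may call.
def upper(text):
    return text.upper()

def join(left, right):
    return f"{left} {right}"

# Only names listed here can be invoked from a slot expression.
ALLOWED = {"upper": upper, "join": join}

def evaluate(node, variables):
    """Evaluate a parsed slot expression made of literals, variables and allowed calls."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (str, int)):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and not node.keywords):
        func = ALLOWED.get(node.func.id)
        if func is None:
            raise ValueError(f"function not allowed: {node.func.id}")
        return func(*(evaluate(arg, variables) for arg in node.args))
    # Attribute access, operators, lambdas, etc. are all rejected.
    raise ValueError("unsupported expression in slot")

def run_slot(expression, variables):
    """Parse one slot expression and evaluate it without calling eval or exec."""
    tree = ast.parse(expression, mode="eval")
    return evaluate(tree.body, variables)

rendered = run_slot('join(upper(name), "is here")', {"name": "ada"})
```

Because nested calls are evaluated recursively, a sub-template could in principle be exposed as just another entry in the allow-list, which bears on question 2 as well.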

I feel that the direction this design has taken argues strongly against Scribunto prototyping. Largely, @AGutman-WMF's very good questions address my concerns. In particular, the ability for Wikifunctions to generate a function composition from the template language makes questions of pipelining and nested function calls tractable. I am asking the same questions as @AGutman-WMF about sub-templates and the like.

I'd like to add some other concerns. We've framed the NLG system currently under development as an open standard and reference system, which can be adopted modularly, in whole or in part, by other NLG systems within the Wikifunctions ecosystem. This would be impossible to accomplish with Scribunto.

The current design relies heavily on Wikifunctions's ability to create and share community-defined types. This would also be difficult to do in Scribunto without touching the source code itself. Contributing to source code operates against the design philosophy of Abstract Wikipedia. The goal is for people without a lot of coding knowledge to be able to contribute functions.

@ori, while your criticisms of Wikifunctions are not incorrect, we're talking about an unfinished and unreleased system. To address each concern individually:

  • "partially-specified": I'm not sure what this means, but @AAssaf-WMF is working hard on tightening our function model.
  • "complex": Our designers are working hard to make that not the case. On the flip side, I do not find Scribunto particularly intuitive, but I am happy to learn.
  • "suffers from performance issues": Yes, and that is one of the work items you are in charge of.
  • "lacks the ecosystem of tooling": I'm not sure what this means, either. This is an unreleased system. We have some designs around how to include external libraries in the supported languages, which I don't think Scribunto can do.

If you'd like to translate the template language to Lua for a competitive evaluation, by all means do. I'd be happy to see the template language translator. However, please note that we are slated to support Lua in Wikifunctions. A competitive evaluation, then, would mean running the Lua code both in Wikifunctions and in Scribunto. If you'd like to do this evaluation, then the addition of a Lua code runner to Wikifunctions blocks that work and should take priority.

One other major point I forgot to mention: coding in Scribunto does not support multiple natural languages. The internationalization of functions is integral to allowing community members to contribute to NLG templates.