VisualEditor: Support "Language conversion blocks" for multi-script wikis
Open, Normal, Public, 40 Story Points

Description

"Language conversion blocks" are a wikitext feature that allows users to define text content in parallel scripts. The highest-profile case is Chinese, which has two major writing systems and automated conversion between them, but there are 28 others, some of which do not have automated conversion. VE will therefore need not just to mark the text with an appropriate <span>, but also to help users understand when they have edited text that needs editing two or three times (possibly in scripts they cannot or do not want to use). See https://meta.wikimedia.org/wiki/Wikipedias_in_multiple_writing_systems

Documentation of the feature (focussed on the syntax) is here: https://www.mediawiki.org/wiki/Writing_systems/Syntax

Parsoid will need to add support for this first, which is T43716: Support language variant conversion in Parsoid.

Details

Reference
bz47411
bzimport raised the priority of this task to Normal.
bzimport set Reference to bz47411.
Tbayer added a subscriber: Tbayer. Aug 23 2015, 10:59 PM
Restricted Application added a subscriber: Aklapper. Aug 23 2015, 10:59 PM
Jdforrester-WMF updated the task description.
Jdforrester-WMF set the point value for this task to 40.
Krinkle removed a subscriber: Krinkle. Sep 2 2016, 12:18 AM
Jdforrester-WMF changed the task status from Open to Stalled. Sep 2 2016, 6:24 PM
cscott changed the task status from Stalled to Open. May 31 2017, 4:49 PM

@cscott Any updates for us on this since it's reopened? :-)

Change 356739 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] WIP: Specialized inspector for LanguageConverter markup

https://gerrit.wikimedia.org/r/356739

cscott added a subscriber: Catrope. Jun 1 2017, 9:45 PM

@Deskana Sure: there's fully functioning Parsoid support in https://gerrit.wikimedia.org/r/140235 (T43716 phase 1). VE currently "alienates" these blocks, which means they are invisible and uneditable but round-trip fine if you don't touch them.

I'm currently working on some basic support for displaying/editing the language converter rules in VE (which is what this task is about). Very rough start of a patch above; there are some representation issues to work out. VE is a little weak on directly editing generated content. I'll copy some discussion with @Catrope here:

(05:17:05 PM) cscott-free: Roan: is there any precedent in VE for an inline BranchNode ?
(05:17:30 PM) cscott-free: that is, a Node that behaved like an annotation: you could edit inside it, and it was laid out inline.
(05:17:43 PM) RoanKattouw: No I don't think so
(05:18:01 PM) RoanKattouw: There are inline nodes like images
(05:18:14 PM) cscott-free: yes, but they are ve.ce.LeafNodes, not allowed to have any contents
(05:18:22 PM) RoanKattouw: But no inline nodes that have children
(05:18:33 PM) cscott-free: even BlockImage is a LeafNode, and you have to click into the inspector to edit the caption
(05:18:38 PM) RoanKattouw: Yeah
(05:18:52 PM) RoanKattouw: So, I would like us to have inline editing of image captions
(05:19:01 PM) RoanKattouw: But it wouldn't be done that way
(05:19:09 PM) RoanKattouw: What is the application you have in mind?
(05:19:34 PM) cscott-free: I tried to use toDataElement to convince the dm that my <span> was actually an annotation, but that didn't work quite right with generated content.
(05:20:24 PM) cscott-free: -{R|foo}- is a silly sort of <nowiki>, right? But it's represented by Parsoid as <span typeof="mw:LanguageVariant" data-mw-variant='{....text:"foo"}'></span>
(05:20:43 PM) RoanKattouw: What does that syntax mean again?
(05:21:01 PM) cscott-free: It just means "protect foo from language conversion"
(05:21:09 PM) RoanKattouw: Aha OK
(05:21:20 PM) cscott-free: but there are other similar forms
(05:21:22 PM) RoanKattouw: Does it also output Foo?
(05:21:25 PM) cscott-free: yes
(05:21:37 PM) RoanKattouw: Then shouldn't the span be not empty?
(05:21:49 PM) cscott-free: well... maybe.
(05:22:16 PM) RoanKattouw: I mean, at least for display purposes you would want that, right?
(05:22:22 PM) cscott-free: see https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Language_conversion_blocks#Alternative_2
(05:22:45 PM) RoanKattouw: Also, am I allowed to have annotations inside?
(05:22:56 PM) RoanKattouw: Can part of foo be bold, or a link?
(05:23:01 PM) cscott-free: yes, you can have annotations inside
(05:22:56 PM) cscott-free: the idea is that eventually we'll do the actual language conversion client-side, and so we'd fill the empty spans with the correct thing based on the currently-selected variant
(05:23:12 PM) RoanKattouw: Oh, I see
(05:23:30 PM) RoanKattouw: So it behaves kind of like a reference then
(05:23:42 PM) RoanKattouw: Or a mix between that and an auto numbered link
(05:23:43 PM) cscott-free: in other cases, there might be more than one possible output in there, and I only want to see one of them (at a time)
(05:23:49 PM) RoanKattouw: Right
(05:24:10 PM) RoanKattouw: So the content isn't text then, it's HTML
(05:24:14 PM) cscott-free: yeah
(05:24:38 PM) cscott-free: and it's got the usual "could be block, could be inline" thing that MWTransclusionNode deals with
(05:24:55 PM) RoanKattouw: That's a tricky one, I'd recommend picking edsanders and David's brains too
(05:25:12 PM) RoanKattouw: The fact that it can be block is very annoying
(05:25:23 PM) RoanKattouw: That makes it hard to make it an annotation
(05:26:01 PM) cscott-free: RoanKattouw: yes, and it's something I'd eventually like to fix on the PHP side. But there are a few cases like -{zh-cn:==Foo==;zh-tw:==Bar==}-
(05:26:15 PM) cscott-free: which should really be rewritten as == -{zh-cn:Foo;zh-tw:Bar}- ==
(05:27:19 PM) cscott-free: usual annoying story about balance, templates, markup boundaries, etc.
(05:26:14 PM) RoanKattouw: Right
(05:25:13 PM) cscott-free: At any rate, I guess the easiest way to get started here is to implement it as a 'boring' LeafNode and not allow direct editing, you'll have to use an inspector
(05:27:15 PM) RoanKattouw: If it's only annotated text, then you could make it an annotation that would have to do some magic to generate its own content and feed the right content back into data-mw
(05:29:03 PM) cscott-free: yeah, with an annotation i just hit a roadblock in ve.dm.Converter:getDomSubtreeFromData which doesn't give me a way to return the annotation *and* the data contained as an array from toDataElement, the way that nodes can
(05:29:30 PM) cscott-free: so I might patch that and handle the case where toDataElement returns an array of length > 1
(05:29:45 PM) cscott-free: but figured I'd check to make sure I wasn't missing something obvious first
(05:30:01 PM) RoanKattouw: Yeah you'll have to invent something new here
(05:30:16 PM) RoanKattouw: In both directions
(05:30:31 PM) RoanKattouw: Or add pre- and post-processing steps
(05:30:15 PM) cscott-free: I might also just give in and generate explicit <span typeof="mw:LanguageVariant/raw">foo</span> for some of these cases
(05:30:24 PM) cscott-free: which would be a more direct analog of <nowiki>
(05:30:43 PM) RoanKattouw: Also look at how nowiki works on the way out
(05:31:07 PM) RoanKattouw: It's not really the same, it drops the annotation, but it might give you ideas
(05:31:04 PM) cscott-free: If you did -{foo<div>bar}- then I *think* the usual HTML5 treebuilding would split the <span> over the <div> and you'd effectively get -{foo}-<div>-{bar}-
(05:31:32 PM) cscott-free: that works for the "raw output" case, but things get really hairy if there are multiple alternatives.
(05:31:38 PM) RoanKattouw: Well if the resulting HTML is too weird, you can just alienate it
(05:32:25 PM) RoanKattouw: Which might even happen automatically because of special treatment of the mw: prefix and protections against misnesting
(05:33:13 PM) cscott-free: yeah, right now VE is alienating everything which is fine but because the <span>s are empty the result is that the content goes missing
(05:33:43 PM) cscott-free: again, maybe an indicator that this whole "empty span to be filled with converted output" idea isn't all that hot. we'll see.
(05:34:20 PM) cscott-free: anyway, I think using a Node and just dealing with non-direct editing is the way to go for the crappy-first-draft.
(05:34:32 PM) cscott-free: although i'm curious how you planned to allow direct figure caption editing
(05:41:02 PM) RoanKattouw: cscott: Basically, make captions work the way references work, with their contents being in an internalList item or subdocument, then create a surface on that subdoc and embed it in the image frame
(05:41:22 PM) RoanKattouw: The subdocuments thing is a refactor I started in 2014 and never finished
(05:42:25 PM) RoanKattouw: You could also do it without changing the DM representation if you are able to make a surface for a subset of the document
(05:42:54 PM) RoanKattouw: But the change in representation would allow inline images to retain captions
(05:44:55 PM) cscott-free: Yeah, I just would want to be able to cursor seamlessly "into" the embedded subdoc.
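
For readers unfamiliar with the markup forms mentioned in the log above, here is a rough sketch of how -{R|foo}-, -{foo}- and -{zh-cn:Foo;zh-tw:Bar}- could be modelled, and how the heading case could be rewritten so the block sits inside the heading. This is not the Parsoid or VisualEditor implementation: the function names (parseVariantBlock, renderVariant, hoistHeading) and the returned object shape are hypothetical, the shape is not the data-mw-variant format, and only the R flag and simple "code:text" lists are handled.

```javascript
// Hypothetical helper: parse one complete -{ ... }- block into a plain object.
// Returns { raw: text } for protected text, { variants: { code: text } } for
// per-variant alternatives, or null if the input is not a single block.
// Assumption: naive splitting on ";" is good enough for this sketch; real
// wikitext (entities, nested markup, other flags) needs a proper parser.
function parseVariantBlock( wikitext ) {
	const match = /^-\{([\s\S]*)\}-$/.exec( wikitext );
	if ( !match ) {
		return null;
	}
	const body = match[ 1 ];
	// -{R|foo}-: the R flag means "protect foo from language conversion".
	if ( /^R\|/.test( body ) ) {
		return { raw: body.slice( 2 ) };
	}
	const pieces = body.split( ';' ).filter( ( p ) => p.trim() !== '' );
	const variants = {};
	for ( const piece of pieces ) {
		const pair = /^\s*([a-z]+(?:-[a-z]+)*)\s*:([\s\S]*)$/.exec( piece );
		if ( !pair ) {
			// Not a "code:text" list, so treat the whole body as raw output.
			return { raw: body };
		}
		variants[ pair[ 1 ] ] = pair[ 2 ];
	}
	return pieces.length > 0 ? { variants: variants } : { raw: body };
}

// Pick the text to show for the currently selected variant.
// Returns null when the block has no alternative for that variant.
function renderVariant( block, variant ) {
	if ( 'raw' in block ) {
		return block.raw;
	}
	return Object.prototype.hasOwnProperty.call( block.variants, variant ) ?
		block.variants[ variant ] :
		null;
}

// Rewrite -{zh-cn:==Foo==;zh-tw:==Bar==}- as == -{zh-cn:Foo;zh-tw:Bar}- ==.
// Only applies when every alternative is a heading of the same level;
// otherwise the input is returned unchanged.
function hoistHeading( wikitext ) {
	const block = parseVariantBlock( wikitext );
	if ( !block || !block.variants ) {
		return wikitext;
	}
	let level = null;
	const inner = [];
	for ( const [ code, text ] of Object.entries( block.variants ) ) {
		const heading = /^(={1,6})\s*(.*?)\s*\1$/.exec( text );
		if ( !heading || ( level !== null && heading[ 1 ] !== level ) ) {
			return wikitext;
		}
		level = heading[ 1 ];
		inner.push( code + ':' + heading[ 2 ] );
	}
	if ( level === null ) {
		return wikitext;
	}
	return level + ' -{' + inner.join( ';' ) + '}- ' + level;
}

// Usage:
const block = parseVariantBlock( '-{zh-cn:Foo;zh-tw:Bar}-' );
console.log( renderVariant( block, 'zh-tw' ) ); // prints "Bar"
console.log( hoistHeading( '-{zh-cn:==Foo==;zh-tw:==Bar==}-' ) );
// prints "== -{zh-cn:Foo;zh-tw:Bar}- =="
```

Showing only one alternative at a time, as renderVariant does, matches the "I only want to see one of them (at a time)" requirement from the log. Hoisting the heading markup out of the block is one way to keep the block's content inline, which is what makes an annotation-like representation plausible.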

Change 358396 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] Node inspector for LanguageConverter markup

https://gerrit.wikimedia.org/r/358396

Change 361921 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] WIP: Dialog for editing LanguageConverter markup

https://gerrit.wikimedia.org/r/361921

Change 356739 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Display LanguageConverter markup in VisualEditor

https://gerrit.wikimedia.org/r/356739

Change 358396 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Context item for LanguageConverter markup

https://gerrit.wikimedia.org/r/358396

Change 361921 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Inspectors for editing LanguageConverter markup

https://gerrit.wikimedia.org/r/361921

I believe there's still more outstanding here, right?