Preprocessor/Parser irregularities with -{...}- variant constructs.
Open, Normal, Public

Description

This is a parent task for subtasks covering irregularities in parsing -{...}- constructs, especially those that contain embedded vertical bar characters. See the subtasks for specific bugs in the different places in the parser and preprocessor where irregularities have been found.

Details

Reference
bz52661

Related Objects

Status     Assigned          Task
Open       None
Open       None
Open       None
Stalled    None
Open       None
Stalled    None
Resolved   ovasileva
Open       None
Duplicate  None
Stalled    ABorbaWMF
Open       None
Resolved   Pchelolo
Resolved   mobrovac
Resolved   Pchelolo
Resolved   Jdforrester-WMF
Resolved   MarkTraceur
Open       None
Resolved   Jdforrester-WMF
Resolved   cscott
Open       None
Open       None
Open       None
Open       cscott
Open       None
Open       cscott
Invalid    GWicke
Resolved   liangent
Resolved   thiemowmde
Open       None
Resolved   cscott
Resolved   cscott
Resolved   Elitre
Resolved   cscott
Resolved   cscott
Resolved   cscott
Resolved   cscott
Resolved   cscott
Open       cscott
Resolved   cscott
Open       cscott
Open       cscott
Open       cscott

Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz52661.
bzimport added a subscriber: Unknown Object (MLST).
cscott created this task. Aug 9 2013, 2:38 AM
cscott added a comment. Aug 9 2013, 2:39 AM

From IRC:

(07:36:15 PM) TimStarling: but it wouldn't be too hard to make the preprocessor annotate it, the same way it does with links
(07:36:33 PM) TimStarling: you know the preprocessor is responsible for expanding templates
(07:36:55 PM) TimStarling: but it marks up links for the sole purpose of getting correct template DOM
(07:37:28 PM) TimStarling: e.g. for parameter splitting in {{ a [[b|c]] }}
(07:37:45 PM) TimStarling: it would probably be beneficial for -{}- to be handled in the same way
(07:38:15 PM) TimStarling: then {{ a -{b|c}- }} would work in the intuitive way
(07:38:10 PM) cscott-free: yes. i think i'm going to add [[File:foobar.jpg|-{R|rawcaption}-]] as a parser test and open a bugzilla for that. for the future.
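
To illustrate the parameter-splitting point from the IRC log: a minimal Python sketch, not MediaWiki's actual preprocessor code. The function name split_top_level and its protect parameter are made up for this example. It splits template arguments on "|" only when the bar is outside a protected construct, so protecting -{...}- the same way as [[...]] keeps {{ a -{b|c}- }} as a single argument.

```python
def split_top_level(text, protect=(("[[", "]]"), ("-{", "}-"))):
    """Split text on '|' characters that are not inside a protected construct.

    'protect' is a tuple of (opener, closer) pairs; both names are
    hypothetical and exist only for this illustration.
    """
    parts, current, stack = [], [], []
    i = 0
    while i < len(text):
        # Close the innermost open construct first.
        if stack and text.startswith(stack[-1], i):
            closer = stack.pop()
            current.append(closer)
            i += len(closer)
            continue
        for opener, closer in protect:
            if text.startswith(opener, i):
                stack.append(closer)
                current.append(opener)
                i += len(opener)
                break
        else:
            ch = text[i]
            if ch == "|" and not stack:
                # A top-level bar ends the current argument.
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


# Links only (roughly the situation described above): the bar inside -{...}- splits.
print(split_top_level(" a -{b|c}- ", protect=(("[[", "]]"),)))
# Links and -{...}- both protected: the argument stays whole.
print(split_top_level(" a -{b|c}- "))
```

The sketch ignores nested templates and unbalanced markup; it only shows why an unprotected bar inside -{...}- ends up splitting the argument.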

Change 78330 had a related patch set uploaded by Cscott:
Add parserTests for language converter markup.

https://gerrit.wikimedia.org/r/78330

btw There's another wikitext snippet that isn't handled well currently:

;-{zh-cn:AAA;zh-tw:BBB}-

Is this resolvable with the preprocessor change?

(In reply to comment #3)
> ;-{zh-cn:AAA;zh-tw:BBB}-
> Is this resolvable with the preprocessor change?

Yes, I believe this has the same root cause.

(In reply to comment #4)
> (In reply to comment #3)
> > ;-{zh-cn:AAA;zh-tw:BBB}-
> > Is this resolvable with the preprocessor change?
>
> Yes, I believe this has the same root cause.

Lists are not handled by the preprocessor. The issue here is that the list handler (doBlockLevels) is not aware of -{ }- either and (wrongly) recognizes the embedded colon as a single-line dt/dd pair.

Right. But if the preprocessor lifts out the -{...}- constructs, then doBlockLevels won't get confused. So yes, same root cause.

If you reintroduce language conversion blocks only after doBlockLevels is done, then you'll need to find a different way to parse the contents of those blocks independently of the main content.
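
The "lift out, then reintroduce" idea discussed above can be sketched roughly as follows. This is Python for illustration only, not how doBlockLevels or the preprocessor is implemented; lift_out, restore, split_definition_line, and the placeholder format are all hypothetical. The toy dt/dd splitter stands in for the colon handling in the list step.

```python
import re

# Non-greedy match for -{...}-; nested constructs are NOT handled (assumption).
LC_RE = re.compile(r"-\{.*?\}-", re.DOTALL)


def lift_out(text):
    """Replace each -{...}- with an opaque placeholder and remember the original."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x7fLC{len(saved) - 1}\x7f"

    return LC_RE.sub(stash, text), saved


def restore(text, saved):
    """Put the saved -{...}- constructs back in place of their placeholders."""
    return re.sub(r"\x7fLC(\d+)\x7f", lambda m: saved[int(m.group(1))], text)


def split_definition_line(line):
    """Toy stand-in for the single-line ';term:definition' rule."""
    if not line.startswith(";"):
        return None
    term, sep, definition = line[1:].partition(":")
    return (term, definition if sep else None)


line = ";-{zh-cn:AAA;zh-tw:BBB}-"

# Without lifting, the embedded colon is taken as the dt/dd separator.
print(split_definition_line(line))

# With lifting, the list step sees no colon, and the construct is restored afterwards.
lifted, saved = lift_out(line)
term, definition = split_definition_line(lifted)
print(restore(term, saved), definition)
```

Note that the sketch restores the lifted text verbatim. As the last comment points out, a real implementation would still need some way to parse the contents of each block independently of the main content.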

One more thing being broken:

{|
|-
|-{RB}-
|}

Also:
-{zh-cn:[[Category:A]];zh-tw:[[Category:B]];}-

This shouldn't be in both A and B (should it?). We don't want the category to depend on the variant. So maybe it *should* be in both?

I think it should be in neither. (gwicke agrees.)

[[Category:foo]] would add the page to the 'foo' category. In a variant where foo=>bar, it might appear as [[Category:foo|bar]] and be edited that way by VE, but that wouldn't change the category of the page. Category links inside -{...}- would be forbidden (that is, parsed as plain text).
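
A rough Python sketch of the rule proposed above, in which category links inside -{...}- are treated as plain text and so contribute no categories. This is an illustration only: page_categories and both regular expressions are simplified assumptions, not MediaWiki code, and nested -{...}- is not handled.

```python
import re

# Simplified patterns, assumed for this example only.
LC_RE = re.compile(r"-\{.*?\}-", re.DOTALL)
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def page_categories(wikitext):
    """Collect category names, ignoring anything inside -{...}- constructs."""
    outside = LC_RE.sub("", wikitext)
    return [name.strip() for name in CATEGORY_RE.findall(outside)]


# Neither A nor B: both links sit inside the language converter block.
print(page_categories("-{zh-cn:[[Category:A]];zh-tw:[[Category:B]];}-"))
# Only the link outside the block counts.
print(page_categories("[[Category:foo|bar]] -{zh-cn:[[Category:A]];}-"))
```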

Change 78330 merged by jenkins-bot:
Add parserTests for language converter markup.

https://gerrit.wikimedia.org/r/78330

*** Bug 72875 has been marked as a duplicate of this bug. ***
*** Bug 72010 has been marked as a duplicate of this bug. ***

Change 311849 had a related patch set uploaded (by C. Scott Ananian):
WIP: protect language converter markup in the preprocessor.

https://gerrit.wikimedia.org/r/311849

cscott renamed this task from "Preprocessor should handle -{...}- variant constructs." to "Preprocessor/Parser irregularities with -{...}- variant constructs." Sep 21 2016, 6:17 PM
cscott updated the task description.

Change 312066 had a related patch set uploaded (by C. Scott Ananian):
Other language converter bugs (test case tweaks)

https://gerrit.wikimedia.org/r/312066

There are also irregularities in how lists and tables with language converter markup are handled; see https://gerrit.wikimedia.org/r/312066

Change 312066 abandoned by C. Scott Ananian:
Other language converter bugs (test case tweaks)

Reason:
Squashed into https://gerrit.wikimedia.org/r/327127

https://gerrit.wikimedia.org/r/312066

I added T153761 as blocker since it would be nice to have a test for that case (I see things are moving: T146305#2891350).

cscott added a comment. Jun 5 2017, 2:16 AM

There's an issue with autolink URLs, like:

-{en-us:http://elevator.com;en-gb:http://lift.net}-

because the autolink regexp won't stop at the semicolon and thus will grab the en-gb and break the language converter nesting. See T166429: Getting a unclean output with {{#property:P856}} on site which enables Language Converter.

Not entirely sure this is fixable; it seems to be a genuine priority mismatch between autolink and language converter constructs.
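
The autolink problem can be reproduced with a simplified pattern. The Python sketch below does not use MediaWiki's real autolink regexp; AUTOLINK_RE, STRICT_RE, and find_autolinks are assumptions made up for this example. It shows a typical "run until whitespace or bracket" URL pattern swallowing the semicolon and the second variant, and a stricter pattern that stops at ";" and "}".

```python
import re

# Assumed, simplified autolink pattern: a URL runs until whitespace or a bracket.
AUTOLINK_RE = re.compile(r"https?://[^\s<>\[\]]+")

# Assumed stricter variant that also stops at ';' and '}'.
STRICT_RE = re.compile(r"https?://[^\s<>\[\];}]+")


def find_autolinks(text, pattern=AUTOLINK_RE):
    """Return every substring the given pattern would turn into a link."""
    return pattern.findall(text)


text = "-{en-us:http://elevator.com;en-gb:http://lift.net}-"

# The first URL runs through the semicolon and the closing }-.
print(find_autolinks(text))
# Stopping at ';' and '}' separates the two variants.
print(find_autolinks(text, STRICT_RE))
```

The stricter pattern is not a real fix: semicolons are legal inside URLs, so excluding them would truncate legitimate links elsewhere, which is the priority mismatch described above.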

Dalba added a subscriber: Dalba. Feb 11 2018, 2:25 AM