Page MenuHomePhabricator

Getting a unclean output with {{#property:P856}} on site which enables Language Converter
Closed, ResolvedPublic

Description

I put this into the article on zh.wp. And I found that it outputs a incomplete code-block of Language Converter.

Like :w:zh:新浪微博, It outputed this.

http://www.weibo.com">http://www.weibo.com;zh-hans:http://www.weibo.com;zh-hant:http://www.weibo.com;zh-cn:http://www.weibo.com;zh-hk:http://www.weibo.com;zh-mo:http://www.weibo.com;zh-sg:http://www.weibo.com;zh-tw:http://www.weibo.com;}-

But I used Special:ExpandTemplates to parse it, it's output is clean. just 'http://www.weibo.com'.

Tech/News/2016/41 has a news that 'Language converter syntax will soon no longer work inside external links.' , Is it about this problem? And I'm not very sure whether it have a Mediawiki message page to set the format of the property's output, or a hard code to format it when the site enable Language Converter

Event Timeline

Cwek created this task.May 27 2017, 12:43 PM
Restricted Application added a project: Wikidata. · View Herald TranscriptMay 27 2017, 12:43 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
thiemowmde triaged this task as High priority.May 29 2017, 7:09 AM
thiemowmde moved this task from incoming to consider for next sprint on the Wikidata board.

The bug only happens in wikis with multiple language variants and is most probably a result of the most recent parser change described in https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups. Also see T146305: Parser should protect -{...}- variant constructs in links and related.

Steps to reproduce:

Possible solutions:

  • We can not always add a space before all semicolons. This would make the weblink regex stop and fix the issue. But the space is never stripped, becomes part of the output, and will create all kinds of other issues. Try for yourself: a-{zh:b ;zh-hans:c ;}-d.
  • In the example the wikitext snippet that is show to the user is always the same, in all variants. We can optimize our code and not output any -{…}- syntax if this is the case. I believe this is enough to fix all user-facing issues.
  • We can change the weblink renderer to not output http://www.weibo.com, but [http://www.weibo.com http://www.weibo.com]. This is a heavy breaking change to the #property parser function we always wanted to avoid, and the reason we added the #statements function instead. However, we could make this change for languages with variants only, and leave it as it is in all other cases.

Originally reported at https://www.mediawiki.org/wiki/Topic:Trb03em5x4y2kypg.

Cwek added a comment.Jun 2 2017, 5:58 AM

@thiemowmde
I think the second solution is maybe better. And I think that we can check the data type of property to confirm whether it is URL or not. If the date type of property is URL and the site have enabled Language Converter, the output does not format with variant code block.

Change 356801 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Optimize VariantsAwareRenderer to not output broken weblink syntax

https://gerrit.wikimedia.org/r/356801

thiemowmde moved this task from Backlog to Review on the Wikidata-Former-Sprint-Board board.
thiemowmde moved this task from consider for next sprint to in progress on the Wikidata board.
Elitre added a subscriber: ssastry.
Elitre added a subscriber: Elitre.

@ssastry was this tied to the recent preprocessor change? Thanks.

@ssastry was this tied to the recent preprocessor change? Thanks.

The preprocessor change was deployed to these group2 wikis on June 1 and the bug report and other discussion is from before that. So, that indicates this is unrelated to the preprocessor change.

cscott added a comment.Jun 3 2017, 4:39 PM

This is not related to the preprocessor change, it is long-standing bug in php LanguageConverter (to wit, conversion is applied at an awkward point in the parser pipeline, so it doesn't properly respect other syntactic structures). Sadly, LanguageConverter is riddled with these, although I've been making slow progress fixing them.

I personally prefer solution three above: there is no reason to make the URL vary by selected variant, only the link caption should change. Effectively solution three would wrap the language variant inside the <a> tag, while the autolink behavior would emit multiple <a> tags inside the variant markup.

@thiemowmde
I think the second solution is maybe better. And I think that we can check the data type of property to confirm whether it is URL or not. If the date type of property is URL and the site have enabled Language Converter, the output does not format with variant code block.

This is not related to the preprocessor change, it is long-standing bug in php LanguageConverter (to wit, conversion is applied at an awkward point in the parser pipeline, so it doesn't properly respect other syntactic structures). Sadly, LanguageConverter is riddled with these, although I've been making slow progress fixing them.
I personally prefer solution three above: there is no reason to make the URL vary by selected variant, only the link caption should change. Effectively solution three would wrap the language variant inside the <a> tag, while the autolink behavior would emit multiple <a> tags inside the variant markup.

A combination of 2 and 3 is ideal. Where there is no reason to emit a language variant (as in this example), it makes sense to adopt 2. But, when necessary, I think we should generate wikitext that parses to a reasonable looking DOM structure. So, [URL -{text1;text2}-] is preferable to -{link1;link2}- ... the principle being to restrict variant code to the smallest well-formed DOM construct.

Change 356801 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Optimize VariantsAwareRenderer to not output broken weblink syntax

https://gerrit.wikimedia.org/r/356801

WMDE-leszek moved this task from Done to Review on the Wikidata-Former-Sprint-Board board.
thiemowmde closed this task as Resolved.Jun 13 2017, 1:44 PM
thiemowmde moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.
thiemowmde removed a project: Patch-For-Review.
cscott added a comment.EditedJun 15 2017, 6:50 PM

Hm, remember that -{...}- markup can have side effects (ie, adding a rule to the conversion table) if the codes involved are languages, not scripts. Seems like you ought to be a little careful, since the "escape" function being used can have nonlocal effects...

Cwek awarded a token.Jun 16 2017, 2:21 AM
Cwek rescinded a token.

@cscott, I'm afraid I don't understand. Is your comment a response to something specific that was said before? Do you think there is something we can improve, in addition to what https://gerrit.wikimedia.org/r/356801 already did? What "nonlocal" effects do you mean?

@thiemowmde Nevermind, I got a little mixed up. I was concerned that -{zh:Foo;zh-tw:Bar}- would add an entry to the global translation table, in addition to specifying a one-time conversion. But that's not the case. You need to add the A or H (or -) flag to have global effects.