
[BUG] Wikitext appearing in some descriptions, which should be plain text
Open, Normal · Public · BUG REPORT

Description

This example is apparently coming from a short description override:

{{short description|Canadian regional airline owned by [[WestJet Airlines, Ltd]] .}}

@bearND manually edited this particular article to remove the wiki text from the description...
https://en.wikipedia.org/w/index.php?title=WestJet_Encore&type=revision&diff=860011406&oldid=860000062&diffmode=source
... but manually editing individual articles could become a whack-a-mole game.

Is this a rare issue or are there hundreds or thousands of such occurrences?

We may want to make the description documentation more clearly state that descriptions should be plain text.

Are there other ways we could prevent this from happening?

Event Timeline

Mhurd created this task. Sep 17 2018, 7:40 PM
Restricted Application added a subscriber: Aklapper. Sep 17 2018, 7:40 PM

IRC convo, for posterity:

11:58 PM <bearND> tgr: the enwiki TFA (of today) has an overridden description with wikitext inside: https://en.wikipedia.org/w/index.php?title=WestJet_Encore&action=edit.
11:59 PM <bearND> https://usercontent.irccloud-cdn.com/file/CtnV0ntM/Screen%20Shot%202018-09-17%20at%2011.06.08%20AM.png
11:59 PM <bearND> Is there a way wiki text could be useful in a description?
12:00 AM <tgr> not really, but in theory we are converting it to plain text
12:00 AM <bearND> where do we try to do that?
12:02 AM <bearND> Do descriptions show up anywhere else beside in the apps?
12:05 AM <tgr> nah, I'm probably misremembering
12:06 AM <tgr> the logic's at https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/client/includes/Hooks/ShortDescHandler.php#L61
12:06 AM <tgr> you can see the description on the info page
12:07 AM <tgr> also I think the short description template has some kind of hack to show it if you override default CSS with a gadget, but since it uses its own logic for that that one probably does get parsed
12:08 AM <tgr> so anyway, user error, should not contain wikitext
12:09 AM <bearND> ok. Yeah the wikitext shows up even on the info page (https://en.wikipedia.org/wiki/WestJet_Encore?action=info)
12:10 AM <tgr> https://en.wikipedia.org/wiki/Wikipedia:Short_description does not explicitly say it's plain text but does say it should not contain links so you can refer to that if you want to edit it
12:12 AM <bearND> oops, already edited it
12:13 AM <bearND> but i'll keep it in mind if anyone complains
12:19 AM <bearND> tgr: Maybe the description override can display an error when wikitext is used inside?
12:31 AM <tgr> I'm not sure how we can detect wikitext, short of parsing which is expensive. Also the description could be plain text which just happens to be valid wikitext (granted it does not seem likely)

JMinor triaged this task as Normal priority. Sep 24 2018, 6:52 PM
JMinor added a subscriber: JMinor.

I think originally Gergo's parser function was stripping out some non-allowed characters, but not sure how robust.

bearND added a subscriber: Tgr. Sep 24 2018, 7:45 PM

Related: T195551: [BUG] Formatting in local descriptions is breaking use of descriptions (example: San Francisco article)

IMO we could be much more restrictive about what characters/patterns are allowed in local descriptions. No special formatting characters, HTML tags, wikitext markup, etc.

I would think we could catch most of the common mistakes with a few simple regexes to detect things like HTML tags or common wikitext constructs (e.g., [[]], {{}}). It won't foil someone determined to beat the system and enter invalid text, but that should be enough to prevent most good faith mistakes.
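As a rough illustration of the "few simple regexes" idea, here is a minimal Python sketch. The pattern set and the function name find_markup are hypothetical and are not taken from Wikibase or any existing validator; they only cover the constructs named above (wikilinks, templates, HTML tags) and would need tuning against real descriptions.

```python
import re

# Hypothetical patterns for illustration only; not part of Wikibase.
SUSPICIOUS_PATTERNS = {
    "wikilink": re.compile(r"\[\[.*?\]\]"),          # [[Target]] or [[Target|label]]
    "template": re.compile(r"\{\{.*?\}\}"),          # {{template|args}}
    "html_tag": re.compile(r"</?[A-Za-z][^<>]*>"),   # <b>, </span>, <br />
}


def find_markup(description: str) -> list[str]:
    """Return the names of the patterns that match the description."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(description)
    ]


if __name__ == "__main__":
    sample = "Canadian regional airline owned by [[WestJet Airlines, Ltd]] ."
    print(find_markup(sample))
```

A check like this only flags likely good-faith mistakes. As noted in the IRC log, plain text can occasionally look like wikitext, so a warning may be a safer response than a hard rejection.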

For reference, this appears to be the validation logic currently applied to descriptions in Wikibase:

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/54d283c322062e7e9ed49a6a15738bbe75d4ee10/repo/includes/Validators/TermValidatorFactory.php#L124-L135

/**
 * @return ValueValidator[]
 */
private function getCommonTermValidators() {
	$validators = [];
	$validators[] = new TypeValidator( 'string' );
	$validators[] = new StringLengthValidator( 1, $this->maxLength, 'mb_strlen' );
	// no leading/trailing whitespace, no tab or vertical whitespace, no line breaks.
	$validators[] = new RegexValidator( '/^\s|[\v\t]|\s$/u', true );
	return $validators;
}

Tgr added a comment. Sep 26 2018, 4:01 PM

I think originally Gergo's parser function was stripping out some non-allowed characters, but not sure how robust.

It matches Wikidata logic, so it mostly filters out characters which you couldn't type in an input field. { or [ is not invalid in a plaintext description, just unusual.
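To make that concrete, here is a Python approximation of the whitespace rule from the validator snippet quoted earlier. It assumes the second argument to RegexValidator inverts the check, so that a match means the value is rejected, which is what the code comment there suggests. The function name passes_whitespace_check is made up for this sketch. PCRE's \v covers all vertical whitespace, whereas Python's \v is only the vertical tab, so the characters are listed explicitly.

```python
import re

# Approximation of the PCRE pattern /^\s|[\v\t]|\s$/u.
# Vertical whitespace is spelled out because Python's \v is narrower than PCRE's.
INVALID_WHITESPACE = re.compile(
    r"^\s|[\n\x0b\f\r\x85\u2028\u2029\t]|\s$"
)


def passes_whitespace_check(term: str) -> bool:
    """True if the term has no leading/trailing whitespace, tabs, or line breaks."""
    return INVALID_WHITESPACE.search(term) is None


if __name__ == "__main__":
    wikitext_desc = "Canadian regional airline owned by [[WestJet Airlines, Ltd]] ."
    # Brackets are ordinary characters as far as this rule is concerned.
    print(passes_whitespace_check(wikitext_desc))
```

Under these assumptions the WestJet Encore description passes, because the rule only looks at whitespace.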

Making changes in the description that are invisible to the user entering the description is not a good place to be.
Rejecting the edit can be problematic when the description comes from another page (although the only realistic scenario I can think of is it coming from an infobox which generates it from the page title, in which case it is very unlikely to contain anything that happens to be valid wikitext). Also I'm not sure how well it is supported (we do it for more formal content types like JSON, but do we currently do it for anything in wikitext?). Maybe the enwiki community could set up an abuse filter.

How do users end up making these errors? If they use some kind of tool that shows them the description (like the gadget mentioned on IRC) and that tool makes them believe wikitext works, that should be easy to fix.

LGoto added a subscriber: LGoto. Jul 3 2019, 3:58 PM

@Tgr checking in on this, can the ticket be resolved?

Restricted Application changed the subtype of this task from "Task" to "Bug Report". Jul 3 2019, 3:58 PM
Restricted Application added a subscriber: Liuxinyu970226.
Tgr added a comment. Jul 3 2019, 6:38 PM

Is this a rare issue or are there hundreds or thousands of such occurrences?

72 (out of 1354718 short descriptions total) per this query:

select count(*) from page_props where pp_propname = 'wikibase-shortdesc' and pp_value like '%[[%';

So it does not seem too bad.
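For anyone wanting to try the same kind of count without database access, here is a small Python sketch that runs the query against an in-memory SQLite table. The table only mimics three page_props columns, and the sample rows and the count_matching helper are invented for illustration. The 72 figure above comes from the real database, not from this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT)"
)

# Made-up sample rows; only the column names follow the real table.
rows = [
    (1, "wikibase-shortdesc", "Canadian regional airline owned by [[WestJet Airlines, Ltd]] ."),
    (2, "wikibase-shortdesc", "City in California"),
    (3, "wikibase-shortdesc", "{{lang|fr|Exemple}} of a template"),
    (4, "displaytitle", "[[Not a short description]]"),
]
conn.executemany("INSERT INTO page_props VALUES (?, ?, ?)", rows)


def count_matching(connection: sqlite3.Connection, like_pattern: str) -> int:
    """Count short descriptions whose value matches the given LIKE pattern."""
    cursor = connection.execute(
        "SELECT COUNT(*) FROM page_props "
        "WHERE pp_propname = 'wikibase-shortdesc' AND pp_value LIKE ?",
        (like_pattern,),
    )
    return cursor.fetchone()[0]


wikilinks = count_matching(conn, "%[[%")
templates = count_matching(conn, "%{{%")
print(wikilinks, templates)
```

Changing the LIKE pattern (for example to '%{{%') gives a quick estimate of how common other wikitext constructs are.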

We may want to make the description documentation more clearly state that descriptions should be plain text.

Done.

I'll close this as resolved. Please feel free to reopen if you think more (or something else) should be done.

Tgr closed this task as Resolved. Jul 3 2019, 6:38 PM
Tgr claimed this task.
Tgr reopened this task as Open. Jul 7 2019, 10:42 AM

Reopening per the discussion on the talk page. Apparently wikilinks (even if not used much) aren't really editor errors: the {{short description}} template is used in two different ways. One is to invoke the {{SHORTDESC:}} magic word, which provides short descriptions via the API. The other is to provide content for the {{Annotated link}} template (which apparently uses Lua to fetch the linked page and then regular expressions to parse the {{short description}} template and its description parameter out of the wikitext - wow) so that the descriptions can be shown on e.g. disambiguation pages. In that latter usage pattern wikimarkup does get parsed, and there's no consensus on whether it should be allowed or not.
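To show roughly what regex-based extraction of the description parameter looks like, here is a Python sketch. It is not the actual Lua code behind {{Annotated link}}; the patterns and function names are assumptions made for this example. It also shows one possible way to flatten simple wikilinks to plain text for the API use case.

```python
import re
from typing import Optional

# Hypothetical pattern: first {{short description|...}} call in the wikitext.
# The non-greedy match stops at the first "}}", so nested templates are not handled.
SHORT_DESC = re.compile(
    r"\{\{\s*short description\s*\|(?P<desc>.*?)\}\}",
    re.IGNORECASE | re.DOTALL,
)

# [[Target|label]] becomes "label", [[Target]] becomes "Target".
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def extract_short_description(wikitext: str) -> Optional[str]:
    """Return the raw description parameter, or None if the template is absent."""
    match = SHORT_DESC.search(wikitext)
    return match.group("desc").strip() if match else None


def links_to_plain_text(description: str) -> str:
    """Replace simple wikilinks with their visible text."""
    return WIKILINK.sub(r"\1", description)


if __name__ == "__main__":
    page = (
        "{{short description|Canadian regional airline owned by "
        "[[WestJet Airlines, Ltd]] .}}\n'''WestJet Encore''' is an airline."
    )
    raw = extract_short_description(page)
    print(raw)
    print(links_to_plain_text(raw))
```

Parsing wikitext with regular expressions is fragile. Template parameters, nested templates, and piped links with unusual content would all need more care than this sketch gives them.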

Another interesting issue that was raised there is the capitalization - the short descriptions guideline recommends using sentence case, while Wikidata doesn't, so users will see inconsistent casing.

Given that we'd eventually like to migrate short descriptions to their own MCR slot so they can be edited easily, not sure how much effort it is worth to try fixing the current situation, or whether it's a good idea to try to change {{short description}} usage patterns when in the long term that template might be used for a different use case.

Tgr added a comment. Jul 8 2019, 7:27 AM

Another interesting issue that was raised there is the capitalization - the short descriptions guideline recommends using sentence case, while Wikidata doesn't, so users will see inconsistent casing.

Related: T227424: Sentence-casing Wikidata descriptions in the Android app results in wrong case for proper nouns starting with lowercase