
How to deal with blocked messages on client that require advanced parsing?
Open, Needs TriagePublic

Description

Soft blocks are created like so:

$block = new Block( [
	'address' => $ip,
	'byText' => 'MediaWiki default',
	'reason' => wfMessage( 'softblockrangesreason', $ip )->text(),
	'anonOnly' => true,
	'systemBlock' => 'wgSoftBlockRanges',
] );

On https://en.wikipedia.org/wiki/MediaWiki:Softblockrangesreason the custom message uses wikitext table markup.

Before launching, the mobile editor checks the 'blockinfo' state of the page using:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&rvprop=content|timestamp&titles=Offset_printing&meta=userinfo&uiprop=blockinfo&formatversion=2&rvsection=0

When the mobile editor encounters such a block, it attempts to render the block reason in JavaScript. To do this it currently uses the jQuery message parser, assuming the message is simple:

var parser = new mw.jqueryMsg.parser(),
	ast = parser.wikiTextToAst( blockReason );

This throws an exception on the softblockrangesreason message - see T191470#4108889

This is particularly problematic on API-based clients such as apps and the mobile web, and means we cannot render block reasons.

Possible solutions

  • Provide HTML in the blockinfo API responses or have a way for clients to render these blocks. T191558 & T194530
  • Make a request to API:Parsing_wikitext to parse the wikitext on the server. T194530#4204614
  • Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.
  • Upgrade jQuery msg to be able to handle more types of wikitext.
  • When blockinfo has a recognized systemblocktype, ignore the reason field in favor of some custom logic.

Event Timeline

Restricted Application added subscribers: MGChecker, Aklapper. · View Herald Transcript · Apr 10 2018, 11:13 PM
Jdlrobson renamed this task from How to deal with blocked messages on client that require parsing? to How to deal with blocked messages on client that require advanced parsing? · Apr 10 2018, 11:13 PM
dbarratt updated the task description. (Show Details) · Apr 10 2018, 11:14 PM
Dinoguy1000 updated the task description. (Show Details) · Apr 11 2018, 12:19 AM
Anomie added a subscriber: Anomie. · Apr 11 2018, 1:59 PM

provide HTML in the blockinfo API responses

The trend has been to remove random HTML blobs from the action API output.

Block reasons are weird in that they get parsed as full wikitext in certain contexts while most "reason" and "comment" parameters only have wikitext links rendered. This particular case is doubly weird because the block never hits the database and so isn't subject to the usual length limitations.

or have a way for clients to render these blocks.

action=parse already exists to parse wikitext.
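
For example, a client could build such a request itself. In the sketch below, buildParseUrl is a hypothetical helper name; action=parse and its text, contentmodel, and formatversion parameters are existing action API options.

```javascript
// Sketch: construct an action=parse request so the server renders the
// block reason to HTML. buildParseUrl is a hypothetical helper name.
function buildParseUrl( apiBase, wikitext ) {
	var params = new URLSearchParams( {
		action: 'parse',
		format: 'json',
		formatversion: '2',
		contentmodel: 'wikitext',
		text: wikitext
	} );
	return apiBase + '?' + params.toString();
}

// Usage (hypothetical):
// fetch( buildParseUrl( 'https://en.wikipedia.org/w/api.php', blockReason ) )
//     .then( function ( r ) { return r.json(); } )
//     .then( function ( data ) { /* data.parse.text holds the rendered HTML */ } );
```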

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

That also sounds like it could be a good idea. Although it wouldn't surprise me much if enwiki reacted by putting the existing wikitext in a template and changing the message to something like {{uw-privateiprangeblock|$1}} <!-- Editing without an account from a private IP range is disabled -->. That would nicely parallel how enwiki does many other blocks.

Upgrade jQuery msg to be able to handle more types of wikitext.

That's probably another good long-term goal.


Another possibility would be to look for systemblocktype in the block info and ignore the reason field if the type is recognized in favor of some custom logic. Current types in core (I see none in extensions in gerrit) are:

  • 'wgSoftBlockRanges', where the reason field comes from MediaWiki:softblockrangesreason.
  • 'dnsbl', where the reason field comes from MediaWiki:sorbsreason.
    • Looks like $wgEnableDnsBlacklist is only enabled on enwikinews, thwiki, thwiktionary, thwikiquote, thwikibooks, and thwikisource at this time.
  • 'proxy', where the reason field comes from MediaWiki:proxyblockreason.
    • It looks like $wgProxyList is not currently used on any WMF wikis, as far as I can tell.
  • 'global-block', which has no reason.
    • This shouldn't happen on WMF sites, since the GlobalBlocking extension supplies a better block and nothing else uses this code path.
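
A client-side dispatch on those types could look roughly like the sketch below. The type strings come from the list above; describeSystemBlock and the fallback strings are illustrative placeholders, not real i18n messages.

```javascript
// Sketch of dispatching on systemblocktype from the blockinfo response,
// per the list of core types above. The message strings here are
// illustrative placeholders only.
function describeSystemBlock( blockinfo ) {
	switch ( blockinfo.systemblocktype ) {
		case 'wgSoftBlockRanges':
			return 'Anonymous editing from this IP range is disabled.';
		case 'dnsbl':
			return 'This IP address is listed on a DNS blacklist.';
		case 'proxy':
			return 'This IP address is a blocked open proxy.';
		case 'global-block':
			return 'This IP address is globally blocked.';
		default:
			// Unrecognized (or absent) type: fall back to the raw reason.
			return blockinfo.blockreason;
	}
}
```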

OTOH, probably the real root cause of the problem in T191470 is that reading-web-staging.wmflabs.org, being in Cloud VPS, sees a private Cloud VPS IP instead of the visitor's actual IP and for some reason (probably blindly copying from WMF prod) has that IP in $wgSoftBlockRanges. If someone can hit this auto-block in a correct production configuration it would be a more interesting problem.

Anomie updated the task description. (Show Details) · Apr 11 2018, 1:59 PM
Nikerabbit added a subscriber: Nikerabbit. (Edited) · Apr 11 2018, 2:56 PM

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

Do we have a client-side wikitext parser? I assume so, since VisualEditor must have one?

Anomie added a comment. (Edited) · Apr 11 2018, 3:08 PM

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

I remember back when one of the claimed "features" of writing Parsoid in nodejs was that in theory the same code could be used for parsing wikitext in-browser too. ;)

Do we have a client-side wikitext parser? I assume so since VisualEditor must?

No. As I understand it, VE calls out to Parsoid (via RESTBase) to turn wikitext into well-structured HTML with a lot of attached metadata (RDFa, I think). Then it turns that HTML into an in-browser DOM and manipulates that as the user edits. When the user hits "save", VE turns the in-browser DOM back into well-structured HTML-plus-metadata and gives that to Parsoid to turn back into wikitext.

jQuery.msg handles a subset of wikitext that's usually sufficient for handling simple i18n messages. But there's a lot it doesn't include.

Yeah... I mean, I can't imagine what Parsoid would be doing that would require Node.js. So maybe it's worth exploring running Parsoid in the browser. But maybe someone more familiar can weigh in on it. It seems like that would solve a lot of problems (including reducing the number of requests needed to edit).

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

OTOH, probably the real root cause of the problem in T191470 is that reading-web-staging.wmflabs.org, being in Cloud VPS,

Although this particular example (softblock) is unlikely to happen in production, the fact still stands that anything can be put in these block messages, so it's a potential problem for any type of block. That's what I'm concerned about, not this specific block. So I'm not sure treating systemblocktype differently helps things. I can still define another block message with a table right now.

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

This seems like the best-fitting solution given the above. Is there any way to do this? How might that work?

Another improvement would be for jQueryMsg to return an error when it cannot parse, rather than throw an exception. That would at least make it easier to fall back to a default message without resorting to exception handling.

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

I was saying it should be run in the web browser, not on the server. So there would be no additional requests.

Another improvement would be for jQueryMsg to return an error when it cannot parse, rather than throw an exception. That would at least make it easier to fall back to a default message without resorting to exception handling.

I mean, we could also just catch the exception and advise that others do the same.
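
That catch-and-fall-back approach could be sketched like this. renderBlockReason is a hypothetical wrapper, and the parser is passed in as a function because mw.jqueryMsg isn't assumed to be available in this snippet.

```javascript
// Sketch: wrap the jqueryMsg parse in a try/catch and fall back to the
// raw reason text when the wikitext is unsupported (e.g. tables).
// renderBlockReason is a hypothetical helper; parseToAst stands in for
// mw.jqueryMsg's wikiTextToAst.
function renderBlockReason( blockReason, parseToAst ) {
	try {
		return { ast: parseToAst( blockReason ), fallback: false };
	} catch ( e ) {
		// Parsing failed: show the unparsed reason instead of erroring out.
		return { ast: null, fallback: true, text: blockReason };
	}
}
```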

Nikerabbit moved this task from Backlog to Other teams on the Language-Team board. · Apr 11 2018, 3:36 PM

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

I note that this code path would only occur when the user is blocked, in which case the user can't edit anyway.

I remember back when one of the claimed "features" of writing Parsoid in nodejs was that in theory the same code could be used for parsing wikitext in-browser too. ;)

Welp... apparently that is never going to happen, because it's being rewritten in PHP anyway. T191991

Change 432723 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] User: System block reasons shouldn't expand templates

https://gerrit.wikimedia.org/r/432723

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

That also sounds like it could be a good idea. Although it wouldn't surprise me much if enwiki reacted by putting the existing wikitext in a template and changing the message to something like {{uw-privateiprangeblock|$1}} <!-- Editing without an account from a private IP range is disabled -->. That would nicely parallel how enwiki does many other blocks.

In trying this on enwiki (in my volunteer capacity there), I find that the block reason is being passed through ->text() (which expands templates) rather than ->plain() (which does not). I suppose that makes sense if you assume the reason messages will only contain simple things like {{SITENAME}}, but as we can see, wikis do weird things.

dbarratt updated the task description. (Show Details) · May 14 2018, 5:26 PM
dbarratt updated the task description. (Show Details) · May 14 2018, 5:30 PM

Change 432723 merged by jenkins-bot:
[mediawiki/core@master] User: System block reasons shouldn't expand templates

https://gerrit.wikimedia.org/r/432723

Krinkle added a subscriber: Krinkle. (Edited) · Jul 11 2018, 7:40 PM

Yeah... I mean I can't imagine what Parsoid would be doing that would require node.js. So maybe it's worth exploring running Parsoid in the browser. [..]. Seems like that would solve a lot of problems (including reducing the number of requests [..]

I realise this is moot, but wanted to clarify: Yes, there is a subset of wikitext that can be parsed standalone (as markup). This is already supported by mediawiki.jqueryMsg. The issues raised on this task are with the rest of wikitext (aka template expansion), such as:

  • Transclusion: To embed a template, one needs the template's content (requires a roundtrip). The fetched template may reveal additional templates.
  • Magic words, parser functions, and parser tags: These are essentially built-ins, with their contents dynamically generated by PHP rather than wikitext. Examples are {{#ifexist:, <gallery>, and {{GENDER:}}. Their logic and response format could be re-implemented, but they would still require arbitrary information about pages, thumbnails, users, and extensions.

In short: If we had a browser version of Parsoid, it would increase the number of requests and require a heavier download. Ultimately, it couldn't render the text quickly enough.

In short: If we had a browser version of Parsoid, it would increase the number of requests and require a heavier download. Ultimately, it couldn't render the text quickly enough.

I understand. I just wonder whether, since templates rarely change, templates could be cached in the browser for a long period of time. Then if the user encountered another wikitext string that used a cached template, they wouldn't have to make any requests to the server at all. Likewise, common constructs or common templates could be included in the dist package of Parsoid, so there wouldn't be any requests to parse those strings.

You are right, the initial request would be heavier, as the user would be downloading Parsoid, but subsequent pages would actually be faster and use fewer requests.

This task is probably not the right place to have that discussion.

I think this is probably the simplest way to go:

Provide HTML in the blockinfo API responses or have a way for clients to render these blocks.

Restricted Application added a project: Core Platform Team. · View Herald Transcript · Oct 30 2019, 11:46 PM