
How to deal with blocked messages on client that require advanced parsing?
Open, Needs TriagePublic

Description

Soft blocks are created like so:

$block = new Block( [
	'address' => $ip,
	'byText' => 'MediaWiki default',
	'reason' => wfMessage( 'softblockrangesreason', $ip )->text(),
	'anonOnly' => true,
	'systemBlock' => 'wgSoftBlockRanges',
] );

On https://en.wikipedia.org/wiki/MediaWiki:Softblockrangesreason the custom message uses wikitext table markup.

Before launching, the mobile editor checks the 'blockinfo' state of the page using:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&rvprop=content|timestamp&titles=Offset_printing&meta=userinfo&uiprop=blockinfo&formatversion=2&rvsection=0

When the mobile editor encounters such a block, it attempts to render the block reason in JavaScript. To do this it currently uses the jQuery message parser, assuming the message is simple:

var parser = new mw.jqueryMsg.parser(),
	ast = parser.wikiTextToAst( blockReason );

This throws an exception on the softblockrangesreason message - see T191470#4108889

This is particularly problematic on API-based clients such as apps and the mobile web, and means we cannot render block reasons.

Possible solutions

  • Provide HTML in the blockinfo API responses or have a way for clients to render these blocks. T191558 & T194530
  • Make a request to API:Parsing_wikitext to parse the wikitext on the server. T194530#4204614
  • Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.
  • Upgrade jQuery msg to be able to handle more types of wikitext.
  • When blockinfo has a recognized systemblocktype, ignore the reason field in favor of some custom logic.

Event Timeline

Restricted Application added subscribers: MGChecker, Aklapper. · View Herald Transcript · Apr 10 2018, 11:13 PM
Jdlrobson renamed this task from How to deal with blocked messages on client that require parsing? to How to deal with blocked messages on client that require advanced parsing? · Apr 10 2018, 11:13 PM
dbarratt updated the task description. (Show Details) · Apr 10 2018, 11:14 PM
Dinoguy1000 updated the task description. (Show Details) · Apr 11 2018, 12:19 AM
Anomie added a subscriber: Anomie. · Apr 11 2018, 1:59 PM

provide HTML in the blockinfo API responses

The trend has been to remove random HTML blobs from the action API output.

Block reasons are weird in that they get parsed as full wikitext in certain contexts while most "reason" and "comment" parameters only have wikitext links rendered. This particular case is doubly weird because the block never hits the database and so isn't subject to the usual length limitations.

or have a way for clients to render these blocks.

action=parse already exists to parse wikitext.
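
For example, a client could build such a request itself. In the sketch below, buildParseUrl is a hypothetical helper name; action=parse and its text, contentmodel, and formatversion parameters are existing action API options.

```javascript
// Sketch: construct an action=parse request so the server renders the
// block reason to HTML. buildParseUrl is a hypothetical helper name.
function buildParseUrl( apiBase, wikitext ) {
	var params = new URLSearchParams( {
		action: 'parse',
		format: 'json',
		formatversion: '2',
		contentmodel: 'wikitext',
		text: wikitext
	} );
	return apiBase + '?' + params.toString();
}

// Usage (hypothetical):
// fetch( buildParseUrl( 'https://en.wikipedia.org/w/api.php', blockReason ) )
//     .then( function ( r ) { return r.json(); } )
//     .then( function ( data ) { /* data.parse.text holds the rendered HTML */ } );
```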

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

That also sounds like it could be a good idea. Although it wouldn't surprise me much if enwiki reacted by putting the existing wikitext in a template and changing the message to something like {{uw-privateiprangeblock|$1}} <!-- Editing without an account from a private IP range is disabled -->. That would nicely parallel how enwiki does many other blocks.

Upgrade jQuery msg to be able to handle more types of wikitext.

That's probably another good long-term goal.


Another possibility would be to look for systemblocktype in the block info and ignore the reason field if the type is recognized in favor of some custom logic. Current types in core (I see none in extensions in gerrit) are:

  • 'wgSoftBlockRanges', where the reason field comes from MediaWiki:softblockrangesreason.
  • 'dnsbl', where the reason field comes from MediaWiki:sorbsreason.
    • Looks like $wgEnableDnsBlacklist is only enabled on enwikinews, thwiki, thwiktionary, thwikiquote, thwikibooks, and thwikisource at this time.
  • 'proxy', where the reason field comes from MediaWiki:proxyblockreason.
    • It looks like $wgProxyList is not currently used on any WMF wikis, as far as I can tell.
  • 'global-block', which has no reason.
    • This shouldn't happen on WMF sites, since the GlobalBlocking extension supplies a better block and nothing else uses this code path.
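
A client-side dispatch on those types could look roughly like the sketch below. The type strings come from the list above; describeSystemBlock and the fallback strings are illustrative placeholders, not real i18n messages.

```javascript
// Sketch of dispatching on systemblocktype from the blockinfo response,
// per the list of core types above. The message strings here are
// illustrative placeholders only.
function describeSystemBlock( blockinfo ) {
	switch ( blockinfo.systemblocktype ) {
		case 'wgSoftBlockRanges':
			return 'Anonymous editing from this IP range is disabled.';
		case 'dnsbl':
			return 'This IP address is listed on a DNS blacklist.';
		case 'proxy':
			return 'This IP address is a blocked open proxy.';
		case 'global-block':
			return 'This IP address is globally blocked.';
		default:
			// Unrecognized (or absent) type: fall back to the raw reason.
			return blockinfo.blockreason;
	}
}
```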

OTOH, probably the real root cause of the problem in T191470 is that reading-web-staging.wmflabs.org, being in Cloud VPS, sees a private Cloud VPS IP instead of the visitor's actual IP and for some reason (probably blindly copying from WMF prod) has that IP in $wgSoftBlockRanges. If someone can hit this auto-block in a correct production configuration it would be a more interesting problem.

Anomie updated the task description. (Show Details) · Apr 11 2018, 1:59 PM
Nikerabbit added a subscriber: Nikerabbit. (Edited) · Apr 11 2018, 2:56 PM

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

Do we have a client-side wikitext parser? I assume so, since VisualEditor must have one?

Anomie added a comment. (Edited) · Apr 11 2018, 3:08 PM

I would not recommend trying to make jQuery.msg a full-fledged parser; we already have multiple parsers.

I remember back when one of the claimed "features" of writing Parsoid in nodejs was that in theory the same code could be used for parsing wikitext in-browser too. ;)

Do we have a client-side wikitext parser? I assume so since VisualEditor must?

No. As I understand it, VE calls out to Parsoid (via RESTBase) to turn wikitext into well-structured HTML with a lot of attached metadata (RDFa, I think). Then it turns that HTML into an in-browser DOM and manipulates that as the user edits. When the user hits "save", VE turns the in-browser DOM back into well-structured HTML-plus-metadata and gives that to Parsoid to turn back into wikitext.

jQuery.msg handles a subset of wikitext that's usually sufficient for handling simple i18n messages. But there's a lot it doesn't include.

Yeah... I mean, I can't imagine what Parsoid would be doing that would require Node.js. So maybe it's worth exploring running Parsoid in the browser. But maybe someone more familiar can weigh in on it. It seems like that would solve a lot of problems (including reducing the number of requests needed to edit).

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

OTOH, probably the real root cause of the problem in T191470 is that reading-web-staging.wmflabs.org, being in Cloud VPS,

Although this particular example (softblock) is unlikely to happen in production, the fact still stands that anything can be put in these block messages, so it's a potential problem for any type of block. That's what I'm concerned about, not this specific block. So I'm not sure treating systemblocktype differently helps things. I can still define another block message with a table right now.

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

This seems like the best-fitting solution given the above. Is there any way to do this? How might that work?

Another improvement would be for jQueryMsg to return an error when it cannot parse, rather than throw an exception. That would at least make it easier to fall back to a default message without resorting to exception handling.

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

I was saying it should be run in the web browser, not on the server. So there would be no additional requests.

Another improvement would be for jQueryMsg to return an error when it cannot parse, rather than throw an exception. That would at least make it easier to fall back to a default message without resorting to exception handling.

I mean, we could also just catch the exception and advise that others do the same.
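
That catch-and-fall-back approach could be sketched like this. renderBlockReason is a hypothetical wrapper, and the parser is passed in as a function because mw.jqueryMsg isn't assumed to be available in this snippet.

```javascript
// Sketch: wrap the jqueryMsg parse in a try/catch and fall back to the
// raw reason text when the wikitext is unsupported (e.g. tables).
// renderBlockReason is a hypothetical helper; parseToAst stands in for
// mw.jqueryMsg's wikiTextToAst.
function renderBlockReason( blockReason, parseToAst ) {
	try {
		return { ast: parseToAst( blockReason ), fallback: false };
	} catch ( e ) {
		// Parsing failed: show the unparsed reason instead of erroring out.
		return { ast: null, fallback: true, text: blockReason };
	}
}
```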

Nikerabbit moved this task from Backlog to Other teams on the Language-Team board. · Apr 11 2018, 3:36 PM

I'd like to avoid hitting the parse API here, as on slower connections this would really slow down the ability to edit.

I note that this code path would only occur when the user is blocked, in which case the user can't edit anyway.

I remember back when one of the claimed "features" of writing Parsoid in nodejs was that in theory the same code could be used for parsing wikitext in-browser too. ;)

Welp... apparently that is never going to happen, because it's being rewritten in PHP anyway. T191991

Change 432723 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] User: System block reasons shouldn't expand templates

https://gerrit.wikimedia.org/r/432723

Limit the wikitext that can be used in a block message inside the Block class. Forbid the use of tables.

That also sounds like it could be a good idea. Although it wouldn't surprise me much if enwiki reacted by putting the existing wikitext in a template and changing the message to something like {{uw-privateiprangeblock|$1}} <!-- Editing without an account from a private IP range is disabled -->. That would nicely parallel how enwiki does many other blocks.

In trying this on enwiki (in my volunteer capacity there), I find that the block reason is being passed through ->text() (which expands templates) rather than ->plain() (which does not). I suppose that makes sense if you assume the reason messages will only contain simple things like {{SITENAME}}, but as we can see, wikis do weird things.

dbarratt updated the task description. (Show Details) · May 14 2018, 5:26 PM
dbarratt updated the task description. (Show Details) · May 14 2018, 5:30 PM

Change 432723 merged by jenkins-bot:
[mediawiki/core@master] User: System block reasons shouldn't expand templates

https://gerrit.wikimedia.org/r/432723

Krinkle added a subscriber: Krinkle. (Edited) · Jul 11 2018, 7:40 PM

Yeah... I mean I can't imagine what Parsoid would be doing that would require node.js. So maybe it's worth exploring running Parsoid in the browser. [..]. Seems like that would solve a lot of problems (including reducing the number of requests [..]

I realise this is moot, but wanted to clarify: Yes, there is a subset of wikitext that can be parsed standalone (as markup). This is already supported by mediawiki.jqueryMsg. The issues raised on this task are with the rest of wikitext (aka template expansion), such as:

  • Transclusion: To embed a template, one needs the template's content (requires a roundtrip). The fetched template may reveal additional templates.
  • Magic words, parser functions, and parser tags: These are essentially built-ins, with their contents dynamically generated by PHP rather than wikitext. Examples are {{#ifexist:, <gallery>, and {{GENDER:}}. Their logic and response format could be re-implemented, but they would still require arbitrary information about pages, thumbnails, users, and extensions.

In short: If we had a browser version of Parsoid, it would increase the number of requests and require a heavier download. Ultimately, it couldn't render the text quickly enough.

In short: If we had a browser version of Parsoid, it would increase the number of requests and require a heavier download. Ultimately, it couldn't render the text quickly enough.

I understand. I just wonder whether, since templates rarely change, templates could be cached in the browser for a long period of time. Then if the user encountered another wikitext string that used a cached template, they wouldn't have to make any requests to the server at all. Likewise, common constructs or common templates could be included in the dist package of Parsoid, so there wouldn't be any requests to parse those strings.

You are right, the initial request would be heavier, as the user would be downloading Parsoid, but subsequent pages would actually be faster and use fewer requests.

This task is probably not the right place to have that discussion.

I think this is probably the simplest way to go:

Provide HTML in the blockinfo API responses or have a way for clients to render these blocks.

Restricted Application added a project: Core Platform Team. · View Herald Transcript · Oct 30 2019, 11:46 PM