
Investigate the impact of T14974 for IP Masking
Open, Medium, Public

Description

Motivation

For temporary usernames we plan to use an asterisk (*) as the prefix. It appears that T14974 ("The newline added to a template, magic word, variable, or parser function that returns line-start wikicode formatting (*#:; {|) causes unexpected parsing") could cause issues for this plan. This task is to investigate those issues from both a product and a technical perspective, and to propose mitigations.

Event Timeline

Niharika triaged this task as Medium priority. Mar 15 2023, 9:56 PM
Niharika created this task.

I can summarize the problem, since I'm familiar with T14974. The short version: whenever {{...}} syntax in wikitext (a template or parser function) returns a value that starts with * (or one of a few other line-start characters: #, :, ; or {|) and the call is not already at the start of a line, the parser magically inserts a line break before the value.

This is relevant to temporary usernames, because the leading * in them will break some templates. Simplified example: [[User:{{ucfirst:*Unregistered 1}}]] will not generate a working link, because of the magic line break:

image.png (1×2 px, 91 KB)
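To make the behavior concrete, here is a minimal Python sketch of the rule as described above. It is a simplified model for illustration, not MediaWiki's actual code: the function name `splice_expansion` and the pattern name `LINE_START_MARKUP` are made up for this example, and the real parser has more conditions than this.

```python
import re

# Line-start wikitext markup: list items (* # : ;) and table starts ({|).
# Hypothetical name; this only models the rule described in the comment above.
LINE_START_MARKUP = re.compile(r"^(?:\{\||[*#:;])")


def splice_expansion(before: str, expansion: str, after: str = "") -> str:
    """Insert the result of a {{...}} call into its surrounding text.

    Simplified model: if the call is not at the start of a line and its
    result begins with line-start markup, a newline is inserted first.
    """
    at_line_start = before == "" or before.endswith("\n")
    if not at_line_start and LINE_START_MARKUP.match(expansion):
        expansion = "\n" + expansion
    return before + expansion + after


# {{ucfirst:*Unregistered 1}} expands to "*Unregistered 1" inside a link:
broken = splice_expansion("[[User:", "*Unregistered 1", "]]")
# The same call with a "?" prefix is left untouched:
intact = splice_expansion("[[User:", "?Unregistered 1", "]]")
```

In this model `broken` contains a line break between `[[User:` and `*Unregistered 1]]`, which is why the link no longer parses, while `intact` stays on one line.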

I saw a discussion about this recently at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_204#The_new_temporary_user_accounts_and_old_scripts.


I think I'd like to see that bug fixed (and many template authors would as well). The issue is that no one knows whether fixing it would break existing content, or how much of it. There is a reason for this behavior: it's so that templates that output lists and tables still work when they are not placed at the start of a line. You'd also need to fix it in Parsoid, which faithfully reproduces the buggy behavior.

I'm also not sure whether the current buggy behavior, combined with *Unregistered usernames, would actually affect things much. It's possible to work around this in templates that display user links etc., and there's a chance that people have already done that, since those usernames are already valid. Some of the templates named in that enwiki discussion work just fine. So this is not necessarily a blocker.

Personally I'd suggest changing the temp usernames. Perhaps a leading ? would be nice instead? I think the format is easily configurable, and I don't think anyone is attached to the current one. It'd also be nice to avoid this issue coming up in discussions about IP masking (which will probably be long enough already), even if it wouldn't be a big deal in practice.

If you want to take a stab at the parser bug, I'd start by figuring out whether fixing it would break any pages. I'd just add some logging here: https://gerrit.wikimedia.org/g/mediawiki/core/+/e90886e5736bf9afa4721a79cb530bab14e88bf6/includes/parser/Parser.php#3332 or ask Content-Transform-Team for better advice, especially on the Parsoid part.
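As a rough illustration of what such logging could record, here is a Python sketch of the idea. The actual change would be made in PHP inside Parser.php; everything here (the logger name, the `audit_expansion` function, its parameters) is hypothetical and only shows the shape of the check: log each case where the implicit line break would be inserted, so the affected pages can be counted before changing the behavior.

```python
import logging
import re

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("implicit_newline_audit")

# Line-start wikitext markup: list items (* # : ;) and table starts ({|).
LINE_START_MARKUP = re.compile(r"^(?:\{\||[*#:;])")


def audit_expansion(page_title: str, expansion: str, at_line_start: bool) -> bool:
    """Return True, and log the case, when the implicit newline would be added.

    Assumption: the caller knows whether the {{...}} call sits at the
    start of a line, as the parser does at the point being instrumented.
    """
    affected = (not at_line_start) and LINE_START_MARKUP.match(expansion) is not None
    if affected:
        # Truncate the logged value so large template output does not flood the log.
        logger.info("implicit newline on %r before %r", page_title, expansion[:40])
    return affected
```

Counting the logged entries per page over a sample of parses would give a first estimate of how much existing content relies on the current behavior.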