Page MenuHomePhabricator

Check user signature for linter errors
Closed, ResolvedPublic

Description

A user can enter a custom "raw" signature in their preferences.

The linter should run on this field and at least indicate that the signature is not good enough.

Following that, the system might either prevent the user from saving his preferences, or ignore the custom signature in favor of the default system signature until such time as the raw signature passes the linter check.

Might possibly use the work in T163091: Parsoid: Add API endpoint to get lint errors for arbitrary wikitext.


Dependencies

The patches to meet the "Requirements" described below should NOT be merged until the following tickets are resolved:

Requirements

  • When someone attempts to save a signature that meets either of the conditions below, prevent the signature from being saved and present a message that explains to people: 1) that their signature cannot be saved as it is currently written, 2) why their signature cannot be saved as it is currently written and 3) what changes they need to make to the signature they have written in order to save it.
    • Conditions:
      • The signature contains any lint errors
      • The signature contains misnested tags (both kinds) or stripped tags
  • Note: this change should not affect current signatures that would become invalid as a result of this change, as noted in T140606#5829318

Testing details

Scenario A

  1. Visit: https://en.wikipedia.org/wiki/Special:Preferences
  2. Navigate to the Signature section
  3. Check the Treat the above as wiki markup box
  4. Enter a signature like: [[User:YOUR USERNAME]] <small>[[User talk:YOUR USERNAME]]<small>
  5. Click Save
  6. Notice the follow message appears beneath the Signature: text field: Your signature contains invalid or deprecated HTML syntax: …
  7. Notice a "Show error location" button; clicking it selects text from the first <small> tag to the end

Scenario B

  1. Enter a signature like: [[User:YOUR USERNAME]] '''[[User talk:YOUR USERNAME]]
  2. Click Save
  3. Notice the follow message appears beneath the Signature: text field: Your signature contains invalid or deprecated HTML syntax: …
  4. Notice a "Show error location" button; clicking it selects text from the ''' syntax to the end

Done

  • Patches are written that meet the "Requirements" described above

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I will note that the signature is already supposed to be checked for HTML correctness – but the code doing this only looks at the input on the preferences page and does not actually parse it as wikitext, so it's easily fooled by loading your signature from a template (e.g. {{SUBST:User:Matma Rex/sig}}) and does not check mismatched bold/italic markup: DefaultPreferencesFactory::validateSignature()

Change 549951 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] [WIP] preferences: Check signature for linter errors

https://gerrit.wikimedia.org/r/549951

matmarex claimed this task.Nov 9 2019, 12:36 AM
matmarex renamed this task from User preferences: Check raw signature for linter errors to Check user signature for linter errors.Nov 12 2019, 11:20 PM

Is there any progress on this feature request? Signatures are still being added that cause Linter errors, e.g. this one from 11 January 2020:

https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&type=revision&diff=935325391&oldid=935325138

Is there any progress on this feature request? Signatures are still being added that cause Linter errors, e.g. this one from 11 January 2020:

https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&type=revision&diff=935325391&oldid=935325138

@matmarex seems to be working on it (see the gerritbot comment earlier), but it will take a little longer to get all the infrastructure pieces squared away.

Current status is:

  • I've completed the difficult-but-interesting parts of the implementation, i.e. actually checking the signature for lint errors and disallowing invalid ones. I have still to do the easy-but-tedious parts, e.g. adding new localisation messages and writing some unit tests (and I think I'll need to move some stuff around to make unit testing possible).
  • I was considering delaying the deployment until Parsoid is fully integrated with MediaWiki, since we'll be able to fetch the lint errors in a more efficient way, but C. Scott and Subbu commented on the task that the current way is good enough, so this isn't a blocker any more.
  • We'll need to consult the proposed changes with the Wikimedia communities (I'm thinking of this task and T237700 together here). I think both are clearly desirable on most wikis but perhaps there is something we are missing. Probably I should write a documentation page similar to e.g. https://meta.wikimedia.org/wiki/Turning_off_outdated_skins and ask for comments on the talk page. Depending on the replies, I might need to make changes or add a way to opt-out (per wiki).
  • Also, currently this will only disallow setting a new invalid signature in your preferences. If you already have a signature that turns out to be invalid, it won't disappear and will still be used when you sign. This is something I'd partcularly like to hear comments about, is it fine, or should we try to disable them somehow?

Thank you for not waiting for Parsoid deployment. My thanks to C. Scott for that sage advice; it'll be my treat the next time I see you at the Haven.

I will be happy to work with you on documentation if you start a page, or at least let me know where a page should live.

As for T237700, I think you may actually get good community support, at least on en.WP, if we try to deploy T237700 (if it is ready) and this fix at the same time. Smart editors usually get behind solutions to problems that cause headaches for power users and bots. We can call it "Validating editor signatures" or something like that.

On the last point, let's at least stop new invalid sigs from being saved, with a link to the help page. Linter gnomes can notify active editors who are currently using invalid ones. It is usually easy to recommend a new sig format that looks the same but is free of Linter errors.

TheDJ removed a subscriber: TheDJ.Jan 29 2020, 11:19 PM

Change 549951 abandoned by Bartosz Dziewoński:
[WIP] preferences: Check signature for linter errors

Reason:
It'll be easier for me to work on this and related changes in one patch, see: I07c575c2d9d2afe7a89c4847d16ac044417297bf

https://gerrit.wikimedia.org/r/549951

Change 569062 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] preferences: Signature validation (lint errors, user links, nested subst)

https://gerrit.wikimedia.org/r/569062

In the commit message:
"enforced my software" → "enforced by software"

I filed a separate task about writing the documentation and requesting wider community feedback: T244519. I posted a draft of the documentation page there just now. I'd appreciate your comments (or edits).

As long as we allow {{subst}} in a signature, there is no reasonable way for us to continually validate that what is transcluded is balanced wikitext, therefore the only reasonable thing to do is to not allow any transclusions in signatures.

As long as we allow {{subst}} in a signature, there is no reasonable way for us to continually validate that what is transcluded is balanced wikitext, therefore the only reasonable thing to do is to not allow any transclusions in signatures.

See https://meta.wikimedia.org/wiki/New_requirements_for_user_signatures#Disallow_%22nested%22_substitution_in_signature

As long as we allow {{subst}} in a signature, there is no reasonable way for us to continually validate that what is transcluded is balanced wikitext, therefore the only reasonable thing to do is to not allow any transclusions in signatures.

I think we could validate the signature whenever it's used, rather than only when it's saved. We can think about doing that later. I did not implement that yet, because it would also disallow the grandfathered-in invalid signatures, which we said would be allowed (at least for now).

Wait wait wait, that's not true. (That page currently says "The use of subst: markup and tildes would also be disallowed in signatures. Previously, it was possible to use these features to set a signature that would cause a subsequent editor's name to be placed on someone else's comments.")

The patch I wrote doesn't disallow the use of subst: – it only disallows the specific abusive constructs (by basically passing them through the parser and checking if a signature comes out), see examples here: SignatureValidatorTest.php in provideApplyPreSaveTransform().

JTannerWMF added a subscriber: JTannerWMF.

Once we get community input as mentioned in T244519 we can work on this task.

JTannerWMF moved this task from Backlog to FY 2019-20 on the Editing-team (Tracking) board.
ppelberg added a subscriber: ppelberg.EditedJun 12 2020, 9:28 PM

Task description updates
The scope of this task is now clear, now that we've finalized the changes and implementation plan. See T248632 for details.

With the above in mind, I've updated the task description with what's described below...


Dependencies
The patches to meet the "Requirements" described below should NOT be merged until the following tickets are resolved:

Requirements

  • When someone attempts to save a signature that meets either of the conditions below, prevent the signature from being saved and present a message that explains to people: 1) that their signature cannot be saved as it is currently written, 2) why their signature cannot be saved as it is currently written and 3) what changes they need to make to the signature they have written in order to save it.
    • Conditions:
      • The signature contains any lint errors
      • The signature contains misnested tags (both kinds) or stripped tags
  • Note: this change should not affect current signatures that would become invalid as a result of this change, as noted in T140606#5829318
ppelberg updated the task description. (Show Details)Jun 12 2020, 9:29 PM

This task will get "unblocked" once the following tickets are resolved:

ppelberg updated the task description. (Show Details)Jun 12 2020, 9:52 PM
Izno added a comment.Jun 12 2020, 11:28 PM

Please consider using a different task for implementation then. This one requested all linter errors. The implementation the discussion tools work resolved to does not include all errors, AIUI (per T248632: Implement new signature requirements). All linter errors includes some things that were deliberately excluded in the other task, including obsolete elements and font color mismatch as examples.

Sunpriat2 added a subscriber: Sunpriat2.

@Izno @Jonesey95 I updated the patch to include a config variable $wgSignatureAllowedLintErrors. It defaults to [ 'obsolete-tag' ] (which allows <font> tags etc.), and you can file a task to change this config to an empty array [] whenever the community of English Wikipedia (or any other wiki) agrees to that.

Change 569062 merged by jenkins-bot:
[mediawiki/core@master] preferences: Signature validation (lint errors, user links, nested subst)

https://gerrit.wikimedia.org/r/569062

ppelberg moved this task from Inbox to High Priority on the Editing QA board.Jul 2 2020, 9:56 PM
ppelberg updated the task description. (Show Details)Jul 7 2020, 8:58 PM
ppelberg added a subscriber: Ryasmeen.

Like in T237700#6286867, @matmarex can you:

  1. REVIEW the "Testing details" section of the task description to ensure it is accurate and exhaustive?

cc @Ryasmeen

matmarex updated the task description. (Show Details)Jul 8 2020, 10:26 PM

I added some details and another case.

The error message "Invalid raw signature. Check HTML tags." actually isn't correct. It comes from an old validator (which was only able to report few errors, e.g. it wouldn't warn in scenario B). I basically put them in the wrong order, the errors from the new validator should be preferred. I'll submit a patch for this.

The expected error is "Your signature contains invalid or deprecated HTML syntax", followed by some more details about the problem.

Change 610399 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] preferences: Prefer new kind of warnings about invalid HTML

https://gerrit.wikimedia.org/r/610399

ppelberg added a comment.EditedJul 10 2020, 9:03 PM

I added some details and another case.

Excellent. Thank you for adding.

The error message "Invalid raw signature. Check HTML tags." actually isn't correct...

Understood. @matmarex: Can you please add the Editing QA tag back to this ticket once it [i] is ready for testing.


i. https://gerrit.wikimedia.org/r/610399

Change 610399 merged by jenkins-bot:
[mediawiki/core@master] preferences: Prefer new kind of warnings about invalid HTML

https://gerrit.wikimedia.org/r/610399

matmarex moved this task from Code Review to QA on the Editing-team (FY2020-21 Kanban Board) board.

(the fix should be on the Beta cluster now)

(the fix should be on the Beta cluster now)

Checked the fix.

It is now showing the error message: "Your signature contains invalid or deprecated HTML syntax".
However, there are now three "Show error location" buttons; clicking on the first and the third one selects text from the first <small> tag to the end <small> tag. But clicking on the second one selects only the end <small> tag.

So two questions:

a. Should there be multiple "Show error location" buttons in this case?
b. Should clicking on them select the same part of the text?

Jidanni removed a subscriber: Jidanni.Wed, Jul 15, 2:14 PM

(the fix should be on the Beta cluster now)

Checked the fix.

It is now showing the error message: "Your signature contains invalid or deprecated HTML syntax".
However, there are now three "Show error location" buttons; clicking on the first and the third one selects text from the first <small> tag to the end <small> tag. But clicking on the second one selects only the end <small> tag.

So two questions:

a. Should there be multiple "Show error location" buttons in this case?
b. Should clicking on them select the same part of the text?

Prompted by the observation @Ryasmeen made above, an additional follow up question:

  • @matmarex: what can we currently know about the signature a person sets in this case? The current treatment [i] leads me to think we can know the following and I'm not sure the extent to which the below is an accurate at exhaustive list:
    • 1. The HTML tag(s) that is being used
    • 2. Whether said tag(s) are opened/closed correctly
    • 3. TBD

I ask the above wondering what constraints we are bound by as the current implementation [i] leads me to think the people who will encounter it will be confused by the following elements which Rummana alluded to in T140606#6306653:

  • The same error message is shown twice: Missing end tag: small | Learn more | Show error location
  • Two slightly different error messages are shown for the same issue: Multiple unclosed formatting tags on the page| Learn more | Show error location
  • Clicking the Show error location selects different parts of the text

i.

In my experience (finding and fixing hundreds of thousands of Linter errors on en.WP), this is just how the current Linter works. If there are multiple errors, it does not always highlight the correct or complete string, and it does not always correctly identify the error types. It's a fundamental challenge with parsing invalid syntax: it's invalid, so it is difficult to write an algorithm to interpret its invalidity correctly every time.

File a separate bug on Linter if you want, but please don't hold up this deployment based on a well known limitation of the Linter engine. It has been this way forever. There aren't many active signatures with this level of invalid syntax, since we have already had four years of waiting for this phab task to find and fix the really bad ones. Don't lose momentum now!

Here is Parsoid's lint output on that snippet:

[subbu@earth:~/work/wmf/parsoid] echo "[[User:Ppelberg-test]]<small>[[User talk:PPelberg (WMF)]]<small>" | php bin/parse.php --linting > /dev/null
{"type":"missing-end-tag","dsr":[22,64,7,0],"params":{"name":"small","inTable":false}}
{"type":"missing-end-tag","dsr":[57,64,7,0],"params":{"name":"small","inTable":false}}
{"type":"multiple-unclosed-formatting-tags","params":{"name":"small","inTable":false},"dsr":[22,64,7,0]}

The signature does have 2 unclosed small tags and the source ranges are correct. For the 3rd error, the range is also correct. Strictly speaking, Parsoid's Linter can be smarter and suppress the first 2 errors since the 3rd error effectively subsumes those. I can tweak that, but FWIW, given what Parsoid produces right now, the UI seems to do the right thing (based on my testing your invalid sig on the beta cluster).

It is now showing the error message: "Your signature contains invalid or deprecated HTML syntax".
However, there are now three "Show error location" buttons; clicking on the first and the third one selects text from the first <small> tag to the end <small> tag. But clicking on the second one selects only the end <small> tag.

So two questions:

a. Should there be multiple "Show error location" buttons in this case?
b. Should clicking on them select the same part of the text?

@Ryasmeen @ppelberg:

  • It's expected that there are two "Missing end tag: small" errors and multiple "Show error location" buttons – there is one error for each of the two opening <small> tags, and a button for each error, and clicking "Show error location" highlights the text from the respective opening tag to the end.

    In this scenario, an single error message that explicitly says you need to add a slash / in the closing tag would be better. But in other similar scenarios, the correct fix might be to add two closing tags </small></small> at the end. I agree with @Jonesey95 here: I don't think it's possible to programmatically guess what the user meant.
  • It's… not really expected that there is also a "Multiple unclosed formatting tags on the page: small" error – however I didn't think this redundancy is a big problem, and as @ssastry says, this is something where we just depend on the results from Parsoid.

    I wonder now if we should maybe just hide this type of errors, as it seems that they always appear together with "Missing end tag"?

Prompted by the observation @Ryasmeen made above, an additional follow up question:

  • @matmarex: what can we currently know about the signature a person sets in this case?

We have a list of errors. For each error, we know the type (e.g. "missing-end-tag", see list here: https://www.mediawiki.org/wiki/Help:Lint_errors), the location (position of the first character and last character – the "Show error location" buttons highlight this range), and extra details (usually a tag name – this is the part shown in preformatted font).

Subbu pasted an example of this in his comment above.

I'm not sure if this answers your question?

I ask the above wondering what constraints we are bound by as the current implementation [i] leads me to think the people who will encounter it will be confused by the following elements which Rummana alluded to in T140606#6306653:

  • The same error message is shown twice: Missing end tag: small | Learn more | Show error location
  • Two slightly different error messages are shown for the same issue: Multiple unclosed formatting tags on the page| Learn more | Show error location
  • Clicking the Show error location selects different parts of the text

I wonder if it would help to change the list from bulleted to numbered, to make it clear that the two "Missing end tag" errors are actually separate errors and not some accidental duplication?

bearND removed a subscriber: bearND.Wed, Jul 22, 10:42 PM
Ryasmeen edited projects, added Verified; removed Editing QA.Wed, Jul 29, 1:11 AM

@ssastry + @matmarex: thank you both for looking into and explaining how the output is generated.

A couple resulting questions for @matmarex [I think]:

  • 1. Why does the top-most Missing end tag: small... error's Show error location button highlight the entire range [i] rather than the precise tag that's missing a corresponding end tag as the the middle Missing end tag: small... error's Show error location button does? [ii]
  • 2. Is it possible to suppress the last Multiple unclosed formatting tags on the page error rather than the first two? [iii]
    • Thinking: the error messages people see should be related as explicitly and precisely as possible to the cause of said error message.
  • 3. Within the Missing end tag: small error, is it possible to replace the small text with </small>?
    • Thinking: the error message should provide people explicit instructions about what they need to do to resolve the error.

⚠️Note: the answers to the above would inform follow on work to this ticket. I share this so as not to suggest this ticket is anyway blocked on the above.


i. Top-most Missing end tag: small...
ii. Middle Missing end tag: small...

iii. T140606#6314356

These questions should be in a completely separate ticket that aims to improve the functioning of the Linter engine, or whatever it is called. They are out of scope for this ticket.

If you paste the invalid Ppelberg-test signature into a sandbox and enable LintHint.js, you will see the same three Linter error descriptions, and they will highlight the same three blocks of text. The error descriptions and highlighting are features of (or bugs in, depending on your POV) the Linter engine and are completely independent of this signature validation process, AFAIK.

ppelberg closed this task as Resolved.Mon, Aug 3, 11:14 PM

These questions should be in a completely separate ticket that aims to improve the functioning of the Linter engine, or whatever it is called. They are out of scope for this ticket.

Good call. Here is that ticket: T259570.

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptMon, Aug 3, 11:15 PM