CX2: Error for invalid characters in the title
Closed, ResolvedPublic

Description

As part of the system to communicate errors and warnings in Content Translation (T189488), we want to surface issues such as the title containing invalid characters as illustrated below:

  • We'll show the error after the user edits the title (or initially, if the user loads a translation that already has this problem).
  • The publish button will be disabled while the issue remains.
  • An error marker will be shown next to the title.
  • The error details will include a "learn more" link pointing to a documentation page about titles (if available in the language).
  • A "Fix problematic characters" action will transform the invalid title into a valid one (this changes the "Remove invalid characters" feature from the mockup).
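
The behaviour listed above could be sketched roughly as follows. This is only an illustration, not the ContentTranslation implementation: the character set `# < > [ ] | { }` is an assumption based on the Wikipedia:Page_name technical restrictions, and the names `INVALID_TITLE_CHARS`, `find_invalid_characters`, `can_publish`, and `fix_problematic_characters` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed set of characters that cannot appear in a page title
# (taken from the Wikipedia:Page_name technical restrictions).
INVALID_TITLE_CHARS = re.compile(r"[#<>\[\]|{}]")


def find_invalid_characters(title: str) -> list[str]:
    """Return the invalid characters found in the title, in order of appearance."""
    return INVALID_TITLE_CHARS.findall(title)


def can_publish(title: str) -> bool:
    """Publishing stays disabled while the title is empty or has invalid characters."""
    return bool(title.strip()) and not find_invalid_characters(title)


def fix_problematic_characters(title: str) -> str:
    """Drop the invalid characters and collapse any whitespace left behind."""
    cleaned = INVALID_TITLE_CHARS.sub("", title)
    return re.sub(r"\s+", " ", cleaned).strip()
```

For example, under these assumptions `fix_problematic_characters("Foo [bar] | baz")` would yield `"Foo bar baz"`, after which `can_publish` returns `True`.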

In the future, the following feature can be considered as an extra:

  • The problematic characters will be highlighted when the input focus is on the title.

Message text:

Your translation title contains invalid characters
Some characters cannot be used in the title of pages for technical reasons.
Please, edit the title to avoid the problematic characters.


Currently, editors trying to publish a translation with an invalid title get the following message:

Event Timeline

Pginer-WMF triaged this task as Normal priority.
Pginer-WMF renamed this task from CX2: Error for invalid titles to CX2: Error for invalid characters in the title.
Pginer-WMF lowered the priority of this task from Normal to Low.
Pginer-WMF updated the task description. Mar 27 2018, 12:06 PM
Pginer-WMF updated the task description. Apr 23 2018, 11:56 AM

An Error is good. But it must prevent the user from creating a foreign language title in destination wiki.

That's the idea. In this context we distinguish errors from warnings. Errors prevent publishing. Warnings ask you to review your content but let you publish if you are confident the content is good.

(Note for testing)

  • check for Console errors, e.g.

[CX] Publishing failed {"code":"invalidtitle","info":"Bad title \"\".","*":"See http://cx2-testing.wmflabs.org/api.php for API usage.

When you have invalid characters in your target title and try publishing, that fails because an article with an invalid title cannot be created. That failure is caught in our code and the error is displayed visually to the user. In addition to informing the user about the error, the failure is also logged to the console. It is just that: a log of an error that was correctly caught and processed, rather than an unexpected error.
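
The flow described here (catch the failure, show it to the user, and also log it) might look roughly like the sketch below. It is not the extension's code: the function name `handle_publish_error` and the shape of the returned issue are hypothetical, and only the `code` and `info` fields are taken from the log line quoted above.

```python
import json
import logging

logger = logging.getLogger("cx")


def handle_publish_error(error: dict) -> dict:
    """Turn a caught publishing failure into an issue for the UI and log it."""
    # The log entry is informational: the failure is expected and handled here.
    logger.error("[CX] Publishing failed %s", json.dumps(error))

    if error.get("code") == "invalidtitle":
        # Hypothetical issue shape: attach the error to the target title.
        return {
            "type": "error",
            "target": "title",
            "message": "Your translation title contains invalid characters",
        }
    # Any other failure is reported with the message supplied by the API.
    return {
        "type": "error",
        "target": "translation",
        "message": error.get("info", "Publishing failed"),
    }
```

The point of the design is that the console entry and the visible issue come from the same handled failure, so a console message alone does not indicate an uncaught exception.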

I like the idea of an expressive error rather than an unexpected error that occurred :).

Change 432540 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] [WIP] Connect issue card and target title

https://gerrit.wikimedia.org/r/432540

Change 433022 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/core@master] Expose regex for invalid title characters from mw.Title

https://gerrit.wikimedia.org/r/433022

The <textarea> used for both source and target titles does not support highlighting individual characters. There are some tricky ways to achieve this, such as introducing new elements that sit behind the text area and mimic its content, providing a colored mask to highlight characters. Such tricks are very fragile. Since we would be introducing new elements with essentially the same content as the <textarea>, we might as well drop the <textarea> altogether.

Because of that, highlighting problematic characters isn't done as part of 433022 and will be done separately.
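
To illustrate the "mirror element" trick mentioned above, here is a rough sketch of how the markup for such a backdrop could be generated. It is an assumption-laden illustration only: the invalid character set comes from the Wikipedia:Page_name restrictions, the function name `build_highlight_markup` is hypothetical, and keeping the backdrop aligned with the text area (fonts, scrolling, wrapping) is exactly the fragile part that this sketch leaves out.

```python
import html
import re

# Assumed set of invalid title characters (see Wikipedia:Page_name).
INVALID_TITLE_CHARS = re.compile(r"[#<>\[\]|{}]")


def build_highlight_markup(title: str) -> str:
    """Return HTML mirroring the title, with invalid characters wrapped in <mark>."""
    parts = []
    last = 0
    for match in INVALID_TITLE_CHARS.finditer(title):
        # Escape the valid text between two invalid characters.
        parts.append(html.escape(title[last:match.start()]))
        # Wrap the invalid character itself so it can be styled.
        parts.append("<mark>" + html.escape(match.group()) + "</mark>")
        last = match.end()
    parts.append(html.escape(title[last:]))
    return "".join(parts)
```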

I think we can skip highlighting the problematic parts for this ticket. Based on your analysis, the title area would be an exception, and it is better to spend the effort on highlighting regular paragraphs, which applies to a greater number of errors than the title does.

Your comment reads to me as if you think the marker that indicates problems with the title will be a problem, whereas the actual difficulty is highlighting the specific characters that make the title invalid.

Here is a screenshot to better illustrate the current state:

Thanks for the clarification. "Highlighting the problematic parts" was ambiguous in my message. Highlighting the paragraph is part of the basics, while highlighting the specific characters can be considered an extra for the future.

For reference, this is the message that Upload Wizard provides:

Change 433022 abandoned by Petar.petkovic:
Expose regex for invalid title characters from mw.Title

https://gerrit.wikimedia.org/r/433022

There are two major cases in which an invalid page title may be entered:

  • when creating a page: not possible, and the user gets no feedback (the helpful link to create a page, e.g. "Create the page "Test1987" on this wiki!", is not shown)
  • when moving a page to a new, invalid title: there is a lengthy explanation:

Change 432540 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Connect issue card and target title

https://gerrit.wikimedia.org/r/432540

Etonkovidova added a comment. Edited Jun 12 2018, 1:20 AM

Checked in cx2-testing - there is no discrepancy in treating invalid page titles in ContentTranslation and with page creation/move.
@Petar.petkovic, @Pginer-WMF - do you think that some of the following should be fixed as a part of this ticket?

The following is a list of my notes (in order of severity); all quotes are from the documentation at https://en.wikipedia.org/wiki/Wikipedia:Page_name#Technical_restrictions_and_limitations:

(1) "Fix problematic characters" throws the Console error: "Uncaught TypeError: Cannot read property 'title' of null" for the case

A pagename cannot be . or ..; or begin with ./ or ../; or contain /./ or /../; or end with /. or /...

(2) The 1 error counter - could it show more than one error? I tried the cases when there is more than one error in one-word title, or several errors in more-than-one-word titles - the error counter does not update the number of errors found. If it refers to the title as a whole, it should not refer to 1 error.

(3) The following title Wikipedia:Mavetuna will not be caught with the new system:

(4) Are we going to catch the titles with characters that are silently dropped?
e.g. :Mavetuna will be published as Mavetuna (the colon is dropped, in accordance with the following restriction):

A pagename cannot begin with a colon :.

(5) (out of scope of this task) # is a valid character. Moreover, having # in the page title makes other invalid characters "valid", so the page title # < > [ ] | { } _ is valid.

The documentation pages need to be updated: https://en.wikipedia.org/wiki/Wikipedia:Page_name#Technical_restrictions_and_limitations and https://meta.wikimedia.org/wiki/Help:Page_name#Restrictions have slightly different lists of characters that are not allowed in the page title.

e.g.
"H#e|ge|||||||||||||||||" (no quotes) seems to be valid and redirects to http://cx2-testing.wmflabs.org/index.php/H#e.7Cge.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C.7C (discarding all characters after H):

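Notes (1), (4), and (5) above can be made concrete with a rough sketch. It follows the quoted restrictions literally and is not how MediaWiki or ContentTranslation implement them. All function names are hypothetical, and the anchor encoding (percent-encode, then replace `%` with `.`) is an assumption inferred from the redirect URL shown above.

```python
from urllib.parse import quote


def has_relative_path_component(title: str) -> bool:
    """Note (1): title is . or .., starts with ./ or ../, contains /./ or /../, or ends with /. or /.."""
    if title in (".", ".."):
        return True
    if title.startswith(("./", "../")):
        return True
    if "/./" in title or "/../" in title:
        return True
    return title.endswith(("/.", "/.."))


def drop_leading_colon(title: str) -> str:
    """Note (4): a single leading colon is silently dropped on publishing."""
    return title[1:] if title.startswith(":") else title


def split_fragment(title: str) -> tuple:
    """Note (5): everything after the first # is treated as a section anchor."""
    page, _, fragment = title.partition("#")
    return page, fragment


def legacy_anchor(fragment: str) -> str:
    """Assumed anchor encoding: percent-encode, then write each % as a dot."""
    return quote(fragment, safe="").replace("%", ".")
```

Under these assumptions, `split_fragment("H#e|ge|")` gives `("H", "e|ge|")`, which is why the invalid characters after `#` never reach title validation: only `H` is treated as the page name.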
Thank you @Etonkovidova for doing such an awesome QA.

@Petar.petkovic, @Pginer-WMF - do you think that some of the following should be fixed as a part of this ticket?

Yes, at least (1) should be done as part of this ticket. (3) should be caught as well.

(4) is probably transformed in the backend, and (5) is valid, but the question is whether we should allow that.

Regarding (2): currently, issues are tied to the section or title that has a problem. The issues card should always be shown, and all warnings and errors on a page will be displayed with the possibility to carousel through them, as per the design in T189488. If the title is empty, that is an error and will be added to the issues card. If the title contains one or more invalid characters, that is also a single error, which will populate the issues card. Only when some of the translated paragraphs have other errors or warnings will the card show more than one issue. Multiple invalid characters do not add to the number of errors displayed.
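
The counting rule described here (one issue per problematic title, regardless of how many characters are wrong) could be sketched as follows. The `Issue` class, the function names, and the assumed invalid character set are all hypothetical and only serve to illustrate the rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed set of invalid title characters (see Wikipedia:Page_name).
INVALID_TITLE_CHARS = re.compile(r"[#<>\[\]|{}]")


@dataclass
class Issue:
    target: str    # "title" or a section identifier
    severity: str  # "error" or "warning"
    message: str


def title_issues(title: str) -> list[Issue]:
    """Return at most one issue for the title, however many characters are invalid."""
    if not title.strip():
        return [Issue("title", "error", "The title is empty")]
    if INVALID_TITLE_CHARS.search(title):
        return [Issue("title", "error", "The title contains invalid characters")]
    return []


def issue_count(title: str, section_issues: list[Issue]) -> int:
    """Number shown on the issues card: title issues plus section issues."""
    return len(title_issues(title)) + len(section_issues)
```

With this rule a title such as `a|b|c<d` still counts as one error; the counter only grows when sections contribute their own errors or warnings.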

Change 440115 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Don't offer to fix unfixable titles

https://gerrit.wikimedia.org/r/440115

Change 440115 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Don't offer to fix unfixable titles

https://gerrit.wikimedia.org/r/440115

Petar.petkovic moved this task from In Review to QA on the Language-2018-Apr-June board.
Petar.petkovic removed a subscriber: gerritbot.

With the last patch merged, (1) from T190804#4274325 is done.

What we should do in the case of (3) as well as for (4) and (5) is up to @Pginer-WMF.

Checked (1) and (3) in cx2-testing - all is fine.

There are a few cases from https://en.wikipedia.org/wiki/Wikipedia:Page_name#Technical_restrictions_and_limitations that do not work as expected in cx2-testing. I've started a new ticket, T198147: CX2 - Validation for page title, to track them (and added (4) and (5) there).

Etonkovidova closed this task as Resolved. Jun 25 2018, 10:18 PM