Page MenuHomePhabricator

Proposal for Parsoid Cite error html generation
Closed, ResolvedPublic

Description

To handle a variety of Cite errors, we propose the following:

  • All citations will use <sup> tags and those in error include mw:Error metadata annotation in the typeof attribute and a data-mw='{"errors":[{"key":"cite_error_ref_no_input"}]}. This example includes the i18n error key of cite_error_ref_no_input which the client would look up for the actual english or other language error message.
  • We are recommending that all error messages show up in the references section, not in the actual article text. This is in contrast to core where if the error happens while processing refs, it's inlined. If the error happens when building the references list, the error is inserted there. That's seems to be a by-product of the implementation, and is not necessarily achieving the goal of the best user experience.

There are benefits to having the errors inline, like where the problem occurs is obvious and more likely to be fixed, and broken citations don't give a sentence more weight in the reader's eyes, but we feel these can just as easily be accomplished by CSS styling based on the mw:Error annotation.

Continuing to output <sup> tags even for erroneous refs means that current clients will likely not need any changes other than doing what they want with the mw:Error and so no major version bump for Parsoid's HTML would be necessary.

Also I don't expect to need to use "mw:Placeholder" as VE and other clients should be able to just ignore a citation without actual content and Selser will round trip the error correctly, and not cause the error text to become the content, while still allowing an editor to edit the content at a point of citation which will then replace the error message.

Implementation details:

This will require for some error types, saving additional data to allow mw:Error to be inserted at the point of citation if the error could not be determined during the first pass. We can collect pointers to potential errors on the first pass and release them when they're proven not to be errors. At the end there is a concise collection to annotate.
An example where this can happen are named refs with no content and no other identically named ref found on a page with content.

An example of correct citation inline and references html output for comparison to proposed incorrect citation missing content/text in result output below

zero<ref name="one">two</ref>
<references />

<p>zero<sup about="#mwt3" class="mw-ref" id="cite_ref-one_1-0" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"one"},"body":{"id":"mw-reference-text-cite_note-one-1"}}'><a href="./index.php%3Ftitle=CiteTest#cite_note-one-1" style="counter-reset: mw-Ref 1;"><span class="mw-reflink-text">[1]</span></a></sup></p>

<div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt6" data-mw='{"name":"references","attrs":{}}'><ol class="mw-references references" data-parsoid="{}"><li about="#cite_note-one-1" id="cite_note-one-1" data-parsoid="{}"><a href="./index.php%3Ftitle=CiteTest#cite_ref-one_1-0" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-one-1" class="mw-reference-text">two</span></li></ol></div>

Incorrect citation inline (no content/text) produces an editable citation inline, but no inline error message, which is located in the references text. In this case the i18n error message key cite_error_references_no_text is included in the data-mw errors property.

zero<ref name="one"></ref>
<references />

<p>zero<sup about="#mwt3" class="mw-ref" id="cite_ref-1" rel="dc:references" typeof="mw:Extension/ref mw:Error" data-mw='{"errors":[{"key":"cite_error_ref_no_text"}],"name":"ref","attrs":{},"body":{"id":"mw-reference-text-cite_note-1"}}'><a href="./Main_Page#cite_note-1" style="counter-reset: mw-Ref 1;" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}">[1]</span></a></sup></p>

<div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt6" data-mw='{"name":"references","attrs":{}}'><ol class="mw-references references" data-parsoid="{}"><li about="#cite_note-1" id="cite_note-1"><a href="./Main_Page#cite_ref-1" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-1" class="mw-reference-text"></span></li></ol></div>

Event Timeline

Implementation details:

requires a second pass if an error has been found that requires mw:Error to be inserted at the point of citation which could not be determined during the first pass.

Or collect pointers to would be errors on the first pass and release them when they're proven not to be. Then at the end you'll have a collection to annotate.

Note that this is in contrast to core where if the error happens while processing refs, it's inlined.
If the error happens when building the references list, the error is inserted there.
That's a by-product of the implementation, and not necessarily a goal.

The caveat being that Parsoid's numbering will continue to be off on some pages with citation errors, but that seems an acceptable difference.

Arlolra triaged this task as Medium priority.May 5 2020, 6:35 PM
Arlolra moved this task from Needs Triage to Missing Functionality on the Parsoid board.

I think this can work. Nice work putting it all together!

Couple of high-level comments:

data-mw should always be present
As currently proposed, there is no data-mw object added when there is an error. However, I think we should always include a data-mw, and in addition, also include an errors property. See https://www.mediawiki.org/wiki/Specs/HTML/2.1.0#Error_handling for example for error annotation on images. But, this errors property would be in addition to any other information that would still be useful. So, to use your working example in the description,

zero<ref name="one"></ref>
<references />

I think something like this would be better:

<p>zero<sup about="#mwt3" class="mw-ref" id="cite_ref-one_1-0" rel="dc:references" typeof="mw:Error mw:Extension/ref" data-mw='{"errors": [...ERROR KEYS TO BE DECIDED...], "name":"ref","attrs":{"name":"one"},"body":{"id":"mw-reference-text-cite_note-one-1"}}'><a href="./index.php%3Ftitle=CiteTest#cite_note-one-1" style="counter-reset: mw-Ref 1;"><span class="mw-reflink-text">[1]</span></a></sup></p>

<div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt6" data-mw='{"name":"references","attrs":{}}'><ol class="mw-references references" data-parsoid="{}"><li about="#cite_note-one-1" id="cite_note-one-1" data-parsoid="{}"><a href="./index.php%3Ftitle=CiteTest#cite_ref-one_1-0" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-one-1" class="error mw-ext-cite-error" lang="en" dir="ltr">Cite error: Invalid <code>&lt;ref&gt;</code> tag; no text was provided for refs named <code>one</code></span></li></ol></div>

So, in other words, the only difference with regular output is (a) add a "errors" key to the existing data-mw (b) replace (or alteratively add) the "error mw-ext-cite-error" classes to the ref content span in the references section (c) The actual ref content is replaced with the error message. We should figure out what those error keys are going to be. I don't have any immediate thoughts there. It depends on what we find as we work through all the error scenarios. But, we could include the i18n message key, for example, and possibly duplicate the message as well. This lets clients (reading / editing) inspect data-mw of nodes with mw:Error types and do special things globally for them. Or, however they want to do it. Doing this lets us specify a common / uniform format for all erroneous wikitext constructs. See the Parsoid DOM spec section I linked to earlier.

Optional use of mw:Placeholder to let us roll out these changes independent of editing clients

As indicated in https://www.mediawiki.org/wiki/Specs/HTML/2.1.0#Expectations_of_editing_clients, adding a "mw:Placeholder" typeof would ensure that editing clients like VE simply ignore structures they aren't (yet) familiar with and let them upgrade their code on their own timeline without Parsoid and VE to synchronize their development. This facility was agreed upon in the early days of Parsoid & VE development for this purpose.

So, to support this, you would add "mw:Error mw:Placeholder" to <sup> tags that represent erroneous citations.

This comment was removed by Sbailey.
Sbailey updated the task description. (Show Details)
Sbailey updated the task description. (Show Details)

I'd strongly suggest using the i18n key as the primary id for errors. We can also add other keys/parameters if they turn out to be useful (probably driven by what UX VE would like to show), but the minimum would be the i18n key, which is already "standardized", more or less, and is machine-readable by VE etc in a way that the localized strings would not be.

I think we should document typeof="mw:Error" and data-mw="{errors: [ { key: 'ext-cite-bad-ref', ... }, { key: 'gallery-bad-foo' }, ... ]}" as a separate "Reporting Errors" section in the spec, then we can just reference this from the part discussion Cite (and Gallery and media markup, etc etc).

I think we might consider compromise re "putting all errors in the references section" -- we may want the <sup> to have text "error" or something, rather than just a number. I think that will make the error more noticable, so that an author or editor is sure to see it --- the references section can be pretty far down the page, and link colors are so inconsistent on the web that I'm not sure that "just" making the citation number (say) red/bold is going to be a clear enough signal that the citation has a bug. I could be convinced otherwise, though: text insertion via CSS a :before matcher might yield the same result, and that would move the localization to the CSS and out of the parsoid output, which would be Good.

So in terms of writing the spec, I think I'd just say that the structure of the <ref> is consistent regardless of whether it is an error or not, and the about and other attributes will allow you to link it to the appropriate entry in the <references/> list, but that the user should *not* make any assumptions about the *content* of the <sup> tag if there's an error; we reserve the right to add localized markup inside describing the error (or not).

"and the english string is installed in the reference itself by using a snapshot of cores Cite/i18n/en.json strings indexed by the appropriate error message key." I don't think this needs to be in the spec. As soon as the Cite code is moved back into its own extension it will be able to use the MW localization framework to insert the "proper" text. Using the English strings is just a short term hack (and one that's likely to annoy non-enwiki users, so hopefully a very short term hack).

Sorta off-topic: phab's formatting for source code is really terrible. The HTML examples will be much more readable on-wiki, where <syntaxhighlight> is available, and we can/should add whitespace as necessary to make the HTML structure clear.

Before I update the actual proposal, I am experimenting with changes to the "ref" and "reference" html generated.
The ref now does not include the class="error mw-ext-cite-error" and is set to the normal class="mw-ref" allowing normal display.
But it does include the data-mw={"errors":[{"key":{"cite_error_ref_no_input"}]} so code that cares can know there is an error and the i18n key is there to allow message text to be displayed internationally compliant.

Updated the specification to match recent discussion and the current implementation of patch https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/584746 which addresses aspects of Phab ticket T51538

Both ref and reference will include a data-mw='{"errors":[{"key":"cite_error_ref_no_input"}]}.

Should be just the ref

@Sbailey, can you please update the wiki HTML spec page ( https://www.mediawiki.org/wiki/Specs/HTML/2.1.0/Extensions/Cite ) with a concise spec with examples? Thanks!

Those wiki spec pages are about the output HTML, so implementation details and other internal considerations can be skipped there.

I am going to consider this done. We can always tweak the error representation based on feedback.