
Investigation "extends attribute": Review previous implementation
Closed, Resolved (Public)

Description

Review the implementation (including other options previously explored for adding a page number, chapter, etc. to a reused reference) in light of parser changes, new knowledge, or any other changes. Is it still the most feasible and advisable approach? What are the potential benefits and drawbacks?

https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/Topw%C3%BCnsche/Erweiterung_der_Einzelnachweise

Event Timeline

lilients_WMDE renamed this task from Review implementation of extends attribute to Investigation "extends attribute": Review previous implementation. Apr 5 2023, 10:10 AM

I would like to lay out the chain of logical arguments that led to the decision to name the attribute <ref extends="…">. Unfortunately, this was discussed in several places over the course of a few years. Several Phabricator tickets cover parts of the discussion, but there has never been a complete list.

We can split the discussion into two separate questions:

  1. How the information should be structurally represented in wikitext.
  2. How a new attribute should be named.

The following list is about the first question. To make this separation more visible I use NEW_ATTRIBUTE as a placeholder and mention actual names only as examples.

1. Just add a page attribute

This is the oldest idea, and one that users actually asked for in multiple places. It appears to be trivial:

<ref name="book">the book</ref>
<ref name="book" NEW_ATTRIBUTE="page 3" />

This has several problems:

  • It's hard to find a good attribute name that's not too specific. For example, it can't be page="3" because not everything has pages – the Bible being a famous example.
  • While it's technically possible to allow templates and other wikitext in the attribute (as done in e.g. <mapframe text="…">) it's bad design:
    • It's uncommon. There is no other XML-style language (e.g. HTML) where attributes can recursively contain code.
    • It causes confusing encoding issues. The content between the double quotes must be encoded in a different way than in any other wikitext context. For example, what if you need a double quote?
    • This is especially a problem because the wikitext parser is not an actual XML parser and behaves surprisingly differently in such situations. Some wikitext features are outright impossible in an attribute, or extremely error-prone.
    • This means only a subset of wikitext features can be used in an attribute. That's OK for a map caption but probably not for references.
    • It requires a separate, effectively recursive call to the parser. This is bad for performance and overall complexity.
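The encoding issue can be illustrated with a small sketch. It is only an illustration: it uses Python's HTML escaping as a stand-in, whereas the wikitext parser is not an XML parser and its real rules differ. The helper name ref_with_attribute is hypothetical, and NEW_ATTRIBUTE is the same placeholder used above.

```python
from html import escape


def ref_with_attribute(parent: str, detail: str) -> str:
    """Build the option-1 form, with the detail stored in an attribute value."""
    # Assumption: HTML-style escaping stands in for whatever the wikitext
    # parser would actually require; the real rules differ.
    return (
        f'<ref name="{escape(parent, quote=True)}" '
        f'NEW_ATTRIBUTE="{escape(detail, quote=True)}" />'
    )


# A detail that itself contains double quotes cannot be written verbatim.
detail = 'chapter "Genesis", page 3'
tag = ref_with_attribute("book", detail)
print(tag)
```

The editor would have to write the escaped form by hand, which is exactly the kind of context-dependent encoding the list above warns about.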

In the discussion back then it was clear this should be avoided.

2. Content inside <ref>…</ref>

The problems above go away when we put the content where it is in a normal reference:

<ref name="book">the book</ref>
<ref name="book" NEW_ATTRIBUTE>page 3 (can be wikitext)</ref>

Now both <ref> tags behave the same. Both can use the same wikitext features. Both use the same syntax, encoding, and so on.

Note this new attribute doesn't have a value. It's only a marker that says "this is a sub reference that belongs to a named parent". It could be something like <ref name="book" at>page 3</ref> or possibly even <subref name="book">page 3</ref>.

New problem: It's not possible to reuse the sub reference. It can't have a name because the name attribute is already used to point back to the parent.

One way to solve this is to live with it and disallow reusing sub references. This was discussed back then but rejected. It was made a requirement to be able to reuse sub references.

3. subname

We could add a new attribute to name sub references:

<ref name="book">the book</ref>
<ref name="book" NEW_ATTRIBUTE="p3">page 3</ref>

This causes new problems:

  • It's a second syntax for a feature that already exists. That's confusing.
  • The name="…" attribute has a different meaning when it appears in a sub reference. It makes the sub reference behave more like a reuse of the parent – which it kind of is, but isn't at the same time.
  • It's generally hard to tell what a sub reference is.
  • The sub reference must be named. Otherwise there would be no way to even tell that it is a sub reference.
  • What if I want to reuse the sub reference? Should it be <ref name="p3" /> as if it's a normal reference? What was the point of the new attribute then? Or continue with <ref name="book" NEW_ATTRIBUTE="p3" /> similar to how the group attribute behaves?
  • What if we want to combine this with the existing group attribute? Do we need to write <ref name="book" group="group2" NEW_ATTRIBUTE="p3" /> to reuse a sub reference? That's … long.

Therefore:

4. extends

<ref name="book">the book</ref>
<ref NEW_ATTRIBUTE="book">page 3</ref>

The problems described above are gone. We use the name="…" attribute to be able to give the sub reference a new name, as if it's a normal reference. We can combine it with the group attribute. It also behaves almost identically to the existing follow="…" attribute.
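A minimal sketch of this data model, assuming a simplified in-memory representation rather than the actual Cite extension code. The Ref class, the render function, and the way parent and detail are joined with a comma are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Ref:
    content: str
    name: Optional[str] = None     # own name, so the reference can be reused
    extends: Optional[str] = None  # name of the parent, if this is a sub reference
    group: Optional[str] = None    # existing group attribute, unchanged


Index = Dict[Tuple[Optional[str], str], Ref]


def build_index(refs) -> Index:
    """Index named references by (group, name)."""
    return {(r.group, r.name): r for r in refs if r.name is not None}


def render(ref: Ref, index: Index) -> str:
    """Return display text; a sub reference is shown together with its parent."""
    if ref.extends is None:
        return ref.content
    parent = index[(ref.group, ref.extends)]  # KeyError if the parent is missing
    # Assumption: comma-joining is only for illustration.
    return f"{parent.content}, {ref.content}"


refs = [
    Ref("the book", name="book"),
    Ref("page 3", name="p3", extends="book"),
]
index = build_index(refs)
print(render(index[(None, "p3")], index))
```

Because the sub reference carries its own name, reusing it needs no special handling: it is looked up by name like any other reference.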

The name was discussed and we landed on extends.

FAQ

What about a new tag in addition to <ref>?

That's effectively covered by point 2 above. The only additional information a new tag (e.g. <subref>) conveys is 1 bit: whether it is one tag or the other. That alone is not enough. We also need to mention the name of the parent. These two requirements combined lead to one of these:

  1. <subref name="parent">: "name" is an existing attribute and doesn't identify the sub-reference as such. The new tag name does.
  2. <ref name="parent" subref>: As above, but we use an additional attribute with no value to identify the sub-reference as such.
  3. <ref subref="parent">: The new attribute combines both functions here. It identifies the sub-reference as such and links to the parent.

The main problems with a new tag are:

  • How to reuse a sub-reference?
    • When it stays a <subref name="…" />, why does that difference need to be emphasized? Shouldn't it behave like any other reference?
    • When we switch back to <ref name="…" />, what was the point of switching from <ref> to <subref> in the first place? Even more important, a <ref name="…" /> that points to a non-existing <ref> (because you somehow have to know that it points to a <subref>) would also heavily confuse a lot of existing tools.
  • A huge number of tools, gadgets, etc. work with <ref> tags. A new tag would not work anywhere, and would even be marked as an error in some cases. On the other hand, unknown attributes are typically ignored. Reusing the existing <ref> tag means most tools will continue to work with no or only minor adjustments. For example, just counting <ref> works exactly as before. Tools that display references in some way mostly work out of the box – it's just that sub-references appear incomplete. This will be easy to spot in most cases (at least much more obvious than a new tag that doesn't appear anywhere) and typically easy to fix.
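The point about existing tools can be sketched with a naive reference counter. This is a hypothetical stand-in for the kind of regex-based gadget mentioned above, not any specific tool. The subref tag and the extends attribute are used only as example names.

```python
import re

# Matches opening <ref ...> tags only; </ref>, <references /> and <subref>
# are not matched.
REF_OPEN = re.compile(r"<ref\b[^>]*>", re.IGNORECASE)


def count_refs(wikitext: str) -> int:
    """Count opening <ref> tags the way a simple existing tool might."""
    return len(REF_OPEN.findall(wikitext))


with_attribute = '<ref name="book">the book</ref> text <ref extends="book">page 3</ref>'
with_new_tag = '<ref name="book">the book</ref> text <subref name="book">page 3</subref>'

print(count_refs(with_attribute))  # the sub reference is still counted
print(count_refs(with_new_tag))    # the sub reference is silently missed
```

With the attribute-based syntax the counter keeps working unchanged, while the new tag makes the sub reference invisible to it.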

Why not have the content in an attribute?

The problems with this are explained in point 1 above. But yes, we can discuss this again. An attribute also has some benefits.

Why extends?

Numbering

The choice to number subreferences like "1.1" is not entirely satisfactory. Some issues:

  • Conflicts with default numbering for ref reuse (most wikis seem to override this). For example, this shows two normal reuses in blue and an indented subreference with the same number:

[Image: image.png]

  • Conflicts with customized numbering, for example "a.1" and "a.2"
  • Subref reuse is another level of bad, and currently leads to numbering like "1.1.0" and "1.1.1", or worse.
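The collisions can be reproduced with a toy label generator. The "n.k" schemes below are assumptions chosen to match the examples in this list. They are not taken from the actual Cite implementation, and both function names are hypothetical.

```python
from typing import List


def reuse_labels(number: str, uses: int) -> List[str]:
    """Assumed default labels for a reference used several times: n.0, n.1, ..."""
    return [f"{number}.{i}" for i in range(uses)]


def subref_labels(parent_number: str, count: int) -> List[str]:
    """Assumed labels for sub references of one parent: n.1, n.2, ..."""
    return [f"{parent_number}.{i}" for i in range(1, count + 1)]


parent_reuses = reuse_labels("1", 2)         # two normal uses of reference 1
subrefs = subref_labels("1", 1)              # one sub reference of reference 1
collisions = set(parent_reuses) & set(subrefs)
subref_reuses = reuse_labels(subrefs[0], 2)  # the sub reference used twice

print(parent_reuses, subrefs, sorted(collisions), subref_reuses)
```

Under these assumptions the label "1.1" is ambiguous between a reuse and a sub reference, and reusing the sub reference adds a third level to the label.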