
Pages should have metadata about their correct title (capitalisation, special characters, etc)
Open, Low, Public

Description

As discussed at bug 50452 comment 9 it would be very useful for a page to have associated metadata about its title.

For most pages, and by default*, this would be identical to the all-lowercase rendering of the stored title. Where a page title correctly starts with a lowercase letter (on wikis where the first character is case-insensitive), contains special characters, conflicts with interwiki or namespace prefixes, etc., the metadata would record the correct page title. It could also be used for titles that should be italicised.
e.g. on en.wp:
Page Title - Correct Title
[[IPad]] - iPad
[[Benzo(a)pyrene]] - Benzo[a]pyrene
[[Pilot No. 5]] - Pilot #5
[[D Ream]] - D:Ream
[[Computer]] - computer [indicating a common noun]
[[Amy Studt]] - Amy Studt [indicating the title is proper noun]
[[Animal Farm]] - <i>Animal Farm</i> [italicised proper noun, the tags are not literal]

The displayed title would be taken from this metadata and as such would effectively supersede the DISPLAYTITLE magic word and the [[template:correct title]] family of templates (and the equivalents on other wikis).

VisualEditor and any other tools that aid linking would be able to read the metadata and use it to determine what the default displayed name should be when linking to that article.

*Actually the default should be configurable, as I guess languages like German that capitalise common nouns may want the default to be capitalised, and the upper/lower case distinction probably doesn't make sense in all scripts.
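
The lookup described above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing MediaWiki API: the names `DEFAULT_RULES`, `CORRECT_TITLES` and `correct_title` are hypothetical, and the table is filled with the en.wp examples listed above.

```python
from typing import Callable, Dict

# Hypothetical per-wiki default rules (assumption: a wiki picks one of these).
DEFAULT_RULES: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,         # the default described in this task
    "as_stored": lambda t: t,       # e.g. for languages that capitalise common nouns
}

# Explicit title metadata, using the en.wp examples from the description.
CORRECT_TITLES: Dict[str, str] = {
    "IPad": "iPad",
    "Benzo(a)pyrene": "Benzo[a]pyrene",
    "Pilot No. 5": "Pilot #5",
    "D Ream": "D:Ream",
    "Amy Studt": "Amy Studt",       # proper noun, recorded explicitly
}


def correct_title(stored_title: str, default_rule: str = "lowercase") -> str:
    """Return the recorded correct title, or fall back to the wiki's default rule."""
    if stored_title in CORRECT_TITLES:
        return CORRECT_TITLES[stored_title]
    return DEFAULT_RULES[default_rule](stored_title)
```

With the "lowercase" rule, `correct_title("Computer")` falls through to the default and yields the common-noun form, while a wiki configured with "as_stored" would keep the capital.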


Version: 1.22.0
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=50452
https://bugzilla.wikimedia.org/show_bug.cgi?id=56868
https://bugzilla.wikimedia.org/show_bug.cgi?id=49076

Details

Reference
bz53566

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:05 AM
bzimport added a project: MediaWiki-General.
bzimport set Reference to bz53566.
bzimport added a subscriber: Unknown Object (MLST).

I'm confused. Is this asking to move {{DISPLAYTITLE:...}} out of wikitext?

Displaytitle is a form of metadata, and it gets stored in the db separately, and can be queried separately, etc.

According to James F at bug 50452 comment 9, "for this to be used in VisualEditor it would need to be a proper feature and not one hacked into a template".

The idea is not just to store what the title should be displayed as when it doesn't match the default, but to record what the title actually is in all cases. Presently there is afaik no way to tell that for example [[Parish ale]] is a common noun, [[Parish Walk]] is a proper noun and [[Parish Bar]] is a proper noun that should be italicised.

(In reply to comment #1)

I'm confused. Is this asking to move {{DISPLAYTITLE:...}} out of wikitext?
Displaytitle is a form of metadata, and it gets stored in the db separately,
and can be queried separately, etc.

Chris has it right - it's effectively asking to implement DISPLAYTITLE properly, rather than DISPLAYTITLEIFNEEDEDTOBEANOVERRIDE.

The idea is not just to store what the title should be displayed as when it
doesn't match the default, but to record what the title actually is in all
cases. Presently there is afaik no way to tell that for example
[[Parish ale]] is a common noun, [[Parish Walk]] is a proper noun and
[[Parish Bar]] is a proper noun that should be italicised.

I don't think I properly understand the problem this is trying to solve. To be honest it sounds like a solution looking for a problem.

Basically what I'm asking:
*Does it really make sense to store what type of word the title is, instead of just how to display it? Mapping types of words to how they are displayed sounds like something that would vary a lot by culture (or even between English-language wikis that have different traditions).
*What problem (other than perhaps an ideological one) does moving the data out of templates actually solve?

(In reply to comment #4)

The idea is not just to store what the title should be displayed as when it
doesn't match the default, but to record what the title actually is in all
cases. Presently there is afaik no way to tell that for example
[[Parish ale]] is a common noun, [[Parish Walk]] is a proper noun and
[[Parish Bar]] is a proper noun that should be italicised.

*Does it really make sense to store what type of word the title is, instead
of just how to display it.

Sorry that was my poor explanation. For tools (such as VisualEditor) to offer a sensible default for the display of a link the tool needs to know how the title should be displayed. In mid sentence:
*The vicar drank some [[parish ale]] and declared it "rather good".
*The vicar competed in the 2013 [[Parish Walk]], raising money for the church roof.
*The vicar enjoyed listening to the ''[[Parish Bar]]'' album while driving.

At present only the last of these has need of a {{DISPLAYTITLE}} because the unitalicised format with an initial capital letter is correct when it appears as a page title. This tells us nothing about how it should be used in other contexts, and the DISPLAYTITLE for the third example tells us nothing about capitalisation mid-sentence. In other words we want to store the information about how the title should be used in all cases, not just when it doesn't match the default.
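
As a rough sketch of what a linking tool might do with such metadata, the three sentences above could be produced by something like the following. The function name `default_link` and its parameters are hypothetical, and it assumes a wiki where the first character of a title is case-insensitive, as on en.wp.

```python
def default_link(page_title: str, display: str, italic: bool = False) -> str:
    """Build default wikitext for a mid-sentence link from title metadata."""

    def normalise(title: str) -> str:
        # Assumption: only the first character is case-insensitive.
        return title[:1].lower() + title[1:]

    if normalise(display) == normalise(page_title):
        # The display text already resolves to the page, so no pipe is needed.
        link = f"[[{display}]]"
    else:
        link = f"[[{page_title}|{display}]]"
    return f"''{link}''" if italic else link


print(default_link("Parish ale", "parish ale"))
print(default_link("Parish Walk", "Parish Walk"))
print(default_link("Parish Bar", "Parish Bar", italic=True))
```

An editor could still override the suggestion; the metadata only supplies the starting point.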

It seems to me that there are three options for how to structure this metadata.
The first is to simply store the format, e.g. "title: ''Parish Bar''" or "title: parish ale".
The second is to define classes of titles and how they are displayed and assign each article title to one such class. e.g. for Parish Walk "title class: proper noun" and for Parish Bar: "title class: musical album title"
The third option is to store a class (without that defining the display) and the display: "title: ''Parish Bar''; class: musical album title".

The classes and their associated displays would need to be configurable per wiki for either the second or the third option to work.
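
To make the three options concrete, here is one way they might be modelled. Everything in it is an assumption for illustration: the `TitleMetadata` record, the `CLASS_FORMATS` table standing in for per-wiki configuration, and the choice to let a stored display take precedence over the class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical per-wiki configuration: title class -> wikitext formatter.
CLASS_FORMATS: Dict[str, Callable[[str], str]] = {
    "common noun": lambda t: t[:1].lower() + t[1:],
    "proper noun": lambda t: t,
    "musical album title": lambda t: f"''{t}''",
}


@dataclass
class TitleMetadata:
    page_title: str
    display: Optional[str] = None       # used by options 1 and 3
    title_class: Optional[str] = None   # used by options 2 and 3

    def rendered(self) -> str:
        # Options 1 and 3: an explicitly stored format wins.
        if self.display is not None:
            return self.display
        # Option 2: the class alone determines the display.
        if self.title_class is not None:
            return CLASS_FORMATS[self.title_class](self.page_title)
        return self.page_title


option1 = TitleMetadata("Parish ale", display="parish ale")
option2 = TitleMetadata("Parish Bar", title_class="musical album title")
option3 = TitleMetadata("Parish Bar", display="''Parish Bar''",
                        title_class="musical album title")
```

In the third option the class carries no formatting of its own, so it remains available as purely semantic information.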

I've also realised that separate fields would be good for "title" and "disambiguator", e.g. for the article at [[Mercury (element)]]: "title: mercury"; "disambiguator: element"; "class: common noun", or for [[Wellington, Somerset]]: "title: Wellington"; "disambiguator: Somerset"; "class: place name".
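
For comparison, this is roughly what a tool has to do today to guess those two fields from the page title alone. The helper `split_title` is hypothetical and purely heuristic: it leaves the case as stored and would misfire on titles that legitimately contain a comma or a trailing parenthesis, which is part of the argument for storing the fields explicitly.

```python
import re
from typing import Optional, Tuple


def split_title(page_title: str) -> Tuple[str, Optional[str]]:
    """Heuristically split a page title into (title, disambiguator)."""
    # Parenthetical form, e.g. "Mercury (element)".
    match = re.fullmatch(r"(.+?) \(([^()]+)\)", page_title)
    if match:
        return match.group(1), match.group(2)
    # Comma form, e.g. "Wellington, Somerset".
    if ", " in page_title:
        base, _, disambiguator = page_title.partition(", ")
        return base, disambiguator
    return page_title, None


print(split_title("Mercury (element)"))
print(split_title("Wellington, Somerset"))
```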

The advantage of storing this metadata is that it allows for a large amount of semantic information about the title which can be used not only for linking but potentially for customised display options and doubtless more that I haven't thought of.

*What problem (Other than perhaps ideological) does moving the data out of
templates actually solve?

I'm told that it needs to be moved out of templates for VE to support it. I don't know why.
More philosophically, I was under the impression that the long-term goal was to separate metadata from content?

The advantage of storing this metadata is that it allows for a large amount of
semantic information about the title which can be used not only for linking but
potentially for customised display options and doubtless more that I haven't
thought of.

That's certainly true.

My initial reaction is that the user is probably in a better position to decide how to capitalize/italicize the title in a given context than software would be, but if such formatting of titles is consistent, I suppose it could make sense to automatically do it.

I'm told that it needs to be moved out of templates for VE to support.

It should not need that to *use* the data. To effectively edit the displaytitle data, it may need that.

I don't know why.
More philosophical, but I was under the impression that the long term goal was
to separate metadata from content?

Some folks have that goal. Personally I think that doing so would reduce the power of MediaWiki significantly, although my opinion may be a minority one. (And it is probably off-topic for this bug report.)

Restricted Application added a subscriber: Aklapper. Mar 13 2016, 10:13 AM