Remove "auto-number headings" preference
Closed, ResolvedPublic

Description

NOTE: The intent of this task was announced to the community in TechNews 26/2021. The consultation can be found at Performance_Dependent_User_Preferences. No objections were raised there, but there was some pushback on this ticket later. The resulting discussion led to the creation of a JavaScript snippet that reproduces this feature: https://www.mediawiki.org/wiki/Snippets/Auto-number_headings. This snippet can be used to create a gadget on wikis that want it.

Summary:
The "auto-number headings" option severely degrades performance for users who have it enabled and consumes cache capacity for little benefit. The value of the feature, when working as intended, also seems dubious. It is probably best to simply remove the option.

Context:
The "auto-number headings" preference controls whether section numbers are shown in section headings. The default behavior is to show numbering in the table of contents, but not in the actual section headings.

Problems:
Usually, the HTML generated by the wikitext parser is cached for later re-use. This is done because parsing is relatively slow - depending on the size and complexity of the page, it can take several seconds. However, when the "auto-number headings" option is used, this caching becomes ineffective: since very few people use the option, the chances for a "cache miss" are high (cache fragmentation).

In effect, users who have "auto-number headings" set to a non-standard value are much more likely to hit an uncached page, forcing them to wait until the page has been rendered for them, which may take several seconds depending on the size and complexity of the page. That appears to be a very high cost for rather little gain.
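The fragmentation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `makeCacheKey` and `RenderCache` are hypothetical names invented for the example and are not MediaWiki APIs, and the "render" step is a stand-in for the real parser.

```javascript
// Hypothetical sketch: a render cache whose key includes a user preference.
// makeCacheKey and RenderCache are illustrative names, not MediaWiki APIs.
function makeCacheKey(pageId, options) {
  // Every option that changes the generated HTML has to be part of the key.
  return pageId + '|numberheadings=' + (options.numberHeadings ? 1 : 0);
}

class RenderCache {
  constructor(render) {
    this.render = render;   // the slow rendering function
    this.store = new Map(); // cache key -> rendered HTML
    this.hits = 0;
    this.misses = 0;
  }

  get(pageId, options) {
    const key = makeCacheKey(pageId, options);
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key);
    }
    // Cache miss: this reader waits for a full render.
    this.misses++;
    const html = this.render(pageId, options);
    this.store.set(key, html);
    return html;
  }
}

// Stand-in renderer: the only difference is the number in the heading.
const cache = new RenderCache(function (pageId, options) {
  return '<h2>' + (options.numberHeadings ? '1 ' : '') + 'Intro</h2>';
});

// Five views with the default preference share a single cache entry ...
for (let i = 0; i < 5; i++) {
  cache.get('Page_A', { numberHeadings: false });
}
// ... while the one reader with the option enabled needs a separate entry.
cache.get('Page_A', { numberHeadings: true });
```

In this toy run the default readers cause one miss between them, while the single reader with the option enabled causes a miss of their own and a second stored copy of the same page.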

Usage:
An ad-hoc analysis among users of en.wikipedia.org who have edited in the last 30 days shows that about 0.4% of active editors have the "auto-number headings" option enabled.

Solution:
If there is no great demand for this feature, we should simply remove it. If the feature is to be kept, it should be implemented in CSS: the numbering would always be generated, but the browser would be instructed to not display them. Appropriate CSS rules for showing the numbers could easily be generated from the user settings on the fly. Numbering in ToC would remain.

One possible case for having section numbering would be for print: here the numbers may be useful to find a given section, based on the numbering presented in the table of contents. However, the current behavior is the opposite: for printing, the numbers are hidden from the table of contents as well.

Event Timeline

I support removing many of these rarely used user prefs that cause HTML to vary by these prefs. But, independent of that, this one is easily controlled by CSS without modifying the HTML.

Providing the replacement CSS in this task prior to discussion would probably go a long way to making this a gentle changeover.

span.tocnumber { display: none; } should do it right?

Nah, this has to be a misunderstanding. This task has nothing to do with the toc.

Without wpnumberheadings:

[Screenshot: Without wpnumberheadings.png]

With wpnumberheadings:

[Screenshot: With wpnumberheadings.png]

Yes, instead it'd be:

span.mw-headline-number { display: none; }

… shipped by default to all users, and then overridden with a user script using !important or something similarly ghastly.

Sorry yes .. I mixed up my TOC and section headings. But, looks like @Jdforrester-WMF has me covered. :)

Ah, then there are two implementation paths.

  1. Serve the numbers to everyone with CSS to hide and then users opt in by unhiding.
  2. Let people add numbers with CSS counter.

I was anticipating #2 actually, and I'm pretty sure that would be generally preferable on the MediaWiki side also.

Maybe it's just not used as people aren't aware of this helpful option.

It would probably be preferable, but the CSS rules might need to be rather ornate to ensure that subsections/etc are numbered correctly. There's a failure case where the CSS numbering on the heading tag disagrees with the server-side generated numbering (in the TOC).

I'd strongly encourage us to use the same mechanism for both. We can either (a) save some bandwidth on the article page (at the cost of some cached CSS) by using CSS for both the TOC and the heading numbering (with a display: none on the latter or gated on some class on <body> which isn't present by default), or (b) include the numbering from the server on both the TOC and heading, but again with a display: none in the default CSS for the heading.

Option (a) is technically superior, in terms of bandwidth, but (b) may have benefits for accessibility and is probably easier to implement.

Isn't this visible by default?

Cannot reproduce. This is not about the ToC box itself, this is about the headings. (Preferences > Appearance > Advanced Options > Auto-number headings)

Ah, I think the TOC numbering is sufficient.

BPirkle triaged this task as Medium priority.Jun 22 2021, 9:02 PM
BPirkle moved this task from Inbox to Feature Requests to Review on the Platform Engineering board.

One possible way forward might be to combine two approaches: both CSS content and shipping a bit more HTML for all users, but not the whole heading number element. For example, if there were an attribute data-tocnumber="1.2" on all <hX> tags, then users with this option could be shipped CSS like [data-tocnumber]::before { content: attr(data-tocnumber) ' ' } without the need for any display: none rules for all other readers.

This would also make the whole preference a bit more usable, given that the TOC number would not be selectable (hence saving a bit of effort for people when trying to get a link to the heading).

The option to auto-number headings is very useful. I strongly oppose any plans to remove this feature. If you want to remove it, ask the communities locally but don't decide this covertly on Phabricator.

We asked on Meta via Tech News. I realize that this is not obvious to everyone, but it's the best channel I know to get information to all the different communities in their local language. I don't know how to talk to 900 communities in 100 languages directly. Do you have a suggestion?

Can you describe how you use the section headings, and what makes them particularly useful to you? If there is sufficient demand, it's possible to re-implement this using a different mechanism, as discussed above. So far, you seem to be the only one who feels strongly they need this.

The numbers in front of the headings increase clarity and improve navigation. Please just leave this feature as it is. As I understand the text in the box above, this feature does not cause big problems. It just - and only maybe! - causes some little performance issues. The main reason for disabling this seems to be that it is used only by a few users (which may be because it is normally deactivated until someone activates it manually). But for those few users it is very useful. Removing this feature would be a loss for those users without any big benefit.

As I understand the text in the box above this feature does not cause big problems.

No. The task description above says "severely degrades performance" and "cache fragmentation".

I can't confirm this. I don't have any performance issues caused by this feature and I never heard of anyone who had some. Are there any bug reports or discussions in which someone mentions problems caused by this feature?

The numbers make it much easier to find information; I would miss them. 0.4% of active authors is quite a large number for an option which must be actively enabled. If there are performance problems, switch to a solution with CSS, but do not remove this useful feature. Are there any problems for users who don't enable this? I have very seldom had problems, but I use this nearly every day.

I strongly agree with my fellow users from German Wikipedia. We are very much used to this feature which is rather useful for reading and writing long articles. Also I have no issues as to website performance etc. Please keep this feature.

Before removing so important a feature the community should be asked for advice on-wiki, please. The sentence „Es gibt die Option, mehr zu erfahren und Feedback zu geben“ ("There is the option to learn more and give feedback") in https://meta.wikimedia.org/wiki/Tech/News/2021/26/de does not read, to a common user, like a decisive community consultation that you should not miss if you want to keep the feature from being removed.

Thanks. But you might remember the Hitchhiker's Guide to the Galaxy and the part about the planning charts that had been on display somewhere around Alpha Centauri for some fifty earth years without anyone ever realising it? ;)

At least the page didn't say "Beware of the Leopard"...

Seriously though, I agree that the flow of communication could be improved, but it seems to me it will have to be the individual communities who organize this. From the perspective of a single project, it seems like it would be simple for WMF staff to post to e.g. WP:FZW. But realistically, we can't do that for all the projects in all the languages. The best we can do is be transparent on meta and on phab, and encourage the wiki communities to establish a system of communication that works for them.

Perhaps this is an opportunity to improve this system on the German Wikipedia. Looks like Raymond could use some help keeping Wikipedia:Projektneuheiten updated.

I can't confirm this. I don't have any performance issues caused by this feature and I never heard of anyone who had some. Are there any bug reports or discussions in which someone mentions problems caused by this feature?

This is a widely and long known issue, yes. Just try resetting your preferences and you'll notice that overall many pages load faster more often. That's because you will suddenly get prerendered results from a cache more often. Before, you were drawing from a cache pool filled by the 0.5% of users generating prerendered results; after the reset you are pulling from, say, the 95% of users generating prerendered results.

Okay, even if there are some performance issues, let us decide ourselves if we want to live with these issues. The problems with this feature affect only those who have activated it, did I understand that correctly?

@Chaddy: There is a server side where things are hosted. There is also a client side where things are shown in your web browser. These are two different things.

Yes, I know. This doesn't answer my question.

The problems with this feature affect only those who have activated it, did I understand that correctly?

Yes and no. Right now ALL logged-in users get uncached page views because of features like this one. The ultimate goal here is to allow cached access for ALL logged-in users. If you live in a part of the world with good internet connection and good connectivity with our primary data centers, performance is not a huge problem for you personally and you will not see a huge difference. But for all the people with less connectivity having cached responses will be a dramatic improvement.

Removing this particular feature will not yet allow us to serve cached responses to all logged-in users - it's not the last blocker, nor it is the hardest one. With this one 0.4% of users will start sharing the cache, the load will go slightly down, performance for you will improve a little bit. Nothing fascinating will happen. But all these are not the reasons for this change. The reason is that it's a stepping stone on the path to the big prize - enabling CDN-cached access for all logged in users. It will not happen instantly after we drop this particular feature, but we need to drop or completely reengineer it, sooner or later, if we want more than half of the world to enjoy performance similar to what the people with good internet connection have.

Okay, so why not write a whole new version of this feature?

Another long time user of this feature here.

As long as the current system is replaced by CSS counting I won’t complain. Potential problems I see:

The page title is an h1 heading, and this must not be counted (it always seems to have the id firstHeading). Usually only h2 levels follow, which get the highest counting level, but there are also pages with more than one h1 heading, and the current system sets them to the highest counting level in these cases (see for example de:Wikipedia:Auskunft, de:Wikipedia:Fragen zur Wikipedia and Commons:Village pump; there are more for sure, but it is not possible to search for these with wikisearch).

In Vector and maybe also the other skins the navigation elements contain an h3 element; this must also not be counted, of course. At least some of these (I didn’t check thoroughly) come even before the h1 id="firstHeading".

If I see it correctly, the counters should be made dependent on the id bodyContent.

Okay, so why not write a whole new version of this feature?

We may, but we have to weigh the effort against the benefit. How long would it take us to do this (and not work on other things in this time), how many people would benefit from that work, and how does that impact their contribution to the projects?

We can't implement everything that would be nice to have for someone. One of the things that is holding MediaWiki development back is that we have too many niche features that make the code complicated. Of course, sometimes a feature is really important even though it's only used by 0.1% of our users. But often, it isn't.

I think in this case, the effort for doing this in CSS is pretty low, so we might just do it. But I may be wrong about the effort, and it's not my decision to make.

Another long time user of this feature here.

As long as the current system is replaced by CSS counting I won’t complain.

If we keep support in MediaWiki, we will probably just continue to do our own counting, and use CSS to hide or show the numbers (instead of caching two separate versions of the page).

But a fully CSS based (or JS based) solution could still be implemented in a local Gadget if the feature is removed from MediaWiki entirely.

Okay, even if there are some performance issues, let us decide ourselves if we want to live with these issues. The problems with this feature affect only those who have activated it, did I understand that correctly?

They directly and severely affect those who have activated it (they often have to wait for the page to render). But the CPU cycles and cache capacity this uses are taken away from others. So it affects everyone - just a tiny bit, but for millions and millions of page views every day. These things add up.

@daniel I do not know how well the gadget is written, but would it be possible to rewrite it so that it inserts these data-something attributes as suggested by @stjn instead of the additional HTML elements it inserts now, and does this for all users, so that it is no longer a gadget? The central gadget itself would then be converted into quite short CSS, so anyone who has not enabled it would not notice anything. The size of every page would be larger for everyone, though (but smaller for those of us who use this feature); the question is by how much (perhaps data-toc-no="", 15 characters per entry including the leading space but without the number, and data-sect-no="", 16 characters in all).

Or, at least, if the gadget itself is the culprit for the slowdown, ensure that there are sufficient IDs and classes for TOC entries and sections to make proper counting without errors possible. (The emphasis is on ensure: there are already a lot of IDs and classes, but I didn’t check whether these are enough for the task.)

Quick summary of an implementation idea based on a discussion on Slack with @Catrope and @DLynch:

If we go with the CSS based implementation, the best approach is probably to follow the ideas of @stjn and @Speravir: we'd include a data-tocnumber attribute in the HTML, and ship a rule like .show-section-numbers [data-tocnumber]::before { content: attr(data-tocnumber) ' ' } to everyone. Then we can use OutputPage::addBodyClass('show-section-numbers') to enable display, based on user preferences, probably from Article::view() and friends.

Alternatively, a static file resource module can be conditionally added in UserStylesModule.
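A rough sketch of how the pieces of this idea fit together follows. In MediaWiki this logic would live in PHP; JavaScript is used here only to stay in the same language as the snippets in this thread. `escapeHtml`, `renderHeading`, `SECTION_NUMBER_RULE` and `bodyClasses` are hypothetical names made up for the illustration, and only the CSS rule and the class name are taken from the comment above.

```javascript
// Hypothetical helpers for illustration; none of these are MediaWiki APIs.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The number is emitted as an attribute only, so the heading markup is the
// same for every reader and can be cached once.
function renderHeading(level, number, title) {
  return '<h' + level + ' data-tocnumber="' + escapeHtml(number) + '">' +
    escapeHtml(title) + '</h' + level + '>';
}

// The rule quoted in the comment above, shipped to all readers.
const SECTION_NUMBER_RULE =
  ".show-section-numbers [data-tocnumber]::before { content: attr(data-tocnumber) ' ' }";

// Only the class on <body> depends on the preference (assumed here to be a
// boolean named numberHeadings).
function bodyClasses(prefs) {
  return prefs.numberHeadings ? ['show-section-numbers'] : [];
}
```

For example, `renderHeading(2, '1.2', 'History')` yields the same string for all users, and the number only becomes visible when the body carries the `show-section-numbers` class.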

I thought about this a bit and I was in favor of this change, but after thinking more, I think it doesn't make sense to bloat our HTML, making it bigger for everyone, for the sake of a very small use case that is not being used more than 5% of the time.

You can simply write a js to do this. I just wrote a one-liner for it:

$( '.mw-parser-output').children().each(function(i, a) { if ( a.tagName.startsWith('H') ) { $(a).children().first().prepend( $('<span class="mw-headline-number"></span>').text($('#toc a[href="#' + a.children[0].id + '"] .tocnumber').text() + ' ')) }});

Go ahead and run it on any page you like. It can simply be turned into a user script or a gadget.

I highly recommend simply dropping the feature given that the alternative (the above js) has been already provided.

While I guess I agree in principle, this only works on ASCII IDs for some reason. (Latest Firefox on Windows 8.1.)

Thanks for spotting the issue. This would work for non-ASCII headings too:

$( '.mw-parser-output').children().each(function(i, a) { if ( a.tagName.startsWith('H') ) { $(a).children().first().prepend( $('<span class="mw-headline-number"></span>').text($('#toc a[href="#' + $(a).find('.mw-headline').attr('id') + '"] .tocnumber').text() + ' ')) }});

Change 725441 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Remove "auto-number headings" preference

https://gerrit.wikimedia.org/r/725441

Formatted:

$('.mw-parser-output').children().each(function (i, elem) {
  if (elem.tagName.startsWith('H')) {
    $(elem).children().first().prepend(
      $('<span class="mw-headline-number"></span>').text(
        $('#toc a[href="#' + $(elem).find('.mw-headline').attr('id') + '"] .tocnumber').text() + ' '
      )
    )
  }
});

Some suggestions:

  • The loop over every top-level content child can be eliminated by searching for the headings directly in the original selector.
  • Use native built-in a.firstElementChild, instead of $(a).children().first() or $(a).find('.mw-headline') which afaik do the same thing here but in two different ways. Note that jQuery .children() skips over text/comment nodes, which I assume is intended. This is in contrast to jQuery .contents(), where the first would equate to a.firstChild.
  • Use $.escapeSelector() in the TOC number lookup, to avoid problems with special characters.

These changes bring it down to:

$('.mw-parser-output :is(h1,h2,h3,h4,h5,h6)').each(function (i, elem) {
  var headline = elem.firstElementChild;
  if (headline) {
    var num = $('#toc a[href="#' + $.escapeSelector($(headline).attr('id')) + '"] .tocnumber').text();
    $(headline).prepend($('<span class="mw-headline-number"></span>').text(num + ' '));
  }
});

Going a bit further:

  • The traversal to the headline and subsequent condition can also be eliminated, by including .mw-headline in the original selector.
  • Use the native .id property directly.
  • Guard against TOC heading number not being found.

$('.mw-parser-output :is(h1,h2,h3,h4,h5,h6) .mw-headline').each(function (i, headline) {
  var num = $('#toc a[href="#' + $.escapeSelector(headline.id) + '"] .tocnumber').text();
  if (num) {
    $(headline).prepend($('<span class="mw-headline-number"></span>').text(num + ' '));
  }
});

Or a version without using jQuery, if you don't mind using DOM4 and ES6 methods that work in all current browsers:

  • Use native prepend() directly.
  • Use native CSS.escape() directly.
  • Use native NodeList.forEach().
  • Use Object.assign().

document.querySelectorAll('.mw-parser-output :is(h1,h2,h3,h4,h5,h6) .mw-headline').forEach(function (headline) {
  var num = document.querySelector('#toc a[href="#' + CSS.escape(headline.id) + '"] .tocnumber');
  if (num) headline.prepend(Object.assign(document.createElement('span'), {
    className: 'mw-headline-number',
    textContent: num.textContent + ' '
  }));
});

And lastly, if we don't need to customise the span styles, then let prepend() create the text node for us:

document.querySelectorAll('.mw-parser-output :is(h1,h2,h3,h4,h5,h6) .mw-headline').forEach(function (headline) {
  var num = document.querySelector('#toc a[href="#' + CSS.escape(headline.id) + '"] .tocnumber');
  if (num) headline.prepend(num.textContent + ' ');
});

Note that these snippets only work on pages with TOC. Pages with few headings don't have TOC.

Fixed by using CSS counters. https://www.mediawiki.org/wiki/Snippets/Auto-number_headings
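For readers who want to see the counting logic without relying on the TOC, here is a minimal sketch that derives section numbers from heading levels alone, in the spirit of nested counters (bump the counter for a sibling, start a fresh one for a deeper level). `numberHeadings` is a hypothetical helper written for this illustration; it is not the code from the linked snippet page, and it has not been checked against MediaWiki's own numbering for unusual cases such as a page that starts with a deeper heading.

```javascript
// Hypothetical helper: compute section numbers from heading levels only.
// Input: heading levels in document order, e.g. [2, 3, 3, 2].
// Output: the matching numbers as strings, e.g. ['1', '1.1', '1.2', '2'].
function numberHeadings(levels) {
  const stack = [];  // heading levels of the currently open sections
  const counts = []; // counts[i] is the running number at nesting depth i
  return levels.map(function (level) {
    // Leave any open sections that are deeper than this heading.
    while (stack.length > 0 && stack[stack.length - 1] > level) {
      stack.pop();
      counts.pop();
    }
    if (stack.length > 0 && stack[stack.length - 1] === level) {
      // Sibling at the same level: bump its counter.
      counts[counts.length - 1] += 1;
    } else {
      // First heading at a deeper level: open a new depth starting at 1.
      stack.push(level);
      counts.push(1);
    }
    return counts.join('.');
  });
}

const exampleNumbers = numberHeadings([2, 3, 3, 2, 4, 2]);
```

In a browser, the levels could be collected from the h1 to h6 elements inside .mw-parser-output and the resulting strings prepended to each heading, as the earlier snippets do with the TOC numbers. The function itself does not touch the DOM, so it also works on pages without a TOC.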

It seems from the last few comments that the general attitude here has turned towards a pure-CSS/JS solution if this feature is kept in any way, but if it's actually still up in the air, I just wanted to point out that the data-* solution should probably use an mw- prefix (i.e. data-mw-tocnumber or what-have-you), to reduce the chances of conflicts with data-* attributes being used on-wiki (this also mirrors the same being done with MediaWiki-side class names these days AFAIK).

Dinoguy1000 renamed this task from Remove "auto-number headings" preference to Remove "auto-number headings" preference.Oct 5 2021, 2:04 AM

Change 725441 merged by jenkins-bot:

[mediawiki/core@master] Remove "auto-number headings" preference

https://gerrit.wikimedia.org/r/725441

Ladsgroup claimed this task.

This is done.

daniel updated the task description. (Show Details)

Change 793499 had a related patch set uploaded (by Jforrester; author: Diesel kapasule):

[mediawiki/core@master] RELEASE-NOTES-1.38: Mention that the "auto-number headings" feature was dropped

https://gerrit.wikimedia.org/r/793499

Change 793499 merged by jenkins-bot:

[mediawiki/core@master] RELEASE-NOTES-1.38: Mention that the "auto-number headings" feature was dropped

https://gerrit.wikimedia.org/r/793499

Change 802607 had a related patch set uploaded (by Jforrester; author: Diesel kapasule):

[mediawiki/core@REL1_38] RELEASE-NOTES-1.38: Mention that the "auto-number headings" feature was dropped

https://gerrit.wikimedia.org/r/802607

Change 802607 merged by jenkins-bot:

[mediawiki/core@REL1_38] RELEASE-NOTES-1.38: Mention that the "auto-number headings" feature was dropped

https://gerrit.wikimedia.org/r/802607