Page MenuHomePhabricator

Pages corresponding to tables of contents' second links are not provided the first link as the "prev" parameter for headers generated with <pages>
Open, In Progress, Needs TriagePublic

Description

"Module:Header template" on a given Wikisource supplies a number of parameters automatically based on an index page if they are not otherwise supplied in the <pages header=1/> tag, including "prev", "next", "current", and so on. If the page using that tag corresponds to the second link in the corresponding index's table of contents, however, then the first link is not supplied as the "prev" parameter. This results in situations such as that in this page, which according to the book's table of contents should be noted in the header as following this page but currently isn't.

I suspect that changing ">" to ">=" on line 300 of "includes/Parser/PagesTagParser.php" would address this issue.

Event Timeline

@Mahir256 The first mainspace link on the Index: is skipped because it is assumed to be the title of the work. Unless I am misunderstanding your issue, putting a wikilink in the উইকিউপাত্ত আইটেম field should fix it.

putting a wikilink in the উইকিউপাত্ত আইটেম field should fix it.

Yes, I'm aware that throwing in an extra link is one way to sidestep the issue, but I wonder if the underlying assumption is worth hardcoding in the extension at all.

Yes, I'm aware that throwing in an extra link is one way to sidestep the issue, but I wonder if the underlying assumption is worth hardcoding in the extension at all.

This behaviour is as designed and documented, so it's not a matter of "sidestepping the issue". I'm sure that there are other ways it could be implemented too, but that would require a better definition of what the actual problem with the current behaviour is, and some sketch of a hypothetical better solution.

that would require a better definition of what the actual problem with the current behaviour is

As part of an effort to increase integration of the Bengali Wikisource with Wikidata, we have begun migrating most information within index pages that can be migrated to Wikidata items. As a result, the only free-text field in ProofreadPage that is frequently non-empty before the pagelist is the "wikidata-item" field, holding the Qid of the work's item. The "wikidata-item" field is used directly within our implementations of "Module:Header template" and "MediaWiki:Proofreadpage index template" in retrieving the migrated information for display in the appropriate places. Turning that field into a link makes processing it, even if it's merely a matter of stripping the link syntax, just a bit more complicated than it needs to be. If the information above the pagelist field is already on the Wikidata item, then adding a link to any of the fields above the pagelist becomes redundant.

and some sketch of a hypothetical better solution.

The simple fix I propose is a direct solution to the problem and to the extent I am aware of would not adversely affect most other generated headers. (I'd be interested in seeing where an adverse effect might occur, if at all.)

Turning that field into a link makes processing it, even if it's merely a matter of stripping the link syntax, just a bit more complicated than it needs to be.

Is there any particular reason why you can't add a link to the mainspace page where the work is transcluded to one of the other index fields and use a dedicated field for the QID?

to the extent I am aware of would not adversely affect most other generated headers.

It would break every single other use of <pages … header=1 /> because they rely on the documented behaviour.

Is there any particular reason why you can't add a link to the mainspace page where the work is transcluded to one of the other index fields and use a dedicated field for the QID?

We are using the "wikidata-item" field as the dedicated field for the Qid. As (implied, maybe should have been expressed better?) in the sentence following that to which you replied, a link to where the work is transcluded would have been the use for the "title" field before, but that link has been migrated to a sitelink on a Wikidata item, which can be retrieved using Lua's mw.wikibase functionality, thus making it redundant to retain it in the index page.

It would break every single other use of <pages … header=1 /> because they rely on the documented behaviour.

I'm not sure if you're trying to imply that every single other use of <pages header=1/> appears on a page corresponding to the second link in the index page and thus actually exhibits that behavior. Even if you aren't, however, the change I propose would realistically affect at most (# of index pages) pages--each of which would have exactly one "second link"--rather than the more enormous and troublesome (# of pages using <pages>).

The relevant piece of logic which this task considers was introduced in this commit, three years before Wikidata's launch and thus three years before the idea of pulling in information for an index page would have been conceived. We on the Bengali Wikisource, in the interest of modernizing the functionality of our site beyond the old days of direct {{header}} use and pre-Scribunto capabilities, are trying to embrace the benefits that Wikidata provides to the extent that it can support the metadata of index pages, and this necessarily extends from the slow discontinuation of certain index fields to the integration of their replacements in main namespace pages, in addition to the sometimes semi-automatic maintenance of some other namespaces. I can accept that such an attitude as we espouse may be foreign and uncomfortable to certain Wikisourcerors, but that doesn't mean that the software—which I feel the need to assert we wish to share with those Wikisourcerors—should continue to hold other communities back (or that the effort to defend what's being requested in this task is misplaced).

After some searches on s:en:, mw:, and Gerrit, the only acknowledgment of this behavior I was able to find is a comment from August 2019 in a Scriptorium by the objector to this task describing this behavior in response to someone else who was confused by it. Perhaps there is some non-formal record of this choice someplace which is expected to be visible but was not at the time I performed my searches (and perhaps I am not as good at searching for things as others), but absent this or an assessment that something particularly devastating might result from the addition of that equals sign, I contend that the particular one-time adjustment originating from that change is entirely manageable, at least compared to other software-related adjustments (such as the refreshment of Page:s across wikis to store proofreading status as a page property) which have had to be executed previously.

If @Tpt still finds this task and the proposal within it entirely objectionable, I will have no problem with his declining it. I would like there to be some consideration, however, of whether we ought to allow communities to dispense with the assumption that information aside from a Qid above a certain point must include some sort of link, no matter how unnecessary it might be, to conform to what they might consider parachronistic behavior.

We on the Bengali Wikisource … are trying to embrace the benefits that Wikidata provides to the extent that it can support the metadata of index pages

That's a laudable goal and one in which you are not alone (there are multiple efforts across the Wikisourcen to do the same, and there are non-Wikisource-specific technical efforts that will be important enabling technologies here that are currently being worked on). However, for one Wikisource to impose their idea of what the One True Way™ is for all the other Wikisources is not a particularly constructive approach. Dismissing as irrelevant the cleanup that other volunteers will have to do across all other projects because you want perfect technical purity in your local use case is extremely unlikely to be well received.

I would strongly recommend that you rethink this approach.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

for one Wikisource to impose their idea of what the One True Way™ is for all the other Wikisources is not a particularly constructive approach.

Nowhere was I trying to imply that the change should be applied on account of some unsubstantiated notion of superiority, without adequate preparation on the other Wikisources. If this is a particularly breaking change, let this be pursued with the caution required for it, rather than suggesting that things like this not be pursued at all.

Dismissing as irrelevant the cleanup that other volunteers will have to do across all other projects because you want perfect technical purity in your local use case is extremely unlikely to be well received.

Nowhere was I trying to imply that, even with the aforementioned caution, the change I propose is a trivial thing to address; all I am saying is that compared to at least one other massive change that was once made affecting all Wikisources, there are considerably fewer pages across Wikisources that are actually affected by it, and that, due to how this behavior manifests, that number is necessarily limited (a rather high upper bound) by the number of index pages on a given Wikisource (for each of which there may be at most one mainspace page with a <pages header> tag with no explicit "prev" parameter, this page being that referred to by the second link on said index page).

If my initial assessment of how many pages this affects was inadequate for you, here's a somewhat better assessment:

  • Considering the English Wikisource only, of its 23,033 index pages, only 5,420 have tables of contents in them. Among those, of the 1,578 index pages which don't transclude a page from the Page: namespace, 536 have no links at all. (The current upper bound of the number of pages affected is thus 4,884.)
  • (This comment will be updated as more analyses are run.)

I would strongly recommend that you rethink this approach.

With respect to this Phabricator task, I refer you to the first point under "Reason" in this text.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

You're looking for this page.

@Bodhisattwa @jayantanth If the current PRP behaviour is holding back progress on Bengali Wikisource I suggest one of you pare it down to a neutral "Here's what we want to achieve, and this is what we're having trouble with" description in a new Phab task. If (big if) what I am extracting from the above is correct, I feel confident there are ways this can be achieved without disrupting other projects. For example, a config variable to enable or disable the behaviour per project, so Bengali Wikisource and any others that want to go entirely to Wikidata for their Index: pages can do so as and when they are ready to do so (there are other possible approaches too). As I mentioned up above somewhere, I know there are other Wikisources that are working on this so it is likely having this configurable will be of interest to more projects either short term or at some point in the foreseeable future.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

You're looking for this page.

@Bodhisattwa @jayantanth If the current PRP behaviour is holding back progress on Bengali Wikisource I suggest one of you pare it down to a neutral "Here's what we want to achieve, and this is what we're having trouble with" description in a new Phab task. If (big if) what I am extracting from the above is correct, I feel confident there are ways this can be achieved without disrupting other projects. For example, a config variable to enable or disable the behaviour per project, so Bengali Wikisource and any others that want to go entirely to Wikidata for their Index: pages can do so as and when they are ready to do so (there are other possible approaches too). As I mentioned up above somewhere, I know there are other Wikisources that are working on this so it is likely having this configurable will be of interest to more projects either short term or at some point in the foreseeable future.

@Xover , I have filed a new ticket T285738 as per your suggestion and tried to keep it as neutral as possible stating only the problem we are facing. Hope it helps.

@Xover , I have filed a new ticket T285738 as per your suggestion and tried to keep it as neutral as possible stating only the problem we are facing. Hope it helps.

I may have misunderstood what you are trying to achieve. The new task describes trouble with the prev/next links, for which the response is "you need to put a mainspace link on the Index: title field". This is the documented behaviour and so the problem you're seeing is "user error" (the software is working as designed, so this is not an error or a bug in the software).

What I thought I had understood was that your actual overall goal is getting rid of locally stored data in the Index:, and especially the title link, and this by-design/documented behaviour was preventing you from achieving that goal. In other words, the software is working correctly, but the way it works is getting in the way of this other thing you want to achieve.

Iff I've understood correctly (and I could certainly have misunderstood), the formulation would then be something like "We want to move all Index: fields to Wikidata, but we can't currently do that because dropping the title field link breaks the next/prev links in automatic header generation. We are looking for some way to enable us to drop that link but keep automatic headers functioning."

If my understanding is accurate, that's an issue other Wikisources with similar efforts have already, or will soon, run into, so my guess is that you'd get plenty of support for it and any solution will be reusable for several Wikisources going forward.

Thank you @Mahir256 and @Xover for this discussion.

If I try to summarize it:

  1. It is useful to be able to have the first "prev" link working for Index: pages that does not contain a "title" value because of the Wikidata migration.
  2. There are thousands of works which rely on the old behavior of the first link to ns0 being ignored when generating the prev/next links (at least in the French and English Wikisources).

I see a way that might allow us to get out of this problem: Instead of ignoring the first link to ns0, why not just make ProofreadPage only consider links in the "table of contents" field of the Index page, without skipping the first one?
This might be implemented by annotating the relevant field in "Mediawiki:Proofreadpage_index_data_config" with something like "data": "toc". If such a field is present in the configuration this field would be the only one considered for prev/next links. If this field is not present the current behavior would be kept, at least during the time required to migrate all Wikisources. This way would not break the "ignore the title" feature while allowing to fix the "prev" links for wikis that makes use of Wikidata.

@Mahir256 and @Xover what do you think about this idea? If there a use case I missed?

what do you think about this idea? If there a use case I missed?

I can't speak to technical feasibility of the approach or whether this would be adequate to let Bengali Wikisource do what they are aiming for, but apart from that this sounds like an eminently workable solution. If feasible as described this approach would "just work" for all the per-project variants I can currently think of.

In particular, this sounds like it would provide a very easy and smooth migration path at the per-project level.

But aiui the granularity would be per-project and not per-index. How would a project start experimenting with this or avoid the need for a big bang migration? Use two toc fields in the index, one for the traditional way and one for a "Wikidata-based toc"? That would necessitate PRP differentiate between "present" and "non-empty" for the index data field tagged with "data": "toc", I think. That sounds a little more complicated, but is perhaps not too much so?

But aiui the granularity would be per-project and not per-index. How would a project start experimenting with this or avoid the need for a big bang migration? Use two toc fields in the index, one for the traditional way and one for a "Wikidata-based toc"? That would necessitate PRP differentiate between "present" and "non-empty" for the index data field tagged with "data": "toc", I think. That sounds a little more complicated, but is perhaps not too much so?

If a Wikisource want to do that, it would just require to create a new field in "Mediawiki:Proofreadpage_index_data_config" and annotate it with "data": "toc". This way only index pages which would have "opt-in" by moving the toc to the new field would use the new approach. Anyway, I have a hard time figuring out what would break if a Wikisource just add "data": "toc" to the current toc field configuration. The "title" link is I believe nearly always in a "title" field and not in the TOC so it would be still ignored with the new approach.

Change 704564 had a related patch set uploaded (by Tpt; author: Tpt):

[mediawiki/extensions/ProofreadPage@master] Build prev/next link only from the TOC

https://gerrit.wikimedia.org/r/704564

Tpt claimed this task.

Yes. To make it work properly "data": "toc" should be added to the relevant field in Mediawiki:Proofreadpage_index_data_config.

Maybe the issue didn't go actually.

If we put the main page of the book at the beginning of the TOC like this TOC, the 2nd page becomes the 3rd page in TOC and thus it looked like that the issue was resolved.

For TOCs where the main page of the book is still not added at the beginning, the issue is still there. Eventually, we will add all main pages of the book in all TOCs, but maybe not every Wikisource will not follow that pattern, so the issue still needs to get resolved.

Sorry, I should not have closed this issue. The change that should fix this problem is not merged yet: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/704564
So, it makes sense that the problem does not seem solved on Wikisource.

Sorry, I should not have closed this issue. The change that should fix this problem is not merged yet: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/704564
So, it makes sense that the problem does not seem solved on Wikisource.

@Tpt, I guess, it is still not merged.

Change 704564 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Build prev/next link only from the TOC

https://gerrit.wikimedia.org/r/704564

The change fixing this issue is now merged.

Now that we are on wmf.9, I see that for this index, with "data": "toc" set for the table of contents ("Remarks"), and without using the trick mentioned in T285610#7267935 on that index, the second chapter still does not link to the first chapter or vice versa. Is there something we need to refresh on our end for the effects of this patch to be visible?

@Mahir256 The change I wrote is in fact buggy. It's looking for a field named "toc" and not a field which data field is "toc". I'm writing a fix right now. Sorry for that.

Change 741682 had a related patch set uploaded (by Tpt; author: Tpt):

[mediawiki/extensions/ProofreadPage@master] Fixes lookup of field per key

https://gerrit.wikimedia.org/r/741682

Change 741682 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Fixes lookup of field per key

https://gerrit.wikimedia.org/r/741682