The current section topics algorithm doesn't have a way to generate section identifiers.
These are likely to be required by front-end clients to effectively retrieve a given section.
See comment in T312900#8105248.
Tasks
- Check whether mwparserfromhell has the same behavior as the MediaWiki API for section identifiers, see example
Highlights
- I couldn't find an evident way to replicate the MediaWiki API behavior in mwparserfromhell (mwp).
- mwp is also known not to handle syntax elements produced by a template transclusion, see limitations. As a result, the section transclusion case raised in T312900#8105248 is unlikely to be easy to handle.
Proposal
A minimum viable solution is to use the section's absolute index as a simple identifier. This would be identical to the MediaWiki API index key. See, for instance, the section object in the example call:
{ "toclevel": 1, "level": "2", "line": "Discografia", "number": "3", "index": "16", "fromtitle": "Ramones", "byteoffset": 94346, "anchor": "Discografia" }
This is the 16th section that appears in the Ramones page on itwiki, regardless of its hierarchy level.
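As a rough illustration of how a client could use that key, the sketch below looks up a section object by its index. It assumes the response wraps the objects in a `parse` -> `sections` envelope; `RESPONSE_BODY` is a stand-in containing only the object quoted above, and `section_by_index` is a hypothetical helper name, not part of any existing code.

```python
import json

# Stand-in for an API response body: only the section object quoted above,
# wrapped in the assumed "parse" -> "sections" envelope.
RESPONSE_BODY = """
{"parse": {"sections": [
  {"toclevel": 1, "level": "2", "line": "Discografia", "number": "3",
   "index": "16", "fromtitle": "Ramones", "byteoffset": 94346,
   "anchor": "Discografia"}
]}}
"""


def section_by_index(response_body, section_id):
    """Return the section object whose "index" equals section_id, or None."""
    sections = json.loads(response_body)["parse"]["sections"]
    for section in sections:
        # "index" is a string in the response, so compare as strings.
        if section["index"] == str(section_id):
            return section
    return None


section = section_by_index(RESPONSE_BODY, 16)
print(section["line"], section["anchor"])
```

Comparing as strings keeps the helper agnostic about whether the caller stores `section_id` as an integer or as text.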
The example dataset row given in T312900: [M] Design database model for section topics pipeline would become:
| snapshot | wiki_db | page_namespace | revision_id | page_qid | page_id | page_title | section_id | section_title | topic_qid | topic_title | topic_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022-07-11 | enwiki | 0 | 1066420146 | Q4464287 | 10391760 | Work_(painting) | 1 | background and influences | Q543626 | Lazzaroni_(Naples) | 2 |
Important note
Current Research code extracts sections at hierarchy level 2 exclusively.
Given the following dummy wikitext:
section zero
== section one ==
...
=== section one one ===
...
== section two ==
...
=== section two one ===
...
==== section two one one ====
...
==== section two one two ====
...
== section three ==
...
the code will extract a total of 3 sections, each holding its own subsections. The lead text (section zero) is not extracted:
[
  '== section one ==\n...\n=== section one one ===\n...',
  '== section two ==\n...\n=== section two one ===\n...\n==== section two one one ====\n...\n==== section two one two ====\n...',
  '== section three ==\n...'
]
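To show how the absolute index would interact with level-2 extraction, here is a minimal sketch using only the standard library. It is not the Research code: it scans headings with a regular expression rather than mwp, the names `HEADING_RE` and `level2_sections_with_index` are hypothetical, and it assumes headings are counted in document order across all levels. It also ignores headings that come from templates, comments or nowiki blocks, so its indices may differ from the MediaWiki API on real pages.

```python
import re

# A heading line such as "== title ==": captures the "=" markers and the title.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

DUMMY_WIKITEXT = "\n".join([
    "section zero",
    "== section one ==", "...",
    "=== section one one ===", "...",
    "== section two ==", "...",
    "=== section two one ===", "...",
    "==== section two one one ====", "...",
    "==== section two one two ====", "...",
    "== section three ==", "...",
])


def level2_sections_with_index(wikitext):
    """Return (absolute_index, section_text) pairs for level-2 sections."""
    headings = list(HEADING_RE.finditer(wikitext))
    sections = []
    for position, match in enumerate(headings):
        if len(match.group(1)) != 2:
            continue  # subsections stay inside their level-2 parent
        # The section ends at the next heading of level 2 or higher rank.
        end = len(wikitext)
        for following in headings[position + 1:]:
            if len(following.group(1)) <= 2:
                end = following.start()
                break
        # Absolute index: 1-based position among headings of every level.
        sections.append((position + 1, wikitext[match.start():end].strip()))
    return sections


for index, text in level2_sections_with_index(DUMMY_WIKITEXT):
    print(index, repr(text))
```

On the dummy wikitext the three extracted sections get the identifiers 1, 3 and 7 rather than 1, 2 and 3, because subsection headings also consume an index.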