The current section topics algorithm doesn't have a way to generate section identifiers.
These are likely to be required by front-end clients to effectively retrieve a given section.
See comment in T312900#8105248.
==Tasks==
- [x] Check whether mwparserfromhell has the same behavior as the MediaWiki API for section identifiers, see [example](https://it.wikipedia.org/w/api.php?action=parse&prop=sections&page=Ramones)
==Highlights==
- I couldn't find an obvious way to replicate the MediaWiki API behavior in mwparserfromhell (mwp)
- mwp is also known to be unable to handle syntax elements produced by template transclusion, see [limitations](https://github.com/earwig/mwparserfromhell#limitations). This likely means that the section transclusion issue raised in T312900#8105248 can't be handled easily
==Proposal==
A minimum viable solution is to use the section absolute index as a simple identifier. This would be identical to the MediaWiki API `index` key, see for instance the section object in the [example call](https://it.wikipedia.org/w/api.php?action=parse&prop=sections&page=Ramones):
```
{
"toclevel": 1,
"level": "2",
"line": "Discografia",
"number": "3",
"index": "16",
"fromtitle": "Ramones",
"byteoffset": 94346,
"anchor": "Discografia"
}
```
This is the **16th** section that appears in the [Ramones](https://it.wikipedia.org/wiki/Ramones) page on itwiki, regardless of its hierarchy level.
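A minimal sketch of how such an absolute index could be computed from raw wikitext, using only a regular expression over heading lines. The names `HEADING_RE` and `index_sections` are hypothetical, and the sketch assumes that every heading is written directly in the page wikitext: headings produced by template transclusion are not visible here, so the result may diverge from the MediaWiki API `index` on such pages.

```python
import re

# Matches a wikitext heading line such as "== Title ==" (levels 1 to 6).
# Assumption: the opening and closing runs of "=" have the same length.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def index_sections(wikitext):
    """Number every heading in document order, regardless of its level."""
    sections = []
    for index, match in enumerate(HEADING_RE.finditer(wikitext), start=1):
        sections.append(
            {
                "index": index,                # absolute position, starting at 1
                "level": len(match.group(1)),  # number of "=" signs
                "line": match.group(2),        # heading text
            }
        )
    return sections
```

The lead text before the first heading gets no entry in this sketch, so numbering starts at the first heading.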
---
The example dataset row given in {T312900} would become:
| snapshot | wiki_db | page_namespace | revision_id | page_qid | page_id | page_title | **section_id** | section_title | topic_qid | topic_title | topic_score
| 2022-07-11 | enwiki | 0 | 1066420146 | [Q4464287](https://www.wikidata.org/wiki/Q4464287) | 10391760 | [Work_(painting)](https://en.wikipedia.org/wiki/Work_(painting)) | **1** | [background and influences](https://en.wikipedia.org/wiki/Work_(painting)#Background_and_influences) | [Q543626](https://www.wikidata.org/wiki/Q543626) | [Lazzaroni_(Naples)](https://en.wikipedia.org/wiki/Lazzaroni_(Naples)) | 2
==Important note==
Current #research [code](https://gitlab.wikimedia.org/mnz/section-alignment/-/blob/main/notebooks/sections.ipynb) extracts sections at hierarchy level **2** exclusively.
Given the following dummy wikitext:
```
section zero
== section one ==
...
=== section one one ===
...
== section two ==
...
=== section two one ===
...
==== section two one one ====
...
==== section two one two ====
...
== section three ==
...
```
the code will extract a total of **3** sections, each including its subsections:
```
[
'== section one ==\n...\n=== section one one ===\n...',
'== section two ==\n...\n=== section two one ===\n...\n==== section two one one ====\n...\n==== section two one two ====\n...',
'== section three ==\n...'
]
```