
Document (and/or tweak) TOCData API representation
Open, Needs Triage, Public

Description

In T328605 (Export TOCData in the action API), we started exporting TOCData as an alternative to the legacy 'sections' data.

The format we use for TOCData internally was tuned for writing parser tests, so it is quite terse and omits many fields when they are redundant or unnecessary. We should document this format, and if it turns out to be *too* terse, perhaps create a more verbose serialization for the benefit of the action API.

Event Timeline

You have a bug. In this API query, which parses a page with a whitespace-only section heading, the prop=sections output correctly sets line and anchor to the empty string, while the prop=tocdata output incorrectly omits these properties entirely.

"sections": [
    {
        "toclevel": 1,
        "level": "2",
        "line": "",
        "number": "1",
        "index": "1",
        "fromtitle": "API",
        "byteoffset": 0,
        "anchor": "",
        "linkAnchor": ""
    }
],
"tocdata": {
    "sections": [
        {
            "tocLevel": 1,
            "hLevel": 2,
            "number": "1",
            "index": "1",
            "fromTitle": "API",
            "codepointOffset": 0
        }
    ],
    "extensionData": []
}

The linkAnchor also seems to be omitted in general, but that's defensible as a removal of redundancy in the new format since it's generally the same as anchor (unless % characters are involved).

Each field in the TOCData representation has a default value that applies when the field is not present: 'line' and 'anchor' default to the empty string, and 'linkAnchor' defaults to the value of 'anchor'. https://github.com/wikimedia/mediawiki-services-parsoid/blob/08d22d0c01e135ddc850fcbeb50def29904cfe4d/src/Core/SectionMetadata.php#L340
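To illustrate how an API client could apply these defaults, here is a minimal Python sketch that expands the terse prop=tocdata form back into explicit fields. The function names `expand_section` and `expand_tocdata` are hypothetical (not part of MediaWiki or Parsoid), and the sketch only handles the three fields discussed in this task; the linked SectionMetadata.php remains the authoritative source for every field's default.

```python
import copy
from typing import Any

# Assumed defaults, covering only the fields discussed in this task.
# Parsoid's SectionMetadata.php is the authoritative list.
STRING_DEFAULTS = {"line": "", "anchor": ""}


def expand_section(section: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one TOCData section with omitted fields filled in."""
    expanded = dict(section)
    for key, default in STRING_DEFAULTS.items():
        expanded.setdefault(key, default)
    # linkAnchor falls back to anchor (itself possibly just defaulted).
    expanded.setdefault("linkAnchor", expanded["anchor"])
    return expanded


def expand_tocdata(tocdata: dict[str, Any]) -> dict[str, Any]:
    """Return a deep copy of a prop=tocdata object with every section expanded."""
    result = copy.deepcopy(tocdata)
    result["sections"] = [expand_section(s) for s in result.get("sections", [])]
    return result


# Usage with the tocdata output quoted in the bug report above.
tocdata = {
    "sections": [
        {"tocLevel": 1, "hLevel": 2, "number": "1", "index": "1",
         "fromTitle": "API", "codepointOffset": 0}
    ],
    "extensionData": [],
}
section = expand_tocdata(tocdata)["sections"][0]
print(repr(section["line"]), repr(section["anchor"]), repr(section["linkAnchor"]))
# prints: '' '' ''
```

Expanding on the client side like this leaves the compact wire format untouched; a server-side verbose serialization would make the same information explicit for every consumer instead.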

This output is correct, but this behavior needs to be documented.

Documenting the misbehavior with respect to line and anchor wouldn't really improve the situation. At best it would change it from "surprising bug" to "intentionally poor API design".

Change #1275559 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Improve documentation for action=parse&prop=tocdata

https://gerrit.wikimedia.org/r/1275559

Change #1275559 merged by jenkins-bot:

[mediawiki/core@master] Improve documentation for action=parse&prop=tocdata

https://gerrit.wikimedia.org/r/1275559