Endpoint to make it easier for apps to display talk pages.
The endpoint takes a talk page title (and optional revision id) and returns a structured representation of the talk page, in JSON, preserving only certain elements.
Payload is structured as `replies` within `topics`:
**Topics:**
{F29470254}
Topics correspond to sections:
- `id` comes from the section id
- `depth` corresponds to section depth - i.e. the H tag's number (or 1 if from root level)
- `html` comes from section H tag contents
- `shas.html` is a sha of `html` appended to `id`
- `shas.indicator` is a sha of this topic's replies' shas appended to this topic's `html` (an easy way to know if any reply has changed)
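A minimal sketch of how the two topic shas could be computed. The hash algorithm (SHA-256), the exact concatenation order, and the names `sha256_hex` and `topic_shas` are assumptions made for illustration; the description above does not pin them down.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string (algorithm is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topic_shas(topic_id: int, html: str, reply_shas: list[str]) -> dict[str, str]:
    """Compute both topic hashes under the assumed concatenation order."""
    return {
        # Assumed reading of "`html` appended to `id`": id first, then html.
        "html": sha256_hex(f"{topic_id}{html}"),
        # Assumed reading of "replies shas appended to `html`": html first,
        # then every reply sha in order.
        "indicator": sha256_hex(html + "".join(reply_shas)),
    }
```

With this shape, a client can keep the last `shas.indicator` it saw for a topic and compare it with the new one: a changed reply sha changes the indicator while `shas.html` stays the same.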
**Replies:**
{F29470282}
Replies correspond, as best as can be determined, to individual messages within a topic. The primary challenge is teasing out replies from one another:
- `html` is the body of the reply, the boundaries of which are determined by a combined heuristic of message `depth` and user "signature" detection (user and user talk page links and timestamps are considered for this heuristic)
- `depth` corresponds to the level of indentation of the reply (as indicated by depth of nesting within depth-indicating tags, i.e. `DL`, `UL`, `OL`, or similar depth-indicating wiki markup, e.g. `:`)
- `sha` is sha of this reply's index appended to its `html`
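A rough sketch of the `depth` and `sha` fields for a single reply. It is deliberately simplified: depth is taken as the number of `DL`/`UL`/`OL` ancestors around the first text in the fragment, and signature detection is not modelled at all. The names `DepthProbe`, `reply_depth` and `reply_sha`, the use of SHA-256, and the concatenation order are assumptions, not the endpoint's actual code.

```python
import hashlib
from html.parser import HTMLParser

# Tags the description names as depth-indicating.
DEPTH_TAGS = {"dl", "ul", "ol"}


class DepthProbe(HTMLParser):
    """Records how many depth-indicating tags enclose the first text node."""

    def __init__(self) -> None:
        super().__init__()
        self.current = 0   # nesting level at the parser's current position
        self.depth = None  # set once, when the first non-blank text is seen

    def handle_starttag(self, tag, attrs):
        if tag in DEPTH_TAGS:
            self.current += 1

    def handle_endtag(self, tag):
        if tag in DEPTH_TAGS and self.current > 0:
            self.current -= 1

    def handle_data(self, data):
        if self.depth is None and data.strip():
            self.depth = self.current


def reply_depth(fragment: str) -> int:
    """Return the nesting depth of the first text in an HTML fragment."""
    probe = DepthProbe()
    probe.feed(fragment)
    probe.close()
    return probe.depth or 0


def reply_sha(index: int, html: str) -> str:
    """Hash a reply; assumed order is html first, then the reply's index."""
    return hashlib.sha256(f"{html}{index}".encode("utf-8")).hexdigest()
```

Including the index in the hash means two replies with identical bodies in the same topic still get distinct `sha` values.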
A subset of markup will be preserved:
- presently `B`, `I`, `A`, `SUP`, `SUB`, `UL`, `OL` and `LI` are preserved
- Other tags' content is converted to plain text.
- Certain tags will be converted to one of the preserved tags:
-- `BIG`, `CODE` and `DT` are converted to `B`
-- `IMG` are converted to `A` linking to the image
-- `DL` are converted to `UL`
-- `DD` are converted to `LI`
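The preservation and conversion rules above can be sketched as a small filter built on Python's standard `html.parser`. This is an illustration only: the class and function names are hypothetical, and several details are assumptions, namely that only `href` is kept on `A` tags, that the `IMG` link text is the image URL, and that attributes on other tags are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Tags kept as-is, and tags rewritten to a preserved tag (from the lists above).
PRESERVED = {"b", "i", "a", "sup", "sub", "ul", "ol", "li"}
CONVERTED = {"big": "b", "code": "b", "dt": "b", "dl": "ul", "dd": "li"}


class TalkMarkupFilter(HTMLParser):
    """Re-emits a fragment keeping only the preserved subset of markup."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Assumption: the link text is simply the image URL.
            src = dict(attrs).get("src") or ""
            self.out.append(f'<a href="{escape(src)}">{escape(src)}</a>')
            return
        target = CONVERTED.get(tag, tag)
        if target not in PRESERVED:
            return  # drop the tag itself; its text still arrives via handle_data
        if target == "a":
            # Assumption: only the href attribute survives.
            href = dict(attrs).get("href") or ""
            self.out.append(f'<a href="{escape(href)}">')
        else:
            self.out.append(f"<{target}>")

    def handle_endtag(self, tag):
        target = CONVERTED.get(tag, tag)
        if target in PRESERVED:
            self.out.append(f"</{target}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def simplify(fragment: str) -> str:
    """Return the fragment with non-preserved tags reduced to plain text."""
    parser = TalkMarkupFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For example, `simplify('<dl><dd><code>x</code> and <span>y</span></dd></dl>')` yields `<ul><li><b>x</b> and y</li></ul>`: the `DL`/`DD` pair becomes a list, `CODE` becomes `B`, and the `SPAN` is reduced to its text.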
-----
The left side of each image below shows an example from a complex user talk page.
The right side shows simple AJAX output (from [[ https://gerrit.wikimedia.org/r/#/c/mediawiki/services/mobileapps/+/509898/44/test/lib/talk/visualizer.html | visualizer.html ]]) of the WIP endpoint's data (as defined above) for the same part of the page.
Note the indentation of both `topics`, as outlined in blue, and `replies`, as outlined in red, at their correct respective depths. Also make note of replies (red outline) being correctly distinguished from one another:
{F29470313}
Note the preservation of indentation inside a reply, as seen in two places inside the first (outlined in red) reply:
{F29470322}
Tables are obviously not going to look the same when we're only preserving their text, but, aside from the lack of pie, the gist of the message is discernible:
{F29470330}
Topics from H3 (or greater) are correctly set to depth corresponding to the H tag - as seen on the second topic here (outlined in blue):
{F29470340}
`CODE` tag contents are converted to `B` so they stand out. The table is not great, but basically readable.
{F29470352}
The gist of this table is better preserved than in the earlier table screenshot:
{F29470370}
Both ordered and unordered lists in one topic:
{F29470381}