Endpoint to make it easier for apps to display talk pages.
The endpoint takes a talk page title (and optional revision id) and returns a structured representation of the talk page, in JSON, preserving only certain elements.
Payload is structured as replies within topics:
Topics correspond to sections:
- id comes from the section id
- depth corresponds to section depth - i.e. the H tag's number (or 1 if from root level)
- html comes from section H tag contents
- shas.html is a sha of the topic's html appended to its id
- shas.indicator is a sha of this topic's replies' shas appended to this topic's html (an easy way to know whether any reply has changed)
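
As a rough illustration of the topic shas described above, here is a minimal Python sketch. It is not the endpoint's actual code: the hash algorithm (SHA-256), the plain string concatenation, and the names `sha` and `topic_shas` are all assumptions made for the example.

```python
import hashlib


def sha(text: str) -> str:
    # Assumption: SHA-256 over UTF-8 bytes; the description only says "sha".
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topic_shas(topic_id: int, html: str, reply_shas: list[str]) -> dict[str, str]:
    """Hypothetical helper mirroring the two topic-level shas."""
    return {
        # html appended to id: the id comes first, then the heading html.
        "html": sha(f"{topic_id}{html}"),
        # replies' shas appended to html: changes whenever any reply sha changes.
        "indicator": sha(html + "".join(reply_shas)),
    }
```

With this shape, a client could cache the `indicator` value per topic and re-render a topic only when that value differs from the cached one.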
Replies correspond, as best as can be determined, to individual messages within a topic. The primary challenge is teasing out replies from one another:
- html is the body of the reply, the boundaries of which are determined by a combined heuristic of message depth and user "signature" detection (user and user talk page links and timestamps are considered for this heuristic)
- depth corresponds to the level of indentation of the reply (as indicated by depth of nesting within depth-indicating tags, i.e. DL, UL, OL, or similar depth-indicating wiki markup, i.e. :)
- sha is sha of this reply's index appended to its html
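
The reply-level sha and the indentation depth can be sketched in Python as follows. This is only an illustration under stated assumptions: SHA-256, the concatenation order, and the function names `reply_sha` and `wikitext_depth` are hypothetical, and counting leading `:` characters is a simplification of the nesting-based depth described above.

```python
import hashlib


def reply_sha(index: int, html: str) -> str:
    # Assumption: the reply's index is appended to its html before hashing,
    # so two identical replies at different positions get different shas.
    return hashlib.sha256(f"{html}{index}".encode("utf-8")).hexdigest()


def wikitext_depth(line: str) -> int:
    # Simplified depth: the number of leading ':' indentation characters.
    return len(line) - len(line.lstrip(":"))
```

For example, `wikitext_depth("::Thanks!")` gives 2, while an unindented line gives 0.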
A subset of markup will be preserved:
- presently B, I, A, SUP, SUB, UL, OL and LI are preserved
- Other tags' content is converted to plain text.
- Certain tags will be converted to one of the preserved tags:
  - BIG, CODE and DT are converted to B
  - IMG is converted to A linking to the image
  - DL is converted to UL
  - DD is converted to LI
- List item n-deep nesting is preserved (including with the tags above which are converted to lists)
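
The markup rules above can be approximated with Python's standard `html.parser`. This is a sketch rather than the endpoint's implementation: the class name `TalkMarkupFilter`, the function `simplify`, and the choice to use the image `src` as the link text are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

PRESERVED = {"b", "i", "a", "sup", "sub", "ul", "ol", "li"}
CONVERTED = {"big": "b", "code": "b", "dt": "b", "dl": "ul", "dd": "li"}


class TalkMarkupFilter(HTMLParser):
    """Hypothetical filter: keeps preserved tags, converts some, flattens the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _map(self, tag):
        # Apply the conversion table, then keep the tag only if it is preserved.
        tag = CONVERTED.get(tag, tag)
        return tag if tag in PRESERVED else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # IMG becomes a link to the image (link text is an assumption).
            src = escape(attrs.get("src") or "", quote=True)
            self.out.append(f'<a href="{src}">{src}</a>')
            return
        mapped = self._map(tag)
        if mapped == "a":
            href = escape(attrs.get("href") or "", quote=True)
            self.out.append(f'<a href="{href}">')
        elif mapped:
            self.out.append(f"<{mapped}>")

    def handle_endtag(self, tag):
        mapped = self._map(tag)
        if mapped:
            self.out.append(f"</{mapped}>")

    def handle_data(self, data):
        # Text of dropped tags survives as plain text.
        self.out.append(escape(data, quote=False))


def simplify(html: str) -> str:
    parser = TalkMarkupFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because DL and DD are mapped to UL and LI on both open and close, nested definition lists come out as nested unordered lists, which keeps n-deep list nesting intact.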
The left side of each image below shows an example from a complex user talk page.
The right side shows simple ajax output [from visualizer.html] of the WIP endpoint's data (as defined above) for the same part of the page.
Note the indentation of both topics, as outlined in blue, and replies, as outlined in red, at their correct respective depths. Also make note of replies (red outline) being correctly distinguished from one another:
Note the preservation of indentation inside a reply, as seen in two places inside the first (outlined in red) reply:
Tables are obviously not going to look the same when we're only preserving their text, but, aside from the lack of pie, the gist of the message is discernible:
Topics from H3 (or greater) are correctly set to depth corresponding to the H tag - as seen on the second topic here (outlined in blue):
CODE tag contents are converted to B so they stand out. The table is not great, but basically readable.
The gist of this table is better preserved (than the earlier table screenshot):
Both ordered and unordered lists in one topic:
List item n-deep nesting is preserved:
Superscript, subscript, and DLs becoming ULs: