
Talkpage endpoint
Closed, Resolved, Public

Description

Endpoint to make it easier for apps to display talk pages.

The endpoint takes a talk page title (and optional revision id) and returns a structured representation of the talk page, in JSON, preserving only certain elements.
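As a rough illustration of the request shape, the sketch below builds a URL for the endpoint. The host and path prefix are copied from the staging URL quoted further down in this task. Passing the revision id as an optional trailing path segment is an assumption, not something this task confirms, and `talk_page_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import quote

# Staging host as quoted in this task; it may no longer be reachable.
BASE = "http://appservice.wmflabs.org"


def talk_page_url(domain: str, title: str, revision: Optional[int] = None) -> str:
    """Build a talk page endpoint URL (hypothetical helper)."""
    # Titles use underscores for spaces; keep ":" readable, percent-encode "/".
    path = quote(title.replace(" ", "_"), safe=":")
    url = f"{BASE}/{domain}/v1/page/talk/{path}"
    if revision is not None:
        # Assumption: the optional revision id is a trailing path segment.
        url += f"/{revision}"
    return url
```

For example, `talk_page_url("en.wikipedia.org", "User talk:Brion VIBBER")` reproduces the staging URL quoted later in this task.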

Payload is structured as replies within topics:

Topics:

Screen Shot 2019-06-11 at 7.59.16 PM.png (656×1 px, 100 KB)

Topics correspond to sections:

  • id comes from the section id
  • depth corresponds to section depth - i.e. the H tag's number (or 1 if from root level)
  • html comes from section H tag contents
  • shas.html is a SHA hash computed over the topic's html appended to its id
  • shas.indicator is a SHA hash computed over this topic's reply shas appended to this topic's html (an easy way to tell whether any reply has changed)
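The two topic hashes described above could be computed roughly as follows. This is a minimal sketch under stated assumptions: the task only says "sha", so SHA-256 is a guess, and the plain string concatenation (no separator) and the function names `sha` and `topic_shas` are illustrative rather than taken from the patch.

```python
import hashlib
from typing import Dict, List


def sha(text: str) -> str:
    # Assumption: SHA-256 hex digest; the task does not name the algorithm.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topic_shas(topic_id: int, html: str, reply_shas: List[str]) -> Dict[str, str]:
    """Compute the shas object for one topic (hypothetical helper)."""
    return {
        # "html appended to id": the id first, then the heading html.
        "html": sha(f"{topic_id}{html}"),
        # "reply shas appended to html": the heading html, then every reply sha.
        "indicator": sha(html + "".join(reply_shas)),
    }
```

A client that cached the previous `indicator` value can compare it with the new one: if they match, neither the heading nor any reply in that topic changed, so the topic does not need to be re-rendered.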

Replies:

Screen Shot 2019-06-11 at 8.06.47 PM.png (334×1 px, 129 KB)

Replies correspond, as best as can be determined, to individual messages within a topic. The primary challenge is teasing out replies from one another:

  • html is the body of the reply, the boundaries of which are determined by a combined heuristic of message depth and user "signature" detection (user and user talk page links and timestamps are considered for this heuristic)
  • depth corresponds to the level of indentation of the reply, as indicated by the depth of nesting within depth-indicating tags (i.e. DL, UL, OL) or similar depth-indicating wiki markup (e.g. a leading ":")
  • sha is a SHA hash of this reply's index appended to its html
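The depth and sha rules for replies can be sketched as below. This is not the endpoint's code: it only shows one way to count nesting inside DL, UL, and OL tags, one way to count leading ":" characters in wikitext, and one reading of the reply sha. SHA-256 and the helper names (`text_depths`, `wikitext_depth`, `reply_sha`) are assumptions, and the signature-detection half of the heuristic is not covered.

```python
import hashlib
from html.parser import HTMLParser
from typing import List, Tuple

DEPTH_TAGS = {"dl", "ul", "ol"}


class DepthTracker(HTMLParser):
    """Record each text chunk with the number of open DL/UL/OL ancestors."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.chunks: List[Tuple[int, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in DEPTH_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in DEPTH_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.depth, text))


def text_depths(html: str) -> List[Tuple[int, str]]:
    tracker = DepthTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.chunks


def wikitext_depth(line: str) -> int:
    # Each leading ":" in wikitext indents the reply one level.
    return len(line) - len(line.lstrip(":"))


def reply_sha(index: int, html: str) -> str:
    # Assumption: SHA-256 over the reply html followed by its index.
    return hashlib.sha256(f"{html}{index}".encode("utf-8")).hexdigest()
```

Because the index is part of the hashed string, two replies with identical bodies at different positions still get different sha values.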

A subset of markup will be preserved:

  • Presently B, I, A, SUP, SUB, UL, OL and LI are preserved.
  • Other tags' content is converted to plain text.
  • Certain tags will be converted to one of the preserved tags:
    • BIG, CODE and DT are converted to B
    • IMG are converted to A linking to the image
    • DL are converted to UL
    • DD are converted to LI
  • List item n-deep nesting is preserved (including with the tags above which are converted to lists)
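The preservation and conversion rules above can be expressed as a small tag filter. The sketch below uses Python's standard `html.parser` purely for illustration; it is not the endpoint's implementation. Using the image `src` as both the link target and the link text, and dropping every attribute except `href`, are assumptions. `TalkMarkupFilter` and `filter_markup` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

PRESERVED = {"b", "i", "a", "sup", "sub", "ul", "ol", "li"}
CONVERTED = {"big": "b", "code": "b", "dt": "b", "dl": "ul", "dd": "li"}


class TalkMarkupFilter(HTMLParser):
    """Keep the preserved tags, map converted tags, reduce the rest to text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _map(tag):
        # Returns the output tag name, or None if the tag itself is dropped.
        return tag if tag in PRESERVED else CONVERTED.get(tag)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Assumption: link to the image src and use the src as link text.
            src = escape(dict(attrs).get("src") or "")
            self.out.append(f'<a href="{src}">{src}</a>')
            return
        mapped = self._map(tag)
        if mapped == "a":
            href = escape(dict(attrs).get("href") or "")
            self.out.append(f'<a href="{href}">')
        elif mapped:
            self.out.append(f"<{mapped}>")

    def handle_endtag(self, tag):
        mapped = self._map(tag)
        if mapped:
            self.out.append(f"</{mapped}>")

    def handle_data(self, data):
        # Text is always kept, even when its enclosing tag is dropped.
        self.out.append(escape(data, quote=False))


def filter_markup(html: str) -> str:
    parser = TalkMarkupFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Since DL and DD map to UL and LI one-for-one, nested definition lists come out as equally nested unordered lists, which matches the nesting-preservation bullet above.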

The left side of each image below is an example from a complex user talk page.
The right side shows simple ajax output [from visualizer.html] of the WIP endpoint's data (as defined above) for the same part of the page.

Note the indentation of both topics, as outlined in blue, and replies, as outlined in red, at their correct respective depths. Also make note of replies (red outline) being correctly distinguished from one another:

Screen Shot 2019-06-11 at 8.18.15 PM.png (1×1 px, 661 KB)

Note the preservation of indentation inside a reply, as seen in two places inside the first (outlined in red) reply:

Screen Shot 2019-06-11 at 8.19.57 PM.png (1×1 px, 762 KB)

Tables are obviously not going to look the same when we're only preserving their text, but, aside from the lack of pie, the gist of the message is discernible:

Screen Shot 2019-06-11 at 8.21.41 PM.png (1×1 px, 746 KB)

Topics from H3 (or greater) are correctly set to depth corresponding to the H tag - as seen on the second topic here (outlined in blue):

Screen Shot 2019-06-11 at 8.23.03 PM.png (1×1 px, 724 KB)

CODE tag contents are converted to B so they stand out. The table is not great, but basically readable.

Screen Shot 2019-06-11 at 8.26.56 PM.png (907×1 px, 409 KB)

The gist of this table is better preserved (than the earlier table screenshot):

Screen Shot 2019-06-11 at 8.29.58 PM.png (908×1 px, 393 KB)

Both ordered and unordered lists in one topic:

Screen Shot 2019-06-11 at 8.31.41 PM.png (908×1 px, 407 KB)

List item n-deep nesting is preserved:

Screen Shot 2019-06-11 at 8.46.46 PM.png (1×1 px, 523 KB)

Superscript and subscript are preserved, and DLs become ULs:

Screen Shot 2019-06-11 at 8.48.41 PM.png (791×1 px, 273 KB)

Event Timeline

These were a couple of ideas we kicked around for a contract. Main point is to have both display text and unaltered text returned. The display text has already filtered out things like templates and images, and we append a reply to the unaltered text to send back to the server so we aren't overwriting with filtered data.

talk.json shows a nested parent/child relationship (though not necessarily needed for display as long as we have depth)
talk2.json is more of a flat structure

A few requests for the spec:

  • should be possible to request talk topics for a specific revision. When I post a new talk topic I should get the new revision id, and I should be able to request the talk topics for that new revision so I can refresh the UI.
  • should autosign write operations (making sure not to double sign) - we have code doing this in MobileFrontend and would love to move this to the server.
LGoto triaged this task as Medium priority. Apr 23 2019, 8:25 PM

@Jdlrobson This endpoint doesn't post/write.

Questions for @JMinor & @cmadeo:

Should we preserve superscript & subscript tags? (in addition to bold, italic, anchor & list items):

Screen Shot 2019-05-03 at 11.05.17 AM.png (534×3 px, 299 KB)

For the code tag, which normally appears in a monospace font with a light gray background, should we convert the code text to bold so it stands out?:

Screen Shot 2019-05-03 at 11.14.42 AM.png (582×3 px, 421 KB)

Tables don't look especially good as text. Roll with it for now?:

Screen Shot 2019-05-03 at 11.04.47 AM.png (1×3 px, 597 KB)

@Mhurd except for tables, how hard would it be to support superscript and subscript tags as well as code blocks?
I think if it's easy to do, these would all be great to have but for me they're not top priority.

@cmadeo on the endpoint side, for superscript/subscript, I don't think it would be too tough to preserve them. I'm unsure how tricky the native side handling of this would be though... hopefully not too bad... iirc attributed strings have a baselineOffset property which may help.

For the code block, similarly, the endpoint side isn't too bad, but we'd need to decide whether the native presentation simply bolds or italicizes such blocks or if we want to get fancier and actually set the attributed string paragraph styling to mimic the html look...

WIP on my fork:

https://github.com/wikimedia/mediawiki-services-mobileapps/compare/master...montehurd:talk

(includes a couple temp .on-save files and one custom run command in the package.json that I'll remove when done)

Change 509898 had a related patch set uploaded (by Mhurd; owner: Mhurd):
[mediawiki/services/mobileapps@master] Initial work for endpoint to deliver structured user talk page data to apps.

https://gerrit.wikimedia.org/r/509898

Mhurd updated the task description.

Just a heads-up: per @bearND, once the endpoint is merged it is immediately available in staging (http://appservice.wmflabs.org/en.wikipedia.org/v1/page/talk/User_talk:Brion_VIBBER), but a couple of (relatively minor) things still need to happen on their end before it will appear in production restbase:

  • write the swagger spec
  • some restbase code to expose the endpoint
Mhurd updated the task description.

@bearND

Regarding future use of the endpoint for article talk pages, things look... pretty ok? I found and fixed one bug, but otherwise, with the exception of the giant templates at the top of some articles, things look pretty much as expected:

  • Talk:cat

Screen Shot 2019-06-11 at 9.42.51 PM.png (1×1 px, 1 MB)

  • Giant template at top of Talk:cat (excluding these would be super easy, but the current app design is partially collapsing the first section iirc, so may not be a big deal?):

Screen Shot 2019-06-11 at 9.42.35 PM.png (1×1 px, 798 KB)

  • Talk:dog

Screen Shot 2019-06-11 at 9.43.39 PM.png (1×1 px, 867 KB)

  • Talk:horse

Screen Shot 2019-06-11 at 10.00.11 PM.png (1×1 px, 663 KB)

Mhurd updated the task description.

Change 509898 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add talk page endpoint

https://gerrit.wikimedia.org/r/509898

Now live at staging! http://appservice.wmflabs.org/en.wikipedia.org/v1/page/talk/User_talk:Brion_VIBBER

Moved to "Waiting for Build" awaiting the R.I. tasks I mentioned here (https://phabricator.wikimedia.org/T221148#5252175). These need to happen before it will appear in live restbase.

JMinor raised the priority of this task from Medium to High.
JMinor claimed this task.

Well done and nicely documented!