
Talkpage endpoint
Closed, Resolved, Public

Description

Endpoint to make it easier for apps to display talk pages.

The endpoint takes a talk page title (and optional revision id) and returns a structured representation of the talk page, in JSON, preserving only certain elements.

Payload is structured as replies within topics:

Topics:


Topics correspond to sections:

  • id comes from the section id
  • depth corresponds to section depth - i.e. the H tag's number (or 1 if from root level)
  • html comes from section H tag contents
  • shas.html is the SHA of the topic's html appended to its id
  • shas.indicator is the SHA of this topic's replies' SHAs appended to this topic's html (an easy way to tell whether any reply has changed)
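
As a rough illustration of the two topic hashes described above, here is a minimal Python sketch. The hash algorithm (SHA-256), the exact concatenation order, and the names sha and topic_shas are assumptions; the description only says which values are combined.

```python
import hashlib


def sha(text: str) -> str:
    """Hex digest of a string. SHA-256 is an assumption; the task does not name the algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topic_shas(topic_id: int, html: str, reply_shas: list[str]) -> dict[str, str]:
    """Build a hypothetical `shas` object for one topic."""
    return {
        # "html appended to id" is assumed here to mean sha(id + html).
        "html": sha(f"{topic_id}{html}"),
        # "replies shas appended to this topic's html" is assumed to mean
        # sha(html + all reply shas joined in order).
        "indicator": sha(html + "".join(reply_shas)),
    }


# Two snapshots of the same topic; only the second reply differs.
before = topic_shas(1, "Welcome", [sha("first reply"), sha("second reply")])
after = topic_shas(1, "Welcome", [sha("first reply"), sha("second reply, edited")])

title_unchanged = before["html"] == after["html"]
replies_changed = before["indicator"] != after["indicator"]
print(title_unchanged, replies_changed)
```

Under these assumptions a client only needs to compare shas.indicator between two fetches to decide whether a topic's replies need re-rendering.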

Replies:


Replies correspond, as best as can be determined, to individual messages within a topic. The primary challenge is teasing out replies from one another:

  • html is the body of the reply, the boundaries of which are determined by a combined heuristic of message depth and user "signature" detection (user and user talk page links and timestamps are considered for this heuristic)
  • depth corresponds to the level of indentation of the reply, as indicated by the depth of nesting within depth-indicating tags (i.e. DL, UL, OL) or by similar depth-indicating wiki markup (i.e. a leading ":")
  • sha is sha of this reply's index appended to its html
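
To make the depth and signature ideas above more concrete, here is a simplified Python sketch. It is not the endpoint's implementation: the endpoint works from nested HTML tags, whereas this counts wikitext indentation markers, and both regular expressions (an English-style UTC timestamp and a User:/User_talk: link) are assumed stand-ins for a heuristic the task does not spell out.

```python
import re

# Assumed patterns, for illustration only.
USER_LINK = re.compile(r'href="[^"]*(?:User|User_talk):[^"]+"')
TIMESTAMP = re.compile(r"\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)")


def wikitext_depth(line: str) -> int:
    """Count the leading indentation markers (':', '*', '#') on a wikitext line."""
    return len(line) - len(line.lstrip(":*#"))


def looks_signed(html: str) -> bool:
    """True if the fragment contains both a user (talk) link and a timestamp."""
    return bool(USER_LINK.search(html)) and bool(TIMESTAMP.search(html))


print(wikitext_depth("::A reply indented twice"))
print(looks_signed('<a href="./User:Example">Example</a> 21:24, 16 April 2019 (UTC)'))
```

A reply boundary could then be guessed wherever a signed fragment ends or the depth changes, which is the combined heuristic the list above describes.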

A subset of markup will be preserved:

  • Presently B, I, A, SUP, SUB, UL, OL and LI are preserved
  • Other tags' content is converted to plain text.
  • Certain tags will be converted to one of the preserved tags:
    • BIG, CODE and DT are converted to B
    • IMG are converted to A linking to the image
    • DL are converted to UL
    • DD are converted to LI
  • List item n-deep nesting is preserved (including with the tags above which are converted to lists)
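
The preservation and conversion rules above can be sketched with Python's standard html.parser. This is a minimal illustration, not the endpoint's code: the class and function names are hypothetical, only href is kept on links, and using the image source as the link text for IMG is an assumption.

```python
from html import escape
from html.parser import HTMLParser

PRESERVED = {"b", "i", "a", "sup", "sub", "ul", "ol", "li"}
CONVERTED = {"big": "b", "code": "b", "dt": "b", "dl": "ul", "dd": "li"}


class SubsetFilter(HTMLParser):
    """Keeps the preserved tags, converts a few others, flattens the rest to text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _map(self, tag):
        tag = CONVERTED.get(tag, tag)
        return tag if tag in PRESERVED else None

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # IMG becomes a link to the image; the link text is an assumption.
            src = escape(dict(attrs).get("src") or "", quote=True)
            self.out.append(f'<a href="{src}">{src}</a>')
            return
        mapped = self._map(tag)
        if mapped == "a":
            href = escape(dict(attrs).get("href") or "", quote=True)
            self.out.append(f'<a href="{href}">')
        elif mapped:
            self.out.append(f"<{mapped}>")

    def handle_endtag(self, tag):
        mapped = self._map(tag)
        if mapped:
            self.out.append(f"</{mapped}>")

    def handle_data(self, data):
        # Text of any tag, preserved or not, is kept and re-escaped.
        self.out.append(escape(data, quote=False))


def filter_html(html: str) -> str:
    parser = SubsetFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(filter_html("<dl><dd><code>x</code> and <span>y</span></dd></dl>"))
```

Because DL and DD map to UL and LI one-for-one, nested definition lists come out as nested unordered lists, which is how n-deep nesting survives in this sketch.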

The left sides of the images below show examples from a complex user talk page.
The right side shows simple ajax output [from visualizer.html] of the WIP endpoint's data (as defined above) for the same part of the page.

Note the indentation of both topics, as outlined in blue, and replies, as outlined in red, at their correct respective depths. Also make note of replies (red outline) being correctly distinguished from one another:

Note the preservation of indentation inside a reply, as seen in two places inside the first (outlined in red) reply:

Tables are obviously not going to look the same when we're only preserving their text, but, aside from the lack of pie, the gist of the message is discernible:

Topics from H3 (or greater) are correctly set to depth corresponding to the H tag - as seen on the second topic here (outlined in blue):

CODE tag contents are converted to B so they stand out. The table is not great, but basically readable.

The gist of this table is better preserved (than the earlier table screenshot):

Both ordered and unordered lists in one topic:

List item n-deep nesting is preserved:

Superscript, subscript, and DLs becoming ULs:

Details

Related Gerrit Patches:
mediawiki/services/mobileapps : master - Add talk page endpoint

Event Timeline

Mhurd created this task.Apr 16 2019, 9:24 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 16 2019, 9:24 PM

These were a couple of ideas we kicked around for a contract. Main point is to have both display text and unaltered text returned. The display text has already filtered out things like templates and images, and we append a reply to the unaltered text to send back to the server so we aren't overwriting with filtered data.

talk.json shows a nested parent/child relationship (though not necessarily needed for display as long as we have depth)
talk2.json is more of a flat structure
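
For comparison, a flat structure with depth can be turned into the nested parent/child form on the client. The Python sketch below assumes hypothetical field names (text, depth, children); the actual contents of talk.json and talk2.json are not shown in this task.

```python
def nest_by_depth(flat):
    """Turn a flat, display-ordered list of items with `depth` into a tree."""
    root = {"depth": -1, "children": []}
    stack = [root]  # path from the root to the most recently added node
    for item in flat:
        node = {**item, "children": []}
        # Pop until the top of the stack is shallower than this node.
        while stack[-1]["depth"] >= node["depth"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


flat = [
    {"text": "a", "depth": 0},
    {"text": "b", "depth": 1},
    {"text": "c", "depth": 2},
    {"text": "d", "depth": 1},
    {"text": "e", "depth": 0},
]
tree = nest_by_depth(flat)
print([node["text"] for node in tree])
```

Since the tree is recoverable from order plus depth, the flat form loses nothing for display purposes, which matches the note that nesting is not strictly needed.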

JoeWalsh updated the task description. (Show Details)Apr 17 2019, 4:58 PM
phuedx added a subscriber: phuedx.Apr 18 2019, 9:12 AM

A few requests for the spec:

  • should be possible to request talk topics for a specific revision. When I post a new talk topic, I should get the new revision id and be able to request the talk topics for that new revision so I can refresh the UI.
  • should autosign write operations (making sure not to double sign) - we have code doing this in MobileFrontend and would love to move this to the server.
LGoto triaged this task as Medium priority.Apr 23 2019, 8:25 PM
Mhurd added a comment.May 3 2019, 6:11 PM

@Jdlrobson This endpoint doesn't post/write.

Mhurd added subscribers: cmadeo, JMinor.EditedMay 3 2019, 6:34 PM

Questions for @JMinor & @cmadeo:

Should we preserve superscript & subscript tags? (in addition to bold, italic, anchor & list items):

For code tag, which normally appears as a monospace font with a light gray background, should we convert the code text to bold so it stands out?:

Tables don't look especially good as text. Roll with it for now?:

cmadeo added a comment.May 6 2019, 5:43 PM

@Mhurd except for tables, how hard would it be to support superscript and subscript tags as well as code blocks?
I think if it's easy to do, these would all be great to have but for me they're not top priority.

Mhurd added a comment.May 6 2019, 6:20 PM

@cmadeo on the endpoint side, for superscript/subscript, I don't think it would be too tough to preserve them. I'm unsure, though, how tricky the native side handling of this would be... hopefully not too bad... iirc attributed strings have a baselineOffset property which may help.

For the code block, similarly, the endpoint side isn't too bad, but we'd need to decide whether the native presentation simply bolds or italicizes such blocks or if we want to get fancier and actually set the attributed string paragraph styling to mimic the html look...

Mhurd added a comment.EditedMay 10 2019, 2:17 AM

WIP on my fork:

https://github.com/wikimedia/mediawiki-services-mobileapps/compare/master...montehurd:talk

(includes a couple temp .on-save files and one custom run command in the package.json that I'll remove when done)

Mhurd updated the task description. (Show Details)May 11 2019, 1:56 AM

Change 509898 had a related patch set uploaded (by Mhurd; owner: Mhurd):
[mediawiki/services/mobileapps@master] Initial work for endpoint to deliver structured user talk page data to apps.

https://gerrit.wikimedia.org/r/509898

Mhurd updated the task description. (Show Details)May 13 2019, 5:38 PM
Mhurd updated the task description. (Show Details)May 14 2019, 6:20 PM
Mhurd updated the task description. (Show Details)May 14 2019, 6:23 PM
Mhurd updated the task description. (Show Details)Jun 12 2019, 2:59 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:05 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:11 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:18 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:22 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:24 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:27 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:30 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:33 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:36 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:42 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:44 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:47 AM
Mhurd updated the task description. (Show Details)Jun 12 2019, 3:50 AM
Mhurd added a subscriber: bearND.Jun 12 2019, 4:18 AM

Just a heads-up, per @bearND after the endpoint is merged (at which point it's immediately available in staging here http://appservice.wmflabs.org/en.wikipedia.org/v1/page/talk/User_talk:Brion_VIBBER) they still have a couple (relatively minor) things on their end that need to happen before it will appear in production restbase:

  • write the Swagger spec
  • some restbase code to expose the endpoint
Mhurd updated the task description. (Show Details)Jun 12 2019, 4:20 AM
Mhurd added a comment.EditedJun 12 2019, 4:46 AM

@bearND

Regarding future use of the endpoint for article talk pages, things look... pretty ok? I found and fixed one bug, but otherwise, with the exception of the giant templates at the top of some articles, things look pretty much as expected:

  • Talk:cat

  • Giant template at top of Talk:cat (excluding these would be super easy, but the current app design is partially collapsing the first section iirc, so may not be a big deal?):

  • Talk:dog

  • Talk:horse

Mhurd updated the task description. (Show Details)Jun 12 2019, 4:50 AM

Change 509898 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add talk page endpoint

https://gerrit.wikimedia.org/r/509898

Mhurd added a comment.Jun 12 2019, 9:01 PM

Now live at staging! http://appservice.wmflabs.org/en.wikipedia.org/v1/page/talk/User_talk:Brion_VIBBER

Moved to "Waiting for Build" awaiting the R.I. tasks I mentioned here https://phabricator.wikimedia.org/T221148#5252175. These need to happen before it will appear in live restbase.
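
For anyone wanting to try the staging endpoint from a script, here is a small Python sketch that builds request URLs in the shape of the staging link above. The helper name talk_url is hypothetical, and passing the optional revision id as a trailing path segment is an assumption; this task does not show the revision form of the URL.

```python
from urllib.parse import quote

STAGING_BASE = "http://appservice.wmflabs.org"


def talk_url(domain, title, revision=None):
    """Build a talk endpoint URL for a page title and an optional revision id."""
    # Keep ':' readable (as in the staging link); other reserved characters are encoded.
    url = f"{STAGING_BASE}/{domain}/v1/page/talk/{quote(title, safe=':')}"
    if revision is not None:
        # Assumption: the revision id follows the title as its own path segment.
        url += f"/{revision}"
    return url


print(talk_url("en.wikipedia.org", "User_talk:Brion_VIBBER"))
```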

JMinor removed Mhurd as the assignee of this task.Jun 28 2019, 4:47 PM
JMinor raised the priority of this task from Medium to High.
JMinor closed this task as Resolved.Jul 10 2019, 3:56 PM
JMinor claimed this task.

Well done and nicely documented!