Page MenuHomePhabricator

Investigate technical needs for native mobile diffs, including supporting APIs
Open, NormalPublicSpike

Description

[to be updated]

Event Timeline

LGoto created this task.Tue, Jul 23, 6:48 PM
Restricted Application changed the subtype of this task from "Task" to "Spike". · View Herald TranscriptTue, Jul 23, 6:48 PM
JMinor renamed this task from Compare revision spike to Investigate technical needs for native mobile diffs, including supporting APIs.Tue, Jul 23, 9:24 PM
JMinor triaged this task as Normal priority.
JMinor added subscribers: JoeWalsh, Tsevener, NHarateh_WMF.

Some API notes so far:

The MediaWiki API we have to work with is https://www.mediawiki.org/wiki/API:Compare. This returns changes formatted inside an html blob with table rows and cell tags. This formatting is added in MediaWiki's TableDiffFormatter class. It looks like mobile web overrides this in it's extension with InlineDiffFormatter & InlineDifferenceEngine. I can't find an API endpoint support for inline however.

One other thing I've found is MediaWiki's ArrayDiffFormatter. This seems like it could spit out diff changes in a more structured format for us, but I don't think it's accessible from an API endpoint.

We could make an MCS endpoint that strips table formatting from the compare API response, but I think better would be to see if ArrayDiffFormatter can give us what we need and expose that. Also note there's a UnifiedDiffFormatter too.

First pass at a structured contract. This is based on the changes here.

Single-column mobile view:

also mocked up a double-column option in case we decide to take it that far:

I chatted with @JoeWalsh and we discussed a few things related to this:

  • There is no structured diff data API because of performance reasons. The diffing modules used in production are C++ code that generates HTML directly. Fun times.
  • I'm not sure if Special:MobileDiff has an API, I would assume it doesn't (but ping @phuedx). In any case, we can get the HTML using the fun apioutput skin: https://en.wikipedia.org/wiki/Special:MobileDiff/910663282?useskin=apioutput
    • For any 2 random range of revisions on inline diffs? To be investigated.
  • There is Visual_diffs that VE did. It is also available as a beta feature on desktop that you can enable on preferences. The way it works is by fetching two parsoid HTML revisions and computing the inline diff on the client itself. I'm assuming with JS. It may be worth going asking the Editing-team if the visual diffs feature is self contained enough that it could be API-ified or bundled in the App as a JS library to run on a webview (or maybe as an API like page/mobile-html). Ping @marcella

@Jhernandez thanks for your notes! I'm attaching my finalized contract anyways just for documentation purposes but understood that it won't work with the C++ code.

I looked at the Special:MobileDiff apioutput skin and I think I can glean most of what I need from it. The two things that I don't have are nice-to-have's in this order of importance:

  • A section title (or ID) for each change. This is to display to the user what section the change occurred in, in case the two context lines on either side do not show the section.
  • Extra context chunks to display to the user in a collapsible/expandable format. I figure this is the bigger ask but if it seems like something that's possible let me know and I can get more details on how many lines would be needed. The expectation is not to let the user view the whole article but just provide some extra context lines beyond the default 2.

I'm not sure if VE's visual diffs will work - the ask is to show a wikitext/markup diff rather than a diff of the rendered result.

It does seem like I'm able to get a range of revisions on inline diffs - I'm using this as an example (there's 1 revision in between): https://test.m.wikipedia.org/wiki/Special:MobileDiff/375751...399929?useskin=apioutput

  • There is no structured diff data API because of performance reasons. The diffing modules used in production are C++ code that generates HTML directly. Fun times.

Perhaps we could request that wikidiff2 be modified to output a data structure representing the diff and/or the HTML representing the diff, i.e. extract the HTML outputting into a Formatter class.

Unfortunately, I can find no such API.

Unfortunately, I can find no such API.

Fortunately, I was wrong!

I believe that the compare API will return a diff for a single revision or a revision range, e.g. using the revisions used in comments above):

Note well that the API outputs HTML.

Perhaps we could request that wikidiff2 be modified to output a data structure representing the diff and/or the HTML representing the diff

Data structure representing the diff would be preferable to HTML so we aren't parsing through HTML (or both but, seems like HTML is already covered in the compare API).

After chatting with @pheudx sounds like there's no guarantee that Special:MobileDiff will remain in place and unchanged so we shouldn't depend on it. Assuming we can't get a structured diff from anywhere, is it possible to get these nice-to-have's out of the compare API? Also tagging @Mholloway @Eevans @WDoranWMF in case they have thoughts.

  • A section title/ID for each change. This is to display to the user what section the change occurred in, in case the two context lines on either side do not show the section.
  • Extra context chunks to display to the user in a collapsible/expandable format. I figure this is the bigger ask but if it seems like something that's possible let me know and I can get more details on how many lines would be needed. The expectation is not to let the user view the whole article but just provide some extra context lines beyond the default 2.

@Jdlrobson also tagging Jon - it was pointed out to me that the work in https://phabricator.wikimedia.org/T117279 could help us here. We are working on supporting inline diffs in iOS - it looks like this would save us a step with pulling the separated diff from API:Compare and merging them to be inline. Is there any ETA on when https://phabricator.wikimedia.org/T117279 might make it into prod?

@Tsevener no plan right now. I'd love to work on this, but it's not on our product roadmap so far. Without that, this is reliant on me finding a partner in crime to help me during 10% time (with code review and pushing towards this goal) this might happen sooner. I'd also happily help out if there was a cross team initiative.

“if the visual diffs feature is self contained enough that it could be API-ified or bundled in the App as a JS library to run on a webview (or maybe as an API like page/mobile-html)”

The answer to that is the thing I’m supposed to be checking into re: visual diff performance on T229467. If it’s in a place where it can be put in a web worker then the answer is fairly-inherently “yes”.

My vote would be to use Visual Diffs if it can easily be made to work for you, and otherwise to look seriously at updating wikidiff2 as @phuedx suggested to break HTML styling out into a Formatter class. Regarding the latter, I would much rather see us invest our effort in modernizing the core platform than paper over its shortcomings with another Node.js service.

T117279 looks interesting as well and seems to have a lot of support. Without a deeper dive I'm not sure how that would be helped or hindered by updating wikidiff2. In general I'd prefer to see issues fixed as far upstream as possible.

Come to think of it, I vaguely remember @jlinehan poking around in wikidiff2 for some reason a little while back, so let's ping him for comment, too. :)

Just as a reminder, Toni mentioned:

I'm not sure if VE's visual diffs will work - the ask is to show a wikitext/markup diff rather than a diff of the rendered result.

So it seems like visual diffs are out for the moment.

The lack of a wikitext structured data diffs API is definitely a core platform deficiency. I'm uncertain as to how to proceed given none of the options are particularly great, and going in to a "make diffs better project" is a big project with lots of ramifications. I'm pinging @WDoranWMF just as a FYI, this is a need from the product teams and a gap on the core platform's APIs, maybe you can surface it to the PMs in your team.

Tsevener added a comment.EditedMon, Aug 19, 2:32 PM

@Mholloway @Jhernandez I have been exploring adding the wikidiff2 c++ engine in our Xcode project and it seems to be working well so far. It's not the best long-term solution but it does allow me to change the formatting without cross-platform ramifications, and I have access to the inline diff. I'm no C++ developer but if we decide to continue down this path maybe I could submit a PR to wikidiff2 of my tweaks and someone could take it from there? It's essentially a (hopefully temporary) fork local on the device so, another not-particularly-great solution.

@Tsevener are you tweaking it to output different HTML or to output structured data?

We did a spike some time ago, and found it non-trivial to make those diff engines return structured data.

@Jhernandez well, not really structured. I just replaced the html with json formatting but it's all still being passed around through the same execute method that returns a String. App-side I'm able to instantiate an object from the string returned through a json serializer. Here's example from my InlineDiffJSON class vs InlineDiff.

InlineDiff:

void InlineDiff::printContext(const String & input)
{
	printWrappedLine("<div class=\"mw-diff-inline-context\">", input, "</div>\n");
}

InlineDiffJSON:

void InlineDiffJSON::printContext(const String & input)
{
    if (!noResults)
        result += ",";
    printWrappedLine("{\"type\": \"context\", \"text\": ", "\"" + escape_json(input) + "\"", ", \"highlight-ranges\": []}");
    noResults = false;
}

If the input contained a \" it corrupted the json hence the need for an escape_json method. Anyways it probably won't fly for an API, I was going for the bare minimum changes in the gateway classes to the diff engine.