Create an output API for Earwig's Copyvio Detector Tool
Closed, Resolved | Public | 8 Estimated Story Points

Description

We want the new Tool Labs interface for EranBot to be able to dynamically retrieve the comparison from Earwig's Copyvio Detector Tool, but this requires an output API on Earwig's side. The API should accept a page title and a single URL and return all the info needed to construct a comparison showing the matching text between the WP article and the possible plagiarism source.

For now, the API should probably just return HTML for the diff, rather than trying to come up with some sort of JSON abstraction.

Event Timeline

DannyH moved this task from New & TBD Tickets to Up Next (June 3-21) on the Community-Tech board.
DannyH set the point value for this task to 8.

Caveat #1: "There is currently no way to get the contents of the article or suspected source, nor can you get the data behind the visual comparison available from the main tool. This may be changed in a future version if there is sufficient demand for it."

@Earwig, do you have any thoughts on how to go about implementing this?

Probably not too crazy, but it depends on the way you want the results presented.

Either way, it's kind of hard to think about this until T125459 is dealt with...

In our sprint meeting today, Niharika asked if we should try using a third-party library for generating the comparisons, instead of building an API on Earwig's side.

She's investigating:
https://packagist.org/packages/adaptive/php-text-difference

@Earwig: Now that the API stuff is resolved, any more thoughts on this? Is this something that you might be interested in working on or would it be better for us to work on it (with your input)?

I can do the implementation, but it would be helpful to get some suggestions for the output format.

Basically we want the API to return all the HTML that is currently in the 2 cv-chain-detail divs. One should be marked as the article (in the API data scheme) and the other should be marked as the source. We can then reproduce the CSS on our end to style the HTML. You don't need to worry about abstracting the output content (other than splitting it into article and source). Let's keep it simple.
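
To make the request above concrete, here is a rough sketch of what the response could look like. It is only an illustration, not how the tool is actually implemented: the function name `build_comparison_response`, the JSON envelope, and the key names `article` and `source` are all assumptions. The HTML fragments are passed through untouched, so there is no abstraction of the diff itself.

```python
import json


def build_comparison_response(article_html: str, source_html: str) -> str:
    """Wrap the two HTML fragments in a small JSON envelope.

    Hypothetical helper: the key names "article" and "source" are
    assumptions, chosen only to mark which fragment is which.
    """
    payload = {
        "article": article_html,  # contents of the article's cv-chain-detail div
        "source": source_html,    # contents of the source's cv-chain-detail div
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder fragments; the real markup would come from the tool.
    print(build_comparison_response("<p>article text</p>", "<p>source text</p>"))
```

The client would then decode the JSON, drop each fragment into its own container, and apply its own copy of the CSS.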

@Earwig, do you think it's worthwhile to add CORS support? We can get by using a different browser than our normal one and disabling web security, but that's obviously not ideal :)
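
For reference, CORS support on the server side mostly comes down to sending an `Access-Control-Allow-Origin` header for origins the API trusts. A minimal sketch, assuming an allowlist approach: the helper name `cors_headers` and the origin `https://example.org` are placeholders, not anything taken from the actual tool.

```python
from typing import List, Optional, Tuple

# Hypothetical allowlist; a real deployment would list the origin of the
# interface that calls the API.
ALLOWED_ORIGINS = {"https://example.org"}


def cors_headers(origin: Optional[str]) -> List[Tuple[str, str]]:
    """Return the extra response headers for a request's Origin header.

    An allowed origin is echoed back so the browser lets the calling page
    read the response. "Vary: Origin" tells caches that the response
    depends on the request's Origin. Other origins get no CORS headers.
    """
    if origin in ALLOWED_ORIGINS:
        return [
            ("Access-Control-Allow-Origin", origin),
            ("Vary", "Origin"),
        ]
    return []


if __name__ == "__main__":
    print(cors_headers("https://example.org"))
    print(cors_headers("https://not-allowed.example"))
```

If the API is meant to be fully public, sending `Access-Control-Allow-Origin: *` on every response would be a simpler alternative to the allowlist.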