
Create an output API for Earwig's Copyvio Detector Tool
Closed, Resolved · Public · 8 Story Points

Description

We want the new Tool Labs interface for EranBot to be able to dynamically retrieve comparisons from Earwig's Copyvio Detector Tool, but this requires an output API on Earwig's side. The API should accept a page title and a single URL and return all the information needed to construct a comparison showing the matching text between the Wikipedia article and the possible plagiarism source.

For now, the API should probably just return HTML for the diff, rather than trying to come up with some sort of JSON abstraction.
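A minimal client-side sketch of the request described above, assuming the existing copyvios API endpoint is extended for this. The parameter names (`action=compare`, `lang`, `project`, `title`, `url`) and the `detail` flag for requesting the comparison HTML are assumptions for illustration, not a confirmed spec; `build_compare_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Existing copyvios API endpoint (from this task).
API_ENDPOINT = "http://tools.wmflabs.org/copyvios/api"


def build_compare_url(title: str, source_url: str,
                      lang: str = "en", project: str = "wikipedia") -> str:
    """Build a request comparing one article against one suspected source.

    ASSUMED parameter names: action, lang, project, title, url, and a
    hypothetical detail flag asking for the comparison HTML.
    """
    params = {
        "action": "compare",   # compare a single URL rather than search the web
        "lang": lang,          # wiki language code
        "project": project,    # wiki project, e.g. "wikipedia"
        "title": title,        # article title
        "url": source_url,     # the one suspected source
        "detail": "true",      # ask for the HTML comparison (assumed flag)
    }
    # urlencode percent-escapes values, so titles with spaces and URLs are safe.
    return API_ENDPOINT + "?" + urlencode(params)
```

Keeping the request to one title and one URL matches the scope above: the caller already knows which source to compare, so no web search is needed.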

Event Timeline

kaldari created this task. Apr 18 2016, 6:18 PM
DannyH updated the task description. Apr 18 2016, 6:21 PM
DannyH moved this task from Untriaged to Estimated on the Community-Tech board.
DannyH set the point value for this task to 8.
Earwig added a comment. (Edited) Apr 18 2016, 10:04 PM

So there is http://tools.wmflabs.org/copyvios/api, but is there a solution for caveat #1?

Caveat #1: "There is currently no way to get the contents of the article or suspected source, nor can you get the data behind the visual comparison available from the main tool. This may be changed in a future version if there is sufficient demand for it."

@Earwig, do you have any thoughts on how to go about implementing this?

Earwig added a comment. May 2 2016, 9:07 PM

Probably not too crazy, but it depends on the way you want the results presented.

Either way, it's kind of hard to think about this until T125459 is dealt with...

In our sprint meeting today, Niharika asked whether we should try using a third-party library to generate the comparisons instead of building an API on Earwig's side.

She's investigating:
https://packagist.org/packages/adaptive/php-text-difference
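To illustrate what generating the comparison locally would involve, here is a minimal sketch that uses Python's standard-library `difflib` as a stand-in for the PHP package above. It is a different library chosen only because it is self-contained, and it is not how Earwig's tool computes its matches. The function name, the `min_words` threshold, and the `match` CSS class are hypothetical.

```python
from difflib import SequenceMatcher
from html import escape


def highlight_matches(article: str, source: str,
                      min_words: int = 3) -> tuple[str, str]:
    """Return (article_html, source_html) with shared word runs wrapped in <span>.

    Uses difflib.SequenceMatcher on word lists; matching runs shorter than
    min_words are ignored so that common short phrases are not highlighted.
    """
    a_words, s_words = article.split(), source.split()
    matcher = SequenceMatcher(None, a_words, s_words, autojunk=False)
    a_marks: set[int] = set()
    s_marks: set[int] = set()
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            a_marks.update(range(block.a, block.a + block.size))
            s_marks.update(range(block.b, block.b + block.size))

    def render(words: list[str], marks: set[int]) -> str:
        out = []
        for i, w in enumerate(words):
            text = escape(w)  # escape page text before embedding it in HTML
            out.append(f'<span class="match">{text}</span>' if i in marks else text)
        return " ".join(out)

    return render(a_words, a_marks), render(s_words, s_marks)
```

`SequenceMatcher` finds longest common runs, which approximates "matching text" but will miss reordered or lightly reworded passages; that trade-off is part of what a dedicated comparison engine handles.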

Restricted Application added a subscriber: JEumerus. May 18 2016, 8:54 PM

@Earwig: Now that the API stuff is resolved, any more thoughts on this? Is this something you might be interested in working on, or would it be better for us to work on it (with your input)?

kaldari updated the task description. Jun 7 2016, 5:42 PM
Earwig added a comment. Jun 7 2016, 6:38 PM

I can do the implementation, but it would be helpful to get some suggestions for the output format.

Basically, we want the API to return all the HTML that is currently in the two cv-chain-detail divs, with one marked as the article (in the API's data schema) and the other marked as the source. We can then reproduce the CSS on our end to style the HTML. You don't need to worry about abstracting the output content (other than splitting it into article and source). Let's keep it simple.
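A minimal sketch of how a client might consume that split, assuming a JSON response of the form `{"detail": {"article": "<html>", "source": "<html>"}}`. That schema, and the names `Comparison`, `parse_comparison`, and `render_comparison`, are assumptions for illustration; only the `cv-chain-detail` class comes from the tool.

```python
import json
from dataclasses import dataclass


@dataclass
class Comparison:
    article_html: str  # HTML for the article's cv-chain-detail div
    source_html: str   # HTML for the source's cv-chain-detail div


def parse_comparison(body: str) -> Comparison:
    """Split a JSON API response into article and source HTML.

    ASSUMED schema: {"detail": {"article": "<html>", "source": "<html>"}}.
    Raises KeyError if either part is missing.
    """
    detail = json.loads(body)["detail"]
    return Comparison(article_html=detail["article"], source_html=detail["source"])


def render_comparison(comp: Comparison) -> str:
    """Rebuild the two cv-chain-detail divs so CSS copied from the tool applies.

    The HTML is inserted as-is, so it must come from a trusted API.
    """
    return (f'<div class="cv-chain-detail">{comp.article_html}</div>\n'
            f'<div class="cv-chain-detail">{comp.source_html}</div>')
```

Returning pre-rendered HTML keeps the client trivial: it only has to place the two fragments and copy the tool's stylesheet, with no JSON abstraction of the match data.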

@Earwig, do you think it's worthwhile to add CORS support? We can get by with a different browser than our normal one and web security disabled, but that's obviously not ideal :)
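Adding CORS mostly means sending an `Access-Control-Allow-Origin` header with API responses. Below is a minimal sketch written as generic WSGI middleware, assuming the tool is served as a WSGI application (an assumption; `cors_middleware` is a hypothetical name). Because the API is public and read-only, the sketch allows any origin with `*`; a stricter deployment could list specific origins instead.

```python
from typing import Any, Callable, Iterable

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def cors_middleware(app: WSGIApp, allowed_origin: str = "*") -> WSGIApp:
    """Wrap a WSGI app so every response carries Access-Control-Allow-Origin."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_cors(status: str, headers: list, exc_info: Any = None):
            # Append the CORS header to whatever headers the app already set.
            headers = list(headers) + [("Access-Control-Allow-Origin", allowed_origin)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_cors)
    return wrapped
```

Usage would be a one-line wrap at startup, e.g. `application = cors_middleware(application)`. Simple GET requests need only this header; requests with custom headers would also trigger a preflight `OPTIONS` request, which this sketch does not handle.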

DannyH moved this task from Estimated to Archive on the Community-Tech board. Jun 20 2016, 8:19 PM
MusikAnimal moved this task from Backlog to Done on the CopyPatrol board. Dec 6 2016, 5:25 AM