Make plagiabot API output report id, matching urls, % match, and number of words as separate data
Closed, Declined · Public · 5 Estimated Story Points

Assigned To

None

Authored By

	kaldari
	Aug 28 2015, 9:37 PM

Description

See https://github.com/valhallasw/plagiabot/issues/2.

Current API output looks like:

[{'lang': 'en', 'page_ns': '0', 'diff_timestamp': '20150816171215', 'ithenticate_id': '19033870', 'project': 'wikipedia', 'report': '<div class="mw-ui-button">[tools.wmflabs.org/eranbot/ithenticate.py?rid=19033870 report]</div>\n* I 64% 61 words at [http://8xm.tv/birthday-of-a-legend-and-his-daughter/ http://8xm.tv/birthday-of-a-legend-and-his-daughter/] <div class="mw-ui-button">[tools.wmflabs.org/copyvios?lang={{subst:CONTENTLANG}}&project={{lc:{{ns:Project}}}}&title=&oldid=676384764&action=compare&url=http://8xm.tv/birthday-of-a-legend-and-his-daughter/ Compare]</div>', 'diff': '676384764'}]

Example API request: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=Rajesh_Khanna&report=1
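
The example request above can be assembled from its parts. This is only a sketch: the endpoint and the three query parameters are copied from the example URL, while the helper name `suspected_diffs_url` is made up for illustration. The task does not document what other parameter values the API accepts.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request in this task.
BASE_URL = "http://tools.wmflabs.org/eranbot/plagiabot/api.py"


def suspected_diffs_url(page_title):
    """Build a suspected_diffs request URL (hypothetical helper)."""
    # Parameter names and values mirror the example URL; report=1 is what
    # the example uses to include the raw 'report' field.
    params = {
        "action": "suspected_diffs",
        "page_title": page_title,
        "report": 1,
    }
    return BASE_URL + "?" + urlencode(params)


url = suspected_diffs_url("Rajesh_Khanna")
print(url)
```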

Current code lives at: https://github.com/valhallasw/plagiabot/blob/master/webservice/api.py

We should modify the API to also output report id, matching urls, % match, and number of words as separate data. This will likely require some complicated regexes on the raw results. If there is any way to get more granular data from the Turnitin/iThenticate API (http://www.ieee.org/documents/iThenticateAPIGuide.pdf), we may want to use that instead. The existing raw 'report' node in the plagiabot API should be preserved for backwards compatibility.
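
A minimal sketch of the regex approach, assuming every report follows the layout of the sample output above (an `ithenticate.py?rid=` link, then one `* <type> N% M words at [url ...]` line per matching source). The function name `parse_report`, the output keys, and the assumption that the layout is stable are illustrative. None of this is taken from the plagiabot code, and the meaning of the leading `I` token is not stated in this task, so it is skipped.

```python
import re

# Report id as it appears in the ithenticate.py link of the sample output.
REPORT_ID_RE = re.compile(r"ithenticate\.py\?rid=(\d+)")

# One match line per source, e.g. "* I 64% 61 words at [http://... http://...]".
# The token after "*" ("I" in the sample) is matched but not captured.
SOURCE_RE = re.compile(
    r"^\*\s+\S+\s+(?P<percent>\d+)%\s+(?P<words>\d+)\s+words\s+at\s+\[(?P<url>\S+)\s",
    re.MULTILINE,
)


def parse_report(report):
    """Split a raw 'report' string into separate fields (hypothetical helper)."""
    id_match = REPORT_ID_RE.search(report)
    sources = [
        {
            "url": m.group("url"),
            "percent": int(m.group("percent")),
            "words": int(m.group("words")),
        }
        for m in SOURCE_RE.finditer(report)
    ]
    return {
        "report_id": id_match.group(1) if id_match else None,
        "sources": sources,
    }


# The 'report' value from the sample API output in this task.
sample = (
    '<div class="mw-ui-button">'
    '[tools.wmflabs.org/eranbot/ithenticate.py?rid=19033870 report]</div>\n'
    '* I 64% 61 words at [http://8xm.tv/birthday-of-a-legend-and-his-daughter/ '
    'http://8xm.tv/birthday-of-a-legend-and-his-daughter/] '
    '<div class="mw-ui-button">[tools.wmflabs.org/copyvios?'
    'lang={{subst:CONTENTLANG}}&project={{lc:{{ns:Project}}}}&title='
    '&oldid=676384764&action=compare'
    '&url=http://8xm.tv/birthday-of-a-legend-and-his-daughter/ Compare]</div>'
)

result = parse_report(sample)
print(result)
```

For the sample above this yields report id `19033870` and a single source with 64% match and 61 words. The same parsing could run on the API side, next to the preserved raw 'report' node, or in a consuming tool.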

Related Objects

Status	Assigned	Task
Resolved	None	T116957 Plagiarism detection tools for text (tracking)
Resolved	• Fhocutt	T110144 Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI]
Declined	None	T110743 Make plagiabot API output report id, matching urls, % match, and number of words as separate data

Event Timeline

kaldari created this task. Aug 28 2015, 9:37 PM

kaldari raised the priority of this task to Needs Triage.

kaldari updated the task description.

kaldari added a project: Community-Tech.

kaldari subscribed.

Restricted Application added a subscriber: Aklapper. Aug 28 2015, 9:37 PM

kaldari triaged this task as Medium priority. Aug 28 2015, 9:37 PM

kaldari moved this task from New & TBD Tickets to Ready on the Community-Tech board.

kaldari set Security to None.

kaldari added a parent task: T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI]. Aug 28 2015, 9:41 PM

kaldari updated the task description.

kaldari moved this task from Ready to Blocked on the Community-Tech board. Aug 29 2015, 12:43 AM

Moving this task to In Analysis pending an answer to the question about the public accessibility of Turnitin sources for diff analysis: T110144#1573302

kaldari mentioned this in T108422: [AOI] Investigation: Can we improve Copyvio Detector?. Sep 9 2015, 8:27 PM

kaldari moved this task from Blocked to Older: Team Work on the Community-Tech board. Sep 11 2015, 9:13 PM

kaldari updated the task description. Sep 25 2015, 8:38 PM

kaldari updated the task description. Sep 25 2015, 8:42 PM

kaldari updated the task description. Sep 25 2015, 8:48 PM

kaldari updated the task description.

kaldari updated the task description. Sep 25 2015, 8:52 PM

kaldari added a project: Community-Tech-Sprint. Sep 29 2015, 5:28 PM

kaldari edited a custom field. Sep 29 2015, 5:49 PM

kaldari moved this task from Older: Team Work to Needs Discussion on the Community-Tech board. Sep 30 2015, 8:31 PM

• Fhocutt claimed this task. Oct 9 2015, 2:30 AM

• Fhocutt moved this task from Ready to In Development on the Community-Tech-Sprint board.

• DannyH renamed this task from [AOI] Make plagiabot API output report id, matching urls, % match, and number of words as separate data to Make plagiabot API output report id, matching urls, % match, and number of words as separate data [AOI]. Oct 28 2015, 7:05 PM

• DannyH removed a project: Community-Tech. Oct 28 2015, 7:09 PM

• DannyH added a parent task: T116957: Plagiarism detection tools for text (tracking). Oct 28 2015, 7:40 PM

kaldari added a project: Community-Tech. Oct 28 2015, 10:45 PM

kaldari removed a project: Community-Tech.

After talking with Frances, we agreed that it would probably make more sense to implement this on the Copyvio Detector tool side, rather than the API side, since the Copyvio Detector tool is the only thing that needs this and it's going to have to be implemented as a RegEx hack either way.

kaldari moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board. Nov 2 2015, 9:26 PM

• DannyH edited projects, added Community-Tech; removed Community-Tech-Sprint. Nov 10 2015, 7:56 PM

• DannyH moved this task from Needs Discussion to Archive on the Community-Tech board. Nov 12 2015, 6:36 PM

Reopening, since this will be useful for the new copy and paste bot interface as well.

kaldari removed • Fhocutt as the assignee of this task. Apr 12 2016, 9:29 PM

kaldari moved this task from Archive to Up Next (June 3-21) on the Community-Tech board.

kaldari added a subscriber: • Fhocutt.

kaldari edited subscribers, added: eranroz; removed: • Fhocutt.

Niharika subscribed. Apr 13 2016, 3:36 AM

• DannyH renamed this task from Make plagiabot API output report id, matching urls, % match, and number of words as separate data [AOI] to Make plagiabot API output report id, matching urls, % match, and number of words as separate data. Apr 18 2016, 10:41 PM

• DannyH subscribed.

Niharika edited projects, added Community-Tech-Sprint; removed Community-Tech. Apr 29 2016, 10:54 AM

Niharika moved this task from Q1 2018-19 to Ready on the Community-Tech-Sprint board. Apr 29 2016, 1:29 PM

In T110743#2201095, @kaldari wrote:

Reopening, since this will be useful for the new copy and paste bot interface as well.

Ryan, since we're making use of direct DB access in the app, is this task still useful?
Should we do this in the CopyPatrol app instead of trying to modify Plagiabot code?

@Niharika: Yeah, I think doing this directly in the CopyPatrol app is fine. I'll reclose.

kaldari edited projects, added Community-Tech; removed Community-Tech-Sprint. May 2 2016, 4:51 PM

kaldari moved this task from Up Next (June 3-21) to Archive on the Community-Tech board.
