See https://github.com/valhallasw/plagiabot/issues/2.
Current API output looks like:
[{'lang': 'en', 'page_ns': '0', 'diff_timestamp': '20150816171215', 'ithenticate_id': '19033870', 'project': 'wikipedia', 'report': '<div class="mw-ui-button">[tools.wmflabs.org/eranbot/ithenticate.py?rid=19033870 report]</div>\n* I 64% 61 words at [http://8xm.tv/birthday-of-a-legend-and-his-daughter/ http://8xm.tv/birthday-of-a-legend-and-his-daughter/] <div class="mw-ui-button">[tools.wmflabs.org/copyvios?lang={{subst:CONTENTLANG}}&project={{lc:{{ns:Project}}}}&title=&oldid=676384764&action=compare&url=http://8xm.tv/birthday-of-a-legend-and-his-daughter/ Compare]</div>', 'diff': '676384764'}]
Example API request: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=Rajesh_Khanna&report=1
Current code lives at: https://github.com/valhallasw/plagiabot/blob/master/webservice/api.py
We should modify the API to also output the report ID, matching URLs, percentage match, and number of words as separate fields. This will likely require some complicated regexes on the raw results. If there is any way to get more granular data from the Turnitin/iThenticate API (http://www.ieee.org/documents/iThenticateAPIGuide.pdf), we may want to use that instead. The existing raw 'report' node in the plagiabot API should be preserved for backwards compatibility.
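A rough sketch of the regex approach, based only on the single sample output above. The output field names (`report_id`, `sources`, `percent`, `words`, `url`) and the helper names `parse_report` and `add_parsed_fields` are hypothetical and not part of the current api.py. The patterns assume every source line follows the `* <type> NN% NN words at [url ...]` shape seen in the sample; other report layouts may need different handling.

```python
import re

# Report id as it appears in the ithenticate.py link of the sample output.
RID_RE = re.compile(r'ithenticate\.py\?rid=(\d+)')

# One bullet line per matched source, e.g. "* I 64% 61 words at [url url]".
# The token after "*" ("I" in the sample) is skipped without interpreting it.
SOURCE_RE = re.compile(
    r'^\*\s+\S+\s+(\d+)%\s+(\d+)\s+words\s+at\s+\[(\S+)\s',
    re.MULTILINE,
)

# Shortened copy of the 'report' value from the sample output above.
SAMPLE_REPORT = (
    '<div class="mw-ui-button">'
    '[tools.wmflabs.org/eranbot/ithenticate.py?rid=19033870 report]</div>\n'
    '* I 64% 61 words at '
    '[http://8xm.tv/birthday-of-a-legend-and-his-daughter/ '
    'http://8xm.tv/birthday-of-a-legend-and-his-daughter/] '
    '<div class="mw-ui-button">[tools.wmflabs.org/copyvios?oldid=676384764'
    '&action=compare&url=http://8xm.tv/birthday-of-a-legend-and-his-daughter/'
    ' Compare]</div>'
)


def parse_report(report):
    """Extract the report id and per-source match data from raw report text."""
    rid = RID_RE.search(report)
    sources = [
        {'percent': int(percent), 'words': int(words), 'url': url}
        for percent, words, url in SOURCE_RE.findall(report)
    ]
    return {'report_id': rid.group(1) if rid else None, 'sources': sources}


def add_parsed_fields(row):
    """Return a copy of an API row with parsed fields added next to 'report'."""
    out = dict(row)  # the raw 'report' value is copied through unchanged
    out.update(parse_report(row.get('report', '')))
    return out
```

Adding the parsed fields alongside the raw 'report' value, rather than replacing it, keeps existing consumers working. If the iThenticate API turns out to expose these values directly, only `parse_report` would need to change.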