Value proposition
As a user who patrols articles frequently, I often need to find the user who added an offensive or incorrect sentence to an article so I can contact them about it. Browsing through the revision history to figure out who added the sentence can take a long while. A tool that immediately identifies the author of a highlighted sentence, along with the revision in which it was added, would be helpful to my work.
Who wants it?
There is no doubt about how badly this tool is wanted. It was requested in the 2015, 2016 and 2017 wishlist surveys, where it finally made it to #4. It has also been requested in Phabricator many times. A partial list of past tickets is linked from T2639: [Epic] Add feature annotate/blame command, to indicate who last changed each line / word, which itself has been open since 2004.
Existing tools
- WikiBlame by User:Flominator. (Documentation. Code)
- External tool (not hosted on ToolForge)
- Can perform linear and binary search on articles for a given wiki, article and phrase.
- Accepts the number of revisions to check, the number of revisions to skip, and a date from which to start checking the article. Presumably these limits are there to make the query faster.
- ArticleBlamer on XTools
- Has been removed in the past
- WikiReplay
- Edit by edit replay of the evolution of a page
- SO COOL
- WikiBlame by MatmaRex (Code)
- The Heroku instance for the app does not seem to work currently
- Artikel Statistik user script
- WhoColor browser extension
- Ongoing development supported by GESIS - Leibniz Institute for the Social Sciences
- There are Tampermonkey and Greasemonkey browser scripts for enabling the tool on-wiki
- It's fast and works even for big articles.
- Several languages are already supported with plans to support more.
- Has an API.
- Based on the above factors, we have decided to use WikiWho as the backend for this project.
Project requirements
MVP
- A browser extension that...
- Allows a user to click on any word in an article and provides information about the author, revision number (with link), date, percentage ownership of the article, etc.
- Highlights all content by a single user. Only one author's text is highlighted at a time.
- Does not highlight transcluded content and table content
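The "percentage ownership" and "highlight one author at a time" requirements can be illustrated with a small sketch. It assumes that ownership is measured as an editor's share of the article's tokens, which is only one plausible definition (a character-based share would be another), and the function names `ownership_percentages` and `tokens_by_editor` are hypothetical, not part of any existing tool. The editor ids are taken from the Native carrot snippet in the FAQ.

```python
from collections import Counter


def ownership_percentages(token_editors):
    """Map each editor id to the percentage of tokens attributed to them.

    Assumption: ownership is counted per token, not per character.
    """
    counts = Counter(token_editors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {editor: 100.0 * n / total for editor, n in counts.items()}


def tokens_by_editor(token_editors, editor):
    """Return the indices of the tokens to highlight for one selected editor."""
    return [i for i, e in enumerate(token_editors) if e == editor]


# One editor id per token, in article order (hypothetical input shape).
token_editors = ["111359", "111359", "14423536", "14423536", "14423536", "14423536"]

shares = ownership_percentages(token_editors)
highlighted = tokens_by_editor(token_editors, "111359")
print(shares)
print(highlighted)
```

Keeping the selection in a separate function mirrors the requirement that only one author's text is highlighted at a time: the extension would recompute the index list whenever the user clicks a word by a different author.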
Potential future improvements
- Making this into a gadget
- Support for templates, transclusions and tables
FAQ
- What does the API accept and return?
- There are two APIs:
- WikiWho API: Tokenizes the wikitext for a given revision and returns the revision id and author id for each token.
- WhoColor API: Convenience API that generates the following: extended HTML with span tags around article tokens (not wikitext), an array of editors, and an array of tokens (wikitext) along with revision and editor information.
- What's the extended HTML?
Extended HTML is produced by wrapping spans around content tokens and converting special wikitext markup to HTML tags. For example, ''' is transformed to <b>. Here's a snippet for the article Native carrot:
<b><span class="editor-token token-editor-111359" id="token-3">Native</span> <span class="editor-token token-editor-111359" id="token-4"> carrot</span></b> <span class="editor-token token-editor-14423536" id="token-8"> is</span> <span class="editor-token token-editor-14423536" id="token-9"> a</span> <span class="editor-token token-editor-14423536" id="token-10"> common</span> <span class="editor-token token-editor-14423536" id="token-11"> name</span>
You can see the entire extended HTML in the API response. The number after token-editor- in the class name indicates which editor added that token. Each token is also given a unique id.
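To make the span structure concrete, here is a minimal sketch that reads token spans out of extended HTML using Python's standard `html.parser` module. The class name `TokenSpanParser` is hypothetical, and the sketch assumes that token spans never contain nested spans, which holds for the snippet above but has not been checked against the full API output.

```python
from html.parser import HTMLParser


class TokenSpanParser(HTMLParser):
    """Collects (token_id, editor_id, text) tuples from editor-token spans."""

    EDITOR_PREFIX = "token-editor-"
    TOKEN_PREFIX = "token-"

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._current = None  # [token_id, editor_id, text_parts] while inside a token span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "editor-token" not in classes:
            return
        # The editor id is whatever follows "token-editor-" in the class list.
        editor_id = next(
            (c[len(self.EDITOR_PREFIX):] for c in classes if c.startswith(self.EDITOR_PREFIX)),
            None,
        )
        raw_id = attr.get("id") or ""
        token_id = raw_id[len(self.TOKEN_PREFIX):] if raw_id.startswith(self.TOKEN_PREFIX) else raw_id
        self._current = [token_id, editor_id, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        # Assumption: no nested spans, so the first </span> closes the token.
        if tag == "span" and self._current is not None:
            token_id, editor_id, parts = self._current
            self.tokens.append((token_id, editor_id, "".join(parts).strip()))
            self._current = None


snippet = (
    '<b><span class="editor-token token-editor-111359" id="token-3">Native</span> '
    '<span class="editor-token token-editor-111359" id="token-4"> carrot</span></b> '
    '<span class="editor-token token-editor-14423536" id="token-8"> is</span>'
)

parser = TokenSpanParser()
parser.feed(snippet)
parser.close()
for token in parser.tokens:
    print(token)
```

Ids are kept as strings rather than converted to integers, since nothing in the snippet guarantees that every editor id is numeric.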
- Do we need to fetch the entire blame map for a page in one go?
- Looks like the answer is yes. Here's an example API call and here's the API spec.