Add an annotate/blame feature to indicate who last changed each line/word
Open, Low, Public


Author: inedible_bulk

Many times I have had to step through a page's history over and over to find
out who added an offending line, or a curious line whose author I need to
contact. Because various people here (such as --TK) did not sign, it would
take a while to figure out exactly who TK was. It would be rather nice to be
able to highlight or search for a line and be told each time that line was
affected, which would let me easily find who added it.

This is a feature request, so I labeled it an enhancement, as there's no
easy way to request features. Apologies if I did this wrong. I also searched
for "Line" and found only a very few bugs, none of them like this one.

Version: unspecified
Severity: enhancement
See Also:


bzimport set Reference to bz639.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task. Oct 3 2004, 9:47 PM
brion added a comment. Oct 4 2004, 12:15 AM

CVS does this on a line-by-line basis, so it's theoretically doable as a word-oriented check (since we have
paragraph-oriented text, 'lines' are whole paragraphs, and that's not as useful). However, I suspect CVS's
version benefits from its diff-based storage.

This would be spiffy indeed, but it's likely an expensive operation. (Particularly as some pages have
thousands of revisions.) Something to keep in mind for the future.

Also note that when text is rearranged the results may be misleading.

inedible_bulk wrote:

I've not seen CVS in action (unless that's the test wiki). Tracking the
origins of a paragraph would already be a great improvement; the line basis is
just nitpicking, as I really meant paragraphs, I guess, to begin with.

I just wanted to track the history of a line, such as a comment by a person, which
would itself also be a paragraph. Since words can be added to lines at any time
(and I mean non-paragraph, word-wrapped lines), and line length also varies (see
word wrapping), this would be very processor-intensive, and only slightly more
useful than paragraph history tracing.

brion added a comment. Oct 5 2004, 5:35 AM

I should clarify that I'm talking about CVS itself, the 'cvs annotate' command. It gives output like this,
marking each line with the revision number, user, and date that that line was last changed:

1.1 (eloquenc 28-Feb-04): if ( "" == $title && "delete" != $action ) {
1.58 (zhengzhu 22-Sep-04): $wgTitle = Title::newFromText( wfMsgForContent( "mainpage" ) );
1.10 (vibber 08-Mar-04): } elseif ( $curid = $wgRequest->getInt( 'curid' ) ) {
1.1 (eloquenc 28-Feb-04): # URLs like this are generated by RC, because rc_title isn't always
1.10 (vibber 08-Mar-04): $wgTitle = Title::newFromID( $curid );
1.1 (eloquenc 28-Feb-04): } else {
1.1 (eloquenc 28-Feb-04): $wgTitle = Title::newFromURL( $title );
1.1 (eloquenc 28-Feb-04): }
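The same kind of output can be produced for wiki text by folding over a page's revisions, oldest first, and carrying each line's attribution forward wherever a diff shows the line unchanged. This is a minimal sketch, not how CVS implements annotate and not MediaWiki code: the `Revision` record and the `annotate()` function are hypothetical, and Python's `difflib.SequenceMatcher` stands in for whatever diff engine a real implementation would use. Because it works on any list of tokens, passing `tokenize=str.split` instead of the default `str.splitlines` gives the word-oriented variant mentioned above.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Revision(NamedTuple):
    # Hypothetical revision record, not a MediaWiki structure.
    rev_id: int
    user: str
    text: str


def annotate(revisions, tokenize=str.splitlines):
    """Attribute each token of the newest revision to the revision that
    last introduced it. `revisions` must be ordered oldest first."""
    annotated = []  # (token, Revision) pairs for the previous revision
    for rev in revisions:
        old_tokens = [tok for tok, _ in annotated]
        new_tokens = tokenize(rev.text)
        # By default every token is credited to the current revision...
        result = [(tok, rev) for tok in new_tokens]
        # ...except tokens the diff reports as carried over unchanged,
        # which keep the attribution they already had.
        matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
        for a, b, size in matcher.get_matching_blocks():
            result[b:b + size] = annotated[a:a + size]
        annotated = result
    return annotated


if __name__ == "__main__":
    history = [
        Revision(1, "Alice", "Line one\nLine two"),
        Revision(2, "Bob", "Line one\nLine two, edited\nLine three"),
        Revision(3, "Carol", "Line one\nLine two, edited\nLine three\nLine four"),
    ]
    for token, rev in annotate(history):
        print(f"{rev.rev_id} ({rev.user}): {token}")
    # 1 (Alice): Line one
    # 2 (Bob): Line two, edited
    # 2 (Bob): Line three
    # 3 (Carol): Line four
```

The caveat about rearranged text shows up directly in this sketch: if two paragraphs swap places, the diff can match only one of them, so the other is credited to whoever moved it.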

inedible_bulk wrote:

(In reply to comment #3)

I should clarify that I'm talking about CVS itself, the 'cvs annotate'
command. It gives output like this, marking each line with the revision number,
user, and date that that line was last changed:

Ah, I understand now. Yes, this feature (in paragraph form, if not line form)
would be excellent in MediaWiki/Wikipedia.

brion added a comment. Mar 7 2005, 10:30 PM
*** Bug 1652 has been marked as a duplicate of this bug. ***

andrea.m wrote:

*** Bug 1827 has been marked as a duplicate of this bug. ***

robchur wrote:

(In reply to comment #2)

I've not seen the CVS in action (Unless that's the test wiki).

Just in case you weren't aware, CVS (Concurrent Versions System) is the source
control tool used by the developers. The annotate command is often used to find
out who broke what part of the code. :)

ezyang wrote:

*** Bug 4796 has been marked as a duplicate of this bug. ***

ezyang wrote:

This feature is called blame in Subversion. I don't think it's feasible on a
per-sentence basis, and we shouldn't worry about getting that out first. I really
think this would be useful.

Unfortunately, it does seem to be an expensive operation (even Subversion says
so). How would it work? Hmm...

If we had delta-based histories, a blame operation would be a simple
matter of stepping backwards through the history in increments, matching the diffs
to current lines until all the lines had been matched, and then spitting that
out. However, we have a sort of compressed full-text history, with diffs
computed on the fly (correct me if I'm wrong).
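The backward walk described above can be sketched in Python. This is a hypothetical illustration, not the attached Annotation.php: revisions are assumed to be `(rev_id, user, text)` tuples ordered oldest first, and since diffs are computed on the fly, the sketch also diffs with `difflib` rather than reading stored deltas. What it does show is the early stop: once every current line has been traced back to the revision that introduced it, older revisions are never examined.

```python
from difflib import SequenceMatcher


def blame_backwards(revisions):
    """revisions: list of (rev_id, user, text), oldest first.
    Returns ([(line, rev_id, user), ...] for the newest revision,
    number of older revisions that had to be examined)."""
    if not revisions:
        return [], 0
    newest = revisions[-1]
    lines = newest[2].splitlines()
    owner = [None] * len(lines)                   # final attribution per current line
    pending = {i: i for i in range(len(lines))}   # current index -> index in `newer`
    newer = newest
    examined = 0
    for older in reversed(revisions[:-1]):
        if not pending:
            break  # every line attributed: stop walking back
        examined += 1
        old_lines = older[2].splitlines()
        new_lines = newer[2].splitlines()
        # Map each line index in `newer` to its unchanged counterpart in `older`.
        back = {}
        sm = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        for a, b, size in sm.get_matching_blocks():
            for k in range(size):
                back[b + k] = a + k
        still_pending = {}
        for cur, j in pending.items():
            if j in back:
                still_pending[cur] = back[j]  # line existed earlier: keep tracing
            else:
                owner[cur] = newer            # line first appears in `newer`
        pending = still_pending
        newer = older
    for cur in pending:
        owner[cur] = newer  # survived all the way to the oldest revision
    return [(line, rev[0], rev[1]) for line, rev in zip(lines, owner)], examined


if __name__ == "__main__":
    history = [(1, "A", "x\ny"), (2, "B", "x\ny\nz"), (3, "C", "x\ny2\nz")]
    print(blame_backwards(history))
    # ([('x', 1, 'A'), ('y2', 3, 'C'), ('z', 2, 'B')], 2)
```

On a page with thousands of revisions, the walk ends as soon as the oldest surviving line is found, which is the saving that delta-based storage would make cheap.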

So it seems to me that the solution would be to generate these delta
histories when a blame is requested, and then keep them on file for the rest of
eternity. This, however, increases redundancy and has its own synchronization
problems. Perhaps a move to delta compression is in order? Or has that already
been done?

::is thoroughly confused, but would really like the feature::

brion added a comment. Jan 30 2006, 1:43 AM

At WikiSym, a guy was showing off some work he was doing on this kind of stuff.
He was basically running the comparisons offline and building a parallel
database which could then be queried quickly. Once it is built, additional diffs
can be added pretty quickly as well, at least in theory.

ezyang wrote:

Implementation of blame

This attachment creates a blame() function, which takes an array of
revisions and computes the diff in the form of an Annotation object.
See the SimpleTest test case: it works. It's horrible code, though; I was
hoping to get it running on the Toolserver (unfortunately, pulling revisions
from the database is also a horribly complicated problem, albeit one that can
be bypassed).

attachment index.php ignored as obsolete

ezyang wrote:

Defines Annotation class for annotating based on revisions

Much cleaner code, having been rewritten. A test suite will also be
uploaded for it. It still needs integration and an AnnotationPrinter.

Attached: Annotation.php

ezyang wrote:

Test suite for Annotation package.

Test suite for the annotation package. After all, TDD is good.

Attached: Annotation.test.php

ezyang wrote:

With the implementation of the Annotation in place, there are several more tasks
to do:

  1. Hook this code up to a special page
  2. Create a new table annotations for storing the cached annotations
  3. Create a maintenance script that will munch through all pages and generate all initial annotations
  4. Create an AnnotationPrinter
  5. Add a hook to edit saves that recompiles the annotation

2, 3 and 5 are necessary in order to make this sort of extension efficient
enough for a huge wiki like English Wikipedia.

Any comments???
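Items 2, 3 and 5 above can be illustrated together with a small cache. In this sketch the `annotations` column layout, the `on_save()` hook and the `rebuild_page()` backfill are all hypothetical, and an in-memory SQLite database stands in for MediaWiki's own. Each save costs one diff against the cached annotation rather than a pass over the whole history.

```python
import sqlite3
from difflib import SequenceMatcher


def create_schema(db):
    # Hypothetical cache table: one row per line of a page's latest revision.
    db.execute("""CREATE TABLE IF NOT EXISTS annotations (
        page_id   INTEGER NOT NULL,
        line_no   INTEGER NOT NULL,
        line_text TEXT    NOT NULL,
        rev_id    INTEGER NOT NULL,
        user_name TEXT    NOT NULL,
        PRIMARY KEY (page_id, line_no))""")


def load_annotation(db, page_id):
    rows = db.execute(
        "SELECT line_text, rev_id, user_name FROM annotations "
        "WHERE page_id = ? ORDER BY line_no", (page_id,))
    return [tuple(row) for row in rows]


def on_save(db, page_id, rev_id, user, text):
    """Hypothetical save hook (item 5): one diff against the cached rows."""
    old = load_annotation(db, page_id)
    new_lines = text.splitlines()
    result = [(line, rev_id, user) for line in new_lines]
    sm = SequenceMatcher(None, [t for t, _, _ in old], new_lines, autojunk=False)
    for a, b, size in sm.get_matching_blocks():
        result[b:b + size] = old[a:a + size]  # unchanged lines keep attribution
    with db:  # replace the page's cached rows in one transaction
        db.execute("DELETE FROM annotations WHERE page_id = ?", (page_id,))
        db.executemany(
            "INSERT INTO annotations VALUES (?, ?, ?, ?, ?)",
            [(page_id, i, t, r, u) for i, (t, r, u) in enumerate(result)])


def rebuild_page(db, page_id, history):
    """Maintenance-script backfill (item 3): replay history oldest first."""
    with db:
        db.execute("DELETE FROM annotations WHERE page_id = ?", (page_id,))
    for rev_id, user, text in history:
        on_save(db, page_id, rev_id, user, text)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    rebuild_page(db, 1, [(10, "A", "a\nb"), (11, "B", "a\nb\nc")])
    on_save(db, 1, 12, "C", "a\nB!\nc")
    print(load_annotation(db, 1))
    # [('a', 10, 'A'), ('B!', 12, 'C'), ('c', 11, 'B')]
```

The backfill still touches every revision once, but only when the cache is first built; afterwards the cost of a save depends on the page's current size, not on its history length.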

ayg wrote:

*** Bug 7366 has been marked as a duplicate of this bug. ***

ezyang wrote:

I've decided to unassign myself from this bug. This is a very tricky piece of software
to implement, and I don't think I'm the most qualified to do it. That's not to say
the code isn't any good, but it still needs to be integrated with MediaWiki.

gribeco wrote:

I really would like to know who was the *first* to introduce a given
sentence/paragraph, so I can hunt down copyright violators and kill them =)

ayg wrote:

That requires considerably more complexity. You have to decide what happens
when lines are split or merged or moved, to begin with.

I think that running an annotation on a page every time it's saved would make
saving /very/ slow on pages with large histories. My suggestion would be to update
the annotation for /only/ the changed lines, rather than redoing the entire
annotation.

Maybe a crazy idea, but anyway: I started using git (the version control tool used
for the Linux kernel) two weeks ago and am already amazed at its power and
flexibility. It's very fast and has good tools for searching through history.
Maybe the whole Wikipedia history could be imported into git? After that, new page
saves would be added as new commits; as this is very fast in git, it wouldn't be
a problem for the servers.

To make the git idea more practical, it would also be possible to have a git repository for each
Wikipedia page; git is very space-efficient, so this would not be a problem (I think it would
probably need less space than the DB), and the repositories could be stored on different servers.
Pages are effectively independent of each other, so a shared repository wouldn't have many

robchur wrote:

That would require gutting MediaWiki's internals, breaking compatibility with
a huge number of other implementations, requiring the use of another piece of
software, and it *could* introduce serious performance problems, despite the "speed
of git", as it were. The current use of the database is already optimised in various
places for speed and overall load balancing.

A "blame" command would be nice to have, but it's going to need a sane
implementation, not a radical reorganising of literal terabytes of information.

sean_woolcock wrote:

I have had many times where I would continuously go through a history to find
out who added an offending line, or a curious line which I need to contact them

Me too; it sucks!

But note that a full-on CVS/Subversion line-by-line "annotate"
command is more than this feature really needs to be. All you
really need is a box where you can type some text, and click
"Find first version of this article containing this text".

The code could just look at revisions of the article in
a binary-search fashion, so it would be fast. Here's a
quick implementation in Perl:

ayg wrote:

Binary search is unacceptable for this. It can return incorrect results in the case of reversions.
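The objection to binary search can be made concrete. The Python sketch below is not the Perl implementation mentioned above; the function names and the sample history are hypothetical. Binary search assumes that once the phrase appears it stays in every later revision, and a blanking followed by a revert breaks that assumption, so the search lands on the revert instead of the original addition.

```python
def first_containing_binary(texts, phrase):
    """Binary search; only correct if, once the phrase appears,
    it stays in every later revision (a monotonic history)."""
    lo, hi = 0, len(texts)
    while lo < hi:
        mid = (lo + hi) // 2
        if phrase in texts[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo if lo < len(texts) else None


def first_containing_linear(texts, phrase):
    """Linear scan; always finds the earliest revision with the phrase."""
    for i, text in enumerate(texts):
        if phrase in text:
            return i
    return None


if __name__ == "__main__":
    history = [                              # oldest first
        "Intro.",                            # 0
        "Intro. Born in 1900.",              # 1: phrase added here
        "Intro. Born in 1900. More.",        # 2
        "",                                  # 3: page blanked
        "Intro. Born in 1900. More.",        # 4: revert
        "Intro. Born in 1900. More text.",   # 5
        "Intro. Born in 1900. More text!",   # 6
    ]
    print(first_containing_binary(history, "Born in 1900"))  # 4 (the revert)
    print(first_containing_linear(history, "Born in 1900"))  # 1 (the real addition)
```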

robchur wrote:

*** Bug 9455 has been marked as a duplicate of this bug. ***

bugs wrote:

I'll repost my request 9455 here, as it's rather simpler to implement than the
original request, and possibly less expensive:

It would be useful to be able to search the prior revisions of a page in two
ways:

  • Search backwards to find the first time a specified piece of text appears
    (i.e., when it was added)
  • Search backwards to find the last time a specified piece of text appears
    (i.e., when it was removed)

Ideally, one day it would be great to be able to click on text and see who added it.
But in the meantime, it would be great to simply be able to search for a phrase
like "He was a supporter of Hitler." and to be able to leap to the revision when
that text first appeared.

(A slightly souped-up version might show a condensed history consisting of
groups of revisions where the phrase appears at least once, followed by groups of
revisions where it doesn't appear at all.)

Note that this would not be susceptible to whole paragraphs being moved around,
as Brion commented. Since we would only be detecting whether the given phrase
exists or not, two successive revisions in which the phrase exists (but in different
locations) would be treated the same. It ought to be less expensive, as no
diffing is involved, just a simple text search: does the phrase exist in
revision T-1? No. Does the phrase exist in revision T-2? No. Does the phrase
exist in T-3? Yes. Stop.
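The phrase search proposed above needs no diffing, only a substring test per revision. A minimal Python sketch, assuming the revision texts have already been fetched into a list ordered oldest first (how they are fetched is left out), with hypothetical function names. `condensed_history` groups consecutive revisions into runs with and without the phrase, as in the souped-up variant. Walking backwards and stopping, `when_added` reports the most recent addition, so a phrase that was removed and later restored is credited to the restoring revision.

```python
from itertools import groupby


def when_added(texts, phrase):
    """Walk back from the newest revision while the phrase is present;
    return the index of the revision that (most recently) added it."""
    i = len(texts) - 1
    if i < 0 or phrase not in texts[i]:
        return None  # not in the current revision
    while i > 0 and phrase in texts[i - 1]:
        i -= 1
    return i


def when_removed(texts, phrase):
    """Walk back from the newest revision until the phrase is found;
    the revision after that one removed it."""
    for i in range(len(texts) - 1, -1, -1):
        if phrase in texts[i]:
            return i + 1 if i + 1 < len(texts) else None  # None: still present
    return None  # never present


def condensed_history(texts, phrase):
    """Runs of consecutive revisions with/without the phrase,
    as (present, first_index, last_index) tuples."""
    runs = []
    for present, group in groupby(enumerate(texts), key=lambda p: phrase in p[1]):
        indices = [i for i, _ in group]
        runs.append((present, indices[0], indices[-1]))
    return runs


if __name__ == "__main__":
    texts = ["a", "a X", "a X b", "b", "b X"]  # oldest first
    print(when_added(texts, "X"))              # 4
    print(when_removed(texts[:4], "X"))        # 3
    print(condensed_history(texts, "X"))
    # [(False, 0, 0), (True, 1, 2), (False, 3, 3), (True, 4, 4)]
```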

ayg wrote:

*** Bug 10031 has been marked as a duplicate of this bug. ***

There is an extension [1] that does this now. WONTFIX?


ayg wrote:

No. This is an important feature for reasonably effective version control and should be in core if at all possible.

inedible_bulk wrote:

I was checking the article on Noah Webster for Americanized words, and noticed that the section on them seemed to incorrectly label American words as British, and vice versa. I wasn't sure where the problem lay (were they specified wrongly, or had they been swapped?), so I checked a somewhat older version, which had them correctly. It took a few clicks of "next" (as I had not realized the change was so recent) to find the culprit:

Some users might have just assumed it was old vandalism and corrected it by hand. The problem there, as the edit I linked shows, is that there was more vandalism than just the section where I noticed it. This is where a blame system shines: I could see which revision the edit occurred in and spot additional, previously hidden, edits.

I'm back at my bug, 3 years and 6 dupes later, and I can't really see what the exact status of this bug is. I do like the new partial undo feature though, that is really nice.

ayg wrote:

The most important point for this bug is that it's not at all simple to do with a relational database system. If we had something like git or Bazaar as a backend for revision storage, it would be trivial. The interesting questions at this point seem to be:

1a) If someone were to implement version storage for MediaWiki on top of something like git or Bazaar in a manner that doesn't sacrifice existing efficiency, is the Wikimedia Foundation willing to put in the time and effort to transfer the major projects? Or even the minor projects, to start with? (Probably not going to get an unambiguous "yes" here without progress on (2a).)

1b) If so, is anyone willing to do it? (So far, no, and probably not going to be yes unless (1a) is fulfilled.)

2a) Is it possible to implement blame efficiently and scalably on top of an RDBMS? (No evidence for a yes to this that I've seen: Ambush Commander admits that his work is not efficient enough for use right now.)

2b) If so, is anyone willing and able to do it? (So far, no, and definitely not going to be yes unless (2a) is fulfilled.)

The picture is unlikely to change at any time in the foreseeable future, unless we get someone to step forward and put in a lot of work that may or may not end up amounting to something. Put another way, in standard open-source fashion: if you really want it, you're going to have to write it yourself.

*** Bug 13927 has been marked as a duplicate of this bug. ***
*** Bug 18810 has been marked as a duplicate of this bug. ***
demon added a comment. Jul 14 2009, 5:38 PM
*** Bug 18218 has been marked as a duplicate of this bug. ***

I was also interested in this, but considering the day-to-day edits on a wiki, I think an SVN/CVS-like blame/annotate feature is not feasible in a MediaWiki installation, especially a public one where vandalism is common.

The annotation feature makes sense on a controlled development system where changes are fairly small. But here at Wikimedia (and on other public wikis), where we deal with vandalism, it's common for vandals to blank pages or large sections of a page. That defeats the whole annotation system, since all lines would be marked as changed.
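The blanking problem is easy to reproduce with a naive line-level annotate. In this hypothetical Python sketch (the `annotate()` helper, users and texts are illustrative only, with `difflib` as the diff engine), a page is built up by three editors, blanked, and then reverted; after the revert, every line is attributed to the patroller who restored it, and the original authors disappear from the result.

```python
from difflib import SequenceMatcher


def annotate(history):
    """Naive line-level annotate; history is [(user, text)], oldest first."""
    annotated = []  # (line, user) pairs for the previous revision
    for user, text in history:
        new = text.splitlines()
        result = [(line, user) for line in new]
        sm = SequenceMatcher(None, [l for l, _ in annotated], new, autojunk=False)
        for a, b, size in sm.get_matching_blocks():
            result[b:b + size] = annotated[a:a + size]  # unchanged lines keep credit
        annotated = result
    return annotated


article = "Para one.\nPara two.\nPara three."
history = [
    ("Alice", "Para one."),
    ("Bob", "Para one.\nPara two."),
    ("Carol", article),
    ("Vandal", ""),          # page blanked
    ("Patroller", article),  # reverted to Carol's version
]

if __name__ == "__main__":
    print(annotate(history[:3]))
    # [('Para one.', 'Alice'), ('Para two.', 'Bob'), ('Para three.', 'Carol')]
    print(annotate(history))
    # [('Para one.', 'Patroller'), ('Para two.', 'Patroller'), ('Para three.', 'Patroller')]
```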

Instead, Steve Bennett's idea in comment 26 (posted on bug 9455) would be more useful here, since it only needs a text or pattern search of each revision's text. That could also be implemented in JavaScript, retrieving every revision's text through the API and doing the search there.

Bug 9455 was closed as a resolved duplicate of this one, but I think it's worth reopening it and considering implementing it if this one isn't implemented.

psychonaut wrote:

(In reply to comment #28)

There is an extension [1] that does this now. WONTFIX?


The WikiTrust userscript also has this functionality:

epriestley closed this task as "Resolved" by committing Unknown Object (Diffusion Commit). Mar 4 2015, 8:20 AM
Ciencia_Al_Poder reopened this task as "Open". Mar 4 2015, 8:43 PM

epriestley closed this task as "Resolved" by committing rPHABeb010b2efc71: Group inline transactions in Pholio.


Restricted Application added a subscriber: Aklapper. Nov 22 2015, 3:25 PM
Meno25 removed a subscriber: Meno25. Feb 22 2016, 5:48 PM
