
Search within history of articles
Open, LowestPublic

Description

Author: edupedrito

Description:
Hello:

I think it would be very useful to be able to search within the history of Wikipedia articles. That way one could, for example, avoid re-adding something that has already been reasonably discarded several times in past revisions.

This new feature could be a new namespace box placed next to the ones already available:
(Main) Talk User User talk Wikipedia Wikipedia talk Image Image talk MediaWiki MediaWiki talk Template Template talk Help Help talk Category Category talk Portal Portal talk

I hope this new feature will soon be available in all language versions of Wikipedia.

Thanks and regards


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Special:Search?search=itssearchexample&go=Go
See Also:
T2639: [Epic] Add feature annotate/blame command, to indicate who last changed each line / word

Details

Reference
bz10643

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:48 PM
bzimport set Reference to bz10643.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

If not implemented with sufficient intelligence, this would increase the size of the search index by a factor of about 16 (for enwiki, as an example). If done intelligently (only indexing deltas) I figure it would only be two to four times the size, but that would probably not be particularly easy. Either way I don't foresee this happening soon. Note that if implemented, this would largely obviate the need for bug 639.
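To make the size argument above concrete, here is a minimal sketch comparing a naive index of every revision with one that indexes only what each revision adds. The tokenizer, the function names and the sample revisions are all assumptions made for illustration; this is not how any MediaWiki search backend actually works.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a real search backend would do far more."""
    return set(text.lower().split())


def build_full_index(revisions):
    """Index every token of every revision, so postings grow with total history size."""
    index = defaultdict(set)
    for rev_id, text in revisions:
        for token in tokenize(text):
            index[token].add(rev_id)
    return index


def build_delta_index(revisions):
    """Index only the tokens a revision adds relative to its predecessor."""
    index = defaultdict(set)
    previous = set()
    for rev_id, text in revisions:
        current = tokenize(text)
        for token in current - previous:
            index[token].add(rev_id)
        previous = current
    return index


def posting_count(index):
    """Total number of (token, revision) pairs, a rough proxy for index size."""
    return sum(len(rev_ids) for rev_ids in index.values())


# Hypothetical three-revision history, oldest first.
revisions = [
    (1, "the anode is an electrode"),
    (2, "the anode is an electrode through which current flows"),
    (3, "the anode is an electrode through which conventional current flows"),
]

full = build_full_index(revisions)
delta = build_delta_index(revisions)
print(posting_count(full), posting_count(delta))
```

The trade-off is that the delta index only answers "which revision introduced this term", not "which revisions contain it", so a lookup would have to walk forward until the term is removed again.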

Too expensive at present, but we'd love to have this eventually.

*** Bug 13850 has been marked as a duplicate of this bug. ***

micheljull wrote:

Thanks Brion, sorry for the duplicate. A possible stopgap until this feature is implemented occurs to me: a "download complete history (so many KB)" link that dumps the whole history to the user's desktop as a (zipped?) HTML file concatenating all versions (ideally, but not necessarily, with diffs highlighted), which they could then search at leisure on their own computer.

You can in fact download the full history of a given page via Special:Export. It's not necessarily super pretty, but should work.
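As a rough illustration of searching such a download offline, the sketch below scans an export file for revisions containing a phrase. It assumes the export uses a page / revision / text element layout; the sample XML is hand-written for this example, and `search_history` and `local_name` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a Special:Export dump (assumed layout).
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Anode</title>
    <revision>
      <id>1</id>
      <timestamp>2005-01-01T00:00:00Z</timestamp>
      <text>An anode is an electrode.</text>
    </revision>
    <revision>
      <id>2</id>
      <timestamp>2006-03-04T00:00:00Z</timestamp>
      <text>An anode is an electrode through which current flows.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def search_history(xml_text, needle):
    """Return (revision id, timestamp) for each revision whose text contains needle."""
    hits = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "revision":
            continue
        # Collect the direct children of <revision> by their un-namespaced names.
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        if needle.lower() in fields.get("text", "").lower():
            hits.append((fields.get("id"), fields.get("timestamp")))
    return hits


print(search_history(SAMPLE_EXPORT, "current flows"))
```

For a real download, the file contents would be read from disk and passed in place of `SAMPLE_EXPORT`.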

micheljull wrote:

Thanks Brion, just tried it, it worked, and indeed I was able to search the downloaded xml quite easily directly in Firefox, but unfortunately:

1/ it lists only revisions 1 to 100 (latest revision listed in Anode article is ~2 years old)

2/ it's not easy to access (had never used the toolbox before, took me some time to find that export page)

A full download, via a link in the article's history page labeled e.g. :

Full history in xml format (* revisions, * kB)

would be wonderful.

*** Bug 15019 has been marked as a duplicate of this bug. ***

anon.hui wrote:

(In reply to comment #1)

If not implemented with sufficient intelligence, this would increase the size
of the search index by a factor of about 16 (for enwiki, as an example). If
done intelligently (only indexing deltas) I figure it would only be two to four
times the size, but that would probably not be particularly easy. Either way I
don't foresee this happening soon. Note that if implemented, this would
largely obviate the need for bug 639.

We can simply implement the infrastructure in core MediaWiki and leave the indexing task to
a search extension such as Lucene-search.
This would let the search extension decide whether or not to provide historical search.

(In reply to comment #2)

Too expensive at present, but we'd love to have this eventually.

Is "too expensive" about the indexing task?
Would it also be too expensive to just implement the infrastructure?
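One way to picture the split suggested in this comment is a core-side dispatcher that only calls history search when the installed backend opts in. Everything in the sketch below (class names, the `supports_history` flag, `run_search`) is hypothetical and is not MediaWiki's actual extension API.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical extension interface; not MediaWiki's actual API."""

    # Backends opt in to history search by overriding this flag.
    supports_history = False

    @abstractmethod
    def search_current(self, query):
        ...

    def search_history(self, query):
        raise NotImplementedError("history search not provided by this backend")


class BasicBackend(SearchBackend):
    """Searches only the latest revision of each page."""

    def __init__(self, pages):
        self.pages = pages  # title -> list of revision texts, oldest first

    def search_current(self, query):
        return [title for title, revs in self.pages.items() if query in revs[-1]]


class HistoryBackend(BasicBackend):
    """Additionally searches every stored revision."""

    supports_history = True

    def search_history(self, query):
        return [(title, i) for title, revs in self.pages.items()
                for i, text in enumerate(revs) if query in text]


def run_search(backend, query, include_history=False):
    """Core-side dispatch: use history search only when the backend offers it."""
    if include_history and backend.supports_history:
        return backend.search_history(query)
    return backend.search_current(query)
```

With this shape, the cost of indexing history falls entirely on backends that choose to set the flag, while core only carries the dispatch logic.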

The Diffindexer (https://github.com/whym/diffindexer) in combination with Wikihadoop (https://github.com/whym/wikihadoop) offers exactly this functionality.

*** Bug 24641 has been marked as a duplicate of this bug. ***

[Removing RESOLVED LATER as discussed in
http://lists.wikimedia.org/pipermail/wikitech-l/2012-November/064240.html .
Reopening and setting priority to "Lowest".
For future reference, please use either RESOLVED WONTFIX (for issues that will
not be fixed), or simply set lowest priority. Thanks a lot!]

As you all know, this functionality is already offered on a per-page basis by WikiBlame: http://wikipedia.ramselehof.de/wikiblame.php
I'm not convinced that this is something for Special:Search and it's surely more closely related to the history topic, hence changing component.

*** Bug 59620 has been marked as a duplicate of this bug. ***

@Nemo_bis I reckon it would be more useful to have the search box within the wiki itself, not as an external tool?