
MWUpdateDaemon doesn't delete old entries properly
Closed, Invalid · Public

Description

When pages get edited multiple times during indexing, duplicate entries end up getting inserted.

Lucene's inability to read and write on the same index makes this unnecessarily difficult to do right. Placing
updates on our own per-database queues, replacing duplicates during that time, and then applying
updates directly instead of through an in-memory directory might work reasonably well.
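
A minimal Java sketch of that per-database queue idea; all class and field names here are illustrative assumptions, not the daemon's actual code. Keying pending updates by page means a later edit replaces the earlier one instead of producing a duplicate, and the drained batch can then be written to the index directly.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch of a per-database queue that replaces duplicates.
    class PendingUpdate {
        final String pageKey;   // e.g. "enwiki:Hurricane_Dennis"
        final String text;      // article text to index
        PendingUpdate(String pageKey, String text) {
            this.pageKey = pageKey;
            this.text = text;
        }
    }

    class PerDatabaseQueue {
        // Keyed by page key, so a later edit of the same page replaces
        // the pending entry instead of queueing a duplicate.
        private final Map<String, PendingUpdate> pending = new LinkedHashMap<>();

        synchronized void enqueue(PendingUpdate u) {
            pending.put(u.pageKey, u);
        }

        // Drain the queue so the batch can be applied to the index directly.
        synchronized Map<String, PendingUpdate> drain() {
            Map<String, PendingUpdate> batch = new LinkedHashMap<>(pending);
            pending.clear();
            return batch;
        }
    }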


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/Special:Search?search=hurricane+dennis&fulltext=Search

Details

Reference
bz2794

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:40 PM
bzimport set Reference to bz2794.

As a temporary workaround, I've hacked the daemon to skip over duplicate results.
(They need to be adjacent in the results to actually get skipped over.)
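
For illustration, the skip-over logic amounts to comparing each result's key with the previous one, roughly as in the sketch below (names are made up, not the actual daemon code):

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of the workaround: keep only the first of any
    // run of adjacent results sharing the same page key.
    class AdjacentDuplicateSkipper {
        static List<String> skipAdjacentDuplicates(List<String> pageKeys) {
            List<String> out = new ArrayList<>();
            String previous = null;
            for (String key : pageKeys) {
                if (!key.equals(previous)) {
                    out.add(key);
                }
                previous = key;
            }
            return out;
        }
    }

Because only adjacent entries are compared, duplicates separated by other results still slip through, which matches the caveat above.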

dto wrote:

(In reply to comment #0)

> When pages get edited multiple times during indexing, duplicate entries end up getting inserted.

Does this still happen?
And does it happen only for -rebuild, since the article is deleted before it's
added again when doing an increment?

Thanks.
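
For reference, the delete-before-add step of an increment would look roughly like this against a later Lucene IndexWriter API; this is a sketch under that assumption, not the daemon's actual code, which targeted a much older Lucene.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Sketch of an incremental update: delete any existing entry for the
    // page, then add the new document, so no duplicate is left behind.
    class IncrementSketch {
        static void applyUpdate(IndexWriter writer, String pageKey, String text) throws Exception {
            Document doc = new Document();
            doc.add(new StringField("key", pageKey, Field.Store.YES));
            doc.add(new TextField("text", text, Field.Store.NO));
            writer.deleteDocuments(new Term("key", pageKey));
            writer.addDocument(doc);
        }
    }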

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]