Slow indexing of Lexemes for wbsearchentities
Closed, Resolved · Public · BUG REPORT

Description

Steps to Reproduce:

Create a new lexeme.

Attempt to use that new lexeme somewhere else, e.g., as the value of a P5238 statement.

Actual Results:

The new lexeme does not show up immediately in the dropdown menu, which is populated from an API wbsearchentities response. After some minutes the new lexeme does show up.

Expected Results:

The new lexeme is indexed within seconds and is available in the dropdown menu.

Details

	Subject	Repo	Branch	Lines +/-
	Combine DB lookups with elastic	mediawiki/extensions/WikibaseLexemeCirrusSearch	master	+44 -18

Related Objects

Mentioned In: rEWLCf2095a2548f2: Combine DB lookups with elastic
Mentioned Here: T224425: MW Job consumers sometimes pause for several minutes
P5238 (An Untitled Masterwork)

Event Timeline

Fnielsen created this task.Dec 10 2019, 10:32 AM

Restricted Application added a subscriber: Aklapper.Dec 10 2019, 10:32 AM

Addshore added a project: Discovery-Search.Dec 10 2019, 10:43 AM

Addshore added a subscriber: Lydia_Pintscher.Dec 10 2019, 10:57 AM

WMDE-leszek subscribed.Dec 10 2019, 11:12 AM

This seems not to be confined to lexemes but also the case for Q-items.

Daniel_Mietchen subscribed.Dec 10 2019, 1:17 PM

Could you specify which search string you are using?
wbsearchentities should be using the mysql database when searching by entity ID, where the lag should be relatively small.
On the other hand, the search index will take some time to update (job queue lag + elasticsearch refresh interval), so searches based on labels/aliases may not react immediately after an entity is added.
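
To make the two cases concrete, here is a minimal sketch that builds a wbsearchentities request and guesses which backend would answer it, following the description above. The helper names (`build_search_url`, `expected_backend`) and the ID pattern are assumptions made for illustration, not part of Wikibase.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"

# Assumed pattern for lexeme, form and sense IDs, e.g. L270066 or L270066-F3.
LEXEME_ID_RE = re.compile(r"^L\d+(-[FS]\d+)?$")


def build_search_url(search, entity_type="lexeme", language="en"):
    """Build (but do not send) a wbsearchentities request URL."""
    params = {
        "action": "wbsearchentities",
        "search": search,
        "language": language,
        "type": entity_type,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def expected_backend(search):
    """Guess the backend per the comment above: IDs via mysql, text via the search index."""
    return "mysql" if LEXEME_ID_RE.match(search) else "elasticsearch"
```

Pasting the generated URL for both an entity ID and a label into a browser right after creating an entity is a quick way to tell which of the two paths is lagging.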

Both the entity IDs and the label searches were slow. The search is now fast again, so I can make it "Resolved" (or something else?). I suppose it could have been a temporary lag in elasticsearch.

@Fnielsen thanks for letting us know. If search by entity ID is slow again, please re-open this issue with a link to the entity you created so that we can correlate it with the metrics we monitor.
For label search, we are currently experiencing recurrent lag on the job queue that could make it rather bad (several minutes, per T224425).

Indexing of lexemes is now slow again. For instance, during creation of L270066, the form L270066-F3 is not available for use.

EBernhardson moved this task from needs triage to Wikibase Search on the Discovery-Search board.May 14 2020, 10:51 PM

VIGNERON subscribed.Jul 3 2020, 10:08 AM

Alicia_Fagerving_WMSE subscribed.Jul 3 2020, 10:37 AM

Nemo_bis subscribed.Jul 3 2020, 11:48 AM

@dcausse could you have another look into this? Looking at job queue stats, the rate of the various Cirrus jobs seems stable, but I'm not familiar with all the details.

I think we've talked once or twice before about a time-to-update metric that can identify these issues. We have a document hinting process that informs the DataSender how to ship things. We could add an additional hint with a timestamp on documents created by a new revision, and report the difference between "now" and that timestamp when it is provided. This would give us critical information, such as when a regression occurred.
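
A rough sketch of what such a time-to-update metric could look like, in Python for illustration only. The `hints` key, the field name `revision_timestamp`, and both function names are hypothetical; they do not reflect the actual CirrusSearch hinting API.

```python
import time


def add_update_hint(doc, revision_timestamp):
    """Return a copy of the document carrying the revision timestamp as a hint."""
    hinted = dict(doc)
    hinted["hints"] = {**doc.get("hints", {}), "revision_timestamp": revision_timestamp}
    return hinted


def time_to_update(doc, now=None):
    """Seconds between the revision and 'now', or None when no hint was provided."""
    ts = doc.get("hints", {}).get("revision_timestamp")
    if ts is None:
        return None
    if now is None:
        now = time.time()
    return now - ts
```

The sender would call `time_to_update` just before shipping the document and report the value to the metrics backend; documents without the hint are simply skipped.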

The related task (T224425), where the job queue would stop processing our jobs for some number of minutes and then start again later, looks to have been resolved last week. I'm optimistic that the resolution of that task means this is also fixed, but I have yet to identify any hard data that could show it was bad before and that it's measurably better today.

The job execution rates do look good this week, and I'm not seeing any pauses, but unfortunately the historical graphite data doesn't have enough precision to compare against two weeks ago.[1] We could likely see some of the pauses at 5-minute resolution, although the rate might not drop as clearly to 0. I'm certain we won't see much of these pauses at 15-minute resolution.

[1] From profile::graphite::base:

Retain aggregated data at a one-minute resolution for one week; at
five-minute resolution for two weeks; at 15-minute resolution for
one month; one-hour resolution for one year, and 1d for five years.
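
A small worked example of why a pause of a few minutes fades at coarser resolutions. It assumes the coarser archives are built by averaging the one-minute values, and the numbers (100 jobs per minute, a 4-minute pause) are made up for illustration.

```python
def downsample(per_minute, bucket_minutes):
    """Average consecutive one-minute samples into buckets of the given size."""
    buckets = []
    for start in range(0, len(per_minute), bucket_minutes):
        chunk = per_minute[start:start + bucket_minutes]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# 30 minutes at a steady 100 jobs/minute, with a pause during minutes 7 to 10.
rates = [100.0] * 30
for minute in range(7, 11):
    rates[minute] = 0.0

# Lowest value visible at each resolution.
lowest = {res: min(downsample(rates, res)) for res in (1, 5, 15)}
```

With these inputs the lowest visible rate is 0 at one-minute resolution, 40 at five-minute resolution, and roughly 73 at 15-minute resolution, so the pause looks like a mild dip rather than a stall.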

Lydia_Pintscher renamed this task from Slow indexing for wbsearchentities to Slow indexing of Lexemes for wbsearchentities.Jul 16 2020, 12:14 PM

Lydia_Pintscher added a project: Wikidata Lexicographical data.

Now that lexicographical data is becoming more and more popular we're seeing more requests about this. Would be <3 if we can solve it as it's quite frustrating for the editors.

The same doesn't happen with items: they can be used immediately.

Items used to have the same problem. Looking back through the code history, it looks like we added an 'instant index new' option to CirrusSearch, but that still wasn't sufficient for the problem. Some other workaround was put in place; the related cirrus commit says "The wikidata results are now augmented by the sql database, meaning instant indexing no longer necessary there". Can the sql augmenting be enabled for lexemes as well? I'm not sure where/how that is done.

Oh interesting.
@Addshore do you know?

Linking to senses also seems to work immediately - I was able to create a statement linking to one of the lexeme's senses even though I can't link to the lexeme itself yet.

Sense lookups are not supported by WikibaseLexemeCirrusSearch, so I suppose that they use the wb_term mysql table. I thought that we combined mysql+elastic lookups so that mysql can respond when an ID is searched, but looking at WikibaseLexemeCirrusSearch, it was not implemented there. It should take a couple of lines to add such support.
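
The idea of combining the two lookups can be sketched as follows. This is illustrative Python, not the PHP of the actual patch: the function names, the dict standing in for the database, and the callable standing in for the search index are all assumptions.

```python
import re

# Assumed pattern for lexeme, form and sense IDs, e.g. L270066 or L270066-F3.
ENTITY_ID_RE = re.compile(r"^L\d+(-[FS]\d+)?$")


def db_lookup(term, db):
    """Exact-ID lookup against the database (a plain dict in this sketch)."""
    entity_id = term.strip().upper()
    if ENTITY_ID_RE.match(entity_id) and entity_id in db:
        return [{"id": entity_id, "label": db[entity_id]}]
    return []


def combined_search(term, db, elastic_search):
    """Put the DB hit first, then append search-index hits not already present."""
    results = db_lookup(term, db)
    seen = {hit["id"] for hit in results}
    for hit in elastic_search(term):
        if hit["id"] not in seen:
            seen.add(hit["id"])
            results.append(hit)
    return results
```

Because the database answers ID searches directly, a freshly created entity can be found by its ID even while the search index has not caught up yet; label searches still depend on the index.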

Change 615404 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Combine DB lookups with elastic

https://gerrit.wikimedia.org/r/615404

gerritbot added a project: Patch-For-Review.Jul 22 2020, 7:19 AM

dcausse triaged this task as Medium priority.Jul 22 2020, 7:19 AM

dcausse edited projects, added Discovery-Search (Current work); removed Discovery-Search.

dcausse moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.

Change 615404 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Combine DB lookups with elastic

https://gerrit.wikimedia.org/r/615404

dcausse mentioned this in rEWLCf2095a2548f2: Combine DB lookups with elastic.Jul 23 2020, 3:41 PM

ReleaseTaggerBot added a project: MW-1.36-notes (1.36.0-wmf.2; 2020-07-28).Jul 23 2020, 4:00 PM

Maintenance_bot removed a project: Patch-For-Review.Jul 23 2020, 4:10 PM

EBernhardson moved this task from Needs review to To Be Deployed on the Discovery-Search (Current work) board.Jul 23 2020, 8:29 PM

dcausse moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.Aug 3 2020, 5:04 PM

Gehel closed this task as Resolved.Aug 17 2020, 12:39 PM

I can confirm that the issue is no longer present: one can enter the L-identifier and there is no longer a delay in what the popup displays.

Alicia_Fagerving_WMSE unsubscribed.Aug 25 2020, 2:54 PM
