Problems with search in newish wikibase instance
Closed, Resolved · Public · BUG REPORT


Steps to replicate the issue (include links if applicable):

What happens?:

  • I notice similar behavior in type-ahead search
  • Hovering over entries on Recent Changes throws up an error popup for items that don't seem to be in the search index.
  • I can't find any correlation to when the items were written.
  • Some items eventually turn up in search.
  • No problems retrieving items via SPARQL
  • No problems accessing and modifying items with WikibaseIntegrator

What should have happened instead?:

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

Restricted Application added a subscriber: Aklapper.

Right now, there are 8k+ MediaWiki jobs in the queue for I started a job to process these (which likely resolves the issue):

$ ./                                                                                                             production
job.batch/run-all-mw-jobs-vbrnn created
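For anyone who wants to watch the backlog from the outside, here is a minimal sketch that reads the job queue length through the standard MediaWiki siteinfo statistics query. The function names and the API URL are placeholders of mine, not anything from this instance's tooling, and the `jobs` figure MediaWiki reports is only an estimate.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def siteinfo_statistics_url(api_url: str) -> str:
    """Build a siteinfo statistics query URL for a MediaWiki api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def pending_jobs(response: dict) -> int:
    """Extract the approximate job queue length from a siteinfo response."""
    return int(response["query"]["statistics"]["jobs"])


def fetch_pending_jobs(api_url: str) -> int:
    """Query the wiki (network call) and return its reported job queue length."""
    with urlopen(siteinfo_statistics_url(api_url), timeout=30) as resp:
        return pending_jobs(json.load(resp))
```

Calling `fetch_pending_jobs` with the instance's own `api.php` URL before and after a bulk upload should show whether the queue is draining.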

Thank you for taking a look at this issue. A bunch of items that I pushed yesterday evening seem to be coming up in search just fine as well. I assume that regular processing for the search index should happen automatically and be close to real time. I built infrastructure years ago between MongoDB and Elasticsearch where we had to deal with scaling issues of keeping an index up to date in real time with the data store. I haven't dug enough into your architecture here yet to understand it, but I can imagine it's a challenge.

I'm super stoked about what you guys are doing with this project! I look forward to continuing to leverage it in our use cases and contribute where I can. Way cool!

@Skybristol There is another ticket about that architectural issue you are mentioning in case you are interested in following along:

This is continuing to be an issue for me on another use case I am working through. I'm building out information associated with the Facility Registration System operated by the US Environmental Protection Agency. This information is incompletely and inadequately organized in Wikidata, so I am working on methods to process the source data and organize it into items. I've started on the two classification systems in use, which are also incompletely referenced in Wikidata.

As I'm pushing items into a new instance with code, I find a major lag before the items show up in the UI, presumably because of indexing. This has applied to both items and properties. There is also a lag before new properties show up on Special:ListProperties. I generally do see items on Special:RecentChanges (though I haven't checked comprehensively), but I can't turn those same items up in a UI search. I have been able to return items and properties right away via SPARQL (I assume that's not index dependent).
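To put numbers on this, a rough sketch of how one could check which freshly written items are still missing from search, using the Wikibase `wbsearchentities` API action. The helper names and the example URL are hypothetical, and fetching the response (e.g. with `urllib` or `requests`) is left out, so only the URL building and the comparison are shown.

```python
from urllib.parse import urlencode


def search_url(api_url, text, language="en", entity_type="item", limit=50):
    """Build a wbsearchentities request URL (api_url is a placeholder)."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": entity_type,
        "limit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def missing_from_search(expected_ids, search_response):
    """Return the expected entity IDs that the search response did not include."""
    found = {hit["id"] for hit in search_response.get("search", [])}
    return [entity_id for entity_id in expected_ids if entity_id not in found]
```

Feeding in the IDs returned by the write calls, alongside a search on a label they share, would show which items have not been indexed yet. Results are capped by `limit`, so a broad search term can make items look missing when they are not.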

This is also impacting implementation of formatter URL, which I implemented for five different external ID properties after pushing a batch of 2K+ items. I now have some items where the formatter URL settings have taken effect and others that are still catching up. I've seen chatter in Telegram on formatter URL perhaps not working properly to build links in the UI from ExternalID properties until indexing catches up.

This isn't a massive deal in some use cases; I don't have to worry about parallelizing my processes if it's going to take items a while to be fully integrated anyway. But I have stuff coming up I want to be able to do where I need to push millions of items and claims (2.7M EPA facilities). It would be good to know if this is pushing the beta too far and if the team is working out Elasticsearch configuration and scaling issues in a way we might be able to replicate in alternate dedicated Wikibase infrastructure. Is something like the RaiseWikibase approach going to work against a instance, or am I going to blow something up?

Evelien_WMDE claimed this task.