
geokb.wikibase.cloud instance clogged up with backlogged jobs again
Closed, Resolved · Public · BUG REPORT

Description

This is a specific issue related to the underlying problem described in T330389: Run Mediawiki Jobs in the background not at the end of requests

Steps to replicate the issue (include links if applicable):

  • Push a bunch of edits (new item creation, claim creation, etc.) to a wikibase.cloud instance
  • For example, I pushed thousands of person and organization records yesterday to geokb.wikibase.cloud (a sketch of that kind of bulk push follows below)
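
As a minimal illustration of the kind of bulk push involved, here is a sketch against the instance's action API. It assumes a session that is already logged in as a bot account (the login step is omitted), and the label payload is purely illustrative:

```python
import json
import requests

S = requests.Session()  # assumed to already carry a logged-in bot session
API = "https://geokb.wikibase.cloud/w/api.php"

# fetch a CSRF token for editing
token = S.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# create one new item; looping this a few thousand times queues deferred
# MediaWiki jobs per edit, which is where the backlog builds up
payload = {"labels": {"en": {"language": "en", "value": "Example person"}}}
r = S.post(API, data={
    "action": "wbeditentity",
    "new": "item",
    "data": json.dumps(payload),
    "token": token,
    "format": "json",
})
print(r.json().get("entity", {}).get("id"))
```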

What happens?:

  • The MediaWiki API shows jobs backing up (e.g., geokb; see the sketch after this list)
  • SPARQL queries start returning incomplete results, missing records that are in the wikibase instance
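
The backlog is visible through the site statistics exposed by the standard MediaWiki action API (the `jobs` counter), so this quick check should work against any wikibase.cloud instance:

```python
import requests

API = "https://geokb.wikibase.cloud/w/api.php"

# siprop=statistics includes a "jobs" field: the approximate number of
# queued jobs waiting to run (label updates, search indexing, etc.)
stats = requests.get(API, params={
    "action": "query", "meta": "siteinfo",
    "siprop": "statistics", "format": "json",
}).json()["query"]["statistics"]

print("queued jobs:", stats["jobs"])
```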

What should have happened instead?:

  • With only minimal delay, if any, the wikibase instance should be fully functional (SPARQL query, search, type-ahead/mouseover in the UI, etc.) with all new items and claims.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

Here's some visible behavior in the system arising from the problem. If you check out WhatLinksHere for one of our science centers, you'll see a listing of people affiliated with that organization. Some of the items in the list show up without labels, like this one. Those same entities are also not returned by SPARQL queries, such as one trying to retrieve everyone employed by our organization (sketched below).
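
For reference, this is the shape of query that was coming back incomplete. The endpoint path is the usual wikibase.cloud query-service location, but the property and item IDs here (wdt:P1 as "employer", wd:Q42 as the organization) are hypothetical stand-ins, not the actual GeoKB identifiers:

```python
import requests

# wikibase.cloud instances expose a query service at /query/sparql
ENDPOINT = "https://geokb.wikibase.cloud/query/sparql"

# hypothetical IDs: wdt:P1 standing in for "employer", wd:Q42 for the organization
query = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P1 wd:Q42 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
r = requests.get(ENDPOINT, params={"query": query, "format": "json"},
                 headers={"User-Agent": "geokb-backlog-check/0.1"})
for row in r.json()["results"]["bindings"]:
    label = row.get("personLabel", {}).get("value", "(no label)")
    print(row["person"]["value"], label)
```

Items whose update jobs haven't run yet are the ones missing from results like these, even though they exist in the wiki itself.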

If someone is able to kick something off to run all the backed up jobs, I'd really appreciate it. I have to do more stuff with a bunch of these entities now in terms of building out additional claims from other data sources.

@Skybristol We're currently facing issues with our SQL replica, it's probably not advisable to create a job while these issues have not been resolved. I'll put this on the backlog for when the incident is over.

Hi @Fring. Any progress on the SQL replica problems or other related issues? I'm definitely seeing some performance problems in the instance in work I'm trying to do today with a new class of entities. Perhaps I need to just stop working with this until the problem is resolved?

@Skybristol While this is still ongoing unfortunately, the system should be "safe" to use.

@Evelien_WMDE Yes. The GeoKB instance does seem to be working better now. Elasticsearch responses seem to be what I would expect, and SPARQL queries are returning what they should. Thank you! I was definitely having some performance problems there for a few days. I've continued to do work on aspects of the model since @Fring indicated things should be "safe," but I'm happy to have it humming along again!

I'll try not to break anything else, but I do have a lot in the backlog I'm working to get organized. One of the next things on my list is to work further on how we bring multiple datasets together to characterize mines/mining projects and the mining facilities (tunnels, pits, tailings, mills, etc.) associated with them. This is thousands of entities and claims. Wikibase is a key piece of the architecture because it lets us organize competing claims (all reasonable at some level) about the same things with associated qualifiers and references, letting user needs dictate what we query for based on analytical purpose.

I'd appreciate a heads up on any performance gotchas I should know about. I don't want to cause undue stress. :-)

Hi there @Skybristol, thanks for the context!
I don't think you're to blame for breaking things :) The GeoKB instance is definitely one of the bigger and more active Wikibases we're hosting. We've also noticed you working through that backlog you mentioned over the last few days, and it's actually a great help for us in scaling the platform and putting measures in place to "unblock" your kind of use case. We try to give a heads-up when performance issues are expected, though as we scale there are still a few things we're learning that catch us off guard. So bear with us, and please do keep reporting these kinds of issues so we can address them in a timely manner.

Hi @Evelien_WMDE. Thank you for connecting on this. I really appreciate what WMDE is working to build out with this infrastructure.

There is an underlying information architecture challenge with these different domain knowledge graphs that I wonder whether anyone is working to address. Many of the items I've brought into the Wikibase via bot code are standard reference materials we need to link to from the things we are primarily focused on. We could really use some kind of federation technique that would let properties in one wikibase link to items in another knowledge base. We can do some things with SPARQL and namespaces, but we often need to establish very specific types of relationships at the statement level, with qualifiers and references.

We can sometimes almost (but not quite) count on collections of reference items from Wikidata such as named places, persons, organizations, etc. The "almost" comes in where we don't quite agree with the definitions someone used in establishing items in Wikidata, or with their semantics in terms of classification. Or we find missing items that would need to be added. This often comes up because the users who originated or contributed to items in Wikidata needed something to link to but were not necessarily invested in building the best representation of those items for use outside their own use cases.

If we specify that a property is of type "external identifier" and use a formatter URL pointing at the appropriate namespace of another wikibase instance (including Wikidata), this gives us part of what we need. But we don't really get graph-based query capability with this approach (at least not that I've been able to determine). We need more of a federated graph technique, sketched below.
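
One partial workaround today is a SPARQL SERVICE clause joining local statements against Wikidata. The sketch below assumes the local query service permits outbound federation to query.wikidata.org (federation targets are typically allow-listed, so this may not hold on a given instance), and wdt:P2 is a hypothetical local external-ID property holding a Wikidata QID; P625 is Wikidata's real coordinate-location property:

```python
import requests

# local query service endpoint on the wikibase.cloud instance
LOCAL_ENDPOINT = "https://geokb.wikibase.cloud/query/sparql"

query = """
PREFIX wikidata: <http://www.wikidata.org/prop/direct/>

SELECT ?localItem ?wikidataItem ?coord WHERE {
  ?localItem wdt:P2 ?qid .   # hypothetical external-ID property holding a Wikidata QID
  BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?qid)) AS ?wikidataItem)
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wikidata:P625 ?coord .   # P625 = coordinate location on Wikidata
  }
}
LIMIT 25
"""
r = requests.get(LOCAL_ENDPOINT, params={"query": query, "format": "json"},
                 headers={"User-Agent": "geokb-federation-sketch/0.1"})
for row in r.json()["results"]["bindings"]:
    print(row["localItem"]["value"], row["coord"]["value"])
```

This joins graphs at query time, but it doesn't give the statement-level linking with qualifiers and references described above; the external item still isn't a first-class node in the local graph.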

Within the wikibase.cloud environment, it would be interesting to see different communities develop around establishing the best representation of a given domain of entities that many other groups will need to use. These groups could first look at whether a little work within Wikidata might establish a usable reference source, and/or set up a new representation with associated bot code in a specific wikibase.cloud instance. Other instances could then leverage the shared community source rather than building out their own representations. To make this really functional, however, it seems we'd need capabilities such as search index federation so that "foreign reference items" behave more like local items in a given wikibase instance.

Evelien_WMDE claimed this task.