Move Wikidata term store to separate database cluster
Open, Needs Triage, Public

Description

“Currently term store is reaching 340GB in wikidata and slowly reaching the wb_terms era”, so @Ladsgroup wants to “[split] s8 into a core cluster and a dedicated cluster for term store (tentatively called x3)”.

This is the general task to achieve that; T351802: Wikibase: Introduce separate database configuration for term store covers the necessary code changes in Wikibase; the Wikimedia production / operations / DBA side can happen either in this task or in additional subtasks. (Feel free to edit this task as needed.)

Event Timeline

For further clarification, it would be good to hear from @Ladsgroup about the urgency of this on their side. This will also help @WMDE-leszek assess the priorities of the respective code changes in T351802: Wikibase: Introduce separate database configuration for term store. Regardless, as I take it, there isn't much we should be doing on the Wikidata team side of things, or am I mistaken?

> For further clarification, it would be good to hear from @Ladsgroup about the urgency of this on their side.

Clarification, we will not do anything until at least start of next US FY (as we need to budget a couple more dbs for extra headroom) so we have at least six months. It is not urgent but I don't want us to do it in an emergency/rushed manner either. It's more of an "early alarm". Hope that clears things.

(No idea if this is a Schema-change-in-production or something similar…)

Nope, this is for now an Epic, we need to add cloud replicas, backups, etc.

> Clarification, we will not do anything until at least start of next US FY (as we need to budget a couple more dbs for extra headroom) so we have at least six months.

100%, thank you! I'll copy it over to the task where the WMDE teams are discussing this.

Could anyone update this?
Given https://www.wikidata.org/wiki/User:ASarabadani_(WMF)/Growth_of_databases_of_Wikidata has been scaring the community I suggest we make it really clear how urgent or not this really is.

Can the current setup handle a 2x increase in revisions, items and statements, or not?

I for one would like to import millions of citations from Wikipedias since 2019 but have been holding back because of the WDQS scalability issues which have now hopefully been effectively mitigated by the split.

Additionally, I would like to add millions of statements to existing citation items (authors, full-text URLs, etc.) to improve them.

Also, there is talk in the community about importing all the streets of whole countries, which could easily result in millions more items.

Some members would like to import all chemicals in the world and report that we are currently missing most of them, which would mean millions more items.

In short, the community wants a system like Wikipedia, where they don't have to worry about catastrophic failures.

Do we have that? No? Can we have that? If yes, when?
Funding shouldn't be a problem: the Wikipedias are dependent on Wikidata, so I very much assume the board is prepared to fund whatever it takes to make Wikidata scale and prevent catastrophic failures. Also, WMF is rich, with millions in the bank, so that is not a likely bottleneck either.

I'm not sure your comment is related to the task here. For example, importing citations is something that the Wikidata community needs to agree on first as a valid use case for Wikidata (many people in the community prefer to set up a dedicated WikiCite for it instead of putting everything and anything on Wikidata. That's the whole idea behind SDC and federation). Then we can discuss further actions.

After thinking about T351802#10453261, I came here.

> Nope, this is for now an Epic, we need to add cloud replicas, backups, etc.

How will these tables be exposed to cloud when it all moves to a new cluster?
Will they remain as part of wikidatawiki there?

They will be part of the new cluster, which will get replicated to the wiki replicas.

@Jakob_WMDE and @Ollie.Shotton_WMDE Hi, do you think it'd be useful to switch production too? It'd be noop until we set up x3 but that'll switch a lot of code paths to reduce moving parts when we are switching to x3

> @Jakob_WMDE and @Ollie.Shotton_WMDE Hi, do you think it'd be useful to switch production too? It'd be noop until we set up x3 but that'll switch a lot of code paths to reduce moving parts when we are switching to x3

Yeah, it's enabled on beta now and it should just work™, but it wouldn't hurt to make use of the virtual domains config in production too.
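For readers unfamiliar with the idea, here is a minimal sketch of why switching to a virtual-domain lookup can be a no-op until x3 exists. It is a toy model in Python, not MediaWiki or Wikibase code: the `ClusterResolver` class, the `cluster_for` method and the domain name `virtual-term-store` are all hypothetical names chosen for illustration. Only the cluster names s8 and x3 come from this task.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterResolver:
    """Toy resolver mapping a logical ("virtual") domain to a physical cluster.

    Hypothetical illustration only; this is not MediaWiki's actual API.
    """
    default_cluster: str
    mapping: dict = field(default_factory=dict)

    def cluster_for(self, virtual_domain: str) -> str:
        # Domains without an explicit mapping fall back to the wiki's
        # main cluster, so callers behave exactly as they did before.
        return self.mapping.get(virtual_domain, self.default_cluster)


TERM_STORE = "virtual-term-store"  # hypothetical domain name

resolver = ClusterResolver(default_cluster="s8")

# Before x3 exists: code already asks for the virtual domain, but it
# still resolves to s8, so the switch is effectively a no-op.
before = resolver.cluster_for(TERM_STORE)

# Once x3 is set up, only the mapping changes; calling code does not.
resolver.mapping[TERM_STORE] = "x3"
after = resolver.cluster_for(TERM_STORE)

print(before, after)
```

The point of the sketch is that the later move to x3 becomes a configuration change rather than a code change, which is what reduces the moving parts at switch time.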

Change #1126002 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/WikibaseCirrusSearch@master] Generalize WikibasePrefixSearcher

https://gerrit.wikimedia.org/r/1126002

Change #1126003 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/WikibaseCirrusSearch@master] Enable highlighting for other fields in ElasticTermResult

https://gerrit.wikimedia.org/r/1126003

Change #1126004 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/WikibaseCirrusSearch@master] Add InlabelSearch

https://gerrit.wikimedia.org/r/1126004

^ Ignore these patches. I copied the wrong ticket ID. Such Monday...