Investigate separating wbc_entity_usage out to a separate mariadb shard
Closed, Resolved (Public)

Description

We should check whether it's possible to move wbc_entity_usage to its own database shard, away from the main wiki databases.

Open questions:

  1. There might be places where the information is currently being JOINed against other tables (API query modules?). We need to look into that and see how hard it would be to do the join programmatically instead.
  2. How many writes are currently happening on the tables? Could that load (plus an order of magnitude?) be sustained more easily if the tables were separate from the main wiki databases?

Question 1 is for the Wikidata team to answer, while question 2 probably needs to be answered by a DBA.

Event Timeline

hoo renamed this task from "Investigate separating wbc_entity_usage out to a separate DB shard" to "Investigate separating wbc_entity_usage out to a separate mariadb shard". Aug 12 2017, 3:45 PM
hoo added a subscriber: Halfak.

What exactly do you mean by a "database shard"?
Could you provide some more context for this proposal?

This is about moving all (or at least the busiest) wbc_entity_usage tables out of the main wiki databases to another database shard (Jaime suggested that at some point).

Given this might be needed at some point, I would like the Wikidata team to investigate what it would take to support this (software-wise, not looking at the actual infrastructure yet).

Yes, a dedicated database service for this could make sense to separate writes if change tracking starts to account for most of the database writes. I suggested this as a means to offload some of the load if it starts to become a problem, but I do not know what the needs are in terms of joins and so on, so this would be a nice topic to research. In some cases it is not an issue of performance (we haven't hit any issue yet), but of efficiency. Normal shards are replicated up to 20 times, and for certain usages that may be a waste (e.g. some x1 services are fine being replicated just 3-5 times, like notifications, which are very heavy on writes).

@hoo, I think @Manuel got a bit alarmed because he may have thought you were requesting infrastructure right now, as you added the DBA tag (which we normally use only for actionable tickets/requests). I think if we move it to DBA (blocked external column) or MediaWiki-libs-Rdbms, it may be clearer that you only need feedback at some point and want it on the radar, but that no immediate server actionables are needed yet.

Thanks Jaime!
@hoo and I just synced in person here :-)
All clear!

So, @Marostegui, @daniel and I were talking about this today.

We all agreed that this is something worth looking into, but there are various aspects that need investigation, some from the Wikidata team, some probably from the DBAs. I'll update the task description to reflect that.

hoo claimed this task.

I looked into this some more today and there are only three places in Wikibase where we join the wbc_entity_usage table against other tables:

  • SpecialEntityUsage (which should be easy to migrate)
  • ApiListEntityUsage (which should be rather easy to migrate as well)
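For illustration, replacing a SQL JOIN with a programmatic join across two databases could look roughly like the sketch below. This is not Wikibase code: it uses Python with two in-memory SQLite databases as stand-ins for the main wiki database and a dedicated usage shard, the schemas are heavily simplified, and `make_databases` and `pages_using_entity` are hypothetical names.

```python
import sqlite3


def make_databases():
    """Create two independent databases (assumed, simplified schemas)."""
    # Stand-in for the main wiki database.
    wiki = sqlite3.connect(":memory:")
    wiki.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    wiki.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Berlin"), (2, "Paris"), (3, "Rome")],
    )
    # Stand-in for the dedicated shard holding wbc_entity_usage.
    usage = sqlite3.connect(":memory:")
    usage.execute(
        "CREATE TABLE wbc_entity_usage "
        "(eu_entity_id TEXT, eu_aspect TEXT, eu_page_id INTEGER)"
    )
    usage.executemany(
        "INSERT INTO wbc_entity_usage VALUES (?, ?, ?)",
        [("Q64", "S", 1), ("Q64", "L.en", 2), ("Q90", "S", 2)],
    )
    return wiki, usage


def pages_using_entity(wiki, usage, entity_id):
    """Return (page_title, aspect) pairs without a cross-database JOIN."""
    # Step 1: read the usage rows from the usage database only.
    rows = usage.execute(
        "SELECT eu_page_id, eu_aspect FROM wbc_entity_usage "
        "WHERE eu_entity_id = ? ORDER BY eu_page_id, eu_aspect",
        (entity_id,),
    ).fetchall()
    if not rows:
        return []
    # Step 2: look up the matching pages in the wiki database in one query.
    page_ids = sorted({page_id for page_id, _ in rows})
    placeholders = ",".join("?" * len(page_ids))
    titles = dict(
        wiki.execute(
            f"SELECT page_id, page_title FROM page WHERE page_id IN ({placeholders})",
            page_ids,
        ).fetchall()
    )
    # Step 3: combine both result sets in application code.
    return [(titles[pid], aspect) for pid, aspect in rows if pid in titles]


if __name__ == "__main__":
    wiki_db, usage_db = make_databases()
    print(pages_using_entity(wiki_db, usage_db, "Q64"))
```

The trade-off in this pattern is one extra round trip per lookup, and usage rows pointing at pages that no longer exist are silently dropped in step 3, which an actual migration would need to handle deliberately.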

Given this, the discussions I had at Wikimania, and the positive responses from Daniel and Lydia, I conclude that we should go for this and move the table to a dedicated DB shard.