[EPIC] Kill the wb_terms table
Open, Needs Triage, Public

Description

Progress 2019

  • Early 2019, investigate normalization of wb_terms
  • T219175 - March 2019 Trail blaze #1 - Migrating data from wb_terms to a new schema.
  • TBD - Read from new schema in production
    • TBA tickets for reading for each use case (for Wikidata & Wikibase); will T198866 be relevant here?
  • TBD - Use Elasticsearch for some bulk term lookups?
  • TBD - Drop the wb_terms table in production

Tickets that can be closed once the table is dead

Potentially closable?

Related Objects

Status    Assigned    Task
Open      None
Open      None
Open      None
Resolved  alaa_wmde
Resolved  alaa_wmde
Resolved  alaa_wmde
Resolved  None
Declined  None
Declined  alaa_wmde
Resolved  Ladsgroup
Open      None
Resolved  Ladsgroup
Resolved  JeroenDeDauw
Declined  None
Resolved  None
Resolved  None
Resolved  Ladsgroup
Resolved  None
Invalid   None
Declined  None
Resolved  Lucas_Werkmeister_WMDE
Invalid   None
Declined  None
Resolved  Ladsgroup
Open      None
Resolved  JeroenDeDauw
Declined  None
Resolved  alaa_wmde
Resolved  Lucas_Werkmeister_WMDE
Resolved  Lucas_Werkmeister_WMDE
Resolved  Lucas_Werkmeister_WMDE
Open      None
Resolved  None
Invalid   None
Resolved  alaa_wmde
Open      Ladsgroup
Invalid   None
Open      Ladsgroup
Resolved  Ladsgroup
Open      None
Open      None
Resolved  Addshore
Resolved  Addshore
Resolved  Marostegui
Resolved  Addshore
Resolved  Marostegui
Resolved  Marostegui

Event Timeline

Addshore created this task. Oct 31 2018, 2:35 PM
Restricted Application added a subscriber: Aklapper. Oct 31 2018, 2:35 PM
Addshore claimed this task. Oct 31 2018, 2:36 PM
Restricted Application added a project: User-Addshore. Oct 31 2018, 2:36 PM
Addshore updated the task description. Oct 31 2018, 2:36 PM
Addshore updated the task description. Oct 31 2018, 2:47 PM
Addshore updated the task description. Oct 31 2018, 5:36 PM
Addshore moved this task from Unsorted 💣 to Next on the User-Addshore board. Jan 16 2019, 3:15 PM
Addshore reassigned this task from Addshore to alaa_wmde. Mar 19 2019, 9:51 AM
Addshore removed alaa_wmde as the assignee of this task. Mar 25 2019, 3:53 PM
Addshore added a subscriber: alaa_wmde.

A first round of work is starting on this EPIC.
The work can be tracked on https://phabricator.wikimedia.org/project/profile/3972/

The specific big tickets currently are:

I'm going to expand the description of this task now.

Addshore updated the task description. Mar 25 2019, 3:55 PM
Addshore updated the task description. Mar 25 2019, 4:15 PM
Addshore updated the task description. Mar 25 2019, 4:38 PM
Addshore updated the task description.

I see you are going to ask DBAs at T219145, that is great.

As a heads-up: I saw there is a chance of starting to use other data stores instead of, or in addition to, MySQL (which by itself is not an issue). If so, please give a heads-up to the service operations SREs, and probably search too. In the past there have been misunderstandings where important data that has to be persisted was sent to datastores for which ops cannot guarantee persistence, or which are not properly replicated between DCs (e.g. Redis). Please do not go beyond the research phase without taking your proposal to the SREs, to avoid future misunderstandings about storage requirements. CC @Joe @Gehel

Yup, I have already spent some time talking to the search team, and need to write up a spec of exactly what we want, why, how we will use it, etc.
Persistence and freshness would not be needed for the Elasticsearch work mentioned in the task description, and the fallback would always be SQL. But when looking up many entity "terms" in many languages, Elasticsearch is much faster than doing that directly in SQL, even if we then have to look up anything missing from or out of date in Elasticsearch back in SQL.
I should be writing this up this week.
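
For illustration only, the lookup pattern described in this comment (bulk read from the search index first, then fall back to SQL for anything missing or out of date) could be sketched roughly as below. This is not Wikibase code: `bulk_lookup_terms`, `search_lookup`, `sql_lookup` and `latest_revision` are hypothetical names, and comparing revision numbers to detect stale entries is an assumption, since the comment does not say how staleness would be detected.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical shapes: a term is identified by (entity_id, language) and the
# stores return (text, revision_the_text_was_indexed_at).
TermKey = Tuple[str, str]
Term = Tuple[str, int]
Lookup = Callable[[Set[TermKey]], Dict[TermKey, Term]]


def bulk_lookup_terms(
    keys: Iterable[TermKey],
    search_lookup: Lookup,
    sql_lookup: Lookup,
    latest_revision: Callable[[str], int],
) -> Dict[TermKey, str]:
    """Return term text per key, preferring the search index over SQL."""
    wanted = set(keys)

    # Fast path: one bulk request to the search index. It is treated as a
    # disposable cache, so any failure just means "nothing found".
    try:
        from_search = search_lookup(wanted)
    except Exception:
        from_search = {}

    result: Dict[TermKey, str] = {}
    for key, (text, revision) in from_search.items():
        # Assumption: an entry is fresh if it was indexed at (or after)
        # the entity's latest known revision.
        if key in wanted and revision >= latest_revision(key[0]):
            result[key] = text

    # Slow path: SQL stays the source of truth for missing or stale terms.
    missing = wanted - set(result)
    if missing:
        for key, (text, _revision) in sql_lookup(missing).items():
            result[key] = text
    return result
```

Because SQL remains the source of truth in this sketch, losing the search index only costs speed, which matches the point that persistence and freshness are not required of it.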