Page MenuHomePhabricator

(Potential) ORES database schema improvements
Closed, ResolvedPublic

Description

I just took a quick look at the ORES database schema an noticed a few things. Maybe not all of them are applicable, but please have a look at them.

This is especially important as ORES is going to create multiple ores_classification rows for each revision in Wikidata's recentchanges table (thus many rows and especially a lot of writes/ reads).

  • ores_classification and ores_model need to have a primary key (that is a must before deployment for all tables).
  • ores_classification.ores_model and ores_classification.ores_model_version could probably be removed in favor of a key pointing to ores_model. (Especially in case there are significantly less rows expected in ores_model than in ores_classification)
    • For that you probably want to have one table in ores_model for each version of each model (and not just one for the current versions).
    • If you don't do that, please still make sure you rename one of them to not have the same field names in more than one table.
  • ores_classification.ores_rev should be unsigned to match rev_id
  • You might want to use different name prefixes for fields in both tables (and not just the generic ores_). See also my point about ores_model and ores_model_version above.
  • The index ores_is_predicted probably is not of much use alone and should be dropped (queries will use ores_winner wont they?). This obviously depends on what queries you plan to do.

Files I reviewed:
https://github.com/wikimedia/mediawiki-extensions-ORES/blob/8a08f7f0a28030aa9f607a1ae3c037e9a521dbf8/sql/ores_model.sql
https://github.com/wikimedia/mediawiki-extensions-ORES/blob/8a08f7f0a28030aa9f607a1ae3c037e9a521dbf8/sql/ores_classification.sql

Event Timeline

hoo claimed this task.
hoo raised the priority of this task from to Needs Triage.
hoo updated the task description. (Show Details)
hoo added subscribers: Lydia_Pintscher, Aklapper, Lucie and 2 others.
hoo renamed this task from ORES database schema review to (Potential) ORES database schema improvements.Jan 22 2016, 5:52 PM
hoo added a project: DBA.
hoo set Security to None.
hoo updated the task description. (Show Details)

Okay, I made some changes and pushed them for review

  • Primary keys are added now
  • ores_rev is unsigned now
  • prefixes have changed to "oresm" (for ores_model) and "oresc" (for ores_classification). I don't think it's the best prefix in the world but I couldn't came up with anything better
  • The index is dropped now

Anyway: Since @awight is the original writer of these sql files, I would love to see his comments :)

Legoktm subscribed.

@hoo: I believe everything you brought up has been addressed, do you have some time take a quick look?

I had a quick look and I think the things I brought up above have been resolved.

There are some minor things that could maybe still be improved, but the most significant things have been resolved as far as I can tell.