
DBA Review of Tables that ORES Extension will create
Closed, Resolved, Public

Description

As part of supporting filtering on recent changes via ORES (T390058: Add new filters to Recent Changes), we'd like to request a DBA review of the tables that the ORES extension will create.

It's worth mentioning that the tooling allows someone to run a script that creates the tables on multiple wikis at once. We're hoping that a positive review here will let us scale this to other wikis with minimal review overhead per wiki where we'd like to enable the filter.

Wikis to add ORES tables to

wiki | 90d editcount | edits/day | admin count
idwiki | 25605791 | 284508 | 46
cywiki | 13545520 | 150505 | 11
simplewiki | 9246024 | 102733 | 21
bewiki | 4791865 | 53242 | 10
kkwiki | 3178180 | 35313 | 12
nnwiki | 3522538 | 39139 | 14
mkwiki | 4379722 | 48663 | 12
lawiki | 3736619 | 41517 | 21
afwiki | 2706079 | 30067 | 14
tewiki | 4200340 | 46670 | 11
mrwiki | 2402453 | 26693 | 10
swwiki | 1389796 | 15442 | 15
mlwiki | 4078126 | 45312 | 14
iswiki | 1836646 | 20407 | 12
pawiki | 775906 | 8621 | 10
hawiki | 581065 | 6456 | 10
tlwiki | 1923816 | 21375 | 10
bnwiki | 7553384 | 83926 | 15
trwiki | 31916936 | 354632 | 23
azwiki | 6818763 | 75764 | 13

(from T392144#10804184)

Also add tables to the table catalog T363581: Build a machine-readable catalogue of mariadb tables in production

Event Timeline

Restricted Application added a subscriber: Aklapper.
DMburugu renamed this task from Recent Changes - Review of Tables that ORES Tooling will create to DBA Review of Tables that ORES Tooling will create. Apr 4 2025, 1:22 PM
DMburugu renamed this task from DBA Review of Tables that ORES Tooling will create to DBA Review of Tables that ORES Extension will create.
DMburugu updated the task description.

Some discussion on how to approach this is happening. Jason will update here.

Note that @Ladsgroup advised that deploying the topic model would be messy/complicated with the current recentchanges design, so we should avoid that one in particular for now. But we can deploy ORES with the revert risk models on all but the busiest wikis.
I created a list of wikis where we can deploy the language-agnostic revert risk model:

# language agnostic revert risk enabled wikis:
# https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language-agnostic_revert_risk

revertrisk_language_agnostic_wikis = ["aa", "ab", "ace", "ady", "af", "ak", "als", "alt", "am", "ami", "an", "ang", "anp", "ar", "arc", "ary", "arz", "as", "ast", "atj", "av", "avk", "awa", "ay", "az", "azb", "ba", "ban", "bar", "bat-smg", "bcl", "be", "be-tarask", "be-x-old", "bg", "bh", "bi", "bjn", "blk", "bm", "bn", "bo", "bpy", "br", "bs", "bug", "bxr", "ca", "cbk-zam", "cdo", "ce", "ceb", "ch", "cho", "chr", "chy", "ckb", "co", "cr", "crh", "cs", "csb", "cu", "cv", "cy", "da", "dag", "de", "din", "diq", "dsb", "dty", "dv", "dz", "ee", "el", "eml", "en", "eo", "es", "et", "eu", "ext", "fa", "fat", "ff", "fi", "fiu-vro", "fj", "fo", "fr", "frp", "frr", "fur", "fy", "ga", "gag", "gan", "gcr", "gd", "gl", "glk", "gn", "gom", "gor", "got", "gpe", "gsw", "gu", "guc", "gur", "guw", "gv", "ha", "hak", "haw", "he", "hi", "hif", "ho", "hr", "hsb", "ht", "hu", "hy", "hyw", "hz", "ia", "id", "ie", "ig", "ii", "ik", "ilo", "inh", "io", "is", "it", "iu", "ja", "jam", "jbo", "jv", "ka", "kaa", "kab", "kbd", "kbp", "kcg", "kg", "ki", "kj", "kk", "kl", "km", "kn", "ko", "koi", "kr", "krc", "ks", "ksh", "ku", "kv", "kw", "ky", "la", "lad", "lb", "lbe", "lez", "lfn", "lg", "li", "lij", "lld", "lmo", "ln", "lo", "lrc", "lt", "ltg", "lv", "lzh", "mad", "mai", "map-bms", "mdf", "mg", "mh", "mhr", "mi", "min", "mk", "ml", "mn", "mni", "mnw", "mr", "mrj", "ms", "mt", "mus", "mwl", "my", "myv", "mzn", "na", "nah", "nan", "nap", "nds", "nds-nl", "ne", "new", "ng", "nia", "nl", "nn", "no", "nostalgia", "nov", "nqo", "nrm", "nso", "nv", "ny", "oc", "olo", "om", "or", "os", "pa", "pag", "pam", "pap", "pcd", "pcm", "pdc", "pfl", "pi", "pih", "pl", "pms", "pnb", "pnt", "ps", "pt", "pwn", "qu", "rm", "rmy", "rn", "ro", "roa-rup", "roa-tara", "ru", "rue", "rup", "rw", "sa", "sah", "sat", "sc", "scn", "sco", "sd", "se", "sg", "sgs", "sh", "shi", "shn", "si", "simple", "sk", "skr", "sl", "sm", "smn", "sn", "so", "sq", "sr", "srn", "ss", "st", "stq", "su", "sv", "sw", "szl", "szy", "ta", 
"tay", "tcy", "te", "test", "test2", "tet", "tg", "th", "ti", "tk", "tl", "tly", "tn", "to", "tpi", "tr", "trv", "ts", "tt", "tum", "tw", "ty", "tyv", "udm", "ug", "uk", "ur", "uz", "ve", "vec", "vep", "vi", "vls", "vo", "vro", "wa", "war", "wo", "wuu", "xal", "xh", "xmf", "yi", "yo", "yue", "za", "zea", "zh", "zh-classical", "zh-min-nan", "zh-yue", "zu"]

revertrisk_multilingual_wikis = ['ka', 'lv', 'ta', 'ur', 'eo', 'lt', 'sl', 'hy', 'hr', 'sk', 'eu', 'et', 'ms', 'az', 'da', 'bg', 'sr', 'ro', 'el', 'th', 'bn', 'no', 'hi', 'ca', 'hu', 'ko', 'fi', 'vi', 'uz', 'sv', 'cs', 'he', 'id', 'tr', 'uk', 'nl', 'pl', 'ar', 'fa', 'it', 'zh', 'ru', 'es', 'ja', 'de', 'fr', 'en']

# top 25 wikis in terms of recentchanges; to avoid
# see https://phabricator.wikimedia.org/P74607
avoid_wikis = [ "sv", "ko", "pt", "pl", "fa", "frwiktionary", "ca", "uk", "mgwiktionary", "ur", "ja", "zh", "arz", "enwiktionary", "it", "de", "thwiktionary", "vi", "es", "ru", "ar", "fr", "en", "wikidatawiki", "commonswiki", ]

language_agnostic_deployable = list(set(revertrisk_language_agnostic_wikis) - set(revertrisk_multilingual_wikis) - set(avoid_wikis))

print(language_agnostic_deployable)

# ['sat', 'vo', 'mzn', 'cho', 'hyw', 'mg', 'dv', 'mrj', 'be-x-old', 'my', 'diq', 'lb', 'ks', 'trv', 'te', 'za', 'io', 'nn', 'pcm', 'sw', 'ak', 'jbo', 'nds', 'mi', 'nostalgia', 'olo', 've', 'tet', 'iu', 'war', 'os', 'frp', 'gor', 'mk', 'pwn', 'bcl', 'rm', 'zh-yue', 'bi', 'kn', 'sg', 'mr', 'hak', 'ab', 'ny', 'pag', 'din', 'rmy', 'tn', 'wuu', 'cdo', 'yi', 'roa-rup', 'wo', 'ce', 'cu', 'li', 'ceb', 'nrm', 'nds-nl', 'nah', 'aa', 'szy', 'tly', 'lad', 'got', 'nso', 'gur', 'frr', 'si', 'vls', 'tay', 'ady', 'vec', 'test', 'ba', 'xmf', 'tg', 'gpe', 'ts', 'azb', 'dsb', 'sgs', 'xal', 'om', 'roa-tara', 'cv', 'ext', 'haw', 'fat', 'ht', 'hz', 'pih', 'kv', 'fiu-vro', 'or', 'ami', 'be', 'rup', 'gu', 'to', 'pi', 'ng', 'oc', 'ln', 'sco', 'inh', 'kab', 'tcy', 'pms', 'ang', 'dz', 'min', 'na', 'lrc', 'hif', 'shi', 'test2', 'ky', 'ban', 'cr', 'eml', 'ki', 'qu', 'dty', 'awa', 'av', 'mus', 'mt', 'cbk-zam', 'ha', 'kaa', 'nqo', 'fo', 'gsw', 'kbd', 'mhr', 'dag', 'kg', 'alt', 'zu', 'ckb', 'ho', 'tt', 'zh-classical', 'km', 'bpy', 'as', 'pnb', 'be-tarask', 'fj', 'rn', 'kw', 'mai', 'nap', 'nov', 'tyv', 'gom', 'ace', 'lij', 'ch', 'rue', 'ast', 'fur', 'pcd', 'sq', 'mwl', 'lez', 'ksh', 'scn', 'gcr', 'stq', 'kk', 'lmo', 'vro', 'ilo', 'bat-smg', 'mn', 'lbe', 'blk', 'nia', 'fy', 'mnw', 'kj', 'koi', 'cy', 'ii', 'als', 'bm', 'xh', 'vep', 'ary', 'co', 'kl', 'st', 'simple', 'gv', 'rw', 'srn', 'bar', 'bh', 'udm', 'new', 'ss', 'anp', 'map-bms', 'gd', 'krc', 'so', 'gag', 'yue', 'chy', 'ug', 'yo', 'ty', 'sn', 'pam', 'ga', 'sh', 'ltg', 'am', 'avk', 'guw', 'sd', 'pnt', 'nv', 'bxr', 'mdf', 'la', 'ie', 'bs', 'atj', 'jam', 'glk', 'br', 'kbp', 'lld', 'pa', 'ig', 'csb', 'su', 'kr', 'bug', 'bo', 'nan', 'tk', 'szl', 'lzh', 'tpi', 'ik', 'pfl', 'bjn', 'tw', 'sm', 'gl', 'tum', 'ti', 'is', 'smn', 'zh-min-nan', 'ee', 'lg', 'lfn', 'wa', 'hsb', 'jv', 'ff', 'ne', 'mh', 'ia', 'gn', 'mad', 'mni', 'myv', 'arc', 'sa', 'sc', 'se', 'crh', 'af', 'ay', 'shn', 'lo', 'kcg', 'ku', 'ps', 'guc', 'ml', 'chr', 'zea', 'pdc', 'an', 'skr', 
'sah', 'gan', 'tl', 'pap']

I'll do the same for the multilingual revert risk model.

On the table catalog side: I suggest creating a dblist for ORES tables so we can catalog them without needing to hand-code all of these. Also, do you really want to deploy it to nostalgia.wikipedia.org? :P

On the table catalog side: I suggest creating a dblist for ORES tables so we can catalog them without needing to hand-code all of these. Also, do you really want to deploy it to nostalgia.wikipedia.org? :P

I think we're good on that one :-D! Just defining the technical limitations in P74609.

And here's the list of wikis on which we can deploy the multilingual revert risk model:

# Multilingual revert risk enabled wikis
# https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Multilingual_revert_risk
revertrisk_multilingual_wikis = ['ka', 'lv', 'ta', 'ur', 'eo', 'lt', 'sl', 'hy', 'hr', 'sk', 'eu', 'et', 'ms', 'az', 'da', 'bg', 'sr', 'ro', 'el', 'th', 'bn', 'no', 'hi', 'ca', 'hu', 'ko', 'fi', 'vi', 'uz', 'sv', 'cs', 'he', 'id', 'tr', 'uk', 'nl', 'pl', 'ar', 'fa', 'it', 'zh', 'ru', 'es', 'ja', 'de', 'fr', 'en']
# top 25 wikis in terms of recentchanges; to avoid
# https://phabricator.wikimedia.org/P74607
avoid_wikis = [ "sv", "ko", "pt", "pl", "fa", "frwiktionary", "ca", "uk", "mgwiktionary", "ur", "ja", "zh", "arz", "enwiktionary", "it", "de", "thwiktionary", "vi", "es", "ru", "ar", "fr", "en", "wikidatawiki", "commonswiki", ]
deployable = list(set(revertrisk_multilingual_wikis) - set(avoid_wikis))
print(deployable)
# ['hi', 'no', 'he', 'sk', 'sl', 'lt', 'uz', 'lv', 'nl', 'sr', 'hy', 'bg', 'hu', 'bn', 'hr', 'eo', 'el', 'az', 'et', 'eu', 'cs', 'tr', 'id', 'ka', 'ro', 'ms', 'ta', 'th', 'da', 'fi']

@JWheeler-WMF we'll need to decide which wikis from the deployable lists (P74609, P74610) we're going to enable ORES on; that's what we'll ask the DBAs for help with.

Change #1143638 had a related patch set uploaded (by Jsn.sherman; author: Jsn.sherman):

[operations/mediawiki-config@master] Create dblist for ores extension

https://gerrit.wikimedia.org/r/1143638

On the side of table catalog. I suggest creating a dblist for ores tables so we can catalog the tables without needing to hand code all of these. [...]

I gave it a shot here:

Change #1143638 had a related patch set uploaded (by Jsn.sherman; author: Jsn.sherman):

[operations/mediawiki-config@master] Create dblist for ores extension

https://gerrit.wikimedia.org/r/1143638

I just started with the wikis that look like they already have the ORES extension set up. I assume we would then add new entries to the dblist as part of our deployment process and then run the table creation script against the list, as mentioned here:
https://wikitech.wikimedia.org/wiki/Creating_new_tables#Deployment
Is it idempotent, meaning we can run it against a list that includes wikis that already have the tables?

I made a comment on the patch.

For deploying to new wikis: you must first create the table on the wiki. You have two options, outlined in https://wikitech.wikimedia.org/wiki/Creating_new_tables (either add it to createExtensionTables.php or directly invoke the SQL file), and then you can enable ORES on the wiki by adding it to the dblist.

DMburugu updated the task description.

I made a comment on the patch.

I'll get that patch updated.

For deploying to new wikis: you must first create the table on the wiki. You have two options, outlined in https://wikitech.wikimedia.org/wiki/Creating_new_tables (either add it to createExtensionTables.php or directly invoke the SQL file), and then you can enable ORES on the wiki by adding it to the dblist.

Makes sense! I got a little tripped up by this passage in https://wikitech.wikimedia.org/wiki/Creating_new_tables#Deployment:

You can run the script on all wikis in a dblist using foreachwikiindblist name-of-dblist.dblist extensions/WikimediaMaintenance/createExtensionTables.php extensionname

but I can see that this would probably only make sense if you were adding the tables for a new list that didn't already have some tables deployed.
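One way around that limitation is to drive the per-wiki invocation from the dblist itself rather than relying on a single batch run. The sketch below is only illustrative: it assumes the dblist format (one wiki per line, `#` comments) and merely prints `mwscript-k8s` command lines for review; it is not part of the actual deployment tooling.

```python
# Hypothetical helper: turn a dblist into one createExtensionTables.php
# invocation per wiki, so each wiki can be run (and fail) individually
# instead of one batch run aborting mid-list. The ticket number and
# dblist contents below are placeholders.

def commands_from_dblist(dblist_text, ticket="T391103"):
    """Parse a dblist (one wiki per line, '#' starts a comment) and
    return one mwscript-k8s command string per wiki."""
    commands = []
    for line in dblist_text.splitlines():
        wiki = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not wiki:
            continue
        commands.append(
            f'mwscript-k8s --comment="{ticket}" -- '
            f"extensions/WikimediaMaintenance/createExtensionTables.php "
            f"--wiki={wiki} ores"
        )
    return commands

if __name__ == "__main__":
    sample = "# ores wikis\nidwiki\ncywiki\n"
    for cmd in commands_from_dblist(sample):
        print(cmd)
```

The generated commands can then be eyeballed and run one at a time, skipping wikis whose tables already exist.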

I made a comment on the patch.

For deploying to new wikis: you must first create the table on the wiki. You have two options, outlined in https://wikitech.wikimedia.org/wiki/Creating_new_tables (either add it to createExtensionTables.php or directly invoke the SQL file), and then you can enable ORES on the wiki by adding it to the dblist.

Hello @Ladsgroup, I had a question about this.
Looking at createExtensionTables.php, I see that ORES is already added to the file. Since that is the case, do we still need to do step 6?

As you mention, ORES is already in the createExtensionTables.php file.
My understanding is that we would run the following command to create the tables for idwiki:

mwscript-k8s --comment="T382171" -- extensions/WikimediaMaintenance/createExtensionTables.php --wiki=idwiki ores

IIUC, if we wanted to do it in a batch, we would run the following command (after updating the dblist with new entries):

foreachwikiindblist ores.dblist extensions/WikimediaMaintenance/createExtensionTables.php ores

My understanding is that this would cause an error, since the tables for all the other wikis already exist. So, to Jason's question: I understand that this is not idempotent and would throw errors, since the CREATE TABLE statements do not have an IF NOT EXISTS clause in https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/sql/mysql/tables-generated.sql
I'd be happy to be wrong about this, so that we can just run it in a batch. Not saying that running it one by one is going to be an issue, of course.
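The non-idempotency concern can be demonstrated in miniature with sqlite3 from the Python standard library. Note the caveats: sqlite3 is only a stand-in for MariaDB here, and the table/column definition is illustrative, not the real ORES schema from tables-generated.sql.

```python
# Demonstration of why re-running a schema file is not idempotent
# without IF NOT EXISTS: the second plain CREATE TABLE raises an
# error, while the IF NOT EXISTS variant is a safe no-op.
import sqlite3

conn = sqlite3.connect(":memory:")
# First run: creates the table, as the maintenance script would.
conn.execute("CREATE TABLE ores_model (oresm_id INTEGER PRIMARY KEY)")

# Second run: plain CREATE TABLE fails because the table already
# exists -- the situation for wikis already provisioned.
try:
    conn.execute("CREATE TABLE ores_model (oresm_id INTEGER PRIMARY KEY)")
except sqlite3.OperationalError as err:
    print("re-run failed:", err)

# With IF NOT EXISTS, re-running the statement succeeds silently.
conn.execute("CREATE TABLE IF NOT EXISTS ores_model (oresm_id INTEGER PRIMARY KEY)")
print("IF NOT EXISTS re-run is a no-op")
```

MariaDB supports the same IF NOT EXISTS clause, which is why its absence in the generated SQL makes batch re-runs error out.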


What @isarantopoulos said. I have nothing else to add!


Perfect, thanks!

Change #1147104 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/puppet@production] mariadb: Add ores extension tables to the table catalog

https://gerrit.wikimedia.org/r/1147104

Change #1143638 merged by jenkins-bot:

[operations/mediawiki-config@master] Create dblist for ores extension

https://gerrit.wikimedia.org/r/1143638

Mentioned in SAL (#wikimedia-operations) [2025-05-19T11:52:47Z] <isaranto@deploy1003> Started scap sync-world: Backport for [[gerrit:rGERRIT114363824469|Create dblist for ores extension (T391103)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-19T11:56:54Z] <isaranto@deploy1003> isaranto, jsn: Backport for [[gerrit:rGERRIT114363824469|Create dblist for ores extension (T391103)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-05-19T12:05:15Z] <isaranto@deploy1003> Finished scap sync-world: Backport for [[gerrit:rGERRIT114363824469|Create dblist for ores extension (T391103)]] (duration: 12m 27s)

Change #1147104 merged by Ladsgroup:

[operations/puppet@production] mariadb: Add ores extension tables to the table catalog

https://gerrit.wikimedia.org/r/1147104

From the above initial list, the extension is already enabled on the following wikis:

simplewiki
trwiki

And we have also enabled it on idwiki.
I have created the tables (ores_classification and ores_model) for the remaining wikis.
This is the list of those remaining wikis, as this step precedes the extension installation:

cywiki
bewiki
kkwiki
nnwiki
mkwiki
lawiki
afwiki
tewiki
mrwiki
swwiki
mlwiki
iswiki
pawiki
hawiki
tlwiki
bnwiki
azwiki

Could someone run the maintain-views script for the above? Thank you 🙏
cc: @Ladsgroup @taavi

Could someone run the maintain-views script for the above? Thank you 🙏

First, the tables must be created in production; if they are not, maintain-views will be a no-op. Let me know once you've created the tables and I can run it. (Also, I think you can just directly create the tables without needing to enable the extension or deploy anything; they'll sit there empty, but that's fine.)


nvm, I'm illiterate. You already created them.

Cookbook cookbooks.sre.wikireplicas.update-views run by ladsgroup: Started updating wiki replica views

Cookbook cookbooks.sre.wikireplicas.update-views started by ladsgroup completed:

  • an-redacteddb1001.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1017.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1018.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1019.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1020.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1013.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1014.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1015.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'
  • clouddb1016.eqiad.wmnet (PASS)
    • Ran Puppet agent
    • Ran 'maintain-views --replace-all --auto-depool --databases lawiki'

I ran it with multiple database options and only lawiki was run, so the cookbook must be run either one wiki at a time (or via xargs) or the old-fashioned way. I'll leave that to WMCS to handle.

taavi removed a project: cloud-services-team.

I split that to T395122 since this task is going to get very confusing otherwise.

Yes that makes sense. Once again thank you both!

isarantopoulos claimed this task.

All of the above tables have been created, along with the ores.dblist and the table catalog entries. Thank you @Ladsgroup, @taavi, and @fnegri for the support <3