Concerns about ores_classification table size on enwiki
Closed, Resolved (Public)

Description

According to the developer's worst-case scenario, the table will be 12 GB in 3 years.

Just 196 days after deployment, the ORES table (on the enwiki master) is:

-rw-rw---- 1 mysql mysql 6.9G Mar  6 20:49 ores_classification.ibd

And it is crossing the 100 million row mark. This is alarming, as we are in the "3 times worse than the worst case" scenario. It would be nice to have plans for archival, purging, or partitioning.

The table size is not a concern in other wikis, like wikidatawiki.
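
The "3 times worse than the worst case" figure in the description can be checked with a rough linear extrapolation. This is only a sketch: it assumes growth stays linear at the rate observed over the first 196 days, which the task does not claim, and it treats the sizes reported by `ls -lh` as GB.

```python
# Linear extrapolation of the table size reported in the task description.
# Assumption: growth continues at the average rate seen so far.

observed_gb = 6.9        # size of ores_classification.ibd in the description
observed_days = 196      # days since deployment, from the description
worst_case_gb = 12.0     # worst-case estimate for 3 years, from the description
horizon_days = 3 * 365   # three years, ignoring leap days

growth_per_day_gb = observed_gb / observed_days
projected_gb = growth_per_day_gb * horizon_days
ratio_to_worst_case = projected_gb / worst_case_gb

print(f"growth: {growth_per_day_gb * 1024:.1f} MB/day")
print(f"projected size after 3 years: {projected_gb:.1f} GB")
print(f"ratio to worst case: {ratio_to_worst_case:.1f}x")
```

Under these assumptions the projection lands at roughly 38.5 GB, about 3.2 times the 12 GB worst case, which is consistent with the wording in the description.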

Event Timeline

Quite nice!:

root@db2071:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 8.1G May 17 06:25 ores_classification.ibd

And after the table rebuild:

root@db2071:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 528M May 17 06:40 ores_classification.ibd

Mentioned in SAL (#wikimedia-operations) [2017-05-17T08:50:41Z] <marostegui> Deploy alter table on codfw master (db2016) and let ir replicate - T159753

codfw is now done: the alter was run on the master and left to replicate (dbstore2001 will get it tomorrow, as it is our delayed slave). eqiad will take a bit more time, as we need to depool and repool hosts.

db2034.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2042.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2048.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2055.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2062.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2069.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2070.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2071.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

dbstore2001.codfw.wmnet
-rw-rw---- 1 mysql mysql 4.7G May 17 09:04 /srv/sqldata/enwiki/ores_classification.ibd

dbstore2002.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd

db2016.codfw.wmnet
-rw-rw---- 1 mysql mysql 544M May 17 09:02 /srv/sqldata/enwiki/ores_classification.ibd
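
A listing like the one above can be scanned for hosts that have not picked up the rebuild yet (here dbstore2001, the delayed replica). The following is a minimal sketch, not the tooling actually used: the helper names and the 1 GiB threshold are assumptions made for illustration, and only a few of the hosts are included.

```python
import re

# Multipliers for the human-readable suffixes printed by `ls -lh`.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '544M' or '4.7G' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS.get(unit, 1))


def hosts_not_rebuilt(sizes: dict, threshold_bytes: int) -> list:
    """Return hosts whose table file is still larger than the threshold."""
    return [host for host, size in sizes.items()
            if parse_size(size) > threshold_bytes]


# A subset of the sizes reported above.
sizes = {
    "db2034": "544M",
    "db2071": "544M",
    "dbstore2001": "4.7G",
    "dbstore2002": "544M",
    "db2016": "544M",
}

# Hypothetical threshold: anything above 1 GiB is treated as not yet rebuilt.
pending = hosts_not_rebuilt(sizes, threshold_bytes=1024**3)
print(pending)  # ['dbstore2001']
```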

Executed on db1069 (sanitarium) and db1095 (sanitarium2) to replicate downstream to the labs hosts.

root@db1069:/srv/sqldata.s1/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 544M May 17 10:06 ores_classification.ibd

root@db1095:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 296M May 17 10:13 ores_classification.ibd

Change 354188 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1090 for maintenance

https://gerrit.wikimedia.org/r/354188

Change 354188 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1090 for maintenance

https://gerrit.wikimedia.org/r/354188

Mentioned in SAL (#wikimedia-operations) [2017-05-18T08:46:34Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1089 - T159753 T164530 (duration: 00m 39s)

db1089 has been optimized:

root@db1089:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 548M May 18 08:48 ores_classification.ibd

Mentioned in SAL (#wikimedia-operations) [2017-05-18T08:52:17Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1089 - T159753 T164530 (duration: 00m 39s)
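
The depool, optimize, repool cycle logged above can be sketched as a simple loop. This is an illustrative outline only: `rolling_optimize` and the three callables are hypothetical stand-ins, not the config changes and commands that were actually run, and the sketch handles one host at a time, whereas several of the config changes in this task repool one host and depool the next in a single step.

```python
from typing import Callable, Iterable, List

HostAction = Callable[[str], None]


def rolling_optimize(hosts: Iterable[str],
                     depool: HostAction,
                     optimize: HostAction,
                     repool: HostAction) -> List[str]:
    """Depool, optimize and repool each host in turn; return the optimized hosts."""
    optimized: List[str] = []
    for host in hosts:
        depool(host)
        try:
            optimize(host)
            optimized.append(host)
        finally:
            # Design choice of this sketch: repool even if the optimize step
            # raised, so the host is not silently left out of rotation.
            repool(host)
    return optimized


# Stand-in actions that only record what would happen.
log: List[str] = []
result = rolling_optimize(
    ["db1089", "db1083"],
    depool=lambda h: log.append(f"depool {h}"),
    optimize=lambda h: log.append(f"optimize {h}"),
    repool=lambda h: log.append(f"repool {h}"),
)
print(log)
```

A real procedure might instead prefer to leave a host depooled when the optimize step fails, so that it can be inspected before it serves traffic again.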

Change 354195 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1083

https://gerrit.wikimedia.org/r/354195

Change 354195 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1083

https://gerrit.wikimedia.org/r/354195

Mentioned in SAL (#wikimedia-operations) [2017-05-18T09:14:36Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1083 - T159753 T164530 (duration: 00m 39s)

db1083 has been optimized:

root@db1083:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 760M May 18 09:16 ores_classification.ibd

Change 354200 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1083, depool db1080

https://gerrit.wikimedia.org/r/354200

Change 354200 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1083, depool db1080

https://gerrit.wikimedia.org/r/354200

Mentioned in SAL (#wikimedia-operations) [2017-05-18T09:33:28Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1083, depool db1080 - T159753 T164530 (duration: 00m 38s)

db1080 has been optimized:

root@db1080:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 764M May 18 09:41 ores_classification.ibd

Change 354203 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1080, depool db1073

https://gerrit.wikimedia.org/r/354203

Change 354203 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1080, depool db1073

https://gerrit.wikimedia.org/r/354203

Mentioned in SAL (#wikimedia-operations) [2017-05-18T09:49:00Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1080, depool db1073 - T159753 T164530 (duration: 00m 39s)

db1073 has been optimized:

root@db1073:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 760M May 18 09:49 ores_classification.ibd

Change 354208 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1073, depool db1072

https://gerrit.wikimedia.org/r/354208

Change 354208 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1073, depool db1072

https://gerrit.wikimedia.org/r/354208

Mentioned in SAL (#wikimedia-operations) [2017-05-18T11:23:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1073, depool db1072 - T159753 T164530 (duration: 00m 39s)

db1072 has been optimized:

root@db1072:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 776M May 18 11:29 ores_classification.ibd

Change 354218 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1076, depool db1074

https://gerrit.wikimedia.org/r/354218

Change 354218 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1076, depool db1074

https://gerrit.wikimedia.org/r/354218

Mentioned in SAL (#wikimedia-operations) [2017-05-18T12:44:16Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1076, depool db1074 - T159753 T164530 (duration: 00m 39s)

Change 354220 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1072, depool db1066

https://gerrit.wikimedia.org/r/354220

Change 354220 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1072, depool db1066

https://gerrit.wikimedia.org/r/354220

Mentioned in SAL (#wikimedia-operations) [2017-05-18T12:50:20Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1072, depool db1066 - T159753 T164530 (duration: 00m 38s)

db1066 has been optimized:

root@db1066:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 760M May 18 12:52 ores_classification.ibd

Change 354221 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1066

https://gerrit.wikimedia.org/r/354221

Change 354221 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1066

https://gerrit.wikimedia.org/r/354221

Mentioned in SAL (#wikimedia-operations) [2017-05-18T12:57:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1066 - T159753 T164530 (duration: 00m 38s)

db1065 has been optimized:

root@db1065:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 760M May 18 12:57 ores_classification.ibd

Change 354228 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1055

https://gerrit.wikimedia.org/r/354228

Change 354228 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1055

https://gerrit.wikimedia.org/r/354228

Mentioned in SAL (#wikimedia-operations) [2017-05-18T13:40:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1055 - T159753 T164530 (duration: 01m 03s)

db1055 has been optimized:

root@db1055:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 780M May 18 13:39 ores_classification.ibd

Mentioned in SAL (#wikimedia-operations) [2017-05-18T13:47:02Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1055 - T159753 T164530 (duration: 00m 39s)

Change 354383 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1051

https://gerrit.wikimedia.org/r/354383

Change 354383 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1051

https://gerrit.wikimedia.org/r/354383

Mentioned in SAL (#wikimedia-operations) [2017-05-19T06:23:33Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1051 - T159753 T164530 (duration: 00m 39s)

db1051 has been optimized:

root@db1051:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 796M May 19 06:24 ores_classification.ibd

Mentioned in SAL (#wikimedia-operations) [2017-05-19T06:34:42Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1051 - T159753 T164530 (duration: 00m 38s)

db1067 has been optimized:

root@db1067:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 780M May 19 06:39 ores_classification.ibd

I have finished optimizing enwiki master, db1052:

root@db1052:/srv/sqldata/enwiki# ls -lh ores_classification.ibd
-rw-rw---- 1 mysql mysql 856M May 22 06:17 ores_classification.ibd

Change 351146 merged by jenkins-bot:
[mediawiki/core@master] Add hook for cleaning up data that depends on purged recentchanges rows

https://gerrit.wikimedia.org/r/351146

Change 341946 abandoned by Catrope:
Add job to purge old scores (older than $wgRCMaxAge) from ores_classification

Reason:
Yup, Gergo's patch for this is better than mine, abandoning this one

https://gerrit.wikimedia.org/r/341946

There was a great effort back in May to clean up old rows and optimize the db, but one of the patches that was key to keeping the number of rows under control was never merged.

Was a different solution put in place to address the problem of the ores_classification table always growing?

Is it possible that the number of rows in ores_classification has grown significantly since May and that is causing some of the slowness we experience these days on RecentChanges and Watchlist? Can someone with the right access do a quick check?

Change 351147 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Remove cached scores when corresponding recentchanges rows are purged

https://gerrit.wikimedia.org/r/351147
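
The idea behind the change above, removing scores once the corresponding recentchanges rows are gone, can be illustrated with a batched delete. This is a sketch under stated assumptions and not the extension's actual code: it uses SQLite in memory instead of the production MariaDB, a simplified schema with assumed column names (`oresc_id`, `oresc_rev`, `rc_this_oldid`), and a hypothetical helper `purge_orphaned_scores`.

```python
import sqlite3


def purge_orphaned_scores(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Delete scores whose revision has no recentchanges row, in small batches."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM ores_classification
            WHERE oresc_id IN (
                SELECT oresc_id FROM ores_classification
                WHERE oresc_rev NOT IN (SELECT rc_this_oldid FROM recentchanges)
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # commit per batch to keep each transaction short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Simplified stand-in schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_this_oldid INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE ores_classification ("
    "oresc_id INTEGER PRIMARY KEY, "
    "oresc_rev INTEGER NOT NULL, "
    "oresc_probability REAL NOT NULL)"
)
# Only revisions 90..99 are still present in recentchanges.
conn.executemany("INSERT INTO recentchanges VALUES (?)",
                 [(rev,) for rev in range(90, 100)])
# Scores exist for revisions 0..99.
conn.executemany(
    "INSERT INTO ores_classification (oresc_rev, oresc_probability) VALUES (?, ?)",
    [(rev, 0.5) for rev in range(100)],
)

deleted = purge_orphaned_scores(conn, batch_size=25)
remaining = conn.execute("SELECT COUNT(*) FROM ores_classification").fetchone()[0]
print(deleted, remaining)  # 90 10
```

Deleting in small batches is a common way to limit replication lag and lock time on a busy table. Note that a delete only frees space inside the InnoDB file; the file itself shrinks only after a table rebuild, as seen in the optimize runs in this task.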

I just merged the patch. I didn't know about it; had I known, it would have been merged much sooner. (Overall, I think it would be better to handle this per model too, so that, for example, we keep data for models like wp10 forever, but that's not related to the patch.) ores_classification is not super big now. The reason it grew was that someone was hitting the API so hard that they practically asked for scores for 100M edits. It should be around 10M rows in an ideal world (TM), and now it is:

mysql:wikiadmin@db1073 [enwiki]> select count(*) from ores_classification;
+----------+
| count(*) |
+----------+
| 44085680 |
+----------+
1 row in set (1 min 45.12 sec)

44M is not big (but I will do the cleanup anyway for storage reasons). Also, it's heavily indexed, so in theory it should not cause any slowness no matter how big it gets, unless @jcrespo says otherwise :)

Thanks @Ladsgroup and @jcrespo!

[...]
44M is not big (but I will do the cleanup anyway for storage reasons). Also, it's heavily indexed, so in theory it should not cause any slowness no matter how big it gets, unless @jcrespo says otherwise :)

Is there an index that includes oresc_probability now? It's been discussed in a few tickets but I don't know if it's been tried/done.

Mentioned in SAL (#wikimedia-operations) [2017-09-29T13:18:12Z] <Amir1> starting a round of cleanup in ores_classification table in enwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-02T08:29:39Z] <Amir1> finished the cleanup of ores_classification table in wikidatawiki and starting the enwiki one (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-02T09:27:48Z] <Amir1> finished the cleanup of ores_classification table in enwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-03T08:10:55Z] <Amir1> cleaning up ores_classification tables (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-04T13:26:11Z] <marostegui> Optimize table ores_classification on enwiki codfw master db2048 with replication - might generate lag - T159753

Codfw is done.

Mentioned in SAL (#wikimedia-operations) [2017-10-09T11:22:14Z] <marostegui> Optimize ores_classification table on db1083 - T159753

Mentioned in SAL (#wikimedia-operations) [2017-10-10T10:26:22Z] <Amir1> start of cleaning up ores_classification in wikidatawiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-10T14:31:38Z] <marostegui> Optimize table ores_classifications on db1080 - T159753

Mentioned in SAL (#wikimedia-operations) [2017-10-12T08:02:57Z] <Amir1> starting cleanup of ores_classification in wikidatawiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-12T11:27:34Z] <Amir1> finished cleanup of ores_classification in wikidatawiki, still needs more work (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-13T09:22:22Z] <Amir1> a small clean up of ores_classification table in wikidatawiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-16T07:19:01Z] <marostegui> Optimize table ores_classification on db1073 - T159753

Mentioned in SAL (#wikimedia-operations) [2017-10-16T10:48:21Z] <Amir1> cleaning up ores_classification table in wikidatawiki (T159753)

I've finished cleaning up the table in wikidatawiki; it now has 8M rows, compared to at least 50M before. Shrinking the table would be great, cc: @Marostegui and @jcrespo. I will also try to clean up many more wikis, but I'm not sure about their storage concerns.

Thanks @Ladsgroup, I will optimize those tables on wikidata!

Mentioned in SAL (#wikimedia-operations) [2017-10-17T05:42:36Z] <marostegui> Optimize pagelinks, templatelinks and ores_classification on db1095 - T174509 T159753

Mentioned in SAL (#wikimedia-operations) [2017-10-17T07:10:49Z] <marostegui> Optimize enwiki.ores_classification on db1067 - T159753

Mentioned in SAL (#wikimedia-operations) [2017-10-17T10:26:35Z] <Amir1> start of cleaning up ores_classification in cswiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T10:36:32Z] <Amir1> end of cleaning up ores_classification table in cswiki, start of etwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T10:40:37Z] <Amir1> end of cleaning up ores_classification table in etwiki, start of fawiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T11:24:06Z] <Amir1> end of cleaning up ores_classification table in fawiki, start of fiwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T11:39:43Z] <Amir1> end of cleaning up ores_classification table in fiwiki, start of hewiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T11:45:11Z] <Amir1> end of cleaning up ores_classification table in hewiki, start of nlwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T14:05:21Z] <Amir1> start of cleaning up ores_classification table in plwiki (T159753)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T14:25:02Z] <Amir1> end of cleaning up ores_classification table in plwiki, start of ruwiki (T159753)

All tables in all wikis are in the cleanest state possible.

Mentioned in SAL (#wikimedia-operations) [2017-10-24T14:33:43Z] <marostegui> Optimize ores_classification on enwiki db1065 - T159753