
Purge 90% of rows from recentchanges (and possibly defragment) on commonswiki and ruwiki (the ones with source:wikidata)
Closed, Resolved · Public

Description

Once rows stop coming in (see parent task), purging them will likely either solve or mitigate the performance issues.

The affected rows seem to have rc_source = 'wb' (to be confirmed with the Wikidata team).

root@neodymium:~$ mysql -h db2073.codfw.wmnet commonswiki -e "SELECT COUNT(*) FROM recentchanges WHERE rc_source = 'wb'"
+----------+
| COUNT(*) |
+----------+
| 58075394 |
+----------+
root@neodymium:~$ mysql -h db2073.codfw.wmnet commonswiki -e "SELECT COUNT(*) FROM recentchanges"
+----------+
| COUNT(*) |
+----------+
| 70880706 |
+----------+

A backup should be taken immediately before purging, in case of unaccounted-for effects.

Purging should be slow enough not to saturate replication or steal IOPS from the main workload.
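A minimal sketch of the kind of throttled purge this calls for, written in Python with hypothetical `delete_batch` and `replica_lag` callables standing in for real database access. In practice the purge was run with pt-archiver, whose `--check-slave-lag` option pauses archiving while a replica is lagging; this sketch only illustrates the same idea: delete in small batches, pause between batches, and wait while replica lag is above a threshold.

```python
import time
from typing import Callable


def throttled_purge(
    delete_batch: Callable[[int], int],  # hypothetical: deletes up to n wb rows, returns how many it deleted
    replica_lag: Callable[[], float],  # hypothetical: current replication lag in seconds
    batch_size: int = 500,
    max_lag: float = 1.0,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete matching rows in small batches, backing off while replicas lag.

    With the defaults, at most 500 rows are deleted per second, because every
    full batch of 500 is followed by a 1 s pause.
    """
    total = 0
    while True:
        # Do not add more writes while replicas are behind.
        while replica_lag() > max_lag:
            sleep(pause)
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted < batch_size:
            # A short (or empty) batch means no matching rows are left.
            return total
        # Spread the writes out so IOPS stay available for normal traffic.
        sleep(pause)
```

The `sleep` parameter is injectable so the loop can be exercised with fake callables and no real waiting.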

Event Timeline

jcrespo created this task. · Oct 9 2017, 2:14 PM
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes, Aklapper. · Oct 9 2017, 2:14 PM
Base added a subscriber: Base. · Oct 9 2017, 2:52 PM

@hoo, @Ladsgroup Can you say whether anything on our side speaks against this? From my side it is OK.

hoo added a comment. · Oct 9 2017, 3:56 PM

This shouldn't cause any problems with respect to Wikibase.

I am leaving a screen open on dbstore1002 loading a copy of recentchanges from commonswiki. Because of the size, it will take some time to be copied. Tomorrow I will test purging the table with something such as:

pt-archiver --source h=dbstore1002.eqiad.wmnet,D=commonswiki,t=recentchanges --purge --where "rc_source = 'wb'" --commit-each
# on production, also add: --check-slave-lag h=db1053.eqiad.wmnet

We'll see how it goes.

Bawolff added a subscriber: Bawolff. · Oct 9 2017, 6:11 PM
Marostegui moved this task from Triage to In progress on the DBA board. · Oct 10 2017, 6:36 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-10T15:48:05Z] <jynus> starting purge of commonswiki.recentchanges T177772

I have allowed codfw to lag so that we can go at around 500 deletes/s. That means the whole thing will take less than 3 hours. Shout if anyone sees anything strange on commonswiki (you shouldn't); worst-case scenario, kill the pt-archiver job in the screen session on db1068. A full backup and the purged data are available locally for checking.

> That means the whole thing will take less than 3 hours

I had a mind slip: we have to delete 60M rows, not 6M, which means around 30 hours, not 3. I ran this for 6 hours and 10M rows were deleted. We will continue after the s4 maintenance tomorrow: T168661
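The duration arithmetic behind the correction, as a small Python check using only the figures quoted in the comments above (a planned rate of about 500 deletes/s, and 10M rows deleted in 6 hours):

```python
def purge_hours(rows: int, rows_per_second: float) -> float:
    """Hours needed to delete `rows` at a steady `rows_per_second`."""
    return rows / rows_per_second / 3600


# Planned rate from the comment above: around 500 deletes/s.
print(f"6M rows:  {purge_hours(6_000_000, 500):.1f} h")   # → 3.3 h (the original slip)
print(f"60M rows: {purge_hours(60_000_000, 500):.1f} h")  # → 33.3 h (the real workload)

# Observed: about 10M rows deleted in 6 hours.
observed_rate = 10_000_000 / (6 * 3600)
print(f"observed: {observed_rate:.0f} rows/s")  # → 463 rows/s
```

So "around 30 hours" is, if anything, slightly optimistic: a round 60M rows at 500 rows/s is about 33 hours, and at the observed ~463 rows/s it is about 36 hours.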

Mentioned in SAL (#wikimedia-operations) [2017-10-11T07:46:08Z] <jynus> restart commonswiki recentchanges purging T177772

ruwiki results are more extreme:

root@db2076[ruwiki]> SELECT count(*) FROM recentchanges;
+----------+
| count(*) |
+----------+
| 37427748 |
+----------+
1 row in set (11.19 sec)

root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source = 'wb';
+----------+
| COUNT(*) |
+----------+
| 36207647 |
+----------+
1 row in set (8 min 57.10 sec)
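To put the two sets of counts side by side, a short Python calculation of the wb share per wiki, using the numbers returned by the queries above (commonswiki on db2073, ruwiki on db2076). The counts were taken at a single point in time while the tables were still changing, so the percentages are approximate; they show that the "90%" in the title is a round figure.

```python
# Row counts copied from the queries above: commonswiki on db2073, ruwiki on db2076.
COUNTS = {
    "commonswiki": (58_075_394, 70_880_706),  # (rows with rc_source = 'wb', all rows)
    "ruwiki": (36_207_647, 37_427_748),
}


def wb_share(wb_rows: int, total_rows: int) -> float:
    """Percentage of recentchanges rows whose rc_source is 'wb'."""
    return 100 * wb_rows / total_rows


for wiki, (wb, total) in COUNTS.items():
    kept = total - wb
    print(f"{wiki}: {wb_share(wb, total):.1f}% wb, {kept:,} non-wb rows")
```

This gives roughly 81.9% for commonswiki and 96.7% for ruwiki, which is why ruwiki is the more extreme case.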

Mentioned in SAL (#wikimedia-operations) [2017-10-11T11:11:25Z] <jynus> starting purge of ruwiki.recentchanges T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-11T12:48:49Z] <marostegui> Kill recentchanges purge on s4 primary master - https://phabricator.wikimedia.org/T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-11T12:53:30Z] <marostegui> Start recentchanges purge on s4 primary master T177772

root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source = 'wb';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (1.19 sec)

Hi, there have been reports of high replication lag on commons causing read-only mode, especially between 7:10 and 10:10 UTC today (I filed T178094). Perhaps the rate of deletion of commons RC entries is too aggressive?

I believe this is more likely to be the cause: T178094#3680533

Mentioned in SAL (#wikimedia-operations) [2017-10-13T08:15:40Z] <jynus> restarting commons wiki recentchanges purge of wb entries T177772

53156406 rows purged on commons so far, out of the initial estimate of 58M (the final number will probably be lower, because the regular RC purge by timestamp also removes rows).

Change 383996 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1098 temporarelly for maintenace

https://gerrit.wikimedia.org/r/383996

Change 383996 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1098 temporarily for maintenace

https://gerrit.wikimedia.org/r/383996

Change 384036 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1053 for maintenance

https://gerrit.wikimedia.org/r/384036

Change 384036 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1053 for maintenance

https://gerrit.wikimedia.org/r/384036

Change 384057 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1056 after maintenance

https://gerrit.wikimedia.org/r/384057

Change 384057 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1056 after maintenance

https://gerrit.wikimedia.org/r/384057

jcrespo closed this task as Resolved. · Oct 13 2017, 3:43 PM

There are more things pending, such as running OPTIMIZE TABLE on the non-rc replicas in eqiad and on all of codfw, and checking wikis other than ruwiki and commons, but the initial (emergency) scope has been addressed.

For the curious, 150GB of disk space (and memory) was freed with the commonswiki purge.

> For the curious, 150GB of disk space (and memory) was freed with the commonswiki purge.

:-0

Change 384439 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1084

https://gerrit.wikimedia.org/r/384439

Mentioned in SAL (#wikimedia-operations) [2017-10-16T05:53:36Z] <marostegui> Optimize recentchanges, pagelinks and templatelinks on db1084 - T174509 T177772

Change 384439 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1084

https://gerrit.wikimedia.org/r/384439

Mentioned in SAL (#wikimedia-operations) [2017-10-16T05:54:52Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1084 - T174509 T177772 (duration: 00m 46s)

Change 384441 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1093

https://gerrit.wikimedia.org/r/384441

Change 384441 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1093

https://gerrit.wikimedia.org/r/384441

Mentioned in SAL (#wikimedia-operations) [2017-10-16T06:04:15Z] <marostegui> Optimize recentchanges, pagelinks and templatelinks on db1093 - T174509 T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-16T06:04:49Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1093 - T174509 T177772 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-10-16T09:02:08Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1093 - T174509 T177772 (duration: 00m 46s)

Change 384466 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1088

https://gerrit.wikimedia.org/r/384466

Mentioned in SAL (#wikimedia-operations) [2017-10-16T09:07:16Z] <marostegui> Optimize recentchanges, pagelinks and templatelinks on db1088 - https://phabricator.wikimedia.org/T174509 https://phabricator.wikimedia.org/T177772

Change 384466 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1088

https://gerrit.wikimedia.org/r/384466

Mentioned in SAL (#wikimedia-operations) [2017-10-16T09:10:06Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1088 - T174509 T177772 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-10-16T15:38:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1088 - T174509 T177772 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2017-10-16T16:05:41Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1084 - T174509 T177772 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T05:38:53Z] <marostegui> Optimize recentchanges on db1081 - T177772

Change 384652 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1085

https://gerrit.wikimedia.org/r/384652

Mentioned in SAL (#wikimedia-operations) [2017-10-17T05:46:30Z] <marostegui> Optimize pagelinks, templatelinks and recentchanges on db1085 T177772 T174509

Change 384652 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1085

https://gerrit.wikimedia.org/r/384652

Mentioned in SAL (#wikimedia-operations) [2017-10-17T05:50:09Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1085 - T174509 T177772 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T09:46:34Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1085 - T174509 T177772 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-10-17T14:09:36Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1081 - T174509 T177772 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-10-19T05:45:12Z] <marostegui> Optimize recentchanges, pagelinks and templatelinks on db1064 - T174509 T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-19T05:46:43Z] <marostegui> Optimize recentchanges, pagelinks and templatelinks on db1102 for s6 - T174509 T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-24T14:39:15Z] <marostegui> Optimize pagelinks templatelinks and recentchanges on db1030 - T174509 https://phabricator.wikimedia.org/T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-26T05:14:15Z] <marostegui> Optimize recentchanges on s4 and s6 on labsdb1009 - T177772

Mentioned in SAL (#wikimedia-operations) [2017-10-26T06:51:45Z] <marostegui> Optimize recentchanges on s4 and s6 on labsdb1010 - T177772
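The optimize work in the SAL entries above follows a rolling pattern: depool one replica in wmf-config/db-eqiad.php, optimize recentchanges (often together with pagelinks and templatelinks), then repool it. A rough Python sketch of that loop, assuming hypothetical `depool`, `optimize` and `repool` callables; in practice these were separate config deploys and manual OPTIMIZE TABLE runs, and some hosts (db1084 and db1093, for example) were depooled at the same time, so this sequential version is a simplification.

```python
from typing import Callable, Iterable, List


def rolling_optimize(
    replicas: Iterable[str],
    tables: List[str],
    depool: Callable[[str], None],  # hypothetical: take the host out of db-eqiad.php and sync the config
    optimize: Callable[[str, str], None],  # hypothetical: run OPTIMIZE TABLE on one table of the host
    repool: Callable[[str], None],  # hypothetical: put the host back into rotation
) -> None:
    """Defragment tables one replica at a time so the others keep serving traffic."""
    for host in replicas:
        depool(host)
        for table in tables:
            optimize(host, table)
        # Only reached if every optimize succeeded; a failing host stays depooled for inspection.
        repool(host)
```

Handling one host at a time keeps most of the section's capacity in service while a table rebuild is running.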